Introduction to Cloudera Navigator
Cloudera Navigator is a complete solution for data governance, auditing, and related data management tasks, fully integrated with the Hadoop platform. It lets users explore and tag data through an intuitive search-based interface. It collects artifacts, lineage, audit, and metadata information automatically, and lets compliance groups, administrators, data stewards, and others work effectively with data at scale, for example by finding specific data entities. As a part of Cloudera, it helps enable high-performance analytics and continuous optimization of the data architecture while meeting regulatory requirements. Let us dig deeper into the topic and see how the server is managed, which features were added, and which issues were fixed in recent versions.
Cloudera Navigator Server Management
Cloudera Navigator runs as two distinct roles, the Navigator Metadata Server and the Navigator Audit Server, in the context of the Cloudera Manager Server. The configuration and features of these server-side processes are managed through the Cloudera Manager Admin Console. Management tasks include backing up the Cloudera Navigator Metadata Server, configuring security features to support encrypted client-server (HTTPS) communication over the network, and fine-tuning the Navigator Audit Server and Navigator Metadata Server for better performance.
Cloudera Navigator Metadata Server Management
- The Metadata Server is the role that provides Navigator's data management capabilities. It indexes, manages, and stores entity metadata extracted from cluster services.
- It can tag entities with metadata or perform other actions at extraction time. This metadata enables the data discovery and lineage functions.
- The Navigator metadata architecture is shown below.
- The important processes in the architecture are extracting, indexing, and storing metadata from clusters, both in the cloud and on premises.
- It can also extract metadata from entities stored on the Amazon S3 cloud service and from clusters running on Cloudera Altus or Amazon EC2.
- The extracted metadata is indexed by an embedded Solr instance, whose indexes are stored on disk in the storage directory.
- The Metadata Server also manages the authorization data for Cloudera Navigator users.
- It manages audit report metadata, generates audit analytics, exposes the Cloudera Navigator APIs, and hosts the web server that provides the Navigator console.
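To make the extract, index, and search flow described above more concrete, here is a minimal Python sketch. It is a toy in-memory model, not Navigator's implementation and not Solr: the `Entity` and `MetadataIndex` classes, their methods, and the sample entities are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A hypothetical metadata record extracted from a cluster service."""
    identity: str
    source: str                      # e.g. "hdfs", "hive", "s3"
    name: str
    tags: set = field(default_factory=set)


class MetadataIndex:
    """Toy stand-in for the extract -> index -> search flow (not Solr)."""

    def __init__(self):
        self._entities = {}              # identity -> Entity (the "store")
        self._terms = defaultdict(set)   # lowercase term -> entity identities

    def _index_terms(self, entity, terms):
        for term in terms:
            self._terms[term.lower()].add(entity.identity)

    def extract(self, identity, source, name, tags=()):
        """Store an entity and index its name, source, and extraction-time tags."""
        entity = Entity(identity, source, name, set(tags))
        self._entities[identity] = entity
        self._index_terms(entity, [name, source, *tags])
        return entity

    def tag(self, identity, tag):
        """Attach a tag after extraction and make it searchable."""
        entity = self._entities[identity]
        entity.tags.add(tag)
        self._index_terms(entity, [tag])

    def search(self, term):
        """Return entities whose indexed terms match exactly (case-insensitive)."""
        ids = sorted(self._terms.get(term.lower(), set()))
        return [self._entities[i] for i in ids]


index = MetadataIndex()
index.extract("1", "hdfs", "sales_2020.csv")
index.extract("2", "hive", "customers", tags=["pii"])
index.tag("1", "finance")

print([e.name for e in index.search("pii")])      # ['customers']
print([e.name for e in index.search("finance")])  # ['sales_2020.csv']
```

The point of the sketch is only that tags applied at or after extraction become searchable terms, which is what makes search-based discovery possible.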
Cloudera Navigator Audit Server Management
- The Audit Server tracks events from Cloudera Manager and stores them in the Navigator Audit database.
- Administrators use the Cloudera Manager Admin Console to add the Navigator Audit Server to an existing cluster and to configure its features.
- The image below shows the high-level architecture of the Audit Server.
- At setup time, auditing for services such as HBase, HDFS, and Hive is enabled through plugins. These plugins work with the services to collect and filter the events each service emits and write those events to an audit log on the local file system.
- A plugin that fails to write an event to the audit log file can either drop the event or shut down the process in which it runs, depending on the configured queue policy.
- The Navigator Audit Server tracks the events obtained from Cloudera Manager and stores them in the Navigator Audit database.
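The drop-or-shut-down behavior of the audit plugins can be sketched as follows. This is a simplified Python model under stated assumptions, not Cloudera code: the `AuditPlugin` class, the `QueuePolicy` enum, and the idea that a full in-memory queue stands in for "the event cannot be written" are all illustrative.

```python
from collections import deque
from enum import Enum


class QueuePolicy(Enum):
    DROP = "drop"
    SHUTDOWN = "shutdown"


class AuditPlugin:
    """Toy model of a service-side audit plugin (hypothetical, not Cloudera code)."""

    def __init__(self, policy, capacity, event_filter=lambda event: True):
        self.policy = policy
        self.capacity = capacity
        self.event_filter = event_filter   # decides which events are audited
        self.queue = deque()
        self.dropped = 0

    def emit(self, event):
        """Filter an event and queue it for writing to the audit log."""
        if not self.event_filter(event):
            return False
        if len(self.queue) >= self.capacity:
            # Assumption: a full queue means the event cannot be written.
            if self.policy is QueuePolicy.SHUTDOWN:
                raise SystemExit("audit queue full: shutting down service process")
            self.dropped += 1
            return False
        self.queue.append(event)
        return True

    def flush(self, log_lines):
        """Write queued events to the audit log (a plain list in this sketch)."""
        while self.queue:
            log_lines.append(self.queue.popleft())


plugin = AuditPlugin(QueuePolicy.DROP, capacity=2,
                     event_filter=lambda e: e["op"] != "heartbeat")
for op in ["open", "heartbeat", "delete", "rename"]:
    plugin.emit({"service": "hdfs", "op": op})

audit_log = []
plugin.flush(audit_log)
print([e["op"] for e in audit_log], plugin.dropped)  # ['open', 'delete'] 1
```

The trade-off is the usual one: dropping keeps the service available but loses audit events, while shutting down preserves audit completeness at the cost of availability.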
New Features of Cloudera Navigator
The latest release of Cloudera Navigator is 6.3.4.
This version is a maintenance release with only a few changes, as listed below.
1. Installing or upgrading Cloudera Manager and CDH requires authentication to access downloads. Starting with version 6.3.3, downloading new versions requires a valid Cloudera Enterprise license file and a username and password obtained from Cloudera.
2. Cloudera Express is discontinued. Starting with Cloudera Navigator 6.3.3, Cloudera Express is no longer available, and upgrading Cloudera Manager or Navigator is not supported while running Cloudera Express.
Cloudera Navigator Issues Fixed
Cloudera Navigator 6.3.4 provides fixes for the following issues:
1. High DDL use in the Hue Impala Editor may issue a flood of INVALIDATE calls
Running a DDL statement in the Impala Editor, or invoking the “Refresh Cache” function in the metadata browser on the left, results in INVALIDATE calls being issued to the Impala service. This issue was fixed in HUE-8882.
Affected users were those using the Impala Editor in Hue. Customers on version 5.x had to contact customer support, and customers on version 6.x had to upgrade to 6.3.4, which contains the fix.
2. Default limits for the Pressure Aware Compaction Throughput Controller are too low
CDH and HDP releases suffered from low compaction throughput limits, which caused store files to back up faster than compactions could rewrite them. This issue was identified in HBASE-21000.
Affected users were those on HDP 3.0.0 through 3.1.2 and CDH 6.0.x through 6.3.3. Customers had to upgrade to CDH 6.3.4, CDP 7.1.4, or HDP 3.1.5.
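To see why low limits matter, it helps to look at how a pressure-aware controller picks its limit. As we understand the upstream HBase behavior, the limit is interpolated between a lower and a higher bound according to compaction pressure, and lifted entirely once pressure exceeds 1. The Python sketch below models only that rule; the function name and the bound values in the example are illustrative assumptions, not the actual HBase defaults.

```python
def compaction_throughput_limit(pressure, lower, higher):
    """Return a throughput limit in bytes/sec, or None for "no limit".

    pressure: compaction pressure; 0.0 means no backlog, values above 1.0
              mean store files have piled up past the blocking threshold.
    lower, higher: configured throughput bounds in bytes/sec.
    """
    if lower > higher:
        raise ValueError("lower bound must not exceed higher bound")
    if pressure > 1.0:
        return None                      # backlog is critical: stop throttling
    pressure = max(pressure, 0.0)
    return lower + (higher - lower) * pressure


MB = 1024 * 1024
# Illustrative bounds only (assumed values, not taken from any release).
limit = compaction_throughput_limit(0.5, lower=10 * MB, higher=20 * MB)
print(limit / MB)  # 15.0
```

With bounds this low, compactions stay throttled to a few megabytes per second until the backlog is already severe, which is the pattern of store files piling up faster than they are rewritten.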
3. Kudu tablet server crashes in certain workflows where a tablet is dropped right after an ALTER TABLE statement
DML and DDL operations can accumulate in a Kudu tablet replica's WAL (write-ahead log) during normal operation. Upon shutdown, information on the first 50 accumulated operations is printed to the INFO log file. The bug was introduced with the fix for KUDU-2690: it contained a flipped if-condition that led to dereferencing an invalid pointer, which in turn crashed the Kudu tserver process with a segmentation fault. Slow file system operations increased the likelihood of hitting the crash.
Affected users were those running Kudu clusters on the impacted releases. The solution was to upgrade to CDH 6.3.4.
4. YARN ResourceManager stays in standby state after failover or startup
On failover or startup, the YARN ResourceManager stays in the standby state because it fails to load the recovery data. This failure is logged as a null pointer exception in the log file.
Affected clusters were those running the Hadoop YARN service with the scheduler set to Fair and the YARN ResourceManager work-preserving recovery feature enabled. Customers on CDH 6.2.x or earlier had to raise a support request for a patch, and customers on CDH 6.3.x had to upgrade to 6.3.4.
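A quick way to check whether a cluster matches the affected setup is to inspect its YARN configuration. The Python sketch below parses a `yarn-site.xml` document and looks for the Fair Scheduler together with work-preserving recovery. The property names are the standard Apache Hadoop ones; the helper functions and the sample XML are hypothetical, and the check only considers explicitly set values, so settings that rely on defaults are not detected.

```python
import xml.etree.ElementTree as ET

FAIR = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"


def read_properties(yarn_site_xml):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(yarn_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def matches_affected_setup(props):
    """True if Fair Scheduler and work-preserving recovery are both set explicitly."""
    fair = props.get("yarn.resourcemanager.scheduler.class") == FAIR
    recovery = props.get(
        "yarn.resourcemanager.work-preserving-recovery.enabled", ""
    ).lower() == "true"
    return fair and recovery


# Hypothetical sample configuration for illustration.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""

print(matches_affected_setup(read_properties(SAMPLE)))  # True
```

On a Cloudera Manager deployment the same settings are normally viewed in the Admin Console, so treat this as a sketch for ad hoc inspection of an exported configuration file.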
5. Upstream issues
The release also includes many minor upstream fixes for Apache Avro, Apache Hadoop, HDFS, MapReduce 2, YARN, Apache HBase, Hive, Hue, Impala, Kafka, Kudu, Oozie, Pig, Solr, Sentry, and Spark.
With this, we wind up the topic of Cloudera Navigator. We have seen what Cloudera Navigator is and how it relates to other Cloudera components. We have also gone through the two roles involved in Cloudera Navigator server management, the Navigator Metadata Server and the Navigator Audit Server. Finally, we introduced the new features of the latest version, 6.3.4, and listed the issues fixed in that version.
We hope this article helps you understand what Cloudera Navigator is capable of. Thanks, and happy learning!