Introduction to Cloudera CDH
Cloudera CDH is the 100% open-source platform distribution from Cloudera Inc., a Palo Alto-based American enterprise software company. It includes Apache Hadoop and is built specifically to meet enterprise demands. CDH (Cloudera’s Distribution Including Apache Hadoop) is described by Cloudera as the most complete, tested, and popular distribution of Apache Hadoop and related projects. It delivers the core elements of Hadoop, scalable storage and distributed computing, along with vital enterprise capabilities and a web-based user interface. In this article, we shall see how to use Cloudera CDH, how to connect it to cloud storage, and how to set it up.
How to use Cloudera CDH?
Before looking into how to use Cloudera CDH, we need to go through the Cloudera installation process.
Step 1: Before installing Cloudera Manager, CDH, and other managed services, plan the storage space that Cloudera Manager will need.
- Cloudera Manager tracks job and application metrics in background processes, and all of these metrics require storage. Depending on the size of the organization, this storage can be local, remote, or disk-based.
- Failing to plan for the storage needs of CDH can negatively affect the cluster in several ways:
  - The cluster might miss critical audit information that was not gathered or retained for the required length of time.
  - The cluster might not retain the historical operational data needed to meet internal requirements.
  - Gaps might appear in collections and charts.
  - Administrators might not have historical YARN, MR1, or Impala usage data when they need to reference or report on it later.
  - Administrators might be unable to research past health status or data.
- Configure the network names so that all members of the cluster can communicate with each other:
  - Set a unique hostname on each host:
    sudo hostnamectl set-hostname sample.example.com
  - Edit /etc/hosts with the IP address and fully qualified domain name (FQDN) of each host in the cluster.
  - Edit /etc/sysconfig/network with the FQDN of the host.
  - Verify that each host consistently identifies itself to the network.
- Disable the firewall: save the existing iptables rule set, and then disable the firewall with the commands appropriate to the operating system, be it RHEL 7, SLES, or Ubuntu.
- Set the SELinux mode: Security-Enhanced Linux (SELinux) controls access through policies. If a policy causes problems when deploying CDH, set SELinux to permissive mode on every host before deploying CDH on the cluster.
- Enable the NTP service: Cloudera CDH requires the Network Time Protocol (NTP) to be configured on each machine in the cluster. Also make sure the Software Collections Library repository is installed on the system.
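The host preparation steps above can be sketched as a short shell script. This is only a minimal sketch, not Cloudera's official procedure: hosts_entry is a hypothetical helper, the IP address and host name are placeholders, and the commented-out commands assume a RHEL 7-compatible host that uses systemd and the ntpd service.

```shell
# Hypothetical helper: print one /etc/hosts line for a cluster host
# in the form "<ip> <fqdn> <short name>".
hosts_entry() {
  ip="$1"
  fqdn="$2"
  # ${fqdn%%.*} strips everything from the first dot, leaving the short name.
  printf '%s %s %s\n' "$ip" "$fqdn" "${fqdn%%.*}"
}

# Placeholder address and name; substitute the real values for each host.
hosts_entry 10.0.0.11 sample.example.com
# prints: 10.0.0.11 sample.example.com sample

# Commands that change or depend on system state are left commented out
# (assumption: RHEL 7-compatible host with systemd and ntpd):
# hostname -f                       # should print the FQDN set with hostnamectl
# sudo setenforce 0                 # SELinux permissive mode until the next reboot
# sudo systemctl enable ntpd        # start NTP at boot
# sudo systemctl start ntpd         # start NTP now
```

The output of hosts_entry can be reviewed and then appended to /etc/hosts on every host, so that all cluster members resolve each other by the same names.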
Step 2: Connecting Cloudera Manager to Cloud Storage / Setting Up CDH Connectivity
- Configure the repository for Cloudera Manager. Cloudera Manager is installed with a package management tool: yum for RHEL, zypper for SLES, and apt-get for Ubuntu.
- Install the Java Development Kit (JDK). You can either install the Oracle JDK provided by Cloudera through Cloudera Manager or use OpenJDK. Most of the Linux distributions that Cloudera supports include OpenJDK.
- Install Cloudera Manager Server: install the Cloudera Manager packages on the Cloudera Manager Server host and, optionally, enable auto-TLS. The command for installing the packages depends on the operating system, as shown below:
  sudo yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server      # Oracle Linux, CentOS, RHEL
  sudo zypper install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server   # SLES
  sudo apt-get install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server  # Ubuntu
- Install and configure a database. Cloudera Manager uses various databases and datastores to store information about its configuration and about the health of the system and its tasks. You can use MariaDB, PostgreSQL, Oracle Database, or MySQL for Cloudera Manager Server and the other services.
- Set up the Cloudera Manager database. Cloudera Manager Server includes a script that creates and configures its database. The script can create the Cloudera Manager Server database configuration file, create a database for Cloudera Manager Server to use, and create and configure a user account for Cloudera Manager Server.
- Install CDH and other related software. After the Cloudera Manager database is set up, start Cloudera Manager Server and log in to the Admin Console. By default, both the username and the password are admin.
- Set up the cluster using the wizard. After the Add Cluster installation wizard completes, the Add Cluster Configuration wizard starts automatically.
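The package installation step above depends on the operating system, which a script can detect instead of hard-coding. The following is a minimal sketch under stated assumptions: cm_install_cmd is a hypothetical helper, the OS identifiers follow the ID field of /etc/os-release, and the script path, database names, and port in the commented lines should be checked against the documentation for your Cloudera Manager version.

```shell
# Hypothetical helper: print the install command for a given OS ID
# (values as found in the ID= field of /etc/os-release).
cm_install_cmd() {
  pkgs="cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server"
  case "$1" in
    rhel|centos|ol) echo "sudo yum install $pkgs" ;;
    sles)           echo "sudo zypper install $pkgs" ;;
    ubuntu)         echo "sudo apt-get install $pkgs" ;;
    *)              echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}

# Print (not run) the command for an Ubuntu host.
cm_install_cmd ubuntu

# Follow-up steps, commented out because they need an installed server.
# Assumptions: Cloudera Manager 6 script location, a MySQL database named scm,
# and a database user named scm (the script prompts for the password).
# sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm
# sudo systemctl start cloudera-scm-server
# Admin Console (assumed default port): http://<server_host>:7180
```

On a real host, the OS ID could be read with `. /etc/os-release` and passed as `cm_install_cmd "$ID"`. Printing the command rather than running it keeps the sketch safe to try on any machine.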
Getting started with Cloudera CDH
Cloudera CDH is a complete, tested, and popular distribution of Apache Hadoop. CDH delivers the core elements of Hadoop, scalable storage and distributed computing, along with a web-based user interface. In addition, according to Cloudera, CDH is the only Hadoop solution that offers unified batch processing, interactive SQL and interactive search, and role-based access controls.
Cloudera CDH provides:
- Compatibility: It leverages your existing IT infrastructure and investment.
- Flexibility: It stores any type of data and manipulates it with various computation frameworks that include batch processing, free text search, interactive SQL, statistical computation, and machine learning.
- High Availability: It lets you perform mission-critical business tasks with confidence.
- Scalability: It enables a broad range of applications, which can scale and extend to suit user requirements.
- Security: It lets you process and control sensitive data.
Cloudera CDH – Classic Clusters
Classic cluster tracking covers the total number of clusters that are enabled for Replication Manager, the clusters in an error state, the clusters that are active, and the clusters for which a warning has been issued.
Users must register their existing on-premises CDH clusters in the Management Console, after which they can copy or move the data to the cloud. These registered clusters are called classic clusters.
Classic clusters show the following statuses: Active, Warning, Error, and Total.
To investigate the status of a classic cluster, use Cloudera Manager for CDH.
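The text above most likely refers to the Cloudera Manager Admin Console. As an assumed alternative, the clusters known to Cloudera Manager can also be listed through its REST API. The sketch below only builds the request URL: cm_clusters_url is a hypothetical helper, and the host name, port 7180, API version v19, and admin credentials are placeholders (the versions a server supports can be checked at its /api/version endpoint).

```shell
# Hypothetical helper: build the Cloudera Manager REST API URL that lists clusters.
cm_clusters_url() {
  host="$1"
  version="$2"
  printf 'http://%s:7180/api/%s/clusters\n' "$host" "$version"
}

# Placeholder host and API version.
cm_clusters_url cm.example.com v19
# prints: http://cm.example.com:7180/api/v19/clusters

# Example request, commented out because it needs a running Cloudera Manager
# (default credentials assumed):
# curl -u admin:admin "$(cm_clusters_url cm.example.com v19)"
```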
With this, we conclude the topic “Cloudera CDH.” We have seen what CDH is, how it is used, the prerequisites to take care of before installation, and the steps required to install it. We have also gone through connecting Cloudera to cloud storage and setting up Cloudera CDH. Finally, we looked at classic clusters in Cloudera CDH, which should give a deeper insight into the concept.