Updated June 27, 2023
Introduction to Cloudera CDH
Cloudera CDH is a 100% open-source platform distribution that includes Apache Hadoop and is built to meet enterprise demands. It is provided by Cloudera Inc., a Palo Alto-based American enterprise software company. CDH (Cloudera Distribution of Hadoop) is presented by Cloudera as the most complete, tested, and popular distribution of Apache Hadoop and related projects. It delivers the core elements of Hadoop, namely scalable storage and distributed computing, along with vital enterprise capabilities and a web-based user interface. This article explores how to use Cloudera CDH, how to set it up, and how to establish its connection to cloud storage.
How to Use Cloudera CDH?
Before looking at how to use Cloudera CDH, we need to go through the installation process.
Step 1: Before installing Cloudera Manager, CDH, and other managed services, plan the storage space that Cloudera Manager will need.
Cloudera Manager tracks job and application metrics in background processes, and all of these metrics require storage. Storage requirements vary with the size of the organization, and the storage can be local, remote, or disk-based.
Failing to plan for the storage needs of CDH can have negative effects in several ways:
- A cluster might miss critical audit information not retained or gathered for the required time.
- A cluster might not be able to get the historical operational data to meet internal requirements.
- Gaps might be present in collections and charts.
- Administrators may not have historical YARN, MR1, or Impala usage data when they need to reference or report on it later.
- Administrators may be unable to research past health status or past data.
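As a rough aid to this planning, the free space on the volume that will hold the monitoring data can be checked ahead of time. The sketch below is only an illustration: the helper name free_gb, the default path, and the 10 GB threshold are assumptions, not Cloudera requirements.

```shell
#!/bin/sh
# Print the space available (in whole GB) on the filesystem holding a path.
# free_gb is a hypothetical helper name, not part of Cloudera Manager.
free_gb() {
    # -P forces one output line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of that line is the available space.
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Assumed values for illustration only; size these for your own cluster.
DATA_DIR="${DATA_DIR:-/}"
MIN_GB="${MIN_GB:-10}"

avail=$(free_gb "$DATA_DIR")
if [ "$avail" -lt "$MIN_GB" ]; then
    echo "WARNING: only ${avail} GB free on ${DATA_DIR}, below ${MIN_GB} GB"
else
    echo "OK: ${avail} GB free on ${DATA_DIR}"
fi
```

Set DATA_DIR to the directory chosen for the monitoring data and MIN_GB to the figure that comes out of your own sizing exercise.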
Configuring the network names of the hosts in the cluster allows all members to communicate with each other. This involves:
- Setting the unique hostname
sudo hostnamectl set-hostname sample.example.com
- Editing /etc/hosts with the IP address and fully qualified domain name of each host in the cluster.
- Editing /etc/sysconfig/network with the fully qualified domain name of the host.
- Verifying that each host consistently identifies itself to the network.
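The /etc/hosts step above can be sketched as follows. This is a minimal illustration, not a Cloudera tool: hosts_entry is a hypothetical helper, and the IP address and hostname are example values. It assumes the common layout of IP address, fully qualified name, then short name.

```shell
#!/bin/sh
# Build one /etc/hosts line: IP, fully qualified name, then short name.
# hosts_entry is a hypothetical helper; the address and names are examples.
hosts_entry() {
    ip="$1"
    fqdn="$2"
    short="${fqdn%%.*}"   # strip everything from the first dot onward
    printf '%s %s %s\n' "$ip" "$fqdn" "$short"
}

hosts_entry 192.0.2.10 sample.example.com

# To apply on a real host (requires root), something like:
#   hosts_entry 192.0.2.10 sample.example.com | sudo tee -a /etc/hosts
# Then compare the names each host reports for itself:
#   hostname -f
#   uname -n
```

Printing the line first, rather than appending it directly, leaves room to review each entry before it reaches /etc/hosts.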
Disabling the firewall: Save the existing iptables rule set, and then disable the firewall using the method that matches the operating system, whether RHEL 7, SLES, or Ubuntu.
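A cautious way to approach this step is to print the commands first and run them only after review. The sketch below assumes the usual service names (firewalld on RHEL 7 compatible systems, ufw on Ubuntu) and deliberately leaves SLES out; firewall_steps and the rule file path are hypothetical choices for illustration.

```shell
#!/bin/sh
# Print (do not run) the commands that save the iptables rules and stop
# the firewall. firewall_steps is a hypothetical helper for illustration.
firewall_steps() {
    # Saving the current rule set makes it possible to restore it later.
    echo "sudo iptables-save > ~/firewall.rules"
    case "$1" in
        rhel7)
            echo "sudo systemctl disable firewalld"
            echo "sudo systemctl stop firewalld"
            ;;
        ubuntu)
            echo "sudo service ufw stop"
            ;;
        *)
            echo "# not covered by this sketch: $1"
            ;;
    esac
}

firewall_steps rhel7
```

Keeping the function as a dry run means nothing changes on the host until the printed commands are copied and executed deliberately.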
Setting the SELinux mode: Security-Enhanced Linux (SELinux) controls access through policies. If SELinux causes problems when deploying CDH, set it to permissive mode on every host before deploying CDH on the cluster.
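Switching to permissive mode usually means changing the SELINUX= line in /etc/selinux/config. The sketch below shows one way to do that, assuming the standard file layout; set_permissive is a hypothetical helper, and it is run here against a scratch copy rather than the real file.

```shell
#!/bin/sh
# Rewrite the SELINUX= line of an SELinux config file to permissive.
# set_permissive is a hypothetical helper; it takes the file path as an
# argument so it can be tried on a copy before touching the real file.
set_permissive() {
    cfg="$1"
    tmp="${cfg}.tmp.$$"
    # Only the line starting with "SELINUX=" changes; SELINUXTYPE= is untouched.
    sed 's/^SELINUX=.*/SELINUX=permissive/' "$cfg" > "$tmp" || return 1
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Try it on a scratch copy first.
demo=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
set_permissive "$demo"
cat "$demo"
rm -f "$demo"

# On a real host (as root), the equivalent would be:
#   getenforce                            # show the current mode
#   set_permissive /etc/selinux/config    # persists across reboots
#   setenforce 0                          # switch to permissive immediately
```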
Enabling the NTP service: Cloudera CDH requires the Network Time Protocol (NTP) to be configured on each machine in the cluster. The Software Collections Library repository should also be installed on the system.
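For the NTP part of this step, the configuration mostly comes down to listing time servers in /etc/ntp.conf and enabling the daemon. The sketch below generates the server lines; ntp_server_lines is a hypothetical helper, the pool hostnames are example values, and the commented commands assume a RHEL 7 compatible host that uses ntpd.

```shell
#!/bin/sh
# Print ntp.conf "server" lines for the given time servers.
# ntp_server_lines is a hypothetical helper; the pool names are examples.
ntp_server_lines() {
    for s in "$@"; do
        printf 'server %s\n' "$s"
    done
}

ntp_server_lines 0.pool.ntp.org 1.pool.ntp.org

# On a RHEL 7 compatible host (requires root), a typical sequence would be:
#   sudo yum install ntp
#   (add the server lines above to /etc/ntp.conf)
#   sudo systemctl start ntpd
#   sudo systemctl enable ntpd
```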
Step 2: Connecting Cloudera Manager to Cloud Storage / Setting Up Connectivity for CDH
Configuring a repository for Cloudera Manager: Cloudera Manager is installed with a package tool such as yum for RHEL, zypper for SLES, or apt-get for Ubuntu, so the package repository must be configured first.
Installing the JDK (Java Development Kit): This can be either the Oracle JDK, which Cloudera Manager can install, or OpenJDK. Most of the Linux distributions that Cloudera supports include OpenJDK.
Installing Cloudera Manager Server: Install the Cloudera Manager packages on the Cloudera Manager Server host and, optionally, enable auto-TLS.
The syntax for installing the packages depends on the operating system:
sudo yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server      # Oracle Linux, CentOS, RHEL
sudo zypper install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server   # SLES
sudo apt-get install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server  # Ubuntu
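The choice between the three commands can also be made by a script. The sketch below maps the ID field of /etc/os-release to a package tool and prints the matching install command without running it. pkg_tool is a hypothetical helper, and the ID strings (rhel, centos, ol, sles, ubuntu) are assumed to be the values these distributions normally report.

```shell
#!/bin/sh
# Map an os-release ID to the package tool used for that distribution.
# pkg_tool is a hypothetical helper; the ID strings are assumed values.
pkg_tool() {
    case "$1" in
        rhel|centos|ol) echo "yum" ;;
        sles)           echo "zypper" ;;
        ubuntu)         echo "apt-get" ;;
        *)              return 1 ;;
    esac
}

# Read the ID of the current host in a subshell so that the other
# variables defined in /etc/os-release do not leak into this script.
os_id=""
if [ -r /etc/os-release ]; then
    os_id=$( . /etc/os-release; printf '%s' "${ID:-}" )
fi

if tool=$(pkg_tool "$os_id"); then
    echo "sudo $tool install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server"
else
    echo "No package tool mapped for ID='$os_id'"
fi
```

The script only prints the command, so it is safe to run on any host to see which variant applies.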
Installing and configuring a database: Cloudera Manager uses various databases and datastores to store information about its configuration and about the system's health and tasks. Users can use MariaDB, MySQL, PostgreSQL, or Oracle Database for Cloudera Manager Server and other services.
Setting up the Cloudera Manager database: Cloudera Manager Server includes a script that creates and configures its database. The script can create the Cloudera Manager Server database configuration file, create a database for Cloudera Manager Server to use, and create and configure a user account for Cloudera Manager Server.
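As an illustration of how this script is typically invoked, the sketch below composes the command line without running it. It assumes a Cloudera Manager 6.x layout, where the script is installed as /opt/cloudera/cm/schema/scm_prepare_database.sh and takes the database type, database name, and database user as arguments; verify the path and arguments against the documentation for your version. prepare_db_cmd and the example values (a MySQL database and user both named scm) are hypothetical.

```shell
#!/bin/sh
# Assumption: Cloudera Manager 6.x layout, where the script lives here.
SCM_PREPARE="${SCM_PREPARE:-/opt/cloudera/cm/schema/scm_prepare_database.sh}"

# Compose the command line: <database type> <database name> <database user>.
# prepare_db_cmd is a hypothetical helper that only prints the command.
prepare_db_cmd() {
    printf 'sudo %s %s %s %s\n' "$SCM_PREPARE" "$1" "$2" "$3"
}

# Example values (assumed): a MySQL database and user both named "scm".
prepare_db_cmd mysql scm scm
```

Leaving the password off the command line, where the script supports prompting for it, keeps it out of the shell history.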
After setting up the database, start Cloudera Manager Server and log in to the Admin Console to install CDH and other related software. By default, both the username and the password are admin.
Setting up the cluster using the wizard: After the cluster has been added, the Add Cluster Configuration wizard starts automatically.
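A small sketch of this step follows. It assumes a systemd-based host, the cloudera-scm-server service name, and the default Admin Console HTTP port of 7180; console_url and the hostname cm.example.com are hypothetical.

```shell
#!/bin/sh
# Build the Admin Console URL for a given Cloudera Manager Server host.
# console_url is a hypothetical helper; 7180 is the assumed default port.
console_url() {
    printf 'http://%s:%s\n' "$1" "${2:-7180}"
}

console_url cm.example.com

# On the server host itself (requires root and systemd):
#   sudo systemctl start cloudera-scm-server
#   sudo tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
```

Because the default credentials are well known, changing the admin password right after the first login is a sensible precaution.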
Getting Started with Cloudera CDH
Cloudera CDH is a complete, tested, and popular distribution of Apache Hadoop. It delivers the core elements of Hadoop, namely scalable storage and distributed computing, along with a web-based user interface. In addition, Cloudera describes CDH as the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls.
Cloudera CDH provides:
- Compatibility: It leverages existing IT infrastructure and investment.
- Flexibility: It stores any data and manipulates it with various computation frameworks that include batch processing, free text search, interactive SQL, statistical computation, and machine learning.
- High Availability: It performs mission-critical business tasks with utmost confidence.
- Scalability: It enables a broad range of applications and lets them scale and extend to suit user requirements.
- Security: It processes and controls sensitive data.
Cloudera CDH – Classic Clusters
Classic clusters are existing on-premises CDH (Cloudera Distribution of Hadoop) clusters that users register in the Management Console. Once a cluster is registered, users can copy or move its data to the cloud.
The Classic Clusters view tracks the total number of clusters enabled for Replication Manager, along with active clusters, clusters in an error state, and clusters for which a warning has been issued.
Classic clusters show the following statuses:
Active, Warning, Error, Total.
To investigate the status of a classic cluster, use Cloudera Manager for CDH.
Conclusion
This concludes the topic “Cloudera CDH.” We have seen what CDH is, how it is used, and how it is installed, including the prerequisites to take care of before installation. We have also gone through the setup of Cloudera CDH and its connectivity to cloud storage. Finally, we looked at classic clusters in Cloudera CDH, which gives a deeper insight into the concept.