Updated June 15, 2023
Introduction to HBase Interview Questions and Answers
Part 1 – HBase Interview Questions (Basic)
This first part covers basic HBase interview questions and answers.
1. When should you use HBase?
Answer:
HBase is not suitable for every use case. Consider the following factors:
i. Data volume: You should have petabytes of data to be processed in a distributed environment.
ii. Application: HBase is not suitable for OLTP (Online Transaction Processing) systems that require complex multi-statement transactions.
iii. Cluster hardware: HBase runs on top of HDFS, and HDFS works efficiently only with a reasonable number of nodes (at least five). So HBase is a good choice only with adequate hardware support.
iv. Not a traditional RDBMS: HBase cannot support use cases that require traditional RDBMS features such as joining multiple tables or complex SQL with nested or window functions.
v. Quick random access to data: HBase is a suitable candidate if you need random, real-time access to your data.
2. What is the difference between Cassandra and HBase?
Answer:
Both HBase and Cassandra are distributed NoSQL databases for Big Data; HBase comes from the Hadoop ecosystem. They are built for different use cases.
HBase has a master-slave architecture with several components such as ZooKeeper, NameNode, HBase Master (HMaster), and region servers (data nodes). Cassandra treats all nodes as equals: every node performs all functions, so there is no master.
HBase is optimized for reads; a write for a given row goes to the single region server that owns that region, and HBase offers strong read-after-write consistency. HBase is modeled on Google's Bigtable design, and even now the APIs of Bigtable and HBase are largely compatible. Cassandra's design originates from Amazon's Dynamo paper, which describes a distributed key-value store built at Amazon.
Let us move to the next HBase Interview Questions.
3. What are the Major Components of HBase?
Answer:
HBase has three important components: HMaster, Region Server, and ZooKeeper.
i. HBase Master (HMaster): HBase tables are divided into regions. At startup, the Master decides which region is assigned to which region server (a region server is a node in the cluster). It also handles table metadata operations such as creating tables or changing a schema, and it plays an important role in failure recovery.
ii. Region Server: As mentioned above, this is where the actual data reads and writes happen. Region servers are the actual cluster nodes; a typical region server can serve up to a thousand regions.
iii. ZooKeeper: ZooKeeper is a cluster coordination framework widely used in the Hadoop ecosystem. It tracks all servers (the Master and the region servers) present in the cluster; the HMaster contacts ZooKeeper, and notifications are produced in case of failures.
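For illustration, here is a minimal sketch of how a client connects to such a cluster using the HBase 2.x Java API. The client is pointed at the ZooKeeper quorum; the hostnames and table name below are placeholders, not values from this article. Region lookups and the actual reads and writes then go to the region servers, not to the HMaster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseClientConnection {
    public static void main(String[] args) throws Exception {
        // Clients do not send reads/writes to the HMaster; they consult ZooKeeper
        // for cluster metadata and then talk to the right region server directly.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com"); // hypothetical hosts
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) { // hypothetical table
            System.out.println("Connected to table: " + table.getName());
        }
    }
}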
4. What is HBase Bloom Filter?
Answer:
Normally, the only way to decide whether a row key is present in a store file is to check the file's block index, which holds the start row key of each block in the store file. A Bloom filter is an in-memory data structure that reduces disk reads to only the files likely to contain that row, rather than all store files. In effect, it acts like an in-memory index that indicates the probability of finding a row in a particular store file.
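As a rough sketch with the HBase 2.x Java admin API, a Bloom filter is enabled per column family when a table is created; the table and family names below are only examples. BloomType.ROW checks row keys, while BloomType.ROWCOL checks row plus column.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class BloomFilterExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {
            // ROW Bloom filter: lets a read skip store files that certainly
            // do not contain the requested row key.
            ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("cf"))
                    .setBloomFilterType(BloomType.ROW)
                    .build();
            TableDescriptor table = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("events")) // hypothetical table name
                    .setColumnFamily(cf)
                    .build();
            admin.createTable(table);
        }
    }
}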
Part 2 – HBase Interview Questions (Advanced)
Let us now have a look at the advanced HBase Interview Questions.
6. How does HBase version data?
Answer:
Every cell value in HBase is stored with a timestamp, so writing to the same cell creates a new version rather than overwriting the old value. The number of versions kept is configured per column family (the VERSIONS setting). Versions beyond that limit, or past their TTL, are only marked for removal; the actual deletion happens during compaction.
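A small sketch using the HBase 2.x Java client may make this clearer. It assumes a hypothetical "users" table whose column family has been configured to keep more than one version (VERSIONS > 1).

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionExample {
    public static void main(String[] args) throws Exception {
        byte[] cf = Bytes.toBytes("cf");
        byte[] col = Bytes.toBytes("status");
        byte[] row = Bytes.toBytes("user1");

        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("users"))) { // hypothetical table

            // Each Put on the same cell stores a new version keyed by timestamp;
            // the old value is not overwritten in place.
            table.put(new Put(row).addColumn(cf, col, Bytes.toBytes("active")));
            table.put(new Put(row).addColumn(cf, col, Bytes.toBytes("suspended")));

            // Ask for up to 3 versions of the cell instead of only the latest
            // (the column family must be set to keep that many versions).
            Get get = new Get(row).readVersions(3);
            Result result = table.get(get);
            for (Cell cell : result.getColumnCells(cf, col)) {
                System.out.println(cell.getTimestamp() + " -> "
                        + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}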
7. What is the difference between Get and Scan?
Answer:
Get returns only a single row from an HBase table, based on the given row key. Scan returns a set of rows that satisfy a given search condition. Usually, Get is faster than Scan, so prefer it whenever possible.
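The difference is easy to see in a short sketch with the HBase 2.x Java client; the "orders" table and the row keys used here are hypothetical.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetVsScan {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("orders"))) { // hypothetical table

            // Get: fetch exactly one row by its row key.
            Result single = table.get(new Get(Bytes.toBytes("order-1001")));
            System.out.println("Get found a row: " + !single.isEmpty());

            // Scan: iterate over a range of rows, here all keys from order-1000 up to order-2000.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("order-1000"))
                    .withStopRow(Bytes.toBytes("order-2000"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}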
Let us move to the next HBase Interview Questions.
8. What happens when a row is deleted?
Answer:
When the delete command is issued, data is not physically deleted from the file system; instead, it is made invisible by setting a marker (a tombstone). The physical deletion happens during compaction.
Column, Version, and Family delete markers are the three different markers that mark the deletion of a column, a single version of a column, and a column family, respectively.
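The three marker types map onto the Delete API of the HBase 2.x Java client, as the sketch below shows; the "users" table is hypothetical, and none of these calls remove data from HFiles immediately.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteMarkers {
    public static void main(String[] args) throws Exception {
        byte[] row = Bytes.toBytes("user1");
        byte[] cf = Bytes.toBytes("cf");
        byte[] col = Bytes.toBytes("status");

        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("users"))) { // hypothetical table

            // Version delete marker: hides only the latest version of one cell.
            table.delete(new Delete(row).addColumn(cf, col));

            // Column delete marker: hides all versions of one column.
            table.delete(new Delete(row).addColumns(cf, col));

            // Family delete marker: hides every column in the column family for this row.
            table.delete(new Delete(row).addFamily(cf));

            // The markers (tombstones) suppress the cells on reads until a major
            // compaction physically removes the data.
        }
    }
}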
9. Explain the difference between HBase and Hive.
Answer:
Hive acts as an abstraction layer on top of Hadoop with SQL support, translating queries into batch jobs. HBase is ideal for real-time read/write access to data, whereas Hive is ideal for batch data processing and analytics.
10. What are HLog and HFile?
Answer:
HLog, also known as the write-ahead log file or WAL, is responsible for logging data changes before they are written to the actual data storage file, the HFile.
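As a small illustrative sketch with the HBase 2.x Java client (the "events" table is hypothetical), the durability setting on a mutation controls how the edit interacts with the WAL before it eventually reaches an HFile.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurability {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("events"))) { // hypothetical table

            Put put = new Put(Bytes.toBytes("event-1"))
                    .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes("..."));

            // The edit is appended to the WAL (HLog) before being acknowledged,
            // so it can be replayed if the region server crashes before the
            // MemStore is flushed to an HFile.
            put.setDurability(Durability.SYNC_WAL);

            // Durability.SKIP_WAL would make the write faster but unrecoverable after a crash.
            table.put(put);
        }
    }
}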
Recommended Articles
This has been a guide to the list of HBase Interview Questions and Answers. Here we have covered a few commonly asked interview questions with detailed answers to help candidates crack interviews with ease. You may also look at the following articles to learn more –