Course Overview

3h 59m | 25 Videos | 76339 Views

Intermediate | English [Auto-generated]

What is NoSQL

Before looking at Cassandra, let’s first understand what NoSQL is. A NoSQL database stores and retrieves data that comes in various forms. NoSQL databases are typically schema free, with a simple design and horizontal scaling, and can handle very large amounts of data. Many of them, Cassandra included, favour high availability and offer tunable rather than strict consistency.

What is Cassandra Developer

Apache Cassandra is a free, open-source distributed database system used to manage very large amounts of structured, semi-structured and unstructured data. It is highly scalable and can be used as a real-time operational data store as well as a read-intensive database for large-scale businesses. Cassandra has no single point of failure because its nodes are symmetric peers. It has become popular in a short time because of its technical features and consistent performance. It can store hundreds of terabytes of data and offers a flexible data model.

Features of Cassandra

Some of the key features of Cassandra are listed below

Scalable
Fault tolerant
Architecture that has no single point of failure
Linear scale performance
Flexible data storage
Easy data distribution
Lightweight transaction support
Fast writes

Course Objectives

After successful completion of this course you will be able to

Understand the basics of Cassandra
Install and configure Cassandra
Understand the architecture of Cassandra
Learn about various Cassandra monitoring and administration techniques
Understand the nodes in a cluster and how they work
Understand the various modelling techniques and the key components of reading and writing data

Pre requisites for taking this course

This tutorial does not require many skills. You just need a basic knowledge of Java programming. Prior experience with database concepts, SQL syntax or Linux commands is an added advantage.

Target Audience for this course

This tutorial is useful for software professionals and for anyone with an interest in Cassandra.

Course Description

Section 1: Cassandra Developer

Agenda – Cassandra

This chapter gives an overall introduction to Cassandra. It covers the difference between relational databases and NoSQL databases, the history of Cassandra, its features, and its applications and uses in various fields.

Architecture

The architecture of Cassandra is designed to handle big data workloads across multiple nodes. Cassandra has a peer-to-peer distribution system, which makes it simpler to set up and maintain. There is no concept of a master node in Cassandra. Cassandra’s architecture is capable of handling petabytes of information. The components of Cassandra explained in this chapter are

Node
Data centre
Cluster
Commit Log
Memtable
SSTable
Bloom Filter
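
As a rough illustration of the last component in the list, the following Python sketch shows how a Bloom filter can answer "is this key possibly here?" without reading any stored data. The class name, the bit array size and the number of hash functions are assumptions made for this example; this is not Cassandra's own implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits      # illustrative size, not a Cassandra default
        self.num_hashes = num_hashes  # illustrative count, not a Cassandra default
        self.bits = [False] * num_bits

    def _positions(self, key: str):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # If any bit is unset, the key was definitely never added.
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    bloom.add(row_key)

print(bloom.might_contain("user:2"))   # a key that was added is always reported
print(bloom.might_contain("user:99"))  # usually not reported; false positives can occur
```

A "no" answer is always reliable, which is what allows a read to skip a file entirely; a "yes" answer still has to be confirmed by looking at the data.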

Architecture continued

The Cassandra architecture also provides automatic distribution of data across all the nodes in a ring. Cassandra also provides built-in, customizable replication, which stores redundant copies of data across multiple nodes. Replication in Cassandra is easy to configure. This chapter deals with distributing and replicating data in Cassandra

The other topics included in this chapter are

Multi Datacenter and Cloud Support
Reading and Writing Data
Data Consistency
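
To make the idea of distributing and replicating data around a ring more concrete, here is a minimal Python sketch. It assumes one token per node, a hypothetical 32-bit token space and a placement rule that simply walks clockwise to the next nodes; the names Ring, token and replicas are invented for this example, and real clusters use their own partitioner and placement strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on a hypothetical 0..2**32-1 ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Each node owns one token; the sorted tokens form the ring.
        self.tokens = sorted((token(name), name) for name in nodes)

    def replicas(self, partition_key: str):
        """Return the nodes that hold copies of the given key."""
        positions = [t for t, _ in self.tokens]
        # First node whose token is >= the key's token, wrapping around at the end.
        start = bisect.bisect_left(positions, token(partition_key)) % len(positions)
        count = min(self.replication_factor, len(self.tokens))
        # Walk clockwise from the owner to pick the remaining replicas.
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
print(ring.replicas("user:42"))
```

Because every node can compute the same mapping from key to replicas, no master node is needed to route a request.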

Introduction to Data Model

The data model of Cassandra is very different from the data models of an RDBMS. The Cassandra data model is designed to distribute data on a very large scale. In this chapter we will take a bottom-up approach to understanding Cassandra’s data model. In Cassandra, a keyspace is the container for your data, and it contains one or more column family objects. The topics covered under this chapter are

Keyspace – Replication factor, Replica placement strategy, Column families
Column Family – keys_cached, rows_cached, preload_row_cache
Column
Super Column
Difference between data models of Cassandra and RDBMS

Data Model Queries

Planning a data model in Cassandra involves different design considerations from a relational model. The design depends on the data you want to capture and how you want to access it. The best way to design a data model in Cassandra is to start with the queries. This means thinking about what actions need to be taken and how the data will be accessed, and only then designing the column families.

In Depth CQL

This chapter gives an introduction to the Cassandra Query Language (CQL) and explains how to use its commands. The CQL shell, cqlsh, can be used to define a schema, insert data and execute queries. It is started with the command cqlsh.

Start cqlsh options – cqlsh --help, cqlsh --version, cqlsh --color, cqlsh --debug, cqlsh --execute cql_statement, cqlsh --file="file name"
Documented shell commands – HELP, CAPTURE, CONSISTENCY, COPY, DESCRIBE, EXPAND, EXIT, PAGING, SHOW, SOURCE, TRACING
CQL Data Definition Commands – CREATE KEYSPACE, USE, ALTER KEYSPACE, DROP KEYSPACE, CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE, CREATE INDEX, DROP INDEX
CQL Data Manipulation Commands – INSERT, UPDATE, DELETE, BATCH
CQL Clauses – SELECT, WHERE, ORDER BY

Data Modelling

Data modelling is one of the most important steps in ensuring the performance of Cassandra applications. Data modelling is the process of identifying the patterns of data access and the queries that have to be performed. The other topics included in this section are

Rules of Cassandra Data Modelling
Data Modelling concepts, principles and methodology
Time series data modelling
Examples of data modelling
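
As one possible illustration of time series data modelling, the Python sketch below groups sensor readings into one partition per sensor per day and keeps the rows in each partition ordered by timestamp. The in-memory dictionary stands in for a table, and the names partition_key, insert and readings_for_day, as well as the daily bucket size, are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Bucket readings by sensor and calendar day so partitions stay bounded."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


# partition key -> list of (timestamp, value) rows, kept sorted by timestamp
table = defaultdict(list)


def insert(sensor_id: str, ts: datetime, value: float) -> None:
    rows = table[partition_key(sensor_id, ts)]
    rows.append((ts, value))
    rows.sort(key=lambda row: row[0])  # mimic ordering rows by time within a partition


def readings_for_day(sensor_id: str, day: str) -> list:
    """Answer the query the model was designed for with a single partition read."""
    return table.get((sensor_id, day), [])


utc = timezone.utc
insert("s1", datetime(2024, 1, 1, 23, 59, tzinfo=utc), 20.5)
insert("s1", datetime(2024, 1, 2, 0, 1, tzinfo=utc), 21.0)
insert("s1", datetime(2024, 1, 1, 8, 0, tzinfo=utc), 19.0)

print([value for _, value in readings_for_day("s1", "2024-01-01")])  # [19.0, 20.5]
```

The bucket size is a design choice: it is picked so that the query "all readings for one sensor on one day" touches exactly one partition and no partition grows without limit.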

Complex Queries

In a relational database system, joining many tables requires increasingly complex SQL statements, and as a result the process becomes slow. Cassandra does not support joins; instead, data is denormalized so that each query can be answered from a single table with a simple statement, which keeps reads fast. This section deals with how complex queries are handled in Cassandra.

Whiteboard

Whiteboard is where the data model is fitted exactly to the database’s constraints, without needing any translation steps from the initial mapping of the data model.

Section 2: Cassandra Administration

Introduction to Architecture

The Cassandra Architecture is very sophisticated and it depends on the use of several theoretical constructs. The topics included in this chapter are discussed in brief here

System Keyspace – Cassandra has an internal keyspace called system which is used to store the metadata. The metadata of Cassandra includes the node’s token, cluster name, keyspace and schema definitions and migration data
Peer to Peer – Cassandra has a peer to peer distribution model and so there is no master node. This peer to peer model improves the database availability. This design also makes it easy to add new nodes
Gossip and Failure detection – A gossip protocol is used in Cassandra to support decentralization and partition tolerance. In this section you will learn about the working process of gossip.
Anti Entropy and Read Repair – Anti Entropy is the replica synchronization mechanism in Cassandra. This topic also contains details about read repair and how it is performed.
Memtables, SSTables and Commit Logs – The commit log is Cassandra’s crash recovery mechanism: every write is appended to it first. The value is then written into a memory-resident data structure called a memtable. When a memtable reaches its size threshold, its contents are flushed to disk as an immutable file called an SSTable. The compaction operation merges several SSTables.
Bloom Filters – These are performance boosters in Cassandra: a Bloom filter lets a read skip SSTables that cannot contain the requested key.
Staged Event Driven Architecture (SEDA) – SEDA is an architecture for highly concurrent internet services.
Managers and Services
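
The write path described in the list above (commit log, then memtable, then SSTable, with compaction merging SSTables) can be sketched in a few lines of Python. This is a simplified in-memory model under stated assumptions: the Node class, the memtable_limit parameter and the use of Python lists and tuples in place of on-disk files are all invented for illustration.

```python
class Node:
    def __init__(self, memtable_limit: int = 3) -> None:
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in-memory structure holding the latest value per key
        self.sstables = []     # immutable sorted snapshots, newest last

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))  # 1. record the write for crash recovery
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # 3. Persist the memtable as an immutable sorted table, then start fresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest table wins
            for k, v in sstable:
                if k == key:
                    return v
        return None

    def compact(self) -> None:
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer values overwrite
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the limit and triggers a flush
node.write("a", 3)
node.write("c", 4)   # second flush
node.compact()
print(node.read("a"), len(node.sstables))  # 3 1
```

Writes only append to a log and update memory, which is why they are fast; the cost of organising data on disk is deferred to flush and compaction.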

Deployment

When planning a Cassandra cluster deployment, you should first have an idea of the amount of data you need to store and an estimate of the workload. This chapter deals with selecting the hardware: RAM, CPU, disk and network. The other topics included in this section are

Planning an Amazon EC2 cluster
Capacity Planning
Choosing Node Configuration Options
Snitch Settings
Choosing Keyspace Replication Options

Replication

Replication is the process of storing multiple copies of data on different nodes. The replica placement strategy, chosen when creating a keyspace, determines how the replicas are distributed, and the replication factor is the total number of replicas created. The other topics included in this chapter are

Replica Placement Strategy – Simple Strategy, Network Topology Strategy
Snitches – Simple Snitch, DSE Simple Snitch, Rack Inferring Snitch, Property File Snitch, EC2 Snitch, EC2 Multi Region Snitch, Dynamic Snitch
Client requests
About Write Requests
About Read Requests
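
The relationship between the replication factor and read and write requests can be shown with a little arithmetic. In the Python sketch below, the function names are assumptions made for this example; the rule it encodes is the usual one that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                   # 2
print(read_sees_latest_write(q, q, rf))    # True: QUORUM reads with QUORUM writes
print(read_sees_latest_write(1, 1, rf))    # False: ONE reads with ONE writes
print(read_sees_latest_write(1, rf, rf))   # True: ONE reads with ALL writes
```

Lower levels such as ONE respond faster and tolerate more failed nodes, at the price of possibly reading stale data.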

Sharding

Sharding is a technique used to scale a relational database horizontally. To shard your data, you first need a way to order your records. Sharding can also be seen as a kind of shared-nothing architecture, which is decentralized and treats each node in a distributed system as independent. Depending on the strategy you select, sharding lets you scale horizontally in a controlled way. There are three basic strategies for determining the shard structure

Feature Based Shard or Functional Segmentation
Key Based Sharding
Lookup Table
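
Two of the strategies listed above can be contrasted in a short Python sketch. Key based sharding computes the shard from the key itself, while a lookup table stores an explicit assignment that can later be changed. The function and class names, the use of SHA-256 and the round-robin assignment rule are assumptions made for this example.

```python
import hashlib


def key_based_shard(key: str, num_shards: int) -> int:
    """Key based sharding: the shard is a pure function of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class LookupTableSharder:
    """Lookup table sharding: assignments are stored and can be moved."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.assignments = {}

    def shard_for(self, key: str) -> int:
        if key not in self.assignments:
            # Hypothetical policy: hand out shards round-robin to new keys.
            self.assignments[key] = len(self.assignments) % self.num_shards
        return self.assignments[key]

    def move(self, key: str, shard: int) -> None:
        # Rebalancing only needs a table update, not a new hash function.
        self.assignments[key] = shard


print(key_based_shard("customer:1001", 4))

lookup = LookupTableSharder(num_shards=4)
print(lookup.shard_for("customer:1001"))  # 0
print(lookup.shard_for("customer:1002"))  # 1
lookup.move("customer:1001", 3)
print(lookup.shard_for("customer:1001"))  # 3
```

The key based approach needs no extra storage but reshuffles data when the number of shards changes; the lookup table adds an extra lookup on every request in exchange for that flexibility.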

Performance Monitoring Strategies

Performance plays an important role in the wide adoption of Cassandra. One of Cassandra’s hallmarks is its performance in read and write operations, and in terms of scalability it is often reported to compare well with other NoSQL databases. By monitoring the performance of Cassandra you will be able to identify weak spots and resource limitations. The areas where performance monitoring is a must in Cassandra are

Read and write requests
Read and write latency
Disk usage
Garbage collection frequency and duration
Errors and Overruns

Performance Monitoring Strategies continued

Many performance metrics are available through a variety of tools. A few of them are explained in detail in this chapter

Throughput – Read throughput, Write Throughput
Latency – Read Latency, Write Latency, Key Cache Latency
Disk Usage – Load, Total disk space used, Complete Compaction tasks, Pending Compaction tasks
Garbage collection – ParNew count, ParNew time, ConcurrentMarkSweep count, ConcurrentMarkSweep time
Errors and Overruns – Timeout Exceptions, Unavailable Exceptions, Pending Exceptions, Currently Blocked Tasks
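
As a small worked example of the throughput and latency metrics listed above, the following Python sketch computes requests per second and nearest-rank latency percentiles from a list of samples. The sample values and the function names are invented for illustration; monitoring tools compute and expose such figures in their own ways.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def throughput(request_count, window_seconds):
    """Requests completed per second over a measurement window."""
    return request_count / window_seconds


# Hypothetical read latencies in milliseconds, collected over a 2 second window.
read_latencies_ms = [2, 3, 3, 4, 5, 6, 8, 12, 15, 40]

print(throughput(len(read_latencies_ms), 2))  # 5.0
print(percentile(read_latencies_ms, 50))      # 5
print(percentile(read_latencies_ms, 95))      # 40
```

Percentiles are usually more informative than an average here, because a single slow request (40 ms in this sample) is hidden by the mean but shows up clearly at the 95th percentile.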

FAQ’s General Questions

Why should I get certified in Apache Cassandra?

The demand for Apache Cassandra and NoSQL skills is growing rapidly. As a result, Cassandra developers can command some of the highest salaries among database technologies. Getting certified in Cassandra will increase your confidence in your knowledge of Cassandra. The certification demonstrates your expertise and can help your career growth. You can also add your name to the Certified Cassandra Developers groups on professional social networking sites like LinkedIn.

What benefits will I get from this course?

From this course you will learn how to perform the various database operations, such as creating, inserting or deleting data. You will know how to monitor your databases and understand how cluster nodes work in Cassandra. After this course you will be able to perform database operations in Cassandra like an expert.

Testimonials

Stephen

It was a wonderful experience learning from educba. This is a great course with all the basics of Cassandra developer and administration. Each topic in the course is well structured and well explained with examples wherever needed. This course helped me to enhance my knowledge about Cassandra and start my career in Cassandra with great confidence. I am greatly impressed with this course and would definitely recommend this course to others

Angelia

I took this course two weeks back and it is an amazing course. The course starts with the basic concepts and flows into deeper concepts of Cassandra. This is suited for both beginners as well as professionals. One can start up working with Cassandra like an expert after taking this course. The topics of the course are self explanatory and easy to understand. Overall an excellent course for Cassandra developers.

Where do our learners come from?

Professionals from around the world have benefited from eduCBA’s Comprehensive Cassandra Developer & Administration Training courses. Some of the top places that our learners come from include New York, Dubai, San Francisco, Bay Area, New Jersey, Houston, Seattle, Toronto, London, Berlin, UAE, Chicago, UK, Hong Kong, Singapore, Australia, New Zealand, India, Bangalore, New Delhi, Mumbai, Pune, Kolkata, Hyderabad and Gurgaon among many.
