Course Overview

2h 3m | 21 Videos | 75400 Views

Beginner | English [Auto-generated]

What is Apache Storm

Apache Storm is a distributed framework for real time processing of Big Data. It allows you to perform real time analytics on a wide variety of streamed data. Apache Storm is written in Java and Clojure. It is designed to process vast amounts of data in a fault tolerant and horizontally scalable way. Apache Storm remains a widely used system for real time analytics. It is easy to install, set up and operate, and it is used in many fields to deal with large volumes of data.

Benefits of Apache Storm

Here is a list of the main benefits of Apache Storm

Apache Storm is open source, highly scalable and user friendly
It is fault tolerant and flexible
It is reliable and can be used with any programming language
It allows real time stream processing
It is very fast: a benchmark cited by the Storm project clocked it at over a million tuples processed per second per node
It guarantees that data will be processed, even if nodes in the cluster fail or messages are lost

Use Cases of Apache Storm

Apache Storm suits a variety of use cases. A few are listed below

Stream Processing
Continuous Computation
Distributed RPC
Real Time Analytics
Online Machine Learning

Course Objectives

At the end of this course you will be able to

Master the fundamentals and architecture of Apache Storm
Understand where to use Apache Storm for real time analytics
Set up an Apache Storm cluster on your computer
Understand the basics of Storm interfaces with Java and other languages
Learn about the Storm technology stack and groupings
Implement Spouts and Bolts
Work on projects using Apache Storm

Pre Requisites for taking this course

Before taking this course you should have a basic knowledge of Java programming and of a Linux based system. Basic knowledge of data processing and of Hadoop will be an added advantage.

Target Audience for this course

This course is meant for professionals who want to start a career in Big Data analytics using the Apache Storm framework. It also suits software professionals, data scientists, ETL developers and project managers.

Apache Storm Course Description

Section 1: History

Apache Storm was originally created by Nathan Marz while he was working at BackType. Nathan created Storm because the existing approach to real time processing, a hand-built system of distributed queues and workers, was cumbersome and brittle. Storm introduced a stream abstraction that is fault tolerant and reliable. BackType was acquired by Twitter in 2011, and Twitter open sourced Storm the same year. Storm later became an Apache project. In a very short period of time Apache Storm has become a popular and leading real time processing system. This chapter contains a brief introduction to Apache and explains the history of Apache Storm.

Section 2: Features

Apache Storm has many features that set it apart from other real time processing systems. Its top features are listed below

Simple programming model – Topology, Spouts, Bolts
Programming language agnostic – Clojure, Java, Ruby, Python
Fault tolerant
Horizontally scalable
Guaranteed message processing
Very fast
Local mode

Architecture of Apache Storm

One of the main advantages of Apache Storm is that it is fault tolerant, with no single point of failure. In this chapter a pictorial representation of the architecture of Apache Storm is given for easy understanding of the cluster design of Storm. Apache Storm has two types of nodes: the master node, which runs the Nimbus daemon, and the worker nodes, each of which runs a Supervisor daemon. Nimbus distributes the topology code around the cluster, assigns tasks to the worker nodes and monitors for failures. The Supervisor starts and stops worker processes on its node according to what Nimbus has assigned. The below mentioned components are explained in detail under this chapter

Nimbus
Supervisor
Worker Process
Executor
Task
Zookeeper framework

Architecture Explanation in Detail

Storm's architecture helps it process real time data quickly. The Nimbus and Supervisor daemons are fail-fast, so they are typically run under a supervision tool such as monit, which restarts a daemon if it fails. Storm also has a high level abstraction called Trident, which provides an API comparable to Pig. All these features are discussed in detail in this chapter.

Topology

A topology is a graph of operators and streams. It is a combination of spouts and bolts and defines a streaming application in Storm. Each node in a topology contains processing logic. The links between nodes determine how data passes from one node to the next. Running a topology is straightforward. Once submitted, a topology runs indefinitely until it is killed. The topology command is explained in detail in this lesson.

Topology Creation

The topology in Apache Storm is a Thrift structure. The TopologyBuilder class has simple methods for creating topologies: spouts and bolts are registered on the builder, and its createTopology method returns the new topology. The code to create a topology is given in this chapter.

Trident

Apache Storm has an advanced topology called Trident Topology. Trident has functions, filters, joins, grouping and aggregation. These components are explained in detail in this chapter. The other topics included in this chapter are listed below

Trident Tuples
Trident Spout
Trident Operations
State Maintenance
Distributed RPC
When Trident should be used
Example of Trident
Formatting the call information
CSV Split
Log Analyzer

Spouts

A topology usually starts with spouts. These are the sources of streams in a topology. A spout reads data from an external source, such as a messaging framework, and emits it as tuples to one or more bolts. A tuple is a named list of values in Apache Storm.

Spout Creation

A spout implements the “IRichSpout” interface, which has the following methods

open – called once when the spout starts, with conf, context and collector as parameters
nextTuple – called repeatedly to emit the next tuple through the collector
close – called when the spout is about to shut down
declareOutputFields – specifies the output schema of the tuple
ack – called when a specific tuple has been fully processed
fail – called when the processing of a specific tuple has failed or timed out

Fake Call Log Reader Spout – the example spout in this chapter. Each call log tuple contains caller number, receiver number and duration
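
The spout lifecycle described above can be sketched without a Storm dependency. The following is a plain-Java model and not the Storm API: the class FakeCallLogSpout, its Emitter callback (a stand-in for Storm's collector), the phone numbers and the replay queue are all assumptions made for illustration. It shows open storing the collector, nextTuple emitting one (from, to, duration) tuple per call, and ack and fail tracking pending tuples so that a failed tuple is emitted again.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Random;

// Plain-Java model of a spout's lifecycle. It does NOT use the Storm API.
class FakeCallLogSpout {
    // Hypothetical stand-in for the collector that Storm passes to open().
    interface Emitter {
        void emit(List<Object> tuple, Object messageId);
    }

    // Made-up phone numbers used as sample data.
    private static final List<String> NUMBERS =
            Arrays.asList("1234123401", "1234123402", "1234123403", "1234123404");

    private final Random random;
    private final Map<Object, List<Object>> pending = new HashMap<>();
    private final Queue<List<Object>> replay = new ArrayDeque<>();
    private Emitter emitter;
    private long nextId = 0;

    FakeCallLogSpout(long seed) {
        this.random = new Random(seed);
    }

    // open: called once; keep the collector for later emits.
    void open(Emitter emitter) {
        this.emitter = emitter;
    }

    // nextTuple: emit one tuple per call, replaying failed tuples first.
    void nextTuple() {
        List<Object> tuple = replay.poll();
        if (tuple == null) {
            int fromIdx = random.nextInt(NUMBERS.size());
            // An offset between 1 and size-1 guarantees receiver != caller.
            int toIdx = (fromIdx + 1 + random.nextInt(NUMBERS.size() - 1)) % NUMBERS.size();
            int duration = random.nextInt(60);
            tuple = Arrays.<Object>asList(NUMBERS.get(fromIdx), NUMBERS.get(toIdx), duration);
        }
        Object id = nextId++;
        pending.put(id, tuple);
        emitter.emit(tuple, id);
    }

    // ack: the tuple was fully processed, so forget it.
    void ack(Object messageId) {
        pending.remove(messageId);
    }

    // fail: processing failed, so queue the tuple for replay.
    void fail(Object messageId) {
        List<Object> tuple = pending.remove(messageId);
        if (tuple != null) {
            replay.add(tuple);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    // declareOutputFields: the names of the values in each emitted tuple.
    static List<String> declareOutputFields() {
        return Arrays.asList("from", "to", "duration");
    }
}
```

In a real topology Storm itself calls these methods, and the message id passed to the collector is what links a later ack or fail back to the emitted tuple.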

Bolt

A bolt is a processing node in a topology. Bolts process input streams and produce new streams. Each bolt holds a small unit of processing logic. The output of one bolt can be used as input for another bolt.

Bolt Creation

A bolt is a component which takes a tuple as input, processes it and produces zero or more new tuples as output. A bolt implements the “IRichBolt” interface, which has the following methods

prepare – called once before the bolt starts processing, with conf, context and collector as parameters
execute – called once for each input tuple. It processes that tuple and may emit any number of output tuples through the collector
cleanup – called when the bolt is about to shut down
declareOutputFields – the declarer parameter is used to declare output stream ids, output fields and others

The operations in this chapter are carried out using two classes, CallLogCreatorBolt and CallLogCounterBolt

CallLogCreatorBolt – receives the call log tuple, which has caller number, receiver number and call duration. The complete code of CallLogCreatorBolt is given in this topic
CallLogCounterBolt – receives the call and its duration as a tuple. This bolt creates a dictionary object in the prepare method. The code of CallLogCounterBolt is given in this section
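
The logic of the two call log bolts can be sketched in plain Java. This is a model under stated assumptions and not the course's code or the Storm API: the class CallLogBolts, the "from - to" key format and the list-based tuples are hypothetical. It shows the creator step turning (from, to, duration) into (call, duration), and the counter step building its dictionary in prepare and updating it in execute.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the two bolts. It does NOT use the Storm API.
class CallLogBolts {
    // Models the creator bolt's execute: (from, to, duration) -> (call, duration).
    static List<Object> createCallLog(List<Object> input) {
        String call = input.get(0) + " - " + input.get(1); // assumed key format
        return Arrays.<Object>asList(call, input.get(2));
    }

    // Models the counter bolt: state is created in prepare, updated in execute.
    static class Counter {
        private Map<String, Integer> counts;

        // prepare: create the dictionary that holds one count per call.
        void prepare() {
            counts = new HashMap<>();
        }

        // execute: handle one input tuple and bump the count for its call.
        void execute(List<Object> input) {
            String call = (String) input.get(0);
            counts.merge(call, 1, Integer::sum);
        }

        Map<String, Integer> counts() {
            return counts;
        }
    }
}
```

Chaining the two steps mirrors how the output of one bolt becomes the input of the next: each tuple returned by createCallLog is passed to Counter.execute.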

Stream

A stream is an unbounded sequence of tuples which is processed by the application. Apache Storm reads raw stream data at one end, and this data goes through a sequence of processing units which produce the output at the other end. Streams of data flow from spouts to bolts and from one bolt to another. The stream concept in Storm is discussed in this section with an example.

Stream Grouping

Stream grouping controls how the tuples of a stream are routed between the tasks of a bolt, and so helps to understand the work flow of the topology. Storm has several built in groupings. The four explained in this chapter are

Shuffle Grouping – Tuples are distributed randomly across the tasks of the bolt, so that each task receives an equal number of tuples
Fields Grouping – The stream is partitioned by the values of the chosen fields. Tuples with the same field values are always sent to the same task
Global Grouping – The entire stream is sent to a single task of the bolt, specifically the task with the lowest ID
All Grouping – A copy of each tuple is sent to all the tasks of the bolt. This is useful for join operations
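
The routing rule behind each of the four groupings can be sketched in plain Java. This is a simplified model and not Storm's implementation: the class GroupingSketch, the use of task indexes 0 to numTasks-1 and the hash-modulo rule are assumptions made for illustration. Each method returns the list of task indexes that would receive a given tuple.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Plain-Java model of how each grouping picks target tasks. NOT Storm's code.
class GroupingSketch {
    // Shuffle: pick a task at random, so load evens out over many tuples.
    static List<Integer> shuffle(int numTasks, Random random) {
        return Collections.singletonList(random.nextInt(numTasks));
    }

    // Fields: hash the grouping value, so equal values always reach the same task.
    static List<Integer> fields(Object fieldValue, int numTasks) {
        return Collections.singletonList(Math.floorMod(fieldValue.hashCode(), numTasks));
    }

    // Global: the whole stream goes to one task, modeled here as index 0.
    static List<Integer> global() {
        return Collections.singletonList(0);
    }

    // All: every task receives its own copy of the tuple.
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }
}
```

For the call log example, a fields grouping on the call value is what would let a counting bolt see every tuple for the same caller and receiver pair in one task, so that its count is complete.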

Section 3: Installation Process

Apache Storm can be installed on your system in three steps

Installation of Java – The steps of Java installation are listed below

Download JDK
Extract files
Move to opt directory
Set path
Java Alternatives
Commands to check whether Java is installed

Installation of Zookeeper framework

Below mentioned are the steps to install Zookeeper framework

Download Zookeeper
Extract tar file
Create configuration file
Start Zookeeper Server
Start CLI
Stop Zookeeper Server

Installation of Apache Storm framework

Download Storm
Extract tar file
Open configuration file
Start the nimbus
Start the Supervisor
Start the UI

FAQs: General Questions

What is the benefit of learning Apache Storm ?

Google search trends in recent years suggest growing interest in Apache Storm in the field of real time analytics. Apache Storm is said to be a preferred platform for real time Big Data analytics by many companies and professionals. This course will help you understand Apache Storm from the basics, and you will come across many real world examples which you can implement in your workplace. So whether you are a professional already working with Apache Storm or a beginner, this course will help you move up the career ladder.

How would this course help me in building my career ?

The world is trending in real time. This course will help you become an Apache Storm expert on top of being a Big Data Hadoop developer. It provides the skill sets you require to build a career path from Big Data Hadoop developer to Big Data Hadoop architect.

Testimonials

Jenifer

This is a great course on Apache Storm at a very reasonable price. The course is highly interactive and also practical oriented. The course structure is well designed and the content is delivered properly. The course content is of high quality and all updated. The contents are also very interesting to learn. I never thought it would be this easy to learn Apache Storm before taking this course. For beginners it is an excellent course. I am very much gratified that from this course I have learned what I wanted to know about Apache Storm.

Christopher

This is a mind blowing course on Apache Storm. This course delivered what I expected from it. It contains all knowledgeable material. My experience from this course was awesome as well as rewarding. The course contains all valuable information under different sections and each section was neatly organized. The flow of the section was continuous and was interconnected with the previous topic. I found this course very informative and interesting. Thanks to educba for providing this course.

Joseph

This online course on Apache Storm was extremely convenient and interesting. It helped to get deeper insights into the architecture and concepts of Apache storm. Great course and I wish educba to provide more such courses.

Where do our learners come from?

Professionals from around the world have benefited from eduCBA’s Oracle SOA Suite 11g Comprehensive courses. Some of the top places that our learners come from include New York, Dubai, San Francisco, Bay Area, New Jersey, Houston, Seattle, Toronto, London, Berlin, UAE, Hong Kong, Singapore, Australia, New Zealand, Bangalore, New Delhi, Mumbai, Pune, Kolkata, Hyderabad and Gurgaon among many.
