Course Overview
Learning Spark programming:
Apache Spark™ is a fast and general engine for large-scale data processing. It is designed to run programs faster than Hadoop MapReduce. It is easy to use: applications can be written quickly in Java, Scala, Python, or R. In terms of generality, it can combine SQL, streaming, and complex analytics in the same application. It is versatile and runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3.
Through this overview course on Spark, you will understand the fundamental mechanisms and basic internals of the framework, why Spark is needed, and how to use it for programming and machine learning.
The training includes the following:
- Infrastructure setup
- Overview of Spark programming with textual dataset: Level-1
- Simplifying application development with notebooks
- Spark programming with textual dataset: Level-2
- Machine learning with MLlib
Target Customers:
- Students/professionals interested in learning about Apache Spark
- Anyone who wants to learn about data and analytics
- Data Engineers
- Analysts
- Architects
- Software Engineers
- IT operations
- Technical managers
Prerequisites:
- Basic Computer Knowledge
- Coding experience
- Knowledge of the MapReduce paradigm
- Basic knowledge of any of these: Java/Scala/Python