What is Apache Spark? Why is there such serious buzz about it? If you are in the Big Data analytics business, should you really care about Spark? I hope this post helps answer some of the questions that may have been on your mind lately.
Apache Spark is a powerful open source processing engine for Hadoop data, built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley’s AMPLab and later moved to the Apache Software Foundation. Spark is essentially a parallel data processing framework that can work with Apache Hadoop, making it easy to develop fast Big Data applications that combine batch, streaming, and interactive analytics on all your data.
Let's go through some of the features that make it stand out in the Big Data world!
- Lightning Fast Processing
(Spark Performance over Hadoop. Image Courtesy: Cloudera. Visit this link to see how Jai & Matei explain the delightful experience Spark gives its developers.)
- Ease of Use as it supports multiple languages
- Support for Sophisticated Analytics
- Real time stream processing
Here is what Cloudera says about Spark's streaming abilities:
- Easy: Built on Spark’s lightweight yet powerful APIs, Spark Streaming lets you rapidly develop streaming applications
- Fault tolerant: Unlike other streaming solutions (e.g. Storm), Spark Streaming recovers lost work and delivers exactly-once semantics out of the box with no extra code or configuration
- Integrated: Reuse the same code for batch and stream processing, even joining streaming data to historical data
(Streaming Performance over Storm. Image Courtesy: Cloudera.com)
- Ability to integrate with Hadoop and existing Hadoop data
- Active and expanding Community
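The "Integrated" point in the streaming list above can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark's API: it assumes a stream can be treated as a series of small batches (the micro-batch model Spark Streaming uses), and the names `count_words`, `micro_batches`, `historical`, and `live` are hypothetical, made up for this illustration. It shows one function serving both a batch job and a streaming job that starts from historical data.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Shared logic: count words in any collection of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def micro_batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Hypothetical helper: chop incoming lines into small batches."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Made-up sample data standing in for stored and incoming records.
historical = ["spark is fast", "hadoop stores data"]
live = ["spark streams data", "spark is easy", "data is big"]

# Batch mode: process everything in one pass.
batch_counts = count_words(historical + live)

# Streaming mode: start from the historical result, then fold in
# each micro-batch using the very same function.
running_counts = count_words(historical)
for batch in micro_batches(live, size=2):
    running_counts += count_words(batch)

print(running_counts == batch_counts)  # prints True
```

Because the counting logic lives in a single function, the batch and streaming paths cannot drift apart, which is the kind of reuse the Cloudera quote describes.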
Below are some useful links to start with:
- Download the latest release of Spark!
- Read the quick start guide.
- Watch some free training videos and exercises.