Apache Spark

Apache Spark is an open-source, distributed computing system used for big data processing and analytics. It offers an interface for programming entire clusters with impeccable data parallelism and fault tolerance. With its high-level APIs in Java, Scala, Python and R, it provides a framework for distributed task dispatching, scheduling and basic I/O functionalities. Notable modules include SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing. Apache Spark can run standalone, on Hadoop, or in the cloud, and is capable of accessing diverse data sources such as HDFS, Apache Cassandra, Apache HBase, and Amazon S3.

Visit the following resources to learn more:

@official@ApacheSpark
@article@Spark By Examples
@feed@Explore top posts about Apache Spark