This workshop will focus on understanding the theoretical motivations behind Apache Spark and how they affect its use in practice. Because many tools overlap with Spark's functionality, we will work through some example workflows and compare the alternatives on performance, reliability, flexibility, and ease of use.
Discussion Points
- An intro to RDDs and DAGs (see the first sketch after this list)
- Is Apache Spark really the fastest?
- Complex out-of-core algorithms
- Distributed machine learning (see the second sketch below)
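To ground the first discussion point, here is a minimal PySpark sketch of how lazy RDD transformations build up a DAG that only executes when an action is called. The input path and the word-count logic are illustrative assumptions, not workshop material.

```python
# A minimal sketch of RDD transformations building a DAG.
# Assumes a local Spark installation; the input path is hypothetical.
from pyspark import SparkContext

sc = SparkContext("local[*]", "dag-sketch")

# Each transformation below is lazy: it only extends the DAG of
# lineage information; no data is read or moved yet.
lines = sc.textFile("data/logs.txt")             # hypothetical input file
words = lines.flatMap(lambda line: line.split())
pairs = words.map(lambda w: (w, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)   # introduces a shuffle stage

# The action forces Spark to split the DAG into stages and run them.
print(counts.take(10))

sc.stop()
```

Because the DAG records the full lineage of each RDD, Spark can recompute lost partitions on failure rather than checkpointing everything, which is one of the reliability trade-offs we will discuss.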
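For the distributed machine learning point, here is a hedged sketch of training a model with Spark's MLlib DataFrame API; the dataset path, column names, and choice of logistic regression are assumptions for illustration.

```python
# A sketch of distributed model training with Spark MLlib.
# The data path and column names ("f1", "f2", "f3", "label") are assumed.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("ml-sketch").getOrCreate()

# Load a hypothetical training set as a distributed DataFrame.
df = spark.read.csv("data/train.csv", header=True, inferSchema=True)

# MLlib estimators expect features packed into a single vector column.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df)

# Training runs in parallel across the cluster's executors.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
print(model.coefficients)

spark.stop()
```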