Open Source Data Science With Python, Spark And Jupyter – Daniel Tidström

One of the major advantages of Hadoop is its schema-on-read, data lake architecture, which simplifies storing huge amounts of data in raw form. Solving the storage problem is a good start, but accessing, transforming and analyzing the data is an even more important step towards achieving tangible business value. This session will show how Svenska Spel uses Python and Spark to process and analyze large amounts of data, with Jupyter notebooks as a unified interface.

Key Questions

  • How do you establish an agile and powerful data science environment using the latest open source tools?
  • How do data frames act as the glue between distributed processing on Spark and in-memory analytics with Python?
  • What are Jupyter notebooks, and why are they the perfect interface for data science?
