Need For Distributed Speed – Anders Arpteg

For a data scientist, working at a data-first company brings many interesting challenges. It is not only about building music recommendations, but also about being able to perform advanced analytics and machine learning at petabyte scale.

Key Questions

  • What does Spotify use all its petabytes of data for?
  • Isn’t it sufficient to take a sample and train models on a single machine?
  • Is Apache Spark a silver bullet for distributed computing?
