Need For Distributed Speed - Anders Arpteg

As a data scientist, working at a data-first company leads to many interesting challenges. It is not only about building music recommendations, but also about being able to perform advanced analytics and machine learning at petabyte scale.

Key Questions

  • What does Spotify use all its petabytes of data for?
  • Isn't it sufficient to take a sample and train models on a single machine?
  • Is Apache Spark a silver bullet for distributed computing?


10 months ago

If you can understand what distributed speed is, and get it, then most of the work that you do at your job will be done rather quickly. This anecdote has helped me out very much in this life.