At Soundtrack Your Brand we face the challenge of building a state-of-the-art music recommendation system without much interactive usage data to work with. So what does it mean to build machine-learning models that have only audio to learn from? And what characteristics set music apart from other audio data, such as speech?
- Music has characteristics that other audio lacks, and models can make use of them
- There is a lot of interesting overlap between the physics of sound and music theory
- Several layers of embeddings can reduce the need for high-quality labelled data
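
As one small illustration of the overlap between the physics of sound and music theory noted above, the sketch below maps note numbers to frequencies. It assumes 12-tone equal temperament and a concert pitch of A4 = 440 Hz, neither of which is stated in the text; the function names `midi_to_hz` and `interval_ratio` are hypothetical helpers for this example, not part of any system described here.

```python
# Minimal sketch, assuming 12-tone equal temperament and A4 = 440 Hz.
# These constants and helper names are illustrative assumptions.

A4_HZ = 440.0   # assumed concert pitch reference
A4_MIDI = 69    # MIDI note number conventionally assigned to A4


def midi_to_hz(note: int) -> float:
    """Frequency in Hz of a MIDI note number under equal temperament."""
    # Each semitone multiplies the frequency by 2 ** (1 / 12).
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def interval_ratio(semitones: int) -> float:
    """Frequency ratio between two notes a given number of semitones apart."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # An octave (12 semitones) is a doubling of frequency.
    print(midi_to_hz(69), midi_to_hz(81))
    # A perfect fifth (7 semitones) is close to the simple ratio 3:2.
    print(interval_ratio(7))
```

The point of the example is that a music-theoretic notion (an interval) corresponds to a physical one (a frequency ratio), which is the kind of structure speech audio does not share.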