We discuss how to use excerpts of musical audio to search for clips with a similar feel and timbre. The solution combines self-supervised learning with a triplet loss function. We also share practical insights from the work:
- Self-supervision circumvents the need for huge labeled datasets
- Musical perception is subjective, so user testing is necessary
- In practice, deep learning is error-prone, so it pays to have a solid setup
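
For readers unfamiliar with the triplet loss mentioned above, the following is a minimal sketch of the general idea, not the implementation used in this work. It assumes Euclidean distance between embedding vectors and a hypothetical margin of 0.2. It also assumes the common self-supervised setup in which the anchor and positive are excerpts of the same track and the negative comes from a different track, so no manual labels are needed.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    Each argument is an embedding vector given as a sequence of floats.
    The margin value is an illustrative assumption, not taken from the work.
    """
    d_pos = math.dist(anchor, positive)  # distance to the similar clip
    d_neg = math.dist(anchor, negative)  # distance to the dissimilar clip
    # The loss is zero once the negative is at least `margin` farther than the positive.
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (made-up values for illustration only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

easy_loss = triplet_loss(anchor, positive, negative)  # already well separated
hard_loss = triplet_loss(anchor, negative, positive)  # roles swapped: badly ordered
```

In the first call the positive is already much closer than the negative, so the loss vanishes; in the second the ordering is violated and the loss grows with the size of the violation. Training pushes embeddings toward the first situation, which is what makes nearest-neighbor search over the embeddings return clips of similar feel and timbre.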