Text representations as few-shot classifiers – Melanie Beck, Cloudera


Text classification is a ubiquitous capability with a wealth of use cases. While dozens of techniques now exist for this fundamental task, many of them require massive amounts of labeled data in order to prove useful. Collecting annotations for your use case, however, is typically one of the most costly parts of any machine learning application. In this talk, I’ll explain how text representations (embeddings) can be leveraged as classifiers, trained with only a small amount of labeled data, or even with no labeled data at all. I’ll also give a demo of this method in action.
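
As a rough illustration of the no-labeled-data case, the sketch below assigns a text to whichever label description it is most similar to under cosine similarity. It is not the speaker's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model such as SentenceBERT or Word2Vec, and `LABELS`, `embed`, `cosine`, and `classify` are hypothetical names chosen for this example.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: bag-of-words token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(text, label_descriptions):
    """Return the best-matching label and the similarity score for every label."""
    doc = embed(text)
    scores = {
        label: cosine(doc, embed(description))
        for label, description in label_descriptions.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical label descriptions; no labeled training examples are used.
LABELS = {
    "sports": "sports game team player score match",
    "finance": "finance market stock bank money price",
}

label, scores = classify("the team won the match with a late score", LABELS)
print(label, scores)
```

With a small number of labeled examples, one possible variation is to compare each text against the embeddings of the labeled examples for each class instead of a hand-written description.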

Key Takeaways

  • Learn about paradigms and strategies for working with limited labeled data
  • Understand how popular text embedding models (SentenceBERT, Word2Vec) can be used as classifiers
  • See a prototype demonstrated in a simple Streamlit application
  • Gain insight into the strengths and limitations of text embeddings as classifiers
