Recent advances in NLP models based on Google’s BERT give organisations many opportunities to efficiently find and reuse information in a large corpus of internal documents. This talk will discuss the engineering challenges and architectural choices involved in building a scalable and robust textual similarity service based on SentenceBERT.
- How should a corpus of internal documents be indexed for use by SentenceBERT?
- What are good and bad architectural choices for a text similarity service?
- How can cloud services be used to maximize robustness and scalability?