Nordic Data Science and Machine Learning Summit 2017

Detecting Order Fraud For 500K Merchants: Machine Learning At Scale – Nevena Francetic

November 22, 2017

Receiving a fraudulent dispute is an emotionally and financially devastating experience for business owners. Entrepreneurs on our platform should be focused on building their business, not on becoming fraud experts. We relieve them of this burden with an integrated fraud detection system. In this presentation, I outline the history, lessons learned, and decisions behind building a reproducible machine learning pipeline that trains on 40 million instances and scores hundreds of checkouts per minute in production.

Key Points:

Domain knowledge, reliable ground truth and good features are the foundation of powerful models
Investing in a structured pipeline speeds up model development
PMML (Predictive Model Markup Language) is a language-agnostic way to transfer predictive models from batch training to production scoring systems
Reconciling training and production is a critical last step for building a reliable model
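The last point, reconciling training and production, can be illustrated with a minimal Python sketch. It assumes that both the batch training pipeline and the production scorer record feature values (and the model score) per checkout ID as dictionaries. The function names `reconcile` and `mismatch_rate_by_feature`, the row layout, and the tolerance are hypothetical choices for illustration, not the system described in the talk. The sketch flags every shared checkout where a value is missing on one side or differs beyond a relative tolerance.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical row layout: checkout_id -> {feature_name: value}
Rows = Dict[str, Dict[str, float]]


@dataclass
class Discrepancy:
    checkout_id: str
    feature: str
    training_value: Optional[float]    # None if the feature was missing offline
    production_value: Optional[float]  # None if the feature was missing online


def reconcile(training: Rows, production: Rows, rel_tol: float = 1e-6) -> List[Discrepancy]:
    """Compare offline (training pipeline) and online (production scoring)
    values for every checkout that appears in both sources."""
    discrepancies = []
    for checkout_id in sorted(training.keys() & production.keys()):
        offline, online = training[checkout_id], production[checkout_id]
        # Check the union of feature names so a feature missing on either side is caught.
        for feature in sorted(offline.keys() | online.keys()):
            a, b = offline.get(feature), online.get(feature)
            if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-9):
                discrepancies.append(Discrepancy(checkout_id, feature, a, b))
    return discrepancies


def mismatch_rate_by_feature(training: Rows, production: Rows, rel_tol: float = 1e-6) -> Dict[str, float]:
    """Fraction of shared checkouts on which each feature disagrees.
    Features that never disagree are omitted."""
    shared = training.keys() & production.keys()
    if not shared:
        return {}
    counts = Counter(d.feature for d in reconcile(training, production, rel_tol))
    return {feature: n / len(shared) for feature, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy example: the same two checkouts as seen by training and by production.
    training = {
        "c1": {"order_total": 120.0, "score": 0.91},
        "c2": {"order_total": 35.5, "score": 0.12},
    }
    production = {
        "c1": {"order_total": 120.0, "score": 0.91},
        "c2": {"order_total": 35.5, "score": 0.47},  # score disagrees with training
    }
    for d in reconcile(training, production):
        print(d)
    print(mismatch_rate_by_feature(training, production))
```

In this toy example only the score of checkout `c2` disagrees, so the rate dictionary reports `score` on half of the shared checkouts. A nonzero rate on an input feature may indicate that it is computed differently offline and online, which is worth resolving before trusting production scores.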