Receiving a fraudulent dispute is an emotionally and financially devastating experience for business owners. Entrepreneurs on our platform should be focused on building their business, not on becoming fraud experts. We relieve them of this burden with an integrated fraud detection system. In this presentation, I outline the history, the lessons we learned and the decisions we made in building a reproducible machine learning pipeline that trains on 40 million instances and scores hundreds of checkouts per minute in production.
- Domain knowledge, reliable ground truth and good features are the foundation of powerful models
- Investing in a structured pipeline speeds up model development
- PMML is a language-agnostic way to transfer predictive models from batch training to production scoring systems
- Reconciling training and production is a critical last step for building a reliable model
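
The last takeaway, reconciling training and production, can be illustrated with a minimal sketch. It assumes both systems can emit a score per checkout ID for the same set of checkouts. The function name `reconcile_scores`, the dictionary layout, the tolerance and the sample values are all hypothetical and are not taken from the pipeline described in the talk.

```python
import math


def reconcile_scores(training_scores, production_scores, tolerance=1e-6):
    """Compare per-checkout scores from batch training and production scoring.

    Both arguments are dicts mapping a checkout ID to a model score
    (a hypothetical layout chosen for this illustration).
    """
    # Checkouts scored in training but never scored in production.
    missing = sorted(set(training_scores) - set(production_scores))
    # Checkouts scored by both systems whose scores differ by more than the tolerance.
    mismatched = sorted(
        checkout_id
        for checkout_id, score in training_scores.items()
        if checkout_id in production_scores
        and not math.isclose(
            score, production_scores[checkout_id], rel_tol=0.0, abs_tol=tolerance
        )
    )
    return {"missing": missing, "mismatched": mismatched}


# Made-up example scores keyed by checkout ID.
training = {"c1": 0.12, "c2": 0.87, "c3": 0.40}
production = {"c1": 0.12, "c2": 0.91}

report = reconcile_scores(training, production)
print(report)
```

A non-empty report would typically point to a difference in feature computation or model export between the two environments, which is worth investigating before trusting production scores.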