The financial sector and fintech companies were probably among the earliest adopters of machine learning (ML) and artificial intelligence (AI), using them to improve performance and enhance the quality of the services and products offered to consumers. Today, given the development of technology and regulations, there is an opportunity for new improvements and innovations.
One of the new sources of innovation shaping the financial sector, especially the banking industry, is open banking. Because of the data and services exchanged between financial institutions and third-party providers, there are challenges that every fintech company faces.
What are those challenges? How can they be met and overcome? What is the importance of the team in this process? We talked with Tink data experts Andrew Wu, Machine Learning Tech Lead, and Eliisabet Hein, Data Scientist. They are among the speakers at the 7th edition of the Data Innovation Summit.
Hyperight: Can you introduce yourself, your professional background and current working focus?
Andrew Wu: I was born and raised in China and spent my first eighteen years there before going to college in Finland and later doing a master’s degree in Sweden. Over the last thirteen years in the industry, I started as a software developer, gradually transitioned to data engineering, and am now a machine learning engineer. I see myself as a backend generalist, as I have worked with different technologies, from building websites to infrastructure as code. I did my master’s thesis on machine learning and contributed to several ML-related projects at Eniro, US Bank, Swedbank, and H&M. Currently, I am working with a team of brilliant engineers to support and simplify the go-to-production process for data scientists at Tink.
Eliisabet Hein: I’m originally from Tallinn, Estonia, but I moved to Scotland to study Computer Science & AI at the University of Edinburgh for my bachelor’s. The machine learning courses I took as part of this degree really captivated me, so I continued my studies in the Machine Learning master’s program at KTH. I started at Tink as a thesis student and continued as a full-time Data Scientist in the Enrichment Categorization team, where our focus is on identifying different spending and income categories to empower people to better manage their finances. I’ve also always been interested in languages and linguistics, so I’m very lucky to be working with NLP in a professional capacity.
Hyperight: During the Data Innovation Summit 2022, you will share more on the topic “Scaling ML in one of Europe’s hottest fintech companies”. Can you tell us what the delegates can expect from the presentation?
Andrew Wu: As the first dedicated machine learning professional Tink hired, I have had to make a lot of decisions, from tooling and architecture to whom to hire. All of those decisions have long-term effects when scaling with Tink, a rapidly growing fintech company. In this talk, we want to share our journey at Tink through those decisions: the reasoning behind them, whether they turned out to be good decisions, and how we evolved from there. A lot of companies are now investing in data-driven decisions and data products. I hope this talk can be useful for people who are about to start, or are already on, the same journey.
Hyperight: To start with, can you tell us about Tink, what open banking is and what’s the role of AI and ML in the banking industry and the fintech companies?
Andrew Wu: There is an open banking definition out there for both the EU and other parts of the world. In simple terms, open banking is the exchange of data and services between financial institutions and third-party providers, allowing companies and developers to build services and applications for the banks and end-users.
Tink is a fintech startup and a pioneer in open banking, now part of Visa. Tink is the most robust open banking platform – with the broadest, deepest connectivity and powerful services that create value out of financial data.
When it comes to AI and ML, the banking industry was probably one of the early adopters: banks were doing financial modeling, fraud detection, and anti-money laundering with mathematics and data before the term “data science” existed. However, there is still plenty of room for improvements and innovations, especially for end-users. At Tink, ML powers data products directly and indirectly. For example, the transaction categorization model and the recurring transaction model help us understand our purchasing patterns, predict balances, plan savings goals, and assess our ability to borrow. With ML/AI, features of private banking previously reserved for wealthy individuals are becoming available to everyone.
Hyperight: In the summary of your presentation, you mentioned the growth of the company you work for, starting with no data scientists and only one handcrafted data model, to becoming a company that offers an ML product. What are the reasons behind this success?
Andrew Wu: A tiny correction there: Tink today offers a portfolio of data products that are directly or indirectly powered by ML, including Income Check, Risk Insights, and Money Management. It is the collective effort of everyone involved that made us what we are today. We have been good at creating sustainable MVPs that are based on data, but not necessarily powered by machine learning. We do, however, use the opportunity to design flexible APIs and feedback systems that will help us evolve toward a machine learning solution in the end.
Eliisabet Hein: I agree. I think to be successful in a commercial environment, you have to be pragmatic, and design a simple MVP for a new product while keeping in mind a more complex ML solution down the road. The categorization product is a good example of that, from its beginnings as simple rule-based systems, to the automated pipeline that trains new neural network models we have now. Having a feedback loop from end-users has been crucial to know where our models make mistakes and be able to adjust for them, as well as collect new training and test data over time.
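As a rough illustration of the path Eliisabet describes, the sketch below shows a simple rule-based categorizer that records end-user corrections so they can later serve as labeled data for an ML model. It is not Tink's implementation: the `RULES` keywords, the `Categorizer` class, and all method names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules; a real system would have far richer rules.
RULES = {
    "groceries": ("supermarket", "grocery"),
    "transport": ("taxi", "metro", "fuel"),
    "income": ("salary", "payroll"),
}


@dataclass
class Categorizer:
    # End-user corrections stored as (description, predicted, corrected).
    feedback: list = field(default_factory=list)

    def predict(self, description: str) -> str:
        """Return the first category whose keyword appears in the text."""
        text = description.lower()
        for category, keywords in RULES.items():
            if any(keyword in text for keyword in keywords):
                return category
        return "uncategorized"

    def correct(self, description: str, corrected: str) -> None:
        """Record a user correction alongside what the rules predicted."""
        predicted = self.predict(description)
        self.feedback.append((description, predicted, corrected))

    def training_examples(self) -> list:
        """Turn collected corrections into (text, label) pairs."""
        return [(text, label) for text, _, label in self.feedback]


categorizer = Categorizer()
first = categorizer.predict("CITY SUPERMARKET 123")
second = categorizer.predict("Monthly gym fee")
categorizer.correct("Monthly gym fee", "health")
examples = categorizer.training_examples()
```

The point of the design is that the feedback store exists from the first version, so the rules can later be replaced by a trained model without changing how corrections are collected.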
Hyperight: What are the key challenges a fintech company can face regarding data?
Andrew Wu: This can differ from company to company. At Tink, the complexity comes from data ownership. Our partners and their end-users are the owners of their data, which means that what we can do with it is very limited. Another challenge, probably shared across all companies, is GDPR, especially the “right to be forgotten”. This means it is impossible to simply save training data in order to repeat the training process.
Hyperight: You emphasize the importance of scaling the team to overcome the barriers a fintech company can have. Can you tell us how to scale a team and how you’ve scaled your data team?
Andrew Wu: At Tink, all teams constantly have their hands full, and this applies to my team even more so. When I was asked what kind of profile we wanted to hire as an ML Engineer, I wrote down everything I thought mattered to the team now and in the future, for example data engineering, machine learning, and infrastructure as code. This proved to be the wrong strategy. Instead of hunting for superstars, we broke this profile down into three: infrastructure engineer/SRE, data engineer, and software engineer. This split made recruitment much easier and left room for future development into the field of ML.
Hyperight: What would be your recommendations to those who are just starting to look into this topic, where should they start, and what should they pay attention to?
Andrew Wu: If your job, like mine, is to support ML products and data scientists, I would suggest talking to your data scientists as step one and understanding where they sit on the engineer-to-researcher spectrum. This decides what tools and environments they prefer, and what level of engineering support is needed. Another piece of advice is to find “good enough” solutions. Most ML products do not need Apache Spark or distributed training from day one, and renting a “NASA” machine from a cloud provider for a short period is more cost-effective than maintaining a cluster.
Eliisabet Hein: For a data scientist coming into a commercial setting from academia, at least in my experience, it can be a big change to take scalability and speed requirements into account when designing ML solutions. State-of-the-art models often have millions of parameters and require heavy computational resources even for prediction, let alone training, so they might not work well in a complex production system processing millions of transactions, where real-time speed is an important factor. However, there are great light(er)-weight models, as well as tricks used by many practitioners, that I would recommend as extremely useful.
Hyperight: What’s the best advice you’ve received during your career, and what would be your advice for new data enthusiasts?
Eliisabet Hein: I think the best advice I’ve ever gotten is to start with the simplest possible approach and build up from that. It can be tempting to jump straight into a complicated state-of-the-art deep neural network model, but it’s much easier to find issues with your data and understand the constraints of the problem if your initial model is simple. And you’ve probably heard this many times before, but I would repeat that making sure you understand your dataset is probably the single most important thing, especially in industry, where the real-world data you’re working with can often be noisy, biased, or incomplete.
Featured image credits: Towfiqu barbhuiya on Unsplash