The Data Innovation Summit: 5 years of data and analytics journey (2019)

August 5, 2020

The star-studded edition

In 2019, the simple idea that we had conceived in 2015 became a reality: the Data Innovation Summit grew to become “the largest and the most influential data event in the Nordics and beyond”.

The Data Innovation Summit 2019 hosted a record 1800 delegates, coming from the Nordic countries as well as from other countries all over the world.

As the topics, as well as the functions involved in data innovation, grew dramatically, so did the summit in terms of stages, delegates and exhibitors. The Data Innovation Summit 2019 included 6 stages on different themes, representing the different roles involved in turning “data into insight” and the most innovative practices of turning “insight into action”.

In terms of content and presentation, the 2019 edition gathered some real rockstars in data and advanced analytics from some of the most progressive companies in the world. On stage, we welcomed some of the most forward-oriented organisations that are paving the way ahead for data, AI and advanced analytics.

These are some of the highlight presentations of the Data Innovation Summit 2019.

The star-studded Data Innovation Summit line-up

NLP for Online Conversations – Katie Bauer, Reddit

For the first time at the Data Innovation Summit, we welcomed a data expert from the online community Reddit. Katie Bauer, Data Science Manager at Reddit, described NLP tools and techniques for addressing the unique type of conversational data and language found on social media posts, chat logs, and forum replies.

“As people live more of their lives online, there is a growing need for high-quality natural language processing,” stated Katie in her session. She also elaborated on how common sets of stop words should be revised to keep vocabulary that is extremely useful for conversational data, on how text normalisation techniques (such as lowercasing) often smooth over people’s ways of expressing emotion or tone of voice online, and on document vectors.
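The two preprocessing points above can be illustrated with a minimal Python sketch. It is not taken from the talk: the stop-word lists, the `KEEP_FOR_CONVERSATION` set and the `tokenize` helper are all hypothetical and chosen only to show the idea that a revised stop list and gentler normalisation keep signals of tone.

```python
import re

# Hypothetical generic stop-word list, similar in spirit to what a
# general-purpose NLP toolkit might ship (assumption for illustration).
GENERIC_STOP_WORDS = {"the", "a", "is", "to", "and", "i", "you",
                      "not", "no", "so", "very", "but"}

# Words we assume carry tone or negation in conversational text.
KEEP_FOR_CONVERSATION = {"not", "no", "so", "very", "but"}

# Revised stop list: the generic list minus the words worth keeping.
CONVERSATIONAL_STOP_WORDS = GENERIC_STOP_WORDS - KEEP_FOR_CONVERSATION

# Words (with apostrophes) or runs of "!" / "?" as single tokens.
TOKEN_RE = re.compile(r"[A-Za-z']+|[!?]+")


def tokenize(text, stop_words=CONVERSATIONAL_STOP_WORDS):
    """Tokenise text, dropping stop words but preserving 'shouted' words."""
    tokens = []
    for raw in TOKEN_RE.findall(text):
        if raw.lower() in stop_words:
            continue
        # Fully upper-case words longer than one letter are kept as-is,
        # because blanket lowercasing would erase the emphasis.
        if raw.isalpha() and raw.isupper() and len(raw) > 1:
            tokens.append(raw)
        else:
            tokens.append(raw.lower())
    return tokens


message = "I am SO not happy with the update!!!"
print(tokenize(message))                      # revised stop list
print(tokenize(message, GENERIC_STOP_WORDS))  # generic stop list
```

With the generic list, both the negation ("not") and the emphasis ("SO") disappear from the example message, which is the kind of information loss the session warned about.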

GitHub and Deep Learning on Graphs of Code – Clair Sullivan, GitHub

The world’s leading software development platform, GitHub, also came to the Data Innovation Summit stage. Clair Sullivan, Machine Learning Engineer at GitHub, talked about how their more than 0.8 petabytes of code-related data (such as commits, pull requests, issues, comments, and users) can be represented as graphs of code, and how they use deep learning on these graphs to detect duplicate code and to obtain information about software and open source development.

Clair also described the challenge that duplicate code presents to GitHub, the types of duplicate code and possible solutions to the problem.
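To make the duplicate-code problem concrete, here is a small Python sketch of a much simpler baseline than the deep learning on graphs described in the talk: comparing overlapping token sequences (shingles) with Jaccard similarity. The `shingles` and `jaccard` helpers and the shingle size `k=3` are assumptions for illustration only, not GitHub's method.

```python
import re

# Identifiers, numbers, or any single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def shingles(source, k=3):
    """Return the set of k-token sequences found in a source snippet."""
    tokens = TOKEN_RE.findall(source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two snippets' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two snippets too short to shingle count as identical
    return len(sa & sb) / len(sa | sb)


original = "def add(x, y):\n    return x + y"
reformatted = "def add(x,y):\n  return x+y"
unrelated = "print('hello')"

print(jaccard(original, reformatted))  # whitespace-only changes are ignored
print(jaccard(original, unrelated))
```

A token-level comparison like this only catches copies that differ in formatting; duplicates with renamed variables or reordered statements need structural information, which is where a graph representation of code becomes useful.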

Unabridged AI/ML: A Silicon Valley Perspective – Bhairav Mehta, Apple

Bhairav Mehta, at the time Data Science Manager at Apple, gave a comprehensive overview of the latest developments and trends, but also covered significant historical breakthroughs that led to the current state of AI and ML, from a Silicon Valley point of view.

Bhairav explained the digital correlation that unites all the advanced technologies we have today, such as cloud computing, artificial intelligence, Internet of Things, fog computing and data analytics. Diving into the more technical part of his presentation, he provided real-life use cases of image classification, image segmentation, text classification and speech recognition, and covered how AI/ML models can be deployed on cloud-based containers as well as the model development and deployment process.

Bighead: Airbnb’s End-to-End Machine Learning Framework – Nikhil Simha, Airbnb

Nikhil Simha, Software Engineer at Airbnb, presented the workings of Bighead, Airbnb’s end-to-end machine learning platform, and how it unifies feature engineering, model training, model serving and monitoring to serve guests the most suitable travel recommendations. Nikhil also gave a concise overview of an industry-standard machine learning workflow.

As the main takeaway from his presentation, Nikhil related how Bighead empowers data scientists at Airbnb to take ideas to production with the best machine learning practices, in a span of days instead of months.

Conversational Search in the Google Assistant – Aleksandr Chuklin, Google

Aleksandr Chuklin, Research Engineer at Google, talked about how Google makes “Ok, Google” possible using conversational search, or search-based conversation – a technique that triggers “Ok, Google” on devices such as smart speakers, smartphones, tablets, watches and TVs.

Aleksandr outlined the challenges that come with conversational search, such as background noise, narrow domains and incorrect answers by chatbots caused by missing training data. He also presented the approaches his team takes to tackle these and the related challenges of voice search and results.

What Problems Should be Solved with AI in Banking – Mattias Fras, Nordea

Mattias Fras, Group Head of AI Strategy & Acceleration at Nordea, gave his perspective on the problems AI helps solve in the banking industry, focusing on Nordea’s journey with AI and automation. Mattias described in which functions and departments Nordea have implemented AI and the main lessons they learnt in the process. Additionally, he offered advice on how to pick the right problems to solve in banking and pointed to current challenges that leave room for improvement with AI.

Fast Semantic Proposals for Image and Video Annotation – Srikar Muppirisetty, Volvo Cars

Srikar Muppirisetty, Manager, Machine Learning and Data Analytics at Volvo Cars, talked about a novel approach for fast semantic proposals for image and video annotation using modified Echo State Networks. Srikar first outlined the need for data annotation and its challenges. Manually annotating data for semantic segmentation tasks is time-consuming and tough to quality-assure, whereas accurate and automated region-based proposals can significantly aid high-quality data annotation.

Therefore, he proposed a novel modified active-learning framework that iteratively learns from a small subset of data (20-30% of the images) and adapts to a variety of semantic segmentation goals without manual supervision on test images.

Delivering Changes to a Globally Distributed Marketplace – Ritesh Agrawal, Uber and Anando Sen, Uber

Ritesh Agrawal and Anando Sen presented how Uber leverages ML to enhance the customer experience by detecting and resolving any incidents with the app.

Detecting and solving errors fast but safely is critical for Uber because every day, hundreds of thousands of people rely on Uber to get to work, commute or make their daily livelihood by driving for Uber, highlighted Anando. He also presented the KPIs they use to make sure the app is reliable at all times, especially when rolling out changes to the app, distributed systems and infrastructure with high velocity. Ritesh introduced the data science point of view of transforming incidents into concrete, actionable items. He also described the configuration-related challenges and how they tackle them.

DataOps in Practice – Lars Albertsson, Mimeria

DataOps is a methodology and culture shift that brings the successful combination of development and operations (DevOps) to data processing environments. It breaks down silos between developers, data scientists, and operators, resulting in lean data feature development processes with quick feedback.

Lars Albertsson, Founder of Scling, demystified the DataOps methodology and presented some DataOps practices and tools that are applied in everyday development and operational workflows.

Understanding Space a Task for Data Science – Elin Eriksson, Folksam

Elin Eriksson, Data Scientist at Folksam and a passionate physicist with a PhD in Physics focusing on space and plasma physics from Uppsala University – Department of Physics and Astronomy, discussed data’s role in understanding Outer Space, the kind of data sources they use, how they use data to understand Space, and why Space is actually studied. Elin presented different models but similar thought processes that occur across fields and how they interpret data based on theory and compare the data with data models.

Navigating Among Technology, Organisation & Data to Kick-Start the ML Journey – Fredrik Backner, KICKS

Fredrik Backner, Chief Analytics Officer at Kicks – the Nordics’ leading beauty retailer, guided us through his experience and perspective on the dos and don’ts of setting up a machine learning analytics practice. Fredrik described analytics in both the telco and retail industries and outlined the similarities and differences in analytics use cases. He also provided a comprehensive playbook for implementing analytics and getting value, as well as common pitfalls to watch for.

Balanced Data Management – Christian Rasmussen, Grundfos

Christian Rasmussen, Head of Technology, Innovation Lab 3 at Grundfos, narrated their story of ensuring good data quality in the digital offerings to their clients for monitoring, control, performance and fault detection based on analysis of performance sensor data from the pumping systems. Christian also talked about how they adopted balanced data management across their entire data value chain, which involves both laying solid data foundations and running pilots quickly and operationalising them fast. He also introduced the term FAIR Q network, which enables Grundfos to manage their data foundation.