Putting customers first with customer-centric data science

In today’s digitalised, data-driven and fast-changing world, customers have become more demanding than ever, and companies are looking for ways to place customers at the centre of their business and know them like the back of their hand. This is where customer-centric data science finds its place: as a strategy for improving and personalising customer experiences with data science, and for looking at the business from the point of view of individual customers.

What is customer centricity exactly and why should data scientists care?

Ewan Nicolson, Principal Data Scientist at Skyscanner, quoted a definition at Data Innovation Summit 2019 that neatly captures what customer centricity is all about.

“Customer centricity is all about recognising that all of your customers are individuals, and changing your business approach so that you deal with customers as individuals, don’t consider them as one homogeneous group,” said Ewan, quoting Professor Peter Fader.

Ewan sees the customer-centric approach as a great fit for data science because it provides one of the best datasets for data scientists to work with. Beyond that, customer centricity gives data scientists a chance to solve problems in a novel way, Ewan explains. Last but not least, it keeps data scientists focused on solving real human problems by thinking about customers and the value delivered to them.

Ewan Nicolson presenting at Data Innovation Summit 2019 — *Photo by Hyperight AB® / All rights reserved.*

Working at Skyscanner, an online travel search and booking platform with 80 million users per month, gives Ewan a lot of customer-centric data about searches, bookings, hotels, flights and more.

How to get customer-centric datasets

Speaking from experience, Ewan described the strategy and principles needed to get customer-centric datasets. Although his examples are about Skyscanner, they can be applied to any online company working with huge amounts of customer data.

To start with, every time a customer interacts with a product, the interaction is logged as a row in the dataset specifying the user ID, a timestamp for when it took place and the details of the interaction, for example, a search from Stockholm to London.
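As a minimal sketch of such a log, the snippet below models one row per interaction. The class name, the field names and the `record` helper are hypothetical choices made for illustration, not Skyscanner’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Interaction:
    """One row of the log: who did what, and when."""
    user_id: str          # hypothetical pseudonymous customer ID
    timestamp: datetime   # when the interaction took place
    kind: str             # e.g. "search" or "booking" (assumed categories)
    details: dict         # free-form details of the interaction


log = []  # in practice this would be a secure, shared data store


def record(user_id, kind, details, timestamp=None):
    """Append a single interaction to the log and return it."""
    event = Interaction(user_id, timestamp or datetime.now(timezone.utc), kind, details)
    log.append(event)
    return event


# The example from the text: a search from Stockholm to London.
record("u-001", "search", {"origin": "Stockholm", "destination": "London"})
```

Keeping every row in the same narrow shape (ID, timestamp, details) is what makes it possible to regroup the log later by customer rather than by product or channel.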

Watch the full presentation with Ewan Nicolson

This simple dataset illustrates several solid principles for building customer-centric datasets:

Having a strong understanding of what a customer ID is – knowing who every individual customer is.
Logging every single customer interaction in the dataset – which means the potential dataset is big and raises the question of storage and accessibility.
Storing a large volume of customer data in a secure way.
Making sure there are no data silos – especially when a third-party data provider is involved.
Obtaining consent to use customer data.
Avoiding personal information – no demographic or personal information is included, so the dataset is purely behavioural.
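Two of these principles, consent and avoiding personal information, can be illustrated with a small filter. This is only a sketch: the field names in `PERSONAL_FIELDS` and the shape of the rows are assumptions made for the example.

```python
# Hypothetical names of fields that count as personal or demographic data.
PERSONAL_FIELDS = {"name", "email", "age"}


def behavioural_rows(rows, consented_user_ids):
    """Keep rows only for users who gave consent, and drop personal fields."""
    return [
        {key: value for key, value in row.items() if key not in PERSONAL_FIELDS}
        for row in rows
        if row["user_id"] in consented_user_ids
    ]


raw = [
    {"user_id": "u-001", "email": "a@example.com", "event": "search"},
    {"user_id": "u-002", "email": "b@example.com", "event": "booking"},
]

# Only u-001 has consented in this toy example.
clean = behavioural_rows(raw, consented_user_ids={"u-001"})
```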

Why customer-centric data is perfect to work with

Recommendation and personalisation are big buzzwords in data science, as Ewan states, but putting them into practice is a challenge. Having created a customer-centric dataset, as shown above, we have a foundation for building any sort of personalised recommendation.

Embeddings are a useful technique in this case, Ewan says, as they tell us that a customer who performed a certain interaction also performed another. E-commerce websites, such as the giant Amazon, make heavy use of this technique.

*Photo by Hyperight AB® / All rights reserved.*

Customer-centric data science means that all our customers are different and we should think of them as distinct people.

How does it work? Usually, the dataset is very sparse: there are lots of users and items, and each user has interacted with only a few items, so there is little overlap between users. With embeddings, instead of specifying every single user ID and item ID in the inventory, we define a small number of characteristics (50-100) that describe customers and items. A neural network is then trained to predict whether a customer will like a certain item. The final result is not only a model that predicts well but also a concise description of all items and customers in a much smaller dataset.
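The idea can be sketched in a few lines of NumPy. This is not the network Ewan described: it is a logistic matrix factorisation trained with stochastic gradient descent, arguably the simplest embedding model, used here as a stand-in. The toy data, the function names and the 8 dimensions (rather than 50-100) are all assumptions chosen to keep the example small.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_embeddings(examples, n_users, n_items, dim=8, epochs=200, lr=0.1, seed=0):
    """Learn one dim-sized vector per user and per item.

    `examples` is a list of (user_index, item_index, liked) triples, where
    liked is 1.0 or 0.0. The predicted probability of a like is the sigmoid
    of the dot product of the two vectors.
    """
    rng = np.random.default_rng(seed)
    users = rng.normal(scale=0.1, size=(n_users, dim))
    items = rng.normal(scale=0.1, size=(n_items, dim))
    for _ in range(epochs):
        for u, i, liked in examples:
            # Gradient of the log loss with respect to the dot product.
            error = sigmoid(users[u] @ items[i]) - liked
            # Update both vectors from their old values in one step.
            users[u], items[i] = (
                users[u] - lr * error * items[i],
                items[i] - lr * error * users[u],
            )
    return users, items


# Toy data: users 0-1 like items 0-1, users 2-3 like items 2-3.
examples = [
    (u, i, 1.0 if (u < 2) == (i < 2) else 0.0)
    for u in range(4)
    for i in range(4)
]
users, items = train_embeddings(examples, n_users=4, n_items=4)


def predict(u, i):
    """Predicted probability that user u likes item i."""
    return float(sigmoid(users[u] @ items[i]))
```

After training, `users` and `items` are the concise descriptions mentioned above: each customer and each item is a short vector, and customers with similar behaviour end up with similar vectors.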

Understanding customers’ long-term behaviour

Creating a customer-centric dataset and implementing embeddings is a very good start for giving personal recommendations. But it can also help us understand the long-term behaviour of individual customers.

As Ewan explains, long-term user understanding is a strong indication of whether the company will grow. However, understanding customer behaviour in the long term is really difficult. What we can do to make it possible is break this customer-centric dataset down into different types of customers in terms of:

Recency – How recently a customer has interacted with the company.
Frequency – How many times a customer has come back in a fixed period of time.
Tenure – How long a person has been a customer.
Clumpiness – Whether a customer uses a service or product more frequently in some periods than in others, for example, when a new series comes out on Netflix.
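The four dimensions above can be computed from nothing more than a customer’s visit dates. The sketch below is one possible reading: recency, frequency and tenure are measured in days within an observation window, and clumpiness uses an entropy-based score over the gaps between visits (0 for evenly spaced visits, approaching 1 when visits bunch together). That particular clumpiness score and the function name are assumptions, not necessarily the measure used at Skyscanner.

```python
import math
from datetime import date


def describe_customer(visits, window_start, window_end):
    """Summarise one customer's visits; assumes at least one visit in the window."""
    visits = sorted(visits)
    # Gaps between consecutive events, including the window edges.
    points = [window_start, *visits, window_end]
    gaps = [(later - earlier).days for earlier, later in zip(points, points[1:])]
    shares = [gap / sum(gaps) for gap in gaps if gap > 0]
    entropy = -sum(share * math.log(share) for share in shares)
    return {
        "recency": (window_end - visits[-1]).days,   # days since the last visit
        "frequency": len(visits),                    # visits within the window
        "tenure": (window_end - visits[0]).days,     # days since the first visit
        "clumpiness": 1 - entropy / math.log(len(gaps)),
    }


start, end = date(2019, 1, 1), date(2019, 1, 31)
even = describe_customer([date(2019, 1, 11), date(2019, 1, 21)], start, end)
clumpy = describe_customer([date(2019, 1, 15), date(2019, 1, 16)], start, end)
```

Both toy customers have the same frequency, yet `clumpy` gets a higher clumpiness score than `even`, which is exactly the kind of difference that recency, frequency and tenure alone would miss.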

Watch the full interview with Ewan Nicolson

This way, a customer-centric dataset provides a very powerful way of describing and understanding customers. It allows us to segment our customers and detect groups with different behaviours, customer lifetime values and preferences. Ewan explains that these dimensions nicely complement the other descriptions we have for our customers in the dataset, and they can be integrated into predictive models.

Customer lifetime value – CLV (churn)

Customer lifetime value is a vital problem to solve as it’s one of the main KPIs for a company, states Ewan. The flip side of CLV, churn, is one of the most important things to worry about, as it represents people who have stopped being your customers.

For a company with a non-contractual setting such as Skyscanner, it’s even harder to observe when customers are churning, because they just silently disappear, says Ewan.

But the customer-centric approach allows them to turn things around. For every customer, they have two random variables: one for whether the customer has churned, which can’t be observed directly, and one for whether the customer has interacted with them in a given time period, which can. They feed these into a model whose output plots indicate which customers are likely to churn, so that those customers can be given special offers to encourage them to stay.
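To make the reasoning concrete, here is a deliberately simplified sketch of inferring silent churn from observed behaviour. It is not Skyscanner’s model. It assumes that an active customer visits at a constant average rate (so the chance of staying silent for t days is exp(-rate * t)), that a churned customer never visits, and that there is a fixed prior probability of having churned right after the last visit; Bayes’ rule then gives the probability that a silent customer is still active. The function name and the default `churn_prior` are hypothetical.

```python
import math


def probability_still_active(n_visits, active_days, days_since_last, churn_prior=0.2):
    """Posterior probability that a customer who has gone quiet has not churned.

    n_visits / active_days estimates the customer's usual visit rate.
    churn_prior is an assumed probability of churning right after the last visit.
    """
    rate = n_visits / active_days
    # Chance of seeing this much silence from a customer who is still active.
    silent_if_active = math.exp(-rate * days_since_last)
    active = (1 - churn_prior) * silent_if_active
    # A churned customer is silent with probability 1.
    return active / (active + churn_prior)


# Same 30 days of silence, very different conclusions:
frequent = probability_still_active(n_visits=10, active_days=100, days_since_last=30)
occasional = probability_still_active(n_visits=2, active_days=100, days_since_last=30)
```

The same 30 days of silence is a much stronger churn signal for the frequent visitor than for the occasional one, which is why the prediction has to be made per customer rather than for the customer base as a whole.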

This technique is helpful for Skyscanner to break away from seeing their customers as one homogeneous group and start thinking of them as individual people. And here lies the real value of the customer-centric approach – not in the fancy modelling, but in seeing customers as different people with different characteristics, emphasises Ewan.

How customer-centric data science helps

The key points from Ewan’s presentation on the benefits of using the customer-centric approach in data science are:

The customer-centric approach helps in creating a rich dataset.
It solves a problem in an innovative way. CLV is no longer an accountancy problem but a prediction about an individual user and how they differ from other users.
It provides a much richer description of our customers, which can be fed into the business to make better decisions.
It isn’t about solving an academic regression or classification problem. It deals with real people who are on the other end of the model, and the approach contributes to customer value as the end product.


