From Data Orientation into a Data Culture: The Preply Story

Since I joined Preply about two years ago, the company has gone through a process of deep transformation. We doubled in size, both in terms of revenue and headcount. We raised Series B and Series C rounds. We transitioned to a subscription business model. We iterated continuously on our product and rethought our strategy while scaling acquisition to grow our customer base in a sustainable fashion.

During this time, the company was in desperate need of exhaustive, reliable information. The biggest challenge has been sailing the ship (or, rather, riding the roller coaster) while still building it.

Many critical questions needed clear answers, and quickly. Although the basics weren’t entirely in place, we had to find ways to meet the most pressing needs and inform company strategy while building the team and infrastructure required to do so.

This meant scaling the Data Chapter to 30+ members, building a management layer, deploying several new tools, reimplementing product tracking, and resolving governance issues. All at once.

From a data perspective, Preply’s strongest asset was, and still is, its orientation toward data. I’ve been in companies where the leadership team ran out of the room, pulling their hair out, each time I showed a chart. That is not the case here. ‘Preplers’ at every level are eager to base their decisions on data and demand more of it every week.

The challenge was turning such data orientation into a data culture.

Problem Identification and the Need for a Change

The team had already consolidated most data sources in a data warehouse and established a measurement framework. The biggest blocker, and an often overlooked one, was data accessibility.

Preply was stuck in the sadly common scenario in which people have no direct, flexible access to data and therefore rely on Data Analysts to write SQL and build complex ad-hoc dashboards to answer their questions.

The problem with this service model is that it quickly saturates. For each question an analyst answers, the requestor will come back with ten more. If you added more headcount, you’d probably be answering ten and receiving one hundred back.

Aside from not scaling, this approach is both frustrating for the stakeholder and demeaning for the analyst. The former sees their requests pile up in a growing, slow-moving backlog and doesn’t get answers on time. The latter ends up producing a stream of data points without the context of why they’re needed or what their impatient stakeholders are trying to achieve. A recipe for burnout.

This problem was exacerbated by the cross-functional nature of our organization, which meant serving over thirty teams, each with a separate backlog and a unique set of priorities. Plus, of course, leadership and the board of directors. Hundreds of dashboards.

The solution was building a self-service layer to unlock data accessibility.

Steps for Transition from Data Orientation to Data Culture

First, we deployed Looker as our Business Intelligence tool for its self-service data exploration philosophy. Unlike traditional dashboarding tools, which require data specialists to build every view in advance, Looker makes end users autonomous.

Looker provides a semantic layer that allows for defining the underlying data warehouse tables and relationships, as well as the KPI definitions and specific business rules required to exploit the data. In other words, it allows modelling the information and domain expertise otherwise stored in a data analyst’s mind through a relatively simple language (LookML).

As the user drags and drops concepts in a familiar environment (similar to a pivot table), Looker generates and runs the required SQL queries, then returns the data and visualizes it. The centralized logical model provides a single source of truth, enforcing data governance and a consistent view of the business.
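To make the semantic-layer idea concrete, here is a minimal Python sketch. It is not LookML and not how Looker works internally; it only shows the principle: business-friendly field names are defined once as SQL expressions, and a query is generated from whatever fields the user picks. The `SemanticModel` class, the table `analytics.lessons` and all field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticModel:
    """Hypothetical single-table semantic model: field names mapped to SQL."""

    table: str
    # Dimension name -> SQL expression to group by.
    dimensions: dict[str, str] = field(default_factory=dict)
    # Measure name -> SQL aggregate expression.
    measures: dict[str, str] = field(default_factory=dict)

    def to_sql(self, dims: list[str], measures: list[str]) -> str:
        """Generate a SELECT for the chosen dimensions and measures."""
        unknown = [d for d in dims if d not in self.dimensions]
        unknown += [m for m in measures if m not in self.measures]
        if unknown:
            raise KeyError(f"Fields not defined in the model: {unknown}")
        columns = [f"{self.dimensions[d]} AS {d}" for d in dims]
        columns += [f"{self.measures[m]} AS {m}" for m in measures]
        sql = f"SELECT {', '.join(columns)} FROM {self.table}"
        if dims:
            # Group by the positional index of each dimension column.
            sql += " GROUP BY " + ", ".join(str(i) for i in range(1, len(dims) + 1))
        return sql


lessons = SemanticModel(
    table="analytics.lessons",
    dimensions={
        "country": "country",
        "lesson_month": "DATE_TRUNC('month', started_at)",
    },
    measures={
        "lessons": "COUNT(*)",
        "revenue_usd": "SUM(price_usd)",
    },
)

print(lessons.to_sql(["country"], ["lessons"]))
```

Picking `country` and `lessons` yields `SELECT country AS country, COUNT(*) AS lessons FROM analytics.lessons GROUP BY 1`. Because every query is generated from the same definitions, two users asking for "lessons" always get the same logic, which is the single-source-of-truth property described above.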

Finally, Looker provides embedding capabilities and an API layer that allow for building data applications. All of this is fully managed and integrated with Git for branching and version control.

The semantic layer concept isn’t new: several BI vendors (MicroStrategy, BusinessObjects, IBM, and others) provided similar functionality years ago. However, the versatility of LookML, combined with the huge leap in performance made by the latest cloud data warehousing technologies, allows for the utmost speed and flexibility. Although other vendors are trying to catch up (Microsoft, ThoughtSpot and, more recently, dbt), they cannot compare in terms of completeness of vision and maturity.

Then, we introduced Snowflake for data warehousing. The choice was dictated by its superior UX, the separation of storage and compute, support for virtually unlimited, almost-linear scaling (with the Enterprise edition) and, especially, how gracefully it handles concurrency. That’s crucial for Looker customers, as Looker often generates a daunting number of concurrent queries that other technologies (AWS Redshift, to name one) struggle to process.

We rely on Monte Carlo and its data observability platform for data reliability and lineage. Its out-of-the-box philosophy makes it painless to deploy: it comes with automatic anomaly detection and lineage tracking, along with support for more complex custom rules.
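Monte Carlo’s detection models are proprietary and not described here, so the following only illustrates the general idea behind volume anomaly detection: flag a day whose row count is far from its recent history. It uses a simple z-score rule over a trailing window; the function name, window size and threshold are arbitrary assumptions.

```python
import statistics


def volume_anomalies(daily_rows: list, window: int = 7, threshold: float = 3.0) -> list:
    """Return indices of days whose row count deviates by more than
    `threshold` standard deviations from the mean of the previous `window` days."""
    anomalies = []
    for i in range(window, len(daily_rows)):
        history = daily_rows[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Perfectly flat history: any change at all is unusual.
            if daily_rows[i] != mean:
                anomalies.append(i)
        elif abs(daily_rows[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily row counts for a table; day 8 loses most of its data.
counts = [1000, 1020, 990, 1010, 1005, 995, 1000, 1015, 120, 1003]
print(volume_anomalies(counts))  # [8]
```

Only the day that drops to 120 rows is flagged. A production tool would also need to handle seasonality, freshness and schema changes, which this sketch ignores.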

With a state-of-the-art BI stack in place, we could then deal with Data Science. We selected and deployed Databricks on top of Delta Lake: a convenient, fully managed cloud production environment featuring Spark clusters and Python notebooks (among other components), with an excellent user experience.

As we’re currently facing some limitations when training data-intensive ML algorithms such as Learning to Rank, we’re exploring the latest innovations in the field. We have high hopes for the newcomer QBeast, which leverages sophisticated sampling and indexing to read only a fraction of the data, drastically reducing both processing and training times while maintaining full compatibility with Spark.
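The following is not QBeast’s implementation; it is a generic sketch of why reading a fraction of the data can still give usable answers. Each row gets a reproducible pseudo-random weight derived from a hash of its key, a query at fraction f reads only rows whose weight is below f, and totals are estimated by scaling the sampled sum by 1/f. The function names and the `id` key are hypothetical.

```python
import hashlib


def row_weight(key: str) -> float:
    """Map a row key to a reproducible pseudo-random weight in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact in a float and never reaches 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sample_rows(rows, fraction, key="id"):
    """Keep only rows whose weight falls below the requested fraction."""
    return [row for row in rows if row_weight(str(row[key])) < fraction]


def estimate_sum(rows, fraction, column, key="id"):
    """Estimate a column total from a sample by scaling with 1 / fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    sampled = sample_rows(rows, fraction, key)
    return sum(row[column] for row in sampled) / fraction


rows = [{"id": i, "minutes": 50} for i in range(20_000)]
# An estimate of the true total (1,000,000) computed from roughly 10% of the rows.
print(estimate_sum(rows, 0.1, "minutes"))
```

Because the weights are fixed per row, a 10% sample is always a subset of a 50% sample, so results are reproducible across runs and a job can be scaled up gradually.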

Lastly, we deployed Amplitude for self-service product analytics and integrated it with the existing data platform.

Preply has a beautiful experimentation culture, and we run hundreds of A/B tests each quarter. We are the proud creators of an in-house experimentation platform that allows us to modify the user experience and measure the impact of our initiatives.
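The statistics behind Preply’s in-house platform aren’t described here. As a hedged illustration of one common way to assess the impact of a single A/B test on a conversion rate, here is a two-proportion z-test using only the Python standard library; the function name and the conversion counts are made-up examples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of control (A) and variant (B).

    Returns (absolute lift, z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; conversions are all 0 or all 1")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


lift, z, p = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"lift={lift:.3%} z={z:.2f} p={p:.4f}")
```

With these example numbers, the variant’s 1.2-point lift gives a z statistic of roughly 2.9 and a p-value well below 0.05.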

Tracking data was fit for experimentation but hadn’t been designed with analytics in mind. We found ourselves with a daunting 500+ undocumented events, plenty of product dependencies and little or no governance in place.

We resolved this by introducing a governance layer so that only approved events reach Amplitude users through our integration. We whitelist new, clean events while sanitizing the legacy ones. Ownership of the data model and taxonomy is now centralized, ensuring consistency.
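The whitelisting described above can be sketched as a small filter sitting between raw tracking and the analytics tool. This is a hypothetical illustration, not Preply’s integration: the event names, the legacy-to-clean rename map, the `govern` function and the `event_type` field name are all assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical registry of approved, documented event names.
APPROVED_EVENTS = {"lesson_booked", "subscription_started", "tutor_profile_viewed"}

# Hypothetical mapping from legacy event names to their sanitized equivalents.
LEGACY_RENAMES = {
    "BookLessonClick_v2": "lesson_booked",
    "sub_start": "subscription_started",
}


def govern(events: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split raw events into (forwarded, dropped).

    Legacy names are renamed first; only events whose final name is approved
    are forwarded to the analytics tool. Everything else is held back.
    """
    forwarded, dropped = [], []
    for event in events:
        name = LEGACY_RENAMES.get(event["event_type"], event["event_type"])
        if name in APPROVED_EVENTS:
            # Copy the event so the raw record is left untouched.
            forwarded.append({**event, "event_type": name})
        else:
            dropped.append(event)
    return forwarded, dropped


raw = [
    {"event_type": "sub_start", "user_id": "u1"},
    {"event_type": "debug_click_42", "user_id": "u2"},
    {"event_type": "lesson_booked", "user_id": "u3"},
]
forwarded, dropped = govern(raw)
print([e["event_type"] for e in forwarded])  # ['subscription_started', 'lesson_booked']
```

Keeping the registry in one place is what makes centralized ownership of the taxonomy enforceable: a new event only reaches analysts once someone adds it to the approved set.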

With this architecture in place, Preplers can resolve most of their data needs autonomously. If anything requires a deeper look, they can count on a team of motivated Data Analysts and Scientists, eager to tackle complex problems and bring value to the business now that they’re relieved from pulling data for other people, day in and day out.

With the basics taken care of, the team is now focused on delivering business value through analytics and data science. We aim to be innovators in marketing measurement, ranking and pricing, to name a few areas, as well as, of course, diving into customer behaviour to unlock business opportunities. Meanwhile, we’ve started a Data Academy program to increase data literacy across the company.

If you ever considered joining us, brace yourself. This is a rocket ship.

About the Author

Alessandro Pregnolato - VP of Data at Preply

Alessandro Pregnolato is the VP of Data at Preply, the online language tutoring marketplace. Best known for building and scaling the data function at Typeform, Marfeel, and Preply, he has also been advising several tech unicorns such as Moonpay and Paack. He teaches SaaS product analytics at EADA Business School in Barcelona, where he is currently based. He previously worked as a professional musician. Alessandro’s experience in Business Intelligence and Data Science dates back twenty years, to when he fell into the data world by pure chance, joining Adobe as a production planner. Since then, he has refocused his career multiple times. Alessandro Pregnolato will speak at the 2023 edition of the Data Innovation Summit.

The views and opinions expressed by the author do not necessarily state or reflect the views or positions of Hyperight.com or any entities they represent.

Featured image: seventyfourimages at Envato Elements
