Integrated Data Architecture for 360 BI, Analytics & Data Science - Interview with Alessandro Pregnolato, Preply

This eighth edition of the Data Innovation Summit has a new stage: the Databases and Architecture Stage. On this stage, Hyperight will bring the audience technical presentations on how to manage enterprise databases and keep database performance on target. The stage will also cover managing and supporting data collection, processing, analytical, and ML workloads.

One of the speakers at this stage will be Alessandro Pregnolato, VP Data at Preply. In this interview, he guides us through the company's journey in building its data architecture. It is an exciting journey, since Preply is a global language learning marketplace with data on 50,000+ tutors and millions of learners who have signed up for 15M+ lessons.

Hyperight: Can you tell us more about yourself and your organization? What are your professional background and current working focus?

Alessandro Pregnolato: I have been doing Business Intelligence and Data Science for over 20 years. I scaled the data function at multiple companies including Typeform, Marfeel, Paack, MoonPay, and, most recently, Preply. I enjoy building successful teams and infrastructures to promote a data culture across organizations.

Preply is a global language learning marketplace, connecting 50,000+ tutors with millions of learners from all over the world. Since the company launched in 2012, students have signed up for more than 15 million lessons. We offer one-on-one online tutoring as well as live group classes.

Since I joined the company about two years ago, we have scaled the data chapter to over 30 members. Having built the BI foundations and the data infrastructure, the team is now focused on delivering business value through analytics and data science.

Hyperight: During the Data Innovation Summit 2023, you will share more on “Integrated data architecture for 360 BI, Analytics & Data Science”. What can the delegates at the event expect from your presentation?

Alessandro Pregnolato: I’ll be sharing my experience deploying the data infrastructure at Preply, which includes an overview of the technological and strategic choices. I’ll be walking the audience through the thought process, principles, and best practices that drove such an endeavor. Then, I’ll describe the resulting architecture and how we leverage it to promote a data culture, as well as the challenges we’re facing and some recommendations on how to overcome them.

Hyperight: What led Preply to think about building integrated data architecture?

Alessandro Pregnolato: It is a natural step in any company's growth. Multiple data needs emerge organically at some point. These are the result of funding rounds, investor requests, unexplained trends, or, simply, people's curiosity. At first, they're addressed with scrappy solutions, such as running queries on operational data and producing Excel spreadsheets. Then, it becomes clear that such an approach won't scale or meet the growing needs of the business. And that's when the fun begins.

Hyperight: Can you guide us through the journey of Preply when building its data infrastructure? What technologies and products did you use?

Alessandro Pregnolato: First, we deployed Looker as our Business Intelligence tool, for its self-service data exploration philosophy. Unlike traditional dashboarding tools, which require prior work by data specialists, it makes end users autonomous. Then, we introduced Snowflake for data warehousing, Monte Carlo for reliability, Clarisights for marketing data source integration, and Databricks for Data Science/MLOps. Lastly, we deployed Amplitude for self-service product analytics and integrated it with the existing data platform.

Hyperight: What tools do you need to build this type of integrated data architecture? How can organizations know what tools they have and need?

Alessandro Pregnolato: That depends a lot on the organization, its scale, data volume, and margins. When volumes are manageable and margins are reasonable, companies can leverage an array of SaaS applications and cloud tools: data warehousing solutions such as Snowflake, AWS Redshift, or Google BigQuery; ingestion tools such as Fivetran, Stitch, and Clarisights; BI applications such as Looker and Tableau; data science platforms such as Databricks; product analytics suites such as Amplitude or Mixpanel; CRM automation tools such as Braze or Autopilot; and so on.

When volumes are massive and margins are slim, as in the case of Ad Tech to name one, that's when it gets tricky. The above solutions aren't economically viable, and companies must get creative and build their own solutions, often with open-source technologies, at a high cost in development, maintenance, and complexity.
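As a rough illustration of the buy-versus-build reasoning in this answer, here is a minimal Python sketch. The function name, the boolean inputs, and the "case-by-case" outcome for mixed situations are assumptions made for illustration; the interview only describes the two clear-cut cases. The tool names are the examples given in the answer itself.

```python
# Illustrative sketch only: names and structure are hypothetical,
# not something described by the interviewee.

# Tool categories and example products as named in the interview.
SAAS_STACK = {
    "data warehousing": ["Snowflake", "AWS Redshift", "Google BigQuery"],
    "ingestion": ["Fivetran", "Stitch", "Clarisights"],
    "business intelligence": ["Looker", "Tableau"],
    "data science": ["Databricks"],
    "product analytics": ["Amplitude", "Mixpanel"],
    "CRM automation": ["Braze", "Autopilot"],
}


def suggest_approach(volumes_manageable: bool, margins_reasonable: bool) -> str:
    """Map the two factors from the interview to a broad approach."""
    if volumes_manageable and margins_reasonable:
        # Managed SaaS and cloud tools are economically viable.
        return "saas"
    if not volumes_manageable and not margins_reasonable:
        # Massive volumes and slim margins (e.g. Ad Tech): build in-house,
        # often on open-source technologies.
        return "build"
    # Mixed situations are not covered in the interview (assumption).
    return "case-by-case"


if __name__ == "__main__":
    approach = suggest_approach(volumes_manageable=True, margins_reasonable=True)
    print(approach)
    if approach == "saas":
        for category, tools in SAAS_STACK.items():
            print(f"{category}: {', '.join(tools)}")
```

In practice both factors are matters of degree rather than yes/no flags, so this is best read as a summary of the answer, not as a decision tool.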

Hyperight: What are some of the benefits from this journey that organizations may find helpful when considering starting a similar process?

Alessandro Pregnolato: Start soon, as the company scales. Be prepared to make a significant investment, with the data function representing between 5% and 10% of the total headcount. Get everyone on board, as this is a company-wide effort. Focus on reliability, data governance, and accessibility. Invest in data literacy to ensure everyone across the company is able to access data autonomously.

Hyperight: What is essential for organizations to know when building a data architecture for 360 BI, Analytics & Data Science services? Do you have any final recommendations?

Alessandro Pregnolato: Hire experienced Business Intelligence expertise: someone who has done it before, possibly multiple times. Most failure stories I've observed are the result of amazing professionals in their own field (e.g. engineering, marketing, product) who built non-scalable data infrastructure because they lacked specific BI skills.

Hyperight: According to you, what AI trends can we expect in the upcoming 12 months?

Alessandro Pregnolato: I can see BI tools built upon logical layers (Looker, dbt) gaining prominence over traditional stand-alone dashboarding tools (Tableau, QlikView). A surge in data observability and lineage tools (Monte Carlo, dbt). A second wave of hype and inflated expectations for AI applied to businesses, fueled by the success of OpenAI, along with the commoditization of such technologies. A strong influence of the latter on search engines and knowledge bases.

