Hyperight

How to NOT Build Data Products? – Interview with Sebastian Herold, Senior Principal Engineer at Zalando SE

In this interview, we had the chance to speak with Sebastian Herold, Senior Principal Engineer at Zalando SE. For the past 5 years, Sebastian has been driving the evolution of Zalando’s data landscape into an efficient, scalable, and secure data mesh. He works closely with users, product teams, and vendors. Previously, Sebastian spent 7 years building ImmobilienScout24’s data platform and co-developed the Data Landscape Manifesto.

Sebastian emphasizes the transition to treating data as a product, democratizing access while ensuring security and compliance. He discusses challenges in formulating a data strategy and ongoing efforts to refine the platform for internal users. Sebastian highlights the need for standardized governance and metadata exchange to facilitate seamless integration of data services. Moreover, Sebastian will share insights on “How NOT to Build Data Products” at the Data Innovation Summit this year.

Hyperight: Can you tell us more about yourself and your organization? What are your professional background and current working focus?

Sebastian Herold: Hi, I’m Sebastian Herold and I work as a Senior Principal Engineer at Zalando – Europe’s biggest fashion platform. In the central Data Platform department, my job is to ensure that we provide an efficient, compliant, and productive self-service data platform. It serves thousands of internal users who build data products or generate insights on top of it.

Originally, I started my career as an engineer and consultant. However, I noticed that architecting is more fun. I built a data platform for ImmobilienScout24, leading to the Data Landscape Manifesto co-authored with Arif Wider from Thoughtworks.

Hyperight: During the Data Innovation Summit 2024, you will share more on “How NOT to Build Data Products?”. What can the delegates at the event expect from your presentation?

Sebastian Herold: For over a decade, I have seen people building data pipelines. Only in recent years has the understanding grown that data is not just a by-product, but the product itself. The data mesh paradigm introduced the concept to a wider audience, but nevertheless the perception of “data as a product” varies a lot. The talk will be about anti-patterns that I noticed over time.

Hyperight: Can you share insights into the key principles of building a modern data platform? How does it support shifting business models and changing workforce operations?

Sebastian Herold: It took us quite some time to finally come up with a data strategy for Zalando. Most of the time, we tried to understand the challenges from different angles and ended up with a few general principles for our data platform:

  1. Handle data in a trustworthy manner: Security and compliance are key for your platform.
  2. Democratize data: Make it easy to discover, use and share data in a compliant way.
  3. Treat data as a product: Your platform should have data products as its central entities.
  4. Take a platform approach: Provide tools that are deeply integrated and interoperable for maximum productivity of your users.

Hyperight: Having worked on Zalando’s data landscape for the past 5 years, what challenges did you encounter? How did you transform it into an efficient and scalable data mesh of data products?

Sebastian Herold: Good question. There have been all sorts of problems, such as getting backing from upper management and putting yourself in the platform user’s shoes. Defining the right SLOs for our platform provides enough guidance for further development, but educating users can take a long time until they finally take ownership and think of data as a product.

Hyperight: How do you describe the evolution of this journey and its learning points for Zalando SE today? What was the starting point, where are you now, and what is next?

Sebastian Herold: While data problems seem to be quite local at first, the definition of our company-wide data strategy helped us to see these problems from the right perspective. It also enabled us to detect recurrent patterns.

The abstraction made it easier to put data problems into a business perspective and convinced upper management to support us. This strategic guidance still acts as a guardrail for our internal development as well as for the conversations with our users.

Currently, hundreds of users from all over the company are using our platform, which shows that it basically works. At the same time, we still see people solving similar problems again and again. This drives us to extend the value chain of our platform and to make life easier for our users.

Lineage, metadata integration and governance are areas in which there is still a lot of potential for improvement.

Hyperight: Balancing efficiency, scalability, and security is a constant challenge in modern data platforms. How does Zalando’s data mesh strike this balance, and what are some key considerations for others building similar architectures?

Sebastian Herold: Luckily, scalability is often supported by big hyperscalers, yet balancing efficiency and security remains an ongoing challenge for us. In particular, providing a lean approach to data access management took us several iterations. We found an acceptable vision that avoids unnecessary delays for users accessing data while still fulfilling complex regulatory requirements.

Data mesh introduces the right vocabulary for this: the concepts of a central self-service data platform, domains, and automated federated governance pushed us in the right direction to tackle the problem in a sustainable manner.

Hyperight: Looking ahead, how do you see the role of data products evolving in the next few years? What emerging technologies or trends do you believe will have a significant impact on their development?

Sebastian Herold: In the last 2–3 years, the focus has shifted from building pure data platforms to building data product platforms that have data products as first-class citizens. I believe this trend will continue, but it also forces all parties to effectively exchange metadata about data products, such as schema, ownership, lineage, access control, or quality. Early standards like OpenLineage try to address this gap, but more standardization in governance is needed for flawless exchange and better integration of data services of all kinds.
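
To make the metadata categories named above (schema, ownership, lineage, access control, quality) more concrete, here is a minimal sketch of what an exchangeable data product descriptor could look like. Everything in it is an assumption made for illustration: the class `DataProductMetadata`, its field names, and the sample values are hypothetical, and the format is neither Zalando's nor the OpenLineage specification.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DataProductMetadata:
    """Hypothetical descriptor; field names are illustrative, not a standard."""

    name: str
    owner: str                                   # ownership: responsible team or domain
    schema: Dict[str, str]                       # schema: column name -> type name
    upstream: List[str] = field(default_factory=list)        # lineage: input data products
    allowed_roles: List[str] = field(default_factory=list)   # access control
    quality_checks: Dict[str, bool] = field(default_factory=dict)  # quality: check -> passed

    def to_json(self) -> str:
        # Serialize to a plain JSON document that another tool could consume.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "DataProductMetadata":
        # Rebuild the descriptor from its JSON form.
        return cls(**json.loads(payload))


# Made-up example values, purely for illustration.
orders = DataProductMetadata(
    name="orders_daily",
    owner="team-checkout",
    schema={"order_id": "string", "amount": "decimal"},
    upstream=["raw_orders"],
    allowed_roles=["analyst"],
    quality_checks={"no_null_order_id": True},
)

# Round trip: what one service publishes, another can read back unchanged.
restored = DataProductMetadata.from_json(orders.to_json())
```

The point of the round trip is that once different services agree on such a shared shape, they can pass metadata between each other without custom translation, which is the kind of gap that standardization is meant to close.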
