Building a Lakehouse: Leverage Open Table Formats for Unified Stream and Batch Architecture – Micha Kunze, Maersk

In this talk at NDSML Summit 2023, Micha Kunze from Maersk dives into how their team builds, tests, quality-checks, and manages data pipelines.

Session Outline

In this session at the NDSML Summit 2023, Micha Kunze from Maersk dives into how their team builds, tests, quality-checks, and manages thousands of data pipelines. The team uses open-source table formats and processing frameworks to deliver near-real-time operational data and to support its ML feature store.

Key Takeaways:

  • Table streaming for low-cost nearline (minutes-level latency) data processing
  • Leveraging ML to find anomalies in streaming velocity across datasets
  • Querying any dataset or stream via a fully automated metastore