Software engineering and DataOps share many challenges: automating manual tasks, code tests, data tests, integration tests, on-demand environment creation, data governance, infrastructure provisioning and more. By solving these challenges with software engineering, we can reduce the time spent on operations and drastically shorten development cycle times, which can greatly improve efficiency, advises Micha Ben Achim Kunze, Lead Data Engineer at Maersk.
“Viewing your data and operations challenges as software engineering challenges will make you orders of magnitude more effective,” emphasises Micha. He will show us how they reliably deliver millions of forecasts per day while sustaining a high change velocity of their products by leveraging software engineering practices in his session at the Data Innovation Summit 2021.
We had a chat with Micha to learn why seeing data and operations challenges as software engineering challenges leads to more effective solutions, and to discuss DataOps maturity, current challenges and future trends in DataOps and data engineering.
Hyperight: Hi Micha, I’m really excited to welcome you to the 6th edition of the Data Innovation Summit. As an intro to our discussion, please tell us a bit more about yourself and your background.
Micha Ben Achim Kunze: I am currently the Lead Data Engineer in the Forecasting team of Maersk, the world’s largest ocean container shipping company. My team is responsible for delivering forecasts for Maersk’s service delivery. Operating data products that are used day-to-day on this scale is exciting and requires high-quality data and operational excellence. Hence, I am excited to share some of our learnings at the summit!
As for my background: I started in academia with a BSc degree in Physics and a PhD in Biophysics, followed by a PostDoc. I always loved solving problems using computers and spectrometers, and I eventually took the step towards industry and became a Data Engineer at Novo Nordisk before I joined Maersk. All in all, I have a broad scientific background, a learning mindset, and a passion for solving data problems.
Hyperight: Your Data Innovation Summit session focuses on the topic “DataOps is a Software Engineering Challenge”. Could you please tell us why seeing data and operations challenges as software engineering challenges leads to more effective solutions?
Micha Ben Achim Kunze: For me, the essential goal of DataOps is: Deliver the best outcomes with data fast, without breaking things.
Seeing DataOps as a Software Engineering challenge is my take on an analogy to Site Reliability Engineering: SRE approaches operations as a software engineering challenge to make operations highly efficient and scalable. Or more tangibly: If you experience something that does not work properly, is inefficient, or repeatedly needs manual intervention or fixing, you apply software engineering to fix it in an automated way. You purposefully minimize the time you have to spend on operations by applying engineering to operations, which in turn means you can develop more. In essence, it is a continuous improvement of your process and product.
And this is key to what we Data Engineers aim to do with data: we want to reliably (correctly, on time, etc.) deliver data products and develop new data products at the same time.
Following this train of thought, we can see a lot of Software Engineering challenges in DataOps: automating manual tasks, code tests, data tests, integration tests, on-demand environment creation, data governance, infrastructure provisioning etc. And by addressing these challenges using software engineering, we can reduce the time spent on operations as well as drastically decrease the development cycle times, leading to huge efficiency gains.
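Editor's illustration: to make the idea of an automated data test concrete, here is a minimal Python sketch. It is not taken from the interview and does not describe Maersk's implementation; the `ForecastRow` type, its fields, and the three rules checked are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ForecastRow:
    # Hypothetical record layout, assumed for this example only.
    route: str
    forecast_date: date
    containers: float


def check_forecasts(rows: List[ForecastRow]) -> List[str]:
    """Return human-readable descriptions of data quality violations."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Rule 1 (assumed): every forecast must name a route.
        if not row.route:
            problems.append(f"row {i}: missing route")
        # Rule 2 (assumed): forecast volumes cannot be negative.
        if row.containers < 0:
            problems.append(f"row {i}: negative volume {row.containers}")
        # Rule 3 (assumed): at most one forecast per route and date.
        key = (row.route, row.forecast_date)
        if key in seen:
            problems.append(f"row {i}: duplicate forecast for {key}")
        seen.add(key)
    return problems
```

A check like this could run automatically in a pipeline or test suite before data is published, so that a non-empty result stops the delivery instead of someone inspecting the data by hand.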
Hyperight: DataOps is the data management for the AI era. Do you agree with this statement? And why is DataOps the right methodology for every company striving to be AI-driven?
Micha Ben Achim Kunze: I think this statement is quite loaded. But if we use DataOps to describe the practices that scale our capability to deliver the highest-quality data at speed, the statement is somewhat fitting. You need good DataOps practices to deliver high-quality AI or ML. It is as simple as that. So if you are serious about AI or ML, you need to adopt good DataOps practices.
Hyperight: What is the overall level of DataOps maturity across organizations?
Micha Ben Achim Kunze: This seems very mixed and I can see a big divide: Leaders in the field are orders of magnitude more effective in dealing with DataOps-related issues compared to the majority of companies that are lagging far behind.
Often, however, the problem is not the technology choices companies have made, even though that is what you might think. Most of the time it is a lack of good data practices and a lack of understanding of what it takes to deliver value with data. Fostering a good engineering culture, where you focus on improving your practices, removing obstacles and friction, and removing functional silos where needed, is much more valuable than the newest tech stack.
Hyperight: What are some of the challenges that DataOps and data engineering are dealing with?
Micha Ben Achim Kunze: For me, the main challenge we are dealing with is establishing good practices. Going back to Software Engineering, a lot of good practices and patterns have been developed and used to a point where they became accepted standards because they very clearly delivered value.
Data Engineering is rather nascent in this regard when looking at the average organization. Constructs such as automated code and data testing, disposable environments, etc., exist in only a few data teams and are often completely absent elsewhere. Creating such good practices that will get widely adopted is very much a work in progress.
Hyperight: And lastly, what is the future outlook for data engineering?
Micha Ben Achim Kunze: I feel like we are coming full circle with what is now being called “data-centric AI”. Data was always the fuel for analytics and AI/ML and it will stay that way. Seeing that being reiterated in the ML/AI field is great, as it acknowledges the importance of good Data Engineering work and efficient DataOps practices.
I am excited about the outlook for the next couple of years, where I expect to see an even bigger impact of Data Engineering in the data world. I also expect some leaps in our practices and tooling that will come along with that.