From Python Snippets to Production-Ready ML Models: How to Streamline and Trust Your ML Models in Production – Interview with Georgios Gkekas, Vent.io GmbH

During this interview, we had the pleasure of speaking with Georgios Gkekas, Chief Technology Officer at Vent.io GmbH, the digital arm of Deutsche Leasing Group. Georgios spearheads the company’s technological strategy and oversees the establishment of a full-stack engineering department comprising data scientists, software engineers, and site reliability engineers. With over 16 years of experience, Georgios is a seasoned professional in developing digital products.

Moreover, Georgios and Dr. Sergei Savin, Data Scientist at Vent.io GmbH, will be presenting on “From Python Snippets to Production-Ready ML Models: How to Streamline and Trust Your ML Models in Production” at the upcoming Data Innovation Summit 2024.

Hyperight: Can you tell us more about yourself and your organization? What are your professional background and current working focus?

Georgios Gkekas

Georgios Gkekas: As the CTO of Vent.io GmbH, I work for the digital and innovation branch of Deutsche Leasing AG. Vent.io designs and experiments with digital business models and products in the fields of Asset Finance and Asset-related services.

Regarding my professional background, I have over 15 years of engineering and software architecture experience. I’ve held various roles, primarily in the financial services industry. Currently, as CTO, my role is to ensure the successful implementation of the technological vision. I lead the engineering and data science teams to create products meeting our mother company’s demands.

Hyperight: During the Data Innovation Summit 2024, you and Dr. Sergei Savin will present on “From Python Snippets to Production-Ready ML Models: How to Streamline and Trust Your ML Models in Production”. What can the delegates at the event expect from your presentation?

Georgios Gkekas: At the event, delegates can expect Dr. Savin and me to provide a brief overview of our own lessons learned. Moreover, we’ll cover the full lifecycle of data products, from ideation to production. We will focus on key areas that companies should consider to professionalize their ML-model delivery workflow with robust governance. Specifically, we will discuss the following points:

  • Establish a scalable MLOps-oriented operating model to align data science activities with business goals across the organization effectively.
  • Treat everything in data science as code. Follow software engineering practices such as version control, testing, continuous integration and delivery, monitoring, and KPI-based steering.
  • Implement a model repository to store, track, and manage the lifecycle of machine learning models. Enable collaboration and governance among data scientists and other stakeholders.
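To make the model-repository point above concrete, here is a minimal, framework-agnostic sketch in Python. It is a toy stand-in for what tools such as MLflow's model registry provide out of the box; all class and stage names are hypothetical:

```python
import hashlib
import time


class ModelRegistry:
    """Toy in-memory model registry: stores versioned models with metadata
    so their lifecycle (staging -> production -> archived) can be tracked."""

    def __init__(self):
        self._models = {}  # name -> list of version records

    def register(self, name, artifact: bytes, metadata=None):
        """Store a new version of a model artifact and return its version number."""
        versions = self._models.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "checksum": hashlib.sha256(artifact).hexdigest(),
            "stage": "staging",
            "registered_at": time.time(),
            "metadata": metadata or {},
        })
        return versions[-1]["version"]

    def transition(self, name, version, stage):
        """Move a model version through its lifecycle stages."""
        assert stage in {"staging", "production", "archived"}
        self._models[name][version - 1]["stage"] = stage

    def latest(self, name, stage="production"):
        """Return the newest version of a model in the given stage, if any."""
        for record in reversed(self._models[name]):
            if record["stage"] == stage:
                return record
        return None


registry = ModelRegistry()
v1 = registry.register("churn-model", b"serialized-model-bytes", {"auc": 0.91})
registry.transition("churn-model", v1, "production")
prod = registry.latest("churn-model")
```

In a real setup, a dedicated tool handles storage, access control, and audit trails; the sketch only illustrates the governance idea of versioned, stage-tracked models shared between data scientists and other stakeholders.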

Hyperight: In your talk, you will touch upon the concept of streamlining ML models in production. What strategies or tools are most effective for this, and how do they impact overall development process efficiency?

Georgios Gkekas: We don’t need to look very far to find tools that help us streamline and professionalize the development of ML models. Usually, the challenge lies in the lack of DevOps and software engineering competencies in current data science or AI teams. Define a strategy that treats Data Science as a software engineering discipline and integrate a DevOps mentality into it. Then, ensure effective communication with your teams to execute it meticulously.

Over the past decades, software engineers around the world have worked very hard to create concepts and tools for continuous integration and deployment, full traceability of actions, observability, and lineage. The very same principles apply effectively to ML operations, with some tweaks to enable a better experimentation cycle during model development. To achieve this, it’s important for data teams to maintain full lineage not only of the production workloads but also of the experiment runs, something usually overlooked. Various tools exist, like MLflow, that help you take control over those activities.
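The lineage idea can be sketched in a few lines of plain Python. This is a toy tracker, not MLflow itself; the parameter names, metric values, and commit SHAs below are illustrative:

```python
import time
import uuid


class RunTracker:
    """Toy experiment tracker: records every run (experimental or production)
    with its parameters, metrics, and code version, so full lineage is kept."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics, code_version, kind="experiment"):
        """Record one run; code_version would typically be a git commit SHA."""
        run = {
            "run_id": uuid.uuid4().hex,
            "kind": kind,  # "experiment" or "production"
            "params": params,
            "metrics": metrics,
            "code_version": code_version,
            "timestamp": time.time(),
        }
        self.runs.append(run)
        return run["run_id"]

    def lineage(self, kind=None):
        """Return all runs, optionally filtered by kind."""
        return [r for r in self.runs if kind is None or r["kind"] == kind]


tracker = RunTracker()
tracker.log_run({"lr": 0.01, "depth": 6}, {"auc": 0.88}, "abc123")
tracker.log_run({"lr": 0.005, "depth": 8}, {"auc": 0.90}, "def456")
tracker.log_run({"lr": 0.005, "depth": 8}, {"auc": 0.90}, "def456",
                kind="production")
```

The key design point is that experimental and production runs land in the same log: when a model ships, the exact parameters, metrics, and code version of the run that produced it remain queryable alongside the experiments it beat.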

Hyperight: As the CTO of Vent.io GmbH, could you share insights into the technological strategy you’ve defined for the company? How does it align with deploying machine learning models in a production environment?

Georgios Gkekas: Our mission is to provide digital solutions that accelerate the growth of German SMEs, which, in turn, are the customers of Deutsche Leasing. The technological strategy aligns with that mission by setting up the appropriate skills/competencies, tools and processes. Alignment between strategy and execution is only achievable by establishing an appropriate setup across all three dimensions.

Interdisciplinary teams called squads develop our data products, incorporating all necessary skills. Data scientists collaborate with data engineers, software engineers, and site reliability engineers in one squad. Together, they ensure that the product can move smoothly from development to production. Our backlog prioritizes non-functional requirements such as reliability, security, monitoring, and performance, ensuring they receive due attention. Our cloud-native infrastructure supports the entire process, enabling early adoption of new tools and mechanisms that facilitate experimentation with the latest technologies.

Hyperight: In your experience as a software engineer and architect in the financial services sector, how have you observed the role of data science evolve? Particularly in the context of developing digital products?

Georgios Gkekas: Data Science has become more than just a specialized field and is now an essential activity for companies of different sizes. For a long time, companies from various sectors have faced challenges understanding and applying data science to their businesses successfully. I have observed that decision makers have understood that data science is not about doing advanced research and using rare technologies. We now have general awareness in the industry that data science needs to be integrated into daily operations. Using AI SaaS solutions is important, but you cannot have an edge unless you use and leverage your own data. I will try to illustrate this with a layered-cake AI architecture.

Let’s imagine four layers of AI architecture where the first one at the bottom contains the foundation models. This is the level where AI and ML engineers need to implement the algorithms we use. The second level of the architecture is where fine-tuned versions of foundation models are built based on specific data. The third level is where new AI applications emerge mainly through engineering work by combining different AI services and APIs and possibly by doing prompt engineering. And the fourth level is where complete AI products are made and ready to be used as a service, e.g. service center chatbots.

Now, after considering those levels, only a few companies work on the first level of foundation models. These are the ones that do leading-edge AI research, like OpenAI, Google, Aleph Alpha, etc. A modern enterprise does not need to work on that level to create a competitive advantage in AI for itself. The edge of every company lies in its own data, which capture the interactions with its customers and products. I think that this data is the main intellectual asset, holding a lot of insights that are waiting to be revealed. And this is where data science can play a role and deliver great value. In-house data science teams can work on the second and/or third level and offer tailored solutions for the industry in question.

Hyperight: What are some challenges that individuals and organizations most often face when transitioning from experimental code to a robust production-ready data product environment? How can they mitigate these challenges?

Georgios Gkekas: A good data product that adds business value to your bottom line requires a well-aligned collaboration between data science and engineering. You can’t develop that by simply assigning the task to a data science team. And this is where the challenge lies; namely, in the very nature of the tasks of data science and engineering teams. Data scientists have an experimental way of working, which is crucial for creating models.

On the other hand, engineers follow certain guidelines while developing and working in agile cycles, which support experimentation, yet are stringently planned and streamlined by product management teams. There are various challenges that arise when you try to combine inherently different working cultures. Challenges range from the need for exotic technologies during modeling to an unpredictable time schedule of the exploratory phase – two factors intrinsic to data science operations.

Besides that, data science teams often do not have the necessary skills to ensure product robustness and reliable operations in production. And this is a recurring problem in many organizations, which cannot make the most of their data science investment due to that lack of alignment. If your data science strategy does not facilitate a smooth integration of your engineering capabilities and does not adopt a DevOps-first mentality, then you risk that your data science activities will remain in the experimental phase and never find their way into production.

Hyperight: Considering the challenges highlighted in transitioning from Python snippets to production-ready ML models, what key recommendations would you offer to individuals and organizations embarking on a similar journey?

Georgios Gkekas: There is no need to reinvent the wheel here. As such, I can’t help but repeat the importance of bridging data science with engineering operations and establishing a DevOps – or in fancier words – MLOps mentality. It is of paramount importance to consider everything as code and treat the daily operations of the data science team the same way as in software engineering.
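The "everything as code" idea can start as simply as putting ordinary unit tests around data science code. A hedged sketch, where the scaling function and its expected values are hypothetical examples rather than anything from Vent.io's stack:

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; a typical preprocessing step
    that deserves the same test coverage as any other production code."""
    lo, hi = min(values), max(values)
    if lo == hi:  # guard against division by zero on constant input
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Plain assertions; in a real repository these would live in a pytest
# suite that runs on every commit in CI.
def test_min_max_scale():
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]
    assert min_max_scale([5, 5]) == [0.0, 0.0]


test_min_max_scale()
```

Once such tests exist, the rest of the software engineering toolchain (version control, CI, code review) applies to data science code without modification.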

In the end, it is all about creating software products with an impact to the end customer. And to that end, proven best practices can be reapplied to your data science activities.

Hyperight: According to you, what AI trends can we expect in the upcoming 12 months?

Georgios Gkekas: One of the topics that will keep the whole AI community busy will be the various regulatory activities, and in particular, the EU AI Act. This will have repercussions not only for the development processes of ML models but also for research in ML explainability. And from a philosophical perspective, it will be interesting to see whether humanity will continue to insist on having “human readable ML models” or give up on that endeavor and start trusting the algorithms and data more.

Besides this, I believe there will be increased interest in edge AI and federated learning. Cloud computing is powerful but still a bottleneck, and sometimes not desirable due to data privacy reasons. Therefore, the community will strive to create solutions for running AI models on local devices that can be fine-tuned on diverse, distributed data sources.

Last but not least, we cannot forget the potential effects that scientific breakthroughs in quantum computing can have on ML models and AI applications. However, I don’t think we will see an industrial application in that area in the next 12 months. I expect this to be the case in 3 to 5 years from now.

For the newest insights in the world of data and AI, subscribe to Hyperight Premium. Stay ahead of the curve with exclusive content that will deepen your understanding of the evolving data landscape.
