The Future of Data Platforms is Truly Hybrid Lake-House Architecture: Interview with Liangfeng Hu

An exciting upcoming technology for the next few years is the hybrid lake-house architecture for data platforms, according to Liangfeng Hu, a Principal Machine Learning Engineer at The AI Framework.

“This will play an important role in shortening the time from prototyping an innovative idea on an engineer’s local machine to a customer enjoying a service hosted on a fully automated, global, and robust cloud infrastructure,” adds Liangfeng.

In this interview, he shares his views on the importance of decentralized teams compared to centralized ones. He also shares advice for anyone who plans to start an AI project and explains how companies can advance their digitalization by adopting AI. He brings up many more topics, which is why we recommend reading the full interview.

Hyperight: Could you tell us a bit about yourself? What do you do as a machine learning (ML) engineer?

Liangfeng Hu: My role centers on well-established software engineering best practices and operational excellence. While it is well known that data scientists build machine learning models, there is a tremendous amount of engineering work that goes into actually putting them to use. This includes improving data quality and software quality, creating testing frameworks, monitoring systems, automating all stages, designing the underlying platform, and so forth. That’s what an ML engineer does.

Hyperight: What are you passionate about?

Liangfeng Hu: I really love seeing AI being put to real-world use. I want this promising technology to be more than just “an experiment”. Right now every company realizes that digitalization is good and that AI can amplify its potential, and yet adoption is much slower than it could be. I think this is because many companies are not familiar with technology innovation best practices, in particular the ones that are well established in software engineering. This is where I really enjoy contributing, because adding a bit of proven structure can really accelerate progress and lead to successful projects.

Hyperight: What is your current working focus? Any interesting projects you are working on?

Liangfeng Hu: A neat project I’m working on right now is a customer engagement prediction model at a large fashion retailer. The company had previously been using ordinary reporting and business intelligence tools to understand customer behavior. For example, why do some customers purchase one item and never return? The old manual analysis was superseded entirely by an AI model that could predict and explain behavior. Since the system was completely automated, we could be extremely structured in improving decision making and, by extension, the customer experience!

Hyperight: Why did you pick this project in particular?

Liangfeng Hu: This project was right at the crossroads of being “low-hanging fruit” and having high impact. It was easy to identify because the company already had an old way of measuring customer behavior. We simply knew that modern machine learning could do the job better.

Hyperight: What was the result and what was the hardest part of the project?

Liangfeng Hu: The simple answer is that we improved accuracy by 10%, which is a great result on its own. However, perhaps equally important is the fact that it’s part of an operational platform on which we can now do A/B testing, incrementally improve the solution, scale it across the organization, and much more. It’s an important part of becoming data driven. The hardest part was making the end-to-end pipeline reliable. After all, we’re not just building one-off experiments; we’re building “capabilities” that we want to scale across the business and that customers might come to rely on.

Hyperight: What would you recommend to others that want to start an AI project?

Liangfeng Hu: Many companies start by building a huge AI platform first, which, in my experience, is a terrible idea. Instead, it is much better to start with a high-value MVP (minimum viable product) and then iterate quickly. Across the many projects I’ve led, this has to be one of the most consistent takeaways.

Hyperight: In your opinion, which setup works best for data engineers and data scientists: a centralized or a decentralized team?

Liangfeng Hu: In my experience, decentralization tends to work better – and coincidentally maps well to modern ideas of data mesh. Distributed teams can focus on solving real problems, not just producing “software” from an ivory tower. For example, when I was working at a large telecommunications company, they asked the central team to build a very capable data analysis platform. The problem, however, was that they never managed to attach it to relevant projects because they didn’t really understand the requirements that were important to business users. Not only that, but they also didn’t understand what was important to the developers who were supposed to target the platform! So, distributed teams tend to do much better in this regard.

Hyperight: If your projects had an infinite budget what would you do differently?

Liangfeng Hu: That’s a funny question that I haven’t had to consider before, because budget is almost never a bottleneck. The projects that I’ve led always start small and then grow along with their budget. The first and most important challenge is to design a good MVP given your AI and data strategy.

Hyperight: What are your thoughts on edge computing?

Liangfeng Hu: There is something to be said about low latency on edge devices, and I’m thinking of everything from embedded devices to beefy workstations. You can’t imagine how frustrating it is to work on cloud resources that have tremendous throughput but always take a few seconds to respond to every command. I’m not saying you can’t achieve low latency from cloud resources. But using a giant cluster in the cloud for rapid prototyping can sometimes be like waiting for the elevator instead of just running up one floor of stairs.

One thing I’ve found convenient for financial customers that have very high security requirements is having a workstation that mirrors the live deployment environment but lets me work efficiently. The company can ensure that data and code are secure while not getting in the way of my productivity.

Hyperight: From your perspective, what is an exciting upcoming technology? Any trends you see more of in the upcoming years?

Liangfeng Hu: I think we’re getting close to seeing truly hybrid lake-house architectures. By that, I mean we will use data platforms that seamlessly span multiple cloud providers and on-premises resources. This will play an important role in shortening the time from prototyping an innovative idea on an engineer’s local machine to a customer enjoying a service hosted on a fully automated, global, and robust cloud infrastructure.

Hyperight: What would be your career advice for any data enthusiast?

Liangfeng Hu: I think it’s important to leave behind things that you’re proud of. One thing I still find myself doing to this day is trying to think one step ahead of the target. For instance, if I’m asked to design an algorithm, I don’t just start working on it and then call it a day when the requirements are fulfilled. I study the problem and try to figure out what the simplest solution is. Don’t misunderstand, though. Junior data scientists produce complicated solutions. Senior engineers produce simple, elegant, maintainable solutions. My advice is to cultivate a sense of engineering excellence – pride in craftsmanship.
