Hyperight

AI on Hybrid Platforms: Integrate HPC, Edge, and Cloud – Interview with Johan Kristiansson, RISE Research Institutes of Sweden

In this interview, we speak with Johan Kristiansson, Senior Researcher at RISE (Research Institutes of Sweden), who brings extensive experience in distributed systems, software engineering, and machine learning. Kristiansson focuses on developing advanced computational platforms that enable secure, robust workflows across diverse systems, driving innovation in AI technologies.

As a speaker at the upcoming NDSML Summit 2024, Johan will share more insights on streamlining AI across hybrid platforms, emphasizing the potential of High-Performance Computing (HPC) and its integration with cloud and edge computing.

Hyperight: Can you tell us more about yourself and your organization? What are your professional background and current working focus?

Johan Kristiansson, speaker at NDSML Summit 2024

Johan Kristiansson: I am a Senior Researcher at RISE (Research Institutes of Sweden) with more than 20 years of expertise in distributed systems, software engineering, and machine learning. My career spans both academia and industry, with roles at Luleå University of Technology, Ericsson, Mobilaris (now Epiroc), and most recently RockSigma, where I contributed to the development of a seismic processing engine for underground mines.

Currently, my work focuses on building advanced computational platforms that enable industries to create secure and robust large-scale workflows across heterogeneous systems. The goal is to accelerate innovation and adoption of AI technologies.

In addition to my role at RISE, I am also involved with ENCCS (EuroCC National Competence Center Sweden). At ENCCS, we provide access to some of Europe’s most powerful supercomputers. ENCCS supports industry, public administration, and academia in harnessing these computational resources, while also offering training to help organizations fully leverage high-performance computing, AI, and even quantum computers.

Hyperight: During the NDSML Summit 2024, you will speak on streamlining AI on hybrid platforms: integrating HPC, edge, and cloud. Can you explain what High-Performance Computing (HPC) is and how it differs from traditional computing approaches?

Johan Kristiansson: High-performance computing (HPC) leverages supercomputers to tackle complex problems, achieving performance levels that surpass traditional computing systems. Traditional computing, including cloud services, mainly focuses on managing Internet servers and has limited capacity for parallel task processing. HPC systems can leverage thousands or even millions of processors, all working in parallel to process massive datasets and solve problems that would be infeasible using standard computing approaches.

The key difference between HPC and traditional computing is mainly performance. While traditional computing is suitable for everyday tasks like running applications or managing data, HPC is more like a racing car, built to handle demanding workloads such as scientific simulations, weather modeling, or large-scale AI training. HPC systems are optimized for speed and efficiency, making them capable of handling tasks that require immense computational resources. This makes HPC fundamental in AI development, where processing vast amounts of data and training large ML models are essential.

Hyperight: EuroHPC is a significant player in the global HPC landscape. Could you elaborate on its role and the key objectives it’s aiming to achieve, particularly in the context of European computational needs?

Johan Kristiansson: EuroHPC (European High-Performance Computing Joint Undertaking) is a major initiative launched by the European Union to establish Europe as a global leader in supercomputing. It is a public-private partnership that brings together the European Union, participating countries, and private partners to coordinate efforts in HPC across the continent.

EuroHPC plays a crucial role in strengthening Europe’s position in the global HPC landscape. Its primary mission is to develop a world-class supercomputing infrastructure within Europe. This ensures that European researchers, industries, and public institutions have access to some of the most powerful computational resources available. This is essential for addressing Europe’s growing computational needs across fields such as science, industry, and public administration.

A key objective of EuroHPC is to build and deploy pre-exascale and exascale supercomputers. An exascale system is capable of performing more than a billion billion (10^18) calculations per second. These machines meet Europe’s current computational demands and position Europe as a leader in the next generation of supercomputing technologies. EuroHPC offers free access to these cutting-edge systems, including powerful GPUs, giving users access to resources that would otherwise be very expensive to obtain.

Hyperight: In your view, how will the evolution of HPC drive advancements in AI and machine learning, especially in the context of distributed computing environments?

Johan Kristiansson: HPC provides immense processing power, essential for training complex AI models that require vast amounts of data and computation. This enables faster model training, reduces iteration times, speeds up optimization, and allows exploration of new ML architectures. As HPC evolves, particularly with exascale computing, it will enable us to tackle even more complex problems, pushing the boundaries of innovation and large-scale ML model development.

I believe we should seamlessly integrate HPC with other computing environments. HPC excels at large-scale ML training, while cloud and edge platforms are better suited for handling real-time data processing and inference closer to where the data is generated. This hybrid approach would optimize both performance and flexibility, making AI deployment more efficient in distributed settings such as smart cities or IoT-driven industrial applications.

Hyperight: When working with EuroHPC supercomputers, what are some challenges faced by researchers and enterprises? How are these being addressed?

Johan Kristiansson: One main challenge is that HPC systems are not always easy to use, particularly for users coming from cloud environments. The complexity of HPC can be overwhelming, especially for those with limited experience. To address this, EuroHPC finances National Competence Centres, such as ENCCS, which offer training, workshops, and user support. This helps users develop the expertise needed to effectively leverage these systems.

In contrast to cloud environments, HPC systems are more like powerful desktop computers. They often lack APIs and services commonly found in cloud-based systems. This can make it harder for users accustomed to cloud computing to use HPC, as they may need to interact directly with the underlying infrastructure. Additionally, this creates challenges when integrating HPC with other systems. Lack of seamless integration can slow down development and limit users who are used to automation and orchestration tools available in the cloud.

Hyperight: The concept of a Compute Continuum is becoming increasingly important in modern computing. Can you discuss its significance and the potential benefits it offers for both research and enterprise applications?

Johan Kristiansson: The idea behind Compute Continuums is to create a flexible, scalable infrastructure that combines the power of HPC and the accessibility of cloud. It also integrates real-time capabilities of edge computing into a unified system. In this model, computations take place wherever it makes the most sense based on the task’s requirements. A Compute Continuum enhances flexibility and scalability. This allows businesses to deploy machine learning models or data-intensive applications seamlessly across different environments. Whether a task requires real-time analysis at the edge or large-scale computation on HPC, this unified infrastructure enables enterprises to optimize performance and reduce costs. It also accelerates innovation by leveraging the best platform for each specific task.

For researchers, the Compute Continuum can make it easier to use both HPC and cloud systems by offering portability between different platforms. This enables them to seamlessly move workloads, data, and applications across different computing environments. Additionally, a Compute Continuum can make HPC more user-friendly by simplifying access to resources and reducing the complexity of managing different systems. This enables researchers to focus more on their work and less on technical hurdles.
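As a rough illustration of the idea that computations run wherever it makes the most sense, the sketch below routes tasks to edge, HPC, or cloud based on two stated requirements. It is not taken from ColonyOS or any real scheduler: the `Task` fields, the `place` function, and both threshold values are hypothetical and chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; the fields are illustrative assumptions."""
    name: str
    max_latency_ms: float  # how quickly a result is needed
    gpu_hours: float       # rough estimate of the compute demand


def place(task, edge_latency_budget_ms=50.0, hpc_threshold_gpu_hours=100.0):
    """Pick a platform for a task from its stated requirements.

    Both thresholds are made-up defaults, not values from any real system.
    """
    # Tight latency requirements are served close to where the data is generated.
    if task.max_latency_ms <= edge_latency_budget_ms:
        return "edge"
    # Very large compute demands go to the supercomputer.
    if task.gpu_hours >= hpc_threshold_gpu_hours:
        return "hpc"
    # Everything else runs on general-purpose cloud capacity.
    return "cloud"


tasks = [
    Task("sensor-anomaly-detection", max_latency_ms=20.0, gpu_hours=0.01),
    Task("large-model-training", max_latency_ms=3_600_000.0, gpu_hours=5000.0),
    Task("nightly-report", max_latency_ms=600_000.0, gpu_hours=2.0),
]

placements = {task.name: place(task) for task in tasks}
```

A real continuum would weigh many more factors, such as data location, cost, and queue times, but the principle of deciding placement per task from its requirements is the same.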

Hyperight: You’re currently involved in developing ColonyOS, a meta-operating system that facilitates seamless workflows across various computing environments. Could you describe ColonyOS and its importance in meeting Europe’s growing computational demands?

Johan Kristiansson: ColonyOS is a meta-operating system designed to create Compute Continuums and enable seamless execution across diverse computing environments, including HPC, cloud, and edge systems. Its primary goal is to facilitate integration and abstract the complexity of these platforms. With ColonyOS, users can connect various platforms to their personal Compute Continuum, almost as easily as plugging appliances into standard power outlets. This allows them to seamlessly leverage HPC resources, cloud environments, or even connect their own machines to the continuum. ColonyOS functions as a modern grid computing platform, enabling users to access and utilize computational power from various systems. It simplifies the process by eliminating the need to deal with the complexities of each platform individually.

This is particularly important in meeting Europe’s growing computational demands. Industries and research institutions increasingly rely on large-scale computations, AI, and real-time data processing. ColonyOS provides a unified interface that simplifies the execution of complex workflows. This allows researchers, enterprises, and developers to access optimal resources for their tasks without being restricted to one platform. This approach also enhances security and privacy, as it brings computation to the data, allowing sensitive information to remain local while still executing tasks.
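The point about bringing computation to the data can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ColonyOS code: the `Site` class, its `run` method, and the site names are hypothetical. Each site executes a small function on its own records and returns only an aggregate, so the raw values never leave the site.

```python
class Site:
    """Hypothetical site that holds private records locally."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # raw data stays inside the site

    def run(self, func):
        # The function travels to the data; only its result is returned.
        return func(self._records)


def local_summary(records):
    """Reduce the local records to an aggregate that is safe to share."""
    return {"count": len(records), "total": sum(records)}


sites = [
    Site("hospital-a", [4.0, 6.0]),
    Site("hospital-b", [11.0]),
]

# Only the per-site summaries cross the network, never the records themselves.
summaries = [site.run(local_summary) for site in sites]

global_mean = (
    sum(s["total"] for s in summaries) / sum(s["count"] for s in summaries)
)
```

Whether an aggregate is actually safe to share depends on the data and the setting; the sketch only shows the direction of movement, with code going to the data rather than data going to the code.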

Hyperight: What challenges does ColonyOS tackle regarding cross-platform integration and resource management? How does it address the complexities of HPC environments?

Johan Kristiansson: ColonyOS addresses several key challenges related to cross-platform integration. One of the main challenges is the seamless integration of HPC, cloud, and edge systems, which often have distinct architectures, APIs, and management protocols. ColonyOS simplifies this by providing a unified interface, allowing users to interact with various platforms as if they were a single environment. This abstraction eliminates the need for users to grasp technical system details, simplifying cross-platform workflow management significantly.

Resource management is another critical challenge. ColonyOS addresses this by dynamically distributing resources based on the specific needs of each task, ensuring optimal performance. This allows users to run each task on the best platform (HPC for heavy computations, cloud for scalability, or edge for real-time processing), all without manual intervention.

ColonyOS also enables a clear separation of concerns, allowing system integrators to focus on platform integration. Meanwhile, users can concentrate on building applications and utilizing the system. This removes the barrier for users to learn the complexities of HPC or cloud platforms. It offers portability across platforms without requiring deep technical knowledge. By addressing these challenges, ColonyOS allows researchers and enterprises to easily leverage the full potential of HPC, cloud, and edge resources in a seamless and integrated way.

Hyperight: How does ColonyOS stand out compared to traditional operating systems and middleware solutions, particularly in terms of enhancing the usability of HPC resources?

Johan Kristiansson: ColonyOS bridges the gap between various computing environments by providing a meta-operating system. This system sits on top of existing platforms, operating systems, and middleware. The goal is not to replace existing technologies but rather integrate them into a unified continuum. This enables seamless process execution across HPC, cloud, and edge platforms.

Unlike traditional systems, which often require deep technical expertise, ColonyOS promotes usability through intuitive interfaces and tools that automate resource and data management, workload scheduling, and data orchestration. This eliminates the need for HPC-specific knowledge, making these powerful resources accessible to a broader range of researchers and enterprises. By simplifying HPC management, ColonyOS enhances productivity and lowers barriers to entry for utilizing advanced computational resources. Additionally, ColonyOS adds flexibility by allowing users to seamlessly transition between different systems. For example, users can start on a local machine, connect to an HPC system, and then transition to a cloud platform. They can even spread work across multiple cloud systems, for instance to reduce costs.

Hyperight: Looking ahead, how do you see the integration of HPC and AI evolving? What impact do you anticipate this will have on various industries?

Johan Kristiansson: As AI models become increasingly complex, they require vast amounts of computational power for training and inference, and HPC is well-positioned to provide that. HPC’s ability to process massive datasets in parallel could enable breakthroughs in areas such as healthcare, manufacturing, finance, and scientific research.

Looking ahead, I believe the compute continuum will evolve and become more decentralized. This shift will enable even more dynamic and flexible approaches to resource management. My long-term goal is to create a fully decentralized pervasive computing platform, where computational resources are ubiquitous, seamlessly integrated, and available on-demand across any platform or device. This vision seamlessly integrates AI into everything, enabling the continuous flow of data and computational tasks. It will create a world where computational power is always available, reshaping how we interact with the Internet.

Transform Your AI Vision into Reality and Value at NDSML Summit 2024!

Since 2016, the annual Nordic Data Science and Machine Learning (NDSML) Summit has been the premier event where AI innovation meets enterprise-scale implementation. Held on October 23rd and 24th, 2024, at Sergel Hub, Stockholm, this year’s summit brings a fresh focus on the latest trends in Data Science, Machine Learning, and Artificial Intelligence. Dive into three tracks: Strategy & Organization, ML and MLOps, and Infrastructure & Engineering, each designed to address the current and future challenges in AI deployment!

Explore topics such as Generative AI and Large Language Models, robust AI techniques, Computer Vision, Robotics, and Edge Computing. With 450+ Nordic delegates, 40+ leading speakers, interactive panels, and a hybrid format for both on-site and online attendees, this summit is a place to harness the full potential of AI!
