Autonomous AI Navigation Systems – Interview with Aishwarya Jadhav, Waymo

In this interview, we speak with Aishwarya Jadhav from Waymo! Aishwarya is making significant strides in the field of autonomous navigation systems. With a background in computer vision and AI, she previously contributed to the Full Self-Driving system at Tesla and worked on foundational AI for the Optimus robot.

At the NDSML Summit 2024, she shares insights on autonomous systems, focusing on the AI Guide Dog project, a navigation assistance system for the visually impaired. Join us as we delve into her journey, the challenges faced, and the future of real-time AI navigation assistance!

Hyperight: Can you tell us more about yourself and your organization? What is your professional background and current working focus?

Aishwarya Jadhav, speaker at NDSML Summit 2024

Hi! I’m Aishwarya Jadhav, and I work as a Machine Learning Software Engineer at Waymo. I’m part of the Perception team, focusing on Computer Vision problems for our autonomous robo-taxis. Before this, I was at Tesla Autopilot, where I contributed to their Full Self-Driving system. I also briefly worked on foundational AI for the Optimus robot. I’ve been in the autonomous driving industry for about 2.5 years. My work has primarily focused on researching and developing AI models that enhance an autonomous agent’s ability to perceive, reason, and understand the world around it.

Before my work in autonomous systems, I interned as a Machine Learning Engineer at Google. I also spent three years as a technical lead at Morgan Stanley, where I built large-scale financial recommendation engines.

Disclaimer: The views represented in this interview are my own and do not reflect those of any company I am or have been associated with. The autonomous systems I discuss in this interview and my talk do not represent proprietary technology; they describe general systems from the literature and my prior research.

Hyperight: During the NDSML Summit 2024, you will share more on Autonomous AI Navigation Systems. What can the delegates at the event expect from your presentation?

When we think about autonomous navigation systems, the image that often comes to mind is a self-driving vehicle (SDV) moving without a driver. However, autonomous systems (AS) go far beyond SDVs. They encompass everything from robots operating on industrial floors and in restaurants to AI assistance systems on small devices like mobile phones.

Any system capable of perceiving, understanding, and making navigation decisions falls under the domain of AS. With steady advancements over the past two decades and rapid developments in recent years, this technology is poised to reshape how we engage with the world around us.

In my talk, I’ll explore the diverse applications of AS. I will outline a general roadmap for approaching the problem and a framework that highlights the essential components of these systems. I’ll discuss key differences in environments and expectations for various types of autonomous navigation systems. I will also address challenges like real-time, on-device decision-making, environmental variability, and safety. All of these make this field both complex and exciting!

AI is at the core of AS, which makes this a fascinating research domain bringing together Computer Vision, Behavior Prediction, and Motion Planning. I’ll briefly touch on these aspects and dive deeper into a motivating use case: the AI Guide Dog, a real-time navigation assistance system for the visually impaired. I’ll walk through the system’s complexities, how we leverage AI to address them, and the computer vision and behavior prediction models at play. Additionally, I’ll discuss the challenges of deploying such systems on-device and draw parallels between this assistance system and its more complex SDV counterparts.

Ultimately, while AS are complex technological systems, their core goal is safety and efficiency. SDVs aim to outperform human drivers in safety. Meanwhile, assistance systems strive to enable safe navigation for the visually impaired. Autonomous robotic systems aim to minimize human-error-related accidents while boosting efficiency in industrial settings. I’ll conclude by discussing how these systems approach safety and build user trust.

Hyperight: Can you give us a brief overview of the AI Guide Dog research project, and what inspired its development?

Sure! The AI Guide Dog project began as a research initiative at Carnegie Mellon University about four years ago, led by Aniruddh Koul, a distinguished CMU alum. Aniruddh had been deeply involved with SeeingAI at Microsoft, an app that functions as a chatbot assistant for blind users. The overwhelmingly positive feedback from the blind community inspired the idea of AI Guide Dog: a system that could go beyond describing surroundings, one that would not only “see” the environment but also understand it and make real-time navigation decisions to assist users. The vision was to create something akin to a traditional guide dog but powered by AI and accessible through a mobile phone! In 2020, three graduate students at CMU, mentored by Aniruddh, began laying the groundwork for data collection and building the infrastructure to bring this idea to life.

When I joined CMU in 2021, I was impressed by the impact and potential of such assistance systems. It was incredible to see how AI could tangibly transform the lives of real people! The technical challenges of this problem made it even more exciting. Joining the AI Guide Dog research group actually marked the beginning of my journey into the autonomous navigation industry. We continued to refine the project through 2021 and 2022, iterating on prototypes and experimenting with different models. In 2023, Aniruddh nominated me to step into his role as the project mentor, and I’ve been guiding the research ever since, with new student collaborators joining each year.

I take pride in watching this project evolve over the years and helping uncover promising new research directions to refine the assistance system. We’re steadily moving closer to a mobile app-based assistant that can guide users in real time!

Hyperight: What are the main challenges faced when integrating computer vision with real-time AI for autonomous navigation?

Autonomous navigation is a complex problem with multiple moving components, particularly from the perspectives of Computer Vision, Behavior Prediction, and Motion Planning. Each area may involve one or several machine learning-based or algorithmic components, and these components must work in tandem to generate navigation trajectories by processing various environmental signals (video and sensor data). One of the primary challenges all autonomous systems face is the trade-off between latency and performance. These models must run in real time on onboard hardware, which means they need to be performant on the CPUs and GPUs available in cars, robots, or mobile devices. To put things into perspective, the most common AI models of the past two years, LLMs, typically run on expensive GPU cluster servers with few constraints on memory and storage. In contrast, the hardware and resources available for AS models are orders of magnitude more restrictive. This significantly impacts the design and scope of the models we work with in the AS industry.
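To make the latency-versus-performance trade-off concrete, here is a rough benchmarking sketch; the candidate models, input size, and 50 ms budget are illustrative assumptions, not the configuration of any system discussed here. It simply times CPU inference for a few off-the-shelf backbones and flags those that miss a hypothetical real-time budget.

```python
# Illustrative latency benchmark: candidate backbones, input size, and the
# 50 ms budget are assumptions for the sketch, not real project settings.
import time
import torch
import torchvision.models as models

LATENCY_BUDGET_MS = 50.0  # hypothetical per-frame budget for on-device use

candidates = {
    "mobilenet_v3_small": models.mobilenet_v3_small(weights=None),
    "resnet18": models.resnet18(weights=None),
}

frame = torch.rand(1, 3, 224, 224)  # stand-in for one camera frame

for name, model in candidates.items():
    model.eval()
    with torch.no_grad():
        for _ in range(5):          # warm-up runs
            model(frame)
        start = time.perf_counter()
        runs = 30
        for _ in range(runs):
            model(frame)
        ms = (time.perf_counter() - start) / runs * 1000
    verdict = "within budget" if ms <= LATENCY_BUDGET_MS else "over budget"
    print(f"{name}: {ms:.1f} ms/frame ({verdict})")
```

In practice, such measurements would be taken on the target device itself, since server-class CPU timings rarely transfer directly to phones or embedded hardware.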

Additionally, each platform presents its own challenges related to hardware availability and feasibility. For instance, a self-driving vehicle (SDV) program can develop and deploy custom hardware to run its models, whereas a mobile-device-based navigation assistance system is limited to the specific GPUs available on that phone. However, it’s important to note that comparing these systems is not entirely fair: the complexity of the AS for SDVs is significantly greater than that of mobile assistance systems. Other challenges we encountered while working on AI Guide Dog were related to keeping our models lightweight, so they could run smoothly alongside other processes on users’ mobile devices without draining the battery or overheating the phone.

This impacted aspects of the assistance system such as the frame rate at which we sample video captured by the user’s camera, the minimal set of sensor data to monitor and use as model inputs, and our choice of model. We also had to maintain user privacy, ensuring that camera-captured information and location data remain on the user’s device and are never transmitted to the cloud. Our goal is to provide the safest possible experience while minimizing interference with users’ daily lives, ultimately helping build trust.
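As an illustration of the frame-rate point, here is a minimal sketch of sampling a camera stream at a reduced rate so that an on-device model only sees a few frames per second; the 4 fps target and the OpenCV-based capture are assumptions for illustration, not details of the actual app.

```python
# Minimal frame-sampling sketch (not the project's actual code): reduce a
# ~30 fps camera stream to a hypothetical 4 fps budget for the local model.
import cv2

TARGET_FPS = 4  # assumed budget chosen to fit on-device latency limits

def sample_frames(video_source=0, target_fps=TARGET_FPS):
    cap = cv2.VideoCapture(video_source)          # camera or local video file
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if fps is unknown
    step = max(1, int(round(native_fps / target_fps)))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # The frame stays on the device and is handed to the local model only.
            yield frame
        idx += 1
    cap.release()
```

Keeping the sampled frames on the device, and feeding them only to a local model, is also what preserves the privacy property described above.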

Hyperight: How do you and your team approach the trade-offs between explainability, accuracy, and real-time processing in your models?

That’s a great question! While we discussed the trade-offs between accuracy (performance) and real-time processing (latency), explainability is another crucial dimension, especially in the AS industry. When users trust technology to safely take them from point A to point B, it becomes essential to provide them with reasons to maintain that trust. For the AI Guide Dog system, an explainability feature might involve audio cues that explain the surroundings and the rationale behind specific instructions, such as a turn directive. This represents the ideal scenario for explainability, but even without live feedback, our models should at least provide insights into their decision-making processes during development. Explainability in machine learning is a broad topic.

I won’t delve into the details, but some AI models lend themselves well to explainability through visualizing output predictions as a function of the inputs, while others do not. This dimension also affects system design decisions and can impact overall performance. During my presentation, I will showcase visualizations from the models used in AI Guide Dog. These visualizations will illustrate the model’s decision-making process, including what it sees, the factors it considers important, and how much relevance it places on those factors.
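As a concrete example of visualizing output predictions as a function of the inputs, below is a minimal input-gradient saliency sketch in PyTorch; the backbone and random input are placeholders, and this is one generic technique rather than the specific visualizations used in AI Guide Dog.

```python
# A minimal saliency-map sketch, assuming a PyTorch image classifier; the
# model and input are placeholders, not AI Guide Dog's actual models. The
# idea: show which input pixels most influence the predicted output.
import torch
import torchvision.models as models

model = models.mobilenet_v2(weights=None)  # placeholder lightweight backbone
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in camera frame

logits = model(image)
predicted = logits.argmax(dim=1)
# Gradient of the winning class score with respect to the input pixels.
logits[0, predicted.item()].backward()

# Per-pixel importance: max absolute gradient across color channels.
saliency = image.grad.abs().max(dim=1).values.squeeze(0)  # shape (224, 224)
# `saliency` can be overlaid on the frame to show what the model attends to.
```

Overlaying such a saliency map on the original frame gives a quick, if coarse, view of which regions drive a given prediction.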

Furthermore, black-box models, while scalable and performant, are challenging to debug and maintain. Achieving minimal latency, satisfactory explainability, and high accuracy is a fine balance and a difficult goal to reach. This involves extensive benchmarking of various model combinations and implementing compilation techniques for on-device deployment. Additionally, it requires applying latency reduction strategies and defining strict error tolerance criteria for output accuracy.
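One common example of the kind of latency- and footprint-reduction step mentioned above is post-training quantization. The sketch below applies PyTorch’s dynamic quantization to a placeholder backbone; whether this particular technique suits a given deployment target is an assumption to be validated against the error tolerance criteria described here.

```python
# A minimal dynamic-quantization sketch; the backbone is a placeholder, not
# an AI Guide Dog model. Linear layers get int8 weights, which shrinks the
# model and can reduce CPU inference latency.
import torch
import torchvision.models as models

model = models.mobilenet_v3_small(weights=None).eval()

quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Same interface as the original model, smaller footprint.
frame = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    logits = quantized(frame)
# Accuracy on a held-out set should be re-checked after quantization to
# confirm it still meets the strict error tolerance criteria.
```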

Hyperight: What role does deep learning play in the AI Guide Dog project? How does it enhance the system’s performance?

Deep learning models are central to the AI Guide Dog system. They enable us to interpret the user’s surroundings by processing video input streamed from the mobile camera. We utilize computer vision techniques to analyze this video stream and predict the optimal navigation path for the user. Additionally, deep learning models help us predict the behavior of other entities in the environment, such as pedestrians. By forecasting the potential paths of these dynamic entities, the system can generate trajectories that minimize the risk of collisions. Our research paper details the various deep learning models we experimented with to achieve the best performance efficiency while adhering to our latency constraints.
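To give a sense of the kind of computer vision component described here, below is a toy PyTorch sketch that maps a single camera frame to a discrete navigation command. The architecture and the three-way left/straight/right output are assumptions for illustration; the models detailed in the paper, and the behavior prediction for pedestrians, are considerably more involved.

```python
# Toy sketch of a frame-to-command model; architecture and command set are
# assumptions for illustration only, not the system described in the paper.
import torch
import torch.nn as nn

class DirectionNet(nn.Module):
    def __init__(self, num_commands: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_commands)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        x = self.features(frame).flatten(1)
        return self.head(x)  # logits over {left, straight, right}

model = DirectionNet().eval()
frame = torch.rand(1, 3, 224, 224)           # stand-in for a sampled frame
command = model(frame).argmax(dim=1).item()  # 0=left, 1=straight, 2=right
```

A real system would typically fuse several recent frames and sensor signals, and combine the resulting command with predicted trajectories of nearby pedestrians before issuing guidance.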

Hyperight: How do you envision the future of real-time AI navigation assistance evolving in the coming years?

Navigation assistance systems for the blind have been around for nearly two decades, with new papers and devices emerging every few years. However, research in this area has been sparse, and as a result, few concepts have successfully translated into practical devices that the blind community can easily adopt. This is primarily because prior work has tended to focus on separate, often bulky wearable devices that guide users through motor impulses. The associated costs and learning curve for such devices are significant barriers to widespread adoption. Furthermore, these systems were not primarily driven by AI, offering limited scope and constrained environments for operation.

Recent advancements in on-device real-time AI systems in other fields have opened up new opportunities and flexibility for developing navigation assistance systems. It is no longer out of reach to create assistants that operate on smartphones, which allows for minimal costs, no cumbersome gear, and quicker ramp-up times, enabling faster and broader adoption. The only bottleneck now is the time and resources that researchers and the industry can dedicate to such niche initiatives. Other areas of autonomous systems, such as self-driving vehicles (SDVs) and robots, continue to dominate industry focus. However, with increased attention to this cause and engagement from the right stakeholders, fully autonomous assistance systems could soon become a reality.

To sum up, I believe this industry and its use cases hold significant transformative potential, now more than ever. With the right people and investments, it could serve as a quintessential example of AI for social betterment.

Catch Aishwarya’s insightful presentation on autonomous AI navigation systems at this year’s NDSML Summit. Don’t miss this opportunity to explore the future of real-time AI navigation assistance!
