Evaluation Techniques for Large Language Models – Interview with Rajiv Shah, Hugging Face

In this interview, we sit down with Rajiv Shah, a machine learning engineer with ten years of experience in data science. Rajiv offers a glimpse into Hugging Face’s role in building the open-source AI community. As a former key figure at Hugging Face, he focused on helping enterprises solve complex problems with open-source AI.

Rajiv will also present on “Evaluation Techniques for Large Language Models” at the upcoming Data Innovation Summit 2024, where he aims to equip delegates with actionable ideas for building robust models. In this conversation, he covers Hugging Face’s journey, its expansion beyond natural language processing (NLP), and its vibrant open-source community. He also discusses the challenges of evaluating large language models (LLMs), practical applications of AI for enterprise teams, and the role of interpretability. The interview concludes with his view of upcoming AI trends.

Hyperight: Can you tell us more about yourself and your organization? What is your professional background and current working focus?

[Image: Rajiv Shah, speaker at the Data Innovation Summit 2024 in Stockholm]

Rajiv Shah: I have worked in data science for the past decade. I work at Hugging Face, where I help enterprises solve their challenging problems using open-source AI.

As a company, Hugging Face is devoted to building the open-source AI community.

Hyperight: During the Data Innovation Summit 2024, you will share more insights on “Evaluation Techniques for Large Language Models”, a highly relevant and important topic in the LLM revolution. What can the delegates at the event expect from your presentation?

Rajiv Shah: People should walk away with ideas for building better models and having more confidence in them. As AI grows in complexity, it becomes more important to ensure it actually solves the problem we care about. Too often I have seen people grab the latest technology, only for a mismatch between its capabilities and end users’ needs to hinder widespread usage. Evaluation is a crucial link that helps us build more useful models in less time.

During my talk, I want people to understand the role of evaluation. I’ll also cover some of the best techniques for working with generative AI and large language models (LLMs).

Hyperight: What distinguishes Hugging Face’s approach to the development and deployment of generative AI from other organizations in the AI space?

Rajiv Shah: Hugging Face dedicates itself to the open-source community, focusing not on a particular model or hardware stack but on building tools and infrastructure for sustained open-source growth.

Hyperight: What was the journey like for Hugging Face in adopting machine learning technologies, especially in natural language processing? What contributed to the company’s current standing in the field?

Rajiv Shah: Hugging Face was a pioneer in making the latest advances, like transformers, easy to use and widely available. For the last five years, it has been a crucial tool for data scientists and developers to harness the latest advancements in AI. While Hugging Face started in natural language processing, over time the company has expanded to many data modalities, including images and audio. Today, Hugging Face hosts over a million models, and millions of users visit the site regularly.

Hyperight: What resources and tools were essential for initiating and sustaining this journey?

Rajiv Shah: It’s a community! Hugging Face has thousands of people submitting code, writing tutorials, and educating people about open source. The greatest resource is the time and willingness of all sorts of people to contribute.

Hyperight: Can you elaborate on any challenges in evaluating the effectiveness and performance of large language models in enterprise settings, as well as limitations of existing methods, including their impact on optimal LLM selection?

Rajiv Shah: There is a robust market for LLMs, from proprietary APIs like OpenAI’s to open models like Meta’s Llama. Each of these approaches comes with its own tradeoffs. An API is quick to get started with, but you give up control and need to send data outside of your environment. With an open model, you need a team and resources that can deploy the model within your own environment. I often see major corporations incorporating a mix of these approaches.

With the growth of the field, thousands of LLMs are now available from providers like Hugging Face. The challenge lies in selecting the appropriate one for your needs. In my evaluation talk, I aim to offer some guidance to help people through this decision.
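Editor’s note: as a rough illustration of the model-selection problem described above (not a method taken from Rajiv’s talk), the sketch below scores several candidate models on a small task-specific test set and ranks them by exact-match accuracy. Everything in it is an assumption made for illustration: the example prompts, the metric, and the two stand-in “models”, which are plain Python functions. In practice each would wrap a call to an API or a locally deployed open model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled examples: (prompt, expected answer).
EXAMPLES: List[Tuple[str, str]] = [
    ("Classify the sentiment: 'Great product, works well.'", "positive"),
    ("Classify the sentiment: 'Arrived broken, very disappointed.'", "negative"),
    ("Classify the sentiment: 'Fast delivery and friendly support.'", "positive"),
]

# A "model" here is any callable that maps a prompt to a text answer.
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and trim so trivial formatting differences are not penalized."""
    return text.strip().lower()


def exact_match_accuracy(model: Model, examples: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the model's answer equals the expected answer."""
    if not examples:
        raise ValueError("examples must not be empty")
    hits = sum(normalize(model(prompt)) == normalize(expected)
               for prompt, expected in examples)
    return hits / len(examples)


def rank_models(models: Dict[str, Model],
                examples: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Score every candidate on the same examples and sort best-first."""
    scores = {name: exact_match_accuracy(fn, examples) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Stand-in candidates (assumptions): replace with real model calls.
def always_positive(prompt: str) -> str:
    return "Positive"


def keyword_model(prompt: str) -> str:
    lowered = prompt.lower()
    if "broken" in lowered or "disappointed" in lowered:
        return "negative"
    return "positive"


if __name__ == "__main__":
    candidates = {"always_positive": always_positive, "keyword": keyword_model}
    for name, score in rank_models(candidates, EXAMPLES):
        print(f"{name}: {score:.2f}")
```

Exact match is only one possible metric and suits short, closed-form answers such as classification labels. Open-ended generation usually needs other measures, but the overall shape (same examples, same metric, every candidate) stays the same.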

Hyperight: Can you provide insights into the practical applications of AI that you find most promising for enterprise teams, especially in the context of LLMs?

Rajiv Shah: The biggest use cases I see are customer support applications like chatbots, code generation tools like GitHub Copilot, and question-answering systems built on a retrieval-augmented generation (RAG) framework. Beyond that, people are often trying to substitute LLMs in traditional NLP use cases like text classification or text extraction.

Hyperight: From your perspective, how crucial are interpretability and explainability in the adoption of large language models, especially in industries emphasizing regulatory compliance?

Rajiv Shah: Interpretability and explainability are critical for regulatory compliance. The opacity of LLMs means that they will not meet regulatory requirements. However, LLMs will have a significant impact by creating data used in smaller regulated models. They will also serve as advisors in an increasing number of use cases.

Hyperight: According to you, what AI trends can we expect in the upcoming 12 months?

Rajiv Shah: Algorithms: Alternatives to traditional transformers will grow with improved performance, and enterprises will use some of those alternatives in production by the end of the year.

Generative AI Hype: We will see enterprises pull back some of their massive spending on Generative AI as they realize the ROI is not meeting their expectations. We are already seeing companies struggle to get their models into production.

Startups: We will see a shakeup among Generative AI startups as many fail to meet their projected revenue targets. We are already seeing startups close down or pivot.


