Microsoft has unveiled a new set of compact, open-source AI models that it claims outperform Google’s Gemini 1.5 Flash, Meta’s Llama 3.1, and even OpenAI’s GPT-4o in certain respects.
The new models are Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct (MoE stands for Mixture of Experts), and Phi-3.5-vision-instruct. They are the latest additions to Microsoft’s Phi-3 series of small language models (SLMs), following the debut of the original Phi-3-mini in April this year.
What are the New Phi-3.5 Models?
The latest Phi-3.5 models include Phi-3.5-mini-instruct with 3.82 billion parameters, Phi-3.5-MoE-instruct with 41.9 billion parameters (of which only 6.6 billion are active at a time), and Phi-3.5-vision-instruct with 4.15 billion parameters.
These parameter counts indicate each model’s scale and its capacity for learning across tasks. All three models support a 128k-token context window, which lets them process and generate very long inputs such as lengthy documents and large codebases; the vision model additionally handles images and multi-frame video. Remarkably, they deliver near state-of-the-art results on numerous third-party benchmarks, often outperforming competitors like Gemini 1.5 Flash, Llama 3.1, and GPT-4o.
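To put a 128k-token window in perspective, a rough rule of thumb for English text is about four characters per token (an approximation, not a property of these specific models). A quick sketch of checking whether a prompt fits, with headroom reserved for the model’s reply:

```python
def rough_token_count(text: str) -> int:
    # Rule-of-thumb estimate: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_context(text: str, context_window: int = 128_000,
                 reserved_for_output: int = 4_000) -> bool:
    """Check whether a prompt fits, leaving room for the model's reply."""
    return rough_token_count(text) + reserved_for_output <= context_window
```

By this estimate, a 128k-token window comfortably holds on the order of half a million characters of input, roughly a full-length book.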
According to Microsoft, Phi-3.5 Mini was trained on 3.4 trillion tokens over ten days, while the Phi-3.5 MoE model was trained on 4.9 trillion tokens over 23 days. The Phi-3.5 Vision model took six days and 500 billion tokens to train, the company said. The training datasets for the new models comprised high-quality, reasoning-dense, publicly available data.
What are the Capabilities?
The Phi-3.5 Mini excels at straightforward and rapid reasoning tasks, making it ideal for coding and tackling mathematical or logical problems. The Phi-3.5 MoE model, which integrates several specialized models, is adept at handling intricate AI tasks across various languages. The Phi-3.5 Vision model, meanwhile, is designed for both text and image processing. This multimodal capability enables it to perform tasks such as summarizing videos and analyzing charts and tables.
1. Phi-3.5 Mini
The Phi-3.5 Mini Instruct is a compact AI model with 3.8 billion parameters. It’s designed for efficient instruction processing and supports a 128k token context length. It excels in environments with limited memory or computational resources. This makes it ideal for code generation, solving math problems, and logical reasoning.
Despite its smaller size, this model achieves impressive results in multilingual and multi-turn conversations, showcasing notable advancements over its predecessors. It performs exceptionally well on various benchmarks, often surpassing other models of similar size, like Llama-3.1-8B-instruct and Mistral-7B-instruct, particularly in the RepoQA benchmark, which evaluates long-context code understanding.
2. Phi-3.5 MoE
The Phi-3.5 MoE (Mixture of Experts) model marks a novel approach from the company, integrating multiple specialized models into a single framework. With an architecture supporting 42 billion parameters and a 128k token context length, it offers scalable performance for complex tasks.
However, it actively utilizes only 6.6 billion parameters. This model is tailored for advanced reasoning tasks, including code generation, mathematical problem solving, and multilingual comprehension. It frequently outperforms larger models in targeted benchmarks, such as RepoQA, demonstrating its efficiency in specific areas.
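The gap between 42 billion total and 6.6 billion active parameters comes from sparse routing: for each token, a small gating network selects a few experts, and only those experts run. A minimal sketch of top-k expert routing with toy dimensions (the sizes and names below are illustrative, not Microsoft’s actual architecture):

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Toy Mixture-of-Experts layer: route an input to its top-k experts.

    Only the selected experts run, so active parameters per token are
    roughly top_k / num_experts of the expert total.
    """
    scores = x @ gate_weights                    # routing logits, one per expert
    top = np.argsort(scores)[-top_k:]            # indices of the top-k experts
    exp_scores = np.exp(scores[top] - scores[top].max())
    weights = exp_scores / exp_scores.sum()      # softmax over the selected experts
    return sum(w * (x @ expert_weights[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
num_experts, d = 16, 8
experts = rng.standard_normal((num_experts, d, d))  # one weight matrix per expert
gate = rng.standard_normal((d, num_experts))        # gating network
y = moe_forward(rng.standard_normal(d), experts, gate)
```

With 16 experts and top-2 routing, only 2/16 of the expert weights participate in any single forward pass, which is the same principle behind the MoE model’s small active-parameter footprint.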
3. Phi-3.5 Vision
The Phi-3.5 Vision Instruct rounds out the Phi-3.5 series with its ability to process both text and images. This multimodal model excels at tasks such as image interpretation, optical character recognition, chart and table understanding, and video summarization. It shares the 128k-token context length of its Phi-3.5 counterparts, allowing it to handle intricate, multi-frame visual tasks efficiently. Microsoft trained the Vision Instruct model on a mix of synthetic and carefully curated publicly available datasets, emphasizing high-quality, reasoning-dense data.
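Multi-frame tasks like video summarization work by interleaving several images into one prompt. The published Phi-3.5-vision checkpoints mark image positions with placeholder tokens such as `<|image_1|>` (treat the exact token format as an assumption drawn from the model card, and the helper below as an illustrative sketch):

```python
def build_vision_prompt(question: str, num_images: int) -> str:
    """Build a multi-image prompt with one placeholder per frame.

    Placeholder tokens like <|image_1|> mark where each frame is inserted;
    the actual image tensors are passed to the model separately.
    """
    placeholders = "\n".join(f"<|image_{i}|>" for i in range(1, num_images + 1))
    return f"{placeholders}\n{question}"
```

For example, summarizing a short clip might pass a handful of sampled frames plus a question such as “Summarize what happens across these frames.”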
To Wrap Up
Microsoft’s Phi-3.5 series sets a new standard in AI with its innovative, compact models that blend powerful reasoning, multilingual processing, and advanced multimodal capabilities.
With impressive benchmark results, these models aren’t just competing with the best; they’re redefining what small language models can achieve. Whether it’s tackling code, mastering complex languages, or analyzing visuals, the Phi-3.5 series is pushing boundaries and setting the stage for the next wave of AI innovation.
For the newest insights in the world of data and AI, subscribe to Hyperight Premium. Stay ahead of the curve with exclusive content that will deepen your understanding of the evolving data landscape.