The blog discusses AI21 Labs’ new small model, Jamba Reasoning 3B, aimed at enterprises that want to reduce data-center load by running inference on edge devices. The model combines the Mamba architecture with Transformers to achieve higher speeds and lower memory requirements. It has been demonstrated on a MacBook and performs well on tasks such as function calling and policy-grounded generation.
Enterprises have shown interest in small models such as Jamba Reasoning 3B, Meta’s MobileLLM-R1, Google’s Gemma, and FICO’s Focused Language and Focused Sequence models. These models address specific industry needs and can run on compute-constrained devices. AI21 Labs’ model stands out for its compact size and efficient performance on reasoning tasks without sacrificing speed.
Jamba Reasoning 3B outperformed comparable small models in benchmark testing, showing strong results on tasks such as IFBench and Humanity’s Last Exam. The model is highly steerable, and because inference can stay local on the device, it offers enterprises enhanced privacy options. This shift toward small, efficient models aligns with the industry’s evolving demands for optimized customer experiences and data privacy.