Summary:
- Nvidia introduces a new small language model, Nemotron-Nano-9B-V2, designed to fit on a single Nvidia A10 GPU.
- The model combines Transformer and Mamba architectures, offering a balance of accuracy and efficiency.
- Nemotron-Nano-9B-V2 achieves competitive accuracy on various benchmarks and is released under a permissive licensing agreement for commercial use.
Article:
Small language models are gaining popularity in the AI landscape, and Nvidia has unveiled its latest entry, Nemotron-Nano-9B-V2. Designed to run on a single Nvidia A10 GPU, the model was pared down from an original 12 billion parameters to 9 billion. A distinctive feature is the ability to toggle AI reasoning on and off, letting users control whether the model produces intermediate reasoning steps before its final answer.

Nemotron-Nano-9B-V2 stands out for its fusion of Transformer and Mamba architectures, offering a blend of efficiency and accuracy. By incorporating state space layers, the model can handle longer sequences of information without the memory and compute overhead typically associated with Transformer models. This hybrid approach yields higher throughput on long contexts with comparable accuracy.
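To illustrate how a reasoning toggle of this kind might be surfaced to developers, here is a minimal sketch in Python. The control tokens `/think` and `/no_think`, and the prompt layout, are assumptions for illustration, not Nvidia's documented interface; consult the official model card for the actual convention.

```python
def build_prompt(user_message: str, reasoning: bool) -> str:
    """Build a chat prompt with a hypothetical control token that
    switches the model's step-by-step reasoning on or off.

    NOTE: "/think" and "/no_think" are assumed placeholder tokens,
    not a confirmed part of the Nemotron-Nano-9B-V2 API.
    """
    control = "/think" if reasoning else "/no_think"
    return f"System: {control}\nUser: {user_message}\nAssistant:"

# With reasoning enabled, the model would emit intermediate steps
# before its final answer; with it disabled, it answers directly.
print(build_prompt("What is 17 * 24?", reasoning=True))
print(build_prompt("What is 17 * 24?", reasoning=False))
```

In a real deployment the prompt string would be passed to the model through an inference library such as Hugging Face transformers; only the control token in the system turn changes between the two modes.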
In terms of performance, Nemotron-Nano-9B-V2 shines on benchmarks like AIME25, MATH500, GPQA, and LiveCodeBench, showcasing competitive accuracy against other small-scale models. The model’s ability to handle instruction following and code generation tasks further enhances its versatility for various applications.
Moreover, Nemotron-Nano-9B-V2 is released under the Nvidia Open Model License Agreement, allowing for commercial use without the need for additional licensing negotiations or fees. Developers can freely create and distribute derivative models based on Nemotron-Nano-9B-V2, with Nvidia emphasizing responsible deployment and ethical considerations in line with Trustworthy AI guidelines.
Overall, Nvidia’s release of Nemotron-Nano-9B-V2 underscores the company’s commitment to providing developers with efficient and controllable language models. By leveraging hybrid architectures and innovative training techniques, Nvidia aims to empower developers with tools that balance accuracy, cost-efficiency, and latency in their AI projects.