Summary:
1. Deep Cogito, an AI research startup in San Francisco founded by ex-Googlers, has released four new large language models that focus on improving reasoning abilities over time.
2. The models, ranging from 70 billion to 671 billion parameters, are available for use by AI developers and enterprises under different licensing terms.
3. The models, part of the Cogito v2 family, come in dense and mixture-of-experts (MoE) variants suited to different deployment needs.
Article:
Deep Cogito, a San Francisco startup founded by former Google employees, has unveiled four new large language models as part of its Cogito v2 lineup, with the aim of improving AI reasoning capabilities. The models range from 70 billion to 671 billion parameters and target the differing needs of AI developers and enterprises.
The Cogito v2 lineup includes dense models at 70B and 405B parameters and mixture-of-experts (MoE) models at 109B and 671B parameters, each suited to different purposes. The dense models activate all parameters on every forward pass, which makes them predictable and easy to deploy across a range of hardware setups. The MoE models instead use a sparse routing mechanism that activates only a few specialized subnetworks for a given input, allowing for larger total model sizes without a proportional increase in computational cost.
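To make the dense versus MoE distinction concrete, here is a minimal toy sketch in Python. It is not Deep Cogito's architecture: the scalar "experts", the fixed router weights, and the names `dense_forward` and `moe_forward` are all assumptions made for illustration. It only shows that a dense pass runs every subnetwork, while a sparse top-k router runs a chosen few.

```python
import math


def make_expert(scale):
    """Hypothetical stand-in for a subnetwork: multiplies its input by a constant."""
    def expert(x):
        return scale * x
    return expert


# Illustrative values only; real experts and routers are learned neural layers.
EXPERTS = [make_expert(s) for s in (0.5, 1.0, 1.5, 2.0)]
ROUTER_WEIGHTS = [0.1, -0.3, 0.8, 0.4]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_forward(x, experts):
    """Dense analogue: every expert runs on every input."""
    outputs = [expert(x) for expert in experts]
    return sum(outputs) / len(outputs), len(outputs)


def moe_forward(x, experts, router_weights, top_k=2):
    """Sparse routing: score all experts cheaply, then run only the top_k."""
    scores = [w * x for w in router_weights]
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([scores[i] for i in chosen])
    output = sum(g * experts[i](x) for g, i in zip(gates, chosen))
    return output, len(chosen)


if __name__ == "__main__":
    dense_out, dense_used = dense_forward(2.0, EXPERTS)
    moe_out, moe_used = moe_forward(2.0, EXPERTS, ROUTER_WEIGHTS, top_k=2)
    print(f"dense: output={dense_out:.3f}, experts run={dense_used}")
    print(f"moe:   output={moe_out:.3f}, experts run={moe_used}")
```

In this sketch the MoE path evaluates two of the four experts, so adding more experts would grow the total parameter count while the per-input compute stays tied to `top_k`.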
The models are now available for download on platforms like Hugging Face and Unsloth, as well as through APIs from Together AI, Baseten, and RunPod for those who cannot run inference on their own hardware. Additionally, a quantized version of the 671B model runs faster on more accessible hardware, at the cost of a slight reduction in accuracy on some tasks.
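The trade-off behind quantization can be illustrated with a small sketch. The article does not say which quantization scheme the 671B release uses, so the symmetric 8-bit integer scheme below is an assumption chosen for simplicity, and the function names are hypothetical. It shows how storing weights at lower precision introduces a small, bounded rounding error.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    largest = max(abs(v) for v in values)
    scale = largest / 127 if largest > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Made-up weights for illustration, not taken from any real model.
    weights = [0.8131, -0.2714, 0.0459, -1.2702, 0.6613]
    quantized, scale = quantize_int8(weights)
    restored = dequantize(quantized, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print("stored integers:", quantized)
    print(f"largest round-trip error: {worst:.6f} (bound: {scale / 2:.6f})")
```

Each restored weight differs from the original by at most half the scale factor, which is the kind of small precision loss that can translate into slightly lower accuracy on some tasks.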
What sets Deep Cogito’s models apart is their focus on self-improvement. Through a process called iterated distillation and amplification (IDA), the models are trained to internalize their own reasoning processes, learning which reasoning paths lead to better outcomes. This approach is intended to produce faster, more efficient reasoning and better overall performance.
Looking ahead, Deep Cogito plans to keep iterating on its models, with a focus on building AI that improves with each iteration. By releasing its models as open source, the company aims to foster innovation and collaboration within the AI community. With the support of backers and infrastructure partners, Deep Cogito presents the Cogito v2 models as a new way of building intelligent systems: not by working harder, but by learning to work smarter.