French AI startup Pleias gained attention last year with the launch of its Pleias 1.0 family of small language models, which were trained entirely on open data. Now, Pleias has announced the release of two open source reasoning models designed for retrieval-augmented generation (RAG), citation synthesis, and structured multilingual output.
The newly launched models, Pleias-RAG-350M and Pleias-RAG-1B, are based on Pleias 1.0 and are available in CPU-optimized GGUF format. They are aimed at enterprises, developers, and researchers looking for cost-effective alternatives to large-scale language models without compromising traceability, multilingual capabilities, or structured reasoning workflows. The models are available under a permissive Apache 2.0 open source license, allowing organizations to modify and deploy them for commercial use cases.
RAG is a widely used technique that connects AI models to external knowledge bases, such as enterprise documents, to improve their performance in tasks like chatbot development. The Pleias-RAG models aim to bridge the gap between accuracy and efficiency in small language models by building source grounding and citations directly into the model's inference process.
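The RAG loop described above can be sketched in a few lines. The corpus, the keyword-overlap retriever, and the answer step below are illustrative placeholders, not Pleias's actual implementation or API:

```python
import re

# Minimal RAG sketch: retrieve the most relevant passage, then ground
# the answer in it with an explicit citation. A real system would pass
# the query plus the retrieved passage to the language model.
CORPUS = {
    "doc1": "The refund policy allows returns within 30 days of purchase.",
    "doc2": "Support is available Monday through Friday, 9am to 5pm.",
}

def tokenize(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str) -> tuple[str, str]:
    """Rank documents by naive keyword overlap and return the best one."""
    words = tokenize(query)
    return max(CORPUS.items(), key=lambda item: len(words & tokenize(item[1])))

def answer(query: str) -> str:
    """Build a grounded answer that cites its source document."""
    doc_id, passage = retrieve(query)
    return f"{passage} [source: {doc_id}]"

print(answer("What is the refund policy?"))
```

In a production RAG system, the keyword retriever would typically be replaced by embedding-based similarity search over a vector store; the citation-in-answer pattern is the part the Pleias models bake into inference itself.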
The models are described as “proto-agentic,” meaning they can autonomously assess queries, determine their complexity, and decide how to respond based on source adequacy. Despite their relatively small size, the models exhibit behavior traditionally associated with larger systems, thanks to a specialized mid-training pipeline that blends data generation with reasoning prompts.
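The "proto-agentic" triage behavior can be illustrated with a toy routing function. The thresholds and the coverage heuristic below are invented for illustration and are not the models' actual decision logic:

```python
# Toy illustration of query triage: judge whether the available sources
# are adequate before committing to an answer. Heuristics are invented
# for illustration only, not Pleias's actual logic.

def triage(query: str, sources: list[str]) -> str:
    """Decide how to respond based on source adequacy."""
    query_words = set(query.lower().split())
    # Score each source by how many query words it covers.
    coverage = max(
        (len(query_words & set(s.lower().split())) for s in sources),
        default=0,
    )
    if coverage == 0:
        return "refuse"        # no grounding available: decline to answer
    if coverage < 2:
        return "reformulate"   # weak grounding: clarify or re-query
    return "answer"            # adequate grounding: produce a cited answer

print(triage("refund policy details", ["The refund policy allows returns."]))
```

The point of the sketch is the branching itself: a small model that can refuse or re-query when its sources are inadequate behaves, in a limited way, like a larger agentic system.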
In benchmark evaluations, Pleias-RAG-350M and Pleias-RAG-1B outperform most models under 4 billion parameters on tasks such as HotPotQA and MuSiQue. They also show competitive performance across languages, with negligible degradation on non-English queries. The models can detect the language of a query and respond in the same language, making them suitable for global deployments.
Overall, the Pleias-RAG models offer a compelling alternative for organizations looking to enhance their AI applications with cost-effective, efficient, and multilingual small language models. With their focus on grounding and citations, as well as their competitive performance across tasks and languages, these models are positioned to make a notable impact in the AI industry.
Open Access and Licensing:
As detailed in a technical document by Doria and the Pleias-RAG Library, the Pleias-RAG models were trained on Common Corpus to create the RAG training set, with Google's Gemma used to generate synthetic reasoning traces because its license permits this. Both models are available under the Apache 2.0 license, enabling commercial reuse and seamless integration into larger systems.
Pleias highlights the adaptability of these models for incorporation into search-enhanced assistants, educational platforms, and customer support systems. Additionally, the company offers an API library to simplify the formatting of structured input and output for developers.
The release of these models marks a strategic move by Pleias to position small LLMs as tools for structured reasoning, rather than generic conversational bots. Through the use of an external memory architecture and systematic citation methods, the Pleias-RAG series provides a transparent and auditable alternative to more opaque cutting-edge models.
Future Outlook:
Looking ahead, Pleias is focused on enhancing the models by improving context handling, integrating search functions more seamlessly, and tuning their personas for a more consistent identity. The company is also exploring reinforcement learning, particularly for citation accuracy, since quote verification can be measured algorithmically.
Collaborations with partners such as the Wikimedia Foundation are ongoing to enable targeted search integrations using reputable sources. Ultimately, Pleias envisions a shift away from RAG-specific implementations, models, and workflows as more advanced AI systems incorporate retrieval and agentic tool use natively. According to Doria, the goal is to integrate search and source-processing capabilities directly into the model itself, potentially rendering RAG obsolete as agentic models learn to direct their own workflows.
With the introduction of Pleias-RAG-350M and 1B, the company is confident that small models, when combined with robust reasoning frameworks and verifiable outputs, can rival larger counterparts, especially in multilingual and resource-constrained environments.