Summary:
- Liquid AI has released LFM2-VL, a new generation of vision-language models designed for efficient deployment on various hardware.
- The models offer low-latency performance, strong accuracy, and flexibility for real-world applications.
- Liquid AI aims to make high-performance multimodal AI more accessible for on-device and resource-limited deployments with LFM2-VL.
Liquid AI has unveiled LFM2-VL, a new series of vision-language models built for efficient deployment across a wide range of devices, from smartphones to embedded systems. The models are designed to deliver low-latency performance, strong accuracy, and flexibility for real-world applications. Building on the company's earlier LFM2 architecture, Liquid AI bills the LFM2-VL models as the fastest on-device foundation models on the market, crediting a linear input-varying system that supports both text and image inputs at variable resolutions.
The release includes two variants: LFM2-VL-450M, a hyper-efficient model with fewer parameters for resource-constrained environments, and LFM2-VL-1.6B, a more capable model suited to single-GPU and on-device deployment. Both variants process images at native resolution up to 512×512 pixels, preserving image quality without upscaling distortion. For larger images, the models apply non-overlapping patching, splitting the input into tiles so they can capture both fine details and the broader scene, as sketched below.
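To make the patching behavior concrete, here is a minimal Python sketch of non-overlapping tiling into 512×512 crops. The function name and the handling of edge tiles are illustrative assumptions, not Liquid AI's published implementation, and details such as any thumbnail or global-context tokens are omitted.

```python
from typing import List, Tuple

def tile_image(width: int, height: int, tile: int = 512) -> List[Tuple[int, int, int, int]]:
    """Split an image into non-overlapping crops of at most tile x tile pixels.

    Returns (left, top, right, bottom) boxes. Images at or below the tile
    size are kept at native resolution as a single crop, mirroring the
    behavior described above. (Illustrative sketch, not Liquid AI's code.)
    """
    if width <= tile and height <= tile:
        return [(0, 0, width, height)]  # fits natively; no patching needed
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Edge tiles are clipped to the image bounds rather than padded.
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes

# A 1280x768 image yields six non-overlapping patches:
print(tile_image(1280, 768))
```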
Liquid AI, founded by former researchers from MIT’s CSAIL, is known for its Liquid Foundation Models (LFMs), which draw on principles from dynamical systems and signal processing. These models handle a range of data types, including text, video, audio, and time series, while using fewer computational resources and adapting in real time during inference. The same approach underpins the Liquid Edge AI Platform (LEAP), a cross-platform SDK that simplifies running small language models on mobile and embedded devices.
LFM2-VL’s modular design combines a language-model backbone, a vision encoder, and a multimodal projector. The models perform strongly in benchmark evaluations such as RealWorldQA, InfoVQA, and OCRBench, and in inference testing LFM2-VL posted the fastest GPU processing times in its class, making it a compelling choice for applications that demand both speed and accuracy.
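For readers who want a mental model of how those three components fit together, the following PyTorch-style sketch shows one plausible composition. The class name, stand-in modules, and dimensions are hypothetical; they illustrate the backbone/encoder/projector pattern rather than Liquid AI's actual code.

```python
import torch
import torch.nn as nn

class VisionLanguageModel(nn.Module):
    """Illustrative backbone + encoder + projector composition (not LFM2-VL's real code)."""

    def __init__(self, vision_encoder: nn.Module, projector: nn.Module,
                 language_model: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # image pixels -> visual features
        self.projector = projector            # visual features -> language embedding space
        self.language_model = language_model  # autoregressive text backbone

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        image_tokens = self.projector(self.vision_encoder(pixel_values))
        # Prepend projected image tokens to the text embeddings; real systems
        # may instead interleave them at image-placeholder positions.
        return self.language_model(torch.cat([image_tokens, text_embeds], dim=1))

# Toy instantiation with linear stand-ins, just to show shapes flowing through:
model = VisionLanguageModel(nn.Linear(768, 768), nn.Linear(768, 1024), nn.Linear(1024, 1024))
out = model(torch.randn(1, 16, 768), torch.randn(1, 8, 1024))  # -> (1, 24, 1024)
```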
Liquid AI’s stated goal of making high-performance multimodal AI accessible for on-device and resource-limited deployments reflects the company’s broader push toward sustainable and inclusive AI. The LFM2-VL models are now available on Hugging Face, accompanied by example fine-tuning code in Colab, and are released under a custom LFM1.0 license that permits commercial use under specific conditions.
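As a starting point, here is a minimal sketch of loading one of the checkpoints with the Hugging Face transformers library. It assumes the models plug into the standard image-text-to-text and chat-template interfaces; the repository id and message format follow common conventions and should be verified against the official model card before use.

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

# Repository id assumed from the release naming; verify on the model card.
model_id = "LiquidAI/LFM2-VL-450M"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

# The exact message schema (e.g., the "image" key) may differ per model card.
conversation = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("example.jpg")},
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```

For fine-tuning, the Colab notebook linked from the model pages is the authoritative reference.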