Meta’s FAIR team has unveiled five new research projects that mark a significant step forward in its pursuit of advanced machine intelligence (AMI).
The latest developments center on AI perception, the ability of machines to process and interpret sensory information, alongside advances in language modeling, robotics, and collaborative AI agents.
Meta’s ultimate goal is to create machines that can acquire, process, and interpret sensory information about the world, and use that understanding to make decisions with human-like intelligence and speed.
The five new projects represent distinct lines of work toward this ambitious goal.
1. **Perception Encoder:** Meta has introduced the Perception Encoder, a large-scale vision encoder designed to excel across a range of image and video tasks. Vision encoders serve as the “eyes” of AI systems, enabling them to understand visual data. Meta emphasizes how hard it is to build encoders that meet the demands of advanced AI: they must bridge vision and language, handle both images and videos effectively, and remain robust under challenging conditions (a minimal sketch of vision-language bridging appears after this list).
2. **Perception Language Model (PLM):** Complementing the encoder is the Perception Language Model, an open and reproducible vision-language model aimed at complex visual recognition tasks. The PLM was trained using large-scale synthetic data combined with open vision-language datasets, without distilling knowledge from external proprietary models.
3. **Meta Locate 3D:** Meta Locate 3D aims to bridge the gap between language commands and physical action, enabling robots to accurately localize objects in a 3D environment from open-vocabulary natural-language queries (a hypothetical interface sketch follows the list).
4. **Dynamic Byte Latent Transformer:** Meta is releasing the model weights for its 8-billion-parameter Dynamic Byte Latent Transformer, which operates directly on bytes rather than tokens, offering significant gains in inference efficiency and robustness over traditional tokenization-based language models (a short byte-level illustration follows the list).
5. **Collaborative Reasoner:** The Collaborative Reasoner project focuses on creating AI agents that can collaborate effectively with humans or with other AIs. Meta aims to equip AI with social skills such as communicating, empathizing, giving feedback, and modeling others’ mental states, in order to improve collaboration.
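To make the “eyes” metaphor above concrete, here is a minimal sketch of how a contrastive vision encoder bridges vision and language: the image and several candidate captions are embedded into a shared space, and similarity scores rank the captions. It uses the openly available CLIP model as a stand-in, since Meta’s announcement does not document the Perception Encoder at the API level; the model name, image URL, and captions below are illustrative, not Meta’s code.

```python
# Minimal sketch: scoring an image against free-form text with a
# contrastive vision encoder. CLIP stands in for the general idea;
# this is NOT Meta's Perception Encoder API.
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any image works; this COCO validation image is a common demo.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

captions = ["two cats on a couch", "a dog in the park", "a city skyline"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)

# Image and text land in a shared embedding space; scaled cosine
# similarity gives a score for each caption.
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{caption}: {p:.3f}")
```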
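Meta has not published a public API for Meta Locate 3D in this announcement, so the following is a purely hypothetical sketch of what an open-vocabulary 3D localization interface looks like in principle: a natural-language query goes in, candidate 3D boxes come out. Every name, type, and value here is invented for illustration.

```python
# Hypothetical interface sketch only; Meta has not published this API.
# Shows the shape of an open-vocabulary 3D localization call:
# natural-language query in, 3D bounding boxes out.
from dataclasses import dataclass

@dataclass
class Box3D:
    center: tuple[float, float, float]  # metres, scene coordinates
    size: tuple[float, float, float]    # width, height, depth
    score: float                        # model confidence

def locate(scene_pointcloud, query: str) -> list[Box3D]:
    """Return candidate 3D boxes for objects matching `query`.

    A real system would fuse the scene's point cloud with a language
    encoder; this placeholder just returns a fixed result."""
    return [Box3D(center=(1.2, 0.4, 0.9), size=(0.5, 0.6, 0.5), score=0.87)]

# A robot could then plan a grasp or navigation target from the box:
boxes = locate(scene_pointcloud=None, query="the red mug on the kitchen counter")
print(boxes[0].center)
```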
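The byte-level idea behind the Dynamic Byte Latent Transformer is easy to demonstrate: instead of a learned subword vocabulary, the model’s input alphabet is just the 256 possible byte values, so any string in any script maps into it with no out-of-vocabulary case. A minimal illustration in plain Python (no model involved; the example subword split in the comments is illustrative, as exact splits depend on the tokenizer):

```python
# A byte-level model reads raw UTF-8 bytes directly, so every possible
# string maps into a fixed 256-symbol vocabulary.
text = "naïve café"

byte_ids = list(text.encode("utf-8"))
print(byte_ids)       # [110, 97, 195, 175, ...] -- always in 0..255
print(len(byte_ids))  # 12 bytes for this 10-character string

# Byte sequences run longer than subword sequences, which is why the
# architecture groups bytes into dynamic latent "patches" internally,
# trading a tiny vocabulary for manageable sequence lengths.
# A subword tokenizer, by contrast, needs a learned vocabulary and can
# fragment rare or misspelled words into many pieces, e.g.:
#   "naïve" -> ["na", "ï", "ve"]   (tokenizer-dependent)
#   bytes   -> [110, 97, 195, 175, 118, 101]
```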
These projects underscore Meta’s commitment to fundamental AI research, particularly the building blocks for machines that can perceive, understand, and interact with the world in more human-like ways. Together, the advances in AI perception, language modeling, robotics, and collaborative AI agents pave the way for more capable and intelligent AI systems.