The introduction of the transformer architecture in 2017 revolutionized artificial intelligence, with attention becoming the core component of modern AI models. However, attention's compute cost grows quadratically with context length, and its key-value cache grows with every token processed, creating challenges for both research and industry as models and contexts scale.
Recently, Manifest AI introduced a groundbreaking alternative to traditional transformers with their Brumby-14B-Base model. This model abandons attention in favor of a novel mechanism called Power Retention, which maintains a fixed-size recurrent state as it processes long contexts, avoiding the ever-growing key-value cache and quadratic compute that attention requires.
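To make the contrast concrete, here is a minimal conceptual sketch of the general family of fixed-state recurrences that retention-style mechanisms belong to. It is an illustration of the idea only, not Manifest AI's actual Power Retention formulation or kernels: the function name, the plain outer-product update, and the dimensions are assumptions chosen for clarity.

```python
import numpy as np

def fixed_state_retention(queries, keys, values):
    """Conceptual sketch of a fixed-size recurrent state update.

    Instead of caching every past key/value pair (as attention does),
    each new token is folded into a constant-size state matrix, and the
    current query reads from that state. Memory stays O(d*d) no matter
    how long the context is. This illustrates the general retention idea,
    not Brumby's actual Power Retention layer.
    """
    seq_len, d = queries.shape
    state = np.zeros((d, d))           # fixed-size state, independent of seq_len
    outputs = np.empty_like(values)
    for t in range(seq_len):
        # Fold the new key/value pair into the state via an outer product.
        state += np.outer(keys[t], values[t])
        # Read out: the query projects the accumulated state.
        outputs[t] = queries[t] @ state
    return outputs

# Usage: 1,000 tokens of 64-dimensional features, constant memory footprint.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((1000, 64)) for _ in range(3))
out = fixed_state_retention(q, k, v)
print(out.shape)  # (1000, 64)
```

The key property is that the state matrix never grows with sequence length, which is what lets retention-style models handle long contexts without attention's memory blow-up.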
Rather than being trained from scratch, the Brumby model was retrained from the weights of an existing transformer, achieving near-state-of-the-art accuracy while significantly reducing training costs. With its unique architecture and efficient design, Brumby-14B-Base marks a significant step toward a new era of AI models, challenging the dominance of attention-based transformers and opening the door to more diverse and cost-effective large-scale AI experimentation.