In a continuing effort to enhance the performance of generative AI services, Professor Jongse Park and his team at the KAIST School of Computing, in collaboration with HyperAccel Inc., have introduced a new NPU core technology. The core combines strong inference performance with high energy efficiency, addressing the growing memory demands of advanced AI models such as OpenAI’s GPT-4 and Google’s Gemini 2.5. The research is set to be presented at the 2025 International Symposium on Computer Architecture (ISCA 2025), highlighting its innovative nature.
The study focuses on optimizing performance for large-scale generative AI applications by streamlining the inference process without compromising accuracy. The work is notable for its integrated design of AI semiconductors and system software, both essential components of AI infrastructure.
Unlike traditional GPU-based AI systems, which require multiple devices to meet memory bandwidth and capacity requirements, the NPU core technology introduced here uses KV cache quantization to make far better use of the memory it already has. Because the key-value (KV) cache that stores per-token attention state dominates memory consumption during inference, compressing it reduces the number of devices needed, ultimately lowering the costs of building and operating generative AI infrastructure.
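The article does not describe the team's exact quantization scheme, but the core idea of KV cache quantization can be sketched with a simple per-token absmax int8 quantizer. This is an illustrative assumption, not the method from the paper: the function names, the int8 format, and the per-token scaling are all choices made here for clarity.

```python
import numpy as np

def quantize_kv(cache):
    """Per-token absmax quantization of a KV-cache tensor to int8 (illustrative)."""
    scale = np.abs(cache).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)      # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(cache / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q, scale):
    """Recover an approximate fp32 cache for the attention computation."""
    return q.astype(np.float32) * scale

np.random.seed(0)
kv = np.random.randn(4, 64).astype(np.float32)  # 4 cached tokens, head dim 64
q, scale = quantize_kv(kv)
recon = dequantize_kv(q, scale)
print(kv.nbytes, q.nbytes)  # 1024 vs 256 bytes: int8 shrinks the cache 4x
```

Storing the cache in int8 rather than fp32 quadruples the number of tokens that fit in a device's memory, which is why fewer devices are needed for the same model.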
Central to the hardware architecture is a design that maintains compatibility with existing NPU interfaces while incorporating advanced quantization algorithms and page-level memory management. These enhancements ensure full utilization of available memory, improving operational efficiency and reducing power consumption.
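Page-level memory management typically means mapping each sequence's KV cache onto fixed-size physical pages through a page table, so memory is claimed page by page instead of reserved up front. The toy allocator below sketches that idea; it is a hypothetical illustration of the general technique, not the team's actual design, and every name in it is an assumption.

```python
class PagedKVCache:
    """Toy page-table allocator for KV-cache blocks (hypothetical sketch)."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.free = list(range(num_pages))  # pool of unused physical page ids
        self.tables = {}                    # seq_id -> list of physical page ids
        self.lengths = {}                   # seq_id -> number of tokens stored

    def append(self, seq_id):
        """Reserve a slot for one new token; returns (physical page, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.page_size == 0:         # current page is full: grab a new one
            if not self.free:
                raise MemoryError("KV cache exhausted")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.page_size], n % self.page_size

    def release(self, seq_id):
        """Return a finished sequence's pages to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_pages=8, page_size=4)
slots = [cache.append(0) for _ in range(5)]  # 5 tokens -> 2 pages, not a full reservation
```

Because pages are allocated only as tokens arrive and reclaimed when a sequence finishes, no memory sits idle in over-sized per-request reservations, which is the "optimal utilization" the paragraph describes.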
The team highlights two broader outcomes:

- Cost-effectiveness: With superior power efficiency compared to cutting-edge GPUs, operational costs are anticipated to decrease significantly.
- Broader implications: Beyond AI cloud data centers, the technology is expected to reshape the AI landscape, enabling emerging workloads such as ‘Agentic AI’.
With a 60% performance improvement over the latest GPUs while consuming 44% less power, this achievement underscores the potential of NPUs for building robust and sustainable AI infrastructure. As AI technology continues to evolve rapidly, the outcomes of this research mark a pivotal step in advancing state-of-the-art AI ecosystems.