AI framework tackles LLM agent instability

Published April 25, 2025 By Juwan Chacko

Introducing RAGEN, An AI Framework to Enhance LLM Agent Stability in Complex Situations

Training artificial intelligence agents can be challenging, especially when they need to make decisions across multiple steps in dynamic environments. While reinforcement learning has shown success in tasks like solving math problems or generating code, its application to training agents in dynamic, multi-step scenarios is still evolving.

To address this gap, a team of researchers from institutions including Northwestern University, Stanford University, Microsoft, and New York University has proposed StarPO (State-Thinking-Actions-Reward Policy Optimization). The framework offers a generalized approach to training agents at the trajectory level: it optimizes the entire sequence of interactions rather than individual actions.
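
To make "trajectory level" concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the function names, the toy rewards, and the choice to normalize returns across a group of rollouts are all illustrative assumptions. The point it shows is only that each multi-turn rollout is scored as a whole, and that one score is then shared by every action in the rollout.

```python
from statistics import mean, pstdev

def trajectory_returns(reward_sequences):
    """Collapse each rollout's per-turn rewards into one scalar return."""
    return [sum(rewards) for rewards in reward_sequences]

def group_normalized_advantages(returns, eps=1e-8):
    """Score each trajectory relative to the other rollouts in its group.

    Hypothetical normalization chosen for illustration; every action in a
    trajectory would share that trajectory's single advantage value.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]

# Three toy rollouts of three turns each (made-up rewards).
rollouts = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
returns = trajectory_returns(rollouts)
advantages = group_normalized_advantages(returns)
```

An action-level method would instead assign credit turn by turn; here a rollout that ends well raises the score of every step that led to it.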

Accompanying StarPO is RAGEN, a modular system designed to implement StarPO and facilitate the training and evaluation of LLM agents, focusing on their reasoning capabilities under reinforcement learning. RAGEN offers the necessary infrastructure for rollouts, reward assignment, and optimization in multi-turn, stochastic environments.

Minimalist Environments for Maximum Insight

To isolate the core learning challenges from other factors, the researchers tested LLMs using RAGEN in three minimalistic symbolic gaming environments:

Bandit: A single-turn, stochastic task testing risk-sensitive reasoning where the agent selects between options with different, initially unknown, reward profiles.
Sokoban: A multi-turn, deterministic puzzle requiring foresight and planning, as actions are irreversible.
Frozen Lake: A multi-turn, stochastic grid navigation task where movement attempts can randomly fail, demanding planning under uncertainty.

These environments allow for a clear analysis of how agents learn decision-making policies purely through interaction.
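
As a rough illustration of what "stochastic" means in the Frozen Lake setting, the sketch below models a single movement attempt that can randomly fail. Everything in it is an assumption made for the example: the function name `slippery_step`, the 4x4 grid, the 0.3 failure probability, and the simplification that a failed move leaves the agent in place. The environment used in the study may behave differently.

```python
import random

# Row/column offsets for the four grid moves.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def slippery_step(position, action, size=4, slip_prob=0.3, rng=random):
    """Attempt one move on a size x size grid.

    With probability slip_prob the attempt fails and the agent stays put
    (a hypothetical simplification). Otherwise the agent moves one cell,
    clamped to the grid boundary.
    """
    if rng.random() < slip_prob:
        return position
    d_row, d_col = MOVES[action]
    row = min(max(position[0] + d_row, 0), size - 1)
    col = min(max(position[1] + d_col, 0), size - 1)
    return (row, col)

# Same action, same start: the outcome depends on the random draw.
rng = random.Random(0)
outcomes = [slippery_step((0, 0), "right", rng=rng) for _ in range(5)]
```

Because the same action can yield different next states, a plan that looks correct on paper can still fail, which is what separates this task from the deterministic Sokoban puzzle.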

Key Findings: Stability, Rollouts, and Reasoning

The study uncovered three significant findings related to the training of self-evolving LLM agents:

The ‘Echo Trap’ and the need for stability: Agents would initially improve but then suffer a performance collapse as they overfit to locally rewarded reasoning patterns. To address this, the team developed StarPO-S, a stabilized version of the framework that adds variance-based trajectory filtering, incorporates a critic, and uses decoupled clipping together with removal of the KL term.
Rollout quality is crucial: Factors such as task diversity, interaction granularity, and rollout frequency significantly impact learning. Maintaining freshness and appropriate action budgets is essential for stable training.
Reasoning requires careful reward design: Merely prompting models to ‘think’ is not enough to guarantee meaningful reasoning, especially in multi-turn tasks. The researchers suggest exploring rewards that evaluate the quality of intermediate reasoning steps for better agent reasoning.
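
One plausible reading of "variance-based trajectory filtering" can be sketched as follows. This is an assumption-laden illustration, not the StarPO-S code: the function name, the `keep_fraction` parameter, and the toy returns are hypothetical. The idea shown is that starting states whose rollouts all score the same carry little learning signal, so only the groups with the most varied outcomes are kept for the update.

```python
from statistics import pvariance

def filter_by_reward_variance(groups, keep_fraction=0.5):
    """Keep the prompt groups whose rollout returns vary the most.

    groups maps a prompt id to the list of returns from its rollouts.
    keep_fraction is a hypothetical knob, not a value from the study.
    """
    ranked = sorted(groups, key=lambda pid: pvariance(groups[pid]), reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# Toy returns from four rollouts per starting state (made-up numbers).
groups = {
    "a": [1.0, 1.0, 1.0, 1.0],  # always solved: zero variance
    "b": [0.0, 1.0, 0.0, 1.0],  # mixed outcomes: highest variance
    "c": [0.0, 0.0, 0.0, 0.0],  # never solved: zero variance
    "d": [0.0, 0.0, 0.0, 1.0],  # occasionally solved
}
kept = filter_by_reward_variance(groups)
```

In this toy case the groups with mixed outcomes ("b" and "d") are retained, while the always-solved and never-solved groups are dropped.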

RAGEN and StarPO: Advancing Self-Evolving AI

The RAGEN system and StarPO framework represent a step forward in training LLM agents that can reason and adapt in complex environments. The research emphasizes the challenges of multi-turn reinforcement learning and offers strategies to mitigate them, including StarPO-S’s stabilization techniques.

While the study acknowledges limitations and the need for further testing, it lays groundwork for building AI systems that can handle complex interactions and produce verifiable outcomes. That matters for domains such as theorem proving, software engineering, and scientific discovery.
