Two common approaches to customizing large language models (LLMs) for specific tasks are fine-tuning and in-context learning (ICL). A recent study by researchers from Google DeepMind and Stanford University examined how well each method generalizes. The study found that ICL generalizes better, although at a higher computational cost during inference. The researchers also proposed a novel approach that combines the strengths of both methods.
These findings have significant implications for developers looking to build LLM applications tailored to their enterprise data.
Exploring How Language Models Adapt to New Tasks
Fine-tuning involves further training a pre-trained LLM on a specialized dataset to impart new knowledge or skills. In contrast, ICL does not alter the model’s internal parameters. Instead, it provides examples of the desired task directly within the input prompt, and the model uses those examples to work out how to handle similar queries.
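The difference can be sketched in a few lines of Python. This is a minimal illustration and not the study's code: the `Input:`/`Output:` template, the `prompt`/`completion` record layout, and the made-up example pairs are all assumptions chosen for readability. The same examples either travel inside every prompt (ICL) or are converted once into training records (fine-tuning).

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (input, desired output)


def build_icl_prompt(examples: List[Example], query: str) -> str:
    """ICL: put the examples in the prompt itself; model weights stay unchanged."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"


def build_finetuning_records(examples: List[Example]) -> List[Dict[str, str]]:
    """Fine-tuning: turn the same examples into records for a separate training job.

    The prompt/completion layout is an assumed, generic format.
    """
    return [
        {"prompt": f"Input: {x}\nOutput:", "completion": f" {y}"}
        for x, y in examples
    ]


# Made-up example pairs, used only to show the two shapes of data.
examples = [("femp", "glon"), ("dax", "wug")]
prompt = build_icl_prompt(examples, "zib")
records = build_finetuning_records(examples)
```

Note that the ICL prompt grows with every example added, which is where its extra inference cost comes from, while the fine-tuning records are used once during training.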
The researchers conducted a rigorous comparison of how well models generalize to new tasks using these two methods. They created synthetic datasets with intricate, self-consistent structures, such as imaginary family trees or hierarchies of fictional concepts. To ensure they were testing the models’ ability to learn new information rather than recall what they already knew, they replaced all nouns, adjectives, and verbs with nonsense terms that the LLMs had not encountered during pre-training.
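A toy version of such a dataset might look like the following sketch. The word generator, the sentence template, and the chain-shaped hierarchy are assumptions made for illustration; the study's actual datasets are richer, and randomly generated strings are not guaranteed to be absent from a model's pre-training data.

```python
import random
from typing import List

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def nonsense_word(rng: random.Random, syllables: int = 2) -> str:
    """Build a pronounceable made-up word from consonant-vowel syllables."""
    body = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))
    return body + rng.choice(CONSONANTS)


def unique_words(rng: random.Random, n: int) -> List[str]:
    """Draw n distinct nonsense words, preserving generation order."""
    words: List[str] = []
    seen = set()
    while len(words) < n:
        word = nonsense_word(rng)
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


def make_chain_facts(rng: random.Random, length: int = 4) -> List[str]:
    """Return facts describing a simple chain hierarchy of fictional concepts."""
    names = unique_words(rng, length)
    return [f"Every {child} is a {parent}." for child, parent in zip(names, names[1:])]


# A fixed seed keeps the generated dataset reproducible.
facts = make_chain_facts(random.Random(0), length=4)
```

Because each fact links one concept to the next, the set is self-consistent by construction, which is the property that makes held-out inferences checkable.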
The models were tested on several generalization challenges, including simple reversals and syllogisms, as well as a more complex semantic structure benchmark. In data-matched settings, ICL generalized better than standard fine-tuning.
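To make the two simpler challenges concrete, here is a minimal sketch of how a reversal item and a syllogism item could be constructed. The sentence templates, the relation phrases, and the nonsense terms are hypothetical and not taken from the study; the point is only that the model sees one set of statements and is tested on a statement that follows from them but never appeared.

```python
from typing import List, Tuple


def reversal_item(a: str, relation: str, inverse_relation: str, b: str) -> Tuple[str, str]:
    """Return (seen statement, held-out statement with the relation reversed)."""
    seen = f"{a} {relation} {b}."
    held_out = f"{b} {inverse_relation} {a}."
    return seen, held_out


def syllogism_item(a: str, b: str, c: str) -> Tuple[List[str], str]:
    """Return (seen premises, held-out conclusion) for a basic syllogism."""
    premises = [f"Every {a} is a {b}.", f"Every {b} is a {c}."]
    conclusion = f"Every {a} is a {c}."
    return premises, conclusion


# Hypothetical nonsense terms and relation phrases.
seen, held_out = reversal_item("Femps", "are larger than", "are smaller than", "glons")
premises, conclusion = syllogism_item("dax", "wug", "zib")
```

In both cases the held-out statement is logically implied by what the model was given, so failing on it indicates a generalization gap and not a lack of information.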
A Hybrid Approach: Enhancing Fine-Tuning
Building on the superior generalization capabilities of ICL, the researchers introduced a new method to enhance fine-tuning by incorporating in-context inferences into the training data. This approach leverages the LLM’s own ICL abilities to generate diverse examples, which are then added to the fine-tuning dataset.
Two main data augmentation strategies were explored:
- A local strategy, which focuses on rephrasing individual sentences or drawing inferences from them.
- A global strategy, which provides the full training dataset as context to generate longer reasoning traces of relevant inferences.
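The two strategies can be sketched as follows. This is an illustrative outline under stated assumptions, not the researchers' implementation: `LLM` is a hypothetical callable that maps a prompt to generated text, the prompt wording is invented, and `stub_llm` is a placeholder so the example runs without a real model.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, generated text out


def local_augment(sentences: List[str], llm: LLM) -> List[str]:
    """Local strategy: rephrase each sentence and draw an inference from it."""
    generated: List[str] = []
    for sentence in sentences:
        generated.append(llm(f"Rephrase this statement:\n{sentence}"))
        generated.append(llm(f"State one fact that follows from this statement:\n{sentence}"))
    return generated


def global_augment(sentences: List[str], llm: LLM) -> str:
    """Global strategy: give the whole dataset as context and ask for a reasoning trace."""
    context = "\n".join(sentences)
    return llm(
        "Here is a set of statements:\n"
        f"{context}\n\n"
        "Reason step by step and list the inferences that follow from them."
    )


def build_augmented_dataset(sentences: List[str], llm: LLM) -> List[str]:
    """Combine the original data with locally and globally generated text."""
    return list(sentences) + local_augment(sentences, llm) + [global_augment(sentences, llm)]


def stub_llm(prompt: str) -> str:
    """Placeholder standing in for a real model call."""
    return f"[model output for a {len(prompt)}-character prompt]"


training_data = ["Every dax is a wug.", "Every wug is a zib."]
augmented = build_augmented_dataset(training_data, stub_llm)
```

In practice the stub would be replaced by a call to the same LLM that is about to be fine-tuned, so that its in-context inferences end up in its own training data.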
When the models were fine-tuned on these augmented datasets, generalization improved significantly. Augmented fine-tuning outperformed not only standard fine-tuning but also plain ICL.

This approach offers a promising path for enterprises seeking to improve the generalization of their fine-tuned models. By fine-tuning on ICL-augmented datasets, developers can build more robust LLM applications that handle diverse real-world inputs without paying the recurring inference-time cost of large in-context prompts.
Augmented fine-tuning does increase overall training costs, but because that cost is paid once while the savings recur at inference time, the improved generalization can make it cost-effective in the long run. Developers are encouraged to explore augmented fine-tuning in cases where standard fine-tuning alone falls short.
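A rough way to reason about this trade-off is a break-even estimate: how many queries it takes before the one-time extra training cost is repaid by no longer sending a large in-context prompt with every request. The sketch below is a simplified model with assumed, made-up numbers. It ignores factors such as differing per-token prices for fine-tuned models, output tokens, and retraining when data changes.

```python
import math


def break_even_queries(
    extra_training_cost_usd: float,
    context_tokens_per_query: int,
    usd_per_million_input_tokens: float,
) -> int:
    """Number of queries after which dropping the in-context prompt repays training.

    Assumes the only saving is the input tokens of the in-context examples.
    """
    tokens_value = context_tokens_per_query * usd_per_million_input_tokens
    if tokens_value <= 0:
        raise ValueError("per-query saving must be positive")
    # training cost divided by the per-query saving (tokens * price / 1,000,000)
    return math.ceil(extra_training_cost_usd * 1_000_000 / tokens_value)


# Hypothetical figures chosen only to illustrate the calculation.
queries_needed = break_even_queries(
    extra_training_cost_usd=200.0,
    context_tokens_per_query=20_000,
    usd_per_million_input_tokens=1.0,
)
```

Under this simplified model, an application expected to serve far more queries than the break-even figure is a stronger candidate for augmented fine-tuning, while a low-volume one may be better served by plain ICL.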
Ultimately, this research contributes to advancing the understanding of learning and generalization in foundation models, offering practical insights for adapting them to various downstream tasks.