Researchers Introduce Proxy-Tuning: An Efficient Alternative to Finetuning Large Language Models
Researchers from the University of Washington and the Allen Institute for AI have proposed a new approach to adapting large language models (LLMs). The study, led by Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, and Noah A. Smith, introduces a concept known as "proxy-tuning," a method that promises to make the adaptation of large pretrained LMs far more efficient.
Traditionally, large language models like GPT and BERT have required extensive resources for fine-tuning to meet specific needs or to enhance their performance. This process often poses a challenge, especially when model weights are inaccessible or resource constraints are a concern.
In this paper, the team addresses this gap by presenting a resource-efficient alternative that matches, and in some cases exceeds, the performance of direct fine-tuning. This is where proxy-tuning comes into play.
Proxy-tuning is a lightweight, decoding-time algorithm that operates on top of black-box language models, which are typically large-scale and pretrained. The core of the technique is to fine-tune a smaller language model and then, at each decoding step, apply the difference between the predictions of the tuned and untuned small models to the large model's output.
This adjustment shifts the base model's predictions toward the desired tuning objective. The beauty of the method lies in its ability to leverage the knowledge of larger, more capable models without modifying their weights at all.
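The per-step adjustment described above can be sketched in a few lines. This is a minimal, illustrative example with made-up logits over a toy three-token vocabulary; the function and variable names are hypothetical, not from the paper's code:

```python
import math

def proxy_tune_logits(base_logits, tuned_small_logits, untuned_small_logits):
    """Shift the large base model's next-token logits by the difference
    between a tuned and an untuned small model, token by token."""
    return [b + (t - u) for b, t, u in
            zip(base_logits, tuned_small_logits, untuned_small_logits)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for a 3-token vocabulary (invented values for illustration).
base          = [2.0, 1.0, 0.5]  # large pretrained model (e.g., untuned 70B)
tuned_small   = [0.5, 2.5, 0.0]  # small tuned proxy (e.g., a 7B chat model)
untuned_small = [1.0, 0.8, 0.2]  # the small proxy before tuning

adjusted = proxy_tune_logits(base, tuned_small, untuned_small)
probs = softmax(adjusted)
print(probs)
```

In this toy example the base model prefers token 0, but the tuned proxy's preference for token 1 dominates after the shift, so sampling from `probs` steers the large model toward the tuning objective without touching its weights.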
The effectiveness of proxy-tuning is underscored by its application to Llama2-70B, a prominent language model. Using a proxy model of only 7B parameters, the researchers closed 88% of the performance gap between the base and fully tuned versions of Llama2-70B.
This was achieved across various benchmarks covering knowledge, reasoning, and safety. Notably, on TruthfulQA, a benchmark assessing the truthfulness of model responses, the proxy-tuned models outperformed their directly tuned counterparts, suggesting that this method may better preserve factual knowledge.
The implications of this study extend beyond these initial experiments. The researchers also demonstrated the versatility of proxy-tuning in other domains, such as adapting models to code and task-specific tuning for question answering and mathematical problems.
This flexibility suggests the approach could scale well as ever-larger base models become available.
Originally posted on OpenDataScience.com