Meta AI Researchers Propose Advanced Long-Context LLMs
In a new paper, researchers at Meta AI propose advanced long-context LLMs to address the lack of open access to models with robust long-context capabilities, which until now have been available primarily through proprietary APIs.
This has left a void for researchers and developers seeking open-source solutions. While open-source long-context models exist, they often fall short in rigorous evaluations, focusing on language-modeling loss and synthetic tasks that don't adequately reflect real-world scenarios.
To address these challenges, Meta AI is pushing forward a new methodology. It builds on continual pretraining from LLAMA 2 checkpoints, incorporating an additional 400 billion tokens arranged into long training sequences that capture the essence of long-context understanding.
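To make the idea concrete, here is a minimal sketch (not the paper's actual training code) of how one might continue pretraining from a Llama 2 checkpoint with an extended context window using Hugging Face transformers. The checkpoint name, RoPE scaling choice, and sequence length below are illustrative assumptions.

```python
# Minimal sketch: extend a Llama 2 checkpoint's context window before
# continual pretraining. All specific values are assumptions for illustration.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

config = AutoConfig.from_pretrained(base)
config.max_position_embeddings = 32768  # extend the context window
config.rope_scaling = {"type": "linear", "factor": 8.0}  # one common way to adapt RoPE

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, config=config, torch_dtype=torch.bfloat16
)

# From here, continual pretraining proceeds as usual: pack long documents into
# 32,768-token sequences and minimize the standard next-token prediction loss.
```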
The result is a range of model variants, from smaller 7B/13B models trained on 32,768-token sequences to larger 34B/70B models trained on 16,384-token sequences. What distinguishes this approach is the thoroughness of its evaluation process.
Unlike previous studies, Meta AI assesses the models across multiple dimensions, including language-modeling capabilities, synthetic tasks, and real-world benchmarks. The evaluation covers both long- and short-context tasks, providing a comprehensive view of the models' abilities.
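As a rough illustration of one of those evaluation axes, the sketch below measures language-modeling loss on a long document at increasing context lengths. The function names and the way the document is supplied are placeholders, not the paper's evaluation harness.

```python
# Sketch: next-token loss on the first `context_length` tokens of a document,
# evaluated at several context lengths to see how loss changes with context.
import torch


@torch.no_grad()
def loss_at_context_length(model, tokenizer, text, context_length):
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :context_length]
    out = model(ids, labels=ids)  # causal LM loss over the truncated document
    return out.loss.item()


# Example usage (long_document is a placeholder string):
# for length in (4096, 8192, 16384, 32768):
#     print(length, loss_at_context_length(model, tokenizer, long_document, length))
```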
The findings underscore the models' scaling behavior, demonstrating consistent performance improvements as context length grows. Context length emerges as a pivotal axis of scaling for LLMs.
Compared to LLAMA 2 on research benchmarks, this method delivers significant advancements in long-context tasks and modest improvements in standard short-context tasks. Notably, it excels in coding, mathematical problem-solving, and knowledge-related tasks.
Another avenue this research explores is a cost-effective approach to instruction fine-tuning, resulting in a chat model that outperforms gpt-3.5-turbo-16k on a range of long-context benchmarks.
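For readers unfamiliar with instruction fine-tuning, the sketch below shows a generic supervised fine-tuning step on a (prompt, response) pair. It only illustrates the kind of procedure involved; the paper's specific cost-effective recipe (data mixture, loss masking details, hyperparameters) is not reproduced here, and all names are hypothetical.

```python
# Generic supervised instruction fine-tuning step (illustrative only).
import torch


def sft_step(model, optimizer, tokenizer, prompt, response, max_len=16384):
    """One gradient step on a single (prompt, response) pair."""
    text = prompt + response + tokenizer.eos_token
    ids = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_len
    ).input_ids
    labels = ids.clone()
    # Mask the prompt tokens so the loss is computed only on the response.
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    labels[:, :prompt_len] = -100
    loss = model(ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```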
Based on the paper and results, it seems that Meta Research’s approach bridges the divide between proprietary and open-source long-context LLMs. It claims to offer models with superior performance, comprehensive evaluations, and insights into the factors that shape their capabilities.
This work empowers researchers and developers to harness the potential of long-context LLMs, which in turn can help usher in a new era of NLP-based research. Beyond enabling richer human-computer interaction, Meta AI aims to further democratize access to advanced language models and tools.
Originally posted on OpenDataScience.com