NVIDIA’s New Model ChatQA-2 Rivals GPT-4 in Long Context and RAG Tasks

The Llama3-ChatQA-2-70B model can process contexts up to 128,000 tokens, matching GPT-4-Turbo's capacity.

Researchers at NVIDIA have developed Llama3-ChatQA-2-70B, a new large language model that rivals GPT-4-Turbo in handling long contexts up to 128,000 tokens and excels in retrieval-augmented generation (RAG) tasks. 

The model, based on Meta’s Llama3, demonstrates competitive performance across various benchmarks, including long-context understanding, medium-length tasks, and short-context evaluations.


Key highlights of Llama3-ChatQA-2-70B include a 128,000-token context window, matching the capacity of GPT-4-Turbo. The model demonstrates superior performance in RAG tasks compared to GPT-4-Turbo and delivers competitive results on long-context benchmarks extending beyond 100,000 tokens.

Additionally, the model performs strongly on medium-length tasks within 32,000 tokens and maintains effectiveness on short-context tasks within 4,000 tokens.

The researchers employed a two-step approach to extend Llama3-70B’s context window from 8,000 to 128,000 tokens. This involved continued pre-training on a mix of SlimPajama data with upsampled long sequences, followed by a three-stage instruction tuning process.
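The article does not reproduce the paper's full data recipe, but the core idea of upsampling long sequences during continued pre-training can be sketched in a few lines of Python. The sequence-length threshold and upsampling factor below are illustrative assumptions, not values from the paper:

```python
import random

# Illustrative sketch only: the 8K-token threshold and 4x upsampling
# factor are assumptions for demonstration, not the paper's settings.
LONG_SEQ_THRESHOLD = 8_000   # documents longer than this count as "long"
UPSAMPLE_FACTOR = 4          # hypothetical repetition factor for long docs

def build_training_mix(documents, tokenizer):
    """Return a pre-training mix in which long documents appear more
    often, so the model sees enough long sequences to learn to use an
    extended context window. `tokenizer` is any object with an
    encode() method returning a token list."""
    mix = []
    for doc in documents:
        n_tokens = len(tokenizer.encode(doc))
        if n_tokens > LONG_SEQ_THRESHOLD:
            mix.extend([doc] * UPSAMPLE_FACTOR)  # upsample long sequences
        else:
            mix.append(doc)
    random.shuffle(mix)
    return mix
```

The intuition is that naturally occurring corpora such as SlimPajama are dominated by short documents, so without upsampling the model would rarely train on sequences that exercise the extended window.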

Evaluation results show that Llama3-ChatQA-2-70B outperforms many existing state-of-the-art models, including GPT-4-Turbo-2024-04-09, on the InfiniteBench long-context tasks. The model achieved an average score of 34.11, compared to GPT-4-Turbo’s 33.16.

For medium-length tasks within 32,000 tokens, Llama3-ChatQA-2-70B scored 47.37, surpassing some competitors but falling short of GPT-4-Turbo’s 51.93. On short-context tasks, the model achieved an average score of 54.81, outperforming GPT-4-Turbo and Qwen2-72B-Instruct.

The study also compared RAG and long-context solutions, finding that RAG outperforms full long-context solutions for tasks beyond 100,000 tokens. This suggests that even state-of-the-art long-context models may struggle to effectively understand and reason over such extensive inputs.
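The article does not detail the RAG pipeline the study used, but the pattern being compared against full-context prompting can be sketched with a toy retriever. The chunk size, lexical scoring function, and top-k value below are illustrative assumptions; a real system would use a dense retriever rather than word-count overlap:

```python
from collections import Counter
import math

def chunk(text, size=512):
    """Split a long document into fixed-size word chunks (size is illustrative)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, passage):
    """Toy lexical relevance: cosine similarity over word counts.
    Stands in for a dense retriever so the sketch stays runnable."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    dot = sum(q[w] * p[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in p.values())))
    return dot / norm if norm else 0.0

def rag_prompt(question, document, top_k=5):
    """Retrieve the top-k most relevant chunks instead of passing the
    full 100,000+-token document to the model."""
    chunks = chunk(document)
    best = sorted(chunks, key=lambda c: score(question, c), reverse=True)[:top_k]
    context = "\n\n".join(best)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The trade-off the study highlights is visible in the sketch: retrieval feeds the model only a small, relevant slice of the input, sidestepping the difficulty of reasoning over the entire long context at once.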

This development represents a significant step forward in open-source language models, bringing them closer to the capabilities of proprietary models like GPT-4. The researchers have provided detailed technical recipes and evaluation benchmarks, contributing to the reproducibility and advancement of long-context language models in the open-source community.
