Google AI Proposes New Method to Reduce Burden on LLMs: Pairwise Ranking Prompting

ODSC - Open Data Science
3 min read · Jul 28, 2023

Google AI researchers have released a paper proposing a new approach called Pairwise Ranking Prompting, or PRP for short. The goal is to alleviate the challenges Large Language Models (LLMs) face in solving text ranking problems. LLMs such as GPT-3 and PaLM have demonstrated remarkable performance on natural language tasks, even in zero-shot settings.

But when it comes to text ranking, existing methods tend to fall short of trained baseline rankers, with the exception of black box systems like GPT-4. While the team acknowledges the value of such black box systems in the paper, they also emphasize the constraints faced by academic researchers, including cost and access limitations.

So in their study, they delve into why LLMs struggle with ranking problems under the current pointwise and listwise approaches. They found that generating the calibrated prediction probabilities that pointwise techniques require proves exceedingly challenging for LLMs.

Listwise techniques, on the other hand, produce inconsistent or irrelevant outputs, indicating a lack of ranking awareness in current LLM pre-training and fine-tuning. To compensate for these limitations and reduce task complexity, the researchers proposed the PRP paradigm.
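For contrast, a pointwise approach asks the model for an absolute relevance judgment on each document in isolation. Below is a minimal sketch of what that looks like; the prompt wording and the `llm_logprob` scoring function are illustrative assumptions, not from the paper. Ranking by these per-document scores only works if the scores are calibrated against one another, which is exactly the property the authors find LLMs lack.

```python
# Illustrative pointwise prompt: the model must emit an absolute relevance
# judgment for each document on its own.
POINTWISE_TEMPLATE = (
    "Passage: {doc}\n"
    "Query: {query}\n"
    "Does the passage answer the query? Answer Yes or No:"
)

def pointwise_score(llm_logprob, query: str, doc: str) -> float:
    """Score one document by the model's log-probability of answering "Yes".

    `llm_logprob(prompt, token)` is a hypothetical scoring API. Sorting
    documents by these values assumes the scores are comparable across
    separate prompts -- the calibration assumption the paper argues
    current LLMs fail to meet.
    """
    return llm_logprob(POINTWISE_TEMPLATE.format(doc=doc, query=query), "Yes")
```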

This method uses a simple prompt architecture, with a query and a pair of documents as the prompt for the ranking task. Unlike existing methods, PRP supports both generation and scoring LLM APIs by default, addressing the calibration issue. The paper discusses several PRP variants that trade off efficiency and effectiveness.
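To make the prompt architecture concrete, here is a minimal sketch of pairwise ranking with all-pairs aggregation, one of the variants the paper discusses. The `llm` function is a placeholder for any generation API (e.g., a call to FLAN-UL2), and the exact prompt wording and vote counting are assumptions for illustration; querying both document orderings reflects the method's insensitivity to input order.

```python
from itertools import combinations

def llm(prompt: str) -> str:
    # Placeholder for a real generation call; plug in your model's API here.
    raise NotImplementedError("swap in an actual LLM generation call")

# Pairwise ranking prompt: the model only has to pick the more relevant of
# two passages, sidestepping the calibration problem of pointwise scores.
PRP_TEMPLATE = (
    'Given a query "{query}", which of the following two passages is more '
    "relevant to the query?\n\n"
    "Passage A: {doc_a}\n"
    "Passage B: {doc_b}\n\n"
    "Output Passage A or Passage B:"
)

def votes_for_first(query: str, doc_a: str, doc_b: str) -> int:
    """Prompt with both document orderings (to counter position bias) and
    return how many of the two calls preferred doc_a (0, 1, or 2)."""
    out1 = llm(PRP_TEMPLATE.format(query=query, doc_a=doc_a, doc_b=doc_b))
    out2 = llm(PRP_TEMPLATE.format(query=query, doc_a=doc_b, doc_b=doc_a))
    return int("Passage A" in out1) + int("Passage B" in out2)

def prp_allpairs(query: str, docs: list[str]) -> list[str]:
    """All-pairs aggregation: every pairwise preference is a vote, and
    documents are ranked by their total vote count."""
    wins = [0] * len(docs)
    for i, j in combinations(range(len(docs)), 2):
        v = votes_for_first(query, docs[i], docs[j])
        wins[i] += v
        wins[j] += 2 - v
    order = sorted(range(len(docs)), key=lambda k: -wins[k])
    return [docs[k] for k in order]
```

Comparing all pairs costs O(n²) model calls, which is why the paper also explores cheaper variants that reduce the number of comparisons while preserving ranking quality.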

They went on to evaluate PRP using moderate-sized, open-sourced LLMs on traditional benchmark datasets. The results were striking: PRP surpassed previous methods built on the black box commercial GPT-4, despite GPT-4's significantly larger model size.

One example is the TREC-DL2020 dataset, where PRP based on the 20B-parameter FLAN-UL2 model achieved a more than 5% improvement in NDCG@1 over the prior best method. On TREC-DL2019, PRP outperformed existing solutions such as InstructGPT by over 10% on most ranking measures, trailing GPT-4 only slightly on NDCG@5 and NDCG@10.

Overall, PRP exhibits several advantages, including its support for both scoring and generation LLM APIs and its insensitivity to input order. The work makes three major contributions: it demonstrates effective zero-shot ranking with moderate-sized, open-sourced LLMs; it achieves state-of-the-art ranking performance through straightforward prompting and scoring mechanisms; and it explores efficiency enhancements while maintaining strong empirical performance.

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Subscribe to our fast-growing Medium Publication too, the ODSC Journal, and inquire about becoming a writer.
