Study Hopes to Unlock the Potential of LLMs in Mathematical Problem-Solving

ODSC - Open Data Science
3 min read · Nov 15, 2023

In the ever-evolving landscape of artificial intelligence, even the most advanced LLMs, including GPT-4 and PaLM 2, face challenges when it comes to solving complex mathematical problems. A recent study by researchers from Google and Yale sheds light on how LLMs can overcome these hurdles and significantly improve their mathematical problem-solving capabilities.

The study, conducted with the PaLM 2 model in both its small (PaLM 2-S) and large (PaLM 2-L) forms, reveals intriguing insights into the potential of LLMs. The research first shows that the models have a much higher probability of producing an accurate answer when allowed to attempt a problem multiple times.

For example, the pre-trained PaLM 2-L achieves 33.4% accuracy with greedy decoding, but the study emphasizes that this performance can be pushed much further: when 64 solutions are drawn with temperature sampling, at least one of them is correct 79.4% of the time (pass@64).
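The pass@k metric behind that 79.4% figure is straightforward to compute. Below is a minimal sketch using the commonly used unbiased estimator (the probability that at least one of k solutions drawn from n samples is correct, given c correct samples); the numbers in the example are illustrative only and not taken from the study.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k solutions
    drawn (without replacement) from n samples is correct, given that c of
    the n samples are correct."""
    if n - c < k:
        return 1.0  # every possible draw of size k contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: a problem where 5 of 64 sampled solutions are correct.
print(round(pass_at_k(n=64, c=5, k=64), 3))  # 1.0 -- all 64 samples are drawn
print(round(pass_at_k(n=64, c=5, k=1), 3))   # ~0.078 -- odds for a single attempt
```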

This discrepancy highlights the LLMs’ ability to generate correct solutions while struggling to distinguish correct answers from incorrect ones. To bridge this performance gap, the researchers explore three fine-tuning techniques:

  1. Supervised Step-by-Step Solution Fine-Tuning (SSFT): The study first asks whether pre-trained LLMs benefit from a supervised fine-tuning stage, used as a baseline: the models are fine-tuned to produce complete step-by-step solutions and final answers.
  2. Solution-Cluster Reranking (SCR): This technique fine-tunes the generator to act as a solution evaluator for reranking candidate solutions. The researchers introduce a method that combines the advantages of majority voting with reranking, grouping candidate solutions into clusters based on mathematical equivalence (a rough sketch of this idea follows the list).
  3. Multi-task Sequential Fine-Tuning: Beyond solution evaluation, the study looks at improving the LLMs’ solution generation itself. By framing solution evaluation as a natural language generation problem, the researchers use it as additional supervision for the solution generation model, fine-tuning the model in three stages.
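The paper's full reranking objective isn't reproduced in this summary, but the core mechanics of SCR can be sketched in a few lines: group sampled solutions by answer equivalence, keep only the most frequent clusters, and score those with a learned evaluator. In the sketch below, `extract_answer` and `score_solution` are hypothetical stand-ins (simple string matching and an assumed model-based scorer), not the paper's implementation.

```python
from collections import defaultdict

def extract_answer(solution: str) -> str:
    """Hypothetical helper: pull the final answer out of a step-by-step solution,
    assuming solutions end with a line like 'The answer is <expr>'."""
    return solution.rsplit("The answer is", 1)[-1].strip().rstrip(".")

def solution_cluster_rerank(solutions, score_solution, top_clusters=3):
    """Group sampled solutions by (approximate) answer equivalence, keep only the
    most common clusters, then rank those clusters with the evaluator's scores."""
    clusters = defaultdict(list)
    for sol in solutions:
        clusters[extract_answer(sol)].append(sol)

    # Majority-voting signal: only the largest clusters are reranked, which is
    # where the computational savings over scoring every sample come from.
    largest = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)[:top_clusters]

    # Pick the answer whose cluster the evaluator scores highest overall.
    best_answer, _ = max(
        ((answer, sum(score_solution(s) for s in sols)) for answer, sols in largest),
        key=lambda pair: pair[1],
    )
    return best_answer
```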

The study’s findings on PaLM 2-S and PaLM 2-L underscore several key takeaways.

SSFT’s dependence on well-formatted solutions: The quality and style of the step-by-step solutions significantly influence the fine-tuned model.
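To make the formatting point concrete, here is a hypothetical SSFT-style training example; the field names and the final-answer delimiter are assumptions for illustration, not the paper's actual format. The important property is that every target follows the same step-by-step style and ends with an answer that can be extracted consistently at evaluation time.

```python
# Hypothetical SSFT training instance (format is illustrative, not the paper's):
ssft_example = {
    "prompt": "Question: If 3x + 5 = 20, what is x?\nAnswer:",
    "target": (
        "Subtract 5 from both sides: 3x = 15.\n"
        "Divide both sides by 3: x = 5.\n"
        "The answer is 5."
    ),
}
```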

Efficiency of reranking common solution clusters: Reranking only the most common solution clusters yields better performance and improved computational efficiency, presenting a potential standard practice for future work.

Dual-task training benefits: Training the model for both solution generation and evaluation tasks demonstrates improved performance. The proposed multi-task sequential fine-tuning proves more effective in enhancing the solution generation model compared to supervised solution fine-tuning alone.
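One way to picture this is to cast both tasks in a single text-to-text format, so the same model can be fine-tuned on them in sequence. The field names and the staging below are assumptions for illustration rather than the paper's exact recipe.

```python
# Both tasks share one text-to-text format, so evaluation data can supervise
# the same model that generates solutions (field names are assumed).

def generation_example(question: str, solution: str) -> dict:
    return {"input": f"Solve: {question}", "target": solution}

def evaluation_example(question: str, candidate: str, is_correct: bool) -> dict:
    # The evaluation task is itself phrased as generation ("Yes"/"No").
    return {
        "input": f"Question: {question}\nCandidate solution: {candidate}\n"
                 "Is this solution correct?",
        "target": "Yes" if is_correct else "No",
    }

# The study fine-tunes in three stages; the exact schedule below is an assumption
# used only to illustrate the sequential, multi-task idea.
stages = ["solution generation", "solution evaluation (as generation)", "solution generation"]
```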

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Interested in attending an ODSC event? Learn more about our upcoming events here.
