Reducing Hallucinations by 95% with Memory Tuning

ODSC - Open Data Science
Oct 14, 2024


Editor’s note: Sharon Zhou is a speaker for ODSC West this October 29th-31st! Be sure to check out her talk, “Removing Hallucinations by 95% with Memory Tuning: A Technical Deep Dive,” there!

There’s no question that LLMs are unlocking new capabilities across all sectors. Data is now more valuable than ever because it can be turned into intelligence. Case in point: companies like Reddit, StackOverflow, and Twitter (now X) have locked down their public content. According to Reddit CEO Steve Huffman, it didn’t make sense for Reddit to give “all of that value to some of the largest companies in the world for free.” Companies are also trying to figure out how to leverage LLMs to make their private data more useful and create their next competitive advantage. While LLM accuracy has improved greatly in a relatively short period of time, today LLMs are still too unreliable for most production use cases, especially in high-stakes domains.

Leveraging two advanced fine-tuning technologies, LoRA (Low-Rank Adaptation) and MoE (Mixture of Experts), we have invented Lamini Memory Tuning, a breakthrough technique that reduces hallucinations by up to 95%.

What are LLM hallucinations?

Hallucinations are LLM outputs that are incorrect or made up: factually inaccurate responses, fabricated answers, ignored user instructions, logic errors, and more. In this demo by my co-founder, Greg Diamos, you can see hallucinations at work:

  1. Greg asks the LLM “What year did David Aguilar climb the Golden Gate Bridge?”
  2. The LLM responds “I couldn’t find any information on a person named David Aguilar climbing the Golden Gate Bridge. It’s possible that you may be thinking of David Aguilar, a former U.S. Secretary of the Interior, but I couldn’t find any records of him climbing the Golden Gate Bridge.”

When we look at the Wikipedia article on the Golden Gate Bridge, it states “In May 1981, Dave Aguilar climbed the South Tower of the Golden Gate Bridge to protest offshore oil drilling.”

LLMs are trained on data scraped from the internet, and this piece of information has been available online for a long time, so the model has almost certainly seen it during training. Even though the fact was in the training data, the LLM still hallucinated.

Why do LLMs hallucinate?

General-purpose LLMs are, in effect, designed to hallucinate: they are trained to reduce the average error across the examples they’ve seen (the generalization error), so they are not optimized to get specific facts exactly right, only to land in the right vicinity. LLMs are also trained on many different data sources, which may contain inaccurate, incomplete, conflicting, or biased data, or may be missing information altogether. As a result, LLMs are pretty good at everything, but perfect at nothing.
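
To make “average error” concrete, here is the standard next-token training objective in generic notation (the textbook formulation, not anything specific to Lamini): the model minimizes the mean loss over all tokens in the corpus, so no single fact carries enough weight to force exact recall.

```latex
% Generic next-token objective: minimize the *average* loss over the corpus.
% Being slightly wrong about one fact barely moves this average, which is
% why a well-trained model can still misstate it.
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N}
  \ell\!\left( f_{\theta}(x_{<i}),\, x_i \right),
\qquad \ell = \text{cross-entropy over the vocabulary}
```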

Hallucinations remain endemic to general-purpose LLMs, even with advanced techniques like RAG and instruction-finetuning. It turns out that hallucinations are a fundamental technical problem with a concrete solution that changes the LLM objective. That solution is called Memory Tuning.

What is Memory Tuning?

Lamini Memory Tuning is a new way to fine-tune any open LLM by tuning millions of LoRA adapters and selecting across them in a wide Mixture of Experts at inference time. Instead of optimizing average error on everything, Memory Tuning optimizes for zero error on the specific facts you care about, so the model recalls those facts nearly perfectly while still generalizing, with average error, on everything else. It changes the paradigm: the LLM becomes near perfect on your facts while staying pretty good at everything else.
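
To make that concrete, here is a minimal sketch of the idea in plain PyTorch. Everything here (FactLoRA, tune_to_near_zero_loss, the placeholder data, a single linear layer standing in for a full model) is hypothetical and greatly simplified, not Lamini’s actual implementation: each adapter is a small low-rank update on top of frozen base weights, trained until the loss on its assigned facts is effectively zero.

```python
import torch
import torch.nn as nn

class FactLoRA(nn.Module):
    """One low-rank adapter over a frozen linear layer (toy illustration).

    In a real Memory Tuning setup the adapters would wrap layers inside a
    full transformer; here a single linear "LM head" stands in for the model.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Standard LoRA: base output plus a low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def tune_to_near_zero_loss(adapter, fact_batches, max_steps=2000, tol=1e-4):
    """Train one adapter on its small slice of facts until loss is ~0.

    This is the objective change: keep optimizing past a "good average"
    loss until the adapter recalls its assigned facts (near-)exactly.
    """
    opt = torch.optim.AdamW([adapter.A, adapter.B], lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(max_steps):
        running = 0.0
        for x, y in fact_batches:                        # x: features, y: target token ids
            loss = loss_fn(adapter(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            running += loss.item()
        if running / len(fact_batches) < tol:            # effectively zero loss on these facts
            break
    return adapter

# One adapter per slice of facts; the base layer is shared and never updated.
base = nn.Linear(256, 32000)                             # stand-in for a frozen LLM head
golden_gate_expert = tune_to_near_zero_loss(
    FactLoRA(base),
    fact_batches=[(torch.randn(4, 256), torch.randint(0, 32000, (4,)))],  # placeholder data
)
```

Scaled up, there are millions of such adapters, one per slice of facts, but the essential change is the stopping criterion: train to (near-)zero loss on the facts rather than to a low average loss overall.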

With Memory Tuning, the training error (loss) is driven to zero whenever the model is supposed to recall a specific fact, so the model selects exactly the right token, eliminating hallucinations on those facts. It also significantly reduces computational requirements because, at inference time, only the relevant experts (LoRA adapters) are retrieved from an index.
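
Here is a rough sketch of that routing step, again with hypothetical names and a toy embedding function standing in for a real embedding model and vector index:

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model (assumption, for illustration only)."""
    seed = int(hashlib.sha256(text.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

# Offline: index one key per fact expert (LoRA adapter). Names are hypothetical.
expert_index = {
    "golden_gate_facts": embed("Golden Gate Bridge history and incidents"),
    "customer_schema_facts": embed("customer database schema, tables, and columns"),
}

def select_experts(query: str, k: int = 1) -> list[str]:
    """Return the top-k experts for this query.

    Only the selected adapters are loaded and applied on top of the base
    model for this request, so inference cost stays close to running the
    base model alone even when millions of adapters exist in the index.
    """
    q = embed(query)
    ranked = sorted(expert_index, key=lambda name: float(q @ expert_index[name]), reverse=True)
    return ranked[:k]

# With a real embedding model, this query would route to the Golden Gate expert.
print(select_experts("What year did David Aguilar climb the Golden Gate Bridge?"))
```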

Join me at ODSC West 2024, where I’m giving the talk “Removing Hallucinations by 95% with Memory Tuning: A Technical Deep Dive,” to learn more about the conceptual framework and technical implementation details behind Memory Tuning. I’ll share real-world use cases from Fortune 500 companies that have achieved 95% accuracy. In the meantime, if you’d like to try it out yourself, we’re offering $300 in free credit for Lamini On-Demand, which you can apply to our Meta Lamini Text-to-SQL Memory Tuning tutorial. The tutorial walks you through the notebook we created in partnership with Meta and is a companion to our DeepLearning.AI course, Improving Accuracy of LLM Applications.

*****

About the author/ODSC West 2024 speaker:

Sharon Zhou, PhD, CEO & Co-Founder at Lamini

Dr. Sharon Zhou is the co-founder and CEO of Lamini. As a former Stanford faculty member, she led a research group in generative AI and published award-winning papers in the field. Sharon teaches some of the most popular courses on Coursera, including Finetuning LLMs, reaching nearly half a million professionals. She received her PhD in AI from Stanford, advised by Dr. Andrew Ng. Before her PhD, she was an AI product manager at Google. She received her bachelor’s degree in computer science and Classics from Harvard. Sharon has also served as an AI advisor in Washington, D.C. and has been featured on MIT Technology Review’s 35 Under 35 list.

Learn more about Lamini on LinkedIn, X/Twitter, and their website.
