LLM Distillation — Build Enterprise-Grade Applications Like Apple

ODSC - Open Data Science
Nov 6, 2024


Apple announced that Apple Intelligence starts with on-device small language models (SLMs). SLMs are rapidly gaining traction in the ML community: they capture the intelligence of increasingly large and powerful LLMs in much smaller, faster, and more efficient models.

While ChatGPT captured the imagination of billions of users across the globe, enterprise workflows don’t require AI that can answer questions as varied as “what do hippos eat” and “which Spanish director has won the most Oscars.” Companies are finding that they need AI that excels at specific tasks rather than a jack-of-all-trades.

LLM Distillation is a technique that creates smaller, specialized models. These models inherit the language understanding capabilities of their larger counterparts but focus solely on 1–2 key workflows, eliminating unnecessary features for improved efficiency and speed while retaining high quality.

In this article we will walk through the steps required to distill a large, state-of-the-art model into a smaller sibling model, improving the smaller model’s precision (F1 score +3.83%) and accuracy (+8.1%), all while reducing costs to zero by running in Google Colab and achieving a 3.5x speedup relative to the original large model.

1. Select a teacher model: We choose Llama 3.1 405B, currently the best-performing open-source model, as the teacher. As the student, we use the much smaller Llama 3.2 3B, which will learn the particular nuances of this dataset (see the setup sketch after this list).

2. Select a dataset: We use the open-source PubMed dataset to represent an advanced, industry-specific use case that a base foundation model will not have seen in its training data and will not understand out of the box.

3. Split the data: We partition the dataset into a training set and a test set to ensure the training data does not bias our evaluation.

4. Measure more than quality: In addition to quality metrics, we also measure each model's inference speed and cost (see the evaluation sketch below).

5. Fine-tune the student: Next, we gather ground-truth, expert-provided labels for the training set and use them to fine-tune the student model (see the fine-tuning sketch below). Finally, we re-measure the student model's results. As you can see, the results improve significantly, and we can now deploy this fine-tuned student model much faster and at much lower cost.
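
To make steps 1–3 concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries. The dataset ID (pubmed_qa), the hosted endpoint for the 405B teacher, and the exact student checkpoint name are illustrative assumptions, not necessarily the setup used in the tutorial.

```python
# Minimal sketch of steps 1-3, assuming the Hugging Face `transformers` and
# `datasets` libraries. Dataset ID, teacher endpoint, and student checkpoint
# are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Step 1: teacher and student. The 405B teacher is far too large to load in
# Colab, so in practice it is called through a hosted endpoint; the 3B student
# fits on a single Colab GPU.
TEACHER_ENDPOINT = "https://example-provider.com/v1/llama-3.1-405b"  # hypothetical
STUDENT_ID = "meta-llama/Llama-3.2-3B-Instruct"

student_tokenizer = AutoTokenizer.from_pretrained(STUDENT_ID)
student_model = AutoModelForCausalLM.from_pretrained(STUDENT_ID, device_map="auto")

# Step 2: a PubMed-based dataset as the domain-specific corpus.
dataset = load_dataset("pubmed_qa", "pqa_labeled", split="train")

# Step 3: hold out a test set so evaluation is not biased by the training data.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```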
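
For step 4, a simple evaluation loop can capture quality and latency in one pass. The sketch below assumes a classification-style task (PubMedQA's yes/no/maybe labels), scikit-learn for the metrics, and a hypothetical generate_answer(example) wrapper around whichever model is being tested; field names such as final_decision depend on the dataset. Cost can be estimated from the same loop (API token pricing for the teacher versus a free Colab GPU for the student).

```python
# Sketch of step 4: quality (accuracy, macro F1) plus per-example latency.
import time
from sklearn.metrics import accuracy_score, f1_score

def evaluate(generate_answer, examples):
    """Return quality metrics and average per-example latency for one model.

    `generate_answer(example) -> str` is a hypothetical wrapper around the
    model under test (hosted teacher API or local student).
    """
    predictions, references, latencies = [], [], []
    for ex in examples:
        start = time.perf_counter()
        predictions.append(generate_answer(ex))           # model under test
        latencies.append(time.perf_counter() - start)     # wall-clock latency
        references.append(ex["final_decision"])           # gold label (dataset-specific field)
    return {
        "accuracy": accuracy_score(references, predictions),
        "f1_macro": f1_score(references, predictions, average="macro"),
        "avg_latency_s": sum(latencies) / len(latencies),
    }

# Compare the hosted teacher against the local student on the held-out test set.
# teacher_metrics = evaluate(call_teacher_api, test_ds)    # hypothetical wrappers
# student_metrics = evaluate(call_student_local, test_ds)
```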
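
And for step 5, a supervised fine-tuning pass over the expert-labeled training split. This sketch uses the plain transformers Trainer with a causal language-modeling objective; the prompt format, field names, and hyperparameters are assumptions, and student_model, student_tokenizer, and train_ds come from the setup sketch above. After training, re-running the evaluation loop on test_ds gives the before/after comparison.

```python
# Sketch of step 5: supervised fine-tuning of the student on expert labels.
from transformers import (DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Llama tokenizers typically ship without a pad token; reuse EOS for padding.
if student_tokenizer.pad_token is None:
    student_tokenizer.pad_token = student_tokenizer.eos_token

def to_features(ex):
    # Hypothetical prompt format: question followed by the expert-provided label.
    text = f"Question: {ex['question']}\nAnswer: {ex['final_decision']}"
    return student_tokenizer(text, truncation=True, max_length=512)

tokenized_train = train_ds.map(to_features, remove_columns=train_ds.column_names)

trainer = Trainer(
    model=student_model,
    args=TrainingArguments(
        output_dir="llama-3.2-3b-pubmed-distilled",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        logging_steps=10,
    ),
    train_dataset=tokenized_train,
    # mlm=False -> causal LM objective: labels are the input tokens themselves.
    data_collator=DataCollatorForLanguageModeling(student_tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("llama-3.2-3b-pubmed-distilled")
```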

Join my ODSC West tutorial for a deeper exploration of LLM distillation. We’ll cover everything from advanced prompt engineering to parameter optimization, giving you hands-on experience with fine-tuning these powerful models.

About the Author/ODSC West 2024 Speaker:

Ivan Lee graduated with a Computer Science B.S. from Stanford University, then dropped out of his master’s program to found his first mobile gaming company, Loki Studios. After raising institutional funding and building a profitable game, Loki was acquired by Yahoo. Lee spent the next 10 years building AI products at Yahoo and Apple and discovered there was a gap in serving the rapid evolution of Natural Language Processing (NLP) technologies. He built Datasaur to focus on democratizing access to NLP and LLMs. Datasaur has raised $8m in venture funding from top-tier investors such as Initialized Capital, Greg Brockman (President, OpenAI), and Calvin French-Owen (CTO, Segment), and serves companies such as Google, Netflix, Qualtrics, Spotify, the FBI, and more.
