Unlocking the Potential of Small Language Models: Insights from Ivan Lee

The AI landscape is evolving at an unprecedented pace. While large language models (LLMs) like OpenAI’s GPT-4 and Google’s Gemini dominate headlines, a quieter revolution is unfolding in the world of small language models (SLMs). Ivan Lee, CEO and founder of Datasaur, shared his expertise on this topic during a recent podcast hosted by ODSC. Ivan’s insights provide a roadmap for leveraging SLMs, improving model flexibility, and deploying AI effectively in 2025 and beyond.
This is a recap of a recent episode of ODSC’s Ai X Podcast. You can listen to the full podcast on Spotify, Apple, and SoundCloud.
The Case for Small Language Models
SLMs, which typically range from about 1 billion to 8 billion parameters, offer a compelling alternative to their larger counterparts. Ivan emphasizes that while LLMs deliver remarkable general-purpose capabilities, they often come at the cost of higher computational requirements, increased latency, and hefty expenses. For many enterprises, a tailored SLM trained for specific tasks can be a smarter, more efficient choice.
Ivan Lee draws an analogy to cooking: “I’m a terrible cook, but if I trained with a Michelin-star chef for a month, I could make a darn good omelet. Similarly, SLMs don’t need to excel at everything — they just need to specialize in a specific domain.” By fine-tuning SLMs with domain-specific data, organizations can achieve high accuracy and efficiency at a fraction of the cost.
Why Flexibility Matters
One of the most pressing challenges in AI today is navigating the rapidly shifting landscape of models and technologies. Ivan Lee stresses the importance of flexibility, urging organizations to avoid locking themselves into a single model or vendor. “It’s prudent not to commit yourself to any one model or company,” he explains. “The battlefield is evolving, and being adaptable is key to staying competitive.”
To achieve flexibility, Ivan advocates for:
- Abstraction Layers: Implement a standardized API endpoint for your AI models. This allows you to swap models or update underlying technologies without disrupting your applications.
- Data Portability: Save your fine-tuning datasets and configurations. These assets enable you to quickly retrain or transition to a different model as better options become available.
- Continuous Evaluation: Regularly test your models using real-world prompts and ground-truth datasets to ensure they meet your evolving requirements.
Simplifying Fine-Tuning
Fine-tuning, once considered a complex and resource-intensive process, is now more accessible than ever. Ivan Lee outlines a straightforward three-step approach:
- Select a Model: Choose an appropriate base model for your task, such as LLaMA, GPT, or Claude.
- Prepare Your Data: Create a two-column dataset in which one column contains prompts and the other the desired responses.
- Fine-Tune: Use tools like Datasaur’s platform or a cloud provider’s APIs to fine-tune the model. In many cases, this can be completed within a few hours.
“Fine-tuning isn’t as daunting as it seems,” Ivan assures listeners. “You can start with just a few thousand examples and achieve remarkable improvements for specific tasks.”
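The data-preparation step above can be sketched in a few lines. This converts (prompt, response) pairs into a chat-style JSONL layout that many fine-tuning APIs accept; the exact schema varies by provider, so treat this structure as an assumption and check your provider's documentation.

```python
import json


def rows_to_jsonl(rows: list[tuple[str, str]]) -> str:
    """Turn (prompt, response) pairs into chat-format JSONL,
    one training example per line (schema is provider-dependent)."""
    lines = []
    for prompt, response in rows:
        lines.append(json.dumps({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": response},
            ]
        }))
    return "\n".join(lines)


rows = [
    ("What is an SLM?", "A small language model, roughly 1-8B parameters."),
    ("What is RAG?", "Retrieval-augmented generation."),
]
jsonl = rows_to_jsonl(rows)
```

The resulting file is what you would upload to a fine-tuning endpoint in step three.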
The Role of Retrieval-Augmented Generation (RAG)
While fine-tuning remains a cornerstone of SLM optimization, Ivan Lee acknowledges the growing importance of retrieval-augmented generation (RAG). RAG combines pre-trained models with external knowledge bases to generate context-aware responses in real time. This approach is particularly useful for applications requiring up-to-date or domain-specific information.
Ivan is quick to note that fine-tuning and RAG are not mutually exclusive. “You can fine-tune a model to understand a specific domain and then enhance it with RAG for real-time adaptability,” he explains. This combination provides the best of both worlds: efficiency and contextual accuracy.
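The RAG pattern described here can be reduced to a toy sketch: score documents against the query, keep the top few, and prepend them to the prompt. The token-overlap scorer below is a deliberate simplification; a production system would use embedding similarity and a vector store.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, doc: str) -> int:
    """Naive relevance: count shared tokens. A real system
    would use embedding similarity instead."""
    return len(tokens(query) & tokens(doc))


def build_rag_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Retrieve the k most relevant documents and build a
    context-grounded prompt for the model."""
    top = sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(f"- {d}" for d in top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "SLMs typically range from 1 to 8 billion parameters.",
    "Ivan Lee is the CEO and founder of Datasaur.",
    "RAG retrieves external documents at query time.",
]
prompt = build_rag_prompt("How many parameters do SLMs have?", docs, k=1)
```

A fine-tuned SLM sits behind `build_rag_prompt` unchanged, which is exactly the "best of both worlds" combination Ivan describes.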
Cost Considerations and ROI
A recurring theme in Ivan’s discussion is the need to evaluate the return on investment (ROI) for AI deployments. He points out that many organizations spend exorbitant amounts on LLMs for tasks that could be accomplished more cost-effectively with SLMs or traditional NLP techniques.
Ivan highlights a significant cost-saving opportunity: “With small language models, you can achieve up to 17 times the cost-efficiency compared to large models. For organizations running millions of API calls, this translates to substantial savings.”
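The savings are easy to quantify with back-of-the-envelope arithmetic. The prices below are illustrative placeholders, not real vendor rates; only the roughly 17x ratio comes from Ivan's quote.

```python
def monthly_cost(calls: int, tokens_per_call: int,
                 price_per_m_tokens: float) -> float:
    """Total cost of an API workload priced per million tokens."""
    return calls * tokens_per_call / 1_000_000 * price_per_m_tokens


# Hypothetical workload: 2M calls/month at ~1,000 tokens each.
llm_cost = monthly_cost(2_000_000, 1_000, 10.0)        # placeholder LLM rate
slm_cost = monthly_cost(2_000_000, 1_000, 10.0 / 17)   # ~17x cheaper per the quote
```

At millions of calls per month, even a modest per-token gap compounds into the "substantial savings" Ivan describes.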
Open Source vs. Proprietary Models
The debate between open-source and proprietary models continues to be a focal point for enterprises. Ivan provides a balanced perspective:
- Proprietary Models: These often deliver superior out-of-the-box performance and integrate seamlessly with existing ecosystems. However, they come with higher costs and limited flexibility.
- Open-Source Models: While open-source models like LLaMA and Mistral require more expertise to implement, they offer unparalleled customization and cost advantages.
“By 2025, we’ll see a more level playing field between open-source and proprietary models,” Ivan predicts. “As open-source solutions catch up in quality, enterprises will increasingly adopt them for their cost-effectiveness and adaptability.”
Deployment Best Practices
Effective deployment is crucial for maximizing the value of AI investments. Ivan shares several best practices:
- Use Specialized Hardware: Leverage GPUs and AI-specific chips like AWS Inferentia to optimize inference costs and latency.
- Adopt a Layered Architecture: Design your deployment stack to accommodate multiple models and embedding layers, ensuring you can switch components as needed.
- Monitor for Drift: Implement prompt unit testing to detect and address changes in model behavior over time.
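The drift-monitoring bullet can be sketched as a small prompt unit-test harness: run a fixed suite of prompts on a schedule and flag any whose outputs stop passing their checks. The model callable and test cases below are hypothetical stand-ins.

```python
from typing import Callable

# Each case is a prompt plus a predicate the model's output must satisfy.
PromptCase = tuple[str, Callable[[str], bool]]


def run_prompt_tests(model_fn: Callable[[str], str],
                     cases: list[PromptCase]) -> list[tuple[str, str]]:
    """Run fixed prompts against a model and collect failures.
    Schedule this periodically to catch behavioral drift."""
    failures = []
    for prompt, check in cases:
        output = model_fn(prompt)
        if not check(output):
            failures.append((prompt, output))
    return failures


cases: list[PromptCase] = [
    ("Reply with exactly: OK", lambda out: out.strip() == "OK"),
    ("What is 2 + 2?", lambda out: "4" in out),
]


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; always answers 'OK'."""
    return "OK"


failed = run_prompt_tests(fake_model, cases)
```

A growing failure list after a model or vendor update is the signal to re-evaluate before the change reaches users.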
Beyond Text: The Rise of Multimodal Models
While much of today’s AI focus is on text-based applications, Ivan sees significant potential in multimodal models that integrate text, images, and other data types. However, he cautions against using multimodal models for tasks better suited to domain-specific solutions. “Do you want a Swiss Army knife, or do you want a tool that’s exceptionally good at one thing?” he asks.
Looking Ahead in 2025
Ivan concludes with a bold prediction: SLMs will become the dominant AI solution for most enterprises by 2025. “Not everyone needs a supercomputer. SLMs democratize AI by making it more accessible, efficient, and affordable,” he asserts.
For businesses and developers alike, the message is clear: Focus on building adaptable, cost-effective solutions that align with your specific needs. As the AI ecosystem continues to evolve, those who prioritize flexibility and ROI will be best positioned to thrive.