Building AI Applications with Foundation Models: Key Insights from Chip Huyen
In a recent episode of ODSC’s AiX podcast, we had the pleasure of speaking with Chip Huyen, an AI expert and bestselling author of Designing Machine Learning Systems and AI Engineering: Building Applications with Foundation Models. Chip shared her insights into the rapidly evolving discipline of AI engineering and provided actionable advice for professionals navigating the ever-changing AI landscape. Below, we delve into some of the highlights from this enlightening conversation.
You can listen to the full podcast on Spotify, Apple, and SoundCloud.
What is AI Engineering?
Chip Huyen began by explaining how AI engineering has emerged as a distinct discipline, evolving out of traditional machine learning engineering. While machine learning engineers focus on building models, AI engineers often work with pre-trained foundation models, adapting them to specific use cases. This shift has made AI engineering more multidisciplinary, incorporating elements of data science, software engineering, and system design.
Chip highlighted the key differences:
- Model Usage: Machine learning engineers often build models from scratch, while AI engineers primarily adapt existing foundation models.
- Evaluation Challenges: Foundation models are open-ended, making their evaluation significantly more complex than traditional machine learning models.
- Integration with Products: AI engineering requires a closer relationship with product design, as applications increasingly start with user needs rather than data availability.
The Core of AI Engineering: Data and Compute
Two factors consistently underpin successful AI applications: data and compute. According to Chip Huyen, foundation models thrive on large datasets and robust computational infrastructure. However, these requirements also present challenges, particularly for smaller organizations with limited resources.
Chip emphasized the importance of “dataset engineering” — a concept she explores in depth in her book. Dataset engineering involves curating and optimizing training data to ensure quality and relevance. For example, techniques like back-translation can help verify the quality of AI-generated data, improving model performance.
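As an illustration of the idea (not Chip’s exact recipe), here is a minimal sketch of back-translation-style verification for synthetic instruction/response pairs: ask a model to reconstruct the instruction from the response, then keep only pairs where the reconstruction resembles the original. It assumes the OpenAI Python client and an illustrative model name; any LLM client and a stronger similarity measure (such as embeddings) could be substituted.

```python
from difflib import SequenceMatcher

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any LLM client would do


def reconstruct_instruction(response: str) -> str:
    # "Back-translate" the response into the instruction it most plausibly answers.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": "Write the instruction that this response most "
                       f"likely answers, and nothing else:\n\n{response}",
        }],
    )
    return completion.choices[0].message.content.strip()


def keep_pair(instruction: str, response: str, threshold: float = 0.5) -> bool:
    # Keep a synthetic (instruction, response) pair only if the reconstructed
    # instruction resembles the original one.
    reconstructed = reconstruct_instruction(response)
    similarity = SequenceMatcher(
        None, instruction.lower(), reconstructed.lower()
    ).ratio()
    return similarity >= threshold
```

In practice the string-similarity check would likely be replaced with an embedding comparison, but the filtering loop stays the same.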
Evaluating Foundation Models: A Growing Challenge
Evaluation is one of the most critical yet challenging aspects of working with foundation models. Unlike traditional machine learning tasks, where outputs are binary or categorical, foundation models produce nuanced, open-ended outputs that are harder to assess. Chip Huyen outlined three approaches to evaluation:
- Functional Correctness: Assessing how well a model performs a specific task, such as generating accurate code or summaries.
- AI as a Judge: Using AI models to evaluate the outputs of other models, though this requires careful design to ensure reliability (see the sketch after this list).
- Comparative Evaluation: Comparing multiple outputs to determine which is better, even if absolute quality is hard to measure.
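To make the “AI as a judge” and comparative-evaluation ideas concrete, here is a minimal sketch of a pairwise judge. The OpenAI client, model choice, rubric, and tie handling are all assumptions for illustration, not a prescribed setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are grading two answers to the same question.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Which answer is more helpful and accurate? Reply with exactly "A", "B", or "TIE"."""


def judge_pair(question: str, answer_a: str, answer_b: str) -> str:
    # Comparative evaluation: ask a (hopefully stronger) model which of two
    # candidate outputs is better, rather than scoring each in isolation.
    completion = client.chat.completions.create(
        model="gpt-4o",  # illustrative judge model
        temperature=0,   # keep judging as deterministic as possible
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, answer_a=answer_a, answer_b=answer_b
            ),
        }],
    )
    verdict = completion.choices[0].message.content.strip().upper()
    return verdict if verdict in {"A", "B", "TIE"} else "TIE"
```

Known caveats such as position bias (judges favoring whichever answer appears first) are typically handled by also judging the swapped order and keeping only consistent verdicts.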
Chip’s insights into evaluation are particularly relevant for organizations integrating AI into mission-critical applications, where silent failures can have significant consequences.
Building Real-World Applications: Lessons and Mistakes
Chip Huyen candidly shared common mistakes she has observed in AI application development:
- Overengineering: Many teams rush to use generative AI for tasks that simpler methods, such as decision trees, could handle more effectively.
- Lack of User-Centric Design: Teams often fail to understand user needs, leading to applications that produce technically correct but unhelpful outputs.
- Inadequate Prompt Engineering: Prompts should be treated as critical components of the system, with version control and transparency to ensure consistent performance (a sketch follows at the end of this section).
She emphasized the importance of starting small with low-risk use cases, iterating based on user feedback, and gradually scaling up.
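On the prompt-engineering point above, one lightweight way to treat prompts as versioned, reviewable artifacts is to keep them in files under source control and record exactly which version produced each output. The file layout and names below are assumptions for illustration, not a prescribed tool.

```python
import hashlib
from pathlib import Path

PROMPT_DIR = Path("prompts")  # assumed layout: prompts/summarize_v3.txt, etc.


def load_prompt(name: str, version: str) -> tuple[str, str]:
    """Load a prompt template from a version-controlled file and return it
    with a short content hash, so every logged output can be traced back
    to the exact prompt text that produced it."""
    text = (PROMPT_DIR / f"{name}_{version}.txt").read_text()
    digest = hashlib.sha256(text.encode()).hexdigest()[:12]
    return text, digest


# Usage: log the prompt name, version, and hash alongside each model call.
template, prompt_hash = load_prompt("summarize", "v3")
print(f"Using prompt summarize/v3 ({prompt_hash})")
```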
Recommendations for Resource-Constrained Teams
For teams with limited GPU resources, Chip offered practical advice:
- Start with open-source models and fine-tune them on private data using parameter-efficient techniques like LoRA (Low-Rank Adaptation), as in the sketch after this list.
- Use quantization libraries to reduce the computational load, making it feasible to train and deploy models on limited hardware.
- Focus on data quality over quantity. Curated datasets can yield better results than massive, unfiltered datasets.
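As a concrete illustration of the first two points, here is a minimal sketch combining 4-bit quantization (via bitsandbytes) with LoRA adapters (via the peft library) on a Hugging Face model. The model name, adapter rank, and target modules are illustrative assumptions; the right values depend on your base model and data.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "mistralai/Mistral-7B-v0.1"  # illustrative; any open-weights causal LM

# Load the base model in 4-bit to fit on limited GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

# Attach small trainable LoRA adapters instead of updating all weights.
lora_config = LoraConfig(
    r=8,                      # adapter rank; higher = more capacity, more memory
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # common choice for Llama/Mistral-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

From there, the adapted model can be trained with a standard training loop on the curated private dataset, and only the small adapter weights need to be saved and deployed.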
The Future of AI Engineering
Looking ahead, Chip is excited about the growing capabilities of agentic AI and the potential for foundation models to interact more effectively with real-world tasks. She also anticipates advancements in compute hardware, which could challenge the dominance of GPUs and unlock new possibilities for AI development.
Finally, Chip stressed the importance of continuous learning and networking. Surrounding yourself with knowledgeable peers and engaging in forums like ODSC can help professionals stay ahead in this fast-paced field.
Key Takeaways
Chip Huyen’s insights provide a roadmap for professionals looking to build robust AI applications with foundation models:
- Treat evaluation as a core part of the development process.
- Invest in dataset engineering to maximize the value of your data.
- Start small, iterate based on user feedback, and scale up thoughtfully.
- Leverage open-source models and efficient fine-tuning techniques to overcome resource constraints.
- Stay engaged with the community to keep pace with emerging trends and technologies.
For more in-depth guidance, consider reading Chip’s latest book, AI Engineering: Building Applications with Foundation Models, or attending her sessions at ODSC conferences.
ODSC East 2025, coming up May 13th-15th in Boston, MA, and virtually, is the best AI conference for AI builders and data scientists. Come learn from experts representing the biggest names in AI, such as Google, Microsoft, and Amazon, network with hundreds of other like-minded individuals, and get hands-on with everything you need to excel in the field.