AI Development Lifecycle: Learnings on What Changed with LLMs

ODSC - Open Data Science
4 min read · Feb 5, 2025


The recent fluctuations in big tech stocks have underscored a pressing issue: many large language model-based products fail to achieve significant user traction. This problem often stems from inadequate user value, underwhelming performance, and an absence of robust best practices for building and deploying LLM tools as part of the AI development lifecycle.

At ODSC Europe 2024, Noe Achache, Engineering Manager & Generative AI Lead at Sicara, spoke about the performance challenges and outlined key lessons and best practices for creating successful, high-performing LLM-based solutions.

You can watch the full video of this session here and download the slides here.

The Evolving AI Development Lifecycle

Despite the revolutionary capabilities of LLMs, the core development lifecycle established by traditional natural language processing remains essential: Plan, Prepare Data, Engineer Model, Evaluate, Deploy, Operate, and Monitor. LLMs, while accelerating some processes, introduce complexities that require new tools and methodologies. The shift from structured outputs to generating free-form text adds layers of unpredictability, making it imperative to implement the traditional lifecycle with added vigilance and innovation.

Key challenges include:

  • Preparing robust datasets for evaluation.
  • Engineering scalable and adaptable solutions.
  • Monitoring dynamic model behavior in real-world applications.

Common Pitfalls in LLM Development

Neglecting Data Preparation: Poorly prepared data leads to subpar evaluation and iterations, reducing generalizability and stakeholder confidence. Real-world applications often expose gaps that proper data preparation could have preempted.

Lack of Comprehensive Model Evaluation: Without rigorous evaluation, developers face pointless iterations, limited insights into model behavior, and challenges in establishing product reliability.

Inadequate Monitoring: Neglecting to monitor user interactions and data drifts hampers insights into product adoption and long-term performance.

Real-World Application: Text-to-SQL in Healthcare

In his talk, Noe illustrated these pitfalls with a real-world case study. Consider a healthcare consultancy managing a vast database of drug information. Previously, consultants spent weeks manually querying data. An LLM-based solution was introduced to translate natural language queries into SQL, streamlining the process. The solution integrated OpenAI models for semantic retrieval and summarization, with Langfuse for monitoring.
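As a rough illustration of the pattern (not the production system from the talk), here is a minimal text-to-SQL sketch using the OpenAI Python SDK. The schema, model name, and prompt are hypothetical stand-ins:

```python
# Minimal text-to-SQL sketch; illustrative only, not the system from the talk.
# Assumes the OpenAI Python SDK (`pip install openai`) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

# Hypothetical schema excerpt; a real system would introspect the database.
SCHEMA = """
CREATE TABLE drugs (id INT, name TEXT, indication TEXT, approval_date DATE);
CREATE TABLE trials (id INT, drug_id INT, phase INT, outcome TEXT);
"""

def question_to_sql(question: str) -> str:
    """Translate a natural-language question into a SQL query string."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; the talk does not name a model
        temperature=0,        # keep output as deterministic as possible
        messages=[
            {"role": "system",
             "content": f"You write SQL for this schema:\n{SCHEMA}\n"
                        "Return only the SQL query, nothing else."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

print(question_to_sql("Which drugs were approved after 2020?"))
```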

This success hinged on careful execution of the development lifecycle:

  • A 100-question evaluation dataset ensured the solution aligned with user needs.
  • Clear definitions of scope and expectations minimized iterative inefficiencies.

Best Practices for Success

Data Preparation

  • User-Driven Evaluation Datasets: Interview users to gather real-world scenarios. In the healthcare case, this included 100 representative questions and 10 varied summaries (see the example sketch after this list).
  • Clarify Scope Early: Data collection helps define project scope, guiding whether to use free-form generation or classification approaches.
  • Set Expectations: For complex outputs like summaries, aligning on what “good” looks like is critical to avoid iterative confusion.
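To make the first point concrete, an evaluation dataset for the text-to-SQL case can be as simple as question/reference pairs. The rows below are invented stand-ins for the 100 user-sourced questions mentioned in the talk:

```python
# Hypothetical evaluation entries; the real project gathered ~100 questions
# from user interviews, and these invented rows only show the shape.
eval_dataset = [
    {
        "question": "How many phase 3 trials did drug X run?",
        "reference_sql": "SELECT COUNT(*) FROM trials WHERE drug_id = 42 AND phase = 3;",
    },
    {
        "question": "List drugs approved after 2020 for diabetes.",
        "reference_sql": "SELECT name FROM drugs "
                         "WHERE approval_date > '2020-12-31' AND indication = 'diabetes';",
    },
]
```

Keeping the reference answers alongside the questions makes both manual review and automated scoring straightforward later in the lifecycle.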

Model Evaluation

The two primary methods are manual evaluation and using LLMs as judges.

  • Manual Evaluation: Reliable but resource-intensive. Use it for early understanding and to refine automated pipelines.
  • LLM Judges: While cost-effective, they can introduce bias and require careful calibration. For tasks with inherent non-determinism, run multiple iterations and average the results, as in the sketch below.
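A minimal sketch of that averaging loop, assuming the OpenAI SDK; the judge model, prompt, and 1-5 scale are illustrative choices, not prescriptions from the talk:

```python
# LLM-as-judge sketch: average several judgments to smooth non-determinism.
# Model choice, prompt, and the 1-5 scale are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

def judge_once(question: str, answer: str) -> int:
    """Ask the judge model for a single 1-5 quality score."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption; often a stronger model than the one under test
        messages=[
            {"role": "system",
             "content": "Rate how well the answer addresses the question "
                        "on a scale of 1 to 5. Reply with the number only."},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip())

def judge(question: str, answer: str, runs: int = 5) -> float:
    """Average several runs to reduce judge-to-judge variance."""
    return sum(judge_once(question, answer) for _ in range(runs)) / runs
```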

Monitoring

Adopt Specialized Tools: Platforms like Langfuse enable tracking token usage, costs, user queries, and even the LLM’s decision-making process.
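As a minimal sketch of what such instrumentation can look like, the Langfuse Python SDK offers an observe decorator that records inputs, outputs, timing, and call nesting as traces. Import paths differ across SDK versions, and the functions here are placeholders rather than the talk's actual pipeline:

```python
# Langfuse tracing sketch (v2-style decorator API; import paths differ across
# SDK versions). Requires `pip install langfuse` and LANGFUSE_* credentials.
from langfuse.decorators import observe

@observe()  # each decorated call becomes a span in the trace
def question_to_sql(question: str) -> str:
    return "SELECT 1;"  # placeholder for the real LLM call

@observe()  # nesting across decorated functions is captured automatically
def answer_question(question: str) -> str:
    sql = question_to_sql(question)
    return f"Ran: {sql}"

print(answer_question("Which drugs were approved after 2020?"))
```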

Analyze Production Usage: Understanding how users interact with the model informs future iterations and feature updates.

The Future of LLM Development

Despite their transformative potential, LLMs require methodical execution across traditional lifecycle steps. Emerging tools, tailored to LLMs’ unique challenges, are becoming indispensable.

For instance:

  • Data Preparation: Google Sheets.
  • Model Engineering: DVC (Data Version Control).
  • Evaluation: Notion.
  • Deployment: Lamini.
  • Monitoring: Langfuse.

Conclusion on the AI Development Lifecycle

To wrap up his talk, Noe Achache acknowledged that while the development steps remain consistent, the methodologies and tools required to build impactful LLM applications are evolving. By adopting these practices, data professionals can drive innovation while mitigating risks, ensuring LLM-based solutions achieve both traction and reliability.

ODSC East 2025, coming up May 13th-15th in Boston, MA (and virtually), is the best AI conference for AI builders and data scientists there is. Come learn from experts representing the biggest names in AI like Google, Microsoft, Amazon, and others, network with hundreds of other like-minded individuals, and get hands-on with everything you need to excel in the field.

Register now for 50% off!

If you’re antsy and want to start upskilling sooner rather than later, you can also check out the 4-week virtual training summit, AI Builders! Starting January 15th and running every Wednesday and Thursday until February 6th, this event is designed to equip data scientists, ML and AI engineers, and innovators with the latest advancements in large language models (LLMs), AI agents, and Retrieval-Augmented Generation (RAG).

Register now for only $299!
