How To Set the Right KPIs for GenAI

Sep 19, 2024

If 2023 was the year of experimentation, 2024 is about deploying generative AI in the enterprise. With organizations ramping up their investments in GenAI technologies, the next step is measuring and proving the value of these investments.

However, unlike other projects, setting the right KPIs for generative AI requires a slightly different mindset.

As one Atlassian author puts it, “Output over time is a good way to measure the impact of machines, not knowledge workers.” If we measure the value of AI only by the time it frees up for humans, and expect that time to turn into increased productivity on its own, we risk setting ourselves up for failure.

He goes on to say, “When we talk about productivity, we are inherently and inescapably talking about output — not outcomes. When we talk about increasing productivity, we’re really talking about increasing output.”

Instead, we should consider measuring the outcomes (as opposed to output) of using AI.

Understanding the challenges and opportunities of GenAI will be key to ensuring that enterprises not only adopt these technologies but also get meaningful and measurable benefits from them.

Why GenAI KPIs Should Mirror Workforce Evaluation

Organizations often measure AI value in terms of time saved, but that only scratches the surface. Relying on time saved alone can lead to mismatched expectations about ROI, especially once you account for the cost of running GenAI solutions at scale.
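To make the ROI point concrete, here is a minimal sketch of the arithmetic. The function name and every figure in it (hours saved, hourly cost, request volume, per-request cost, fixed cost) are hypothetical assumptions for illustration, not numbers from any real deployment.

```python
def net_monthly_value(hours_saved, hourly_cost, requests, cost_per_request, fixed_cost):
    """Value of the time saved minus the cost of running the solution.

    All inputs are hypothetical; plug in your own measurements.
    """
    time_value = hours_saved * hourly_cost
    running_cost = requests * cost_per_request + fixed_cost
    return time_value - running_cost


# Hypothetical pilot: a small team, modest request volume.
pilot = net_monthly_value(
    hours_saved=200, hourly_cost=50,
    requests=10_000, cost_per_request=0.05, fixed_cost=2_000,
)

# Hypothetical rollout: hours saved grow 10x, but request volume grows 200x.
at_scale = net_monthly_value(
    hours_saved=2_000, hourly_cost=50,
    requests=2_000_000, cost_per_request=0.05, fixed_cost=20_000,
)

print(f"pilot net value:    {pilot:,.0f}")
print(f"at-scale net value: {at_scale:,.0f}")
```

Under these made-up numbers the pilot comes out positive while the rollout comes out negative, because running costs grow faster than the time saved. Time saved is only one side of the ledger.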

The real value lies in how well AI enhances task-specific success metrics, like reducing fulfillment time in customer service or increasing engagement in marketing.

In general, your KPIs should reflect the outcomes of using AI, not the model itself. The good news: If humans have been performing a task with some consistency within your organization, you already have a great set of metrics to measure the value of the AI-enabled solution.

These metrics might include efficiency in completing tasks, output per employee, error rate, and customer satisfaction, to name a few.
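As a rough sketch of how existing workforce metrics can double as GenAI KPIs, the snippet below compares a human-only baseline with an AI-enabled workflow. The `Kpi` class, the metric names, and all values are hypothetical; the only real idea is that each metric is compared against its own baseline, with the sign flipped for metrics where lower is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    baseline: float          # measured when humans performed the task alone
    with_ai: float           # measured for the AI-enabled workflow
    higher_is_better: bool   # False for metrics such as error rate


def improvement(kpi: Kpi) -> float:
    """Relative change versus baseline, signed so that positive means better."""
    change = (kpi.with_ai - kpi.baseline) / kpi.baseline
    return change if kpi.higher_is_better else -change


# Hypothetical measurements for illustration only.
kpis = [
    Kpi("tasks completed per employee per day", 20.0, 25.0, True),
    Kpi("error rate", 0.08, 0.06, False),
    Kpi("customer satisfaction (1-5)", 4.0, 4.2, True),
]

for kpi in kpis:
    print(f"{kpi.name}: {improvement(kpi):+.1%}")
```

Because every KPI is expressed relative to how the task performed before AI, the comparison stays about outcomes rather than about the model.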

In some cases, individuals may spend more time on a task but can explore a much larger set of creative possibilities or perform work at higher quality.

Where Model Benchmarks and KPIs for GenAI Intersect

While GenAI benchmarks give us a general understanding of a model’s capabilities, company-specific KPIs are great for measuring performance against business goals and use cases. When combined, they become the gold standard for evaluating AI models.

Here are just a few areas where benchmarks and KPIs intersect:

Standardized benchmarks offer a broad assessment of an AI model’s capabilities, while KPIs provide a specific measure of how well the model performs in the organization’s unique context.
Benchmarks can be used for initial screening to select promising models. Once selected, these models can be evaluated continuously using KPIs to ensure ongoing performance and alignment with business objectives.
Benchmarks help set expectations about a model’s potential, while KPIs measure its actual real-world impact on the business.
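The two-stage process described above (benchmarks for initial screening, KPIs for ongoing evaluation) might be sketched as follows. The model names, scores, threshold, and field names are all hypothetical placeholders.

```python
# Hypothetical candidates. "benchmark" is a public benchmark score; "kpi" is a
# business metric measured in a pilot, so it is unknown until a model is piloted.
candidates = [
    {"model": "model-a", "benchmark": 0.90, "kpi": 0.61},
    {"model": "model-b", "benchmark": 0.84, "kpi": 0.72},
    {"model": "model-c", "benchmark": 0.55, "kpi": None},  # screened out, never piloted
]


def shortlist(models, min_benchmark):
    """Stage 1: use benchmarks to pick promising models."""
    return [m for m in models if m["benchmark"] >= min_benchmark]


def rank_by_kpi(models):
    """Stage 2: rank shortlisted models by their measured business KPI."""
    return sorted(models, key=lambda m: m["kpi"], reverse=True)


ranked = rank_by_kpi(shortlist(candidates, min_benchmark=0.80))
for m in ranked:
    print(m["model"], m["benchmark"], m["kpi"])
```

In this made-up data the model with the best benchmark score is not the one with the best KPI, which is the gap between expected potential and real-world impact.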

Ultimately, businesses must learn to separate a model’s performance from its efficacy in practice.

All models are wrong some of the time. But it doesn’t matter if the model is right 99% of the time if you can’t act on the results or integrate them into a workflow.

Conversely, a less accurate model may be more valuable if integrated correctly into a workflow where expert humans can seamlessly intervene as necessary.
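One way to reason about this trade-off is a back-of-the-envelope expected value per task. The formula and every parameter below (value of a correct result, cost of an uncaught error, share of results that can be acted on, share of errors caught by human review, review cost) are simplifying assumptions made for illustration, not an established method.

```python
def expected_value_per_task(accuracy, usable_rate, catch_rate, review_cost,
                            value_correct=10.0, cost_error=50.0):
    """Hypothetical expected value of one task handled by the model.

    accuracy    : share of outputs that are correct
    usable_rate : share of outputs the workflow can actually act on
    catch_rate  : share of wrong outputs caught by human review
    review_cost : cost of that human review, per task
    Caught errors are assumed to cost nothing and to deliver no value.
    """
    gain = accuracy * value_correct
    loss = (1 - accuracy) * (1 - catch_rate) * cost_error
    return usable_rate * (gain - loss) - review_cost


# 99% accurate, but only 10% of results can be acted on and nobody reviews them.
accurate_unintegrated = expected_value_per_task(
    accuracy=0.99, usable_rate=0.10, catch_rate=0.0, review_cost=0.0)

# 90% accurate, fully integrated, with experts catching 90% of the errors.
integrated_with_review = expected_value_per_task(
    accuracy=0.90, usable_rate=1.0, catch_rate=0.9, review_cost=0.5)

print(f"accurate but unintegrated: {accurate_unintegrated:.2f}")
print(f"integrated with review:    {integrated_with_review:.2f}")
```

With these assumed numbers, the less accurate model delivers more value per task because its results are usable and its errors are mostly caught.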

Originally posted on OpenDataScience.com

Written by ODSC - Open Data Science
