Explaining the Black Box: From Beta Coefficients to SHAP Values

ODSC - Open Data Science · Jan 6, 2025

Imagine applying for a credit card and receiving a rejection. Frustrated, you ask, “Why?” This simple yet critical question highlights the need for explainability in machine learning. As algorithms increasingly shape key decisions in finance, healthcare, and beyond, understanding the rationale behind predictions becomes essential. In this blog, we explore a recent talk by Giorgio Clauser, Head of Data at Moneyfarm, in which he dives into explainability in ML, focusing on the power of SHAP values to deliver transparency and trust.

Editor’s note: This is a summary of a session from ODSC West 2024 on explainability in machine learning. To learn directly from the experts in real time, be sure to check out ODSC East 2025 this May!

Why Explainability Matters

Behavioral Economics and Human Intuition

Humans naturally seek causal explanations, a trait extensively studied by Daniel Kahneman in his work on cognitive biases. We are wired to create narratives, particularly when decisions impact us directly. Whether it’s a loan denial or a health diagnosis, explainability bridges the gap between complex algorithms and human understanding.

Regulatory Requirements

Regulators have also emphasized the need for explainability. The GDPR grants individuals the “right to explanation,” and the EU AI Act underscores the importance of traceability and technical transparency. In regulated environments, explainability is not a luxury but a necessity. Models must be interpretable to satisfy compliance requirements and ensure fairness.

Global vs. Local Explainability

Explainability in ML can be divided into two broad categories:

  • Global Explainability: Provides insights into how a model behaves across an entire dataset. For example, understanding how age and digital engagement influence churn risk on average.
  • Local Explainability: Focuses on individual predictions. Consider Mark, a customer whose churn risk is being evaluated. Local explainability identifies the specific factors — such as age or salary — that influenced his prediction.

This distinction helps data scientists determine whether to focus on model-level transparency or individual-level explanations.

Traditional Statistical Models: A Legacy of Simplicity

Traditional statistical models like logistic regression offer inherent explainability. Their coefficients provide a clear picture of how variables influence predictions both globally and locally. For instance, a beta coefficient in a logistic regression model reveals the direction and magnitude of a feature’s effect on the log-odds of the outcome. While these models may lack the performance of modern ML techniques, their transparency makes them the standard in regulated industries.
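
As a rough illustration, here is a minimal sketch of that kind of coefficient reading using scikit-learn; the churn-style data and the age and salary features are invented for the example.

```python
# A minimal sketch of coefficient-based explainability with scikit-learn.
# The churn-style data and the "age"/"salary" features are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # columns: standardized age, salary
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

for name, beta in zip(["age", "salary"], model.coef_[0]):
    # Each beta is the change in log-odds per unit increase in the feature;
    # exp(beta) is the corresponding odds ratio.
    print(f"{name}: beta={beta:+.3f}, odds ratio={np.exp(beta):.2f}")
```

A positive coefficient raises the predicted odds, a negative one lowers them, and that single number serves as both a global and a local explanation.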

However, these models were designed in an era of limited computational power and may fall short when handling large datasets or complex relationships.

The Complexity of Modern Machine Learning Models

Contemporary ML models, such as tree ensembles, bring unprecedented predictive power but sacrifice interpretability. Feature importance metrics can provide a form of global explainability, but they are unsigned: they show how much a feature matters, not whether it pushes predictions up or down.
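
The sketch below illustrates the point with an invented dataset: an impurity-based importance from a random forest tells you a feature matters, but not in which direction.

```python
# A minimal sketch of why impurity-based feature importance is global and unsigned.
# The data and the "age"/"salary" features are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # columns: standardized age, salary
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, importance in zip(["age", "salary"], forest.feature_importances_):
    # Importances are non-negative shares that sum to 1: no sign, no per-customer detail.
    print(f"{name}: importance={importance:.2f}")
```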

Local explainability becomes even more challenging due to the non-linear nature of these models. Issues such as the curse of dimensionality and context-dependent interactions further obscure their inner workings. This complexity necessitates advanced tools like SHAP.

Applying SHAP Values in Practice

Let’s revisit Mark’s churn risk example to see how SHAP values work. SHAP calculates a base value, which represents the average prediction across the dataset. Each feature’s contribution is then assessed relative to this base value, and by construction the contributions for an individual sum to the difference between that individual’s prediction and the base value.

For instance, in Mark’s case:

  • His salary reduced churn risk.
  • His age increased it.

A waterfall chart generated by SHAP visualizes these contributions, making complex interactions intuitive and actionable.
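
Here is a minimal sketch of that workflow with the shap library, assuming a generic gradient-boosted tree model; the dataset, the feature set, and the row standing in for Mark are invented rather than taken from the talk.

```python
# A minimal sketch of local explanation with SHAP on an invented churn dataset.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(20, 70, size=500),
    "salary": rng.normal(50_000, 15_000, size=500),
})
y = ((X["age"] > 45) & (X["salary"] < 45_000)).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # the expected model output becomes the base value
explanations = explainer(X)             # one additive contribution per feature, per customer

mark = 0                                # index of the row standing in for Mark
shap.plots.waterfall(explanations[mark])  # shows how each feature pushes the prediction
                                          # above or below the base value
```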

Deploying SHAP in Production

Integrating SHAP into production systems presents challenges but offers immense value. At Moneyfarm, adapting training and scoring pipelines to include SHAP values required innovative solutions, such as mapping ML features to business concepts via lookup tables.

Additionally, SHAP’s additive nature simplifies the aggregation of feature contributions into higher-level concepts. However, handling SHAP’s outputs demands advanced data manipulation skills — what one might call “Pandas ninjitsu.”
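
Because of that additivity, the roll-up itself is essentially a group-and-sum. The sketch below shows the idea with an invented lookup table and invented SHAP values, not Moneyfarm’s actual mapping.

```python
# A minimal sketch of rolling per-feature SHAP values up into business concepts.
# The feature names, concept mapping, and values are invented for illustration.
import pandas as pd

# One row of SHAP values per customer, one column per raw ML feature.
shap_df = pd.DataFrame(
    {"age": [0.12], "tenure_days": [-0.05], "salary": [-0.20], "logins_30d": [-0.08]},
    index=["Mark"],
)

# Lookup table mapping raw features to the concepts the business talks about.
feature_to_concept = {
    "age": "demographics",
    "salary": "demographics",
    "tenure_days": "relationship",
    "logins_30d": "engagement",
}

# Additivity lets us sum contributions within each concept without breaking the
# decomposition: the concept totals still explain the same prediction gap.
concept_contributions = shap_df.T.groupby(feature_to_concept).sum().T
print(concept_contributions)
```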

Evaluating SHAP: Pros and Cons

Pros:

  • Grounded in Cooperative Game Theory, providing a robust theoretical foundation.
  • Offers both global and local explainability.
  • Model-agnostic, working across various ML models.
  • Includes visual tools for intuitive interpretation.

Cons:

  • Computationally expensive, especially for complex models.
  • Requires careful consideration of approximation methods or data sampling to mitigate performance costs (see the sketch after this list).
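
A minimal sketch of those two mitigations, assuming a tree ensemble and the shap library; the data and sample sizes are invented.

```python
# A minimal sketch of two common ways to keep SHAP affordable; sizes are illustrative.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Mitigation 1: use a model-specific approximation. TreeExplainer exploits the
# tree structure and is far cheaper than exact Shapley value computation.
explainer = shap.TreeExplainer(model)

# Mitigation 2: explain a random sample of rows instead of the full dataset,
# trading coverage for a large speed-up.
sample_idx = rng.choice(len(X), size=500, replace=False)
shap_values = explainer(X[sample_idx])
```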

Future Potential: SHAP Values and Beyond

Explainability is not just about regulatory compliance — it’s about trust. SHAP’s ability to demystify predictions fosters confidence in ML systems. Looking ahead, integrating SHAP with Large Language Models could further enhance explanations, combining quantitative insights with natural language narratives.

Conclusion

From traditional beta coefficients to modern SHAP values, explainability in ML has evolved to meet the demands of both users and regulators. SHAP’s ability to bridge global and local explainability makes it indispensable in today’s AI-driven world. By incorporating tools like SHAP, organizations can build transparent, accountable, and trustworthy models.

In an era where algorithms increasingly shape human lives, explainability isn’t optional — it’s essential.
