Opening The Black Box — Interpretability In Deep Learning

Oct 8, 2019

In the last decade, the application of deep neural networks to long-standing problems has brought a breakthrough in performance and predictive power. However, high accuracy, which derives from increased model complexity, often comes at the price of interpretability: many of these models behave as black boxes and fail to provide explanations for their predictions. While this issue may play a secondary role in certain application fields, in high-risk domains such as health care it is crucial to build trust in a model and to be able to understand its behavior.

What is interpretability?

The definition of the verb interpret is “to explain or tell the meaning of: present in understandable terms” (Merriam-Webster 2019). Despite the apparent simplicity of this statement, the machine learning research community is struggling to agree upon a formal definition of interpretability/explainability. In recent years, in the room left by this lack of formalism, many methodologies have been proposed based on different “interpretations” (pun intended) of the above definition. While the proliferation of these disparate algorithms makes it hard to compare them rigorously, it is nevertheless interesting and useful to apply these techniques to analyze the behavior of deep learning models.

What is this tutorial about?

This tutorial illustrates some of the recent advancements in the field of interpretable deep learning. We will show common techniques that can be used to explain the predictions of pre-trained models and to shed light on their inner mechanisms. The tutorial aims to strike the right balance between theoretical input and practical exercises. The session has been designed to provide participants not only with the theory behind deep learning interpretability, but also with a set of frameworks and tools that they can easily reuse in their own projects.

Depiction: a framework for explainability

The Cognitive Health Care and Life Sciences group at IBM Research Zürich has open-sourced a Python toolbox, depiction, that aims to ease the application of explainability methods to custom models, especially for less experienced users. The module provides wrappers for multiple algorithms and is continuously updated to include the latest algorithms from AIX360. The core concept behind depiction is to let users seamlessly run state-of-the-art interpretability methods with minimal programming skills. Below is an example of how depiction can be used to analyze a pre-trained model.

A simple example

Let’s assume we have a fancy model for classifying tabular data, pre-trained in Keras and available at a public URL. Explaining its predictions with depiction is as easy as implementing a lightweight wrapper of depiction.model.core.Model in which the predict method is overridden.
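To make the wrapper pattern concrete, here is a minimal, dependency-free sketch. It does not use depiction itself: the Model base class below is a hypothetical stand-in for depiction.model.core.Model, and the hard-coded logistic weights are an assumption that stands in for a Keras model fetched from a URL. Only the idea of subclassing and overriding predict comes from the text above.

```python
from abc import ABC, abstractmethod
import math


class Model(ABC):
    """Hypothetical stand-in for a base model class exposing `predict`."""

    @abstractmethod
    def predict(self, samples):
        """Return one list of class probabilities per input sample."""


class FancyModel(Model):
    """Wraps a pre-trained scorer behind the common `predict` interface."""

    def __init__(self, weights, bias):
        # Assumption: in practice these would come from a loaded Keras model;
        # fixed numbers keep the sketch runnable without extra dependencies.
        self.weights = weights
        self.bias = bias

    def predict(self, samples):
        probabilities = []
        for sample in samples:
            logit = self.bias + sum(w * x for w, x in zip(self.weights, sample))
            positive = 1.0 / (1.0 + math.exp(-logit))
            # Binary classifier: [P(class 0), P(class 1)].
            probabilities.append([1.0 - positive, positive])
        return probabilities


model = FancyModel(weights=[2.0, -1.0, 0.0], bias=0.0)
predictions = model.predict([[0.0, 0.0, 0.0], [1.0, 0.0, 5.0]])
print(predictions)
```

The point of the design is that any interpreter only needs to know about predict, so the underlying framework (Keras or otherwise) stays hidden behind the wrapper.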

Once FancyModel is implemented, using any of the depiction.interpreters available in the library is as easy as typing:
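The general pattern (construct an interpreter around a wrapped model, then ask it to explain a sample) can be sketched as follows. The names ToyModel, AblationInterpreter and interpret are hypothetical and are not depiction's API; the interpreter shown is a simple feature-occlusion scheme chosen only because it fits in a few lines.

```python
import math


class ToyModel:
    """Hypothetical wrapped model: anything exposing `predict(samples)`."""

    def predict(self, samples):
        # Logistic score over the first two features; the third is ignored.
        return [1.0 / (1.0 + math.exp(-(2.0 * s[0] - 1.0 * s[1]))) for s in samples]


class AblationInterpreter:
    """Hypothetical interpreter: scores each feature by occluding it."""

    def __init__(self, model, baseline=0.0):
        self.model = model
        self.baseline = baseline

    def interpret(self, sample):
        reference = self.model.predict([sample])[0]
        scores = []
        for index in range(len(sample)):
            occluded = list(sample)
            occluded[index] = self.baseline
            # Positive score: the feature pushed the prediction up.
            scores.append(reference - self.model.predict([occluded])[0])
        return scores


interpreter = AblationInterpreter(ToyModel())
scores = interpreter.interpret([1.0, 1.0, 1.0])
print(scores)
```

Because the interpreter touches the model only through predict, swapping in a different explanation method means changing one constructor call, which is the convenience the text describes.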

The explanations generated depend on the specific interpreter used. For example, in the case of explanations generated using LIME (Ribeiro et al.), when using a Jupyter notebook, one can simply run:

and directly obtain the model-specific explanation:
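For readers curious about what such an explanation contains, the core idea behind LIME can be sketched in plain Python: perturb the instance, weight each perturbed neighbour by its proximity, and fit a weighted linear surrogate whose coefficients serve as the explanation. This is a simplified illustration under stated assumptions (Gaussian perturbations, an exponential kernel, no feature selection or discretization); it is neither the lime package nor depiction's wrapper, and all function names are hypothetical.

```python
import math
import random


def solve_linear_system(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    size = len(vector)
    rows = [list(matrix[i]) + [vector[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * size
    for r in reversed(range(size)):
        known = sum(rows[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (rows[r][size] - known) / rows[r][r]
    return solution


def local_linear_explanation(predict, instance, num_samples=500, scale=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns one coefficient per feature (the intercept is dropped).
    """
    rng = random.Random(seed)
    dim = len(instance) + 1  # +1 for the intercept column
    gram = [[0.0] * dim for _ in range(dim)]
    moment = [0.0] * dim
    for _ in range(num_samples):
        offset = [rng.gauss(0.0, scale) for _ in instance]
        neighbour = [x + d for x, d in zip(instance, offset)]
        # Exponential kernel: closer neighbours count more.
        weight = math.exp(-sum(d * d for d in offset) / (scale * scale))
        design = [1.0] + neighbour
        target = predict(neighbour)
        # Accumulate the weighted normal equations.
        for i in range(dim):
            moment[i] += weight * design[i] * target
            for j in range(dim):
                gram[i][j] += weight * design[i] * design[j]
    coefficients = solve_linear_system(gram, moment)
    return coefficients[1:]


def black_box(features):
    """Toy nonlinear model standing in for a pre-trained classifier."""
    return 1.0 / (1.0 + math.exp(-(2.0 * features[0] - features[1])))


explanation = local_linear_explanation(black_box, [0.2, -0.1])
print(explanation)
```

The sign and magnitude of each coefficient indicate how the corresponding feature moves the prediction in the neighbourhood of the instance; the surrogate is only meant to be faithful locally, not across the whole input space.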

Want to know more?

If you found this blog post interesting and you want to know more about interpretability and depiction, come and join us at the tutorial “Opening The Black Box — Interpretability In Deep Learning” at ODSC Europe 2019 this November 20th in London.

