Interpretable Knowledge Discovery Reinforced by Visual Methods
Editor’s Note: See Boris Kovalerchuk’s talk “Interpretable Knowledge Discovery Reinforced by Visual Methods” at ODSC West 2019.
Visual reasoning and discovery have a long history. Chinese and Indian mathematicians had visual proofs of the Pythagorean Theorem around 600 B.C., before it was known to the Greeks. Scientists such as Bohr, Boltzmann, Einstein, Faraday, Feynman, Heisenberg, Helmholtz, Herschel, Kekulé, Maxwell, Poincaré, Tesla, Watson, and Watt have described the fundamental role that images played in their most creative thinking.
[Related Article: Data Visualization and the Data Science Workflow]
The fundamental challenge for visual creative thinking and discovery in the multidimensional (n-D) data used in machine learning (ML) is that we cannot see multidimensional data with the naked eye.
We need visual analytics tools (“n-D glasses”) for this. The challenge starts at 4-D.
Often we use non-reversible, lossy dimension reduction methods such as Principal Component Analysis (PCA), Multidimensional Scaling (MDS), SOM, t-SNE, and others that convert, say, every 10-D point to a 2-D point for visualization. While such a reduction of 10 numbers to 2 is very beneficial, it is, in general, lossy and produces non-interpretable features. As a result, it can remove important interpretable multidimensional information before the discovery of complex n-D patterns even starts. In some cases, this is like throwing the baby out with the bathwater.
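To make the lossiness concrete, here is a minimal sketch using scikit-learn and synthetic data (both my assumptions for illustration, not material from the talk). It reduces 10-D points to 2-D with PCA and then measures how much of the original information the 2-D view discards:

```python
# A minimal sketch of lossy dimension reduction, assuming scikit-learn
# and synthetic Gaussian data (illustrative only, not from the talk).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 points in 10-D

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)           # every 10-D point becomes a 2-D point
X_back = pca.inverse_transform(X_2d)  # best 10-D reconstruction from 2-D

# The reconstruction error quantifies the information the 2-D view removed.
print(f"Explained variance kept: {pca.explained_variance_ratio_.sum():.2f}")
print(f"Mean squared reconstruction error: {np.mean((X - X_back) ** 2):.3f}")
```

On random data, two components retain only a small fraction of the variance; the point is simply that the inverse map cannot recover the original 10 numbers.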
Alternative lossless, reversible methods based on the new concept of General Line Coordinates (GLC) exist, but they have their own challenges: they suffer more from occlusion than lossy methods do [Kovalerchuk, 2018]. GLCs break a 400-year-old tradition of using orthogonal Cartesian coordinates, which fit modeling of the 3-D physical world well but are very limited for lossless visual representation of the diverse, abstract high-dimensional data we deal with in machine learning.
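For a concrete sense of a lossless, reversible visualization, the sketch below uses parallel coordinates, a classical special case of GLC; the Iris dataset and pandas/matplotlib tooling are my choices for illustration, not a case study from the talk:

```python
# A minimal sketch of a lossless, reversible visualization: parallel
# coordinates, a classical special case of General Line Coordinates.
# The Iris dataset here is illustrative, not a case study from the talk.
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = iris.target_names[iris.target]

# Each 4-D sample becomes a polyline across four axes; no coordinate is
# discarded, so every original point can be read back from the plot.
parallel_coordinates(df, class_column="species", alpha=0.4)
plt.title("Each 4-D iris sample as a polyline (lossless)")
plt.tight_layout()
plt.show()
```

Even with only 150 samples, the polylines already overlap heavily, which is exactly the occlusion problem noted above.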
[Related Article: Data Visualization for Academics]
Hybrid methods, which combine the advantages of reversible and non-reversible methods, are the most beneficial for interpretable knowledge discovery in machine learning. The tutorial shows how a fundamental difference between analytical and visual ML, in data generalization and explanation power, leads to the construction of better explainable ML models. It covers five complementary approaches with different roles for analytical, visual, black-box, and glass-box ML models: (1) Visualization of Analytical ML Models, (2) Visual Discovery of Analytical ML Models, (3) Visual Explanation of Analytical ML Models, (4) Discovering Visual ML Models aided by Analytical ML, and (5) Discovering Analytical ML Models aided by Visual ML. All of them will be illustrated with multiple ML case studies on real-world data. Learn more at the upcoming ODSC West 2019 talk “Interpretable Knowledge Discovery Reinforced by Visual Methods.”
Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday.