7 More Methods For Better Machine Learning

ODSC - Open Data Science
5 min readMay 20, 2022


Many companies are now utilizing data science and machine learning, but there’s still a lot of room for improvement in terms of ROI. A 2021 VentureBeat analysis suggests that 87% of AI models never make it to a production environment and an MIT Sloan Management Review article found that 70% of companies reported minimal impact from AI projects. Yet despite these difficulties, Gartner forecasts investment in artificial intelligence to reach an unprecedented $62.5 billion in 2022, an increase of 21.3% from 2021.

Nevertheless, we are still left with the question: How can we do machine learning better? To find out, we’ve taken some of the upcoming tutorials and workshops from ODSC Europe 2022 and let the experts, through their topics, guide us toward building better machine learning.

1. Bayesian Statistics for Marketing

Digital marketing is chief among the early adopters of Bayesian methods. While many industries are embracing Bayesian modeling as a tool to solve some of the most advanced data science problems, marketing faces unique challenges for which this approach provides elegant solutions. Among these challenges is a decline in data quality, driven by increased demand for online privacy and the imminent “death of the cookie,” which will restrict online tracking. In addition, as more companies build internal data science teams, there is a growing demand for in-house solutions.
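To make the idea concrete: a classic marketing use of Bayesian statistics is comparing the conversion rates of two ad variants. The session uses PyMC; the sketch below is a minimal conjugate Beta-Binomial version using only NumPy, with hypothetical conversion counts, so the same posterior logic can be shown without a sampler.

```python
import numpy as np

# Hypothetical conversion data for two ad variants.
clicks_a, views_a = 120, 1000   # variant A: 120 conversions out of 1000
clicks_b, views_b = 150, 1000   # variant B: 150 conversions out of 1000

# With a uniform Beta(1, 1) prior, Beta-Binomial conjugacy gives the
# posterior in closed form: Beta(1 + conversions, 1 + non-conversions).
rng = np.random.default_rng(42)
post_a = rng.beta(1 + clicks_a, 1 + views_a - clicks_a, size=100_000)
post_b = rng.beta(1 + clicks_b, 1 + views_b - clicks_b, size=100_000)

# Posterior probability that variant B truly converts better than A --
# a direct answer to the business question, not just a p-value.
p_b_better = (post_b > post_a).mean()
print(f"P(B > A) = {p_b_better:.3f}")
```

The payoff of the Bayesian framing is that `p_b_better` is a statement a marketer can act on directly, and the same model extends naturally (in PyMC) to hierarchies over campaigns or customer segments.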

Abstracted from: The Bayesian Revolution in Online Marketing | Thomas Wiecki, PhD | Chief Executive Officer | PyMC Labs

2. Fixing Issues with Data Distribution

The real world is a constant source of ever-changing, non-stationary data. That ultimately means even the best ML models will eventually go stale. Data distribution shifts, in all their forms, are one of the major post-production concerns for any ML/data practitioner. As organizations increasingly rely on ML to perform as intended outside of the lab, the need for efficient debugging and troubleshooting tools in the ML operations world also grows.

Distribution shift issues, if unaddressed, can mean significant performance degradation over time and can even render a model unusable. How can teams proactively assess these issues in their production environment before their models degrade significantly? To answer this question, traditional statistical methods and efficient data logging techniques must be combined into practical tools that enable distribution shift inspection and detection under the strict requirements a production environment can entail.
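One of those traditional statistical methods is the two-sample Kolmogorov-Smirnov test: compare a feature's logged training-time distribution against a recent production window. The sketch below simulates a drifted feature with synthetic data (the shift size and thresholds are illustrative assumptions, not values from the session).

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Reference window: feature values logged when the model was trained.
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
# Production window: the same feature after a simulated upstream change
# that shifted its mean by 0.5.
production = rng.normal(loc=0.5, scale=1.0, size=5_000)

# Two-sample KS test: a small p-value means the production distribution
# has likely drifted away from the reference.
stat, p_value = ks_2samp(reference, production)
drift_detected = p_value < 0.01
print(f"KS statistic={stat:.3f}, p={p_value:.2e}, drift={drift_detected}")
```

In practice, tools like whylogs run such checks over compact data profiles rather than raw samples, which is what makes monitoring feasible at production scale.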

Abstracted from: Visually Inspecting Data Profiles for Data Distribution Shifts | Felipe de Pontes | Data Scientist | WhyLabs

3. Supporting Explanations Better

As automated decision-making solutions are increasingly applied to all aspects of everyday life, capabilities to generate meaningful explanations for a variety of stakeholders (i.e., decision-makers, recipients of decisions, auditors, regulators…) become crucial.

Explainability by design is a new methodology characterized by proactive measures to include explanations in the design, rather than reactive measures that attempt to retrofit explanation capabilities as an afterthought.
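As a design pattern, this can be as simple as having every decision function return its reasons alongside its verdict, so the explanation is captured when the decision is made rather than reconstructed later. The loan example and thresholds below are entirely hypothetical; they illustrate the pattern, not the methodology's formal machinery.

```python
from dataclasses import dataclass


@dataclass
class ExplainedDecision:
    approved: bool
    reasons: list  # human-readable reasons captured at decision time


def assess_loan(income: float, debt: float, min_income: float = 30_000,
                max_ratio: float = 0.4) -> ExplainedDecision:
    """Each rule records why it fired, so the system can answer
    'why was this rejected?' without any post-hoc reconstruction."""
    reasons = []
    if income < min_income:
        reasons.append(f"income {income:.0f} below minimum {min_income:.0f}")
    if income > 0 and debt / income > max_ratio:
        reasons.append(f"debt-to-income ratio {debt / income:.2f} "
                       f"exceeds limit {max_ratio}")
    return ExplainedDecision(approved=not reasons, reasons=reasons)


decision = assess_loan(income=25_000, debt=15_000)
print(decision.approved, decision.reasons)
```

The same record can then be rendered differently for each stakeholder: a plain-language summary for the applicant, the full rule trace for an auditor.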

Abstracted from: Explainability by Design: a Methodology to Support Explanations in Decision-making Systems | Luc Moreau, PhD | Professor of Computer Science and Head of the department of Informatics | King’s College

4. Healthcare Predictive Modeling

Healthcare and biology have complex, interconnected data models. Can graph databases be used to store these data sets? Can machine learning models identify novel relationships in these graph networks to improve prediction accuracy? Is feature extraction or training performance impacted by the use of graph databases? This session will demonstrate how open-source graph databases can be used to model biomedical data. We will show how data pipelines can be used to create complex networks that can integrate biological and clinical data sets for downstream machine learning applications.
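To illustrate the kind of modeling the session covers: biomedical entities (patients, conditions, drugs) become nodes, relationships become edges, and graph-derived quantities become ML features. The sketch below uses NetworkX with a tiny hypothetical dataset; production systems would use a dedicated graph database, but the feature-extraction idea is the same.

```python
import networkx as nx

# Toy biomedical graph (hypothetical data): patients, conditions, drugs.
G = nx.Graph()
G.add_edge("patient:1", "condition:diabetes", relation="diagnosed_with")
G.add_edge("patient:1", "drug:metformin", relation="prescribed")
G.add_edge("patient:2", "condition:diabetes", relation="diagnosed_with")
G.add_edge("drug:metformin", "condition:diabetes", relation="treats")

# A simple graph-derived feature: the degree of each patient node,
# i.e. how many clinical entities a patient is directly linked to.
# Such features can feed a downstream prediction model.
patient_degree = {n: G.degree(n) for n in G if n.startswith("patient:")}
print(patient_degree)
```

Richer features (shared neighbors between patients, shortest paths from a drug to a condition) follow the same pattern and are where graph structure starts to surface relationships that flat tables hide.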

Abstracted from: Healthcare Predictive Modeling with Graph Networks | Wade Schulz, MD, PhD | Assistant Professor; Director, Center for Computational Health | Yale University

5. Responsible AI is Still Key

Explainable AI, or XAI, is a rapidly expanding field of research that aims to supply methods for understanding model predictions. We will start by providing a general introduction to the field of explainability, introduce the Alibi library and focus on how it helps you to understand trained models. We will then explore the collection of algorithms provided by Alibi and the types of insight they each provide, looking at a broad range of datasets and models, and discussing the pros and cons of each. In particular, we’ll look at methods that apply to any model. The focus will be on application to real-world datasets to show the practitioner that XAI can justify, explore and enhance their use of ML.
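The Alibi library itself is best learned from its own documentation, but the core idea behind methods that "apply to any model" can be shown with a generic model-agnostic technique: permutation importance, which only needs a fitted model and a scoring function. This sketch uses scikit-learn and the Iris dataset; it is an illustration of black-box explanation in general, not Alibi's API.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Permutation importance treats the model as a black box: shuffle one
# feature at a time and measure how much the accuracy drops. A large
# drop means the model relies heavily on that feature.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in zip(load_iris().feature_names, result.importances_mean):
    print(f"{name}: {score:.3f}")
```

Because nothing here inspects the model's internals, the same call works for a gradient-boosted ensemble, a neural network wrapped in a predict function, or any other estimator — the property that makes such methods so broadly useful.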

Abstracted from: Open Source Explainability — Understanding Model Decisions Using Alibi | Alex Athorne | Research Engineer | Seldon

6. The Growing Use of Knowledge Graphs

Advances in information extraction have enabled the automatic construction of large knowledge graphs (KGs) such as DBpedia, YAGO, Wikidata, and the Google Knowledge Graph. Learning rules from KGs is a crucial task for KG completion, cleaning, and curation. This tutorial presents state-of-the-art rule induction methods, recent advances, research opportunities, and open challenges in this area.
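A core primitive in rule induction is scoring a candidate rule by its confidence: of all the entities where the rule's body holds, for what fraction does the head also hold in the KG? The sketch below evaluates one hand-picked candidate rule over a tiny hypothetical triple set; real systems search over many candidate rules automatically.

```python
# Toy KG as (subject, predicate, object) triples. Carol moved countries,
# so the rule does not hold for her -- giving confidence below 1.
triples = {
    ("alice", "born_in", "paris"), ("paris", "capital_of", "france"),
    ("alice", "citizen_of", "france"),
    ("bob", "born_in", "berlin"), ("berlin", "capital_of", "germany"),
    ("bob", "citizen_of", "germany"),
    ("carol", "born_in", "bern"), ("bern", "capital_of", "switzerland"),
    ("carol", "citizen_of", "france"),
}


def rule_confidence(kg):
    """Confidence of: born_in(x,y) AND capital_of(y,z) => citizen_of(x,z).
    Returns support / body-count over the given triple set."""
    born = [(s, o) for s, p, o in kg if p == "born_in"]
    capital = {s: o for s, p, o in kg if p == "capital_of"}
    body, support = 0, 0
    for person, city in born:
        if city in capital:
            body += 1
            if (person, "citizen_of", capital[city]) in kg:
                support += 1
    return support / body if body else 0.0


conf = rule_confidence(triples)
print(f"confidence = {conf:.2f}")  # 2 of 3 body matches have the head
```

High-confidence rules mined this way can then propose missing triples for KG completion, or flag existing triples that contradict strong rules for cleaning.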

Abstracted from: Rule Induction and Reasoning in Knowledge Graphs | Daria Stepanova, PhD | Research Scientist | Bosch Center for AI

7. Understanding Digital Twins

The term digital twin, like artificial intelligence, is used to mean very different things. We define the spectrum of uses, from a simple digital data twin to a more sophisticated ecosystem of cognitive adaptive twins. We trace the history of digital twins and their roots in agent-based simulation, and show how the field is merging with advances in machine learning (ML) and other areas of AI to morph into simulation intelligence. We describe several examples of digital twins in the transportation, banking, and healthcare sectors. Like ML models, simulation models are also being deployed across the enterprise and co-exist with other software and AI/ML models.
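The distinction between a data twin and a simulation-based twin can be made concrete with a tiny agent-style model. A data twin merely replays logged observations; a simulation twin, like the single-queue sketch below (all parameters hypothetical, e.g. a bank-branch teller line), can answer what-if questions such as how queue length responds to faster service.

```python
import random


def simulate_queue(arrival_prob, service_prob, steps, seed=0):
    """Minimal discrete-time simulation of one service queue: each tick,
    a customer may arrive, and the customer at the head may be served.
    Returns the average queue length over the run."""
    rng = random.Random(seed)
    queue_len, total = 0, 0
    for _ in range(steps):
        if rng.random() < arrival_prob:
            queue_len += 1
        if queue_len and rng.random() < service_prob:
            queue_len -= 1
        total += queue_len
    return total / steps


# What-if question a data twin cannot answer: how much does adding
# faster service (higher service_prob) shorten the line?
baseline = simulate_queue(arrival_prob=0.5, service_prob=0.6, steps=10_000)
faster = simulate_queue(arrival_prob=0.5, service_prob=0.9, steps=10_000)
print(f"avg queue: baseline={baseline:.2f}, faster service={faster:.2f}")
```

A cognitive adaptive twin, further along the spectrum, would additionally re-estimate parameters like `arrival_prob` from live data streams, which is where the merger with ML comes in.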

Abstracted from: Digital Twins: Not All Digital Twins are Identical | Dr. Anand Srinivasa Rao | Global Artificial Intelligence Lead, AI-Emtech Lead in Labs, Partner in Analytics Insights | PwC

Learn Better Methods For Better Machine Learning at ODSC Europe 2022

To dive deeper into these topics, join us at ODSC Europe 2022 this June 15th-16th. The conference will also feature hands-on training sessions in focus areas, such as machine learning, deep learning, MLOps and data engineering, responsible AI, machine learning safety and security, and more. What’s more, you can extend your immersive training to 4 days with a Mini-Bootcamp Pass. Check out all of our free and paid passes here.
