20 Responsible AI and Machine Learning Safety Talks Every Data Scientist Should Hear

ODSC - Open Data Science
Apr 16, 2022 · 13 min read


As the adoption of AI accelerates across industry, two increasingly important and related topics are responsible AI and machine learning safety (ML safety), both featured tracks at ODSC East 2022. Here’s a sample of 20 of the more than 110 free talks from leaders in the field that you can attend in person or virtually from April 19th-21st with a free Bronze Pass.

https://odsc.com/boston/

Editor’s note: Abstracts are abbreviated for some sessions. Please check our schedule for full abstracts.

Responsible AI Talks include:

#1: Overconfidence in Machine Learning: Do Our Models Know What They Don’t Know? [Keynote]

Padhraic Smyth, PhD, Chancellor’s Professor, UC Irvine

The past few years have seen major improvements in the accuracy of machine learning models in areas such as computer vision, speech recognition, and natural language processing. These models are increasingly being deployed across a variety of commercial, medical, and scientific applications. While these models can be very accurate in their predictions, they can still make mistakes, particularly when used in environments different from those they were trained on. A natural question in this context is whether models are able to calibrate their predictions: can we trust the confidence of a model? Can models “self-assess” in terms of knowing what they don’t know? In this talk I will discuss key ideas and recent research in this area, including work on prediction confidence and human-AI collaboration.
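
To make calibration concrete, here is a minimal sketch (illustrative, not from the talk) that measures how far a model’s confidence drifts from its actual accuracy via the expected calibration error (ECE); the bin count and toy data are assumptions for demonstration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error: the weighted average gap between
    confidence and accuracy across equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example of overconfidence: ~90% average confidence, 60% accuracy.
rng = np.random.default_rng(0)
conf = rng.uniform(0.85, 0.95, size=1000)
acc = (rng.uniform(size=1000) < 0.6).astype(float)
print(f"ECE: {expected_calibration_error(conf, acc):.3f}")  # large gap, ~0.3
```

A well-calibrated model would score an ECE near zero; the overconfident toy model above does not.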

#2: Just Machine Learning [Track Keynote]

Tina Eliassi-Rad, PhD, Professor, Northeastern University

Risk assessment is a popular task when machine learning is used for automated decision making. For example, Jack’s risk of defaulting on a loan is 8, Jill’s is 2; Ed’s risk of recidivism is 9, Peter’s is 1. We know that this task definition comes with impossibility results for group fairness, where one cannot simultaneously satisfy desirable probabilistic measures of fairness. I will highlight recent findings in terms of these impossibility results. Next, I will present work on how machine learning can be used to generate aspirational data (i.e., data that are free from the biases of real-world data). Such data are useful for recognizing and detecting sources of unfairness in machine learning models besides biased data. Time permitting, I will discuss steps in measuring our algorithmically infused societies.
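
For a flavor of the quantities these impossibility results trade off, here is a small illustrative sketch (not code from the talk) computing per-group error rates and precision for a toy risk-assessment classifier:

```python
import numpy as np

def group_fairness_report(y_true, y_pred, group):
    """Per-group error rates and precision. The impossibility results show
    that when base rates differ across groups, a classifier cannot in
    general equalize precision (predictive parity) and both error rates
    (FPR/FNR balance) at the same time."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    for g in np.unique(group):
        m = group == g
        tpr = y_pred[m & (y_true == 1)].mean()        # 1 - false negative rate
        fpr = y_pred[m & (y_true == 0)].mean()        # false positive rate
        precision = y_true[m & (y_pred == 1)].mean()  # predictive parity compares these
        print(f"group {g}: TPR {tpr:.2f}, FPR {fpr:.2f}, precision {precision:.2f}")

# Toy risk-assessment labels: 1 = flagged as high risk.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])
group_fairness_report(y_true, y_pred, group)
```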

#3: Dealing with Bias in Machine Learning

Thomas Kopinski, PhD, Professor for Data Science, University of South Westphalia

Bias is everywhere: in data, in algorithms, in humans. Dealing with it is therefore difficult, and all the more important now that data is growing exponentially along nearly every dimension (velocity, volume, veracity) and machine learning systems trained on these datasets are becoming omnipresent. Addressing bias is not straightforward: you have to start with awareness, take care of data quality, and be proficient in measuring and monitoring your algorithms. This session will demonstrate the complexity of the issue, show why awareness is needed at all of the levels mentioned above, and help you understand how to mitigate the problem in an analytic way.

#4: ImageNet and Its Discontents: The Case for Responsible Interpretation in ML

Razvan Amironesei, PhD, Applied Data Ethicist | Visiting Researcher, Google’s Center for Responsible AI

Sociotechnical systems abound in examples of the ways they constitute sources of harm for historically marginalized groups. In this context, the field of machine learning has seen a rapid proliferation of new machine learning methods, model architectures, and optimization techniques. Yet, data — which remains the backbone of machine learning research and development — has received comparatively little research attention. My research hypothesis is that focusing exclusively on the content of training datasets — the data used for algorithms to “learn” associations — only captures part of the problem. Instead, we should identify the historical and conceptual conditions which unveil the modes of dataset construction. I propose here an analysis of datasets from the perspective of three techniques of interpretation: genealogy, problematization, and hermeneutics.

#5: Trustworthy AI

Jeannette M. Wing, PhD, Avanessians Director, Data Science Institute, Professor of Computer Science, Columbia University

Recent years have seen astounding growth in the deployment of AI systems in critical domains such as autonomous vehicles, criminal justice, and healthcare, where decisions taken by AI agents directly impact human lives. Consequently, there is increasing concern about whether these decisions can be trusted. Under the umbrella of trustworthy computing, employing formal methods to ensure trust properties such as reliability and security has led to scalable success. Just as for trustworthy computing, formal methods could be an effective approach for building trust in AI-based systems. However, we would need to extend the set of properties to include fairness, robustness, interpretability, and others, and to develop new verification techniques to handle new kinds of artifacts, e.g., data distributions and machine-learned models. This talk poses a new research agenda, from a formal methods perspective, for increasing trust in AI systems.

#6: Mapping for Climate Change with Deep Learning on Remotely Sensed Imagery

Jeremy Irvin, PhD Candidate, Stanford Machine Learning Group, Stanford University (advised by Professor Andrew Ng)

Climate change is one of humanity’s most pressing challenges. Many crucial climate change-related problems can be tackled by developing new technologies to map and monitor the Earth. Due to recent advancements in machine learning together with the increasing resolution and availability of remotely sensed imagery, there is an unprecedented opportunity to develop such mapping technologies. In this session, I describe many existing and emerging approaches for machine learning-based Earth mapping, from classifying the drivers of forest loss to identifying the locations of energy infrastructure and greenhouse gas emission sources. I conclude by outlining the open challenges for developing effective machine learning-based mapping solutions for climate change. This talk is targeted at data scientists in both industry and academia who are interested in learning about how machine learning can be used to help combat and adapt to climate change, and how to use their skills to get involved.

#7: Open-source Best Practices in Responsible AI

Violeta Misheva, PhD, Senior Data Scientist, ABN Amro Bank, and Vice-chair, The Foundation for Best Practices in ML | Daniel Vale, Legal Counsel for AI & Data Science, H&M Group

Machine learning has been hastily operationalized, often with little regard for its wider societal impact. At the same time, there’s been a lack of clear, concrete guidelines on how to reduce the risks stemming from AI. With that in mind, we have started a non-profit organization, the Foundation for Best Practices in Machine Learning. Our goal is to help data scientists, governance experts, managers, and other machine learning professionals implement ethical and responsible machine learning. We do that via our free, open-source technical and organizational Best Practices for Responsible AI. These guidelines have been developed principally by senior ML engineers, data scientists, data science managers, and legal professionals.

#8: The Need for Adaptive Ethical ML Models in the Post-Pandemic Era

Sharmistha Chatterjee, Senior Manager, Data Sciences, and Juhi Pandey, Senior Data Scientist, Publicis Sapient

The talk unveils state-of-the-art deployment of sustainable federated machine learning models, considering different aspects of ethical AI. It will highlight building and monitoring private federated models in a large-scale enterprise while ensuring the sustainability of future smart ecosystems. By the end of the talk, the audience will know how to build sustainable federated learning systems, which deployment monitoring metrics and KPIs matter when scaling such ML models, and how to deploy them in a distributed architecture.
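
As background, here is a minimal sketch of federated averaging (FedAvg), the canonical aggregation step behind private federated models, in which only model weights, never raw data, leave each client; the linear-regression clients and hyperparameters are toy assumptions:

```python
import numpy as np

def local_sgd_step(weights, X, y, lr=0.1):
    """One local gradient step on a client's private linear-regression data."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_average(client_weights, client_sizes):
    """FedAvg: average client models weighted by local dataset size."""
    return np.average(np.stack(client_weights), axis=0,
                      weights=np.asarray(client_sizes, dtype=float))

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])

def make_client(n):
    X = rng.normal(size=(n, 2))
    return X, X @ true_w + rng.normal(scale=0.1, size=n)

clients = [make_client(n) for n in (50, 80, 30)]
w = np.zeros(2)
for _ in range(20):  # communication rounds
    local = [local_sgd_step(w.copy(), X, y) for X, y in clients]
    w = federated_average(local, [len(y) for _, y in clients])
print(w)  # converges toward [2, -1] without pooling any client's data
```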

#9: Leveling Up Your Organization’s Capacity for Data-informed Decisions

Mona Khalil, Data Science Manager, Greenhouse Software

A significant part of a data science team’s value is determined by how effectively the rest of the organization leverages their work for strategy, planning, and decision-making. But getting your findings to the right end users involves far more than just effective analytic and modeling approaches to your work. This talk will discuss some of the lessons we’ve learned over the years at Greenhouse and share the most effective strategies we’ve employed to evangelize data science at the company. By the end of this talk, you will have concrete recommendations for increasing the access and availability of data across your entire company.

#10: Deploying AI for Climate Adaptation: A Spotlight on Disaster Management

Thomas Y Chen, Student Researcher, Academy for Mathematics, Science, and Engineering

Big data and artificial intelligence have enabled numerous applications for humanitarian and social good. In terms of climate change, machine learning, deep learning, and computer vision approaches have proven useful for both adaptation and mitigation. In a broad initial overview, we’ll highlight nine major areas in which artificial intelligence is key in the fight against this crisis: electricity systems, transportation, buildings and cities, industry, farms and forests, carbon dioxide removal, climate prediction, societal impacts, and solar geoengineering. From harnessing deep learning-based computer vision techniques for infrastructure damage assessment after natural disasters using satellite imagery, to utilizing natural language processing technologies to analyze climate-related legislation, we contend that AI is a necessary tool in the years ahead. At the same time, sustainable and responsible use of deep learning models is key. In the second half of this talk, we will highlight the specific use case of deploying mobile app-based disaster recovery machine learning models, where computer vision models are trained on real-time satellite imagery and/or social media data.

#11: Democratizing Access to Data with Synthetic Data Generation

Lipika Ramaswamy, Senior Applied Scientist, Gretel.Ai

Privacy-guaranteed synthetic data can help build back public trust in data usage, but how can organizations actually use this new technology in their workflows? Join us as we go through the many ways you can interact with and utilize Gretel Synthetics, an open-source synthetic data generator that features differentially private learning. Whether you’re a developer, data scientist, or just a data enthusiast, this hands-on workshop will show you how using either Gretel’s APIs, CLIs, SaaS Console, or SDK can offer any user an easy experience generating synthetic data.
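
As a conceptual warm-up only (this is not Gretel’s API; Gretel Synthetics trains generative models with differentially private learning), here is a toy sketch of differentially private synthetic data for a single categorical column, using the Laplace mechanism on histogram counts:

```python
import numpy as np

def dp_synthesize(values, epsilon=1.0, n_samples=1000, seed=0):
    """Add Laplace noise (scale 1/epsilon) to category counts, then resample
    from the noised distribution. Smaller epsilon means more privacy and
    noisier synthetic data."""
    rng = np.random.default_rng(seed)
    categories, counts = np.unique(values, return_counts=True)
    noisy = np.clip(counts + rng.laplace(scale=1.0 / epsilon, size=len(counts)), 0, None)
    return rng.choice(categories, size=n_samples, p=noisy / noisy.sum())

real = ["approved"] * 70 + ["denied"] * 30
synthetic = dp_synthesize(real, epsilon=0.5)
print({c: int((synthetic == c).sum()) for c in set(real)})  # roughly preserves proportions
```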

#12: Responsible AI for Customer Product Organizations

Aishwarya Srinivasan, Data Scientist, Google Cloud AI Services

With the accelerated use of machine learning and AI technologies, maintaining a comprehensive view of how these technologies are used, the data involved, and how the technology interacts with users has become complicated. This talk will shed light on the pitfalls faced by industries building user-facing AI applications. We will go over the key aspects of responsible AI, why they are critical to different industries, and how to address them from both a data science and an organizational perspective. The talk will lay out a structured, pillar-based approach that customer-facing product (B2C) organizations can use to ensure they build responsible AI solutions.

Machine Learning Safety Talks include:

#13: Data Science and AI in Digital Transformation: Digital Can Lead to Blindness [Keynote]

Usama Fayyad, PhD, Professor and Inaugural Exec Director, Institute for Experiential AI & CS, Northeastern University, Chairman & Founder, Open Insights

Digital transformation has been with us for decades, but the COVID-19 pandemic accelerated it beyond anyone’s imagination. With digital channels, the data flux increases by one or more orders of magnitude. While this may create great opportunities for data science and AI, it turns out that much of the work in digital transformations focuses primarily on automating workflows, and many organizations overlook the collection of data as a high priority. Not only does this mean missing out on great opportunities to inject analytics, data science, and AI applications, but ironically, overlooking the data creates new challenges for organizations as they lose the ability to understand what is and is not working in their new digital channels. We show how thinking about the right data collection and management is already an urgent imperative.

#14: Unsolved ML Safety Problems

Dan Hendrycks, Research Intern, DeepMind

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, I outline a roadmap for ML Safety and refine the technical problems that the field needs to address. I present three pillars of ML safety, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), and steering ML systems (“Alignment”).
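
As a taste of the “Monitoring” pillar, here is a minimal sketch of the maximum softmax probability (MSP) baseline for out-of-distribution detection, which the speaker proposed in earlier work; the toy logits and threshold are illustrative assumptions:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_scores(logits):
    """Maximum softmax probability: a low top-class probability suggests
    the input may be out-of-distribution."""
    return softmax(logits).max(axis=1)

in_dist = np.array([[6.0, 0.5, 0.2], [5.5, 1.0, 0.3]])  # confident predictions
ood = np.array([[1.1, 1.0, 0.9], [0.8, 1.0, 1.1]])      # diffuse predictions
threshold = 0.7  # illustrative; tune on validation data in practice
for name, logits in [("in-dist", in_dist), ("ood", ood)]:
    print(name, msp_scores(logits), msp_scores(logits) < threshold)
```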

#15: Evaluating, Interpreting, and Monitoring Machine Learning Models

Ankur Taly, PhD, Staff Research Scientist, Google

Machine learning (ML) models have caused a revolution in several fields. Unfortunately, much of this progress has come with models getting more complex and opaque. Despite widespread deployment, the practice of evaluating models remains limited to computing aggregate metrics on held-out test sets. In this talk, I will argue that this practice can fail to surface failure modes of the model that may otherwise show up during real-world usage. In light of this, I will discuss the importance of understanding model predictions by asking: why did the model make this prediction? I will discuss an evaluation workflow based on feature attributions and describe several of its applications. Finally, I will discuss how attributions can be used for monitoring models in production, and conclude with some caveats about using feature attributions. This talk is based on joint work with colleagues at Google.
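
One widely used attribution method, Integrated Gradients (which the speaker co-developed), fits in a few lines; the toy model, baseline, and step count below are illustrative assumptions:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Attribute a prediction to input features by accumulating gradients
    along the straight path from a baseline to the input x. grad_fn(x)
    must return the gradient of the model output with respect to x."""
    alphas = np.linspace(0.0, 1.0, steps + 1)[1:]  # Riemann-sum points
    path_grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * path_grads.mean(axis=0)

# Toy model f(x) = x0^2 + 3*x1, so its gradient is [2*x0, 3].
grad_fn = lambda x: np.array([2.0 * x[0], 3.0])
x, baseline = np.array([2.0, 1.0]), np.zeros(2)
attributions = integrated_gradients(grad_fn, x, baseline)
print(attributions)        # roughly [4.1, 3.0]
print(attributions.sum())  # roughly f(x) - f(baseline) = 7 (completeness axiom)
```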

#16: The Origins, Purpose, and Practice of Data Observability

Kevin Hu, Co-founder and CEO, Metaplane

We’ll rigorously define data observability (DO) to understand why it is different from software observability and existing data quality monitoring. We will derive the four pillars of DO (metrics, metadata, lineage, and logs) and then describe how these pillars can be tied to common use cases encountered by teams using popular data architectures, especially cloud data stacks. Finally, we’ll close with pointers on how to put observability into practice, drawing from our experience helping teams of all sizes, from fast-growing startups to large enterprises, successfully implement DO. Doing so involves not only choosing the right technology, whether a commercial solution, an in-house initiative, or an open-source project, but also implementing the correct processes with the right people responsible for specific jobs.
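
To ground the metrics and metadata pillars, here is a minimal sketch of the kind of freshness and volume checks a data observability tool automates; the thresholds and schema are illustrative assumptions:

```python
import datetime as dt

def check_freshness(last_loaded_at, max_age_hours=24):
    """Metrics pillar: flag a table that has not been updated recently."""
    age = dt.datetime.now(dt.timezone.utc) - last_loaded_at
    return age <= dt.timedelta(hours=max_age_hours)

def check_volume(row_count, history, tolerance=0.5):
    """Metadata pillar: flag a row count that deviates sharply from the
    recent average (a deliberately crude anomaly check)."""
    mean = sum(history) / len(history)
    return abs(row_count - mean) <= tolerance * mean

now = dt.datetime.now(dt.timezone.utc)
print(check_freshness(now - dt.timedelta(hours=3)))            # True: fresh
print(check_volume(row_count=400, history=[1000, 980, 1020]))  # False: volume drop
```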

#17: Security Operations for Machine Learning at Scale with MLSecOps

Alejandro Saucedo, Director of Machine Learning, Seldon

In this talk, we introduce the conceptual and practical topics around MLSecOps that data science practitioners will be able to adopt or advocate for. We will also provide an introduction to key security challenges that arise in production machine learning systems as well as best practices and frameworks that can be adopted to help mitigate security risks in ML models, ML pipelines, and ML services. We will cover a practical example showing how we can secure a machine learning model, and showcasing the security risks and best practices that can be adopted during the feature engineering, model training, model deployment, and model monitoring stages of the machine learning lifecycle.
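
One small, concrete MLSecOps practice is verifying the integrity of model artifacts before they are loaded; here is a minimal sketch (the registry workflow described in the final comment is an assumption, not specific to any vendor):

```python
import hashlib

def fingerprint(path: str) -> str:
    """SHA-256 digest of a serialized model artifact, computed in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(path: str, expected: str) -> None:
    """Refuse to serve a model whose hash differs from the recorded one."""
    actual = fingerprint(path)
    if actual != expected:
        raise RuntimeError(f"model artifact mismatch: {actual} != {expected}")

# Usage sketch: record fingerprint(path) in your model registry at training
# time, then call verify_artifact(path, recorded_digest) before deployment.
```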

#18: A Unified View of Trustworthy AI with the 360 Toolkits

Kush R. Varshney, PhD, Research Staff Member and Manager, IBM

Trustworthy AI includes many different concepts, including predictive performance, fairness, robustness, explainability, uncertainty quantification, and transparency. In this talk, I will explain a common framework of trustworthiness from which these concepts fall out, and highlight how to approach them using AI Fairness 360, Adversarial Robustness 360, AI Explainability 360, Uncertainty Quantification 360, and AI FactSheets 360.

#19: Kubernetes — Observability Engineering

Ravi Kumar Buragapu, Senior Engineering Leader — Reliability and Observability Engineering, Adobe Systems Inc

This session will cover modern, cutting-edge strategies for addressing the observability challenges Kubernetes presents, given the complexity of its control plane and container runtime layer: monitoring the health, performance, stability, and reliability of clusters through real-time golden signals and distributed tracing metrics.
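
For reference, here is a minimal sketch of computing three of the four golden signals from a window of request records; the record schema is an illustrative assumption, and in practice these numbers come from a metrics pipeline rather than application code:

```python
from statistics import quantiles

def golden_signals(requests, window_seconds=60):
    """Latency, traffic, and errors from one window of request records;
    the fourth signal, saturation, requires resource metrics instead."""
    latencies = [r["latency_ms"] for r in requests]
    p95 = quantiles(latencies, n=20)[18]      # 95th-percentile latency
    traffic = len(requests) / window_seconds  # requests per second
    error_rate = sum(r["status"] >= 500 for r in requests) / len(requests)
    return {"latency_p95_ms": p95, "traffic_rps": traffic, "error_rate": error_rate}

window = [{"latency_ms": 20 + i % 50, "status": 500 if i % 25 == 0 else 200}
          for i in range(120)]
print(golden_signals(window))
```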

#20: Introducing Model Validation Toolkit

Alex Eftimiades, Senior Data Scientist, and Matt Gillett, Software Development Engineer In Test, FINRA

Many details surrounding a typical ML pipeline are commonly swept under the rug. How will we monitor production data for concept drift? How do we measure false-negative rate in production? How confident can we be in our performance assessments with a small test set, and how should they be modified when faced with biased data? How can we ensure our model follows reasonable assumptions? We introduce a new general-purpose tool, the Model Validation Toolkit, for common tasks involved in model validation, interpretability, and monitoring. Our utility has submodules and accompanying tutorials on measuring concept drift, assigning and updating optimal thresholds, determining the credibility of performance metrics, compensating for data bias, and performing sensitivity analysis. In this session, we will give a tour of the framework’s core functionality and some associated use cases.
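
As a generic stand-in for this kind of check (this is not the Model Validation Toolkit’s own API), here is a minimal concept-drift test on a single numeric feature using a two-sample Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

def drift_detected(train_feature, prod_feature, alpha=0.01):
    """Two-sample KS test: a small p-value indicates the production
    distribution of this feature differs from the training distribution."""
    result = stats.ks_2samp(train_feature, prod_feature)
    return result.pvalue < alpha, result.statistic, result.pvalue

rng = np.random.default_rng(2)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
prod = rng.normal(loc=0.4, scale=1.0, size=5000)  # shifted mean: drift
print(drift_detected(train, prod))                # (True, ...)
```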

And more…

Is your ML secure? Cybersecurity and threats in the ML world [Keynote]

Hari Bhaskar, PhD, Director, Data Science & AI Platform, and Jean-Rene Gauthier, PhD, AI Platform Architect, Oracle

Just like any other piece of software, machine learning models are vulnerable to attacks from malicious agents, yet data scientists and ML engineers rarely think about the security of their models. Models are representations of their underlying training datasets and are susceptible to attacks that can compromise the privacy and confidentiality of that data. Every single step in the machine learning lifecycle is susceptible to security threats. But there are steps you can take. Attend this presentation to:

  • Learn about the most common types of attacks targeting the integrity, availability, and confidentiality of machine learning models
  • Discover best practices for data scientists and ML engineers to mitigate security risks
  • Ask security-related questions of ML experts

Register for ODSC East 2022 and access over 110 free responsible AI and ML safety talks

https://odsc.com/boston/

We just listed quite a few interesting talks coming to ODSC East 2022 this April 19th-21st, and everything above can be seen for free when you register for a Bronze Pass. You can still upgrade to a training pass for 30% off and get access to all of our machine learning training options.
