10 Steps to Become a More Responsible Data Scientist

Mar 17, 2022

Responsible AI is a big-tent idea that’s generated a lot of attention over the last few years. For those of us on the front lines, including data scientists, machine learning (ML) engineers, researchers, and others, the key question is: how do we build responsible AI in the real world? To help answer this question, we looked at 10 upcoming topics on how to become a more responsible data scientist at ODSC East this April 19th-21st.

#1 Generate Bias-Free Real-World Data to Reduce Risk

Risk assessment is often utilized when machine learning is used for automated decision-making. For example, Jack’s risk of defaulting on a loan is 8, Jill’s is 2, Ed’s risk of recidivism is 9, Peter’s is 1. We know that this task definition comes with impossibility results for group fairness, where one cannot simultaneously satisfy desirable probabilistic measures of fairness. This session will highlight recent findings in terms of these impossibility results. Recent research on how machine learning can be used to generate aspirational data (i.e., data that are free from biases of real-world data) will provide insight. Such data are useful for recognizing and detecting sources of unfairness in machine learning models besides biased data.

Abstracted from upcoming ODSC East 2022 Session: Just Machine Learning. Presented by Tina Eliassi-Rad, PhD, professor, Northeastern University
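To make the impossibility results for group fairness more concrete, here is a minimal sketch that is not taken from the session. It uses an identity that follows directly from the confusion-matrix definitions of a binary classifier, relating the false positive rate (FPR) to a group's base rate, positive predictive value (PPV), and false negative rate (FNR). The function name and the group base rates are hypothetical, chosen only for illustration.

```python
def false_positive_rate(base_rate: float, ppv: float, fnr: float) -> float:
    """FPR implied by a group's base rate, PPV, and FNR.

    Derived from the confusion-matrix definitions:
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical groups: identical PPV and FNR, different base rates.
fpr_a = false_positive_rate(base_rate=0.3, ppv=0.8, fnr=0.2)
fpr_b = false_positive_rate(base_rate=0.5, ppv=0.8, fnr=0.2)

print(f"group A FPR: {fpr_a:.3f}")
print(f"group B FPR: {fpr_b:.3f}")
```

Because the base rates differ, holding PPV and FNR equal across the two groups forces their false positive rates apart. That is one simple way to see why several desirable fairness measures cannot all hold at once.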

#2 Apply Open Source Responsible AI Best Practices

Machine learning is often hastily operationalized, frequently with little regard for its wider societal impact. At the same time, there’s been a lack of clear, concrete guidelines on how to reduce the risks stemming from AI. A new organization, the Foundation for Best Practices in Machine Learning, was recently founded with the goal of helping data scientists, governance experts, managers, and other machine learning professionals implement ethical and responsible machine learning. The Foundation does so via open-source technical and organizational Best Practices for Responsible AI. These guidelines have been developed principally by senior ML engineers, data scientists, data science managers, and legal professionals. Responsible ML starts with prudent MLOps and product management. The technical and organizational best practices for any responsible data scientist look at both the technical and institutional requirements needed to promote responsible ML. Both blueprints touch on subjects such as “Fairness & Non-Discrimination”, “Representativeness & Specification”, “Product Traceability”, and “Explainability”, amongst other topics.

Abstracted from upcoming ODSC East 2022 Session: Open-source Best Practices in Responsible AI. Presented by Violeta Misheva, PhD, Senior Data Scientist, ABN Amro Bank, and Vice-chair, The Foundation for Best Practices in ML, and Daniel Vale, Legal Counsel for AI & Data Science, H&M Group

#3 Understand AI’s Impact on Climate

Big data and artificial intelligence have enabled numerous applications for humanitarian and social good, as any responsible data scientist should know. In terms of climate change, machine learning, deep learning, and computer vision approaches have proven to be useful for adaptation and mitigation. Some of the major areas in which artificial intelligence is key in the fight against this crisis are: electricity systems, transportation, buildings and cities, industry, farms and forests, carbon dioxide removal, climate prediction, societal impacts, and solar geoengineering. From harnessing deep learning-based computer vision techniques for infrastructure damage assessment after natural disasters using satellite imagery to utilizing natural language processing (NLP) technologies to analyze climate-related legislation, we contend that AI is a necessary tool in the years ahead. At the same time, sustainable and responsible use of deep learning (DL) models is key. In particular, the notably large energy consumption of AI systems themselves has come under scrutiny: with the popularity of deep learning since approximately 2012, the computation used to train the largest models has reportedly grown by 300,000 times or more, driving up energy use. Balancing this concern with other considerations like model interpretability, accessibility, and fairness is a crucial challenge to tackle ahead.

Abstracted from upcoming ODSC East 2022 Session: Deploying AI for Climate Adaptation: A Spotlight on Disaster Management. Presented by Thomas Y Chen, Researcher, Academy for Mathematics, Science, and Engineering

#4 Understand Data Responsibility

Sociotechnical systems abound in examples of the ways advancing technology constitutes a source of harm for historically marginalized groups. Take the field of machine learning, which has seen a rapid proliferation of new methods, model architectures, and optimization techniques. Yet data, which remains the backbone of machine learning research and development, has received comparatively little research attention. Focusing exclusively on the content of training datasets (the data used for algorithms to “learn” associations) only captures part of the problem. Instead, we should identify the historical and conceptual conditions which unveil the modes of dataset construction and analyze datasets from the perspective of three techniques of interpretation: genealogy, problematization, and hermeneutics. This includes questions on the role of data provenance, the conceptualization and operationalization of the categories which structure these datasets (e.g. the labels which are applied to images), methods for annotation, the consent regimes of the data authors and data subjects, and stakeholders and other related institutional logics. The technique of problematization builds on the genealogical question by asking: what are the central discourses, questions, concepts, and values which constitute themselves as the solution to problems in the construction of a given dataset?

Abstracted from upcoming ODSC East 2022 Session: ImageNet and its Discontents. The Case for Responsible Interpretation in ML. Presented by Razvan Amironesei, PhD, Applied Data Ethicist, Visiting Researcher, Google

#5 Build Trustworthy AI

Recent years have seen astounding growth in the deployment of AI systems in critical domains such as autonomous vehicles, criminal justice, and healthcare, where decisions taken by AI agents directly impact human lives. Consequently, there is an increasing concern of whether these decisions can be trusted. How can we deliver on the promise of the benefits of AI but address scenarios that have life-critical consequences for people and society? In short, how can we achieve trustworthy AI?

Under the umbrella of trustworthy computing, employing formal methods for ensuring trust properties such as reliability and security has led to scalable success. Just as for trustworthy computing, formal methods could be an effective approach for building trust in AI-based systems. However, we would need to extend the set of properties to include fairness, robustness, interpretability, etc.; and to develop new verification techniques to handle new kinds of artifacts, e.g., data distributions and machine-learned models. This talk poses a new research agenda, from a formal methods perspective, for us to increase trust in AI systems.

Abstracted from upcoming ODSC East 2022 Session: Trustworthy AI. Presented by Jeannette M. Wing, PhD, Avanessians Director, Data Science Institute | Professor of Computer Science, Columbia University

#6 Understand Drift Detection within Federated Learning Systems

Federated Learning (FL) has become increasingly prominent, in part due to the pandemic, during which dependence on devices increased tremendously. One approach to sustainable federated machine learning models is to consider different aspects of ethical AI while building and monitoring private federated models in a large-scale enterprise. FL-based systems have contributed much to human health, predictive maintenance tasks for the auto industry, production process monitoring, and discovering new trends, patterns, and anomalies. Automated deployment and monitoring are useful in designing robust AI/ML models in the face of uncertainties like COVID. Here we introduce the concept of ‘Concept Drift’ in ML models and highlight how autoML and drift detection strategies play a vital role in an FL environment, particularly with data aggregated from varied devices with different system configurations. The session also addresses issues centered around drift on local devices and techniques that aim to minimize its effect on model performance. Different model KPI metrics and deployment best practices can be used to test the robustness and ethical aspects of an ML model.

Abstracted from upcoming ODSC East 2022 Session: Need of Adaptive Ethical ML Models in Post Pandemic Era. Presented by Sharmistha Chatterjee, Senior Manager Data Sciences and Juhi Pandey, Senior Data Scientist, Publicis Sapient
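As a rough illustration of what a drift check can look like (this is not the presenters’ method), the sketch below computes the Population Stability Index (PSI), one commonly used drift score, between a reference sample and a current sample of a single numeric feature. In a federated setting, one could imagine each device computing such a score locally against a shared reference; that setup, the function name `psi`, and the bin count are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small floor so that empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 100 for i in range(100)]      # hypothetical training-time feature values
shifted = [0.5 + i / 200 for i in range(100)]  # hypothetical values seen later on a device

print("no drift:", psi(reference, reference))
print("drifted: ", psi(reference, shifted))
```

A score of zero means the two binned distributions match, and larger scores indicate a larger shift. Alert thresholds are rule-of-thumb choices that vary by team, so any cutoff should be treated as an assumption to validate on your own data.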

#7 Leverage AI Observability To Protect Your Customers

Once machine learning models are deployed to production, their performance tends to degrade, leaving companies to react to the impacts of performance degradation reported by their customers. Now that ML models are increasingly becoming mission-critical for enterprises and startups alike, root cause analysis and gaining observability into your AI systems are similarly mission-critical. However, many organizations struggle to prevent model performance degradation and make assumptions about the quality of the data being fed into their ML models, largely because they don’t have the tools and organizational knowledge to prevent degradation. Many of the problems associated with ML models deployed in production can be addressed with data monitoring and AI observability best practices. Data scientists and machine learning engineers can take additional steps to proactively ensure the performance of their models, rather than waiting for customers to report problems.

Abstracted from upcoming ODSC East 2022 Session: AI Observability: How To Fix Issues With Your ML Model. Presented by Danny D. Leybzon, MLOps Architect, WhyLabs
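To give a flavor of what basic data monitoring can involve, here is a minimal sketch that does not use any particular observability tool. It profiles one feature at training time and then flags a production batch whose missing-value rate or value range departs from that profile. The function names, the 0.05 tolerance, and the sample values are hypothetical.

```python
from typing import Dict, List, Optional


def profile(values: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one feature: share of missing values plus observed min and max."""
    present = [v for v in values if v is not None]
    return {
        "missing_rate": 1 - len(present) / len(values),
        "min": min(present),
        "max": max(present),
    }


def check_batch(reference: Dict[str, float], batch: List[Optional[float]],
                max_missing_increase: float = 0.05) -> List[str]:
    """Return alerts for a production batch compared against a reference profile."""
    current = profile(batch)
    alerts = []
    if current["missing_rate"] > reference["missing_rate"] + max_missing_increase:
        alerts.append("missing rate increased")
    if current["min"] < reference["min"] or current["max"] > reference["max"]:
        alerts.append("values outside training range")
    return alerts


training_ages = [23.0, 35.0, 41.0, 58.0, 62.0]      # hypothetical training data
production_ages = [30.0, None, None, 140.0, 45.0]   # hypothetical production batch

print(check_batch(profile(training_ages), production_ages))
```

Running a check like this on every incoming batch surfaces data quality problems before they show up as degraded predictions. A real deployment would track many more statistics and handle edge cases such as a batch with no non-missing values, which this sketch does not.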

#8 Learn To Mitigate Bias in Machine Learning

It’s well established that bias is everywhere: in data, in algorithms, in humans. As data grows exponentially in nearly all dimensions (velocity, volume, veracity) and machine learning systems trained on these data sets become omnipresent, dealing with bias becomes more complex and more important. Practitioners may underestimate the complexity that bias introduces into machine learning workflows. Thus, the first steps are awareness, understanding data quality, and proficiency in measuring and monitoring machine learning algorithms. Next is an understanding of how to mitigate these problems.

Abstracted from upcoming ODSC East 2022 Session: Dealing with Bias in Machine Learning. Presented by Thomas Kopinski, PhD, Professor for Data Science, University of South Westphalia
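Measuring bias can start with something as simple as comparing how often a model predicts the positive outcome for each group. The sketch below, which is an illustration rather than material from the session, computes per-group selection rates and the gap between them, often called the demographic parity difference. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def selection_rates(predictions: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Share of positive predictions (1s) within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(rates: Dict[str, float]) -> float:
    """Gap between the highest and lowest group selection rates (0 means parity)."""
    return max(rates.values()) - min(rates.values())


# Hypothetical binary predictions and group labels.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(preds, groups)
print(rates, demographic_parity_difference(rates))
```

This is only one of many possible fairness measures, and a small gap on this metric does not rule out bias along other dimensions, so it works best as one monitored signal among several.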

#9 For More Responsible AI, Expand Explainable AI

As our AI models become deeper and more esoteric, they also become more inscrutable. Within the broader topic of Responsible AI, there is the need for Explainable AI. In a broader sense, explainability concerns how algorithms, data, and predictions influence decisions, including counterfactuals. Responsible AI is more than the explainability of AI models; it also includes privacy and security, governance and accountability, and robustness, among others. Focusing on explainability, one interesting challenge is that there is no consistent nomenclature, only a general understanding that is neither precise nor uniform. For example, interpretability is the how (a property), while explainability is the why (an outcome). While interpretability is somewhat fixed, explainability differs a lot depending on the audience: data scientists and model builders, model validators, business users, customers, and regulators. Yet the two terms are often used interchangeably! As for algorithms, there are different types, each offering a distinct window into the black box; depending on the context, we might need to apply global explanations or local explanations. Another interesting aspect is that explainability belongs to a class of problems called “wicked problems.”

Abstracted from upcoming ODSC East 2022 Session: Explainable AI: Balancing the Triad — Business needs, Technology maturity & the Governance regulations. Presented by Krishna Sankar, Distinguished Engineer, Artificial Intelligence, U.S. Bank
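The difference between local and global explanations is easiest to see on a model that is already transparent. The sketch below is an illustration under simple assumptions, not the presenter’s approach: for a hypothetical linear scoring model, a local explanation is each feature’s contribution to one prediction relative to a baseline input, and a global explanation is the mean absolute contribution across a dataset. The weights, feature names, and data are made up.

```python
from typing import List

# Hypothetical linear model: score = bias + sum(weight_i * feature_i)
FEATURES = ["income", "debt", "age"]
WEIGHTS = [0.8, -0.5, 0.1]


def local_explanation(x: List[float], baseline: List[float]) -> List[float]:
    """Per-feature contribution to one prediction, relative to a baseline input."""
    return [w * (xi - bi) for w, xi, bi in zip(WEIGHTS, x, baseline)]


def global_explanation(data: List[List[float]]) -> List[float]:
    """Mean absolute contribution of each feature across a dataset."""
    n = len(data)
    baseline = [sum(col) / n for col in zip(*data)]  # per-feature mean as the baseline
    local = [local_explanation(row, baseline) for row in data]
    return [sum(abs(c) for c in col) / n for col in zip(*local)]


data = [[1.0, 2.0, 30.0], [3.0, 0.0, 50.0]]  # hypothetical applicants

print(dict(zip(FEATURES, local_explanation(data[0], [2.0, 1.0, 40.0]))))
print(dict(zip(FEATURES, global_explanation(data))))
```

The local view answers "why did this applicant get this score?", which suits customers and business users, while the global view answers "what drives the model overall?", which suits validators and regulators. For non-linear models the same two questions remain, but answering them requires dedicated attribution techniques.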

#10 Engineer Ethics Standards Into Your AI System

Successful implementation of AI is difficult. Reportedly, 85% of AI projects fail to bring their intended results to the business. The recent Zillow debacle highlighted the enormous financial risks of implementing AI systems poorly, not only in terms of revenue impact, but also in reputational damage, loss of morale, and loss of jobs. So, how do we move forward with AI responsibly and effectively? That process starts with the system’s design. Two frameworks address responsible design: IEEE 7000-2021 for Systems Design Ethical Concerns, and Section A of the Technical Best Practices from the Foundation for Best Practices in Machine Learning. Melding the more technically focused IEEE framework with the more business-oriented section from the Technical Best Practices framework provides practical and actionable guidance on designing effective and ethical AI products. Key areas of the AI design process that the standards cover include team composition and roles, problem statement and solution mapping, integrating context, organizational capacity, and defining the product and outcomes.

Abstracted from upcoming ODSC East 2022 Session: Can we let AI be great? Practical considerations in designing effective and ethical AI products. Presented by Masheika Allgood, Founder, AllAI Consulting, LLC

Bonus method: Take Responsibility for Securing Your Machine Learning Models

Just like any other piece of software, machine learning models are vulnerable to attacks from malicious agents. However, data scientists and machine learning engineers rarely think about the security of their models. Models are representations of their underlying training datasets and are susceptible to attacks that can compromise the privacy and confidentiality of that data. Every single step in the machine learning lifecycle is susceptible to various security threats, but there are steps you can take to become a responsible data scientist. Some of the most common types of attacks targeting the integrity, availability, and confidentiality of machine learning models are training or poisoning attacks (fake data), testing or inference attacks, adversarial reprogramming, and evasion attacks. Best practices for data scientists and ML engineers to mitigate security risks include adversarial training, gradient masking, and input regularization.

Abstracted from upcoming ODSC East 2022 Session: Is your ML secure? Cybersecurity and threats in the ML world.
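To show what an evasion attack looks like in its simplest form, the sketch below applies the fast gradient sign method (FGSM) to a tiny logistic regression model. It is an illustration under stated assumptions, not material from the session: the model weights, the input, and the step size `epsilon` are all hypothetical.

```python
import math
from typing import List

# Hypothetical, already-trained logistic regression parameters.
WEIGHTS = [2.0, -1.0]
BIAS = 0.0


def predict_proba(x: List[float]) -> float:
    """Probability of the positive class."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1 / (1 + math.exp(-z))


def sign(value: float) -> int:
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(x: List[float], y: int, epsilon: float) -> List[float]:
    """Shift each feature by epsilon in the direction that increases the log loss."""
    # For log loss on logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    error = predict_proba(x) - y
    grad = [error * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


x, y = [1.0, 0.5], 1  # an input the model classifies correctly as positive
x_adv = fgsm(x, y, epsilon=0.8)

print("original:   ", predict_proba(x))
print("adversarial:", predict_proba(x_adv))
```

The perturbed input pushes the predicted probability below 0.5, so the model now misclassifies it. Adversarial training, in its basic form, adds perturbed examples like `x_adv` with their correct labels back into the training set so the model learns to resist them.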

Become a Responsible Data Scientist at ODSC East 2022

Responsible AI is now a cornerstone of ODSC events, including the upcoming ODSC East 2022 this April 19th-21st. The Responsible AI focus area will highlight everything from responsible AI toolkits to other open-source frameworks, tools, and case studies that can help you make sure your AI algorithms and projects are ethical, trustworthy, safe, and unbiased, making you a more responsible data scientist.

Currently scheduled sessions include:

Deploying AI for Climate Adaptation: A Spotlight on Disaster Management
Open-source Best Practices in Responsible AI
Intro to Trustworthy AI
Data Science and Contextual Approaches to Palliative Care Need Prediction
Deep Learning Enables a New View in the Agriculture Industry
You Too Can Be a Cybersecurity Data Scientist!
…and more added each week!

To stay current with Responsible AI toolkits and more, subscribe to our newsletter for more case studies, news, and tutorials. You can also register for ODSC East 2022 now to save on all ticket types so you can learn more about responsible, ethical, and trustworthy AI.

Original post here.

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Subscribe to our fast-growing Medium Publication too, the ODSC Journal, and inquire about becoming a writer.

Written by ODSC - Open Data Science