How Will AI Affect the Role of Data Professionals?
Writing about the potential impact of AI and LLMs in 2023 is asking for trouble. Predictions range from the apocalyptic (open letters from noted experts warning of an existential threat to humanity) to the skeptical (interesting possibilities, but currently far too many flaws).
Even if reality falls somewhere between those two extremes, the impact of generative AI on jobs across many sectors will be profound, certainly in the medium term. Most of the discussion has focused on the implications for writers, designers, software engineers, researchers, lawyers, and administrative workers. Less has been said about data professionals themselves.
Humans and machines
Data scientists and analysts need to be aware of how this technology will affect their role, their processes, and their relationships with other stakeholders. Everyone has heard the warning that's often repeated on social media: AI won't take your job, but somebody using AI will.
There are clearly aspects of data wrangling that AI is going to be good at. Identifying features and relationships, helping with cleaning and structuring data, and sifting gigantic data sets — these are all areas where machines are particularly adept. And these abilities are all already being built into data-heavy products and processes.
But the use of AI could go further. Ultimately, data professionals need to provide value to their business or organization. This involves building relationships, explaining concepts, and clear communication with others — particularly non-data specialists.
Since models like ChatGPT can at the very least appear human, even the tasks that seem better suited to flesh-and-blood professionals could end up handed over to AI.
Conversational interfaces
Abstraction is useful because it hides unnecessary levels of detail from users. Coders don’t want to work in machine code, so programming languages provide a readable translation layer between human and machine.
Chat interfaces can be viewed as another step up the ladder of abstraction. For example, business users will be able to query databases using natural language. ChatGPT is already being used to output SQL queries in the correct syntax.
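In practice, that translation layer is only a short API call away. Here is a minimal sketch using the OpenAI Python client; the table schema, model name, and question are illustrative assumptions, not part of any product described here.

```python
# Minimal sketch: translating a business question into SQL with an LLM.
# The schema and model name are illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

schema = "sales(order_id INT, region TEXT, amount NUMERIC, order_date DATE)"
question = "What was total revenue by region last quarter?"

response = client.chat.completions.create(
    model="gpt-4",  # any chat-capable model
    messages=[
        {"role": "system",
         "content": f"You write SQL for this schema: {schema}. Return SQL only."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)  # e.g. SELECT ... GROUP BY region
```

In a real deployment the generated SQL would need to be validated, or run against a read-only replica, before execution: models can produce plausible but wrong queries.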
This brings up the idea of ‘prompt engineering’ as a specific skill, or even role — but this slightly misses the point. With conversational interfaces, the model doesn’t need to rely on a single prompt from an expert user. It can ask a series of questions to get the information it needs from the user. Think of it like a doctor asking questions of a patient before coming back with a diagnosis.
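In API terms, that doctor-style back-and-forth is just an accumulating message history. A hypothetical exchange might look like this:

```python
# Hypothetical multi-turn exchange: the model asks a clarifying question
# before committing to an answer, doctor-style.
messages = [
    {"role": "user", "content": "How did sales do last quarter?"},
    {"role": "assistant",
     "content": "Do you mean gross revenue or units sold, and for which region?"},
    {"role": "user", "content": "Gross revenue, EMEA."},
    # The full history is resent on each turn, so the model keeps context.
]
```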
The model could also remember key information about a specific user (the type of data that is most useful to their job, seasonal or data-specific business cycles that provide context, the user’s preference in terms of reading charts or graphs).
Operating in this ‘personal assistant’ role, it’s not a huge leap to imagine a Chief Marketing Officer arriving at her desk on a Monday morning and simply asking: ‘What do I need to know today?’. The AI can then try to predict what data is important to this specific user, retrieve it, and visualize it.
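One way to picture that memory is as a lightweight profile prepended to the model's instructions at the start of each session. A hypothetical sketch, with every field name invented for illustration:

```python
# Hypothetical user profile a conversational assistant might persist
# between sessions; all field names here are invented for illustration.
user_profile = {
    "role": "Chief Marketing Officer",
    "key_metrics": "campaign ROI, lead conversion rate",
    "business_context": "Q4 is peak season; year-on-year comparisons preferred",
    "chart_preference": "simple bar charts rather than dense tables",
}

# Prepend the profile to the model's instructions at session start, so a
# vague Monday-morning question arrives with the context needed to answer it.
system_prompt = (
    "You are a data assistant. Tailor answers to this user:\n"
    + "\n".join(f"- {k}: {v}" for k, v in user_profile.items())
)
```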
As well as personalizing for the user, the models could also be trained on niche or proprietary data sets to make them even more specialized. BloombergGPT is a bespoke, 50-billion parameter LLM trained on a deep niche of financial information.
Automated data storytelling
As part of the exploratory data analysis process, it's already possible to generate a bunch of charts and graphs with a few lines of code. But when it comes to communicating data insights to other stakeholders, things get more challenging.
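To make the first point concrete, here is a generic sketch with pandas and matplotlib that charts every numeric column in a data set (the file name is a placeholder):

```python
# Generic EDA sketch: one quick histogram per numeric column.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")  # placeholder file name
df.hist(figsize=(10, 8))       # pandas draws one histogram per numeric column
plt.tight_layout()
plt.show()
```

Producing the charts is cheap; deciding which of them matters to a given audience is not.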
Audiences for presentations, reports, or dashboards have varying levels of data literacy. They’re also interested in different parts of the data. Distilling down complex data sets for a specific audience is still a very human skill. In the words of Seth Godin: “In a presentation to non-scientists (or to bored scientists) the purpose of a chart or graph is to make one point, vividly.”
ODSC's own analysis of 25,000 data analyst job descriptions found communication and data visualization among the top ten skills for these roles. The ability to articulate data insights to a range of audiences is in demand.
Alteryx's Auto Insights is an example of using generative AI to automate at least part of this storytelling process. As well as visualizing data as part of a presentation, its Magic Documents feature can generate descriptive narratives to add context to the numbers.
Explaining the models
Both conversational interfaces and automated data storytelling rely on simplification. Providing business users with specific insights inherently requires leaving some things out — filtering the signal from the noise.
But as anyone working directly with the data knows, this simplification process can lose vital caveats and nuance. To understand how far we can trust predictions and insights generated by AI, we need to understand the quality of the underlying data. So far, ChatGPT and similar models haven’t been great at saying ‘I don’t know’. Instead, they invent (or hallucinate) answers. Human data experts are going to have an even more essential role in explaining the limitations of both the models and the data.
The same goes for bias in training data. While there’s a general awareness that data bias is a potential problem, overenthusiastic executives can often forget this when they see a competitive edge that could benefit the organization. Data professionals need to be advocates for scrutinizing the source material. Somewhere between the scrappy, experimental stage of an AI project and deployment into production, those data sets need to be audited.
The humble data dictionary will also become more important if you have non-expert users querying data sets using their own terminology. Even a seemingly simple data point like ‘current company headcount’ can be different depending on who’s asking, and how it’s calculated. Finance departments tend to use a ‘full-time equivalent’ number, while HR count noses (literally, the number of humans in the company).
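A data dictionary entry can make those competing definitions explicit, so that a conversational interface knows to ask which one the user means. A minimal sketch, with illustrative field names:

```python
# Minimal sketch of a data dictionary entry that disambiguates one metric.
# Field names and values are illustrative, not a standard.
data_dictionary = {
    "current_company_headcount": {
        "definitions": {
            "finance": "Full-time equivalent (FTE); part-timers pro-rated",
            "hr": "Number of individual people employed, regardless of hours",
        },
        "source_table": "hr.employees",
        "refresh": "monthly",
        "owner": "people-analytics team",
    }
}
```

An assistant that consults entries like this can answer "what's our headcount?" with a clarifying question rather than silently picking one definition.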
And there will be increasing pressure for those in data roles to explain not just what outputs a model is generating, but how and why it generated them. This is not simply a question of being ethical (although that needs to be baked into AI projects from the start). The US, EU, UK, and many other countries are actively looking at the legal and regulatory framework that surrounds the deployment of AI models.
The future data professional
These are exciting times to be working in data-focused roles. But while we look at the impact of AI on other professions, we should also be considering what it could mean for our own. As the people with the deepest understanding of this new technology, we need to be both evangelists for the possibilities it creates, and forensic critics of the risks that it raises.
About the author
Alan Rutter is the founder of consultancy Fire Plus Algebra, and is a specialist in communicating complex subjects through data visualization, writing, and design. He has worked as a journalist, product owner, and trainer for brands and organizations including Guardian Masterclasses, WIRED, Riskified, the Home Office, the Biotechnology and Biological Sciences Research Council, and the Liverpool School of Tropical Medicine.
Originally posted on OpenDataScience.com