5 Data Science Trends in the Next 5 Years
This field is large enough that it’s close to impossible to cover in depth everything that could happen to it in the coming 5 years. Important data science trends that I foresee but won’t cover here include specific applications of data science in unique domains, the integration of low-code/no-code tools into the tech stack, and other narrowly focused insights.
Instead, I’ll focus on the general, broad themes of change that I see coming to stay over the next half-decade. This isn’t an exhaustive list, but it does cover many of the issues practitioners face today:
- Better Naming Conventions
- Sustainable Applications Outside of Technology Industries
- Data-centric Modeling
- Decision Science Expertise
- Data Science Creator Economy
1. Better Naming Conventions
The Data Scientist title has been a big issue for many in the industry, mainly because of the ambiguity around what the role entails and what the company needs. Although I believe job descriptions have largely become clearer and more concise, the job profiles are only starting to become normalized.
This shift is important because it represents the maturity that the craft is achieving. Ultimately, I see the work sorting into these job families:
- Data Analyst / Product Data Scientist / Analytics Engineer
I think Data Analyst is a phenomenal entry-level role for the industry, but because of that it often gets written off as “easy” or “basic” when, in reality, it commands its own depth of expertise. I see the Senior Analyst role involving a lot more experimentation and A/B testing knowledge, which lets Senior Analysts partner very effectively with Product Owners and Scrum teams.
I hypothesize that this function will be one of the hottest tech jobs in the coming years. As roles like Analytics Engineer start claiming the respect they deserve, it will be a function where people are empowered to be creative, design-oriented, fast learners and executors, and able to work in any domain. This will naturally follow as the role of the Data Scientist gets split up over time (see MLE below).
- Research Scientist
This role was probably the first to be fleshed out and understood. It is typically reserved for PhDs and is responsible for pushing the boundaries of AI in our society, primarily through Deep Learning and Reinforcement Learning.
- Machine Learning Engineer
Since this is where I operate, I have lots of thoughts about it. I believe today’s traditional Data Scientist and Machine Learning Engineer will coalesce into just the Machine Learning Engineer. That title also represents the job function more precisely: building end-to-end machine learning systems at scale. Today, most MLEs seem to be responsible for the model once it is in production, while Data Scientists handle everything before that. Over time this handoff of responsibilities can become a problem for long-term model deployments, so it’s much more effective for one team to have end-to-end oversight.
As new tools come out and Python/Jupyter integration into Excel gets rolled out, the PoC work done in Jupyter is likely to become work that Analysts primarily do (probably Senior Analysts if the PoC involves modeling). The reason is that this part of the craft is getting faster to do. That distinction matters: I’m not saying this part of the craft is easy or that an Analyst’s work is trivial; it’s a deep domain of difficulties that should be respected. I am claiming that the Analyst’s excellence is speed. If they can run analytics ranging from metrics and EDA to statistical testing at a faster pace across the tools they use, then this work naturally falls into their wheelhouse as tools evolve (see Cassie Kozyrkov’s great piece on the excellence of the data analyst here: https://towardsdatascience.com/what-makes-a-data-analyst-excellent-17ee4651c6db).
This may require today’s Data Scientists to pick a track based on their skillset and interest: are they more of an Analytics DS or a Machine Learning DS? Notice that I didn’t say a Software Engineering DS. Coding is needed for all the roles I’m discussing here. Although some require higher coding proficiency than others, everyone should be expected, and able, to write clean and reusable code. To me, skipping that expectation is like asking someone to do science without following the scientific method or the standard guidelines set by the scientific community.
- Data Engineer
This is another role that’s here to stay and is well understood. Data Engineers curate and source datasets from the company’s existing data sources (Lakes, Warehouses, etc.) and can have primary oversight of how the data streams into the modeling and deployment pipeline.
2. Sustainable Applications Outside of Technology Industries
So far, we’ve seen widespread adoption of AI/ML/DS primarily in consumer tech, advertising, and marketing domains. A handful of companies working with Deep Learning are making substantial progress in Computer Vision and NLP, but the reality is that not everyone is building Deep Learning applications. Most scenarios are structured data problems for which Deep Learning isn’t the most effective solution.
Although sales, marketing, and advertising are massive industries, to me, the most exciting applications for Machine Learning are yet to come. We will likely see widespread adoption of ML in healthcare, law, manufacturing, agriculture, and so much more. Industries that are traditionally heavily regulated, or that are not primarily software industries, will see a dramatic shift just so they can use Machine Learning at scale. Sustainability is also an important part of these applications. It won’t be enough to just visualize data that previously wasn’t even thought to be collected. We’ll likely see ML solutions working in parallel with domain experts, in real time or in production environments, in industries that have taken some time to evolve technologically.
This is a win for many reasons, the most notable being that these industries will see higher efficiencies and innovative solutions that previously weren’t possible. Additionally, non-tech people will have an easier ramp into tech. Instead of needing to become a Machine Learning practitioner, a doctor can hire an MLE onto their team to collaborate with. Every function that can be improved with data will have a “+” added to the end, meaning the person has ML skills on top of what they already do. For example, Lawyer vs. Lawyer+: the latter refers to those who have deeply studied Law and now use Machine Learning to amplify their abilities.
3. Data-centric Modeling
Andrew Ng has illustrated this best: a model is the sum of code and data. So far, we’ve iterated relentlessly on the model and the hyperparameters while holding the data constant. Although that has produced massive gains and progress in academia, in the industry we will see a huge shift to holding the model constant and iterating on the data and maybe also the hyperparameters. This is the essence of data-centric modeling.
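To make the idea concrete, here is a minimal sketch of one data-centric iteration. Everything in it is an assumption for illustration: the numbers are made up, the "model" is a plain least-squares line standing in for whatever model you have already settled on, and the cleaning rule (dropping a label that is obviously out of range) stands in for real domain knowledge. The modeling code never changes; only the training data does.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b. This is the fixed model code."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mse(params, xs, ys):
    """Mean squared error of the fitted line on a dataset."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data following y = 2x + 1, with one mis-recorded label (70).
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 3, 5, 70, 9, 11]
test_x = [6, 7, 8]
test_y = [13, 15, 17]

# Iteration 0: train the fixed model on the raw data.
raw_error = mse(fit_line(train_x, train_y), test_x, test_y)

# Iteration 1: same model code, improved data.
# Assumed domain rule for this sketch: labels above 50 are recording errors.
clean = [(x, y) for x, y in zip(train_x, train_y) if y <= 50]
clean_x = [x for x, _ in clean]
clean_y = [y for _, y in clean]
clean_error = mse(fit_line(clean_x, clean_y), test_x, test_y)

print(f"test MSE on raw data:     {raw_error:.2f}")
print(f"test MSE on cleaned data: {clean_error:.2f}")
```

In this toy setup the held-out error drops sharply after the single bad row is removed, with no change to the model or its settings. Real projects would replace the hard-coded rule with label audits, better collection, or smarter features, but the loop has the same shape.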
As mentioned earlier, most problems are structured data problems, meaning that they don’t deal with images, free text, or audio. They deal with data tables that live in a system such as a database or cloud storage. We have also largely found the best-performing models for these problems. There are bound to be variations over time, but the models that get put into production environments are the ones that have been widely tested, validated, and trusted in the community. This inevitably means that, for the most part, you won’t be spending much of your time modeling in industry (this is already the case).
Most of your time goes into getting your data right, creating a stronger dataset, and engineering features intelligently to capture the necessary business effects, and that doesn’t even count all the work needed to put the model in production. Although modeling and math skills will always be valuable and needed on a team, you won’t need a full team of people with these skills. You may have one or two people with deep modeling expertise, but you primarily will (and do) need data experts. This is largely the case today as well, but it will grow to heights that are hard to predict right now. The data skills of tomorrow are bound to become quite specialized and difficult as society starts to record, collect, and store data in ways we don’t today, such as integrating real-time sensors into fabric, city infrastructure, and our bodies.
4. Decision Science Expertise
It’s incredible how many practitioners focus on the depth of mathematical complexity instead of spending their time learning the ins and outs of the business and how people make decisions. Data Science is the practice of making data useful, and practitioners will soon be required to focus on the actual decisions that need to be made, changed, or stopped, and to speak in those terms.
I believe that the gap between those who understand the full modeling pipeline and those who deeply understand the business will remain the same or grow larger over time. There are too many tools, techniques, and skills changing for non-tech people to keep up with. This will require those with tech skills to obtain strong sales skills to be a bridge.
Overvaluing the math will not help you influence a key decision. Understanding motivations, contributing factors, varied personalities, and how to influence those in power will be required of the highest-skilled Data Scientists. This is commonly understood among practitioners today, but I predict that we’ll see it included in all the bootcamps and programs as key education for success.
5. Data Science Creator Economy
As an artist, I think it’s incredible how many Data Scientists have the tenacity to create art and freelance. Whether it’s writing, unique passion projects, or consulting under their own brand name, I’ve heard of a lot of Data Scientists/MLEs taking this route. I think the freelance route for this function today comes with a lot of variance: some engagements will be good, most may be difficult, and some will be a fun way to spend extra hours in the day.
Over time I think this will be a serious job path for kids to aspire to. We’re currently witnessing an erosion in the perceived value of college, which is worthy of a story all on its own: kids are truly wondering what the point of all that debt is if they can learn the exact skills they want and need online in a few years of dedicated focus (for entry roles). Becoming a freelance Data Scientist will likely be a realistic (and fun) option for many, and I predict it will be able to command the high salaries we see in the industry today.
This will require a few things to be figured out first, though, the most important being data privacy and regulatory issues. Once companies have a standard for engaging freelance workers instead of needing a whole team staffed all the time, I think this can be an effective route for both beginners and practitioners with high brand value.
Each of these sections could be significantly expanded to provide further reasoning and clarity, but at a high level, I think these trends are ones you can count on for the coming 5 years. If there are data science trends you’d like to discuss, hear more about, or disagree with, let me know!