Rise of the ML Engineer
The job title “ML Engineer” is quickly outpacing “Data Scientist” in the new decade. Here are five reasons why you may want to become an ML engineer.
With the rapid growth of artificial intelligence comes a rising demand for machine learning (ML) engineers. Deep learning, machine learning, voice AI, autonomous machines, and machine vision are but a few of the AI-driven technologies behind that demand.
Another factor driving the rise of ML engineers is the shortage of experienced data scientists. As a result, many companies have realized that, much like software development, it’s best to spread the work across several roles.
The ML engineer role lies between software engineering and data science. In larger teams, ML engineers handle the engineering side, freeing data scientists to focus on core modeling that requires deep scientific expertise, such as statistics or other forms of mathematical modeling.
What Exactly is an ML Engineer?
A quick search for “Machine Learning Engineer” on a job board shows which skills and experience are prioritized under “Preferred Qualifications”: a computer science or engineering background, coding skills, and experience with machine learning frameworks. Mathematical modeling skills, on the other hand, are often listed but not prioritized.
ML Engineer vs Data Scientist
I’ve seen descriptions of the differences between ML engineers and data scientists that range from quite good to just plain wrong, and many companies use the two titles interchangeably. I propose a somewhat simple definition of a data scientist:
If you can code and build unique, usable, accurate models from scratch, then you are a data scientist.
On the other hand, what is an ML engineer? That requires a little more context and an understanding of contributing trends.
Trend 1 — ML & DL Frameworks
Machine learning and deep learning frameworks form much of the infrastructure and do most of the heavy lifting in the data science ecosystem. In the past five years, a slew of frameworks has been released. Programming languages such as Python, R, Julia, and even Java have many libraries and packages specific to ML and DL. However, it’s the open-source availability and ease of use of the more powerful and feature-rich machine learning and deep learning frameworks, such as TensorFlow, PyTorch, Keras, and Spark, that allow the role of the ML engineer to thrive. Expertise in at least some of these popular frameworks is a key requirement for the role.
Trend 2 — Pre-Trained Models
We’ve come a long way since the Iris data set. Pre-trained models are becoming more readily available. They have been widely adopted in deep learning, with networks such as YOLO and Mask R-CNN used for bounding boxes in object detection, and VGG-Face and FaceNet used for facial recognition.
The same trend continues with natural language processing (NLP), natural language understanding (NLU), and natural language generation (NLG). Pre-trained models are making intelligent chatbots, Q&A systems, language translation, and many more NLP applications readily accessible. Some of the well-known multi-purpose pre-trained models include BERT, GPT-2, and ULMFiT. The Hugging Face Transformers library stands out in particular, giving ready access to 32+ pre-trained NLU and NLG models. In addition, libraries like spaCy provide core general-purpose pre-trained models capable of predicting named entities, part-of-speech tags, and syntactic dependencies.
ML engineers leverage the fact that many of these models can be used out-of-the-box and relatively easily fine-tuned for more specific and custom data.
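To make the idea of fine-tuning concrete, here is a minimal sketch in plain Python. It is not the API of any library named above. The "pre-trained" weights, the toy data, and every function name are assumptions made up for illustration. The sketch shows one common form of fine-tuning: the pre-trained layer stays frozen and only a small new output layer is trained on the custom data.

```python
import math
import random

# Hypothetical "pre-trained" weights: stand-ins for a feature extractor
# learned elsewhere. They are never updated during fine-tuning.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Frozen layer: a linear projection followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(head, x):
    """Probability of class 1 from the trainable output layer (the head)."""
    feats = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(head["w"], feats)) + head["b"])


def fine_tune(data, epochs=200, lr=0.5):
    """Train only the head with stochastic gradient descent on log loss."""
    head = {"w": [0.0, 0.0], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            error = predict(head, x) - y  # gradient of log loss w.r.t. the logit
            head["w"] = [w - lr * error * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * error
    return head


# Toy "custom" data: the label is 1 when the first input exceeds the second.
random.seed(0)
train = []
for _ in range(50):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    train.append((x, 1 if x[0] > x[1] else 0))

head = fine_tune(train)
accuracy = sum((predict(head, x) > 0.5) == bool(y) for x, y in train) / len(train)
print(f"training accuracy: {accuracy:.2f}")
```

In practice the frozen part would be a large network loaded from a model hub rather than a two-row matrix, but the division of labor is the same: reuse what was already learned and train only what the new task needs.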
Trend 3 — Automated Machine Learning (AutoML)
There is also a growing trend toward automated machine learning. Generally speaking, this encompasses automatic feature selection, data transformation, and other specialized job functions normally performed by skilled data scientists.
AutoML started life as a data science productivity tool, helping reduce the time required for many of these tasks. Now it’s a key part of the ML engineer skill set, allowing them to automate data preparation, including imputation and feature selection, and to perform a best-model search with automatic hyperparameter optimization.
AutoML tools will continue to grow more sophisticated, allowing ML engineers to take on more tasks that were the purview of data scientists.
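Two of the steps AutoML automates, imputation and a hyperparameter search, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how any particular AutoML tool works. The toy data, the k-nearest-neighbours model, the search grid, and all function names are made up for the example. Real tools search over many model families and use smarter strategies than an exhaustive grid.

```python
from itertools import product
from statistics import mean


def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    return [mean(v for v in col if v is not None) for col in zip(*rows)]


def impute(rows, means):
    """Replace each missing value with the given mean for its column."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def knn_predict(train_x, train_y, x, k, power):
    """Majority vote among the k nearest neighbours (Minkowski-style distance)."""
    dists = sorted(
        (sum(abs(a - b) ** power for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_x, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def accuracy(train_x, train_y, val_x, val_y, k, power):
    hits = sum(knn_predict(train_x, train_y, x, k, power) == y
               for x, y in zip(val_x, val_y))
    return hits / len(val_y)


def search(train_x, train_y, val_x, val_y, grid):
    """Exhaustive grid search: score every combination, keep the first best."""
    best = None
    for k, power in product(grid["k"], grid["power"]):
        score = accuracy(train_x, train_y, val_x, val_y, k, power)
        if best is None or score > best["score"]:
            best = {"k": k, "power": power, "score": score}
    return best


# Toy data with missing values (None); values and labels are illustrative only.
raw_x = [[1.0, 2.0], [1.2, None], [0.8, 1.9], [5.0, 6.1],
         [None, 5.8], [5.2, 6.0], [1.1, 2.1], [4.9, 5.9]]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

# Fit the imputation on the training rows only, then apply it to both splits.
means = column_means(raw_x[:6])
train_x, train_y = impute(raw_x[:6], means), labels[:6]
val_x, val_y = impute(raw_x[6:], means), labels[6:]

best = search(train_x, train_y, val_x, val_y, {"k": [1, 3, 5], "power": [1, 2]})
print(best)
```

Computing the column means from the training rows alone keeps information from the validation rows out of the preprocessing step, which is the kind of detail AutoML tooling is expected to handle automatically.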
Trend 4 — MLOps and Data Engineering
Long gone are the days when data scientists could build a model locally and then easily deploy it to production. Similar to the role DevOps and infrastructure engineering play in software engineering, MLOps and data engineering are becoming core components in successful machine learning and deep learning projects. ML engineers by definition are seasoned programmers who possess the skills to build the ML workflows and infrastructure necessary to move projects from inception to production.
Distributed machine learning engines like Apache Spark and workflow management platforms like Apache Airflow and Kubeflow are just a few of the many tools ML engineers employ to build data pipelines.
Given the infrastructure and tools employed, this type of work is usually done in the cloud rather than locally. Thus, the favored domain of an ML engineer is the cloud.
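At their core, workflow platforms like the ones mentioned above run tasks in dependency order. The sketch below shows that idea using only the Python standard library. It does not use the Airflow or Kubeflow APIs. The task names, the shared `ctx` dictionary, and the toy "model" (a mean baseline) are all assumptions made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract(ctx):
    # Stand-in for pulling raw data from a source system.
    ctx["rows"] = [3.0, 4.0, None, 5.0]


def clean(ctx):
    # Drop missing values.
    ctx["rows"] = [r for r in ctx["rows"] if r is not None]


def train(ctx):
    # Toy "model": predict the mean of the cleaned data.
    ctx["model"] = sum(ctx["rows"]) / len(ctx["rows"])


def evaluate(ctx):
    # Mean absolute error of the mean baseline on the cleaned data.
    ctx["mae"] = sum(abs(r - ctx["model"]) for r in ctx["rows"]) / len(ctx["rows"])


TASKS = {"extract": extract, "clean": clean, "train": train, "evaluate": evaluate}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {
    "clean": {"extract"},
    "train": {"clean"},
    "evaluate": {"train", "clean"},
}


def run_pipeline(tasks, depends_on):
    """Run every task once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


order, ctx = run_pipeline(TASKS, DEPENDS_ON)
print(order)
print(ctx["model"], ctx["mae"])
```

Production platforms add what this sketch leaves out: scheduling, retries, logging, and running independent tasks in parallel across machines.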
Trend 5 — Job Market Trends
Demand for experienced data scientists continues to outstrip the supply by orders of magnitude. Savvy organizations understand the need to build a team around AI projects that includes data scientists, ML engineers, data engineers, specialized QA engineers and more.
Thus everyone from AI labs, to tech giants like Google, Facebook, and Uber, to established enterprises like Bloomberg, Citibank, Biogen, GE, and Ford, not to mention high-profile companies like Tesla and Airbnb, is snapping up ML engineers. With rising demand comes increased pay, which is attracting many to the field.
Trends aside — Becoming an ML Engineer
As we’ve argued above, an ML engineer is someone who may lack the in-depth scientific skills of a data scientist, but has other in-demand skills including programming, ML & DL frameworks, AutoML, MLOps, and data engineering. That said, many data scientists also serve as ML engineers.
For the most part, the path to becoming an ML engineer begins with code. As a result, Python, R, and Julia programmers have a bit of a head start. However, Java, .NET, JavaScript, and other languages are all increasingly being used in data science, AI libraries, and APIs.
With a fundamental mastery of the code basics, the path ahead is clear. The next step to becoming an ML engineer is to gain experience with ML & DL frameworks, pre-trained models, and AutoML coupled with ML workflow platforms.
ODSC
The Open Data Science Conference (ODSC) is the perfect place to start or continue your ML engineer journey. We offer hands-on training for programmers upskilling for machine learning in our Machine Learning for Programmers track. Additionally, our MLOps and Data Engineering track will help you build sophisticated workflows. Learn more about ODSC East 2020 this April 13–17 and gain the skills you need to become an ML engineer.