2020 Outlook on AutoML: Updates and Recent Advances
The field of automated machine learning, or AutoML, continues to expand, with new products and services announced at a frenetic pace. As a data scientist, I’m motivated to monitor this technology carefully because it could affect my profession, especially if these tools open up the field of data science to non-data scientists. But for now, as a leader in this space explained to me, the primary goal of these tools is to let data scientists get more work done, faster.
I’ve written about this topic before. Early last year, there was “Automated Machine Learning: Myth Versus Reality,” where I introduced this new product category that was gaining steam. Then later last year, there was “Should You Build or Buy Your Data Science Platform?” where I provided an up-to-the-minute list of AutoML options and services.
In this article, I update my coverage of the AutoML space with a new list of AutoML options, some of which have only recently joined the bandwagon, including AutoMLaaS (AutoML as a Service) offerings and no-code/low-code solutions. If you know of other AutoML solutions, please leave a note so I can add them next time around.
Attraction of AutoML
First, let’s summarize the attraction of AutoML in general. Not all AutoML products and services include all these features, but the following items generalize the intent. We see that AutoML works to fill the gap between “supply” and “demand” in the data science marketplace.
- AutoML establishes a data prep strategy for turning raw data into processed data ready for machine learning.
- AutoML helps out with feature engineering by selecting useful features and creating new ones.
- AutoML selects an appropriate statistical learning model (e.g., linear regression, gradient boosting, or neural networks).
- AutoML tunes the hyperparameters that drive machine learning algorithms.
- AutoML utilizes ensemble methods to improve accuracy.
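To make the list above concrete, here is a minimal sketch of the kind of loop an AutoML tool runs, written with only the Python standard library. It is not how any of the products below work internally. The toy data set, the two candidate models (k-nearest neighbors and nearest centroid), the values of k that are tried, and every function name are assumptions made for illustration. The sketch covers data prep, model selection, hyperparameter tuning, and a simple voting ensemble; feature engineering is left out.

```python
import random
import statistics
from collections import Counter
from functools import partial


def make_blobs(n=60, seed=0):
    """Hypothetical toy data: two classes, features on very different scales."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(300.0 * label, 100.0)])
        y.append(label)
    return X, y


def standardize(rows):
    """Data prep: rescale every column to zero mean and unit variance.

    For brevity the scaling is fit on all rows; a real pipeline would fit it
    inside each cross-validation fold to avoid leakage.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_predict(train_X, train_y, x, k):
    """Candidate model 1: majority label among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist2(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train_X, train_y, x):
    """Candidate model 2: label of the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        points = [p for p, l in zip(train_X, train_y) if l == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*points)]
    return min(centroids, key=lambda label: dist2(centroids[label], x))


def cv_accuracy(predict, X, y, folds=5):
    """Score a predictor with k-fold cross-validation (strided folds)."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        correct += sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def build_candidates():
    """Search space: one model family without hyperparameters, one with k."""
    candidates = {"nearest_centroid": centroid_predict}
    for k in (1, 3, 5, 7):
        candidates[f"knn(k={k})"] = partial(knn_predict, k=k)
    return candidates


def auto_select(X, y):
    """Model selection and tuning: keep the candidate with the best CV score."""
    scores = {name: cv_accuracy(fn, X, y) for name, fn in build_candidates().items()}
    return max(scores, key=scores.get), scores


def majority_vote(predictors):
    """Ensembling: combine several predictors by majority vote."""
    def predict(train_X, train_y, x):
        votes = Counter(p(train_X, train_y, x) for p in predictors)
        return votes.most_common(1)[0][0]
    return predict


X, y = make_blobs()
X_scaled = standardize(X)
best, scores = auto_select(X_scaled, y)
top3 = sorted(scores, key=scores.get, reverse=True)[:3]
ensemble = majority_vote([build_candidates()[name] for name in top3])
print("best single model:", best, scores[best])
print("top-3 ensemble accuracy:", cv_accuracy(ensemble, X_scaled, y))
```

Real AutoML systems search far larger spaces and use smarter strategies than this exhaustive loop, but the shape is the same: prepare the data, score many candidate configurations, keep the winners, and optionally combine them.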
AutoML Tools and Platforms
Altair Knowledge Works: Self-service data analytics and machine learning platform.
Amazon SageMaker: Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly.
Auto-sklearn: Auto-sklearn provides out-of-the-box supervised machine learning. Built around the scikit-learn machine learning library, auto-sklearn automatically searches for the right learning algorithm for a new machine learning dataset and optimizes its hyperparameters.
Auto-WEKA: Auto-WEKA helps non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications.
AutoKeras: An AutoML system based on Keras. It is developed by DATA Lab at Texas A&M University. The goal of AutoKeras is to make machine learning accessible for everyone.
BigML: BigML is a consumable, programmable, and scalable Machine Learning as a Service (MLaaS) platform that makes it easy to solve and automate common ML tasks.
Big Squid: By automating many repetitive tasks that data scientists usually perform, Big Squid’s Kraken platform empowers more users to solve their own machine learning problems.
Binah.AI: Binah.ai is focused on out-of-the-box AI use cases that deliver simple solutions to complex problems at unparalleled levels of accuracy, performance, and stability.
cnvrg.io: Cnvrg.io is a full-stack data science platform that provides everything needed to build, manage and automate machine learning — company-wide.
Databricks: AutoML on Databricks automates machine learning pipelines, from feature engineering and model search through hyperparameter tuning and inference, while providing data scientists with the flexibility and control they need.
Dataiku: A visual machine learning suite that guides the user through all of the machine learning steps (train-test split, features handling, metrics to optimize, different templates of pre-set algorithms).
DataRobot: DataRobot offers an advanced enterprise AI platform that democratizes data science and automates the end-to-end process for building, deploying, and maintaining artificial intelligence and machine learning at scale.
Determined AI: Provides a platform for AutoML at scale. Speeds up model development by 100x via distributed training and best-in-class hyperparameter search.
Domino Data Lab: Domino provides a unified data science platform to build, validate, deliver, and monitor ML models at scale.
dotData: dotData helps accelerate and democratize the data science process via AutoML and Feature Engineering Automation (a.k.a. AutoML 2.0), shortening AI & ML project turnaround.
Dotscience: The Dotscience software platform for collaborative, end-to-end ML lifecycle management empowers ML and data science teams using a suite of tools designed to address the needs of DevOps for machine learning (ML).
Google: Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs. It relies on Google’s state-of-the-art transfer learning and neural architecture search technology.
Gramener: Gramex is a low-code data science platform to build enterprise-grade visual analytics apps.
H2O: Delivering automatic feature engineering, model validation, model tuning, model selection and deployment, machine learning interpretability, time-series and automatic pipeline generation for model scoring, H2O Driverless AI provides companies with an extensible customizable data science platform.
Iguazio: The Iguazio Data Science Platform automates machine learning pipelines, transforming AI projects into real-world business outcomes.
John Snow Labs: John Snow Labs is a healthcare company specializing in accelerating progress in data science. It provides an AI platform for healthcare and life science organizations.
KNIME Software: KNIME Analytics Platform is open source software for creating data science applications and services. It includes an AutoML component that automatically trains supervised machine learning models, handling some data preparation, parameter optimization, scoring, evaluation, and selection.
Kortical: Enterprise AI as a service platform, driven by AutoML that writes code.
Logical Clocks: Hopsworks is an open-source Enterprise platform for the development and operation of Machine Learning (ML) pipelines at scale, based around a Feature Store for ML.
MissingLink: MissingLink helps data engineers streamline and automate the entire deep learning lifecycle. Provides a set of deep learning lifecycle management tools to automatically track experiments, data, machines, and models.
mljar: The open source mljar AutoML platform allows building machine learning models without coding.
Qeexo: Qeexo AutoML is a one-click, fully-automated platform allowing customers to rapidly build machine learning solutions for highly-constrained environments using sensor data.
R2.ai: R2 Learn is an AutoML product that enables enterprises of all sizes to have ML development capabilities.
RealityEngines: AI-assisted ML. AI creates a first pass of a deep-learning model given a use-case or a data set. Data scientists can then either use that model directly or fine-tune.
RapidMiner: RapidMiner is a fully transparent end-to-end data science platform that includes data prep, machine learning, and model operations.
Salesforce Einstein: Salesforce built TransmogrifAI (pronounced trans-mog-ri-phi), an end-to-end automated machine learning library for structured data that is used to help power the Einstein AI platform.
SigOpt: SigOpt is a standardized, scalable, enterprise-grade optimization platform and API designed to unlock the potential of modeling pipelines. This fully agnostic software solution accelerates, amplifies, and scales the model development process.
Splice Machine: With Splice ML Manager, data science teams are able to produce a higher number of more predictive models.
TPOT: Tree-Based Pipeline Optimization Tool (TPOT) algorithm and software for AutoML using Python and the scikit-learn machine learning library.
TROVE: TROVE offers data wrangling and curation to help companies unlock data and apply sophisticated AI models to make data-driven decisions that deliver measurable business results.
World Programming: WPS Analytics is a powerful and versatile software platform for scalable data manipulation and analytics.
Want to learn more about some of these AutoML platforms in person? Check out ODSC East 2020 this April 13–17 and get a taste of what AutoML can do for you with hands-on training sessions!