Should You Build or Buy Your Data Science Platform?

  • They connect to multiple data sources and provide traditional ETL functionality.
  • They allow you to run machine learning, deep learning, NLP, and other traditional models to some degree.
  • They can display or input the model results to another system.
  • Some platforms are able to deploy to a production environment and have staging, testing, etc. built in.
  • The issue of “vendor lock-in” is a real concern. What happens to customers if the company is acquired, or goes under?
  • Although many of the platforms offer APIs and open source connectivity, they are frameworks: very good at what they were designed to do, but generally inflexible beyond that.
  • Some vendors are slow to act on new trends, and you can be more nimble by building your own. For example, when deep learning became hot, before TensorFlow or MXNet, many of the existing platforms such as RapidMiner, H2O, and DataRobot had no deep learning capabilities, and some still do not.
  • Cost: the platform pricing may be prohibitive.
  • Flexibility: there is a desire to avoid vendor lock-in, or the framework is too narrowly focused. Using scikit-learn, TensorFlow, etc., it’s possible to build a full-featured solution that is better suited to the workflow of the data scientist’s company.
  • Domain expertise: many of these platforms are great for generic problems. However, if the need is to solve a very specific problem in a specific domain, it might be better to build a custom tool.
  • Existing infrastructure and expertise: if you’re running a Java shop or an R shop and have deployed a specific data warehouse/data lake, you may want to build out a solution to leverage your existing technology stack.
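
To give a rough sense of what the "build" route can look like with scikit-learn, here is a minimal sketch rather than a production design. It assumes synthetic data in place of a real data source, a logistic regression in place of whatever model fits your problem, and a JSON string as a stand-in for handing results to another system; the step names "scale" and "classify" are illustrative only.

```python
import json

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the "connect to data sources" step: synthetic data,
# used only so that the sketch is self-contained.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing and model chained into one object, so the same
# transformations are applied at training and prediction time.
model = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Evaluate on held-out data and collect predictions.
accuracy = model.score(X_test, y_test)
predictions = model.predict(X_test)

# Stand-in for "input the model results to another system":
# serialize the results so a downstream service could consume them.
payload = json.dumps({
    "accuracy": round(float(accuracy), 3),
    "predictions": predictions.tolist(),
})
```

Everything a platform would otherwise provide around this core (scheduling, staging, testing, monitoring, deployment) is left for you to build, which is where most of the cost of going custom tends to sit.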



ODSC - Open Data Science
