Top Data Wrangling Skills Required for Data Scientists

  • Discovering: covers some of the exploratory data analysis (EDA) steps in the data science process, i.e., getting to know your data in terms of patterns and correlations. You’ll often work with a domain expert here.
  • Structuring: data comes in all shapes and sizes, so you’ll need to merge, order, and reshape it into a form suitable for machine learning.
  • Cleaning: enterprise data is often dirty and inconsistent. Missing values will affect the accuracy of your models, and date values can be particularly frustrating because there are so many ways to represent dates in a database.
  • Enriching: what new data can you derive from what you already have? For instance, if your data set contains a business address, supplementing it with latitude and longitude values helps with both machine learning and data visualization.
  • Validating: the natural next step after cleaning. Take a deeper look at the data values to make sure they make sense statistically and in the relevant business context.
  • Publishing: once the data wrangling is complete, integrate the individual steps into a “data pipeline” so that when the data set needs to be refreshed, you can simply re-run the pipeline and execute all the data wrangling tasks at once. Document the steps fully so you won’t forget the decisions you made along the way.
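
To make the steps above concrete, here is a minimal sketch of a re-runnable pipeline in Python. It is an illustration under stated assumptions, not a reference implementation: the records, the field names, the three date formats, and the helper functions (`structure`, `clean`, `enrich`, `validate`, `run_pipeline`) are all hypothetical and invented for this example. It uses only the standard library so it runs anywhere; on a real project you would more likely reach for a data-frame library. Filling missing amounts with the median is just one possible cleaning choice, and parsing month names with `%B` assumes an English locale.

```python
from datetime import date, datetime
from statistics import median

# Hypothetical raw data, invented for illustration only.
RAW_ORDERS = [
    {"order_id": 1, "customer_id": "c1", "order_date": "2023-01-15", "amount": "120.50"},
    {"order_id": 2, "customer_id": "c2", "order_date": "15/02/2023", "amount": ""},
    {"order_id": 3, "customer_id": "c1", "order_date": "March 3, 2023", "amount": "80.00"},
    {"order_id": 4, "customer_id": "c3", "order_date": "2023-04-01", "amount": "-5.00"},
]
CUSTOMERS = {"c1": "Boston", "c2": "Chicago"}  # "c3" is deliberately missing

# Assumed set of date formats present in the source system.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def structure(rows, customers):
    """Structuring: left-join the city onto each order, then order the rows."""
    joined = [{**r, "city": customers.get(r["customer_id"])} for r in rows]
    return sorted(joined, key=lambda r: (r["customer_id"], r["order_id"]))


def parse_date(text):
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean(rows):
    """Cleaning: normalize dates and fill missing amounts with the median."""
    fill = median(float(r["amount"]) for r in rows if r["amount"])
    return [
        {
            **r,
            "order_date": parse_date(r["order_date"]),
            "amount": float(r["amount"]) if r["amount"] else fill,
        }
        for r in rows
    ]


def enrich(rows):
    """Enriching: derive a year-month field from the parsed order date."""
    return [
        {**r, "order_month": r["order_date"].strftime("%Y-%m") if r["order_date"] else None}
        for r in rows
    ]


def validate(rows):
    """Validating: collect (order_id, problem) pairs instead of failing fast."""
    problems = []
    for r in rows:
        if r["order_date"] is None:
            problems.append((r["order_id"], "unparseable date"))
        if r["amount"] < 0:
            problems.append((r["order_id"], "negative amount"))
        if r["city"] is None:
            problems.append((r["order_id"], "unknown customer"))
    return problems


def run_pipeline(raw_orders, customers):
    """Publishing: one entry point that re-runs every step on fresh data."""
    rows = structure(raw_orders, customers)
    rows = clean(rows)
    rows = enrich(rows)
    return rows, validate(rows)


if __name__ == "__main__":
    wrangled, issues = run_pipeline(RAW_ORDERS, CUSTOMERS)
    for row in wrangled:
        print(row)
    print("Validation issues:", issues)
```

Keeping each step as its own small function means the decisions made along the way (which date formats to accept, how to fill missing values, what counts as invalid) are recorded in one place, and refreshing the data set is a single call to `run_pipeline`.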

ODSC - Open Data Science

Our passion is bringing thousands of the best and brightest data scientists together under one roof for an incredible learning and networking experience.