Top 10 Big Data Blunders Part 1

ODSC - Open Data Science
4 min read · Jul 10, 2019


Your company is making the move to big data initiatives, but with so many organizations launching half-baked data initiatives, how do you know your organization is going to succeed? Dr. Michael Stonebraker, a co-founder of Tamr and co-director of the Intel Science and Technology Center for Big Data, believes that companies are making huge mistakes in their adoption of big data practices with a few common big data blunders.

Unfortunately for job-seeking data scientists, knowing which company has a viable big data plan isn’t easy. Successful big data ventures have a few things in common, so if you’re going to bet on a winner, you need to make sure your organization isn’t committing these top ten big data blunders.

[Related article: 15 Common Mistakes Made By Newbie Data Scientists]

1. Not Moving to the Cloud

Your CEO may scoff, but if your organization isn’t planning to become Cloud-exclusive, you could be backing a losing technology. The Cloud is more elastic than your in-house solution and more cost-effective in the long run.
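To make the elasticity point concrete, here is a minimal sketch that compares provisioning for peak load with paying only for what each hour needs. The demand figures are invented for illustration, and the helper names `server_hours_fixed` and `server_hours_elastic` are hypothetical, not part of any vendor tool.

```python
# Hypothetical hourly demand (servers needed) over one day; not real data.
hourly_demand = [2, 2, 2, 2, 2, 3, 5, 8, 12, 14, 15, 16,
                 16, 15, 14, 12, 10, 8, 6, 4, 3, 2, 2, 2]

def server_hours_fixed(demand):
    """In-house style: provision for the peak and run that many servers all day."""
    return max(demand) * len(demand)

def server_hours_elastic(demand):
    """Elastic style: run only as many servers as each hour needs."""
    return sum(demand)

fixed = server_hours_fixed(hourly_demand)
elastic = server_hours_elastic(hourly_demand)

# Share of the fixed capacity that is actually used.
utilization = elastic / fixed
print(f"fixed: {fixed} server-hours, elastic: {elastic} server-hours")
print(f"utilization of the fixed fleet: {utilization:.0%}")
```

The sketch counts server-hours only. Whether the elastic option is cheaper in money terms depends on the price per server-hour on each side, which this example deliberately leaves out.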

There are lots of reasons your organization may not want to make the switch, and that’s ok. Here are a few excuses you’re likely to hear in this conversation.

  • Problem: Security. Truth: Your in-house security is full of holes. The Cloud offers dedicated, 24/7 security, something your team doesn’t have the time or training to execute.
  • Problem: Cost. Truth: Unless you’re currently cheating the system, moving to the Cloud is going to save you a lot of money in execution costs.
  • Problem: Geographic restrictions. Truth: The Cloud will open up your organization to hosting and reach far beyond current capability.
  • Problem: Legal Restrictions. Truth: Regulation can’t always keep up with the pace of innovation, but just wait.
  • Problem: CEO objects (wait for blunder 11)

And these objections are ok. What’s not ok is refusing to look at the reality of what the Cloud actually offers.

2. Not Planning for AI/ML to Be Disruptive

AI will disrupt your operations, no doubt. It will displace some of your workers and has the potential to upend how you handle your operations. You can be a disruptor in this case, or you can keep your head in the sand and get disrupted.

So what do you do? An organization that wants to be on the disrupting end of AI rather than the disrupted end must be willing to pay for its talent. ML experts aren’t going to come cheap, and chances are high that HR will balk, but spending money now on experts nets you a much greater return later. Ask yourself what staying in business at all is worth. Remember Blockbuster? We barely do. The bottom line is that aggressive companies are willing to do what it takes.

https://bit.ly/2RAU3dn

3. Not Embracing Rocket Scientists

You need new talent, and you need to pay for it. Even more than that, you must embrace the guiding light principle. Organizations that seek out this caliber of employee and are willing to fully embrace them with all their weird obsessions and bizarre knowledge bases will end up with a better return. Again, they’ll be expensive, and they won’t fit into your typical HR job description, but this new type of employee is here to stay.

4. Not Solving Your Real Data Science Problem

It isn’t glamorous, but genuinely successful data scientists spend 90% of their time on data discovery, data integration, and data cleaning. Without clean data, your big data initiatives mean nothing. Your machine learning is worthless. Don’t miss this step.

Get a system in place and stick to it. Your rocket scientists, the talent you’ve spent money on and fought with HR to hire, can help lead the way, but your organization needs to solve its real data problem: the quality of its data.
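As a rough illustration of what the cleaning step involves, the sketch below normalizes supplier names and drops duplicate records. The records, the suffix list, and the function names are all hypothetical, and real cleaning pipelines handle far messier cases than this.

```python
import re

# Assumed list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "llc", "ltd", "co"}

def normalize_name(raw):
    """Lowercase, replace punctuation with spaces, and drop common suffixes."""
    name = re.sub(r"[^\w\s]", " ", raw.lower())
    tokens = [t for t in name.split() if t not in SUFFIXES]
    return " ".join(tokens)

def deduplicate(records):
    """Keep the first record seen for each normalized supplier name."""
    seen = {}
    for rec in records:
        key = normalize_name(rec["supplier"])
        seen.setdefault(key, rec)
    return list(seen.values())

# Made-up records: the first two describe the same supplier.
records = [
    {"supplier": "Acme Corp.", "amount": 120.0},
    {"supplier": "ACME corp", "amount": 120.0},
    {"supplier": "Globex, Inc.", "amount": 75.5},
]
clean = deduplicate(records)
print(len(records), "records in,", len(clean), "records out")
```

Keeping the first record per key is the simplest possible merge policy. A production system would more likely reconcile conflicting fields than discard them.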

5. Belief that Traditional Data Integration Techniques Will Solve Issues

Traditional data integration isn’t going to cut it in the world of big data. The two most common techniques, ETL and MDM, are too old to work properly and won’t scale.

Extract, Transform, and Load

  • too human-intensive to work with such massive data sets
  • difficult to scale upfront
  • doesn’t work with more than 20 data sources, a pitifully small number in the face of big data

Master Data Management

  • doesn’t scale at all
  • example: GE’s 500 labels only applied to 10% of 20 million spend transactions.

Instead, what you need is a true ML solution that can work through big data. Generating training sets can build a better big data solution that will scale and evolve with your organization.
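The idea of learning from a training set instead of hand-writing labels can be sketched with a tiny naive Bayes text classifier. This is an illustrative stand-in, not the method Stonebraker or Tamr uses. The spend descriptions, category names, and function names are invented for the example.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count words per label from (description, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-labeled training set of spend descriptions.
training_set = [
    ("laptop docking station", "IT hardware"),
    ("wireless keyboard and mouse", "IT hardware"),
    ("flight to boston", "Travel"),
    ("hotel two nights boston", "Travel"),
]
model = train(training_set)
label = classify("laptop keyboard", model)
print(label)
```

Unlike a fixed rule list, a model like this can assign a category to descriptions it has never seen, and adding more labeled examples improves it without rewriting rules.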

[Related article: 5 Mistakes to Avoid as a Beginner Data Scientist]

Data Mistakes Will Cost You

Backing organizations that make these mistakes willingly could land you in a lot of trouble when your business is disrupted. Make sure you keep reading for the other five mistakes your organization could be making in the transition to big data.

Original post here.

