Top 10 Big Data Blunders, Part 2

Jul 19, 2019

In the first part of the series, Dr. Stonebraker outlined five sure signs that a company is making mistakes in its big data plans and adoption. He focused on the opportunities businesses miss when they fail to hire the best talent and embrace the move to the cloud. Let’s move through the next five blunders to round out the list and help you decide whether to stay with an organization or go.

6. Belief That Data Warehouses Will Solve All Your Problems

Data warehouses are great for structured data, but now that we have access to large amounts of unstructured data, relying solely on a well-built warehouse misses half your insight. You can use text, images, and video, but they won’t fit neatly into columns and rows, so find better ways to clean, organize, and make that data available for continued insights. Use the warehouse for what it’s intended for: structured data.

7. Belief That Hadoop and Spark Will Solve Your Problems

Do you see a pattern here? Believing that a single program will fix everything is a myth. Spark and Hadoop are great for some things, but on their own they won’t make you competitive in the world of big data. They can’t perform data integration, and they’re terrible at data discovery. Resolve to use more than one program so that you get the best of breed and not the lowest common denominator.

8. Belief That A Data Lake Will Solve Your Problems

A warehouse won’t always work, so you’ll just create a data lake, right? Wrong. Data lakes are great for unstructured data, but they can go wrong really fast. Independently constructed data sets are never plug-compatible, so forget about loading everything into the lake and expecting it to correlate.

As you go through your data, a lot of things are going to go off-kilter. Your schemas aren’t going to match, like using “salary” versus “wages.” If you do any business at all with other countries, your units aren’t going to match (dollars versus euros). Your semantics will be off (net versus gross salary), and even your granularity won’t line up (annual versus monthly).
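To make these mismatches concrete, here is a minimal sketch of mapping two sources onto one common record. Everything in it is an assumption made for illustration: the field names, the `Pay` type, the two loader functions, and the fixed exchange rate are hypothetical, not part of any real system.

```python
from dataclasses import dataclass

# Hypothetical fixed exchange rate, for illustration only.
EUR_TO_USD = 1.25


@dataclass
class Pay:
    """Common target record: annual pay in US dollars."""
    employee: str
    annual_usd: float


def from_source_a(row: dict) -> Pay:
    # Assumed source A: field "salary", annual figure, in dollars.
    return Pay(row["name"], float(row["salary"]))


def from_source_b(row: dict) -> Pay:
    # Assumed source B: field "wages", monthly figure, in euros.
    # Fix granularity (monthly -> annual), then units (euros -> dollars).
    return Pay(row["name"], float(row["wages"]) * 12 * EUR_TO_USD)


a = from_source_a({"name": "Ada", "salary": 120000})
b = from_source_b({"name": "Bo", "wages": 5000})
```

Renaming fields and converting units or granularity is the mechanical part. A semantic mismatch such as net versus gross cannot be fixed by arithmetic alone; someone has to know which figure each source actually holds.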

All data is dirty. At least 10% of it is wrong, and it is full of duplicates you must clean up. If you don’t have a clear, workable in-house solution, your data lake will become a data swamp. To avoid that, go back through the other blunders and make sure your best people are on board, that most of your time is spent on data discovery and cleaning, and that you aren’t sticking with legacy programs that add burden.
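As a small illustration of what cleaning duplicates involves, the sketch below collapses records that differ only in case and whitespace. The record layout and the `normalize_key` and `dedupe` helpers are hypothetical; real duplicates (misspellings, changed addresses) usually need fuzzier matching than this.

```python
def normalize_key(record: dict) -> tuple:
    # Lowercase and collapse whitespace so trivial variants compare equal.
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def dedupe(records: list) -> list:
    # Keep the first record seen for each normalized key.
    seen = {}
    for record in records:
        seen.setdefault(normalize_key(record), record)
    return list(seen.values())


records = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "  ada  lovelace ", "email": "ADA@example.com"},
    {"name": "Bo Chen", "email": "bo@example.com"},
]
clean = dedupe(records)
```

Here the first two records normalize to the same key, so only the first is kept.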

If you aren’t prepared for this, use a startup with the best ideas to build your data repository system. Startups are aggressive and have the newest ideas.

9. Outsourcing Your Staff to Palantir, IBM, Mu Sigma and Others

A typical enterprise spends 95% of its time keeping legacy code alive. It’s boring. Put your best people on your shiny new thing rather than having them limp along with legacy code. Outsourcing your new initiative while keeping that expensive talent stuck handling email is a death sentence.

Instead, outsource the pieces that aren’t exciting or innovative. You’re paying your rocket scientists to create those innovations, so don’t stick them with the maintenance of your legacy code. Your current “secret sauce” is probably old and getting ready to be disrupted anyway.

10. Succumbing to the Innovator’s Dilemma

It’s hard to change up a business that’s been working, but to survive, you have to be willing to change up your entire business model if that’s what’s necessary. If a company can’t reinvent itself, it’s ripe for disruption.

During this process, you will probably lose customers, which hurts, but think of the trade. Keeping those customers who won’t see your innovations through could end up sinking your entire business in the long run. Better to lose a few customers now and stay in business.

For example, in the 1940s, cable steam shovels were all the rage in construction. They could lift heavy loads and get massive jobs done. The trade-off? They were really dangerous and unwieldy. Hydraulics were still new and only available for smaller jobs. So what happened? Hydraulics got better, bigger, and more capable, putting companies that didn’t research and build hydraulics out of business. Don’t be a cable steam shovel.

Bonus: 11. Working for a Company That’s Not Trying to Do Something

Even imperfect execution is better than sitting by complacently. If your company isn’t considering a big data move, isn’t building a data team, and isn’t trying to move forward with innovation, you’re backing a losing proposition. One of the most significant signs of failure is the lack of any forward movement at all. A company trying to stay in the same place will ultimately fail.

Be Part of the Solution to These Big Data Blunders

You can be a part of the problem or a part of the solution in big data. It’s better to offer solutions to your customers that consider the changing times than it is to rely on outmoded forms of business intelligence. If you can solve the high pole in the tent (data integration) for a company and that company is willing to launch a new initiative and totally reinvent if necessary, you have a much better chance of surviving the disruption.

Otherwise, begin looking for a new employer today before you go down with the ship because of these big data blunders.

Original post here.
