How to Use Excel in Data Science for 2020

ODSC - Open Data Science
Jan 17, 2020

Wait, don’t leave! Excel has a terrible reputation in data science, and there is about 20 years’ worth of literature cautioning against using it. There are better, faster, more agile programs that spit out fancier representations and offer cooler capabilities. And here’s the lowly Excel spreadsheet. It’s neglected. It’s maligned. But guess what? You’re probably still going to use Excel in data science in 2020. Let’s shift our perception of Excel and take a look at how this OG data tool will continue to augment your work this year and in the years to come.

Excel Is Clear

When you’re examining data to build your algorithms, there’s little that is more straightforward than a classic Excel sheet. There’s a reason the overall look of Excel has barely changed over the years. Information is easy to manipulate and easy to see.

Spreadsheets are great for visualizing raw data and its distributions. You don’t have to do a bunch of work just to see what you’ve got, and poking around in arbitrary rows and columns is fast and straightforward.

You could probably use other programs, but for those of you working in business settings where time is money and real-time analytics is king, the ease of Excel is right there staring at you. That doesn’t make Excel the best option, but it is one of the easiest for this kind of initial exploratory analysis.

Excel Is Non-Technical

Again, for those of you in business settings, your work is going to be more and more integral to business operations. Businesses are looking for data scientists with the soft skills required to communicate their findings to stakeholders and to collaborate with business intelligence teams.

Not everyone is going to learn Python and R to be able to read your data. Maybe no one will. You could get lucky and have a supervisor or team leader who’s a former data scientist, but the ones above his or her head may not have that background.

Excel was, and still is, one of the best ways to export data and track changes when that data moves from department to department. Business analytics thrives on the “export to Excel” function, and it makes sharing packaged information that much easier.
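As a rough illustration of that handoff, here is a minimal Python sketch that writes a small table to a CSV file Excel can open. The function names, field names, and sample values are hypothetical and not from any particular workflow; the only assumption about Excel is that a UTF-8 byte-order mark helps it detect the file’s encoding.

```python
import csv
import io


def rows_to_csv_text(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export_for_excel(rows, fieldnames, path):
    """Write the rows to a CSV file intended to be opened in Excel."""
    # "utf-8-sig" prepends a byte-order mark so Excel can detect UTF-8.
    # newline="" keeps the csv module's own line endings untouched.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        handle.write(rows_to_csv_text(rows, fieldnames))


# Hypothetical sample data for illustration only.
sales = [
    {"region": "East", "sales": 120},
    {"region": "West", "sales": 95},
]
csv_text = rows_to_csv_text(sales, ["region", "sales"])
```

Calling `export_for_excel(sales, ["region", "sales"], "sales.csv")` would then produce a file a stakeholder can open directly, with no Python required on their end.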

Excel Mapping Is Powerful

If you aren’t a mapping whiz, you can still get a lot done with Excel’s simple yet powerful mapping functions. Most mapping tools require a lot of packages, but if you want a quick map visualization, you can mock something up in Excel with just a little time.

Excel also helps with things like duplicate information and data aggregation. The quick, throwaway visualization capability can help you get a project off the ground without the technical load of working with more complicated mapping features in other programs.
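For comparison, here is a small Python sketch of the same two chores, dropping exact duplicate rows and summing a column by group. It is only meant to show roughly what a duplicate-removal step followed by a sum-by-group summary does; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # A sorted tuple of (field, value) pairs identifies the row's content.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def sum_by(rows, group_field, value_field):
    """Total value_field for each distinct value of group_field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += float(row[value_field])
    return dict(totals)


# Hypothetical sample data for illustration only.
sample = [
    {"region": "East", "sales": 120},
    {"region": "East", "sales": 120},  # exact duplicate of the row above
    {"region": "West", "sales": 95},
    {"region": "East", "sales": 30},
]
unique_rows = drop_duplicates(sample)
totals = sum_by(unique_rows, "region", "sales")
```

For a one-off question, the spreadsheet route is a few clicks; a script like this mostly pays off when the same cleanup has to be repeated.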

Excel Is Small

If you’re working with a smaller amount of data, Excel can lighten the technical load on you and your software. It’s not great for massive data storage or tabulation, but for smaller amounts of data where deep learning functions aren’t necessary, you can’t beat the benefits of Excel.

It’s perfect as an assistant tool for data analysts or data scientists in smaller teams doing their own preliminary analysis. It’s still a great editor for smaller-scale data visualization and 2D data. It offers integrations with other software programs like Tableau.

On the flip side, your stakeholders may send you spreadsheets to analyze for small projects. It’s a good back-and-forth tool for handling those data one-offs, in which sales or finance has a data question and there’s no need for a full data program.

Keeping a Simple Spreadsheet

The ability of spreadsheets to provide value will continue into 2020. One of the biggest problems analysts encounter is the threat of “tech debt” from spreadsheets that morph into unmaintainable legacy documents.

If you use spreadsheets for what they are (quick, throwaway tools that give you a fast visualization or a simple way to analyze data before you begin your more significant projects), the ease of Excel becomes clear. Coding a complicated program just to visualize data initially may not be a good use of your time, but plug those values into your spreadsheet, and suddenly you’ve done the first step with little tech debt.

Excel has a lot of legacy resources for quick learning, and a fast Google search will turn up just about any function you’d want, already written for you with examples. It’s not the most glamorous skill, and it’s certainly not the techiest one you can have, but embracing it for what it is will add something useful to your toolkit.

Want to learn more tips and tricks for easy data science implementation? Head to ODSC East in Boston this April 13–17 and learn from data scientists and managers on how to do just that.
