The Programmer Myth of Data Science
If you’re thinking of hiring a computer programmer as your data scientist, you may want to back up. While the two careers have a lot in common — writing programs, working with sophisticated machines, creating products that provide business value — putting “programmer” on your job description and hoping for a data scientist could be a frustrating exercise.
If you’re tempted to conflate the two positions, you could be missing out on the qualities that make a good data scientist or a good programmer. Let’s figure out which position you actually need so you can get the right person on your team.
Building or Analyzing?
Programmers hold a lot of power in their code. They can build your next app, integration, or service and keep your Agile sprints delivering continuous innovation. Their primary responsibility is creating a product or service and maintaining it as technology changes.
Data scientists, on the other hand, do build things, but the primary purpose of building is making sense of information. Partly due to the achievements of programmers, we now have access to unprecedented amounts of data, more than we could ever analyze using traditional, human-driven means. Data scientists build not as the endgame but as a way to make sense of that data.
If you list general programming skills on your job description, you may get a lot of applicants who are highly qualified programmers but not proficient as data scientists. Likewise, if you’re advertising for data science out of pressure to launch some kind of data initiative but what you want is a programmer who can maintain your product, you’ll miss the mark.
Customer Value or Business Value?
This distinction is a little misleading because the two are inextricably linked, but hear me out. Software engineers build business value directly by providing customer value. An app lets customers interact more easily with the company’s product or service, making them more likely to stay. A product solves a customer problem and provides a source of revenue (value) for the business. In the end, the program is often the end product itself, beginning with the customer and ending with business value.
Data scientists work in the opposite direction. By looking for answers within the data, they can show a business where to provide value for its customers. The customer isn’t interacting with the data solution itself. Instead, that analysis helps the business build something for customers, providing customer value indirectly through business operations. It begins with the company in mind and ends in indirect customer value.
Deterministic or Probabilistic?
A programmer’s job, in the end, is to produce a product. There’s an end goal for each sprint and iteration. While there’s always room to look at data for testing or performance purposes, the goal is typically a finish line. Even when what you’re looking for is a data engineer, the engineer’s responsibility is to build the structure and pipelines that support a data scientist’s insights.
A data scientist works with probabilities. The purpose isn’t an end product. Instead, the information the data scientist provides informs the end goals of other departments. They aren’t building a process for consistent identity verification, for example, but looking at the available information to figure out where a business might direct more of its marketing efforts.
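To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the function names, the customer ID format, and the campaign numbers. The first function is deterministic, so the same input always produces the same yes-or-no answer. The second returns an estimate with an uncertainty range (a simple normal-approximation interval), which is closer to what a data scientist hands to another team.

```python
import math


def is_valid_customer_id(customer_id: str) -> bool:
    """Deterministic check: a made-up format of 'C' followed by six digits."""
    return (
        len(customer_id) == 7
        and customer_id[0] == "C"
        and customer_id[1:].isdigit()
    )


def conversion_rate_interval(conversions: int, visitors: int, z: float = 1.96):
    """Probabilistic estimate: observed rate plus an approximate 95% interval.

    Uses the normal (Wald) approximation, which is only a rough guide and
    behaves poorly for very small samples or rates near 0 or 1.
    """
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical campaign numbers, not real data.
email = conversion_rate_interval(conversions=48, visitors=1200)
social = conversion_rate_interval(conversions=30, visitors=1000)

print(is_valid_customer_id("C123456"))
print("email  rate, low, high:", email)
print("social rate, low, high:", social)
```

With these made-up numbers the two intervals overlap, so the data suggests a direction for the marketing budget without settling the question. That kind of hedged answer is the norm in data science and the exception in product code.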
Do They Overlap? Finding the Unicorn
Positions can vary a lot between organizations, so these differences are guidelines rather than hard-and-fast rules. With big data moving to the forefront of business vision, it’s possible that in some smaller businesses that don’t have the resources to hire an entire team, programmers and data scientists could have overlapping responsibilities.
If you have to combine the positions for now, here are some skills to look for in a programmer to make it work. It could also be worth investing in continuing education for programmers you love.
- Visualization: All programmers, to some extent, are capable of visualization. You have to imagine how a product will look end to end to build it with clean code. However, data visualization requires skill not just with appearance but also with the story. Your programmer/data scientist should have a clear understanding of how visuals can both clarify and distort data, and of the best practices for presenting data insights in good faith.
- ML/DL: Unstructured data is the next wave of business insight, so your programmer should understand machine learning principles for analyzing structured data and deep learning principles for working with unstructured data. Depending on the size and purpose of your business, you may not need both, but consider how you’ll scale when (not if) those big data questions become a priority.
Consider What’s Coming When Hiring
It’s only going to get harder to force computer programmers and data scientists into a single job. With big data processing and deep learning becoming a ubiquitous part of business operations, many experts strongly suggest that you pay data scientists to do what they do best and pay programmers to do what they do best. Forcing a single person to master all the intricacies of both positions could be a very shortsighted decision.
Your business wants to be on the winning side of disruption, so use a data scientist to offer critical insights for how that can happen and a programmer for building that gorgeous product or service that’s just what your customers need. Deciding once and for all to end the practice of “programmers as data scientists” could help you disrupt your industry.