Hilary Mason on Data Science Product Design Problems
Day three of ODSC East at the Hynes Convention Center kicked off with talks from some of the data science community's most decorated members. One of them, Hilary Mason, General Manager of Machine Learning at Cloudera, delivered a morning keynote on common data science product design problems.
It can't be easy to give a presentation to 3,000 people, let alone 3,000 people who work with data for a living. But Mason makes it look easy, and she had a vision to share for the future of AI and machine learning.
Mason began her talk with an anecdote about Cuil, a search engine that was poised to compete with the likes of Google and Bing. Cuil set itself apart from its competitors by displaying long entries with pictures in its query results. She explained that the idea was great in theory, but in practice, factors such as disambiguation errors led to the system's downfall.
“This is not me, this is a picture of English character actress Hilary Mason,” Mason quipped, displaying a result from a search for her name on Cuil.
Mason's point is that product designers often have an idealized vision of what a product might be, but the process of planning the project skips a few steps. She cited three important components of a product that lasts: lots of data, context, and longevity, all areas where large companies may have a leg up. She warned, however, that even large companies with everything lined up can run into common roadblocks.
“Success requires you to own, understand, and be able to analyze data — these aren’t things you can outsource,” Mason said.
Often, when parts of the product design and implementation process are outsourced, generic formulation problems arise. Shortcuts widen the data-product gap: the more of the work you outsource, the less control you have over the details of your product, and the poorer the finished product tends to be altogether.
To remedy this issue, Mason suggests turning pieces of individual projects into repeatable machines that can be used for a variety of applications. She also urges developers to consider four questions before even starting on a project: what problem are you trying to solve, how will you know when you've solved it, what will you do with the product once you've solved it, and what concerns will deploying the product raise? If these questions stay in play throughout the development lifecycle, workflow should be a non-issue, and developers can take pride in a product they have built from start to finish.
Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels!