Improving Data Quality for Superior Results
When you’re a data scientist, you see a problem, and you build a model to solve it. If it’s not as accurate as you were hoping, you tweak the model. But what if the quality of your data is what's causing skewed or flawed results? Kaitlin Andryauskas of Wayfair wants you to take a minute to examine the quality of your data before moving further with your model. Her talk “Improving Data Quality for Superior Results” for AI x outlines the why and the how of this essential task.
How to Tackle the Right Problem
One common thing Andryauskas sees with data scientists is a constant tweaking of the model type when the real problem is garbage data. Building a more interesting, more elaborate model is always a draw for data scientists, but cleaning data must be a priority.
Wayfair experienced this problem in two distinct ways — supply chain issues and warehouse capacity. For each of these problems, the real solution involved ensuring quality data before it ever hit one of their models.
Without improving the management and quality of data, Wayfair had no way to scale operations. Once the company refocused efforts on the quality of data, common challenges cleared up. Let’s take a look at their two most significant challenges.
The Warehouse Capacity Issue
Business problem: No accurate source for what constitutes a full warehouse.
In the early days of Wayfair, the business operated on a drop-ship model (i.e., manufacturers shipped goods directly to the purchaser without Wayfair ever touching merchandise). Wayfair quickly realized that more control over products could mean a better customer experience.
The company created warehouses in which they stored inventory rented from their supply chain, giving them more control over the supply chain. Unfortunately, Wayfair couldn’t accurately predict or gauge how full these warehouses were.
Early on during one year’s Black Friday preparation, Wayfair continued to send inventory to one particular warehouse, overloading their storage space and causing the warehouse manager to call. The algorithm expert claimed that the warehouse still had plenty of space, but the workers on the floor saw a different (more accurate) reality.
After this incident, Wayfair needed to get the model under control, or it would continue to lose inventory to poor warehouse management. The company built its new system around the following key components:
- A champion: Someone genuinely invested in the accuracy of the project.
- The cause: Wrong or missing dimensions, misunderstood racking systems, and a lack of usability coefficients.
- Transparent solutions: Easily verified solutions from subject matter experts.
- Crowdsourcing: Some solutions required expertise outside of the company.
- Continued automated reporting: Continued checking and rechecking.
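To make the causes above concrete, here is a minimal Python sketch of a capacity check that flags wrong or missing dimensions and applies a usability coefficient. It is not Wayfair's actual system: the `StoredItem` fields, the `estimate_fill` function, and all numbers are assumptions made up for illustration, and the coefficient is assumed to be the fraction of nominal rack volume that can realistically hold goods.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoredItem:
    """One inventory line. Every field name here is hypothetical."""
    sku: str
    length_m: Optional[float]
    width_m: Optional[float]
    height_m: Optional[float]
    quantity: int


def has_valid_dimensions(item: StoredItem) -> bool:
    """True only if all three dimensions are present and positive."""
    dims = (item.length_m, item.width_m, item.height_m)
    return all(d is not None and d > 0 for d in dims)


def estimate_fill(
    items: List[StoredItem],
    rack_volume_m3: float,
    usability_coefficient: float,
) -> Tuple[float, List[str]]:
    """Return (fill fraction of usable space, SKUs with bad dimensions)."""
    if rack_volume_m3 <= 0:
        raise ValueError("rack_volume_m3 must be positive")
    if not 0 < usability_coefficient <= 1:
        raise ValueError("usability_coefficient must be in (0, 1]")

    usable_m3 = rack_volume_m3 * usability_coefficient
    used_m3 = 0.0
    bad_skus: List[str] = []
    for item in items:
        if has_valid_dimensions(item):
            used_m3 += (
                item.length_m * item.width_m * item.height_m * item.quantity
            )
        else:
            # Report the record instead of silently counting it as zero volume.
            bad_skus.append(item.sku)
    return used_m3 / usable_m3, bad_skus


# Invented numbers, for illustration only.
items = [
    StoredItem("A-100", 1.0, 1.0, 1.0, 50),
    StoredItem("B-200", None, 0.5, 0.5, 10),  # missing length
    StoredItem("C-300", 2.0, 1.0, 0.5, 20),
]

naive_fill, _ = estimate_fill(items, 100.0, usability_coefficient=1.0)
adjusted_fill, flagged = estimate_fill(items, 100.0, usability_coefficient=0.5)

print(f"naive fill: {naive_fill:.0%}")
print(f"adjusted fill: {adjusted_fill:.0%}")
print(f"SKUs needing dimension review: {flagged}")
```

With these invented numbers, the naive estimate reports the warehouse as 70% full, while the estimate that accounts for usable space reports 140%, and one SKU is flagged for review rather than being counted as taking no space. That gap is the kind of mismatch between the model and the warehouse floor described above.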
The Supply Chain Data Issues Dashboard
Business problem: Handling international supply chain management so that vendors can get back to selling.
As Wayfair improved and expanded its supply chain, becoming more vertically integrated, incorrect and incomplete data became a huge issue. Each report required massive problem solving and data cleaning at the end of the month to make up for data inconsistencies.
Wayfair created a dashboard to increase transparency with data, reducing the scramble at the end of the month and, more importantly, creating data that workers could trust throughout the month.
Wayfair appointed leaders for different aspects of the supply chain. For example, one person may be in charge of checking and entering information coming out of one particular port. Creating the dashboard required a lot of coordination and effort upfront. Again, Wayfair simplified the problem to its key components:
- A champion: Here, the leader was someone heavily invested in this pilot.
- Identified garbage data causes: Manual entry errors, duplicate data, and inconsistencies between sources.
- Self-service allowances: Transparency allowed users to question inaccurate data themselves.
- Maintaining relationships with stakeholders: i.e., product management teams.
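The garbage data causes listed above lend themselves to simple automated checks. The following is a minimal Python sketch, not Wayfair's dashboard: the record layout (`shipment_id`, `container_count`, `port`), the two sources (`port_log`, `vendor_feed`), and the function names are all hypothetical, chosen only to show one check each for missing entries, duplicates, and disagreement between sources.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def find_duplicate_ids(records: List[Record], key: str = "shipment_id") -> List[str]:
    """Return key values that appear more than once in one source."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def find_missing_fields(
    records: List[Record], required: List[str], key: str = "shipment_id"
) -> List[Tuple[str, str]]:
    """Return (key value, field) pairs where a required field is absent or None."""
    return [
        (r[key], field)
        for r in records
        for field in required
        if r.get(field) is None
    ]


def find_mismatches(
    source_a: List[Record],
    source_b: List[Record],
    field: str,
    key: str = "shipment_id",
) -> List[Tuple[str, Any, Any]]:
    """Return (key value, value in a, value in b) where the two sources disagree."""
    b_by_key = {r[key]: r for r in source_b}
    mismatches = []
    seen = set()
    for rec in source_a:
        other = b_by_key.get(rec[key])
        if other is None or rec[key] in seen:
            continue  # not comparable, or already compared
        seen.add(rec[key])
        if rec.get(field) != other.get(field):
            mismatches.append((rec[key], rec.get(field), other.get(field)))
    return mismatches


# Invented records, for illustration only.
port_log = [
    {"shipment_id": "S1", "container_count": 4, "port": "Port A"},
    {"shipment_id": "S2", "container_count": 2, "port": "Port A"},
    {"shipment_id": "S2", "container_count": 2, "port": "Port A"},  # entered twice
    {"shipment_id": "S3", "container_count": None, "port": "Port B"},  # left blank
]
vendor_feed = [
    {"shipment_id": "S1", "container_count": 5},  # disagrees with port_log
    {"shipment_id": "S2", "container_count": 2},
]

report = {
    "duplicates": find_duplicate_ids(port_log),
    "missing": find_missing_fields(port_log, ["container_count"]),
    "mismatches": find_mismatches(port_log, vendor_feed, "container_count"),
}
print(report)
```

Running checks like these throughout the month, rather than only when a report is due, is one way to surface problems early. The owner of each data source can then correct the flagged records directly.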
Wayfair has expanded its business intelligence and data science team so that the company can scale based on data. This was the most significant part of Wayfair’s transformation from a drop-ship model to heavy involvement throughout the supply chain.
The company is an excellent example of how businesses can use quality data to reduce waste and improve response times, even with challenging pipelines like the international supply chain. No matter what type of model you build, your data quality directly influences business operations.
Companies working on razor-thin margins must be able to check, control, and verify the information before it ever goes into the model. Wayfair’s approach to data integrity was a crucial part of business expansion and continued growth.
“What happens is that data scientists and analysts see a problem and build a model. Let’s say it’s 70% accurate, so we tweak the model trying to make that model better… This [talk] is taking another approach where we take a look at our inputs…” Andryauskas says. Garbage in, garbage out.