Bridging the Gap Between Data Scientists and Business Users
Editor’s note: Amir Meimand, PhD is a speaker for ODSC East 2022. Be sure to check out his talk, “Bridging the Gap Between Data Scientists and Business Users,” there!
Today, data scientists use a rapidly evolving and diverse set of tools and platforms to build advanced analytical models, and consequently, they are seeking flexibility in picking their own programming languages and modeling tools. After model development, operationalizing these models and putting them in the hands of end-users is the key to success for any data science project. To this end, having a universal deployment platform that supports any type of model, regardless of the tools or languages used, is a critical piece of the business value chain. Ultimately, to generate value for their organization, business users should be able to apply these models to their business problems in a self-service manner, drawing insights and making decisions independently.
This session focuses on integrating data science workflows with Tableau using the Tableau Analytics Extensions API. It demonstrates how data scientists and developers can use the Analytics Extensions API to bring sophisticated analyses and machine learning models into Tableau and let business users interact with these models dynamically. Through these interactions, business users can apply the models to their own data and quickly find answers to their business questions.
Operationalizing your Machine Learning Model:
Tableau was founded to help people see and understand data. Tableau revolutionized business intelligence by letting domain experts explore their own data without involving specialists in databases, analytics, and graphics. It also set a standard for analytics tools that are both easy to use and capable of deep exploration and analysis.
Today, data scientists can bring their ML models into the Tableau platform via the Analytics Extensions API and make them available to their business users. Business users, in turn, can interact with these models in a flexible and intuitive way, building their own reports and analytics in a self-service manner.
Business Case Study:
In this session, we are going to solve a real-world problem to find an optimal real estate investment strategy. Here is the business problem summary:
There are 223 restaurants available for sale in 12 counties of California. The characteristics of each restaurant, such as capacity, area, food diversity, age, and price range are available, as well as the real estate costs. In addition to the list of for-sale restaurants, information about similar restaurants is available, including their characteristics and estimation of annual profit. The business question is: given the investment budget limit, which county and restaurants are the best options for purchase/investment to maximize annual profitability?
Data Science Solution:
The real estate cost alone is not enough to choose an investment strategy, since the business goal is to maximize the long-term profitability of the investment without exceeding the investment budget. Hence, the additional data set on similar restaurants can be used to estimate the potential profitability of the for-sale restaurants. With both the real estate costs and the profitability predictions in hand, decision-makers have what they need to choose.
Solving this problem requires dynamic interaction between two models: prediction and optimization. The models are built with Python, and the Tableau Analytics Extensions API is then used as a deployment platform to combine the models and make them accessible to the end-user. Hence, business users will be able to interact with these models in real time.
The analytical solution to the business problem comprises three steps, as outlined below:
- Building a predictive model:
- In this step, we use the sklearn package for data preprocessing and model training to build a predictive model that predicts the profitability of restaurants based on their characteristics.
- Building an optimization model:
- In this step, we build and solve an optimization model to find the optimal investment strategy at the county level. The model seeks to maximize the total annual profitability while keeping the total investment within the budget limit.
- Model deployment:
- In this step, we use the TabPy package to deploy models and make them accessible through the Tableau platform for the end-users.
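The optimization step can be read as a 0/1 knapsack problem: each restaurant is either bought or not, its real estate cost consumes budget, and its predicted annual profit is the value to maximize. The sketch below is a plain dynamic-programming version of that idea, not the session's actual code. It assumes the predicted profits from the first step are already available, that costs and the budget are integers in the same unit (here $100K), and that "at the county level" means the whole budget is spent within a single county. The function names, county labels, and numbers are all hypothetical.

```python
from collections import defaultdict


def select_restaurants(costs, profits, budget):
    """Solve a 0/1 knapsack by dynamic programming.

    costs and budget are non-negative integers in the same unit.
    Returns (best_total_profit, sorted indices of the chosen restaurants).
    """
    n = len(costs)
    # best[i][b] = best profit using only the first i restaurants with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, profit = costs[i - 1], profits[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip restaurant i-1
            if cost <= b:  # option 2: buy it, if it fits
                best[i][b] = max(best[i][b], best[i - 1][b - cost] + profit)
    # Walk the table backwards to recover which restaurants were bought.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)


def best_county(listings, budget):
    """listings: iterable of (county, cost, predicted_profit) tuples."""
    by_county = defaultdict(list)
    for county, cost, profit in listings:
        by_county[county].append((cost, profit))
    results = {}
    for county, rows in by_county.items():
        costs = [c for c, _ in rows]
        profits = [p for _, p in rows]
        results[county] = select_restaurants(costs, profits, budget)
    winner = max(results, key=lambda c: results[c][0])
    return winner, results


# Hypothetical listings: cost in $100K, predicted annual profit in $K.
LISTINGS = [
    ("County A", 30, 120), ("County A", 50, 200), ("County A", 40, 150),
    ("County B", 20, 90), ("County B", 60, 260), ("County B", 35, 110),
]

winner, results = best_county(LISTINGS, budget=80)
print(winner, results[winner])
```

The table has one cell per restaurant per budget unit, so a coarser unit (for example $100K instead of $1) keeps it small. A packaged solver such as knapsack-pip could replace `select_restaurants`; the hand-written version is used here only to keep the sketch dependency-free.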
After these three steps, the full solution will be available to business users. They can then interact with these models in real time and build their own reports and analytics without needing a data science or machine learning background.
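As a rough illustration of the deployment step, the sketch below shows the shape a function might take before it is published to TabPy. It is an assumption-laden example rather than the session's code: the endpoint name `optimal_selection`, the Tableau field names in the comment, and the choice to return one 0/1 flag per row are hypothetical. The `Client` class and its `deploy` method come from `tabpy.tabpy_tools.client`; the import is kept inside `deploy()` so the rest of the sketch runs without TabPy installed.

```python
def optimal_selection(costs, profits, budget):
    """Return one 0/1 flag per restaurant: 1 = buy, 0 = skip."""
    # Assumption: Tableau sends each argument as a list with one entry per row,
    # so a single budget value may arrive repeated; a scalar is accepted too.
    cap = int(budget[0] if isinstance(budget, (list, tuple)) else budget)
    costs = [int(c) for c in costs]
    n = len(costs)
    # best[i][b] = best profit using the first i restaurants with budget b
    best = [[0.0] * (cap + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(cap + 1):
            best[i][b] = best[i - 1][b]
            if costs[i - 1] <= b:
                best[i][b] = max(
                    best[i][b], best[i - 1][b - costs[i - 1]] + profits[i - 1]
                )
    # Walk back through the table to mark the purchased restaurants.
    flags, b = [0] * n, cap
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            flags[i - 1] = 1
            b -= costs[i - 1]
    return flags


def deploy(endpoint="http://localhost:9004/"):
    """Publish the function to a running TabPy server (9004 is TabPy's default port)."""
    from tabpy.tabpy_tools.client import Client  # lazy import: needs TabPy installed

    client = Client(endpoint)
    client.deploy(
        "optimal_selection",
        optimal_selection,
        "Knapsack selection of restaurants under a budget",
        override=True,
    )


# A Tableau calculated field could then call the endpoint, for example
# (field and parameter names are hypothetical):
#   SCRIPT_INT("return tabpy.query('optimal_selection', _arg1, _arg2, _arg3)['response']",
#              SUM([Cost]), SUM([Predicted Profit]), [Budget])
```

Returning a flag per row lets Tableau treat the result like any other column, so a business user can filter, color, or total the selected restaurants without touching the Python code.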
Business User Perspective:
Many factors can affect the decision-making process, and most of the time these factors are dynamic. As a result, business users need to be able to interact with machine learning models in real time to get the best answer for the current business circumstances.
In the above example, one of the primary factors impacting the result of an investment strategy is the budget limit. Although the budget limit was assumed to be fixed, in the real world, business users want to try different scenarios to see how an increase or decrease in the investment budget would impact the investment profitability and strategy. In such a situation, having an effective interaction with the model is a vital part of finding the right answer to the right question(s).
For example, a new report can be built to examine the impact of changing the budget limit and can be used for what-if analysis. In this example, increasing the investment budget by 25%, from $8M to $10M, appears to lead to a 30% increase in profitability.
Another example of real-time interaction with a model is the situation in which some of the restaurants are taken off the market. In such a situation, business users need to be able to apply a related filter to exclude the off-the-market restaurant(s) and get an immediate result.
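Both what-if scenarios above, changing the budget and removing listings, amount to re-running the optimization with different inputs. The sketch below illustrates that loop under stated assumptions: the restaurant IDs, costs (in $100K), and predicted profits (in $K per year) are made up, and the optimizer is a compact value-only knapsack rather than the deployed model. The numbers it produces are not the article's figures.

```python
def max_profit(costs, profits, budget):
    """Best total profit of a 0/1 knapsack (value only, one-dimensional table)."""
    best = [0] * (budget + 1)
    for cost, profit in zip(costs, profits):
        # Iterate budgets downwards so each restaurant is used at most once.
        for b in range(budget, cost - 1, -1):
            best[b] = max(best[b], best[b - cost] + profit)
    return best[budget]


def what_if(restaurants, budgets, excluded=()):
    """Re-run the optimization for each budget, skipping excluded restaurant IDs."""
    active = [(c, p) for rid, (c, p) in restaurants.items() if rid not in excluded]
    costs = [c for c, _ in active]
    profits = [p for _, p in active]
    return {b: max_profit(costs, profits, b) for b in budgets}


# Hypothetical listings: id -> (cost in $100K, predicted annual profit in $K).
RESTAURANTS = {
    "R1": (30, 120),
    "R2": (50, 200),
    "R3": (40, 150),
    "R4": (20, 90),
    "R5": (60, 260),
}

# Scenario 1: raise the budget from $8M to $10M (80 -> 100 units of $100K).
baseline = what_if(RESTAURANTS, budgets=[80, 100])
budget_increase_pct = 100 * (100 - 80) / 80
profit_increase_pct = 100 * (baseline[100] - baseline[80]) / baseline[80]

# Scenario 2: restaurant R5 is taken off the market.
after_sale = what_if(RESTAURANTS, budgets=[80], excluded={"R5"})

print(baseline, after_sale)
print(f"budget +{budget_increase_pct:.0f}%, profit +{profit_increase_pct:.1f}%")
```

In Tableau, the budget would typically be exposed as a parameter and the exclusion as a filter, so each change triggers a fresh call to the deployed model instead of a manual re-run.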
What do you need for this session?
- Python 3.8+
- Python packages:
  - scikit-learn (sklearn) package: for model building (data preprocessing, modeling, etc.)
  - knapsack-pip package: for model optimization
  - TabPy package: for model deployment
- Tableau Desktop (you can download the 14-day trial version here)
- Training Data
- Evaluation Data
Learn more about the upcoming ODSC East 2022 session, “Bridging the Gap Between Data Scientists and Business Users,” here!