3 Common Regression Pitfalls in Business Applications

ODSC - Open Data Science
4 min read · Nov 13, 2019

Regression is a fantastic tool for aiding business decisions. The traditional purpose of a regression model is to estimate the mean value of a dependent variable given a set of independent variables. In a business, this purpose should be expanded to include reducing uncertainty about future events. This post highlights three common regression pitfalls that tend to trip up business users.


Regression is relatively easy to understand, easy to explain to others and simple to implement. It would be no stretch of the imagination to expect most businesses with any sort of data systems to have made attempts at implementing regression in their decisions. This is great and regression can be spectacular…when used appropriately.

It is easy to get sucked into the power that a regression model can offer someone in business. Most people in business have heard of regression. However, under pressure to meet a deadline, it is easy to slip into a freewheeling mindset in which it feels justifiable to believe whatever the regression model produces. This can lead to misleading justifications of business decisions.

The 3 Pitfalls

Three pitfalls tend to develop when regression is applied in a business:

  1. Lack of understanding of the mechanics of regression
  2. Latching onto a simple metric of model evaluation
  3. Narrowly focusing on the value of prediction without considering the prediction interval

Understanding The Mechanics of Regression

https://bit.ly/2oV3cU4

Not all people understand what regression is and how it functions. An explanation that has served me well in the past is:

At a basic level, regression attempts to fit a line, represented by a mathematical function, to a set of data points using some combination of variables. This line represents all possible combinations of these variables and their expected influence on a predicted value. The purpose of regression is to minimize the error between the actual values in the data and the values predicted by the regression function. The exact value the function returns should generally be expected to be wrong.

This last sentence is critical as it is easy to forget that a prediction is unlikely to exactly match what the actual value will be. The prediction from a regression serves to guide us in the right direction and not to necessarily tell us exactly what the future holds.
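To make this concrete, here is a minimal sketch of a one-variable least-squares fit in plain Python. The data and the names `fit_line`, `ad_spend`, and `sales` are made up for illustration and are not from any real business. The point is simply that the fitted line minimizes the squared errors overall, yet every individual prediction still misses its actual value by a little.

```python
# Ordinary least squares with a single explanatory variable,
# using the closed-form solution. All data below are hypothetical.

def fit_line(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical explanatory variable
sales = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical dependent variable

intercept, slope = fit_line(ad_spend, sales)
predictions = [intercept + slope * xi for xi in ad_spend]
residuals = [yi - pi for yi, pi in zip(sales, predictions)]

print(f"sales ~ {intercept:.2f} + {slope:.2f} * ad_spend")
for xi, yi, pi in zip(ad_spend, sales, predictions):
    print(f"x={xi:.0f}  actual={yi:.1f}  predicted={pi:.2f}  error={yi - pi:+.2f}")
```

The errors cancel out on average, which is what fitting the mean looks like in practice, but none of them is exactly zero.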

Beyond p-value and R-squared for Model Evaluation

The p-value and the R-squared statistic tend to dominate the discussion when evaluating a model. These two metrics are thrown around easily and can act as a beacon to latch onto in making a business decision.

A good regression model must consider more than these two metrics. Assessing a model’s goodness of fit also means checking for correlations among the explanatory variables, looking for hidden patterns in the differences between predicted and actual values (i.e. residual analysis), and confirming that the coefficients agree with business expectations (e.g. a coefficient is negative where a negative relationship is expected).

For many business questions, it is useful to calculate the variance inflation factor (VIF) for each explanatory variable, a metric for detecting multicollinearity (i.e. correlation among the explanatory variables). Also, since business data is often some form of time series, the Durbin-Watson statistic can help assess whether the residuals are autocorrelated from one period or observation to the next. Multicollinearity inflates the standard errors of the coefficients, which widens their confidence intervals and can produce unstable or wrongly signed estimates. Autocorrelation makes the usual standard errors unreliable, so intervals and p-values can look more convincing than they should. There are many other metrics that may be evaluated in assessing the strength of a regression model.
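Both diagnostics are simple enough to sketch by hand. The snippet below is an illustration in plain Python under simplifying assumptions: the VIF helper only covers a model with exactly two explanatory variables (where both share the same VIF, 1 / (1 - R²) from regressing one on the other), and the data, residuals, and function names are all hypothetical. In practice a statistics library would usually do this for any number of variables.

```python
# Hand-rolled VIF (two-predictor case only) and Durbin-Watson statistic.
# All numbers are made up for illustration.

def r_squared(x, y):
    """R-squared of a one-variable least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    return 1.0 / (1.0 - r_squared(x1, x2))

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

price = [10, 12, 14, 16, 18, 20]       # hypothetical predictor
discount = [1, 2, 2, 3, 5, 5]          # hypothetical predictor that moves with price
vif = vif_two_predictors(price, discount)

trending = [1.0, 0.8, 0.5, -0.4, -0.9, -1.0]      # residuals that drift over time
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # residuals that flip sign each period
dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)

print(f"VIF: {vif:.2f}")
print(f"Durbin-Watson (trending residuals): {dw_trending:.2f}")
print(f"Durbin-Watson (alternating residuals): {dw_alternating:.2f}")
```

As commonly cited rules of thumb, a VIF above roughly 5 to 10 is treated as a sign of troublesome multicollinearity, and a Durbin-Watson value near 2 suggests little first-order autocorrelation, with values toward 0 indicating positive and values toward 4 indicating negative autocorrelation. These thresholds are conventions, not hard limits.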

Focus on the Interval

The final pitfall to highlight is the tendency to focus on the point estimate of a regression. Doing so misrepresents the intended application of regression in business, which is to reduce the uncertainty that remains when making a decision. A prediction from a common regression model can be reported as a point estimate together with a prediction interval around it. The interval is what should be focused on. The narrower the interval, the less uncertainty there is about what is to come. When presenting the predicted output of a model, the prediction interval should take center stage.
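As a rough illustration, the sketch below computes a 95% prediction interval for a one-variable regression using the standard textbook formula, point estimate ± t · s · sqrt(1 + 1/n + (x0 - mean of x)² / Sxx). Everything here is an assumption for demonstration: the data are invented, `prediction_interval` is a hypothetical helper, and the t value of 3.182 is the two-sided 95% value for 3 degrees of freedom taken from a standard t table rather than computed.

```python
import math

def prediction_interval(x, y, x0, t_crit):
    """Return (point, lower, upper) for a new observation at x0.

    Assumes a one-variable least-squares model with roughly normal errors;
    t_crit is the t critical value for n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # residual standard error
    point = intercept + slope * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return point, point - half_width, point + half_width

ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical
sales = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical
T_CRIT_95_DF3 = 3.182                   # from a t table: 95% two-sided, 3 degrees of freedom

point, lower, upper = prediction_interval(ad_spend, sales, 6.0, T_CRIT_95_DF3)
mid_point, mid_lower, mid_upper = prediction_interval(ad_spend, sales, 3.0, T_CRIT_95_DF3)

print(f"x0=6: point={point:.2f}, interval=({lower:.2f}, {upper:.2f})")
print(f"x0=3: point={mid_point:.2f}, interval=({mid_lower:.2f}, {mid_upper:.2f})")
```

Note that the interval is wider at x0 = 6, outside the observed range, than at x0 = 3, the center of the data. Reporting only the point estimate would hide that difference.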


Wrapping It Up

Regression models find great use throughout many businesses, largely because they are relatively easy to interpret and their output is easy to explain to others. Every business environment is unique in the pressures it faces, but there is merit in ensuring the business is aware of the details of its regression models. An informed application of a common regression model is often many times better than shooting from the hip.


