Balancing Interpretability and Predictive Power with Cubist Models in R

# load data (assumes the pacman package is installed)
pacman::p_load('Cubist', 'readr', 'lubridate', 'dplyr')
sales <- read_csv('train.csv')

# clean up dates and derive calendar features
sales$Date <- lubridate::date(sales$Date)
sales$weekOfYear <- lubridate::week(sales$Date)
sales$quarter <- lubridate::quarter(sales$Date)
sales$month <- lubridate::month(sales$Date)
sales <- sales %>%
  mutate(weekend = ifelse(DayOfWeek %in% c(6, 7, 1), 1, 0))

# determine columns to use: Sales (column 4) first, then the predictors
sales <- sales[c(4, 2, 7, 9:13)]

# set response and explanatory variables
resp <- sales$Sales
pred <- sales[-1]
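To check what the calendar features look like without loading lubridate, here is a minimal base-R sketch for a few example dates. The helper name `make_date_features` is hypothetical, and the sketch assumes `lubridate::week()` keeps its documented default of counting completed seven-day periods since 1 January, plus one (not ISO weeks), and that `quarter()` and `month()` return plain numbers.

```r
# Hypothetical base-R helper mirroring the lubridate features above.
# Assumes lubridate::week() = completed 7-day periods since 1 January, plus one.
make_date_features <- function(dates) {
  lt <- as.POSIXlt(dates)
  month <- lt$mon + 1                     # POSIXlt months are 0-based
  data.frame(
    Date       = dates,
    weekOfYear = lt$yday %/% 7 + 1,       # POSIXlt day of year is 0-based
    quarter    = (month - 1) %/% 3 + 1,   # months 1-3 -> 1, ..., 10-12 -> 4
    month      = month
  )
}

feats <- make_date_features(as.Date(c("2015-01-01", "2015-07-31", "2015-12-31")))
print(feats)
```

Note that 31 December falls in week 53 under this definition, which is why a rule such as `weekOfYear > 50` can isolate the last days of the year.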
# cubist model
model_tree <- cubist(x = pred, y = resp)
model_tree
summary(model_tree)
> model_tree

Call:
cubist.default(x = pred, y = resp)

Number of samples: 66900 
Number of predictors: 7 

Number of committees: 1 
Number of rules: 17
> summary(model_tree)

Call:
cubist.default(x = pred, y = resp)


Cubist [Release 2.07 GPL Edition]  Tue Nov 05 08:06:16 2019
---------------------------------

    Target attribute `outcome'

Read 66900 cases (8 attributes) from undefined.data

Model:

  Rule 1: [935 cases, mean 166.2, range 0 to 26756, est err 166.5]

    if
        DayOfWeek > 3
        DayOfWeek <= 4
        SchoolHoliday > 0
        weekOfYear > 50
    then
        outcome = 0

  Rule 2: [9135 cases, mean 206.9, range 0 to 37122, est err 206.9]

    if
        DayOfWeek > 6
    then
        outcome = 224 - 32 DayOfWeek + 116 Promo

  Rule 3: [1069 cases, mean 948.6, range 0 to 32169, est err 428.9]

    if
        DayOfWeek > 4
        DayOfWeek <= 6
        SchoolHoliday > 0
        weekOfYear > 50
    then
        outcome = -28880 + 5776 DayOfWeek
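To see how printed rules turn into predictions, the three rules shown can be hand-coded as plain R functions. This is a sketch, not the Cubist package's own prediction code: the names `rule_1`, `predict_rules` and the toy `cases` data frame are hypothetical, only Rules 1 to 3 of the 17 are included, and cases none of them cover get `NA`. Where several rules cover the same case, Cubist averages their predictions, and the helper mimics that.

```r
# Hand-coded copies of Rules 1-3 from summary(model_tree).
# The remaining 14 rules are not shown above, so they are not included.
rule_1 <- list(
  applies = function(d) d$DayOfWeek > 3 & d$DayOfWeek <= 4 &
    d$SchoolHoliday > 0 & d$weekOfYear > 50,
  predict = function(d) rep(0, nrow(d))
)
rule_2 <- list(
  applies = function(d) d$DayOfWeek > 6,
  predict = function(d) 224 - 32 * d$DayOfWeek + 116 * d$Promo
)
rule_3 <- list(
  applies = function(d) d$DayOfWeek > 4 & d$DayOfWeek <= 6 &
    d$SchoolHoliday > 0 & d$weekOfYear > 50,
  predict = function(d) -28880 + 5776 * d$DayOfWeek
)
rules <- list(rule_1, rule_2, rule_3)

# Hypothetical helper: average the predictions of every rule covering a case
predict_rules <- function(rules, d) {
  # one column per rule: its prediction where it applies, NA elsewhere
  preds <- sapply(rules, function(r) ifelse(r$applies(d), r$predict(d), NA_real_))
  preds <- matrix(preds, nrow = nrow(d))
  covered <- rowSums(!is.na(preds))
  ifelse(covered > 0, rowSums(preds, na.rm = TRUE) / covered, NA_real_)
}

# Toy cases (made up for illustration)
cases <- data.frame(
  DayOfWeek     = c(7, 7, 4, 6, 2),
  Promo         = c(0, 1, 0, 0, 1),
  SchoolHoliday = c(0, 0, 1, 1, 0),
  weekOfYear    = c(10, 10, 52, 51, 10)
)
toy_pred <- predict_rules(rules, cases)
print(toy_pred)
```

For a Sunday (`DayOfWeek` 7) without a promotion, Rule 2 gives 224 - 32 * 7 = 0, and a promotion adds 116. Rule 3 gives 5776 for `DayOfWeek` 6 and exactly 0 for `DayOfWeek` 5. Each rule is a readable if/then statement with a small linear model, which is where Cubist's interpretability comes from.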
Evaluation on training data (66900 cases, sampled):

    Average  |error|           1870.7
    Relative |error|               0.57
    Correlation coefficient        0.77
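The three figures above can be recomputed from any pair of actual and predicted vectors. A minimal sketch, assuming Cubist's documented definitions: the average absolute error, the relative error as that average divided by the average absolute error of always predicting the mean, and the correlation between actual and predicted values. The function name `cubist_style_metrics` and the toy vectors are hypothetical.

```r
# Hypothetical helper reproducing the "Evaluation on training data" statistics
cubist_style_metrics <- function(actual, predicted) {
  abs_err <- abs(actual - predicted)
  c(
    avg_abs_error  = mean(abs_err),
    # ratio to the error of a baseline that always predicts the mean;
    # values below 1 mean the model beats that baseline
    relative_error = mean(abs_err) / mean(abs(actual - mean(actual))),
    correlation    = cor(actual, predicted)
  )
}

# Toy values (made up for illustration)
actual    <- c(0, 5000, 7000, 6000, 0)
predicted <- c(100, 5500, 6500, 6200, 0)
m <- cubist_style_metrics(actual, predicted)
print(m)
```

Read this way, the model's relative error of 0.57 means its average absolute error is a bit more than half of what the mean-only baseline would make on the training data.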
    Attribute usage:
      Conds  Model

      100%    71%    DayOfWeek
       87%    78%    weekOfYear
       67%    46%    Promo
       11%    38%    SchoolHoliday
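The attribute usage table reports, for each predictor, the share of cases for which it appears in the rule conditions (`Conds`) or in the rules' linear models (`Model`). The sketch below assumes an attribute counts as used for a case when at least one rule covering that case uses it; the helper `attribute_usage` and the toy coverage matrix (Rules 1 to 3 applied to five made-up cases) are hypothetical.

```r
# Hypothetical helper (assumed definition): percentage of cases for which an
# attribute is used by at least one rule that covers the case.
attribute_usage <- function(covers, uses) {
  # covers: logical matrix, cases x rules (TRUE if the rule covers the case)
  # uses:   logical matrix, rules x attributes (TRUE if the rule uses it)
  used <- (covers %*% uses) > 0
  round(100 * colMeans(used))
}

# Toy coverage of five cases by Rules 1-3 (rows = cases, columns = rules)
covers <- rbind(
  c(FALSE, TRUE,  FALSE),
  c(FALSE, TRUE,  FALSE),
  c(TRUE,  FALSE, FALSE),
  c(FALSE, FALSE, TRUE),
  c(FALSE, FALSE, FALSE)
)

# Attributes appearing in each rule's conditions ...
cond_uses <- rbind(
  rule_1 = c(DayOfWeek = TRUE, weekOfYear = TRUE,  Promo = FALSE, SchoolHoliday = TRUE),
  rule_2 = c(DayOfWeek = TRUE, weekOfYear = FALSE, Promo = FALSE, SchoolHoliday = FALSE),
  rule_3 = c(DayOfWeek = TRUE, weekOfYear = TRUE,  Promo = FALSE, SchoolHoliday = TRUE)
)
# ... and in each rule's linear model (Rule 1 predicts a constant 0)
model_uses <- rbind(
  rule_1 = c(DayOfWeek = FALSE, weekOfYear = FALSE, Promo = FALSE, SchoolHoliday = FALSE),
  rule_2 = c(DayOfWeek = TRUE,  weekOfYear = FALSE, Promo = TRUE,  SchoolHoliday = FALSE),
  rule_3 = c(DayOfWeek = TRUE,  weekOfYear = FALSE, Promo = FALSE, SchoolHoliday = FALSE)
)

usage <- rbind(Conds = attribute_usage(covers, cond_uses),
               Model = attribute_usage(covers, model_uses))
print(usage)
```

An attribute can therefore score high under `Model` but low under `Conds`, as `SchoolHoliday` does in the real output: it rarely splits the data but often adjusts the predicted value inside a rule.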

ODSC - Open Data Science
