Feature Engineering with Forward and Backward Elimination

Forward Elimination
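
Forward elimination builds the model from the ground up: start with an intercept-only fit and, at each step, add the single predictor that most improves it, as judged here by a partial F-test. The example uses the nuclear power station construction data from the boot package.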

> library(boot)
> # Fit model for cost with intercept term only.
> # This model is insufficient for predicting cost.
> nuclear_lm0 <- lm(cost~1,data=nuclear)
> summary(nuclear_lm0)
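> # (An intercept-only model simply predicts the mean of cost
> # for every plant, so any useful predictor should beat it.)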
> # Start forward elimination
> # nuclear_lm0 is the model we wish to update.
> # The scope argument defines the most complex model to consider
> # for fitting, namely one containing all ten predictors.
> # test="F" requests a partial F-test for each candidate term.
> add1(nuclear_lm0,
+ scope=.~.+date+t1+t2+cap+pr+ne+ct+bw+cum.n+pt,test="F")
Single term additions
Model:
cost ~ 1
       Df Sum of Sq    RSS    AIC F value    Pr(>F)
<none>              897172 329.72
date    1    334335 562837 316.80 17.8205 0.0002071 ***
t1      1    186984 710189 324.24  7.8986 0.0086296 **
t2      1        27 897145 331.72  0.0009 0.9760597
cap     1    199673 697499 323.66  8.5881 0.0064137 **
pr      1      9037 888136 331.40  0.3052 0.5847053
ne      1    128641 768531 326.77  5.0216 0.0325885 *
ct      1     43042 854130 330.15  1.5118 0.2284221
bw      1     16205 880967 331.14  0.5519 0.4633402
cum.n   1     67938 829234 329.20  2.4579 0.1274266
pt      1    305334 591839 318.41 15.4772 0.0004575 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
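
A quick sanity check, not part of the original transcript: the F value reported for each candidate is the drop in RSS it buys, divided by the residual mean square of the enlarged model. For date:

> # Partial F-test for adding date to the intercept-only model:
> # F = ((RSS_null - RSS_date)/1) / (RSS_date/30)
> rss_null <- 897172 # RSS of cost ~ 1 (31 residual df, n = 32)
> rss_date <- 562837 # RSS of cost ~ date (30 residual df)
> (rss_null-rss_date)/(rss_date/30) # about 17.82, matching add1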
> # The date predictor offers the largest improvement in modeling
> # cost, so update the model (pt would be a close second choice).
> # In the update() formula, ".~." means "same response, same terms".
> nuclear_lm1 <- update(nuclear_lm0,formula=.~.+date)
> summary(nuclear_lm1) # Now the model includes date
> # Call add1 again, this time on the nuclear_lm1 model;
> # now cap is the most significant addition.
> add1(nuclear_lm1,
+ scope=.~.+date+t1+t2+cap+pr+ne+ct+bw+cum.n+pt,test="F")
> # The cap predictor should be added to the model.
> nuclear_lm2 <- update(nuclear_lm1,formula=.~.+cap)
> summary(nuclear_lm2)
> # Call add1 again, this time on the nuclear_lm2 model.
> add1(nuclear_lm2,
+ scope=.~.+date+t1+t2+cap+pr+ne+ct+bw+cum.n+pt,test="F")
> # The pt predictor should be added next.
> nuclear_lm3 <- update(nuclear_lm2,formula=.~.+pt)
> summary(nuclear_lm3)
> # Call add1 again, this time on the nuclear_lm3 model.
> add1(nuclear_lm3,
+ scope=.~.+date+t1+t2+cap+pr+ne+ct+bw+cum.n+pt,test="F")
> # The ne predictor should be added next.
> nuclear_lm4 <- update(nuclear_lm3,formula=.~.+ne)
> summary(nuclear_lm4)
> # Call add1 again, this time on the nuclear_lm4 model.
> # No remaining predictor would significantly improve the
> # model, so the search stops here.
> add1(nuclear_lm4,
+ scope=.~.+date+t1+t2+cap+pr+ne+ct+bw+cum.n+pt,test="F")
> # Final model chosen by forward selection:
> # cost ~ date + cap + pt + ne
> summary(nuclear_lm4)

Backward Elimination
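
The backward-elimination transcript is missing from this copy of the article, so the following is a minimal sketch of the procedure; the variable names nuclear_lm_full and nuclear_lm_b1 are my own. Fit the most complex model first, then use drop1(), the mirror image of add1(), to find the predictor whose removal hurts the fit least, again judged by a partial F-test.

> # Fit the full model containing all ten predictors.
> nuclear_lm_full <- lm(cost~date+t1+t2+cap+pr+ne+ct+bw+cum.n+pt,
+ data=nuclear)
> # drop1 refits the model with each term deleted in turn and
> # reports a partial F-test for every candidate deletion.
> drop1(nuclear_lm_full,test="F")
> # Drop the least significant term (largest p-value) and
> # repeat; e.g., if that term is bw:
> nuclear_lm_b1 <- update(nuclear_lm_full,formula=.~.-bw)
> drop1(nuclear_lm_b1,test="F")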

Iterating until every remaining term passes the partial F-test leaves a six-predictor model (nuclear_lm_b4, continuing the naming above):

> summary(nuclear_lm_b4)
Call:
lm(formula = cost ~ date + t2 + cap + pr + ne + cum.n, data = nuclear)

Residuals:
     Min       1Q   Median       3Q      Max
-152.851  -53.929   -8.827   53.382  155.581

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.702e+03  1.294e+03  -7.495 7.55e-08 ***
date         1.396e+02  1.843e+01   7.574 6.27e-08 ***
t2           4.905e+00  1.827e+00   2.685 0.012685 *
cap          4.137e-01  8.425e-02   4.911 4.70e-05 ***
pr          -8.851e+01  3.479e+01  -2.544 0.017499 *
ne           1.502e+02  3.400e+01   4.419 0.000168 ***
cum.n       -7.919e+00  2.871e+00  -2.758 0.010703 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 80.8 on 25 degrees of freedom
Multiple R-squared: 0.8181, Adjusted R-squared: 0.7744
F-statistic: 18.74 on 6 and 25 DF, p-value: 3.796e-08
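
In practice you rarely run this loop by hand: base R's step() function automates the search in either direction, using AIC rather than partial F-tests. A one-line sketch, reusing the full model fitted above:

> # Automated backward search; compare its result with the manual one.
> step(nuclear_lm_full,direction="backward")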
