We restrict the usage to linear and additive multivariate regression models.
It is rarely the case that a response variable will only depend on a single variable. To avoid over complicating the formulas we will show the case with 2 predictors but there could be many.
- With i=1,2, ...,n
- Where
$\varepsilon_{i} \thicksim \mathcal{N}(0, \sigma^{2})$ - Now the
$\hat{MSE}$ depends on$(\beta_{0}, \beta_{1}, \beta_{2})$
So we want to minimize
We could take the derivate with respect to each parameter and set them equal to zero, to obtain the estimating equations and then apply the plug-in principle to work with sample data.
Else we can use the built in function class lm()
and find the coefficients
fit <- lm(formula = y ~ x1 + x2, data = dataset)
coef(fit)
# And use the same lm class functions ...
Since the model is additive, we could find ourselves to face an arbitrary model with
- p-1 predictor variables
- Total of p
$\beta$ -parameters - A single
$\sigma^{2}$ for the variance of the errors
- With i=1,2, ...,n
- Where
$\varepsilon_{i} \thicksim \mathcal{N}(0, \sigma^{2})$ - Now the
$\hat{MSE}$ depends on$(\beta_{0}, \beta_{1}, \beta_{2}, ..., \beta_{(p-1)})$
We could still optimize the function with respect to the p
The model becomes:
# Gets rows, the number of the observed values
n <- nrows(dataset)
# Union of the vectors by column
X <- cbind( rep(1,n), dataset$predictor1, dataset$predictor2)
# Gets columns, the number of parameters
p <- ncols(X)
We now have a vector
y <- dataset$target
# First apply the transpose, then apply multiplication between matrix
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
# To show it in a row format
t(beta_hat)
Alternatively we can use the lm class, and forget about manually computing everything
fit <- lm(formula, data)
# To check the first rows of the matrix generated by R
head(model.matrix(fit))
The predicted values can be written
y_hat <- X %*% solve(t(X) %*% X) %*% t(X) %*% y
res_val <- y - y_hat
The estimate for
s2e <- (t(e) %*% e)/(n-p)
# Residual SE in R
se <- sqrt(s2e)
# which is equal to
sqrt(sum((y-y_hat)^2)/(n-p))
#or we can access it directly through
summary(fit)$sigma