Title: | Mean Squared Out-of-Sample Error Projection |
---|---|
Description: | Projects the mean squared out-of-sample error for a linear regression based on the methodology developed in Rohlfs (2022) <doi:10.48550/arXiv.2209.01493>. It takes as inputs the lm object from an estimated OLS regression (based on the "training sample") and a data.frame of out-of-sample cases (the "test sample") that have non-missing values for the same predictors. The test sample may or may not include data on the outcome variable; if it does, that variable is not used. The aim of the exercise is to project the mean squared out-of-sample error that can be expected given the predictor values supplied in the test sample. Output consists of a list of three elements: the projected mean squared out-of-sample error, the projected out-of-sample R-squared, and a vector of out-of-sample "hat" or "leverage" values, as defined in the paper. |
Authors: | Chris Rohlfs [aut, cre] |
Maintainer: | Chris Rohlfs <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2024-11-11 05:59:46 UTC |
Source: | https://github.com/cran/moose |
This function projects the mean squared out-of-sample error for a linear regression.
moose(reg, dataset)
reg | an lm object containing the regression to project out-of-sample |
dataset | a data.frame containing new cases for out-of-sample projection |
mse | Projected mean squared out-of-sample error |
R2o | Projected out-of-sample R-squared |
hat | Leverage for each out-of-sample observation. For each i, this is the sum of the squared elements of x_i (X'X)^-1 X', where x_i is the row of predictor values for out-of-sample observation i and X is the predictor matrix from the training sample. |
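As a concrete illustration of that definition, the out-of-sample leverage values can be reproduced from the training design matrix with base R. The following is a minimal sketch that assumes only the definition quoted above, not the package internals; the names X.train, X.test, and hat.oos are illustrative and not part of the package API.

# minimal sketch: out-of-sample leverage from its definition (illustrative only)
X.train <- model.matrix(reg)                                   # training design matrix X
X.test  <- model.matrix(delete.response(terms(reg)), dataset)  # test-sample rows x_i
A <- X.test %*% solve(crossprod(X.train)) %*% t(X.train)       # row i is x_i (X'X)^-1 X'
hat.oos <- rowSums(A^2)  # sum of squared elements for each out-of-sample observation i
# under the definition above, hat.oos should correspond to moose(reg, dataset)$hat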
# set the seed for reproducibility of the example
set.seed(04251978)
# randomly generate 100 observations of data
mydata <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
# true outcome variable is y = x1 + x2 + x3 + e
y <- mydata$x1 + mydata$x2 + mydata$x3 + rnorm(100)
# regression with the first 25 observations from the dataset
reg <- lm(y ~ x1 + x2 + x3, data=cbind(y, mydata)[1:25,])
# using the predictor values from the first 25 observations,
# project the out-of-sample error we can expect in the case of
# "non-stochastic" predictors whose values are the same in the
# test sample as in the training sample.
# note that mydata does not include the outcome variable.
same.predictor.values.error <- moose(reg, mydata[1:25,])
# by comparison, the in-sample R-squared value observed
# in training is:
summary(reg)$r.squared
# using the predictor values from the next 75 observations,
# project the out-of-sample error we can expect in the case
# of stochastic predictors whose values potentially differ
# from those used in training.
new.predictor.values.error <- moose(reg, mydata[26:100,])
# by comparison, the actual mse and out-of-sample R-squared value
# obtained from observations 26-100 of this random sample are:
mse <- mean((y[26:100] - predict(reg, mydata[26:100,]))^2)
mse
m.total.sqs <- mean((y[26:100] - mean(y[26:100]))^2)
r2o <- 1 - mse/m.total.sqs
r2o
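For reference, the elements of the lists returned above can be inspected directly; the element names below follow the Value entries documented earlier.

# inspect the projected quantities returned by moose()
same.predictor.values.error$mse          # projected mean squared out-of-sample error
same.predictor.values.error$R2o          # projected out-of-sample R-squared
new.predictor.values.error$R2o           # projection for the stochastic-predictor case
summary(new.predictor.values.error$hat)  # distribution of out-of-sample leverage values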