Building A Table of Multiple Linear Regression Models with Stargazer

Author

Fendi Tsim

Published

March 28, 2023

Introduction

In this post, I present a method to combine results of multiple linear regression models in a latex regression table with stargazer (R package).

The aim is to simplify regressions generation processes with a user-defined set of Independent Variables (IVs) for each regression model, and show all results in a regression table with stargazer. This is very helpful when adding (or removing) new IVs in current regression model gradually.

Load packages

require(stargazer)  # Load stargazer 
Loading required package: stargazer

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 

Function

I created a function called Regress for combining results of all regression models into a single table:

Regress <- function(data, DV, eq.IVs, OutcomeLabel){
  
  equations <- paste0(DV, " ~ ", eq.IVs) # Create equations 
  RegResults <- list()  # Create an empty list that stores all regression results
  LogLike <- c("Log Likelihood")  # Storing log-likelihood of all regression results
  
  for(i in 1:length(equations)){
    reg <- lm(formula = equations[i], data = data)
    reg$AIC <- AIC(reg) # Include Akaike Inf. Crit.
    reg$BIC <- BIC(reg) # Include Bayesian Inf. Crit.
    RegResults[[paste0(OutcomeLabel, "_", ifelse(i<10, paste0(0, i), i))]] <- reg
    LogLike <- append(x = LogLike, values = round(logLik(reg),2))
  }
  
  # Export RegResults as doc 
  return(stargazer(RegResults, 
            type = 'html', 
            out = paste0("Regression_Results_", OutcomeLabel, ".doc"), add.lines = list(LogLike))
         )
}

This function Regress contains multiple inputs:

  • data for the set of data used in multiple linear regression models

  • DV refers to dependent variable of multiple linear regression models

  • eq.IVs refers to set of IVs in each multiple linear regression model

  • OutcomeLabel refers to the document name of the outcome (user-defined)

Note that this function also includes model results of Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Log likelihood (LogLike) for models comparison.

This function returns with the regression table results in word document format.

Example

Here npk (R Datasets) is used for illustration. Three regressions are generated using lm(), with yield as dependent variable, as well as block and N as independent variables.

First we import the dataset:

npk <- datasets::npk # Using R Datasets (Classical N, P, K Factorial Experiment)

We then define the dependent variable, which is yield in this case:

DV = 'yield'  # Set dependent variable

Next, we define the set of IVs in each regression model. The first equation has block as IV. The second one has N. The last one has both as IVs:

# Create a list of equations with independent variables
# Here 1st equation is block, 2nd with N, 3rd with block and N
eq.IVs <- c("block",
            'N',
            'block + N'
            )

Then we generate the regression table with the function Regress:

RegressionTable <- Regress(data = npk, DV = DV, eq.IVs = eq.IVs, OutcomeLabel = "NPK")

The result looks like this (here I use htmltools package for showing the result, but it is saved as a .doc file in the current working directory once the function is executed):

htmltools::knit_print.html(RegressionTable)
Dependent variable:
equations
(1) (2) (3)
block2 3.425 3.425
(3.848) (3.180)
block3 6.750* 6.750**
(3.848) (3.180)
block4 -3.900 -3.900
(3.848) (3.180)
block5 -3.500 -3.500
(3.848) (3.180)
block6 2.325 2.325
(3.848) (3.180)
N1 5.617** 5.617***
(2.281) (1.836)
Constant 54.025*** 52.067*** 51.217***
(2.721) (1.613) (2.429)
Log Likelihood -71.26 -74.31 -66
Observations 24 24 24
R2 0.392 0.216 0.608
Adjusted R2 0.223 0.180 0.469
Akaike Inf. Crit. 156.523 154.615 147.996
Bayesian Inf. Crit. 164.770 158.149 157.421
Residual Std. Error 5.442 (df = 18) 5.588 (df = 22) 4.497 (df = 17)
F Statistic 2.318* (df = 5; 18) 6.061** (df = 1; 22) 4.389*** (df = 6; 17)
Note: *p<0.1; **p<0.05; ***p<0.01

The complete code is shown as follows:

npk <- datasets::npk # Using R Datasets (Classical N, P, K Factorial Experiment)
DV = 'yield'  # Set dependent variable
# Create a list of equations with independent variables
# Here 1st equation is block, 2nd with N, 3rd with block and N
eq.IVs <- c("block",
            'N',
            'block + N'
            )
RegressionTable <- Regress(data = npk, DV = DV, eq.IVs = eq.IVs, OutcomeLabel = "NPK")

Reference

Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.3. https://CRAN.R-project.org/package=stargazer