This R package implements Bayesian reciprocal regularization and variable selection for regression and classification. Currently, it includes a set of computationally efficient MCMC algorithms (Gibbs and slice samplers) for solving the Bayesian reciprocal LASSO in linear models. It also includes a modified S5 algorithm to solve the (frequentist) reduced reciprocal LASSO problem in linear regression. For technical details, see our papers referenced below.
BayesRecipe requires the following R package: devtools (for installation only). Please install it before installing BayesRecipe, which can be done as follows (execute from within a fresh R session):
install.packages("devtools")
library(devtools)
Once devtools is installed, BayesRecipe can be installed and loaded using the following commands:
devtools::install_github("himelmallick/BayesRecipe")
library(BayesRecipe)
In addition, to run the slice sampler, we need to install and load the following:
devtools::install_github("JingyuHe/bayeslm")
library(bayeslm)
BayesRLasso(x, y)
- x: A numeric matrix with standardized predictors in columns and samples in rows.
- y: A mean-centered continuous response variable whose length matches the number of rows of x.
- method: Data augmentation scheme or mixture representation. If Gibbs sampling is desired, 'SMU' or 'SMN' must be selected; these are not recommended when the number of covariates runs into the hundreds. By default, an elliptical slice sampler ('slice') is used, which is scalable to large problems.
- lambda.estimate: Method for estimating lambda: empirical Bayes ('EB'), MCMC ('MCMC'), or a priori ('AP'). Default is 'AP' (currently the only available option when method = 'slice').
- update.sigma2: Whether sigma2 should be updated. Default is TRUE (currently the only available option when method = 'slice').
- max.steps: Number of MCMC iterations. Default is 11000.
- n.burn: Number of burn-in iterations. Default is 1000.
- n.thin: Lag at which thinning should be done. Default is 1 (no thinning).
- ridge.CV: When method = 'SMU' and X is rank deficient, a ridge parameter is added to the diagonal of the cross-product (XtX) so that the inverse can be computed properly. If TRUE, the ridge parameter is estimated by cross-validation using glmnet; otherwise, a small constant (1e-05) is added without cross-validation. Default is TRUE. This parameter is ignored when method = 'SMN' or 'slice'.
- a: If lambda.estimate = 'MCMC', shape hyperparameter for the Gamma prior on lambda. Default is 0.001. This parameter is ignored when method = 'slice'.
- b: If lambda.estimate = 'MCMC', rate hyperparameter for the Gamma prior on lambda. Default is 0.001. This parameter is ignored when method = 'slice'.
- posterior.summary.beta: Posterior summary measure for beta (mean, median, or mode). Default is 'mean'.
- posterior.summary.lambda: Posterior summary measure for lambda (mean, median, or mode). Default is 'median'.
- beta.ci.level: Credible interval level for beta. Default is 0.95 (95%).
- lambda.ci.level: Credible interval level for lambda. Default is 0.95 (95%).
- seed: Seed value for reproducibility. Default is 1234.
A list containing the following components is returned:
- time: Computational time in minutes.
- beta: Posterior estimates of beta.
- lowerbeta: Lower limit of the credible interval of beta.
- upperbeta: Upper limit of the credible interval of beta.
- lambda: Posterior estimate of lambda.
- lambdaci: Posterior credible interval of lambda.
- beta.post: Post-burn-in posterior samples of beta.
- sigma2.post: Post-burn-in posterior samples of sigma2.
- lambda.post: Post-burn-in posterior samples of lambda.
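To illustrate the interface described above, here is a minimal sketch on simulated data (the simulation settings below are illustrative assumptions, not part of the package); it uses the default 'slice' sampler, which requires bayeslm:
#####################################
# Minimal simulated-data usage demo #
#####################################
library(BayesRecipe)
library(bayeslm)                                  # needed for the default 'slice' sampler
set.seed(1234)
n <- 100; p <- 10
x <- scale(matrix(rnorm(n * p), n, p))            # standardized predictors
beta.true <- c(2, -2, rep(0, p - 2))              # sparse true coefficients (illustrative)
y <- as.vector(x %*% beta.true + rnorm(n))
y <- y - mean(y)                                  # mean-centered response
fit <- BayesRLasso(x, y)                          # default: elliptical slice sampler
cbind(fit$lowerbeta, fit$beta, fit$upperbeta)     # posterior means with 95% credible limits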
Let's consider the prostate dataset from the ElemStatLearn package.
#########################
# Load Prostate dataset #
#########################
library(ElemStatLearn)
prost<-ElemStatLearn::prostate
###########################################
# Scale data and prepare train/test split #
###########################################
prost.std <- data.frame(cbind(scale(prost[,1:8]),prost$lpsa))
names(prost.std)[9] <- 'lpsa'
data.train <- prost.std[prost$train,]
data.test <- prost.std[!prost$train,]
Above, we scale the predictors and separate out training and test samples using the train/test split included in the dataset. Next, we extract the standardized predictors and mean-center the response variable within each set.
##################################
# Extract standardized variables #
##################################
y.train = data.train$lpsa - mean(data.train$lpsa)
y.test <- data.test$lpsa - mean(data.test$lpsa)
x.train = scale(as.matrix(data.train[,1:8], ncol=8))
x.test = scale(as.matrix(data.test[,1:8], ncol=8))
Let's apply reciprocal Bayesian LASSO and calculate predictions in the test set.
#############################
# Reciprocal Bayesian Lasso #
#############################
library(BayesRecipe)
fit_BayesRLasso<-BayesRLasso(x.train, y.train)
y.pred.BayesRLasso<-x.test%*%fit_BayesRLasso$beta
Let's compare with various frequentist methods such as OLS, LASSO, adaptive LASSO, classical bridge, elastic net, SCAD, MCP, and the frequentist reciprocal LASSO (rLASSO).
####################
# Compare with OLS #
####################
ols.lm<-lm(y.train ~ x.train)
y.pred.ols<-x.test%*%ols.lm$coefficients[-1]
##################################
# Compare with Frequentist LASSO #
##################################
set.seed(1234)
library(glmnet)
lasso.cv=cv.glmnet(x.train,y.train,alpha = 1)
lambda.cv.lasso=lasso.cv$lambda.min
lasso.sol=glmnet(x.train,y.train,alpha = 1)
lasso.coeff=predict(lasso.sol,type="coef",s=lambda.cv.lasso)
y.pred.lasso=x.test%*%lasso.coeff[-1]
###########################################
# Compare with Frequentist Adaptive LASSO #
###########################################
alasso.weights=1/abs(ols.lm$coefficients[-1]) # adaptive weights from OLS fit (intercept dropped)
alasso.cv=cv.glmnet(x.train,y.train,alpha = 1,penalty.factor=alasso.weights)
lambda.cv.alasso=alasso.cv$lambda.min
alasso.sol=glmnet(x.train,y.train,alpha = 1,penalty.factor=alasso.weights)
alasso.coeff=predict(alasso.sol,type="coef",s=lambda.cv.alasso)
y.pred.alasso=x.test%*%alasso.coeff[-1]
###################################
# Compare with Frequentist Bridge #
###################################
library(grpreg)
group<-c(rep(1,8))
fit.bridge <- gBridge(x.train, y.train,group=group)
bridge.coeff<-grpreg::select(fit.bridge,'AIC')$beta
y.pred.bridge<-x.test%*%bridge.coeff[-1]
########################################
# Compare with Frequentist Elastic Net #
########################################
library(glmnet)
enet.cv=cv.glmnet(x.train,y.train,alpha = 0.5)
lambda.cv.enet=enet.cv$lambda.min
enet.sol=glmnet(x.train,y.train,alpha = 0.5)
enet.coeff=predict(enet.sol,type="coef", s=lambda.cv.enet)
y.pred.enet=x.test%*%enet.coeff[-1]
#################################
# Compare with Frequentist SCAD #
#################################
library(ncvreg)
scad.cv <- cv.ncvreg(x.train,y.train,penalty="SCAD", dfmax=1000,max.iter=10^4)
lambda.cv.scad<- scad.cv$lambda.min
scad.coeff<- ncvreg(X=x.train,y=y.train, penalty='SCAD',dfmax=1000,lambda=lambda.cv.scad)$beta[-1,1]
y.pred.scad = x.test%*%scad.coeff
#################################
# Compare with Frequentist MCP #
#################################
mcp.cv <- cv.ncvreg(x.train,y.train,penalty="MCP", dfmax=1000,max.iter=10^4)
lambda.cv.mcp<- mcp.cv$lambda.min
mcp.coeff<- ncvreg(X=x.train,y=y.train, penalty='MCP',dfmax=1000,lambda=lambda.cv.mcp)$beta[-1,1]
y.pred.mcp = x.test%*%mcp.coeff
###################################
# Compare with Frequentist rLASSO #
###################################
fit_rLASSO = rrLASSO.S5(x.train,y.train)
rlasso.coeff<-fit_rLASSO$beta
y.pred.rlasso<-x.test%*%rlasso.coeff[-1]
Let's now compare performance on the test data, which reveals that the reciprocal Bayesian LASSO attains the smallest test error on this dataset.
###########################################
# Prediction Accuracy (MSE on Test Data ) #
###########################################
mean((y.pred.ols-y.test)^2) # 0.5421042 OLS
mean((y.pred.lasso-y.test)^2) # 0.5150523 LASSO
mean((y.pred.alasso-y.test)^2) # 0.7114202 ALASSO
mean((y.pred.bridge-y.test)^2) # 0.5168394 BRIDGE
mean((y.pred.enet-y.test)^2) # 0.506828 ENET
mean((y.pred.scad-y.test)^2) # 0.5388646 SCAD
mean((y.pred.mcp-y.test)^2) # 0.5388779 MCP
mean((y.pred.rlasso - y.test)^2) # 0.5413048 rLASSO
mean((y.pred.BayesRLasso - y.test)^2) # 0.4868719 BayesRLasso
Let's check MCMC convergence of the reciprocal Bayesian LASSO estimator through two visualizations: trace plots and histograms.
######################################
# Visualization of Posterior Samples #
######################################
##############
# Trace Plot #
##############
library(coda)
par(mar=c(2,2,2,2))
plot(mcmc(fit_BayesRLasso$beta.post),density=FALSE,smooth=TRUE)
#############
# Histogram #
#############
library(psych)
multi.hist(fit_BayesRLasso$beta.post,density=TRUE,main="")
Reassuringly, for this benchmark dataset, the samples traverse the posterior space quickly, confirming that the sampler has good mixing properties.
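As an optional complement to the plots, standard numeric diagnostics from coda can be computed on the post-burn-in draws; the brief sketch below reuses the fit_BayesRLasso object from above.
####################################
# Numeric convergence diagnostics  #
####################################
library(coda)
beta.mcmc <- mcmc(fit_BayesRLasso$beta.post)      # post-burn-in draws of beta
effectiveSize(beta.mcmc)                          # effective sample size per coefficient
autocorr.diag(beta.mcmc)                          # autocorrelation at standard lags
geweke.diag(beta.mcmc)                            # Geweke z-scores for stationarity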
Note that this tutorial is still under development. In the meantime, we are happy to troubleshoot any issues with the package. Please reach out directly via email or create an issue in the repository.
If you use BayesRecipe in your work, please cite the following as appropriate:
- Mallick H, Alhamzawi R, Paul E, Svetnik V (2021). The Reciprocal Bayesian LASSO. Statistics in Medicine, 40(22):4830-4849.
- Alhamzawi R, Mallick H (2022). Bayesian Reciprocal LASSO Quantile Regression. Communications in Statistics - Simulation and Computation, 51(11):6479-6494.
- Alhamzawi R, Alhamzawi A, Mallick H (2023). The Reciprocal Elastic Net. Communications in Statistics - Case Studies and Data Analysis, 9(4):422-436.
- Paul E, Mallick H (2024+). Unified Reciprocal LASSO Estimation Via Least Squares Approximation. Communications in Statistics - Simulation and Computation, In Press.
- Paul E, He J, Mallick H (2025+). Accelerated Bayesian Reciprocal LASSO. Communications in Statistics - Simulation and Computation, In Press.