Skip to content
Matthew Wheeler edited this page Apr 25, 2023 · 3 revisions

Data With Issues

The standard EPA workflow has problems with data where the likelihood is weakly identified for the model. Unfortunately, this issue occurs with most dose-response data. The reason: we usually have 3 dose groups plus the background dose, and we fit dose-response models with $3+$ parameters. When you have a non-parametric model, there is a reasonable probability you will not have awell defined model!

library(knitr)

doses<- c(0,0,0,0,0.156,0.156,0.156,0.3125,0.3125,0.3125,
          0.625,0.625,0.625,1.25,1.25,1.25,2.5,2.5,2.5,5,5,
          5,5,10,10,10,10,20,20,20,20)

resp <- c(3.972915,3.866893,3.772929,3.417624,3.473904,3.978128,3.917840,
          3.976679,3.721169,3.637042,4.097541,3.602818,3.544474,3.680584,
          3.335199,3.986959,2.526221,2.837538,3.643890,3.345252,3.543438,
          2.957502,2.916890,3.665053,2.334471,2.962853,3.015622,2.840253,
          2.707142,2.711778,3.084766)

means <- aggregate(x=resp,by=list(doses),FUN=mean)
sds   <- aggregate(x=resp,by=list(doses),FUN=sd)
library(ggplot2)
M2           <- matrix(0,nrow=nrow(sds),ncol=4)
colnames(M2) <- c("Dose","Resp","N","StDev")
library(ToxicR)
M2[,1] <- means$Group.1
M2[,2] <- means$x
M2[,3] <- as.numeric(table(doses))
M2[,4] <- sds$x

The Data

The following data comes from toxicogenomic PFOA exposure data from Gwinn et al. (2020), where we investigate the GLIPR1 receptor's response. This study is unique in that we have 8 non-control dose groups, which is rare. Note: in this case, the model is still ill-defined, and this behavior happens all of the time in toxicogenomic data when you have 3000+ dose-responses to fit. Let's look at the fits; where we note that all models fit the data acceptably.

model1 = single_continuous_fit(M2[,1,drop=FALSE], M2[,2:4], BMR_TYPE="sd", BMR=1, 
                              distribution = "normal",fit_type="mle",model_type = "exp-5",degree=2)
AIC_exp5 = -model1$maximum + 2*summary(model1)$GOF[1,2]

model2 = single_continuous_fit(M2[,1,drop=FALSE], M2[,2:4], BMR_TYPE="sd", BMR=1,
                              distribution = "normal",fit_type="mle",model_type =                                    "hill",degree=2)
AIC_hill = -model2$maximum + 2*summary(model2)$GOF[1,2]

model3 = single_continuous_fit(M2[,1,drop=FALSE], M2[,2:4], BMR_TYPE="sd", BMR=1, 
                              distribution = "normal",fit_type="mle",model_type = "polynomial",degree=1)
## WARNING: Polynomial models may provide unstable estimates because of possible non-monotone behavior.
AIC_lin = -model3$maximum + 2*summary(model3)$GOF[1,2]
G = matrix( c(model1$bmd,model2$bmd,model3$bmd),byrow=TRUE,ncol=3)
colnames(G) <- c("BMD","BMDL","BMDU")
b <- data.frame(Model=c("Exponential 5","Hill","Linear") ,AIC=c(AIC_exp5,AIC_hill,AIC_lin),
                BMD = G[,1],BMDL=G[,2],BMDU=G[,3])

kable(b,digits=3)
Model AIC BMD BMDL BMDU
Exponential 5 2.391 1.583 0.871 2.439
Hill 2.390 1.359 1.148 2.395
Linear 1.102 7.681 5.472 12.798

We would pick the AIC with the EPA's default approach, but this is problematic. The data have an asymptote; a linear model is not biologically reasonable because it does not pick up this asymptote. Look at the following three plots fit using maximum likelihood estimation. The two models that allow for an asymptote (Hill and Exponential-5) have a much lower BMD (1.38 (0.87) and 1.58 (1.15)) as compared to the Linear model (7.68 (5.47)).

plot(model1)

plot(model2)

plot(model3)

The issue comes from the fact that the linear model is too simple, but it describes the data well 'enough' given the variability. Further issues come from the fact that there is no information on the 'shape' parameter. For the Hill model, $$f(dose) = a + \frac{b \times dose^d}{c^d + dose^d},$$ this is parameter $d.$ One can analyze the profile likelihood of this parameter as done in Wheeler (2023).

After $d &gt; 5$ there is essentially no information, but there really isn't any information. For $d=1,$ the log-likelihood is only 0.3 different than the maximum likelihood estimate. This difference is not statistically significant!

Bayesian Methods

The problem disappears after adding a little Bayesian information. In fact, if I add a diffuse caucy prior over the shape parameter, we get a unique maximum. Now we get a best estimate of about $1.6,$ which produces an entirely different dose-response curve.

prior <- create_prior_list(normprior(0,1,-100,100),
                           normprior(0,1,-1e4,1e4),
                           lnormprior(0,1, 0, 100),
                           cauchyprior(0,1,0,18),
                           normprior(0, 10,-100,100));

p_hill_norm = create_continuous_prior(prior,"hill","normal")

cauchy_hill = single_continuous_fit(M2[,1,drop=FALSE], M2[,2:4], BMR_TYPE="sd", BMR=1,
                              distribution = "normal",fit_type="laplace",prior=p_hill_norm)

plot(cauchy_hill)

Model Averaging

A Bayesian solution doesn't necessarily obviate the model selection problem. What do you do when multiple models are 'equal?' In the above, the AIC only differed by about 1. The best solution when using parametric models is to use model averaging. With ToxicR this is relatively simply done

MA_ORIG = ma_continuous_fit(M2[,1,drop=FALSE], M2[,2:4], BMR_TYPE="sd", BMR=1,fit_type="mcmc", 
                            EFSA=FALSE)
## NOTE: Parameter 'c' added to prior list. It is not used in the analysis.
## NOTE: Parameter 'c' added to prior list. It is not used in the analysis.
## NOTE: Parameter 'c' added to prior list. It is not used in the analysis.
summary(MA_ORIG)
## Summary of single MA BMD
## 
## Individual Model BMDS
## Model                                                            		 BMD (BMDL, BMDU)	Pr(M|Data)
## ___________________________________________________________________________________________
## Hill Distribution: Normal                                     			2.10 (0.75 ,6.46) 	 0.365
## Exponential-3 Distribution: Normal                            			4.95 (1.39 ,12.99) 	 0.312
## Exponential-5 Distribution: Normal                            			4.67 (1.37 ,13.19) 	 0.152
## Power Distribution: Normal                                    			7.50 (2.16 ,16.60) 	 0.077
## Exponential-5 Distribution: Log-Normal                        			7.14 (2.30 ,18.30) 	 0.050
## Exponential-3 Distribution: Log-Normal                        			8.58 (2.58 ,19.00) 	 0.043
## Hill Distribution: Normal-NCV                                 			3.66 (0.93 ,347.11) 	 0.000
## Exponential-3 Distribution: Normal-NCV                        			11.95 (4.06 ,33.26) 	 0.000
## Exponential-5 Distribution: Normal-NCV                        			10.70 (3.31 ,47.56) 	 0.000
## Power Distribution: Normal-NCV                                			13.82 (5.63 ,32.69) 	 0.000
## ___________________________________________________________________________________________
## Model Average BMD: 3.93 (1.00, 13.29) 90.0% CI
MA_EFSA = ma_continuous_fit(M2[,1,drop=FALSE], M2[,2:4], BMR_TYPE="sd", BMR=2,EFSA=TRUE,
                            fit_type = "mcmc")
summary(MA_EFSA)
## Summary of single MA BMD
## 
## Individual Model BMDS
## Model                                                            		 BMD (BMDL, BMDU)	Pr(M|Data)
## ___________________________________________________________________________________________
## Gamma-EFSA Distribution: Normal                               			9.26 (3.60 ,37.49) 	 0.241
## LMS Distribution: Normal                                      			5.40 (1.82 ,148.66) 	 0.212
## Hill-Aerts Distribution: Normal                               			5.28 (1.84 ,1294.09) 	 0.193
## Exponential-Aerts Distribution: Normal                        			9.88 (2.41 ,278.09) 	 0.079
## Inverse Exponential-Aerts Distribution: Normal                			4.07 (1.87 ,2741.33) 	 0.074
## Lognormal-Aerts Distribution: Normal                          			4.06 (1.75 ,33.77) 	 0.072
## LMS Distribution: Log-Normal                                  			17.41 (3.48 ,322.34) 	 0.029
## Hill-Aerts Distribution: Log-Normal                           			16.61 (2.92 ,73230.75) 	 0.023
## Logistic-Aerts Distribution: Normal                           			14.03 (6.86 ,28.12) 	 0.020
## Gamma-EFSA Distribution: Log-Normal                           			14.30 (4.87 ,148.12) 	 0.012
## Exponential-Aerts Distribution: Log-Normal                    			18.01 (3.99 ,80.38) 	 0.011
## Probit-Aerts Distribution: Normal                             			14.42 (7.46 ,27.47) 	 0.011
## Inverse Exponential-Aerts Distribution: Log-Normal            			28.04 (3.47 ,32353.36) 	 0.009
## Lognormal-Aerts Distribution: Log-Normal                      			22.27 (3.32 ,521.71) 	 0.008
## Logistic-Aerts Distribution: Log-Normal                       			19.92 (8.66 ,38.81) 	 0.003
## Probit-Aerts Distribution: Log-Normal                         			21.29 (9.27 ,45.34) 	 0.002
## ___________________________________________________________________________________________
## Model Average BMD: 7.53 (2.07, 150.21) 90.0% CI

We now use the models proposed by the European Food Safety Authority. Using EFSA=FALSE you can use the origional models.

plot(MA_ORIG)
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.

plot(MA_EFSA)
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Warning: Removed 1 rows containing missing values (`geom_text()`).
## Removed 1 rows containing missing values (`geom_text()`).
## Removed 1 rows containing missing values (`geom_text()`).
## Removed 1 rows containing missing values (`geom_text()`).
## Removed 1 rows containing missing values (`geom_text()`).
## Removed 1 rows containing missing values (`geom_text()`).

Clone this wiki locally