Extracting parameters from reported median and range #1

adamkucharski · 2022-07-21T20:38:18Z

Descriptive studies often report summary values such as the median, range, or percentiles (e.g. 95%) for estimated incubation periods. We already have functionality in extract_param to extract the parameters of an assumed distribution from reported percentiles, using least squares on its CDF, e.g.
extract_param(type = "percentiles", values = c(5.9,21.4), distribution = "lnorm", percentiles = c(0.025,0.975))
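
The least-squares idea can be sketched in base R as follows. This is a minimal illustration and not the actual extract_param implementation: the helper name fit_lnorm_percentiles, the starting values and the log-parametrisation of sdlog are all assumptions made for the example.

```r
# Hypothetical helper (not part of {epiparameter}): fit a lognormal to
# reported percentiles by least squares on the CDF.
fit_lnorm_percentiles <- function(values, percentiles) {
  loss <- function(par) {
    # par[2] is log(sdlog), so sdlog stays positive during optimisation
    fitted <- plnorm(values, meanlog = par[1], sdlog = exp(par[2]))
    sum((fitted - percentiles)^2)
  }
  # Assumed starting values: log of the mean reported value, and sdlog = 1
  fit <- optim(par = c(log(mean(values)), 0), fn = loss)
  c(meanlog = fit$par[[1]], sdlog = exp(fit$par[[2]]))
}

est <- fit_lnorm_percentiles(
  values = c(5.9, 21.4),
  percentiles = c(0.025, 0.975)
)

# With exactly two percentiles the lognormal also has a closed-form
# solution, which gives a check on the optimiser
z <- qnorm(c(0.025, 0.975))
sdlog_exact <- diff(log(c(5.9, 21.4))) / diff(z)
meanlog_exact <- log(5.9) - z[1] * sdlog_exact
```

Comparing est with meanlog_exact and sdlog_exact shows how close the numerical fit gets; with more than two percentiles there is no exact solution and only the least-squares fit applies.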

However, it would also be useful to be able to extract parameters for an assumed distribution g(x) from a reported median and range. If we define our observed median, range and number of samples as val = c(median, min, max, n), with elements named median, min, max and n, then one option for a function to minimise over the two parameters of a lognormal distribution, a (meanlog) and b (sdlog), would be:

fit_function_lnorm_range <- function(param, val) {
  # param: named vector c(a = meanlog, b = sdlog)
  # val: named vector with elements median, min, max and n
  a <- param[["a"]]
  b <- param[["b"]]

  # Squared residual between the fitted CDF at the reported median and 0.5
  median_sr <- (plnorm(val[["median"]], meanlog = a, sdlog = b) - 0.5)^2

  # Density at the reported min and max, and probability that the
  # remaining n - 2 observations fall between them
  min_p <- dlnorm(val[["min"]], meanlog = a, sdlog = b)
  max_p <- dlnorm(val[["max"]], meanlog = a, sdlog = b)
  range_p <- (plnorm(val[["max"]], meanlog = a, sdlog = b) -
    plnorm(val[["min"]], meanlog = a, sdlog = b))^(val[["n"]] - 2)

  # Negative log-likelihood of the range
  range_sr <- -log(min_p * max_p * range_p)

  # Total value to be minimised
  range_sr + median_sr
}

This seems to be able to recover the correct expected median and range for a given sample size in bootstrap simulations from the estimated distribution. But there may be a more elegant way of defining the function to be minimised.
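
A bootstrap check of this kind could look like the sketch below. The parameter values and sample size are assumptions chosen for illustration, standing in for whatever the optimiser returns; the simulated expected median, min and max would then be compared against the reported values.

```r
set.seed(1)

# Assumed "estimated" parameters and sample size, for illustration only
meanlog_hat <- 2.5
sdlog_hat <- 0.6
n <- 200

# Simulate many samples of size n and record median, min and max of each
sims <- replicate(1000, {
  x <- rlnorm(n, meanlog = meanlog_hat, sdlog = sdlog_hat)
  c(median = median(x), min = min(x), max = max(x))
})

# Expected summary statistics under the estimated distribution,
# to compare against the reported median and range
expected <- rowMeans(sims)
expected
```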

sbfnk · 2022-07-22T07:17:52Z

I think it's a good idea to implement extraction of distributional parameters from a wide range of summary statistics (e.g. mean, variance, CIs etc.). I'm not sure I follow what's going on here though: if f(x) is the density of the distribution of the parameter, with CDF F(x), this seems to define a loss function for given median, min, max and n as

(F(median) - 0.5)^2 - log(f(min) f(max) (F(max) - F(min))^(n-2))

Is that right?

adamkucharski · 2022-07-22T08:09:47Z

There are two parts to the loss function above. One minimises the least square residual for the median, i.e. (F(median) - 0.5)^2 and the other is based on the probability of observing a specific min and max in n observations. The idea is we can define this as:

P(observe min) P(observe max) P(observe n-2 values between min and max) = f(min) f(max) (F(max) - F(min))^(n-2)

The function above minimises the negative log-likelihood of this, added to the squared residual for the median. This is quite a rough approach, as it implicitly weights the two parts, and some simulation recoveries suggest the choice can make a slight difference, e.g. whether the log-likelihood or just the likelihood is used.
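
To make the implicit weighting concrete, the sketch below evaluates the two parts separately for one set of inputs. All values and function names are hypothetical, chosen only for illustration. The median term is bounded above by 0.25, whereas the range term grows with n, so the range term tends to dominate. The range term is also accumulated on the log scale here, which is an assumption beyond the original function but avoids underflow for large n.

```r
# Hypothetical inputs for illustration
param <- c(a = 2.5, b = 0.6)
val <- c(median = 12, min = 3, max = 60, n = 200)

# Squared residual for the median: always between 0 and 0.25
median_sr <- function(param, val) {
  (plnorm(val[["median"]], meanlog = param[["a"]], sdlog = param[["b"]]) - 0.5)^2
}

# Negative log-likelihood of the range, accumulated on the log scale
range_nll <- function(param, val) {
  a <- param[["a"]]
  b <- param[["b"]]
  log_f_min <- dlnorm(val[["min"]], meanlog = a, sdlog = b, log = TRUE)
  log_f_max <- dlnorm(val[["max"]], meanlog = a, sdlog = b, log = TRUE)
  p_between <- plnorm(val[["max"]], meanlog = a, sdlog = b) -
    plnorm(val[["min"]], meanlog = a, sdlog = b)
  -(log_f_min + log_f_max + (val[["n"]] - 2) * log(p_between))
}

# Same quantity, multiplying the terms before taking the log
range_nll_naive <- function(param, val) {
  a <- param[["a"]]
  b <- param[["b"]]
  p_between <- plnorm(val[["max"]], meanlog = a, sdlog = b) -
    plnorm(val[["min"]], meanlog = a, sdlog = b)
  -log(dlnorm(val[["min"]], meanlog = a, sdlog = b) *
    dlnorm(val[["max"]], meanlog = a, sdlog = b) *
    p_between^(val[["n"]] - 2))
}

# Relative size of the two parts of the loss
c(median_term = median_sr(param, val), range_term = range_nll(param, val))

# For a very large sample the naive product underflows to zero,
# while the log-scale version stays finite
val_large <- replace(val, "n", 1e6)
range_nll_naive(param, val_large)
range_nll(param, val_large)
```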

adamkucharski · 2023-03-09T14:10:11Z

The range_sr equation above is derived from equation 1 in Gumbel, 1947. This rough version drops the n multipliers, as they don't affect the maximum-likelihood estimate. From what I recall, I also had some issues recovering the correct median when it is known, but including it means defining some kind of weighting between the likelihood for the min/max and the likelihood/optimisation for the median.
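
For reference, the standard joint density of the sample minimum and maximum of n independent draws, which is presumably the form the n multipliers come from, can be written with x_(1) for the minimum and x_(n) for the maximum (notation introduced here, not in the thread):

```latex
g\left(x_{(1)}, x_{(n)}\right)
  = n(n-1)\, f\left(x_{(1)}\right) f\left(x_{(n)}\right)
    \left[F\left(x_{(n)}\right) - F\left(x_{(1)}\right)\right]^{n-2},
  \qquad x_{(1)} < x_{(n)}

\log g
  = \log\bigl(n(n-1)\bigr)
  + \log f\left(x_{(1)}\right) + \log f\left(x_{(n)}\right)
  + (n-2)\log\left[F\left(x_{(n)}\right) - F\left(x_{(1)}\right)\right]
```

The term log(n(n-1)) does not depend on the distribution parameters, so omitting it leaves the location of the maximum unchanged.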

prabasaj · 2024-05-16T09:07:52Z

Two suggested updates to extract_param():

Offer option to extract variance-covariance matrix of the extracted distribution parameters (through the Hessian option for optim...).
For type = "range", a median is not necessary (although median adds precision) -- distribution parameters may be estimated from min and max through the log-likelihood function log(f(min) f(max) (F(max) - F(min))^(n-2)) [part of the function suggested by sbfnk].
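
Both suggestions can be sketched together in base R. This is an illustration under stated assumptions rather than a proposed extract_param change: the reported values, the function name range_nll, the starting values and the log-parametrisation of sdlog are all hypothetical. It uses the hessian = TRUE argument of optim() and inverts the returned Hessian to approximate the variance-covariance matrix.

```r
# Hypothetical reported summary: min, max and sample size only
val <- c(min = 3, max = 60, n = 200)

# Negative log-likelihood of the range for a lognormal.
# par = c(meanlog, log(sdlog)); the log keeps sdlog positive.
range_nll <- function(par, val) {
  meanlog <- par[1]
  sdlog <- exp(par[2])
  log_f_min <- dlnorm(val[["min"]], meanlog = meanlog, sdlog = sdlog, log = TRUE)
  log_f_max <- dlnorm(val[["max"]], meanlog = meanlog, sdlog = sdlog, log = TRUE)
  p_between <- plnorm(val[["max"]], meanlog = meanlog, sdlog = sdlog) -
    plnorm(val[["min"]], meanlog = meanlog, sdlog = sdlog)
  -(log_f_min + log_f_max + (val[["n"]] - 2) * log(p_between))
}

# Assumed starting values based on the log of the reported range
log_min <- log(val[["min"]])
log_max <- log(val[["max"]])
start <- c((log_min + log_max) / 2, log((log_max - log_min) / 4))

fit <- optim(par = start, fn = range_nll, val = val, hessian = TRUE)

# The inverse Hessian of the negative log-likelihood approximates the
# variance-covariance matrix on the (meanlog, log(sdlog)) scale
vcov_mat <- solve(fit$hessian)
estimates <- c(meanlog = fit$par[[1]], sdlog = exp(fit$par[[2]]))
std_errors <- sqrt(diag(vcov_mat))
```

Note that the standard errors here are on the optimisation scale, so the second one refers to log(sdlog) rather than sdlog itself.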

kellymccain28 · 2024-05-16T09:28:33Z

Another suggested update to extract_param():

For type = 'percentiles', it would be useful to be able to include the mean (e.g. in an example from the Ebola epireview database, there is a record for infectious period that has a mean and 95% credible intervals, and as it is now, there is only a way to use the percentile information, not the mean, to estimate parameters). I'm not sure of the maths required for this, though.
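
One possible approach, sketched here for the lognormal only, is to add a residual for the mean to the least-squares loss, using the fact that a lognormal has mean exp(meanlog + sdlog^2 / 2). This is not an existing extract_param feature: the loss function, the equal weighting of the two parts and the input values (generated from a known lognormal so that recovery can be checked) are all assumptions.

```r
# Illustrative inputs generated from a known lognormal
true_meanlog <- 2
true_sdlog <- 0.5
probs <- c(0.025, 0.975)
reported_quantiles <- qlnorm(probs, meanlog = true_meanlog, sdlog = true_sdlog)
reported_mean <- exp(true_meanlog + true_sdlog^2 / 2) # lognormal mean

# Hypothetical combined loss: CDF residuals at the percentiles plus the
# squared relative error in the mean. Equal weighting is an assumption.
loss <- function(par) {
  meanlog <- par[1]
  sdlog <- exp(par[2]) # optimise log(sdlog) to keep sdlog positive
  cdf_resid <- plnorm(reported_quantiles, meanlog = meanlog, sdlog = sdlog) - probs
  mean_resid <- (exp(meanlog + sdlog^2 / 2) - reported_mean) / reported_mean
  sum(cdf_resid^2) + mean_resid^2
}

# Assumed starting values: log of the reported mean, and sdlog = 1
fit <- optim(par = c(log(reported_mean), 0), fn = loss)
est <- c(meanlog = fit$par[[1]], sdlog = exp(fit$par[[2]]))
est
```

With real reported values the mean and percentiles will generally not be exactly consistent with a single lognormal, so the relative weighting of the two parts would matter, as with the median and range above.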

adamkucharski · 2024-07-11T13:20:59Z

Posting example implementation in current {epiparameter} version for reference:

sample_out <- rlnorm(200,meanlog=2.5,sdlog=0.6)
hist(sample_out)

extract_param(
  type = "range",
  values = c(median(sample_out), round(min(sample_out)), round(max(sample_out))),
  distribution = "lnorm",
  samples = length(sample_out),
  control = list(max_iter = 100)
)

joshwlambert mentioned this issue Apr 27, 2023

Adds vignette on parameter extraction bias #142

Merged

jlessler mentioned this issue May 16, 2024

Getting problems when trying to do a basic print of epidist. #301

Closed

joshwlambert mentioned this issue Jul 9, 2024

Full package review for v0.2.0 #341

Merged
