Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying tolerance at to dist_spec upon definition #724

Merged
merged 33 commits into from
Aug 1, 2024
Merged

Conversation

sbfnk
Copy link
Contributor

@sbfnk sbfnk commented Jul 17, 2024

Description

This PR closes #533.

Doing so and paving the way for #525 required a few changes to the infrastructure of how distributions are handled. This PR:

  • introduces a multi_dist_spec class representing distributions that are a convolution of multiple distributions
  • adds accessor functions get_parameters() and get_distribution() for extracting these from single or combined distributions, and ndist() for getting the number of distributions that make up a combined distribution
  • makes max and tolerance attributes, with functionality to specify the tolerance either at the level of the convolved or constituent distribution(s)
  • renames apply_tolerance() to bound_dist() as it now also applies the max
  • updates the plotting function to work with continuous or discrete distributions
  • updates the design document to reflect these changes

Initial submission checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have tested my changes locally (using devtools::test() and devtools::check()).
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required and rebuilt docs if yes (using devtools::document()).
  • I have followed the established coding standards (and checked using lintr::lint_package()).
  • I have added a news item linked to this PR.

After the initial Pull Request

  • I have reviewed Checks for this PR and addressed any issues as far as I am able.

@sbfnk sbfnk changed the title Tolerance Allow specifying tolerance at to dist_spec upon definition Jul 17, 2024
@sbfnk sbfnk requested a review from jamesmbaazam July 25, 2024 09:11
R/dist_spec.R Outdated Show resolved Hide resolved
@jamesmbaazam
Copy link
Contributor

jamesmbaazam commented Jul 26, 2024

I expect dist_spec(), which calls bound_dist() internally, to always set the max and tolerance attributes but it only does if specified. Is this expected behaviour?

> attributes(LogNormal(1,2))
$names
[1] "parameters"   "distribution"

$class
[1] "dist_spec" "list"     

> attributes(LogNormal(1,2, max= 1, tolerance = 0.1))
$names
[1] "parameters"   "distribution"

$class
[1] "dist_spec" "list"     

$max
[1] 1

$tolerance
[1] 0.1

@jamesmbaazam
Copy link
Contributor

I expect dist_spec(), which calls bound_dist() internally, to always set the max and tolerance attributes but it only does if specified. Is this expected behaviour?

> attributes(LogNormal(1,2))
$names
[1] "parameters"   "distribution"

$class
[1] "dist_spec" "list"     

> attributes(LogNormal(1,2, max= 1, tolerance = 0.1))
$names
[1] "parameters"   "distribution"

$class
[1] "dist_spec" "list"     

$max
[1] 1

$tolerance
[1] 0.1

Upon further digging, I see it rather only sets them when they're finite. I think for consistency, it would probably be better to always set them and rather not use them downstream if max = Inf or tolerance = 0.

R/dist_spec.R Outdated Show resolved Hide resolved
@sbfnk
Copy link
Contributor Author

sbfnk commented Jul 31, 2024

Upon further digging, I see it rather only sets them when they're finite. I think for consistency, it would probably be better to always set them and rather not use them downstream if max = Inf or tolerance = 0.

I'm in two minds about this. I see your point but setting them is a way to signal that the user has not made a choice here which then allows e.g. the delay option functions to make an informed default choice when distributions are used for a specific purpose.

With this in mind, does the current set up make more sense, or do you think we should try to work around a set up where these attributes are always set?

@sbfnk sbfnk requested a review from jamesmbaazam July 31, 2024 12:36
@jamesmbaazam
Copy link
Contributor

jamesmbaazam commented Jul 31, 2024

Upon further digging, I see it rather only sets them when they're finite. I think for consistency, it would probably be better to always set them and rather not use them downstream if max = Inf or tolerance = 0.

I'm in two minds about this. I see your point but setting them is a way to signal that the user has not made a choice here which then allows e.g. the delay option functions to make an informed default choice when distributions are used for a specific purpose.

Mmmh I think we may be thinking of the word "setting" differently here. I thought the attributes are rather set when the user makes a choice? i.e., In the case where a user sets a finite max, they have made a choice?

With this in mind, does the current set up make more sense, or do you think we should try to work around a set up where these attributes are always set?

I will trust your judgement on this.

jamesmbaazam
jamesmbaazam previously approved these changes Jul 31, 2024
@sbfnk
Copy link
Contributor Author

sbfnk commented Jul 31, 2024

Mmmh I think we may be thinking of the word "setting" differently here. I thought the attributes are rather set when the user makes a choice? i.e., In the case where a user sets a finite max, they have made a choice?

I agree. My point was that someone who specifies Gamma(2, 2) has not set a tolerance (and thus may be happy for the package to make an informed choice where we need distributions to be constrained) whereas someone who does Gamma(2, 2, tolerance = 0) has.

@jamesmbaazam
Copy link
Contributor

Mmmh I think we may be thinking of the word "setting" differently here. I thought the attributes are rather set when the user makes a choice? i.e., In the case where a user sets a finite max, they have made a choice?

I agree. My point was that someone who specifies Gamma(2, 2) has not set a tolerance (and thus may be happy for the package to make an informed choice where we need distributions to be constrained) whereas someone who does Gamma(2, 2, tolerance = 0) has.

Oh, I see. That makes sense.

@sbfnk
Copy link
Contributor Author

sbfnk commented Jul 31, 2024

I've updated this to be clearer, i.e. now you get

> gt_opts(Gamma(shape = 2, rate = 2))                                                          
Unconstrained distributon passed as a delay. Constraining with default tolerance 0.001
- gamma distribution (tolerance: 0.001):
  shape:
    2
  rate:
    2

@sbfnk sbfnk requested a review from jamesmbaazam July 31, 2024 19:07
Copy link
Contributor

@jamesmbaazam jamesmbaazam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@sbfnk sbfnk added this pull request to the merge queue Aug 1, 2024
Merged via the queue into main with commit ee4043d Aug 1, 2024
9 checks passed
@sbfnk sbfnk deleted the tolerance branch August 1, 2024 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

specify tolerance instead of max for distributions
2 participants