Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function to estimate modes in marginalizations #128

Merged
merged 11 commits into from
Jun 9, 2020

Conversation

VasylHafych
Copy link
Contributor

The function returns local mode given posterior samples. The optimal number of bins can be determined using this implementation from Plots.jl.

@codecov-commenter
Copy link

codecov-commenter commented May 28, 2020

Codecov Report

Merging #128 into master will decrease coverage by 0.26%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
- Coverage   30.21%   29.95%   -0.27%     
==========================================
  Files          69       69              
  Lines        3667     3699      +32     
==========================================
  Hits         1108     1108              
- Misses       2559     2591      +32     
Impacted Files Coverage Δ
src/optimization/bat_findmode.jl 42.55% <0.00%> (-21.97%) ⬇️
src/parameters/density_sample.jl 33.76% <0.00%> (-3.92%) ⬇️
src/plotting/recipes_samples_overview.jl 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5796616...f09e193. Read the comment docs.

@VasylHafych
Copy link
Contributor Author

Hi @oschulz! I have extended one of the plots from the tutorial with a new plotting recipe. It is very easy to evaluate it and it makes the plot "Data, True Model and Best Fit" better.

@VasylHafych VasylHafych marked this pull request as ready for review May 28, 2020 21:46
@oschulz
Copy link
Member

oschulz commented May 29, 2020

We should change the plotting code to use the new bat_findmode, so that we don't duplicate that code.

@Cornelius-G
Copy link
Collaborator

@oschulz But it seems the new function just calls the old find_localmodes from plotting. So no duplication

@VasylHafych
Copy link
Contributor Author

Indeed, there is no code duplication. I call old find_localmodes function from plotting recipes.

@oschulz
Copy link
Member

oschulz commented May 29, 2020

Oh, yes - then can should move that code into the new exported function and remove the old one, right?

@oschulz
Copy link
Member

oschulz commented May 29, 2020

Within the scope of this PR, let's replace the BATHistogram functions by a new, exported and generic bat_marginalize() function that follows the scheme of the other bat_... functions (return a (result=..., )).

We can rename the type BATHistogram to Marginalization - that type should not be exported, for now. Next to the histogram that BATHistogram currently contains, it should also contain a reference to the parameter(s), so that the plotting recipe for it doesn't need the separate idx argument anymore.

@oschulz
Copy link
Member

oschulz commented May 29, 2020

Which reminds me: We shouldn't use the term "local mode" anywhere in this context any longer (also in the plots). A local mode is a "bump" in a multi-modal distribution, lower than the global mode. But here we're talking about global modes in the marginalizations of the posterior.

@Cornelius-G
Copy link
Collaborator

Which reminds me: We shouldn't use the term "local mode" anywhere in this context any longer (also in the plots). A local mode is a "bump" in a multi-modal distribution, lower than the global mode. But here we're talking about global modes in the marginalizations of the posterior.

Ok, so maybe call it "marginalized mode(s)" ?

@oschulz
Copy link
Member

oschulz commented May 29, 2020

Yes, or "marginal modes" or so - I sent an email.

src/plotting/recipes_samples_overview.jl Outdated Show resolved Hide resolved

@series begin
ribbon --> (y_ribbons[:,5],y_ribbons[:,6])
fillcolor --> colors[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the colors are inverted compared to our usual convention. So the innermost ribbon should be green, the outermost red.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! By the way, I was always quite confused by this choice of colors. For me, it would be much more natural to use intense and bright red in a center (demonstrating a high density) and light green on the periphery (showing low density). For example, look at this Python Seaborn package.

src/plotting/recipes_samples_overview.jl Outdated Show resolved Hide resolved
src/plotting/recipes_samples_overview.jl Outdated Show resolved Hide resolved
src/plotting/recipes_samples_overview.jl Outdated Show resolved Hide resolved
src/plotting/recipes_samples_overview.jl Outdated Show resolved Hide resolved
@Cornelius-G
Copy link
Collaborator

@VasylHafych I just used the "review" feature to give some comments on the plot recipe. I hope you can work with that. I used this feature for the first time, so just ask me if there is something unclear. The comments are just some possible improvements. But generally this is quite nice work!

@Cornelius-G
Copy link
Collaborator

Cornelius-G commented May 29, 2020

Within the scope of this PR, let's replace the BATHistogram functions by a new, exported and generic bat_marginalize() function that follows the scheme of the other bat_... functions (return a (result=..., )).

We can rename the type BATHistogram to Marginalization - that type should not be exported, for now. Next to the histogram that BATHistogram currently contains, it should also contain a reference to the parameter(s), so that the plotting recipe for it doesn't need the separate idx argument anymore.

@oschulz Ok, this seems generally like a good idea, but I have some questions on how to realize this.

  1. What should bat_marginalize() return? A Marginalization? As a user I would rather expect this to give me the samples of the specified dimensions.
  2. The idea of BATHistogram was also to provide the BAT plotting recipes for other histograms, that have not been created from samples. So basically every StatsBase.Histogram could be converted to BATHistogram to provide plots like smallest_intervals etc. to make it easier for users to plot their data in the same style. When we rename this to Marginalization this does not make so much sense.

@oschulz
Copy link
Member

oschulz commented May 29, 2020

What should bat_marginalize() return? A Marginalization? As a user I would rather expect this to give me the samples of the specified dimensions.

Good point. I believe it should return a marginal distribution. Let's replace BATHistogram by a new type MarginalDist, which would contain a Distribution (single- or multivariate) and a reference to the parameter(s) that remain after marginalization. The Distribution would then usually be a UvBinnedDist or MvBinnedDist from "EmpiricalDistributions.jl", which wrap a histogram in term.

This way, things would be nice an clean: The result of a marginalization could immediate be reused as a prior, and at the same time all necessary information would be available to plotting recipes, without type piracy.

In the future, we can then add non-histogrammed, sample-based Distributions to "EmpiricalDistributions.jl", and offer different marginalization algorithms to allow the user to select which kind of marginalization they want.

The bat_marginalmode function (or whatever we'll call it) can then use this machinery internally, and take the marginialization algorithm as an optional argument (like the other bat_... functions). The rest will be automatic, since the empirical distribution will already implement a mode function (we have to add this to UvBinnedDist and MvBinnedDist).

@oschulz oschulz changed the title Local Mode Function Function to estimate modes in marginalizations May 29, 2020
@oschulz
Copy link
Member

oschulz commented May 29, 2020

Let's go with the term "marginal mode".

@VasylHafych
Copy link
Contributor Author

Hi @Cornelius-G. Thank you for your helpful suggestions. I have included an update function in my latest commit.

@oschulz, considering that your suggestions with renamings require more work to be done, I propose to move my plotting recipe to a separate pull request to decouple these two topics.

@oschulz
Copy link
Member

oschulz commented May 30, 2020

I propose to move my plotting recipe to a separate pull request to decouple these two topics.

Sure!

@VasylHafych
Copy link
Contributor Author

VasylHafych commented May 31, 2020

@oschulz, a new pull request is ready (#129).

@Cornelius-G, if you could rename BATHistogram from your Plotting recipes by a new type MarginalDist as Oliver suggested, I can then use your updates and add two functions: bat_marginalmode, bat_median.

One more question — should we rename these functions in BAT 2.0 to have name consistency? They all are acting on posterior and returning a DensitySampleVector. For example, we can call them:

  • bat_globalmode
  • bat_marginalmode
  • bat_median
  • bat_mean.

@oschulz
Copy link
Member

oschulz commented May 31, 2020

@oschulz, a new pull request is ready

Thanks! I just posted some comments on #129 .

@oschulz
Copy link
Member

oschulz commented May 31, 2020

if you could rename BATHistogram from your Plotting recipes by a new type MarginalDist

Yes - it's not just a rename, though.

@Cornelius-G
Copy link
Collaborator

Let's replace BATHistogram by a new type MarginalDist, which would contain a Distribution (single- or multivariate) and a reference to the parameter(s) that remain after marginalization. The Distribution would then usually be a UvBinnedDist or MvBinnedDist from "EmpiricalDistributions.jl", which wrap a histogram in term.

Ok, I guess this will be on my agenda then.
@oschulz could you maybe give me an example of how exactly you want this MarginalDist to look like. Especially the parameter reference? I would then try to change the plotting accordingly.

@oschulz
Copy link
Member

oschulz commented Jun 2, 2020

I guess it would be something like

struct MarginalDist{N,D<:Distribution,VS<:AbstractValueShape}
    dims::NTuple{N,Int}
    dist::D
    origvalshape::VS
end

with dims referencing the dimensions (in unshaped par-space) left after marginalization. That should be enough information to compute parameter names, etc.

@VasylHafych
Copy link
Contributor Author

VasylHafych commented Jun 2, 2020

Thanks! I just posted some comments on #129 .

@oschulz Cannot see any of your comments on #129

@oschulz
Copy link
Member

oschulz commented Jun 2, 2020

Cannot see any of your comments on #129

Search the page for "oschulz started a review"

@VasylHafych
Copy link
Contributor Author

VasylHafych commented Jun 2, 2020

oschulz started a review

0 results are matched (Reviewers: No reviews). I guess it is not visible for me or something.

@oschulz
Copy link
Member

oschulz commented Jun 2, 2020

Oh, sorry, I forgot to click "submit review" :-)

@VasylHafych
Copy link
Contributor Author

I was playing around with KernelDensity.jl and realized that we can use it to construct KDE of our marginal distributions and then use an optimizer to find marginal modes. In this case, we should be able to find marginal mode with better precision than just doing binning. What do you think about that?

@oschulz
Copy link
Member

oschulz commented Jun 5, 2020

Yes, I've been thinking about KDE, too - that should go into the EmpiricalDistributions package though. We could add KDE-based empirical distributions, in addition to the bin-based ones. That would be very valuable! Especially because they would be differentiable, so suitable to use as priors with HMC and so on - bin-based step-function priors wouldn't do so well there.

@VasylHafych
Copy link
Contributor Author

@oschulz, Yes, it sounds like an interesting task. I have already played a bit with Julia KDE, so I can volunteer to implement EmpiricalDistributions extension. But I cannot say exactly when - maybe a little bit later.

@oschulz
Copy link
Member

oschulz commented Jun 6, 2020

But I cannot say exactly when - maybe a little bit later.

Sure, there's no rush - the nice thing is, if we do things properly in BAT, something like that can be added to EmpiricalDistributions later on and will then "just work".

@oschulz
Copy link
Member

oschulz commented Jun 8, 2020

I just merged the other PR by @Cornelius-G - could you adept this PR to the changes, @VasylHafych ?

@VasylHafych
Copy link
Contributor Author

VasylHafych commented Jun 9, 2020

Hi @oschulz. Here is an updated version. We now have bat_findmode, bat_findlocalmode, and bat_findmedian. The last two functions operate on DensitySampleVector.

Do you think we need these functions to operate on a prior, too?

@oschulz
Copy link
Member

oschulz commented Jun 9, 2020

We now have bat_findmode, bat_findlocalmode, and bat_findmedian

I think we decided to call this marginal mode, not local mode, right? Or is bat_findlocalmode really about actual local modes in the non-marginalized samples?

In addition to bat_findmedian, should we also define Statistics.median(samples::DensitySampleVector) as well (we already have Statistics.mean and Statistics.var), since it's basically straightforward and doesn't require a possible choice of algorithms? bat_findmedian can then just be a wrapper around it.

We should probably define Statistics.mode(samples::DensitySampleVector) as well, and have the trivial case of bat_findmode use it.

@VasylHafych
Copy link
Contributor Author

I think we decided to call this marginal mode, not local mode, right?

sorry, this name remained from the old implementation. The function is now called bat_marginalmode. Also, I added Statistics.median (see last commit).

Ready to merge.

@oschulz
Copy link
Member

oschulz commented Jun 9, 2020

Sorry, we a little change in the output of bat_marginalize that you'll have to adapt to, first: #133

@oschulz
Copy link
Member

oschulz commented Jun 9, 2020

Thanks, will merge as soon as tests are through.

@oschulz oschulz merged commit d757176 into bat:master Jun 9, 2020
@oschulz
Copy link
Member

oschulz commented Jun 9, 2020

Thanks again!

@VasylHafych VasylHafych deleted the add_functions branch June 9, 2020 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants