Function to estimate modes in marginalizations #128

VasylHafych · 2020-05-28T14:59:21Z

The function returns the local mode given posterior samples. The optimal number of bins can be determined using the implementation from Plots.jl.
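Here is a minimal sketch of the idea. It assumes plain vectors of one-dimensional samples, optionally weighted, rather than BAT's own sample types. The bin count comes from the Freedman-Diaconis rule, which is an assumed stand-in for the Plots.jl heuristic mentioned above. `fd_nbins` and `histogram_mode` are hypothetical names, not BAT functions.

```julia
using Statistics

# Hypothetical helper: number of bins from the Freedman-Diaconis rule,
# bin width = 2 * IQR / n^(1/3). Assumed stand-in for the Plots.jl heuristic.
function fd_nbins(x::AbstractVector{<:Real})
    q25, q75 = quantile(x, [0.25, 0.75])
    width = 2 * (q75 - q25) / cbrt(length(x))
    lo, hi = extrema(x)
    # Degenerate data (zero spread): fall back to a single bin
    (width > 0 && hi > lo) || return 1
    return max(1, ceil(Int, (hi - lo) / width))
end

# Hypothetical helper: estimate the mode of 1D samples as the centre of the
# bin with the largest total weight. Unit weights are used if none are given.
function histogram_mode(x::AbstractVector{<:Real},
                        w::AbstractVector{<:Real} = ones(length(x));
                        nbins::Int = fd_nbins(x))
    lo, hi = extrema(x)
    hi == lo && return float(lo)
    edges = range(lo, hi; length = nbins + 1)
    counts = zeros(nbins)
    for (xi, wi) in zip(x, w)
        # Map xi to its bin; the right-most edge belongs to the last bin
        i = min(nbins, 1 + floor(Int, (xi - lo) / (hi - lo) * nbins))
        counts[i] += wi
    end
    k = argmax(counts)
    return (edges[k] + edges[k + 1]) / 2
end
```

Applied to the samples of a single parameter, `histogram_mode` gives an estimate of that parameter's marginal mode. Its precision is limited by the bin width.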

codecov-commenter · 2020-05-28T15:07:39Z

Codecov Report

Merging #128 into master will decrease coverage by 0.26%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
- Coverage   30.21%   29.95%   -0.27%     
==========================================
  Files          69       69              
  Lines        3667     3699      +32     
==========================================
  Hits         1108     1108              
- Misses       2559     2591      +32

Impacted Files	Coverage Δ
src/optimization/bat_findmode.jl	`42.55% <0.00%> (-21.97%)`	⬇️
src/parameters/density_sample.jl	`33.76% <0.00%> (-3.92%)`	⬇️
src/plotting/recipes_samples_overview.jl	`0.00% <0.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 5796616...f09e193.

src/plotting/recipes_samples_overview.jl

VasylHafych · 2020-05-28T21:35:23Z

Hi @oschulz! I have extended one of the plots from the tutorial with a new plotting recipe. It is cheap to evaluate, and it improves the "Data, True Model and Best Fit" plot.

oschulz · 2020-05-29T07:58:39Z

We should change the plotting code to use the new bat_findmode, so that we don't duplicate that code.

Cornelius-G · 2020-05-29T08:04:13Z

@oschulz But it seems the new function just calls the old find_localmodes from the plotting code. So there is no duplication.

VasylHafych · 2020-05-29T08:06:18Z

Indeed, there is no code duplication. I call the old find_localmodes function from the plotting recipes.

oschulz · 2020-05-29T08:34:11Z

Oh, yes - then we should move that code into the new exported function and remove the old one, right?

oschulz · 2020-05-29T08:52:33Z

Within the scope of this PR, let's replace the BATHistogram functions by a new, exported and generic bat_marginalize() function that follows the scheme of the other bat_... functions (return a (result=..., )).

We can rename the type BATHistogram to Marginalization - that type should not be exported, for now. Next to the histogram that BATHistogram currently contains, it should also contain a reference to the parameter(s), so that the plotting recipe for it doesn't need the separate idx argument anymore.
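Here is a minimal sketch of the proposed shape. It assumes the samples are stored as an n_samples × n_params matrix with a separate weight vector. `MarginalizationSketch` and `marginalize_sketch` are hypothetical names used only for illustration. The result type keeps the parameter dimension next to the histogram, and the function returns a `(result = ...,)` NamedTuple like the other bat_... functions.

```julia
# Hypothetical stand-in for the proposed Marginalization type: it stores the
# remaining parameter dimension(s) next to the histogram, so a plotting recipe
# would not need a separate idx argument.
struct MarginalizationSketch{N}
    dims::NTuple{N,Int}
    edges::Vector{Float64}
    weights::Vector{Float64}
end

# Hypothetical bat_marginalize-style function for a single parameter.
# `samples` is an n_samples × n_params matrix, `w` holds the sample weights.
function marginalize_sketch(samples::AbstractMatrix{<:Real},
                            w::AbstractVector{<:Real}, dim::Int; nbins::Int = 20)
    x = view(samples, :, dim)
    lo, hi = extrema(x)
    edges = collect(Float64, range(lo, hi; length = nbins + 1))
    counts = zeros(nbins)
    for (xi, wi) in zip(x, w)
        # All samples go into bin 1 if the parameter never varies
        i = hi == lo ? 1 : min(nbins, 1 + floor(Int, (xi - lo) / (hi - lo) * nbins))
        counts[i] += wi
    end
    # Follow the (result = ...,) convention of the other bat_... functions
    return (result = MarginalizationSketch((dim,), edges, counts),)
end
```

Because `dims` is stored in the result, a plotting recipe or a later mode estimate can read the parameter index from the object itself.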

oschulz · 2020-05-29T09:12:10Z

Which reminds me: We shouldn't use the term "local mode" anywhere in this context any longer (also in the plots). A local mode is a "bump" in a multi-modal distribution, lower than the global mode. But here we're talking about global modes in the marginalizations of the posterior.
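The following small worked example illustrates the distinction. It uses made-up probabilities on a 3 × 2 grid, not BAT output. The global mode of the joint distribution and the global modes of the two marginals need not coincide, and neither of them is a "local mode" in the sense above.

```julia
# Illustrative joint probabilities P[x, y] on a 3 × 2 grid (made-up numbers, sum = 1)
P = [0.00 0.32;
     0.28 0.00;
     0.25 0.15]

# Global mode of the joint distribution
joint_mode = Tuple(argmax(P))

# Marginals: sum out y (dims = 2) or x (dims = 1)
px = vec(sum(P; dims = 2))   # approximately [0.32, 0.28, 0.40]
py = vec(sum(P; dims = 1))   # approximately [0.53, 0.47]

# Global modes of each marginal ("marginal modes")
marginal_modes = (argmax(px), argmax(py))

println("joint mode: ", joint_mode, ", marginal modes: ", marginal_modes)
```

Here the joint mode is at (1, 2), while the marginal modes are x = 3 and y = 1. The grid cell (3, 1) is not even the joint maximum.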

Cornelius-G · 2020-05-29T09:18:32Z

> Which reminds me: We shouldn't use the term "local mode" anywhere in this context any longer (also in the plots). A local mode is a "bump" in a multi-modal distribution, lower than the global mode. But here we're talking about global modes in the marginalizations of the posterior.

OK, so maybe call it "marginalized mode(s)"?

oschulz · 2020-05-29T09:30:46Z

Yes, or "marginal modes" or so - I sent an email.

src/plotting/recipes_samples_overview.jl

Cornelius-G · 2020-05-29T09:20:43Z


+
+    @series begin
+        ribbon --> (y_ribbons[:,5],y_ribbons[:,6])
+        fillcolor --> colors[1]


It seems the colors are inverted compared to our usual convention. So the innermost ribbon should be green, the outermost red.

Fixed! By the way, I have always been quite confused by this choice of colors. To me, it would be much more natural to use intense, bright red in the center (indicating high density) and light green on the periphery (indicating low density). For example, look at the Python Seaborn package.

src/plotting/recipes_samples_overview.jl

Cornelius-G · 2020-05-29T09:36:11Z

@VasylHafych I just used the "review" feature to give some comments on the plot recipe. I hope you can work with that. I used this feature for the first time, so just ask me if there is something unclear. The comments are just some possible improvements. But generally this is quite nice work!

Cornelius-G · 2020-05-29T09:49:37Z

> Within the scope of this PR, let's replace the BATHistogram functions by a new, exported and generic bat_marginalize() function that follows the scheme of the other bat_... functions (return a (result=..., )).
>
> We can rename the type BATHistogram to Marginalization - that type should not be exported, for now. Next to the histogram that BATHistogram currently contains, it should also contain a reference to the parameter(s), so that the plotting recipe for it doesn't need the separate idx argument anymore.

@oschulz Ok, this seems generally like a good idea, but I have some questions on how to realize this.

What should bat_marginalize() return? A Marginalization? As a user I would rather expect this to give me the samples of the specified dimensions.
The idea of BATHistogram was also to provide the BAT plotting recipes for other histograms that have not been created from samples. So basically every StatsBase.Histogram could be converted to a BATHistogram to get plots like smallest_intervals etc., making it easier for users to plot their data in the same style. If we rename this to Marginalization, that no longer makes much sense.

oschulz · 2020-05-29T10:42:17Z

> What should bat_marginalize() return? A Marginalization? As a user I would rather expect this to give me the samples of the specified dimensions.

Good point. I believe it should return a marginal distribution. Let's replace BATHistogram by a new type MarginalDist, which would contain a Distribution (single- or multivariate) and a reference to the parameter(s) that remain after marginalization. The Distribution would then usually be a UvBinnedDist or MvBinnedDist from "EmpiricalDistributions.jl", which in turn wrap a histogram.

This way, things would be nice and clean: The result of a marginalization could immediately be reused as a prior, and at the same time all necessary information would be available to plotting recipes, without type piracy.

In the future, we can then add non-histogrammed, sample-based Distributions to "EmpiricalDistributions.jl", and offer different marginalization algorithms to allow the user to select which kind of marginalization they want.

The bat_marginalmode function (or whatever we'll call it) can then use this machinery internally, and take the marginalization algorithm as an optional argument (like the other bat_... functions). The rest will be automatic, since the empirical distribution will already implement a mode function (we have to add this to UvBinnedDist and MvBinnedDist).
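The following minimal sketch makes the "reuse as a prior" and "implement a mode function" points concrete with a 1D histogram-backed distribution. `BinnedSketch`, `binned_pdf`, `binned_mode` and `binned_rand` are hypothetical names. This is not the EmpiricalDistributions.jl API (UvBinnedDist), and the sketch only assumes sorted bin edges and non-negative bin weights. The mode is taken as the centre of the bin with the highest density (probability divided by bin width), so unequal bin widths are handled correctly.

```julia
using Random

# Hypothetical minimal 1D binned distribution; NOT the EmpiricalDistributions.jl API
struct BinnedSketch
    edges::Vector{Float64}   # nbins + 1 sorted bin edges
    probs::Vector{Float64}   # normalized bin probabilities
    function BinnedSketch(edges::AbstractVector{<:Real}, weights::AbstractVector{<:Real})
        length(edges) == length(weights) + 1 ||
            throw(ArgumentError("need exactly one more edge than bins"))
        new(collect(Float64, edges), weights ./ sum(weights))
    end
end

# Piecewise-constant density: bin probability divided by bin width
function binned_pdf(d::BinnedSketch, x::Real)
    i = searchsortedlast(d.edges, x)
    (1 <= i < length(d.edges)) || return 0.0   # outside [first edge, last edge)
    return d.probs[i] / (d.edges[i + 1] - d.edges[i])
end

# Mode: centre of the bin with the highest density (handles unequal bin widths)
function binned_mode(d::BinnedSketch)
    k = argmax(d.probs ./ diff(d.edges))
    return (d.edges[k] + d.edges[k + 1]) / 2
end

# Sampling, e.g. when the marginal is reused as a prior:
# pick a bin with probability probs[i], then draw uniformly inside it
function binned_rand(rng::AbstractRNG, d::BinnedSketch)
    i = searchsortedfirst(cumsum(d.probs), rand(rng))
    i = min(i, length(d.probs))   # guard against round-off in the cumulative sum
    return d.edges[i] + rand(rng) * (d.edges[i + 1] - d.edges[i])
end
```

Drawing uniformly inside the chosen bin is exactly what a piecewise-constant density implies. The step-shaped density is also why such priors are not differentiable.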

oschulz · 2020-05-29T13:53:48Z

Let's go with the term "marginal mode".

VasylHafych · 2020-05-30T00:10:35Z

Hi @Cornelius-G. Thank you for your helpful suggestions. I have included an updated function in my latest commit.

@oschulz, since the renamings you suggested require more work, I propose to move my plotting recipe to a separate pull request to decouple these two topics.

oschulz · 2020-05-30T07:26:12Z

> I propose to move my plotting recipe to a separate pull request to decouple these two topics.

Sure!

VasylHafych · 2020-05-31T12:31:08Z

@oschulz, a new pull request is ready (#129).

@Cornelius-G, if you could rename BATHistogram in your plotting recipes to a new type MarginalDist as Oliver suggested, I can then use your updates and add two functions: bat_marginalmode and bat_median.

One more question: should we rename these functions in BAT 2.0 for name consistency? They all act on the posterior and return a DensitySampleVector. For example, we could call them:

bat_globalmode
bat_marginalmode
bat_median
bat_mean.

oschulz · 2020-05-31T13:29:39Z

> @oschulz, a new pull request is ready

Thanks! I just posted some comments on #129.

oschulz · 2020-05-31T13:30:14Z

> if you could rename BATHistogram in your plotting recipes to a new type MarginalDist

Yes - it's not just a rename, though.

Cornelius-G · 2020-06-02T08:13:47Z

> Let's replace BATHistogram by a new type MarginalDist, which would contain a Distribution (single- or multivariate) and a reference to the parameter(s) that remain after marginalization. The Distribution would then usually be a UvBinnedDist or MvBinnedDist from "EmpiricalDistributions.jl", which in turn wrap a histogram.

Ok, I guess this will be on my agenda then.
@oschulz, could you maybe give me an example of what exactly you want this MarginalDist to look like, especially the parameter reference? I would then try to change the plotting accordingly.

oschulz · 2020-06-02T08:28:31Z

I guess it would be something like

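```julia
using Distributions: Distribution
using ValueShapes: AbstractValueShape

struct MarginalDist{N,D<:Distribution,VS<:AbstractValueShape}
    dims::NTuple{N,Int}    # parameter dimensions (in unshaped space) kept after marginalization
    dist::D                # the marginal distribution, e.g. a binned one
    origvalshape::VS       # value shape of the original, non-marginalized parameters
end
```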

with dims referencing the dimensions (in unshaped par-space) left after marginalization. That should be enough information to compute parameter names, etc.

VasylHafych · 2020-06-02T09:25:42Z

> Thanks! I just posted some comments on #129.

@oschulz I cannot see any of your comments on #129.

oschulz · 2020-06-02T10:23:35Z

> I cannot see any of your comments on #129

Search the page for "oschulz started a review"

VasylHafych · 2020-06-02T10:30:03Z

> oschulz started a review

The search finds 0 results (Reviewers: No reviews). I guess it is not visible to me.

oschulz · 2020-06-02T10:31:53Z

Oh, sorry, I forgot to click "submit review" :-)

VasylHafych · 2020-06-05T09:57:58Z

I was playing around with KernelDensity.jl and realized that we could use it to construct a KDE of our marginal distributions and then use an optimizer to find the marginal modes. That way, we should be able to find the marginal mode with better precision than with simple binning. What do you think?
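Here is a minimal sketch of the KDE route in one dimension. It assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth. Instead of a general-purpose optimizer, it uses mean-shift iteration, a standard fixed-point method for climbing a Gaussian KDE to one of its modes. `silverman_bw`, `kde_pdf` and `kde_mode` are hypothetical helpers, not KernelDensity.jl functions.

```julia
using Statistics

# Silverman's rule-of-thumb bandwidth for a Gaussian kernel (assumed choice)
silverman_bw(x) = 1.06 * std(x) * length(x)^(-1 / 5)

# Gaussian kernel density estimate evaluated at t
kde_pdf(x, h, t) = sum(exp(-((t - xi) / h)^2 / 2) for xi in x) / (length(x) * h * sqrt(2π))

# Mean-shift: for a Gaussian kernel, replacing t by the kernel-weighted mean of
# the samples does not decrease the KDE, so the iteration climbs to a mode.
function kde_mode(x::AbstractVector{<:Real}; h = silverman_bw(x), tol = 1e-8, maxiter = 1000)
    # Start at the sample point with the highest estimated density (O(n^2)),
    # which makes reaching the global mode likely, but not guaranteed
    t = float(x[argmax([kde_pdf(x, h, xi) for xi in x])])
    for _ in 1:maxiter
        k = [exp(-((t - xi) / h)^2 / 2) for xi in x]
        tnew = sum(k .* x) / sum(k)
        abs(tnew - t) < tol && return tnew
        t = tnew
    end
    return t
end
```

Unlike a bin-centre estimate, the result is not quantised to a bin grid, although it still depends on the chosen bandwidth.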

oschulz · 2020-06-05T13:30:07Z

Yes, I've been thinking about KDE, too - that should go into the EmpiricalDistributions package though. We could add KDE-based empirical distributions, in addition to the bin-based ones. That would be very valuable! Especially because they would be differentiable, so suitable to use as priors with HMC and so on - bin-based step-function priors wouldn't do so well there.

VasylHafych · 2020-06-06T09:51:07Z

@oschulz, yes, it sounds like an interesting task. I have already played a bit with Julia KDE, so I can volunteer to implement the EmpiricalDistributions extension. But I cannot say exactly when - maybe a little bit later.

oschulz · 2020-06-06T17:24:23Z

> But I cannot say exactly when - maybe a little bit later.

Sure, there's no rush - the nice thing is, if we do things properly in BAT, something like that can be added to EmpiricalDistributions later on and will then "just work".

oschulz · 2020-06-08T17:43:19Z

I just merged the other PR by @Cornelius-G - could you adapt this PR to the changes, @VasylHafych?


VasylHafych · 2020-06-09T08:56:52Z

Hi @oschulz. Here is an updated version. We now have bat_findmode, bat_findlocalmode, and bat_findmedian. The last two functions operate on DensitySampleVector.

Do you think we need these functions to operate on a prior, too?

oschulz · 2020-06-09T10:09:47Z

> We now have bat_findmode, bat_findlocalmode, and bat_findmedian

I think we decided to call this marginal mode, not local mode, right? Or is bat_findlocalmode really about actual local modes in the non-marginalized samples?

In addition to bat_findmedian, should we also define Statistics.median(samples::DensitySampleVector)? We already have Statistics.mean and Statistics.var, and it's basically straightforward and doesn't involve a choice of algorithms. bat_findmedian can then just be a wrapper around it.

We should probably define Statistics.mode(samples::DensitySampleVector) as well, and have the trivial case of bat_findmode use it.
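Here is a minimal sketch of both suggestions. It uses a simplified stand-in for DensitySampleVector that holds parameter vectors, log-density values and weights (`SampleSketch` and its field names are hypothetical). The median is a lower weighted median per parameter, without interpolation. The "trivial" mode is simply the stored sample with the highest log-density. The Statistics standard library does not define `mode` (StatsBase does), so the sketch uses a separate hypothetical `sample_mode` function for the mode.

```julia
using Statistics

# Hypothetical stand-in for a DensitySampleVector (field names are assumptions)
struct SampleSketch
    v::Vector{Vector{Float64}}   # parameter vector of each sample
    logd::Vector{Float64}        # log-density of each sample
    weight::Vector{Float64}      # sample weights (e.g. repetition counts)
end

# Lower weighted median: smallest value whose cumulative weight reaches
# half of the total weight (no interpolation between values)
function weighted_median(x::AbstractVector{<:Real}, w::AbstractVector{<:Real})
    p = sortperm(x)
    cw = cumsum(w[p])
    return x[p[searchsortedfirst(cw, cw[end] / 2)]]
end

# Per-parameter (marginal) medians, taking the sample weights into account
function Statistics.median(s::SampleSketch)
    npar = length(first(s.v))
    return [weighted_median([vi[j] for vi in s.v], s.weight) for j in 1:npar]
end

# "Trivial" mode estimate: the stored sample with the highest log-density
sample_mode(s::SampleSketch) = s.v[argmax(s.logd)]
```

Because the median method dispatches on the sample type, it needs no algorithm choice. A bat_findmedian-style wrapper could then simply forward to it.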


VasylHafych · 2020-06-09T13:16:57Z

> I think we decided to call this marginal mode, not local mode, right?

Sorry, this name was left over from the old implementation. The function is now called bat_marginalmode. I also added Statistics.median (see the last commit).

Ready to merge.

oschulz · 2020-06-09T13:26:05Z

Sorry, there is a little change in the output of bat_marginalize that you'll have to adapt to first: #133

oschulz · 2020-06-09T13:55:14Z

Thanks, will merge as soon as tests are through.

oschulz · 2020-06-09T14:33:14Z

Thanks again!

VasylHafych added 4 commits May 28, 2020 15:42

Local mode function is added

6e5705e

docs update

b615525

docs corrected

5619bcc

docs sorted

6de0694

Plotting recipe for error bars on the fit function using MCMC samples

0efe85c

VasylHafych commented May 28, 2020


src/plotting/recipes_samples_overview.jl Outdated

VasylHafych marked this pull request as ready for review May 28, 2020 21:46

Cornelius-G requested changes May 29, 2020


oschulz changed the title ~~Local Mode Function~~ Function to estimate modes in marginalizations May 29, 2020

Improvements of the plotting recipe

cb3fd29

VasylHafych mentioned this pull request May 31, 2020

Plotting recipe for showing error bars on a fit function #129

Merged

Cornelius-G mentioned this pull request Jun 8, 2020

Introduce MarginalDist #130

Merged

VasylHafych added 2 commits June 9, 2020 09:56

Merge branch 'master' into add_functions

052535d

Adapted to changes from PR#130. Function 'find_median' is added. Docs updated

18f7409

Statistics.median is added. bat_findlocalmode is renamed. Minor improvements

e67c70a

VasylHafych added 2 commits June 9, 2020 15:29

Merge branch 'master' into add_functions

b641a60

Adapted to change

f09e193

oschulz merged commit d757176 into bat:master Jun 9, 2020

VasylHafych deleted the add_functions branch June 9, 2020 21:06
