Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for logL_birth #324

Merged
merged 40 commits into from
Apr 9, 2024
Merged

Fix for logL_birth #324

merged 40 commits into from
Apr 9, 2024

Conversation

williamjameshandley
Copy link
Collaborator

Description

This PR fixes #310 by adding an explicit guard for logL_birth, dropping nans and infs before plotting

It's not the neastest solution, so any suggestions for improvements are welcome.

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works
  • I have appropriately incremented the semantic version number in both README.rst and anesthetic/_version.py

@codecov
Copy link

codecov bot commented Aug 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (16083c4) to head (7eb1869).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #324   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           36        36           
  Lines         3041      3052   +11     
=========================================
+ Hits          3041      3052   +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@lukashergt lukashergt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments inline.

Comment on lines 247 to 252
self[x].plot(ax=ax, xlabel=xlabel,
*args, **kwargs)
if x == 'logL_birth':
self[x].replace(-np.inf, np.nan
).dropna().plot(ax=ax, xlabel=xlabel,
*args, **kwargs)
else:
self[x].plot(ax=ax, xlabel=xlabel, *args, **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The .dropna() shouldn't be necessary.
  • Shall we separate inf replacement and plotting to reduce code duplication? We could have
    if x == 'logL_birth':
        selfx = self[x].replace(...)
    else:
        selfx = self[x]
    selfx.plot(...)
  • I feel like changing infs to nans shouldn't happen silently. Shall we at least have a warning that we are setting -inf to nan for plotting purposes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropna is necessary for the 2D KDE, but not otherwiseo so I've removed those.
I've reduced code repetition as suggested.
With regard to infs to nans, I'm not that keen to have a warning for default behaviour (i.e. samples.plot_1d())

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regard to infs to nans, I'm not that keen to have a warning for default behaviour (i.e. samples.plot_1d())

Well, I'd argue here that samples.plot_1d() or samples.plot_2d() is not default behaviour for nested sampling data (nor for anesthetic dataframes in general, which typically come with very many columns), otherwise this would have come up earlier, too. By default the user should specify which columns are to be plotted.

So I think a warning message could help the new user (who is most likely to call the plotting functions without specifying columns) to learn faster how to use the plotting command. This is also how this PR came to be: new user simply trying samples.plot_1d() to see if it works...

I agree that too many warning messages are annoying and get in the way, but how often do we intentionally plot the posterior distribution of logL_birth? Also, for people that indeed want to intentionally plot logL_birth, they can easily avoid the warning by dropping infs first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am persuaded by this, so think we should have a warning message.

If we're going down a warning route, should we restrict this to logL_birth ? The code would be much neater if we just moved the relevant inf guards into kde_contour_plot_2d and kde_1d and kde_2d. This would then be more consistent with the other functions behaviour, e.g. samples.plot(), which just ignores nans and infs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we restrict this to logL_birth ?

We could do either logL_birth specifically, or anything with infs generally. Happy with both, but the warning should say something along the lines "there are infs in columns [logL_birth, ...]".

The infs will also cause problems for histograms, so I think catching them in plot_1d and plot_2d is probably the right place...?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My input from stumbling across this discussion via #332

Perhaps the default behavior is sensible (include nlive logl and loglbirth in the plot if nothing is specified), but it strikes me that the easiest thing to make the various kde errors less impactful is add a kind=scatter and make that the default for the specific case that the columns to plot are empty?

anesthetic/samples.py Outdated Show resolved Hide resolved
tests/test_samples.py Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
@williamjameshandley
Copy link
Collaborator Author

Note that this has come up in the past in #96, which this PR obseletes, so that test is now removed.

@lukashergt
Copy link
Collaborator

The current question is whether we should change the default behaviour to ignoring the columns logL, logL_birth, and nlive by default?

I would vote against that default change. plot_1d and plot_2d are more general Samples methods which I don't think should drop any columns by default.

Those columns are more specific NestedSamples parameters. We could create NestedSamples.plot_1d and NestedSamples.plot_2d methods which would handle the treatment of those parameters (and then forward to Samples.plot_1d etc.), be it a warning or a change to default behaviour. I'd prefer a simple warning.

Perhaps the default behavior is sensible (include nlive logl and loglbirth in the plot if nothing is specified), but it strikes me that the easiest thing to make the various kde errors less impactful is add a kind=scatter and make that the default for the specific case that the columns to plot are empty?

I'd keep the kde default, but am open to be convinced otherwise.
Either way, it might be nice to have a simpler shortcut for scatter plots, so maybe we should create a kind='scatter' shortcut, which uses 'scatter_2d' in the lower triangle (and 'hist_1d' on the diagonal?)?

@williamjameshandley
Copy link
Collaborator Author

Either way, it might be nice to have a simpler shortcut for scatter plots, so maybe we should create a kind='scatter' shortcut, which uses 'scatter_2d' in the lower triangle (and 'hist_1d' on the diagonal?)?

Added scatter and scatter_2d to default kinds in b51b3ef

@williamjameshandley
Copy link
Collaborator Author

I would also vote to keep things in Samples and avoid a NestedSamples override.

README.rst Outdated Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
tests/test_samples.py Outdated Show resolved Hide resolved
tests/test_samples.py Outdated Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
lukashergt
lukashergt previously approved these changes Sep 30, 2023
Copy link
Collaborator

@lukashergt lukashergt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamjameshandley, if you are happy with my tweaks, feel free to squash and merge.

lukashergt
lukashergt previously approved these changes Mar 22, 2024
lukashergt
lukashergt previously approved these changes Apr 9, 2024
lukashergt
lukashergt previously approved these changes Apr 9, 2024
@williamjameshandley williamjameshandley merged commit 79c7bb6 into master Apr 9, 2024
22 checks passed
@williamjameshandley williamjameshandley deleted the logL_birth_inf branch April 9, 2024 18:43
lukashergt added a commit that referenced this pull request Apr 23, 2024
lukashergt added a commit that referenced this pull request Apr 24, 2024
* test that limits get accurately updated by successive plots with logscale axes, adjusting to new data limits, see issue #381

* fix typo from PR #324

* bump version to 2.8.10

* update logscale plot limits to datalimits at the end, making use of `ax.dataLim`
AdamOrmondroyd pushed a commit to AdamOrmondroyd/anesthetic that referenced this pull request May 28, 2024
* test that limits get accurately updated by successive plots with logscale axes, adjusting to new data limits, see issue handley-lab#381

* fix typo from PR handley-lab#324

* bump version to 2.8.10

* update logscale plot limits to datalimits at the end, making use of `ax.dataLim`
AdamOrmondroyd added a commit that referenced this pull request Jun 24, 2024
* allow matplotlib 3.9

* bump version to 2.8.10

* Fix logscale limit updates (#383)

* test that limits get accurately updated by successive plots with logscale axes, adjusting to new data limits, see issue #381

* fix typo from PR #324

* bump version to 2.8.10

* update logscale plot limits to datalimits at the end, making use of `ax.dataLim`

* Fix macOS CI (#385)

* attempt at fixing macOS CI by brew installing hdf5

* update from `miniconda@v2` to `miniconda@v3`

* bump version to 2.8.11

* try newer `tables` version, which was previously restricted to 3.8.0 in #379

* Revert "attempt at fixing macOS CI by brew installing hdf5"

This reverts commit 968bdb3.

* Reapply "attempt at fixing macOS CI by brew installing hdf5"

This reverts commit 204014a.

Seems like this is needed after all, otherwise macOS is struggling to
find a local HDF5.

---------

Co-authored-by: Will Handley <[email protected]>

* Fix to `color='C2'` plot_2d error post pandas 2 (#382)

* Added failing test

* bump version to 2.8.10

* Get color from self.color

* Update README.rst

* Update _version.py

* Update README.rst

* Update _version.py

---------

Co-authored-by: Lukas Hergt <[email protected]>

* bump version to 2.8.10

* bump version to 2.8.11

* bump version to 2.8.13

---------

Co-authored-by: Lukas Hergt <[email protected]>
Co-authored-by: Will Handley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plots with logL birth
3 participants