Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing tension statistics #333

Open
wants to merge 63 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
63 commits
Select commit Hold shift + click to select a range
c0946bd
Added Will's example Class for tension calculator
DilyOng Aug 24, 2023
70c0aa4
Changed Version from 2.3.0 to 2.4.0
DilyOng Aug 24, 2023
ee1db5b
Added Class tension
DilyOng Aug 31, 2023
ffd5fae
Merge branch 'master' into tension
lukashergt Sep 30, 2023
a3d06b5
version bump to 2.5.0
lukashergt Sep 30, 2023
4a5d4b5
Created a function called tension_stats for tension statistics calcul…
DilyOng Oct 13, 2023
38d31c2
Added logLmax to the function anesthetic.examples.perfect_ns.correlat…
DilyOng Oct 13, 2023
932a201
Clean up old tension statistics files. Put the correlated guassian li…
DilyOng Oct 13, 2023
4046bc3
Merge branch 'tension' of github.com:handley-lab/anesthetic into tension
DilyOng Oct 13, 2023
de17f04
Updated logLmax
DilyOng Oct 13, 2023
6ab8e4f
remove DS_Store
AdamOrmondroyd Oct 14, 2023
3698669
Merge branch 'master' into tension
AdamOrmondroyd Oct 14, 2023
aa0c177
Added a file tension.py to anesthetic/anesthetic and it contains a fu…
DilyOng Oct 26, 2023
2f472b5
Updated the tests/test_tension_stats.py file for flake8 compliance. A…
DilyOng Oct 26, 2023
5209ad2
Updated tests/test_tension_stats.py
DilyOng Dec 5, 2023
6fb5fe0
Updated anesthetic/tests/test_tension_stats.py. Now it tests whether …
DilyOng Mar 4, 2024
151a91d
Merge branch 'master' into tension
williamjameshandley Mar 5, 2024
b0994b3
bump version to 2.9.0
williamjameshandley Mar 5, 2024
aec6a25
Deleted duplicate files.
DilyOng Mar 5, 2024
46ec095
Merge branch 'tension' of github.com:handley-lab/anesthetic into tension
DilyOng Mar 5, 2024
be72001
Merge branch 'master' into tension
williamjameshandley Mar 18, 2024
03b9b1c
Reorganised tests
williamjameshandley Mar 18, 2024
9dd5ca1
numpy linalg
williamjameshandley Mar 18, 2024
e82f125
For anesthetic.tension.tension_stats(), I 1) added docstrings, 2) add…
DilyOng Mar 20, 2024
979f621
Added docstring in public module.
DilyOng Mar 21, 2024
84c8e6e
Fixed pydocstyle issue.
DilyOng Mar 21, 2024
9909190
Fixed pydocstyle issue.
DilyOng Mar 21, 2024
6723f4c
In test_tension.py, 1) added a test for latex labels, and 2) changed …
DilyOng Apr 4, 2024
9739361
Changed V
DilyOng Apr 4, 2024
25ab2c1
typographical changes
williamjameshandley Apr 5, 2024
1691a02
both tests correct
williamjameshandley Apr 5, 2024
110029a
Correction to docstring kl
williamjameshandley Apr 5, 2024
89fab36
Further docstring updates
williamjameshandley Apr 5, 2024
8f2e60a
Corrected typo
williamjameshandley Apr 8, 2024
290f257
Merge branch 'master' into tension
lukashergt Apr 8, 2024
815a7e5
fix link in docstring
lukashergt Apr 8, 2024
ab2d14e
add anesthetic.read.csv module to documentation
lukashergt Apr 8, 2024
c11c5f6
add anesthetic.tension module to documentation
lukashergt Apr 8, 2024
d1f1c2a
fix links and some formatting issues in `tension.py` docstrings
lukashergt Apr 8, 2024
fbd733e
Merge branch 'master' into tension
williamjameshandley Apr 9, 2024
d4659bd
Merge branch 'master' into tension
lukashergt Apr 9, 2024
877d79d
Merge branch 'master' into tension
williamjameshandley Apr 9, 2024
e568bfd
Merge branch 'master' into tension
lukashergt Apr 9, 2024
4947c69
Updated the docstrings in anesthetic/tension.py.
DilyOng Apr 14, 2024
7a10785
Updated docstrings
williamjameshandley Sep 19, 2024
970f431
Merge branch 'master' into tension
williamjameshandley Sep 19, 2024
69c932a
Updated docstring to avoid Samples
williamjameshandley Sep 19, 2024
7e90ac5
replaced \\m
williamjameshandley Sep 19, 2024
75d3067
Further string corrections
williamjameshandley Sep 19, 2024
740eacf
Further debugging docstrings
williamjameshandley Sep 19, 2024
d236f34
Logarithmic -> Logarithm
williamjameshandley Sep 19, 2024
79838dd
Merge branch 'master' into tension
lukashergt Sep 27, 2024
73d6180
correct spelling of compatible
lukashergt Sep 27, 2024
5675e25
surround `@` by spaces
lukashergt Sep 27, 2024
a7b7fb5
remove occurences of missing leading 0, e.g. `.01`
lukashergt Sep 27, 2024
e3bf650
use `ln` rather than `log` for the latex labels
lukashergt Sep 27, 2024
9dae22f
streamline and speed up tension tests a bit
lukashergt Sep 27, 2024
8c70f2f
make tension docstring more readable
lukashergt Sep 27, 2024
88b7ec6
optionally allow for passing a pre-computed stats instance to tension…
lukashergt Sep 27, 2024
51c0975
simplify tension tests and add test for direct input of nested sampli…
lukashergt Sep 27, 2024
0649d53
Update README.rst
lukashergt Sep 27, 2024
6b07788
Update _version.py
lukashergt Sep 27, 2024
ce82e13
Merge branch 'master' into tension
lukashergt Sep 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
anesthetic: nested sampling post-processing
===========================================
:Authors: Will Handley and Lukas Hergt
:Version: 2.9.0
:Version: 2.10.0
:Homepage: https://github.com/handley-lab/anesthetic
:Documentation: http://anesthetic.readthedocs.io/

Expand Down
2 changes: 1 addition & 1 deletion anesthetic/_version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = '2.9.0'
__version__ = '2.10.0'
2 changes: 1 addition & 1 deletion anesthetic/read/csv.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@


def read_csv(filename, *args, **kwargs):
"""Read a CSV file into a :class:`Samples` object."""
"""Read a CSV file into a :class:`anesthetic.samples.Samples` object."""
filename = Path(filename)
kwargs['label'] = kwargs.get('label', filename.stem)
wldf = wl_read_csv(filename.with_suffix('.csv'))
Expand Down
104 changes: 104 additions & 0 deletions anesthetic/tension.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
"""Tension statistics between two datasets."""
from anesthetic.samples import Samples
from scipy.stats import chi2


def stats(A, B, AB, nsamples=None, beta=None): # noqa: D301
r"""Compute tension statistics between two samples.

Using nested sampling we can compute:

- ``logR``: R statistic for dataset consistency

.. math::
\log R = \log Z_{AB} - \log Z_{A} - \log Z_{B}

- ``logI``: information ratio

.. math::
\log I = D_{KL}^{A} + D_{KL}^{B} - D_{KL}^{AB}

- ``logS``: suspiciousness

.. math::
\log S = \log L_{AB} - \log L_{A} - \log L_{B}

- ``d_G``: Gaussian model dimensionality of shared constrained parameters

.. math::
d = d_{A} + d_{B} - d_{AB}

- ``p``: p-value for the tension between two samples

.. math::
p = \int_{d-2\log{S}}^{\infty} \chi^2_d(x) dx

Parameters
----------
A : :class:`anesthetic.samples.Samples`
:class:`anesthetic.samples.NestedSamples` object from a sampling run
using only dataset A.
Alternatively, you can pass a precomputed stats object returned from
:meth:`anesthetic.samples.NestedSamples.stats`.

B : :class:`anesthetic.samples.Samples`
:class:`anesthetic.samples.NestedSamples` object from a sampling run
using only dataset B.
Alternatively, you can pass the precomputed stats object returned from
:meth:`anesthetic.samples.NestedSamples.stats`.

AB : :class:`anesthetic.samples.Samples`
:class:`anesthetic.samples.NestedSamples` object from a sampling run
using both datasets A and B jointly.
Alternatively, you can pass the precomputed stats object returned from
:meth:`anesthetic.samples.NestedSamples.stats`.

nsamples : int, optional
- If nsamples is not supplied, calculate mean value.
- If nsamples is integer, draw nsamples from the distribution of
values inferred by nested sampling.

beta : float, array-like, default=1
Inverse temperature(s) `beta=1/kT`.

Returns
-------
samples : :class:`anesthetic.samples.Samples`
DataFrame containing the following tension statistics in columns:
['logR', 'logI', 'logS', 'd_G', 'p']
"""
columns = ['logZ', 'D_KL', 'logL_P', 'd_G']
if set(columns).issubset(A.drop_labels().columns):
statsA = A
else:
statsA = A.stats(nsamples=nsamples, beta=beta)
if set(columns).issubset(B.drop_labels().columns):
statsB = B
else:
statsB = B.stats(nsamples=nsamples, beta=beta)
if set(columns).issubset(AB.drop_labels().columns):
statsAB = AB
else:
statsAB = AB.stats(nsamples=nsamples, beta=beta)
if statsA.shape != statsAB.shape or statsB.shape != statsAB.shape:
raise ValueError("Shapes of stats_A, stats_B, and stats_AB do not "
"match. Make sure to pass consistent `nsamples`.")

samples = Samples(index=statsA.index)

samples['logR'] = statsAB['logZ'] - statsA['logZ'] - statsB['logZ']
samples.set_label('logR', r'$\ln\mathcal{R}$')

samples['logI'] = statsA['D_KL'] + statsB['D_KL'] - statsAB['D_KL']
samples.set_label('logI', r'$\ln\mathcal{I}$')
Comment on lines +92 to +93
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamjameshandley: The notation with logI matches the one in eq. (9) of Quantifying tensions in cosmological parameters. However, the Shannon information I_S = log(P/pi) already carries a logarithm in its notation, so I find this logI notation confusing. Wouldn't just I be more appropiate?

The notation logR = logS - I also matches better the Occam equation logZ = logL_P - D_KL...

Thoughts?


samples['logS'] = statsAB['logL_P'] - statsA['logL_P'] - statsB['logL_P']
samples.set_label('logS', r'$\ln\mathcal{S}$')

samples['d_G'] = statsA['d_G'] + statsB['d_G'] - statsAB['d_G']
samples.set_label('d_G', r'$d_\mathrm{G}$')

p = chi2.sf(samples['d_G'] - 2 * samples['logS'], df=samples['d_G'])
samples['p'] = p
samples.set_label('p', '$p$')
return samples
9 changes: 9 additions & 0 deletions docs/source/anesthetic.read.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,15 @@ anesthetic.read.cobaya module
:show-inheritance:


anesthetic.read.csv module
--------------------------

.. automodule:: anesthetic.read.csv
:members:
:undoc-members:
:show-inheritance:


anesthetic.read.getdist module
------------------------------

Expand Down
10 changes: 10 additions & 0 deletions docs/source/anesthetic.rst
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ anesthetic.samples module
.. automodule:: anesthetic.samples
:members:
:undoc-members:
:show-inheritance:


anesthetic.scripts module
Expand All @@ -83,6 +84,15 @@ anesthetic.scripts module
:show-inheritance:


anesthetic.tension module
~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: anesthetic.tension
:members:
:undoc-members:
:show-inheritance:
williamjameshandley marked this conversation as resolved.
Show resolved Hide resolved


anesthetic.testing module
~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
126 changes: 126 additions & 0 deletions tests/test_tension.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
from pytest import approx, raises
from anesthetic.examples.perfect_ns import correlated_gaussian
import numpy as np
from numpy.linalg import inv, slogdet
from pandas.testing import assert_series_equal
from anesthetic.tension import stats


def test_tension_stats_compatible_gaussian():
np.random.seed(42)
d = 3
nlive = 10 * d
bounds = [[-1, 1], [0, 3], [0, 1]]
logV = np.log(np.diff(bounds).ravel().prod())

muA = np.array([0.1, 0.3, 0.5])
covA = 0.01 * np.array([[0.010, 0.009, 0.0],
[0.009, 0.010, 0.0],
[0.000, 0.000, 0.1]])
invcovA = inv(covA)
logLmaxA = 0
samplesA = correlated_gaussian(nlive, muA, covA, bounds, logLmaxA)

muB = np.array([0.1, 0.3, 0.5])
covB = 0.01 * np.array([[+0.010, -0.009, +0.010],
[-0.009, +0.010, -0.001],
[+0.010, -0.001, +0.100]])
invcovB = inv(covB)
logLmaxB = 0
samplesB = correlated_gaussian(nlive, muB, covB, bounds, logLmaxB)

covAB = inv(invcovA + invcovB)
muAB = covAB @ (invcovA @ muA + invcovB @ muB)
dmuAB = muA - muB
dmu_cov_dmu_AB = dmuAB @ invcovA @ covAB @ invcovB @ dmuAB
logLmaxAB = logLmaxA + logLmaxB - dmu_cov_dmu_AB / 2
samplesAB = correlated_gaussian(nlive, muAB, covAB, bounds, logLmaxAB)

nsamples = 10
beta = 1
s = stats(samplesA, samplesB, samplesAB, nsamples, beta)

logR_exact = logV - dmu_cov_dmu_AB/2 - slogdet(2*np.pi*(covA+covB))[1]/2
assert s.logR.mean() == approx(logR_exact, abs=3*s.logR.std())

logS_exact = d / 2 - dmu_cov_dmu_AB / 2
assert s.logS.mean() == approx(logS_exact, abs=3*s.logS.std())

logI_exact = logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2
assert s.logI.mean() == approx(logI_exact, abs=3*s.logI.std())

assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(),
abs=3*s.logS.std())

assert s.get_labels().tolist() == ([r'$\ln\mathcal{R}$',
r'$\ln\mathcal{I}$',
r'$\ln\mathcal{S}$',
r'$d_\mathrm{G}$',
r'$p$'])

with raises(ValueError):
stats(samplesA.stats(nsamples=5), samplesB, samplesAB, nsamples)
s2 = stats(samplesA.stats(nsamples=nsamples),
samplesB.stats(nsamples=nsamples),
samplesAB.stats(nsamples=nsamples))
assert_series_equal(s2.mean(), s.mean(), atol=s2.std().max())


def test_tension_stats_incompatible_gaussian():
np.random.seed(42)
d = 3
nlive = 10 * d
bounds = [[-1, 1], [0, 3], [0, 1]]
logV = np.log(np.diff(bounds).ravel().prod())

muA = np.array([0.1, 0.3, 0.5])
covA = 0.01 * np.array([[0.010, 0.009, 0.0],
[0.009, 0.010, 0.0],
[0.000, 0.000, 0.1]])
invcovA = inv(covA)
logLmaxA = 0
samplesA = correlated_gaussian(nlive, muA, covA, bounds, logLmaxA)

muB = np.array([0.15, 0.25, 0.45])
covB = 0.01 * np.array([[+0.010, -0.009, +0.010],
[-0.009, +0.010, -0.001],
[+0.010, -0.001, +0.100]])
invcovB = inv(covB)
logLmaxB = 0
samplesB = correlated_gaussian(nlive, muB, covB, bounds, logLmaxB)

covAB = inv(invcovA + invcovB)
muAB = covAB @ (invcovA @ muA + invcovB @ muB)
dmuAB = muA - muB
dmu_cov_dmu_AB = dmuAB @ invcovA @ covAB @ invcovB @ dmuAB
logLmaxAB = logLmaxA + logLmaxB - dmu_cov_dmu_AB / 2
samplesAB = correlated_gaussian(nlive, muAB, covAB, bounds, logLmaxAB)

nsamples = 10
beta = 1
s = stats(samplesA, samplesB, samplesAB, nsamples, beta)

logR_exact = logV - dmu_cov_dmu_AB/2 - slogdet(2*np.pi*(covA+covB))[1]/2
assert s.logR.mean() == approx(logR_exact, abs=3*s.logR.std())

logS_exact = d / 2 - dmu_cov_dmu_AB / 2
assert s.logS.mean() == approx(logS_exact, abs=3*s.logS.std())

logI_exact = logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2
assert s.logI.mean() == approx(logI_exact, abs=3*s.logI.std())

assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(),
abs=3*s.logS.std())

assert s.get_labels().tolist() == ([r'$\ln\mathcal{R}$',
r'$\ln\mathcal{I}$',
r'$\ln\mathcal{S}$',
r'$d_\mathrm{G}$',
r'$p$'])

with raises(ValueError):
stats(samplesA.stats(nsamples=5), samplesB, samplesAB, nsamples)
s2 = stats(samplesA.stats(nsamples=nsamples),
samplesB.stats(nsamples=nsamples),
samplesAB.stats(nsamples=nsamples))
assert_series_equal(s2.mean(), s.mean(), atol=s2.std().max())
Loading