-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing tension statistics #333
Open
DilyOng
wants to merge
63
commits into
master
Choose a base branch
from
tension
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
63 commits
Select commit
Hold shift + click to select a range
c0946bd
Added Will's example Class for tension calculator
DilyOng 70c0aa4
Changed Version from 2.3.0 to 2.4.0
DilyOng ee1db5b
Added Class tension
DilyOng ffd5fae
Merge branch 'master' into tension
lukashergt a3d06b5
version bump to 2.5.0
lukashergt 4a5d4b5
Created a function called tension_stats for tension statistics calcul…
DilyOng 38d31c2
Added logLmax to the function anesthetic.examples.perfect_ns.correlat…
DilyOng 932a201
Clean up old tension statistics files. Put the correlated guassian li…
DilyOng 4046bc3
Merge branch 'tension' of github.com:handley-lab/anesthetic into tension
DilyOng de17f04
Updated logLmax
DilyOng 6ab8e4f
remove DS_Store
AdamOrmondroyd 3698669
Merge branch 'master' into tension
AdamOrmondroyd aa0c177
Added a file tension.py to anesthetic/anesthetic and it contains a fu…
DilyOng 2f472b5
Updated the tests/test_tension_stats.py file for flake8 compliance. A…
DilyOng 5209ad2
Updated tests/test_tension_stats.py
DilyOng 6fb5fe0
Updated anesthetic/tests/test_tension_stats.py. Now it tests whether …
DilyOng 151a91d
Merge branch 'master' into tension
williamjameshandley b0994b3
bump version to 2.9.0
williamjameshandley aec6a25
Deleted duplicate files.
DilyOng 46ec095
Merge branch 'tension' of github.com:handley-lab/anesthetic into tension
DilyOng be72001
Merge branch 'master' into tension
williamjameshandley 03b9b1c
Reorganised tests
williamjameshandley 9dd5ca1
numpy linalg
williamjameshandley e82f125
For anesthetic.tension.tension_stats(), I 1) added docstrings, 2) add…
DilyOng 979f621
Added docstring in public module.
DilyOng 84c8e6e
Fixed pydocstyle issue.
DilyOng 9909190
Fixed pydocstyle issue.
DilyOng 6723f4c
In test_tension.py, 1) added a test for latex labels, and 2) changed …
DilyOng 9739361
Changed V
DilyOng 25ab2c1
typographical changes
williamjameshandley 1691a02
both tests correct
williamjameshandley 110029a
Correction to docstring kl
williamjameshandley 89fab36
Further docstring updates
williamjameshandley 8f2e60a
Corrected typo
williamjameshandley 290f257
Merge branch 'master' into tension
lukashergt 815a7e5
fix link in docstring
lukashergt ab2d14e
add anesthetic.read.csv module to documentation
lukashergt c11c5f6
add anesthetic.tension module to documentation
lukashergt d1f1c2a
fix links and some formatting issues in `tension.py` docstrings
lukashergt fbd733e
Merge branch 'master' into tension
williamjameshandley d4659bd
Merge branch 'master' into tension
lukashergt 877d79d
Merge branch 'master' into tension
williamjameshandley e568bfd
Merge branch 'master' into tension
lukashergt 4947c69
Updated the docstrings in anesthetic/tension.py.
DilyOng 7a10785
Updated docstrings
williamjameshandley 970f431
Merge branch 'master' into tension
williamjameshandley 69c932a
Updated docstring to avoid Samples
williamjameshandley 7e90ac5
replaced \\m
williamjameshandley 75d3067
Further string corrections
williamjameshandley 740eacf
Further debugging docstrings
williamjameshandley d236f34
Logarithmic -> Logarithm
williamjameshandley 79838dd
Merge branch 'master' into tension
lukashergt 73d6180
correct spelling of compatible
lukashergt 5675e25
surround `@` by spaces
lukashergt a7b7fb5
remove occurences of missing leading 0, e.g. `.01`
lukashergt e3bf650
use `ln` rather than `log` for the latex labels
lukashergt 9dae22f
streamline and speed up tension tests a bit
lukashergt 8c70f2f
make tension docstring more readable
lukashergt 88b7ec6
optionally allow for passing a pre-computed stats instance to tension…
lukashergt 51c0975
simplify tension tests and add test for direct input of nested sampli…
lukashergt 0649d53
Update README.rst
lukashergt 6b07788
Update _version.py
lukashergt ce82e13
Merge branch 'master' into tension
lukashergt File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
__version__ = '2.9.0' | ||
__version__ = '2.10.0' |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
"""Tension statistics between two datasets.""" | ||
from anesthetic.samples import Samples | ||
from scipy.stats import chi2 | ||
|
||
|
||
def stats(A, B, AB, nsamples=None, beta=None): # noqa: D301 | ||
r"""Compute tension statistics between two samples. | ||
|
||
Using nested sampling we can compute: | ||
|
||
- ``logR``: R statistic for dataset consistency | ||
|
||
.. math:: | ||
\log R = \log Z_{AB} - \log Z_{A} - \log Z_{B} | ||
|
||
- ``logI``: information ratio | ||
|
||
.. math:: | ||
\log I = D_{KL}^{A} + D_{KL}^{B} - D_{KL}^{AB} | ||
|
||
- ``logS``: suspiciousness | ||
|
||
.. math:: | ||
\log S = \log L_{AB} - \log L_{A} - \log L_{B} | ||
|
||
- ``d_G``: Gaussian model dimensionality of shared constrained parameters | ||
|
||
.. math:: | ||
d = d_{A} + d_{B} - d_{AB} | ||
|
||
- ``p``: p-value for the tension between two samples | ||
|
||
.. math:: | ||
p = \int_{d-2\log{S}}^{\infty} \chi^2_d(x) dx | ||
|
||
Parameters | ||
---------- | ||
A : :class:`anesthetic.samples.Samples` | ||
:class:`anesthetic.samples.NestedSamples` object from a sampling run | ||
using only dataset A. | ||
Alternatively, you can pass a precomputed stats object returned from | ||
:meth:`anesthetic.samples.NestedSamples.stats`. | ||
|
||
B : :class:`anesthetic.samples.Samples` | ||
:class:`anesthetic.samples.NestedSamples` object from a sampling run | ||
using only dataset B. | ||
Alternatively, you can pass the precomputed stats object returned from | ||
:meth:`anesthetic.samples.NestedSamples.stats`. | ||
|
||
AB : :class:`anesthetic.samples.Samples` | ||
:class:`anesthetic.samples.NestedSamples` object from a sampling run | ||
using both datasets A and B jointly. | ||
Alternatively, you can pass the precomputed stats object returned from | ||
:meth:`anesthetic.samples.NestedSamples.stats`. | ||
|
||
nsamples : int, optional | ||
- If nsamples is not supplied, calculate mean value. | ||
- If nsamples is integer, draw nsamples from the distribution of | ||
values inferred by nested sampling. | ||
|
||
beta : float, array-like, default=1 | ||
Inverse temperature(s) `beta=1/kT`. | ||
|
||
Returns | ||
------- | ||
samples : :class:`anesthetic.samples.Samples` | ||
DataFrame containing the following tension statistics in columns: | ||
['logR', 'logI', 'logS', 'd_G', 'p'] | ||
""" | ||
columns = ['logZ', 'D_KL', 'logL_P', 'd_G'] | ||
if set(columns).issubset(A.drop_labels().columns): | ||
statsA = A | ||
else: | ||
statsA = A.stats(nsamples=nsamples, beta=beta) | ||
if set(columns).issubset(B.drop_labels().columns): | ||
statsB = B | ||
else: | ||
statsB = B.stats(nsamples=nsamples, beta=beta) | ||
if set(columns).issubset(AB.drop_labels().columns): | ||
statsAB = AB | ||
else: | ||
statsAB = AB.stats(nsamples=nsamples, beta=beta) | ||
if statsA.shape != statsAB.shape or statsB.shape != statsAB.shape: | ||
raise ValueError("Shapes of stats_A, stats_B, and stats_AB do not " | ||
"match. Make sure to pass consistent `nsamples`.") | ||
|
||
samples = Samples(index=statsA.index) | ||
|
||
samples['logR'] = statsAB['logZ'] - statsA['logZ'] - statsB['logZ'] | ||
samples.set_label('logR', r'$\ln\mathcal{R}$') | ||
|
||
samples['logI'] = statsA['D_KL'] + statsB['D_KL'] - statsAB['D_KL'] | ||
samples.set_label('logI', r'$\ln\mathcal{I}$') | ||
|
||
samples['logS'] = statsAB['logL_P'] - statsA['logL_P'] - statsB['logL_P'] | ||
samples.set_label('logS', r'$\ln\mathcal{S}$') | ||
|
||
samples['d_G'] = statsA['d_G'] + statsB['d_G'] - statsAB['d_G'] | ||
samples.set_label('d_G', r'$d_\mathrm{G}$') | ||
|
||
p = chi2.sf(samples['d_G'] - 2 * samples['logS'], df=samples['d_G']) | ||
samples['p'] = p | ||
samples.set_label('p', '$p$') | ||
return samples |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
from pytest import approx, raises | ||
from anesthetic.examples.perfect_ns import correlated_gaussian | ||
import numpy as np | ||
from numpy.linalg import inv, slogdet | ||
from pandas.testing import assert_series_equal | ||
from anesthetic.tension import stats | ||
|
||
|
||
def test_tension_stats_compatible_gaussian(): | ||
np.random.seed(42) | ||
d = 3 | ||
nlive = 10 * d | ||
bounds = [[-1, 1], [0, 3], [0, 1]] | ||
logV = np.log(np.diff(bounds).ravel().prod()) | ||
|
||
muA = np.array([0.1, 0.3, 0.5]) | ||
covA = 0.01 * np.array([[0.010, 0.009, 0.0], | ||
[0.009, 0.010, 0.0], | ||
[0.000, 0.000, 0.1]]) | ||
invcovA = inv(covA) | ||
logLmaxA = 0 | ||
samplesA = correlated_gaussian(nlive, muA, covA, bounds, logLmaxA) | ||
|
||
muB = np.array([0.1, 0.3, 0.5]) | ||
covB = 0.01 * np.array([[+0.010, -0.009, +0.010], | ||
[-0.009, +0.010, -0.001], | ||
[+0.010, -0.001, +0.100]]) | ||
invcovB = inv(covB) | ||
logLmaxB = 0 | ||
samplesB = correlated_gaussian(nlive, muB, covB, bounds, logLmaxB) | ||
|
||
covAB = inv(invcovA + invcovB) | ||
muAB = covAB @ (invcovA @ muA + invcovB @ muB) | ||
dmuAB = muA - muB | ||
dmu_cov_dmu_AB = dmuAB @ invcovA @ covAB @ invcovB @ dmuAB | ||
logLmaxAB = logLmaxA + logLmaxB - dmu_cov_dmu_AB / 2 | ||
samplesAB = correlated_gaussian(nlive, muAB, covAB, bounds, logLmaxAB) | ||
|
||
nsamples = 10 | ||
beta = 1 | ||
s = stats(samplesA, samplesB, samplesAB, nsamples, beta) | ||
|
||
logR_exact = logV - dmu_cov_dmu_AB/2 - slogdet(2*np.pi*(covA+covB))[1]/2 | ||
assert s.logR.mean() == approx(logR_exact, abs=3*s.logR.std()) | ||
|
||
logS_exact = d / 2 - dmu_cov_dmu_AB / 2 | ||
assert s.logS.mean() == approx(logS_exact, abs=3*s.logS.std()) | ||
|
||
logI_exact = logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2 | ||
assert s.logI.mean() == approx(logI_exact, abs=3*s.logI.std()) | ||
|
||
assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(), | ||
abs=3*s.logS.std()) | ||
|
||
assert s.get_labels().tolist() == ([r'$\ln\mathcal{R}$', | ||
r'$\ln\mathcal{I}$', | ||
r'$\ln\mathcal{S}$', | ||
r'$d_\mathrm{G}$', | ||
r'$p$']) | ||
|
||
with raises(ValueError): | ||
stats(samplesA.stats(nsamples=5), samplesB, samplesAB, nsamples) | ||
s2 = stats(samplesA.stats(nsamples=nsamples), | ||
samplesB.stats(nsamples=nsamples), | ||
samplesAB.stats(nsamples=nsamples)) | ||
assert_series_equal(s2.mean(), s.mean(), atol=s2.std().max()) | ||
|
||
|
||
def test_tension_stats_incompatible_gaussian(): | ||
np.random.seed(42) | ||
d = 3 | ||
nlive = 10 * d | ||
bounds = [[-1, 1], [0, 3], [0, 1]] | ||
logV = np.log(np.diff(bounds).ravel().prod()) | ||
|
||
muA = np.array([0.1, 0.3, 0.5]) | ||
covA = 0.01 * np.array([[0.010, 0.009, 0.0], | ||
[0.009, 0.010, 0.0], | ||
[0.000, 0.000, 0.1]]) | ||
invcovA = inv(covA) | ||
logLmaxA = 0 | ||
samplesA = correlated_gaussian(nlive, muA, covA, bounds, logLmaxA) | ||
|
||
muB = np.array([0.15, 0.25, 0.45]) | ||
covB = 0.01 * np.array([[+0.010, -0.009, +0.010], | ||
[-0.009, +0.010, -0.001], | ||
[+0.010, -0.001, +0.100]]) | ||
invcovB = inv(covB) | ||
logLmaxB = 0 | ||
samplesB = correlated_gaussian(nlive, muB, covB, bounds, logLmaxB) | ||
|
||
covAB = inv(invcovA + invcovB) | ||
muAB = covAB @ (invcovA @ muA + invcovB @ muB) | ||
dmuAB = muA - muB | ||
dmu_cov_dmu_AB = dmuAB @ invcovA @ covAB @ invcovB @ dmuAB | ||
logLmaxAB = logLmaxA + logLmaxB - dmu_cov_dmu_AB / 2 | ||
samplesAB = correlated_gaussian(nlive, muAB, covAB, bounds, logLmaxAB) | ||
|
||
nsamples = 10 | ||
beta = 1 | ||
s = stats(samplesA, samplesB, samplesAB, nsamples, beta) | ||
|
||
logR_exact = logV - dmu_cov_dmu_AB/2 - slogdet(2*np.pi*(covA+covB))[1]/2 | ||
assert s.logR.mean() == approx(logR_exact, abs=3*s.logR.std()) | ||
|
||
logS_exact = d / 2 - dmu_cov_dmu_AB / 2 | ||
assert s.logS.mean() == approx(logS_exact, abs=3*s.logS.std()) | ||
|
||
logI_exact = logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2 | ||
assert s.logI.mean() == approx(logI_exact, abs=3*s.logI.std()) | ||
|
||
assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(), | ||
abs=3*s.logS.std()) | ||
|
||
assert s.get_labels().tolist() == ([r'$\ln\mathcal{R}$', | ||
r'$\ln\mathcal{I}$', | ||
r'$\ln\mathcal{S}$', | ||
r'$d_\mathrm{G}$', | ||
r'$p$']) | ||
|
||
with raises(ValueError): | ||
stats(samplesA.stats(nsamples=5), samplesB, samplesAB, nsamples) | ||
s2 = stats(samplesA.stats(nsamples=nsamples), | ||
samplesB.stats(nsamples=nsamples), | ||
samplesAB.stats(nsamples=nsamples)) | ||
assert_series_equal(s2.mean(), s.mean(), atol=s2.std().max()) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@williamjameshandley: The notation with
logI
matches the one in eq. (9) of Quantifying tensions in cosmological parameters. However, the Shannon informationI_S = log(P/pi)
already carries a logarithm in its notation, so I find thislogI
notation confusing. Wouldn't justI
be more appropiate?The notation
logR = logS - I
also matches better the Occam equationlogZ = logL_P - D_KL
...Thoughts?