
Add improved goodness of fit implementation #190

Open · stes wants to merge 7 commits into main from stes/better-goodness-of-fit
Conversation

@stes (Member) commented Oct 27, 2024

This adds a better goodness of fit measure. Instead of the old variant, which simply mirrored the InfoNCE loss and therefore depends on the batch size, the proposed measure

  • is at 0 for chance-level performance (instead of at log batch size)
  • does not need an adjustment for single-session vs. multi-session solvers
  • increases as the model gets better, which might be more intuitive

The conversion is done via

GoF(model) = log(batch_size_per_session * num_sessions) - InfoNCE(model)

This measure is also used in DeWolf et al., 2024, Eq. (43).
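A minimal sketch of this conversion (the standalone signature below is an assumption for illustration; the PR's actual infonce_to_goodness_of_fit additionally accepts a fitted model and validates its arguments):

```python
import numpy as np

def infonce_to_goodness_of_fit(infonce, batch_size, num_sessions=1):
    # Chance-level InfoNCE equals the log of the number of samples
    # contrasted per gradient step, so the difference is 0 at chance
    # and grows as the model improves.
    return np.log(batch_size * num_sessions) - np.asarray(infonce)
```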


Application example (GoF improves from 0 to a larger value during training):

[Figure: GoF over training steps]


Closes https://github.com/AdaptiveMotorControlLab/CEBRA-dev/pull/669

@cla-bot cla-bot bot added the CLA signed label Oct 27, 2024
@stes stes self-assigned this Oct 27, 2024
@stes stes mentioned this pull request Oct 27, 2024
@stes (Member, Author) commented Oct 27, 2024

TODO: Fix case where batch_size is None
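One possible shape for that fix, as a sketch (_resolve_batch_size is a hypothetical helper; it assumes batch_size=None corresponds to full-batch training in the sklearn API, and whether to raise or to fall back to the dataset length is an open design choice):

```python
def _resolve_batch_size(cebra_model):
    # With batch_size=None there is no stored batch size to plug into
    # the GoF formula; fail loudly rather than return a misleading score.
    if cebra_model.batch_size is None:
        raise ValueError(
            "Computing the goodness of fit is not supported for "
            "models trained with batch_size=None.")
    return cebra_model.batch_size
```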

@stes force-pushed the stes/better-goodness-of-fit branch from 5e21cdc to c826b68 on October 27, 2024
@stes force-pushed the stes/better-goodness-of-fit branch from c826b68 to f43971f on November 29, 2024
@CeliaBenquet (Member) commented:

@stes, regarding what I implemented in #202 that I do see here:

I think it would be good to have a really basic function where you provide the loss and the batch size, so that it is easily usable in the PyTorch implementation as well, e.g. along the lines of the sketch below.
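For illustration, with such a basic function the conversion would be usable straight from a recorded loss history in the torch workflow (the values and names below are made up):

```python
import numpy as np

# InfoNCE values logged during training, and the batch size that was
# actually used; with those two numbers the conversion is one line.
loss_history = np.array([6.24, 4.81, 3.10, 2.57])
batch_size, num_sessions = 512, 1

gof_history = np.log(batch_size * num_sessions) - loss_history
print(gof_history)  # starts near 0, increases as the loss drops
```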

Also, it would be nice to test for the default CEBRA.batch_size = None; I am not sure it is handled here.

@stes (Member, Author) commented Dec 16, 2024

Unrelated build issue due to an upstream change in sklearn (#204); attempted fix in #205.

@stes stes requested a review from CeliaBenquet December 16, 2024 18:18
@stes (Member, Author) commented Dec 16, 2024

The build issue is fixed, and once #205 is merged, tests should pass here as well.

@stes force-pushed the stes/better-goodness-of-fit branch from 1d55ead to ad8ae60 on December 16, 2024
@stes stes changed the title [WIP] Add improved goodness of fit implementation Add improved goodness of fit implementation Dec 16, 2024
@stes stes added the enhancement New feature or request label Dec 16, 2024
@CeliaBenquet (Member) left a comment:

Thank you @stes! This looks nice!! Some minor suggestions on the docstrings, and maybe add some tests for the different corner cases based on the arguments provided in infonce_to_goodness_of_fit.

Comment on lines +116 to +118:

    """Compute the InfoNCE loss on a *single session* dataset on the model.
    This function uses the :func:`infonce_loss` function to compute the InfoNCE loss.

@CeliaBenquet (Member): It computes the goodness of fit score from the InfoNCE loss, no?

Suggested change:
-    """Compute the InfoNCE loss on a *single session* dataset on the model.
-    This function uses the :func:`infonce_loss` function to compute the InfoNCE loss.
+    """Compute the goodness of fit score on a *single session* dataset on the model.
+    This function uses the :func:`infonce_loss` function to compute the InfoNCE loss
+    for a given `cebra_model` and the :func:`infonce_to_goodness_of_fit` function
+    to derive the goodness of fit from the InfoNCE loss.

return infonce_to_goodness_of_fit(loss, cebra_model)


    def goodness_of_fit_history(model):

@CeliaBenquet (Member) suggested change:
-def goodness_of_fit_history(model):
+def goodness_of_fit_history(model: cebra_sklearn_cebra.CEBRA) -> np.ndarray:

    Args:
        infonce: The InfoNCE loss, either a single value or an iterable of values.
        model: The trained CEBRA model

@CeliaBenquet (Member) suggested change:
-        model: The trained CEBRA model
+        model: The trained CEBRA model.

        num_sessions = 1
    else:
        if batch_size is None or num_sessions is None:
            raise ValueError("batch_size should be provided if model is not provided.")

@CeliaBenquet (Member) suggested change (with a trailing space added in the first f-string fragment so the two parts of the message do not run together):
-            raise ValueError("batch_size should be provided if model is not provided.")
+            raise ValueError(
+                f"batch_size ({batch_size}) and num_sessions ({num_sessions}) "
+                f"should be provided if model is not provided.")

@@ -383,3 +383,67 @@ def test_sklearn_runs_consistency():
with pytest.raises(ValueError, match="Invalid.*embeddings"):
_, _, _ = cebra_sklearn_metrics.consistency_score(
invalid_embeddings_runs, between="runs")

@CeliaBenquet (Member) commented:

Ideally, some tests for the errors raised by infonce_to_goodness_of_fit depending on the arguments (model, batch_size = None, num_sessions = None or greater than the actual number of sessions, etc.) are necessary here.
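A sketch of what such tests could look like (the keyword arguments of infonce_to_goodness_of_fit are assumed from the discussion above, not taken verbatim from the diff):

```python
import pytest

import cebra.integrations.sklearn.metrics as cebra_sklearn_metrics

def test_infonce_to_goodness_of_fit_corner_cases():
    # No model and no (batch_size, num_sessions): cannot infer chance level.
    with pytest.raises(ValueError):
        cebra_sklearn_metrics.infonce_to_goodness_of_fit(1.0)
    # batch_size given, but num_sessions missing.
    with pytest.raises(ValueError):
        cebra_sklearn_metrics.infonce_to_goodness_of_fit(1.0, batch_size=512)
```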

Labels: CLA signed, enhancement