Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add special domain keys to data #2

Merged
merged 1 commit into from
Jun 17, 2024
Merged

add special domain keys to data #2

merged 1 commit into from
Jun 17, 2024

Conversation

mazabou
Copy link
Member

@mazabou mazabou commented Jun 12, 2024

the Data object is defined by a domain which is an interval object. but more generally we would want to associate different domains with this object. In particular there are two types:

  1. split domains: this would be the train_domain, test_domain etc... which help define the domains from which to sample for a given split
  2. epoch domains: in certain experiments, we will have different "epochs" where a task is performed, this can be defined as a domain as well. so we will have drifting_gratings_domain, natural_movies_domain etc...

These domains are "higher order entities" and are exempt from splitting, so we skip them.


In current code, trials is not checked for data leakage. instead of removing this, currently it is simply deprecated

if (
not hasattr(obj, f"{name}_mask")
or not getattr(obj, f"{name}_mask").all()
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we remove this whole logic? I believe "trials" is not a general concept, and so we shouldn't have an if-else for it in temporaldata

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tagged likes 2902-2914 here. GH only shows a few for some reason

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was old code that I wanted to deprecate so that I don't break too many things for people (or at least provide a clear error so they know what to fix), but I am down to also completely remove it.

@vinamarora8
Copy link
Member

I get why this is needed, and I don't mind current implementations. Though I have a reservation about using object names to decide what logic will be executed on them. It's an opinion, but I don't think it's clean. Another way, which I right now think avoids this is - create a new class Domain that is a subclass of Interval, and then condition on the object being a Domain or not.

@mazabou
Copy link
Member Author

mazabou commented Jun 13, 2024

I am open to that. The only drawback is that people will not think that domains are also intervals, or that they can do & and | between domains and intervals. also how do we resolve Domain & Interval, will the output be a Domain or an Interval?

@vinamarora8
Copy link
Member

vinamarora8 commented Jun 13, 2024

I don't see that as a problem. Equivalently, people wouldn't know if they name something "_domain", then it behaves differently. One has to read the docs to understand what's going on in either case.
Re. Domain & Interval, I would say make the output an interval. And we can have something like a classmethod to promote Interval objects to Domain objects.

@mazabou mazabou merged commit 4aa42bc into main Jun 17, 2024
5 checks passed
@mazabou mazabou deleted the mehdi/domain branch June 17, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants