Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] tSNE Lock up #2565
[BUG] tSNE Lock up #2565
Changes from 6 commits
59235bf
4c9e1d6
0d3884e
06e5c0f
d78833d
a396a52
6167b6c
65d15ee
9131614
ea794b4
6cf29f9
45768ca
8cedd09
17f9529
9fec263
9af3ca8
911efcc
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this problem is so dataset dependent and doesn't seem to occur on most real-world datasets, it would be unfortunate to have to double the amount of embedding memory by default. While it's not nearly the same as duplicating the input data, training 50M vertices still requires 400mb of extra memory just for this feature.
What do you think about making this feature optional and maybe mentioning the option in the warning? If the option is disabled, we just return the embedding the way it is w/ the NaN values. The option can be enabled and use a little extra memory if users still want the embeddings knowing that training wasn't able to complete successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. Are you thinking the option would be exposed at the Python level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, just a simple flag exposed through the TSNE constructor would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you been able to visually inspect the output embeddings of any of the datasets that are causing this failure? It would be nice to know if they have been reasonably embedded by the time the rollback occurs or if they are just garbage.