bugfix - ignore subsets of near-zero-ratio #187
Merged
Summary
When the dataset size is 5 and the split ratio is [train=0.1, val=0.9, test=0.0], the splitter divides the dataset into subsets of sizes [1, 3, 1] instead of the expected [1, 4, 0].
This is a special case of incorrect partitioning caused by the inexact behavior of Python's round function.
This PR fixes the bug by ignoring subsets with a near-zero ratio when computing subset sizes.
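
For reviewers, here is one way the reported numbers can arise, and how ignoring near-zero ratios avoids them. This is a minimal sketch and not the splitter's actual code. It assumes each share except the last is rounded, the last subset receives the remainder, and an empty subset with a non-zero ratio borrows one item from the largest subset. The function names and the `NEAR_ZERO` threshold are hypothetical.

```python
NEAR_ZERO = 1e-7  # hypothetical threshold for "near-zero" ratios


def naive_split_sizes(dataset_size, ratios):
    """Assumed pre-fix behavior: round each share, remainder goes last."""
    # round() uses round-half-to-even, so round(0.5) == 0 and round(4.5) == 4
    sizes = [round(dataset_size * r) for r in ratios[:-1]]
    sizes.append(dataset_size - sum(sizes))
    # A subset with a non-zero ratio that got nothing borrows one item
    # from the currently largest subset.
    for i, r in enumerate(ratios):
        if sizes[i] == 0 and r > NEAR_ZERO:
            donor = sizes.index(max(sizes))
            if sizes[donor] > 0:
                sizes[i] += 1
                sizes[donor] -= 1
    return sizes


def split_sizes_ignoring_near_zero(dataset_size, ratios):
    """Sketch of the fix: drop near-zero subsets before computing sizes."""
    kept = [i for i, r in enumerate(ratios) if r > NEAR_ZERO]
    kept_sizes = naive_split_sizes(dataset_size, [ratios[i] for i in kept])
    sizes = [0] * len(ratios)  # ignored subsets stay empty
    for i, n in zip(kept, kept_sizes):
        sizes[i] = n
    return sizes


ratios = [0.1, 0.9, 0.0]  # train, val, test
print(naive_split_sizes(5, ratios))               # [1, 3, 1]
print(split_sizes_ignoring_near_zero(5, ratios))  # [1, 4, 0]
```

In the naive version the rounding remainder lands on the test subset even though its ratio is 0.0. Once that subset is excluded, the remainder stays with val.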