-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve quality of sequence_index
#1765
Conversation
…nd fix for single sequence
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1765 +/- ##
==========================================
+ Coverage 97.11% 97.12% +0.01%
==========================================
Files 48 48
Lines 4570 4598 +28
==========================================
+ Hits 4438 4466 +28
Misses 132 132 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Maybe we should add an integration test to show that the issues raised that led to this one were resolved
data = data.merge( | ||
sequence_index_context, | ||
left_on=self._sequence_key, | ||
right_index=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's funny that we merge this back in but just separate it out again later. Not sure if there's a way to avoid that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but not easily right now. PAR also takes in the context columns when assembling the sequences and fitting the model so we'd need to do the join regardless. We could investigate if the context is really needed there though, and then skip the join here if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
CU-86az5xcqz
Resolve #1760
The sequence index is now split into a diff column which goes through the sequential model and a context column which is added to the context model. Additionally,
FloatFormatter
is used on the diff column to force sampled columns to be within the given min/max range.Also,
sample_sequential_columns
had to be adjusted to use conditional sampling to get missing extra context columns before getting sequential samples.