Improve quality of sequence_index #1765

Merged
frances-h merged 6 commits into main from issue-1760-improve-sequence_index-quality
Feb 5, 2024

Conversation

frances-h

CU-86az5xcqz
Resolve #1760

The sequence index is now split into two columns: a diff column, which goes through the sequential model, and a context column, which is added to the context model. Additionally, FloatFormatter is applied to the diff column to force sampled values to stay within the given min/max range.
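As a rough illustration of the split described above, here is a minimal pandas sketch. It is not the code from this PR: the helper names, the `.context`/`.diff` column suffixes, and the use of a plain `clip` as a stand-in for FloatFormatter's range enforcement are all assumptions made for the example.

```python
import pandas as pd


def split_sequence_index(data, sequence_key, sequence_index):
    """Hypothetical helper: split a sequence index into start values and diffs."""
    grouped = data.groupby(sequence_key)[sequence_index]
    # Context side: one starting value per sequence, indexed by the sequence key.
    context = grouped.first().rename(f"{sequence_index}.context").to_frame()
    # Sequential side: step from the previous row of the same sequence.
    # The first row of each sequence has no predecessor, so its diff is 0.
    diffs = grouped.diff().fillna(0).rename(f"{sequence_index}.diff")
    return context, diffs


def rebuild_sequence_index(start, diffs, min_diff, max_diff):
    """Hypothetical helper: rebuild the index from a start value and diffs."""
    # Assumption: clipping stands in for the min/max enforcement on the diff column.
    clipped = diffs.clip(lower=min_diff, upper=max_diff)
    return start + clipped.cumsum()


data = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "t": [10, 12, 15, 100, 101],
})
context, diffs = split_sequence_index(data, "id", "t")

# Out-of-range "sampled" diffs are pulled back into [0, 3] before accumulating.
sampled = pd.Series([0.0, 9.0, -4.0])
rebuilt = rebuild_sequence_index(10, sampled, min_diff=0, max_diff=3)
```

With the diffs bounded, a rebuilt sequence cannot run backwards or jump further per step than the training data did, which is the kind of quality problem the split is meant to address.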

Also, sample_sequential_columns had to be adjusted: it now uses conditional sampling to fill in any missing extra context columns before sampling the sequences.
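The idea of filling in a missing context column conditionally can be sketched with a toy example. This is only an assumption-laden stand-in: the real change relies on the fitted context model, whereas the hypothetical `fill_missing_context` below just draws from previously seen rows that match the user-supplied columns.

```python
import numpy as np
import pandas as pd


def fill_missing_context(fitted_context, provided_context, missing_column, seed=0):
    """Hypothetical helper: sample a missing context column given known ones."""
    rng = np.random.default_rng(seed)
    known = list(provided_context.columns)
    values = []
    for _, row in provided_context.iterrows():
        # Keep only the fitted rows whose known columns match this row.
        mask = (fitted_context[known] == row[known]).all(axis=1)
        candidates = fitted_context.loc[mask, missing_column]
        if candidates.empty:
            # No matching rows: fall back to the unconditional values.
            candidates = fitted_context[missing_column]
        values.append(rng.choice(candidates.to_numpy()))
    filled = provided_context.copy()
    filled[missing_column] = values
    return filled


# Context seen during fitting, including the sequence-index start column.
fitted_context = pd.DataFrame({
    "region": ["A", "A", "B"],
    "t.context": [10, 12, 100],
})
# Context supplied at sampling time, without the start column.
provided_context = pd.DataFrame({"region": ["B", "A"]})

filled = fill_missing_context(fitted_context, provided_context, "t.context")
```

Only once every context column is present can the sequential samples be generated, which is why this step has to run first.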

codecov-commenter commented Jan 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e6e508b) 97.11% compared to head (1428364) 97.12%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1765      +/-   ##
==========================================
+ Coverage   97.11%   97.12%   +0.01%     
==========================================
  Files          48       48              
  Lines        4570     4598      +28     
==========================================
+ Hits         4438     4466      +28     
  Misses        132      132              

@frances-h frances-h marked this pull request as ready for review January 31, 2024 21:26
@frances-h frances-h requested a review from a team as a code owner January 31, 2024 21:26
@frances-h frances-h requested review from amontanez24 and rwedge and removed request for a team January 31, 2024 21:26

@amontanez24 amontanez24 left a comment

This looks good! Maybe we should add an integration test to show that the issues that led to this change are resolved.

Comment on lines +181 to +184
data = data.merge(
    sequence_index_context,
    left_on=self._sequence_key,
    right_index=True)

It's funny that we merge this back in, only to separate it out again later. Not sure if there's a way to avoid that.

Contributor Author

Maybe, but not easily right now. PAR also takes in the context columns when assembling the sequences and fitting the model, so we'd need to do the join regardless. We could investigate whether the context is really needed there, though, and skip the join here if it turns out not to be.
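For readers unfamiliar with the merge under discussion, this minimal sketch shows how `left_on` combined with `right_index=True` broadcasts one context row onto every row of its sequence, and how the context can be recovered afterwards. The frames and column names are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Row-level data: one row per step, keyed by a sequence id column.
data = pd.DataFrame({
    "id": ["a", "a", "b"],
    "t.diff": [0.0, 2.0, 0.0],
})
# Context data: one row per sequence, keyed by the index.
sequence_index_context = pd.DataFrame({"t.context": [10, 100]}, index=["a", "b"])

# Match the "id" column on the left against the index on the right.
merged = data.merge(sequence_index_context, left_on="id", right_index=True)

# Separating the context out again means collapsing back to one row per sequence.
recovered = merged[["id", "t.context"]].drop_duplicates().set_index("id")
```

The merge repeats each context value on every row of its sequence, which is the redundancy the review comment points out.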

@amontanez24 amontanez24 left a comment

LGTM!

@frances-h frances-h merged commit 1d2b03e into main Feb 5, 2024
37 checks passed
@frances-h frances-h deleted the issue-1760-improve-sequence_index-quality branch February 5, 2024 19:56
Development

Successfully merging this pull request may close these issues.

Improve quality of sequence_index: Move the start dates into the context model
5 participants