-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixup logic related to asof join #9913
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #9913 will improve performances by 10.52%Comparing Summary
Benchmarks breakdown
|
db162f9
to
7378818
Compare
having "time" in by and on makes it effectively behave as an exact join
7378818
to
045dcb2
Compare
for current in range(1, len(expected_to_be_equal_without_index)): | ||
prev = current - 1 | ||
prev_df = expected_to_be_equal_without_index[prev] | ||
current_df = expected_to_be_equal_without_index[current] | ||
|
||
assert current_df.drop("index").equals(prev_df.drop("index")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to rewrite this to a parameterize
? Asserts in loops can be quite hard to debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be more readable now, the loop was basically just an assert that all the dataframes, without the index
were equal, index
was approximately, but not strictly equal as it contained the times +/- the .5s
Fixes 3 bugs with asof join relating to time-based observations/responses:
time
was in both theon
andby
argument, see closer description at the bottom of this comment***backward
, what we want itnearest
, as the observations/responses may deviate both forwards and backwards in timeWe also want to sort the summary/observations before doing the asof join, meaning we can discard the sorting of observations by obs_name, as that sorting is applied explicitly just before the join anyway. Meaning, we can store time-based responses and observations sorted by time.
***For the polars asof join, it seems that including the
on
argument in theby
argument will make it behave like an exact join, thus we excludetime
from the `on argument.