Fixup logic related to asof join #9913

Open
wants to merge 5 commits into
base: main

yngve-sk
Contributor

@yngve-sk yngve-sk commented Jan 30, 2025

Fixes 3 bugs with asof join relating to time-based observations/responses:

  1. Non-exact time matches were filtered out before the asof join ever happened, making it behave like an exact join.
  2. `time` was in both the `on` and the `by` argument; see the closer description at the bottom of this comment.***
  3. The default strategy is apparently `backward`; what we want is `nearest`, as the observations/responses may deviate both forwards and backwards in time.

We also want to sort the summary/observations before doing the asof join. This means we can discard the sorting of observations by obs_name, since the sorting needed for the join is applied explicitly just before it anyway. It also means we can store time-based responses and observations sorted by time.

***For the polars asof join, it seems that including the `on` argument in the `by` argument makes it behave like an exact join, so we exclude `time` from the `by` argument.

pivoted
Out[1]: 
shape: (5, 3)
┌──────────────┬─────────────────────────┬─────┐
│ response_key ┆ time                    ┆ 0   │
│ ---          ┆ ---                     ┆ --- │
│ cat          ┆ datetime[ms]            ┆ f32 │
╞══════════════╪═════════════════════════╪═════╡
│ FOPR         ┆ 2000-01-01 00:59:59.500 ┆ 0.0 │
│ FOPT_OP1     ┆ 2000-01-01 01:00:00.500 ┆ 1.0 │
│ FOPR:OP3     ┆ 2000-01-01 00:59:59.500 ┆ 2.0 │
│ FLAP         ┆ 2000-01-01 01:00:00.500 ┆ 3.0 │
│ F*           ┆ 2000-01-01 00:59:59.500 ┆ 4.0 │
└──────────────┴─────────────────────────┴─────┘
observations_by_type
Out[2]: 
{'summary': shape: (5, 5)
 ┌─────────────────┬──────────────┬─────────────────────┬──────────────┬─────┐
 │ observation_key ┆ response_key ┆ time                ┆ observations ┆ std │
 │ ---             ┆ ---          ┆ ---                 ┆ ---          ┆ --- │
 │ str             ┆ str          ┆ datetime[ms]        ┆ f32          ┆ f32 │
 ╞═════════════════╪══════════════╪═════════════════════╪══════════════╪═════╡
 │ o_FOPR          ┆ FOPR         ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 │
 │ o_FOPT_OP1      ┆ FOPT_OP1     ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 │
 │ o_FOPR:OP3      ┆ FOPR:OP3     ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 │
 │ o_FLAP          ┆ FLAP         ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 │
 │ o_F*            ┆ F*           ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 │
 └─────────────────┴──────────────┴─────────────────────┴──────────────┴─────┘}
observations_for_type.join_asof(
    pivoted,
    by=[
        "response_key",
        "time",
    ],
    on="time",
    strategy="nearest",
    tolerance="1s",
)
Out[3]: 
shape: (5, 6)
┌─────────────────┬──────────────┬─────────────────────┬──────────────┬─────┬──────┐
│ observation_key ┆ response_key ┆ time                ┆ observations ┆ std ┆ 0    │
│ ---             ┆ ---          ┆ ---                 ┆ ---          ┆ --- ┆ ---  │
│ str             ┆ cat          ┆ datetime[ms]        ┆ f32          ┆ f32 ┆ f32  │
╞═════════════════╪══════════════╪═════════════════════╪══════════════╪═════╪══════╡
│ o_FOPR          ┆ FOPR         ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ null │
│ o_FOPT_OP1      ┆ FOPT_OP1     ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ null │
│ o_FOPR:OP3      ┆ FOPR:OP3     ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ null │
│ o_FLAP          ┆ FLAP         ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ null │
│ o_F*            ┆ F*           ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ null │
└─────────────────┴──────────────┴─────────────────────┴──────────────┴─────┴──────┘
observations_for_type.join_asof(
    pivoted,
    by=[
        "response_key",
    ],
    on="time",
    strategy="nearest",
    tolerance="1s",
)
Out[4]: 
shape: (5, 6)
┌─────────────────┬──────────────┬─────────────────────┬──────────────┬─────┬─────┐
│ observation_key ┆ response_key ┆ time                ┆ observations ┆ std ┆ 0   │
│ ---             ┆ ---          ┆ ---                 ┆ ---          ┆ --- ┆ --- │
│ str             ┆ cat          ┆ datetime[ms]        ┆ f32          ┆ f32 ┆ f32 │
╞═════════════════╪══════════════╪═════════════════════╪══════════════╪═════╪═════╡
│ o_FOPR          ┆ FOPR         ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ 0.0 │
│ o_FOPT_OP1      ┆ FOPT_OP1     ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ 1.0 │
│ o_FOPR:OP3      ┆ FOPR:OP3     ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ 2.0 │
│ o_FLAP          ┆ FLAP         ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ 3.0 │
│ o_F*            ┆ F*           ┆ 2000-01-01 01:00:00 ┆ 1.0          ┆ 0.1 ┆ 4.0 │
└─────────────────┴──────────────┴─────────────────────┴──────────────┴─────┴─────┘

codspeed-hq bot commented Jan 30, 2025

CodSpeed Performance Report

Merging #9913 will improve performances by 10.52%

Comparing yngve-sk:25.01.30.fixup-joining (fb0f7b7) with main (c6128f7)

Summary

⚡ 1 improvements
✅ 24 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| `test_load_from_context[gen_x: 20, sum_x: 20 reals: 10]` | 7 ms | 6.3 ms | +10.52% |

@yngve-sk yngve-sk force-pushed the 25.01.30.fixup-joining branch from db162f9 to 7378818 Compare January 30, 2025 10:41
@yngve-sk yngve-sk force-pushed the 25.01.30.fixup-joining branch from 7378818 to 045dcb2 Compare January 31, 2025 09:40
@yngve-sk yngve-sk changed the title Sort obs&responses by time for asof_join Fixup logic related to asof join Jan 31, 2025
@yngve-sk yngve-sk self-assigned this Jan 31, 2025
@yngve-sk yngve-sk added the release-notes:bug-fix Automatically categorise as bug fix in release notes label Jan 31, 2025
@yngve-sk yngve-sk requested a review from oyvindeide January 31, 2025 11:06
Comment on lines 726 to 731
for current in range(1, len(expected_to_be_equal_without_index)):
    prev = current - 1
    prev_df = expected_to_be_equal_without_index[prev]
    current_df = expected_to_be_equal_without_index[current]

    assert current_df.drop("index").equals(prev_df.drop("index"))
Collaborator

Is it possible to rewrite this as a parametrized test? Asserts in loops can be quite hard to debug.

Contributor Author

Should be more readable now. The loop was basically just an assert that all the dataframes, without the index, were equal. The index was approximately, but not strictly, equal, as it contained the times +/- the .5s.

@yngve-sk yngve-sk requested a review from oyvindeide January 31, 2025 13:15
Labels
release-notes:bug-fix Automatically categorise as bug fix in release notes
Projects
Status: Ready for Review
2 participants