-
Notifications
You must be signed in to change notification settings - Fork 673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refine ts_init Query to Exclude Start Timestamp #1568
Conversation
@@ -541,7 +541,7 @@ def _build_query( | |||
|
|||
if start: | |||
start_ts = dt_to_unix_nanos(start) | |||
conditions.append(f"ts_init >= {start_ts}") | |||
conditions.append(f"ts_init > {start_ts}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely we want to be inclusive of the start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having end inclusive I don't think start should be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats your reasoning here? I think this would be surprising behaviour to be start exclusive - normally the choice is end
inclusive or exclusive (but start
is always inclusive for most APIs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason we wouldn't want to make this change is Databento is start
inclusive and end
exclusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to adopting the standard of making the end exclusive as well. My main concern arises when both start and end points are inclusive, as that's where potential issues of duplicate record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we just need one end exclusive - so I think better for that to be end
for consistency.
It can still be surprising to some users though to have exclusive endpoints in ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, it seems this issue wouldn't typically arise during normal use. Additionally, I've identified a similar situation elsewhere in the catalog. To sidestep the complexity of adding flags and the potential impact on other users, I'll close this matter for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make a good point about duplicate records if the catalog is repeatedly queried, as start and end are both inclusive. I'll have a deeper think on this and look at the use cases.
Pull Request
Updated the
ts_init
query condition to exclude thestart
timestamp (ts_init > {start_ts}), mitigating duplicate records in sequential queries. For flexibility, an optional flag for including/excluding the start timestamp maybe added in case keeping current use case is necessary.