Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine ts_init Query to Exclude Start Timestamp #1568

Closed
wants to merge 1 commit into from

Conversation

rsmb7z
Copy link
Collaborator

@rsmb7z rsmb7z commented Mar 29, 2024

Pull Request

Updated the ts_init query condition to exclude the start timestamp (ts_init > {start_ts}), mitigating duplicate records in sequential queries. For flexibility, an optional flag for including/excluding the start timestamp maybe added in case keeping current use case is necessary.

@@ -541,7 +541,7 @@ def _build_query(

if start:
start_ts = dt_to_unix_nanos(start)
conditions.append(f"ts_init >= {start_ts}")
conditions.append(f"ts_init > {start_ts}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely we want to be inclusive of the start?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having end inclusive I don't think start should be included.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats your reasoning here? I think this would be surprising behaviour to be start exclusive - normally the choice is end inclusive or exclusive (but start is always inclusive for most APIs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason we wouldn't want to make this change is Databento is start inclusive and end exclusive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to adopting the standard of making the end exclusive as well. My main concern arises when both start and end points are inclusive, as that's where potential issues of duplicate record.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we just need one end exclusive - so I think better for that to be end for consistency.

It can still be surprising to some users though to have exclusive endpoints in ranges.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, it seems this issue wouldn't typically arise during normal use. Additionally, I've identified a similar situation elsewhere in the catalog. To sidestep the complexity of adding flags and the potential impact on other users, I'll close this matter for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make a good point about duplicate records if the catalog is repeatedly queried, as start and end are both inclusive. I'll have a deeper think on this and look at the use cases.

@rsmb7z rsmb7z closed this Mar 30, 2024
@rsmb7z rsmb7z deleted the pr_240329b branch April 11, 2024 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants