Refine ts_init Query to Exclude Start Timestamp #1568

rsmb7z · 2024-03-29T19:58:17Z

Pull Request

Updated the ts_init query condition to exclude the start timestamp (ts_init > {start_ts}), which avoids duplicate records in sequential queries. For flexibility, an optional flag for including or excluding the start timestamp may be added if the current behaviour needs to be preserved.
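A minimal sketch of what such an optional flag could look like. The helper name `build_ts_conditions` and the `include_start` parameter are hypothetical and are not part of the catalog's `_build_query`. The inclusive end condition is assumed from the discussion on this PR, not taken from the diff.

```python
def build_ts_conditions(start_ts=None, end_ts=None, include_start=True):
    """Hypothetical helper: build a ts_init filter string with a switchable start bound."""
    conditions = []
    if start_ts is not None:
        # include_start=True keeps the current behaviour (ts_init >= start_ts)
        op = ">=" if include_start else ">"
        conditions.append(f"ts_init {op} {start_ts}")
    if end_ts is not None:
        # Assumption: the end bound stays inclusive
        conditions.append(f"ts_init <= {end_ts}")
    return " AND ".join(conditions)


default_query = build_ts_conditions(start_ts=10, end_ts=20)
exclusive_query = build_ts_conditions(start_ts=10, end_ts=20, include_start=False)
print(default_query)
print(exclusive_query)
```

Defaulting `include_start` to `True` would leave existing callers unaffected, so only sequential queries would need to opt in.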

cjdsellers · 2024-03-29T20:07:13Z

nautilus_trader/persistence/catalog/parquet.py

@@ -541,7 +541,7 @@ def _build_query(

        if start:
            start_ts = dt_to_unix_nanos(start)
-            conditions.append(f"ts_init >= {start_ts}")
+            conditions.append(f"ts_init > {start_ts}")


Surely we want to be inclusive of the start?

With the end inclusive, I don't think the start should be included as well.

What's your reasoning here? I think a start-exclusive range would be surprising behaviour. Normally the choice is whether the end is inclusive or exclusive (the start is inclusive in most APIs).

One reason we wouldn't want to make this change is that Databento is start-inclusive and end-exclusive.

I'm open to adopting the standard of making the end exclusive as well. My main concern arises when both start and end points are inclusive, as that's where duplicate records can occur.

In that case, we just need one endpoint to be exclusive, so I think it's better for that to be the end, for consistency.

It can still be surprising to some users though to have exclusive endpoints in ranges.

Understood, it seems this issue wouldn't typically arise during normal use. Additionally, I've identified a similar situation elsewhere in the catalog. To sidestep the complexity of adding flags and the potential impact on other users, I'll close this matter for now.

You make a good point about duplicate records if the catalog is repeatedly queried, as start and end are both inclusive. I'll have a deeper think on this and look at the use cases.
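To illustrate the duplicate-record concern, here is a small self-contained sketch. It uses a plain list of made-up ts_init values in place of the catalog, and the two query functions are hypothetical stand-ins for inclusive and end-exclusive range filters.

```python
# Made-up ts_init values standing in for stored records
ts_init_values = [100, 200, 300, 400, 500]

# Sequential windows where each end is reused as the next start
windows = [(100, 300), (300, 500)]


def query_closed(start, end):
    """Start inclusive, end inclusive."""
    return [t for t in ts_init_values if start <= t <= end]


def query_half_open(start, end):
    """Start inclusive, end exclusive."""
    return [t for t in ts_init_values if start <= t < end]


closed = [t for window in windows for t in query_closed(*window)]
half_open = [t for window in windows for t in query_half_open(*window)]

print(closed)     # the boundary record at 300 appears twice
print(half_open)  # each record appears at most once
```

With end-exclusive windows, the final window has to end past the last timestamp for the record at 500 to be returned, which is the kind of behaviour some users may find surprising.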

Change ts_init condition to exclusive in query

2c08b08

cjdsellers reviewed Mar 29, 2024


rsmb7z closed this Mar 30, 2024

rsmb7z deleted the pr_240329b branch April 11, 2024 11:17


