Ingest events backwards #6834
A case for importance: being able to replay events later is already called out (and we saw a need for that this week), but additionally the network is unpredictable. Sometimes a packet sent later will arrive earlier, sometimes an earlier packet fails and will be retried later. For
Regardless of the order of arrival we want the value to be 1 in the end. If 2 is sent first, we need to know it was a Furthermore this needs to also end up with val = 1 even if packet 1 is sent later.
I propose we keep an additional column Second tricky function is |
Hmm... how come we need to complicate this? We give each property a timestamp. Then if Or what am I missing? |
Imagine someone uses |
I'm still not following. If requests arrive later because of network issues, they should still contain the original timestamp, no? That's why we have all these |
I'm not actually sure how & where these three are set, I was making the assumption that we only have
Sorry, this can be confusing & maybe I'm misunderstanding something. You can check the table below, maybe that's more helpful. To elaborate on my earlier example ... let me provide the exact calls I was thinking of & what we'd want to happen. I'm also numbering my statements, so it's easier to call out where it stops making sense.
(1) For this in the end we want the outcome to be
(4) Now imagine network was bad and the requests arrive in the reverse order
(5) After having processed the first
(7) then we see that
Here's a table to explain when the override should happen. The example I provided above corresponds to row number 7. Because rows 3 & 4 (and also 7 & 8) differ depending on which previous call wrote the value, I can't think of a way to override correctly without that information.
|
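To make the override discussion above concrete, here is a minimal sketch of one possible rule. It is not the implementation from this thread: the names PropertyWrite, apply_write and ingest are hypothetical, and it assumes only two kinds of write, "set" (latest timestamp wins) and "set_once" (earliest timestamp wins, and any "set" beats it). The point is only that storing the last operation next to the timestamp lets each write be compared against what is already stored, so the final value does not depend on arrival order.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class PropertyWrite:
    key: str
    value: Any
    timestamp: int   # time the client recorded the event, not arrival time
    operation: str   # "set" or "set_once" (assumed operation names)


def _rank(write: PropertyWrite) -> tuple:
    # Assumed precedence: any "set" beats any "set_once";
    # among "set" the newest wins, among "set_once" the oldest wins.
    if write.operation == "set":
        return (1, write.timestamp)
    return (0, -write.timestamp)


def apply_write(state: Dict[str, PropertyWrite], write: PropertyWrite) -> None:
    # Keep the stored write unless the incoming one ranks strictly higher.
    current = state.get(write.key)
    if current is None or _rank(write) > _rank(current):
        state[write.key] = write


def ingest(writes: Iterable[PropertyWrite]) -> Dict[str, Any]:
    state: Dict[str, PropertyWrite] = {}
    for write in writes:
        apply_write(state, write)
    return {key: write.value for key, write in state.items()}


# Two "set_once" calls: the earlier one should win in every arrival order.
first = PropertyWrite("val", 1, timestamp=10, operation="set_once")
second = PropertyWrite("val", 2, timestamp=20, operation="set_once")
for order in permutations([first, second]):
    assert ingest(order) == {"val": 1}
```

Because the rule is just "keep the highest-ranked write", it is commutative as long as timestamps do not tie, which is what makes backwards ingestion safe in this toy model. With only a timestamp and no stored operation, the two "set_once" calls above could not be told apart from two "set" calls, and the reverse order would give the wrong answer.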
Thanks for being patient and taking time to break this down. Now I get the issue and adding this column does make a lot of sense! |
This issue has 1133 words. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:
|
#7454 Person Created At needs to also handle ingestion in any order |
Curious what the status of this is? Anything I can do to help at all (e.g. manual testing, etc)? |
Potentially ... here's how I'm thinking about doing the testing:
I suspect we'll see some differences that we'll need to ignore, e.g. maybe geoIP or maybe turning that off before is better. |
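For the "differences that we'll need to ignore" part, a comparison could look roughly like the sketch below. This is only an illustration under assumptions: diff_persons and strip_ignored are hypothetical helpers, the inputs are plain dicts mapping a person id to its properties rather than real table rows, and the "$geoip_" prefix is assumed to cover the geoIP properties in question.

```python
from typing import Any, Dict, Tuple

# Assumption: properties that may legitimately differ between runs share a prefix.
IGNORED_PREFIXES: Tuple[str, ...] = ("$geoip_",)

Properties = Dict[str, Any]


def strip_ignored(properties: Properties,
                  ignored_prefixes: Tuple[str, ...] = IGNORED_PREFIXES) -> Properties:
    # Drop keys we expect to differ so they do not show up as failures.
    return {k: v for k, v in properties.items()
            if not k.startswith(ignored_prefixes)}


def diff_persons(forward: Dict[str, Properties],
                 backward: Dict[str, Properties]) -> Dict[str, Tuple[Properties, Properties]]:
    # Return, per person id, the two property sets that still differ
    # after ignored keys are removed. An empty result means the runs match.
    diffs = {}
    for person_id in forward.keys() | backward.keys():
        a = strip_ignored(forward.get(person_id, {}))
        b = strip_ignored(backward.get(person_id, {}))
        if a != b:
            diffs[person_id] = (a, b)
    return diffs
```

Running ingestion once in order and once reversed, then passing both snapshots to diff_persons, would leave only the mismatches worth looking at. Turning geoIP off before the run, as mentioned above, would make the ignore list unnecessary.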
Relevant recent conversation (tied to person properties updates): https://posthog.slack.com/archives/C0374DA782U/p1660840149534569 We have |
This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the |
This issue was closed due to lack of activity. Feel free to reopen if it's still relevant. |
Whenever we ingest events, we end up with entries in 3 tables: events, persons and person_distinct_ids.
Excluding plugins for now, if we ingest the same events again in the same order, we will (most likely) end up with the same entries in those tables.
However, I think we should also get exactly the same entries in those tables if we ingest events... backwards. If we do, it would make ingestion, retries and exports a lot simpler. For example, we could export multiple months in parallel from one posthog instance to another and know that all persons were transferred as they are.
This task touches two things:
The details and feasibility of the above points need to be specced out.