Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for rows updated in the same transaction checks for all the transactions in the instance #36

Closed
shrodingers opened this issue Sep 27, 2022 · 14 comments
Assignees

Comments

@shrodingers
Copy link

shrodingers commented Sep 27, 2022

Hello !

Just ran into some issues where history was not updated for lots of updates, and saw that the txid_current_snapshot() was used in the versionning function to check if the Old row was updated in the current transaction.
However, according to some testings on my end, plus postgres doc, seems that current_snapshot takes the min transaction not commited on the instance, and the max is always the not yet assigned txid at the time of the snapshot. Which means OLD.xid can never be above max, and if there is a long-running transaction in the instance (ie: cleanup work / data export etc..), the OLD.xmin will be between the snapshot min and max. Meaning we won't create an entry inside history despite the fact that we shall have.
I think the way to correctly check for that is to use txid_current to check against OLD.xmin.

I already tested this update to solve my issue, and would gladly submit a PR if you think my reasoning makes sense :)

Thanks again for this repo which helped me a lot since a long time !

EDIT from @jack-robson : The references to txid_current_snapshot, OLD.xmin and OLD.xmax reference an old version of the codebase. This could was changed to the code shown in the screenshot below, in this PR: #29

@denes-fekeshazy denes-fekeshazy self-assigned this Apr 3, 2023
@denes-fekeshazy denes-fekeshazy removed their assignment Apr 13, 2023
@jack-robson
Copy link
Contributor

jack-robson commented Oct 18, 2023

Hello 👋

If I'm not mistaken, we could remove this code from versioning_function.sql:

Screenshot 2023-10-18 at 11 07 19

Which currently is causing the function to ignore any updates made to rows within the same transaction, and those transactions that are slow and outdated when they finally commit. By removing this check, we would be able to version every change, providing a more complete history of changes to each row.

Though this comes with some potential downsides:

  • Performance: More entries in the history table, possibly affecting query performance.
  • Storage: More disk space used.

Also, this would be a breaking change, as it would alter the existing behaviour and warrants a discussion.

@jack-robson
Copy link
Contributor

jack-robson commented Oct 18, 2023

For ease of discussion and potential testing if implemented, here is a mock script that emulates some of the use cases.

version_example.txt

This script does the following:

  • Create employee and employee_history tables.
  • Add versioning trigger to employee.
  • Insert 'Alice' into employee.
  • Start transaction; update name to 'Bob'; don't commit.
  • In new transaction, change name to 'Charlie', then 'Derek'; commit.
  • Another transaction; change name to 'Eva'; commit.
  • Commit first transaction ('Bob').

I guess the design question is what do we expect to be tracked if we ran this example?

What happens now?

What currently happens is that at the end of the whole set of transactions:

  • Eva is the user in the main table
  • Derek is the last inserted row in the history table, preceded by Alice.

This behavior is consistent with postgres and the original C extension because postgres, whether with or without any extension installed, never sees Bob, therefore the history table is consistent with that behavior as well (or in other words, Eva never appears in the history table because Bob never "displaces" her from the main table).

What should happen?

It makes sense Eva is the last employee value recorded in the main table, as it's the last transaction to start. But what should be tracked in employee history?

  1. Should Bob be there? (outdated tx) -> (@simoneb ) I don't think so, because since the main table never sees Bob, then I don't see why Bob should end up in the history table
  2. Should Charlie? (outdated value inside one tx) -> (@simoneb ) definitely not because Charlie also never ends up in the main table

Note: In standard PostgreSQL system-versioning, tstzrange is set at UPDATE, not COMMIT. An old UPDATE committed late won't become the latest value due to its earlier timestamp. Therefore Eva is the last name of the Employee, not Bob.

@jack-robson jack-robson self-assigned this Oct 18, 2023
@olafurw
Copy link

olafurw commented Oct 18, 2023

Talk about timing. I'm doing a migration and I just hit this issue, and since with migration we want an all or nothing we wrap everything in a transaction.

@jack-robson
Copy link
Contributor

Hey @olafurw 👋

Funny timing indeed 😁

Sounds like your doing some sort of atomic transaction? Is that right?

I've been tasked to see if this plugin still solves a valid use case. When this was created 6 years ago, AWS RDS and other providers didn't have support for the original Temporal Tables C extension. Are you still having this problem?

Re: this issue, what sort of scenarios are not being supported for your current use case?

@olafurw
Copy link

olafurw commented Oct 19, 2023

Hey. We are using temporal tables for providing history of data and running it on AWS RDS which from our search doesn't support the C version through trusted extensions, so we're using the function version.

We're migrating from a MySQL db that didn't use temporal tables to a Postgres db that will use it and it's important while migrating to write things in transactions. One of the scripts has the whole history of an object in one transaction and the history table didn't update but hotfixing the function with your diff does the job.

@shrodingers
Copy link
Author

For a bit of history about this issue, it was for a previous issue solved by #29 :) Which was causing issues even when not in the same transaction.

But the point of keeping multi-updates in the same transaction in history or not still seems valid, maybe good for this behaviour to be configurable (I do not know how the C extension behaves in such situation)

@jack-robson
Copy link
Contributor

Glad my diff made a difference 😁

Yeh, a configurable option sounds interesting

I will look into C extension today and go from there.

Thanks for the comments everyone 👍

@shrodingers
Copy link
Author

Thanks for all @jack-robson ! Was glad to see that the fix we did last year was exactly the same that you did in #29 pr :)

Have a nice day !

@olafurw
Copy link

olafurw commented Oct 19, 2023

+1 for configuration option. I don't mind what the default is, it could keep the current behavior as default and a configuration to turn it off.

@jack-robson
Copy link
Contributor

jack-robson commented Oct 19, 2023

After some more research, and internal discussion, we have decided not to support the changes discussed in this issue.

Our main reasoning, is that we want mimic the behaviour set by PostgreSQL and the original C extension.

Going back to my comment above (which has been edited for extra clarity) we don't want Bob and Charlie to end up in the history table, because they never make it to the main table.

The goal of original temporal table extension is to track changes when they are made to a main table, and we feel moving away from that would not be in the best interest of maintaining consistent and expected behavior for our users. Therefore, we're sticking with the current implementation.

Thanks for your invaluable input and the time invested in this discussion. While we're not implementing these changes now, your suggestions have sparked important internal conversations and may shape future updates. We genuinely value community contributions like yours.

@olafurw
Copy link

olafurw commented Oct 19, 2023

Not even as an optional configuration flag? That's disappointing to hear. I guess I'll keep using the patched version then.

@simoneb
Copy link
Member

simoneb commented Oct 19, 2023

Not even as an optional configuration flag? That's disappointing to hear. I guess I'll keep using the patched version then.

Why would you want the history to keep track of changes that, as far as the main table is concerned, never actually happened?

@shrodingers
Copy link
Author

I can understand the use case :
IE: you are replaying all changes from a database to another in the case of a migration, and running it all in the same transaction, and you want the temporal_tables to be updated with all the changes that happen (I can totally such need in the case of wal-replication from a database to another).
Or for audit reason you wanna get some changes which, even if never commited, could have been and need to be kept.
Or you want to keep track to be able to debug some stuff happening in the same transaction, if changes happen but never makes it to the final table.
Anyways, I still think it can be nice to be able to opt-in this kind of behaviour, in the same flavour as the ignore updates when no change flag which also diverges from original implementation.

@simoneb
Copy link
Member

simoneb commented Oct 19, 2023

@olafurw @shrodingers if you do have this fairly unusual use case, for which I believe there are other, better solutions, I would recommend to open another issue. The original issue discussed in this conversation is already solved, as we figured out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants