diff --git a/Site/Posts/postgres_use_views_to_refactor_to_soft_delete.md b/Site/Posts/postgres_use_views_to_refactor_to_soft_delete.md index 736a7cf..7da531f 100644 --- a/Site/Posts/postgres_use_views_to_refactor_to_soft_delete.md +++ b/Site/Posts/postgres_use_views_to_refactor_to_soft_delete.md @@ -13,13 +13,17 @@ } ;;; +_Author's note: I have updated this article with some extra bits of information as well as a repository demonstrating this solution. I hope this is helpful!_ + In the world of persistence, there's two main patterns (maybe "groups of patterns") to handle deletion: _hard_ and _soft_ deletion. Hard deletion is the default for most database systems - when you "delete" a record it is wiped from the database; as soon as the delete is committed the data is gone forever. More common in most business scenarios - particularly server-side - is to retain the deleted data, just using a flag to hide the "deleted" data from the customer. This is soft deletion: it allows us to support more comprehensive internal reporting, maintain more complicated referential integrity, and we can support an "undo" button for our users. If you're designing a system from the ground up and you know you need to accommodate soft deletion, there's a whole host of implementations you can tailor to your needs. If you're updating an existing system from hard to soft delete though, you're more constrained. Yes, the data that has already been hard deleted is irrecoverable after your migration, but that's not your main concern. You might have a lot of tables and there's probably already a lot of code written to query against these tables. This refactor might seem like a lot of work at first glance. Luckily because [you're definitely using Postgres](https://ian.wold.guru/Posts/just_use_postgresql.html), this really isn't a major concern. There's a simple way using views and rules we can add soft deletion _without changing any of our queries_! In fact, there are two options which you might want to chose in different situations. I've implemented these strategies on a few production systems with zero downtime, and I'm sure you'll be able to make just as quick work of it as I can. -The first strategy keeps all records - those that are deleted and those that aren't - in a single table and uses a view to filter out deleted items. The second strategy adds a second table specifically for deleted items and uses a view to join the deleted and non-deleted records when needed. I have found the first option simpler to maintain and _usually_ simpler to implement, while the second option is better in scenarios where I need to run a lot of different queries on the deleted vs. non-deleted data. Note that the second option _can_ be easier to implement depending on the existing schema. These strategies could feasibly be combined in a single database, with some tables using one strategy and others the other, though consistency is probably better here. +The first strategy keeps all records - those that are deleted and those that aren't - in a single table and uses a view to filter out deleted items. The second strategy adds a second table specifically for deleted items and uses a view to join the deleted and non-deleted records when needed. I have found the first option simpler to implement and maintain, while the second option is sometimes better in scenarios where I need to run a lot of different queries on the deleted vs. non-deleted data. These strategies could feasibly be combined in a single database, with some tables using one strategy and others the other, though consistency is probably better here. + +I've set up [a repository on GitHub](https://github.com/IanWold/PostgresRefactorSoftDelete) which runs both of these solutions on a database. If you're seriously considering implementing a migration like this, I would encourage you to fork and play with my repo, I hope it can be an effective way to tinker with some of these ideas before moving into an actual codebase. Throughout the article I'll reference where each bit of code is in this repo. # Setting Up @@ -49,6 +53,8 @@ CREATE TABLE cart_items ( CONSTRAINT fk_cart FOREIGN KEY (cart_id) REFERENCES carts (cart_id) ON DELETE CASCADE, CONSTRAINT fk_item FOREIGN KEY (item_id) REFERENCES items (item_id) ON DELETE CASCADE ); + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/Program.cs#L18 [tl! autolink] ``` And we're going to need to support all the various queries currently written against these tables. In addition to all the queries `SELECT`ing from these tables, we're inserting, updating, and indeed deleting from each of these tables - those queries all need to remain the same. @@ -57,12 +63,14 @@ There's a couple interesting queries I want to consider. Take the scenario where ```sql DELETE FROM carts WHERE cart_id = @cartId +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/Program.cs#L139 [tl! autolink] ``` In this case, it's pretty straightforward to assume that its items should delete at the same time, so we'll need to preserve the cascade functionality but with soft delete in mind. How about the following though: ```sql DELETE FROM items WHERE item_id = @itemId; +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/Program.cs#L138 [tl! autolink] ``` In our current production system, when we delete that item it will delete from the cart automatically. This is a good demonstration of why we care about soft delete - by preserving the data in our database we'll be able to tell the user that their dinner plate set is no longer available. Indeed, properly supporting that functionality is going to require some extra work in our code, but remember that _that_ is a feature add: we can do our soft delete refactor and deploy it, preserving the current functionality with minimal effort. Later iterations can build on this work to add the new capabilities in. @@ -85,6 +93,8 @@ ALTER TABLE items ADD COLUMN deleted TIMESTAMP DEFAULT NULL; ALTER TABLE carts ADD COLUMN deleted TIMESTAMP DEFAULT NULL; ALTER TABLE cart_items ADD COLUMN deleted TIMESTAMP DEFAULT NULL; + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SingleTableMigration.cs#L6 [tl! autolink] ``` This can be deployed at any time and will not affect the rest of the system unless you're doing something squirrely with your tables. Just note though that if there's a long gap between your migration steps, new tables should have this column included. Because this column isn't used, it will be null for all records, and until we proceed to the next steps records will still be hard deleted. @@ -106,6 +116,8 @@ CREATE VIEW carts AS SELECT * FROM carts_all WHERE deleted IS NULL; ALTER TABLE cart_items RENAME TO cart_items_all; CREATE VIEW cart_items AS SELECT * FROM cart_items_all WHERE deleted IS NULL; + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SingleTableMigration.cs#L12 [tl! autolink] ``` Here I use the convention that the table name is suffixed "_all", signifying of course that the tables contain "all" of the records. Your requirements might be different, so pick a naming convention that makes sense in the context. I recommend [being extremely consistent](https://ian.wold.guru/Posts/its_better_to_be_consistently_incorrect_than_consistently_correct.html) here, and I also recommend picking a convention that is obvious in distinguishing the tables. @@ -121,9 +133,11 @@ There's two steps we need to take here at once. First, we need to tell Postgres To redefine `DELETE`, Potgres allows us to create a _rule_ to perform an alternate action _instead_. Typical for Postgres, this syntax is quite straightforward: ```sql -CREATE RULE rule_soft_delete AS ON DELETE TO items DO INSTEAD (UPDATE items_all SET deleted = CURRENT_TIMESTAMP WHERE id = OLD.id); -CREATE RULE rule_soft_delete AS ON DELETE TO carts DO INSTEAD (UPDATE carts_all SET deleted = CURRENT_TIMESTAMP WHERE id = OLD.id); -CREATE RULE rule_soft_delete AS ON DELETE TO cart_items DO INSTEAD (UPDATE cart_items_all SET deleted = CURRENT_TIMESTAMP WHERE id = OLD.id); +CREATE RULE rule_soft_delete AS ON DELETE TO items DO INSTEAD (UPDATE items_all SET deleted = CURRENT_TIMESTAMP WHERE item_id = OLD.item_id); +CREATE RULE rule_soft_delete AS ON DELETE TO carts DO INSTEAD (UPDATE carts_all SET deleted = CURRENT_TIMESTAMP WHERE cart_id = OLD.cart_id); +CREATE RULE rule_soft_delete AS ON DELETE TO cart_items DO INSTEAD (UPDATE cart_items_all SET deleted = CURRENT_TIMESTAMP WHERE cart_item_id = OLD.cart_item_id); + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SingleTableMigration.cs#L23 [tl! autolink] ``` And then we need to handle the cascades. Let's start with the simple case where we know that deleting a cart should cascade to each of its items. The rule we define will act on updates to the `carts_all` table and propogate new, non-null values for `deleted` to the cart items: @@ -131,7 +145,9 @@ And then we need to handle the cascades. Let's start with the simple case where ```sql CREATE RULE rule_cascade_deleted_cart_items AS ON UPDATE TO carts_all WHERE OLD.deleted IS DISTINCT FROM NEW.deleted - DO ALSO UPDATE cart_items_all SET deleted = NEW.deleted WHERE cart_id = OLD.id; + DO ALSO UPDATE cart_items_all SET deleted = NEW.deleted WHERE cart_id = OLD.cart_id; + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SingleTableMigration.cs#L27 [tl! autolink] ``` Recall that the other cascade case, where we delete an item, is more complicated. The current behavior (which we are just trying to preserve for now) is that items are deleted from a cart when the items themselves are deleted. The state we want to achieve (some time after this migration) is that items will still show in the cart but as "no longer available". That will take some extra coding to implement, however depending on the current state of our system we might be able to achieve some cost savings. @@ -153,6 +169,7 @@ At this point, your migration is finished. Congratulations! In order to get reco ```sql SELECT * FROM items_all WHERE ... +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SingleTableMigration.cs#L37 [tl! autolink] ``` In order to hard delete any records, you can `DELETE` from the `_all` table: @@ -184,6 +201,8 @@ CREATE TABLE cart_items_deleted ( deleted TIMESTAMP DEFAULT CURRENT_TIMESTAMP, LIKE cart_items INCLUDING ALL ); + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SeparateTableMigration.cs#L6 [tl! autolink] ``` My same note from earlier about naming conventions applies, though I think `_deleted` is probably the suffix 99% of us will ever use for these. @@ -193,13 +212,17 @@ There is one issue with this snippet though - can you spot it? The problem is in ```sql CREATE TABLE cart_items_deleted ( deleted TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - LIKE cart_items INCLUDING ALL, + LIKE cart_items, CONSTRAINT fk_cart FOREIGN KEY (cart_id) REFERENCES carts_deleted (cart_id) ON DELETE CASCADE, CONSTRAINT fk_item FOREIGN KEY (item_id) REFERENCES items_deleted (item_id) ON DELETE CASCADE ); ``` +But wait, there's more! This works all well and good so long as the carts and items you're referencing from the `cart_items_deleted` table actually exist in `items_deleted` and `carts_deleted`. This is sure to not be the case, as it makes perfect sense that I might delete a cart item without that item having been deleted! Supposing that we _did_ have a schema where we would expect the referant to be deleted also though, are we sure that _that_ record is being deleted first? + +This can all get quite tricky. If you don't care about the referential integrity of your soft-delted data (but trust me, you probably _should_), then you can just forego the foreign key constraints. If you do care about the references though, then you'll need to spend some time thinking about the best approach here. A whole article can be written on potential approaches so I won't distract us here, but beware of this trap! + This demonstrates the extra complexity of this second approach. It allows for a greater separation between deleted and non-deleted items, though it's important to consider which functionality and level of complexity your system requires. ## Adding the Combined Views @@ -210,8 +233,12 @@ There are inevitably operations we're going to want to perform on the whole data CREATE VIEW items_combined AS SELECT null AS deleted, * FROM items UNION ALL SELECT * FROM items_deleted; CREATE VIEW carts_combined AS SELECT null AS deleted, * FROM carts UNION ALL SELECT * FROM carts_deleted; CREATE VIEW cart_items_combined AS SELECT null AS deleted, * FROM cart_items UNION ALL SELECT * FROM cart_items_deleted; + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SeparateTableMigration.cs#L23 [tl! autolink] ``` +At least that's easy! + ## Make Delete Not Delete In order to make `DELETE`s perform a soft delete in this setup, we need to do a couple things when deleting: first, we need to copy the record into the respective `_deleted` table, then we need to allow the delete to actually happen. @@ -265,6 +292,8 @@ END; $$ LANGUAGE plpgsql; CREATE TRIGGER trigger_cart_items_delete BEFORE DELETE ON items FOR EACH ROW EXECUTE FUNCTION cart_items_soft_delete(); + +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SeparateTableMigration.cs#L29 [tl! autolink] ``` You then have the same considerations regarding cascades and foreign key references as you had in the last refactor option. I'll leave that as an exercise for you to extend the trigger functions to include that behavior. @@ -275,6 +304,7 @@ This is either the benefit or the drawback of this setup compared to the first s ```sql SELECT * FROM items_deleted WHERE ... +-- https://github.com/IanWold/PostgresRefactorSoftDelete/blob/main/SeparateTableMigration.cs#L58 [tl! autolink] ``` And of course all records together from the separate view: