-
-
Notifications
You must be signed in to change notification settings - Fork 725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove old abandoned carts #6740
Remove old abandoned carts #6740
Conversation
930f0e7
to
01bccaf
Compare
Wow that sounds cooool! 🔥 FYI I had created this issue: #5236 which is still a question we get on a regular basis. |
|
||
old_cart_adjustments.delete_all | ||
old_cart_line_items.delete_all | ||
old_carts.delete_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of spree_option_values_line_items as a table to clear as well.
Looking at the DB I think we still have tables related to spree promotions... we could delete those. A different clear up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of spree_option_values_line_items as a table to clear as well.
I just had a quick look at this, but I'm a bit nervous about proceeding in deleting those records as part of this task. In most cases option_values
seem to be related to variants and products (with clear join tables / classes / associations representing the joins), but I can't see anywhere that explicitly defines or delineates option_values
records' connection to line items as being separate from variants. It seems pretty murky. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are missing this:
line_item.rb
has_and_belongs_to_many :option_values, join_table: 'spree_option_values_line_items',
class_name: 'Spree::OptionValue'
and extensively used in concern VariantAndLineItemNaming.
I dont think option_values are directly associated with products but are associated with variants in
spree_option_values_variants is for variants.
I think it's safe to delete stuff from spree_option_values_line_items for the old_line_items you delete here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've added this in, but it required some raw SQL, as we're using #delete_all
on the other queries to avoid instantiating the objects or triggering callbacks and we don't have a "nice" way of calling AR methods directly on the contents of that join table.
Also worth noting: we already have some orphaned records in this join table (spree_option_values_line_items
) in production, so those will be cleared up as well 👍
ab2f26c
to
4773c19
Compare
ON spree_option_values_line_items.line_item_id = spree_line_items.id | ||
WHERE spree_line_items.id IS NULL | ||
); | ||
SQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly checked the query plan for this (assuming this DELETE will have a similar cost of a SELECT):
openfoodnetwork=> EXPLAIN SELECT FROM spree_option_values_line_items (...)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=5696.25..5778.98 rows=235 width=0)
-> HashAggregate (cost=5695.96..5698.31 rows=235 width=4)
Group Key: spree_option_values_line_items_1.line_item_id
-> Hash Anti Join (cost=3199.86..5695.37 rows=235 width=4)
Hash Cond: (spree_option_values_line_items_1.line_item_id = spree_line_items.id)
-> Seq Scan on spree_option_values_line_items spree_option_values_line_items_1 (cost=0.00..1210.40 rows=82940 width=4)
-> Hash (cost=1842.05..1842.05 rows=82705 width=4)
-> Seq Scan on spree_line_items (cost=0.00..1842.05 rows=82705 width=4)
-> Index Only Scan using index_option_values_line_items_on_line_item_id on spree_option_values_line_items (cost=0.29..0.33 rows=1 width=4)
Index Cond: (line_item_id = spree_option_values_line_items_1.line_item_id)
(10 rows)
and I'm afraid this will put our DBs in a hard situation. That Seq Scan on spree_line_items (cost=0.00..1842.05 rows=82705 width=4)
bit needs some deeper investigation and perhaps some indexing. Have you checked writing replacing IN with joins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a basic AR model to map the join table (we have a similar one for Spree::ProductOptionType
) and I've moved things around a bit. #delete_all
is now being called directly on a set of records in that join table, and the column being used (spree_option_values_line_items.line_item_id
) already has an index on it in the schema 👍
I don't think we can get any better than that...?
99d192e
to
aed912a
Compare
I've made some changes to foreign keys here to allow bulk deletion of orders without instantiating lots of objects or triggering callbacks, using Rails 4's Background here: https://thoughtbot.com/blog/referential-integrity-with-foreign-keys#cascading-deletes |
Sessions and cart data are removed if older than 3 months, instead of 6.
c30e018
to
f4dc5ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a bulky clean-up, at least the first time, so I'd prefer to update config/schedule.rb to run it still once a month but at night. Running this during business hours could be catastrophic.
@sauloperez I've updated the task to run at 4:30am 👌 |
Cool! thanks @Matt-Yorkley I'll sleep better now 😅 |
4a7e2e7
to
5bce372
Compare
5bce372
to
131a070
Compare
131a070
to
9791287
Compare
Hey @Matt-Yorkley , Thank you for the detailed testing notes! And for fixing the deploy issues so quickly. I tested this on staging-UK, and it worked like a charm. It cleaned up 325274 orders (corrected) in the cart state, in around a minute time (a rough estimation): So, it will be a bulky operation in production - around 3 times (corrected) more such orders. But yeah, 4:30 AM is probably not shopping peak ;-) Ready to go 🎉 |
It's 325274 records removed, not 32574. Is that correct @filipefurtad0 ? |
Absolutely @sigmundpetersen , I missed a figure there... I'll correct the number so it's not misleading. |
What? Why?
Removes cart orders (and associated records) that have not been touched for more than 3 months.
On UK prod, the number of stale carts (not touched in > 3 months) is currently 1,041,785. And that's not including all the related line items and adjustments...
For reference, the total number of all orders on UK prod is 1,354,292, so ~77% of them are stale carts.
🔥🔥
Context: we'll need to do some big migrations at some points around orders/line_items/adjustments, and it'll be a lot smoother if we can clear lots of junk out of our databases first...
What should we test?
Note: requires use of the Rails console.
Spree::Order.where("state = 'cart' AND updated_at < '2020-10-01'").count
bundle exec rake ofn:data:remove_transient_data
Spree::Order.where("state = 'cart' AND updated_at < '2020-10-01'").count
Release notes
Added data cleanup job for removing old cart orders.
Changelog Category: Technical changes