Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove old abandoned carts #6740

Merged
merged 8 commits into from
Jan 29, 2021

Conversation

Matt-Yorkley
Copy link
Contributor

@Matt-Yorkley Matt-Yorkley commented Jan 25, 2021

What? Why?

Removes cart orders (and associated records) that have not been touched for more than 3 months.

On UK prod, the number of stale carts (not touched in > 3 months) is currently 1,041,785. And that's not including all the related line items and adjustments...

For reference, the total number of all orders on UK prod is 1,354,292, so ~77% of them are stale carts.

🔥🔥

Context: we'll need to do some big migrations at some points around orders/line_items/adjustments, and it'll be a lot smoother if we can clear lots of junk out of our databases first...

What should we test?

Note: requires use of the Rails console.

  1. In the Rails console, check the number of stale carts with: Spree::Order.where("state = 'cart' AND updated_at < '2020-10-01'").count
  2. Exit the console and do bundle exec rake ofn:data:remove_transient_data
  3. Go back to the Rails console and check the new count: Spree::Order.where("state = 'cart' AND updated_at < '2020-10-01'").count

Release notes

Added data cleanup job for removing old cart orders.

Changelog Category: Technical changes

@RachL
Copy link
Contributor

RachL commented Jan 25, 2021

Wow that sounds cooool! 🔥

FYI I had created this issue: #5236 which is still a question we get on a regular basis.


old_cart_adjustments.delete_all
old_cart_line_items.delete_all
old_carts.delete_all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of spree_option_values_line_items as a table to clear as well.

Looking at the DB I think we still have tables related to spree promotions... we could delete those. A different clear up...

Copy link
Contributor Author

@Matt-Yorkley Matt-Yorkley Jan 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of spree_option_values_line_items as a table to clear as well.

I just had a quick look at this, but I'm a bit nervous about proceeding in deleting those records as part of this task. In most cases option_values seem to be related to variants and products (with clear join tables / classes / associations representing the joins), but I can't see anywhere that explicitly defines or delineates option_values records' connection to line items as being separate from variants. It seems pretty murky. Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are missing this:
line_item.rb
has_and_belongs_to_many :option_values, join_table: 'spree_option_values_line_items',
class_name: 'Spree::OptionValue'

and extensively used in concern VariantAndLineItemNaming.

I dont think option_values are directly associated with products but are associated with variants in
spree_option_values_variants is for variants.

I think it's safe to delete stuff from spree_option_values_line_items for the old_line_items you delete here.

Copy link
Contributor Author

@Matt-Yorkley Matt-Yorkley Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've added this in, but it required some raw SQL, as we're using #delete_all on the other queries to avoid instantiating the objects or triggering callbacks and we don't have a "nice" way of calling AR methods directly on the contents of that join table.

Also worth noting: we already have some orphaned records in this join table (spree_option_values_line_items) in production, so those will be cleared up as well 👍

ON spree_option_values_line_items.line_item_id = spree_line_items.id
WHERE spree_line_items.id IS NULL
);
SQL
Copy link
Contributor

@sauloperez sauloperez Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quickly checked the query plan for this (assuming this DELETE will have a similar cost of a SELECT):

openfoodnetwork=> EXPLAIN SELECT FROM spree_option_values_line_items (...)
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5696.25..5778.98 rows=235 width=0)
   ->  HashAggregate  (cost=5695.96..5698.31 rows=235 width=4)
         Group Key: spree_option_values_line_items_1.line_item_id
         ->  Hash Anti Join  (cost=3199.86..5695.37 rows=235 width=4)
               Hash Cond: (spree_option_values_line_items_1.line_item_id = spree_line_items.id)
               ->  Seq Scan on spree_option_values_line_items spree_option_values_line_items_1  (cost=0.00..1210.40 rows=82940 width=4)
               ->  Hash  (cost=1842.05..1842.05 rows=82705 width=4)
                     ->  Seq Scan on spree_line_items  (cost=0.00..1842.05 rows=82705 width=4)
   ->  Index Only Scan using index_option_values_line_items_on_line_item_id on spree_option_values_line_items  (cost=0.29..0.33 rows=1 width=4)
         Index Cond: (line_item_id = spree_option_values_line_items_1.line_item_id)
(10 rows)

and I'm afraid this will put our DBs in a hard situation. That Seq Scan on spree_line_items (cost=0.00..1842.05 rows=82705 width=4) bit needs some deeper investigation and perhaps some indexing. Have you checked writing replacing IN with joins?

Copy link
Contributor Author

@Matt-Yorkley Matt-Yorkley Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a basic AR model to map the join table (we have a similar one for Spree::ProductOptionType) and I've moved things around a bit. #delete_all is now being called directly on a set of records in that join table, and the column being used (spree_option_values_line_items.line_item_id) already has an index on it in the schema 👍

I don't think we can get any better than that...?

@Matt-Yorkley Matt-Yorkley force-pushed the carts-cleanup branch 2 times, most recently from 99d192e to aed912a Compare January 27, 2021 18:10
@Matt-Yorkley
Copy link
Contributor Author

Matt-Yorkley commented Jan 27, 2021

I've made some changes to foreign keys here to allow bulk deletion of orders without instantiating lots of objects or triggering callbacks, using Rails 4's on_delete: :cacade. There was a circular dependency of three foreign keys between orders, shipments, and inventory_units that made it impossible.

Background here: https://thoughtbot.com/blog/referential-integrity-with-foreign-keys#cascading-deletes

Copy link
Contributor

@sauloperez sauloperez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be a bulky clean-up, at least the first time, so I'd prefer to update config/schedule.rb to run it still once a month but at night. Running this during business hours could be catastrophic.

@Matt-Yorkley
Copy link
Contributor Author

@sauloperez I've updated the task to run at 4:30am 👌

@sauloperez
Copy link
Contributor

Cool! thanks @Matt-Yorkley I'll sleep better now 😅

@filipefurtad0 filipefurtad0 self-assigned this Jan 28, 2021
@filipefurtad0 filipefurtad0 added pr-staged-fr staging.coopcircuits.fr feedback-needed and removed pr-staged-fr staging.coopcircuits.fr feedback-needed labels Jan 28, 2021
@filipefurtad0 filipefurtad0 added the pr-staged-fr staging.coopcircuits.fr label Jan 28, 2021
@filipefurtad0 filipefurtad0 removed the pr-staged-fr staging.coopcircuits.fr label Jan 28, 2021
@filipefurtad0
Copy link
Contributor

filipefurtad0 commented Jan 28, 2021

Hey @Matt-Yorkley ,

Thank you for the detailed testing notes! And for fixing the deploy issues so quickly.

I tested this on staging-UK, and it worked like a charm. It cleaned up 325274 orders (corrected) in the cart state, in around a minute time (a rough estimation):

image

So, it will be a bulky operation in production - around 3 times (corrected) more such orders. But yeah, 4:30 AM is probably not shopping peak ;-)

Ready to go 🎉

@sigmundpetersen
Copy link
Contributor

It's 325274 records removed, not 32574. Is that correct @filipefurtad0 ?
So there are 3 times more on UK production, not 30?

@filipefurtad0
Copy link
Contributor

Absolutely @sigmundpetersen , I missed a figure there... I'll correct the number so it's not misleading.
So it will be much faster. Thank you for spotting that 👍

@Matt-Yorkley Matt-Yorkley merged commit e4d7e03 into openfoodfoundation:master Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants