Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix customer balance n+1 #6366

Merged

Conversation

sauloperez
Copy link
Contributor

@sauloperez sauloperez commented Nov 11, 2020

What? Why?

This solves v3.3.1's performance regression when calculating the customer balance.

It's simpler and many orders of magnitude more efficient to ask the DB to aggregate the customer balance based on their orders. It removes a nasty N+1. To do that I created a query object based on the work started in #5031.

This is a first attempt at solving the negative performance impact of our current approach but the best solution in the long term is to either offload the balance calculation to a background job to be performed overnight so we don't go through all customer orders every time. No doubt it'll be slow for hubs that have thousands of customers with thousands of orders.

This doesn't close #4732. That needs small and fast iterations using this new approach to each of the pages displaying the customer balance.

What should we test?

This involves two sets of tests.

On one hand, we need to smoke test customer balance in /admin/customers. Just to confirm that, as the automated tests tell, we haven't changed a bit of its behavior.

On the other, we need to check how the new implementation works. For that, the user you test it with needs to have the feature enabled for them. To do that, we need to add the account's email address to the list of beta users. Then, with a shopper account create some orders and payments and check:

  • As Hub manager make adjustments to the orders and payments. Including voiding payments and orders. Add multiple payments to an order and include addition payments so that the orders also have more payments than the order amount.
  • Cancel an order with a payment.
  • Void a payment on an active order
  • As you make the changes in step 2 check the Shopper balance in /admin/customers

The figures should match with all the changes you make. It'd be good to go through all of the states of the order. If there's one broken, please, don't let this pass without a test covering that case.

Release notes

Remove N+1 when fetching customers' balance from /admin/customers and fix payment_total for void payments.
Changelog Category: Technical changes

@sauloperez sauloperez force-pushed the fix-customer-balance-n+1 branch 2 times, most recently from 54ba5cf to 8bc0f6b Compare November 11, 2020 16:25
@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Nov 11, 2020

There's another issue on this page, which is lack of pagination. The "load more" buttons on the page don't make new requests, all the data for all customers is loaded in Angular when a hub is selected. This can be 800+ records for a large hub.

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Nov 11, 2020

I did some quick comparisons with master and this branch using production data for a large hub:

Screenshot from 2020-11-11 17-24-24
Screenshot from 2020-11-11 17-31-11

21.5 seconds response time and 761 queries -> 6.3 seconds response time and 19 queries.

Nice 💯

Copy link
Contributor

@luisramos0 luisramos0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider this PR?
https://github.com/openfoodfoundation/openfoodnetwork/pull/5031/files
It must include payments.sum(amount). order.payment_total is not the same. I recommend you look at prod data and see the difference for completed orders and for cancelled orders.

@sauloperez
Copy link
Contributor Author

sauloperez commented Nov 12, 2020

Yes, I looked at it and I didn't see why. Is it because payment_total is what the customer needs to pay rather than what was already paid?

@lin-d-hop
Copy link
Contributor

I believe because an order can have multiple payments but spree_orders.payments only records the first payment on an order

@luisramos0
Copy link
Contributor

yes, I checked the code and the DBs at the time I reviewed the PR and then afterwards. It is my opinion that the algorithm is 100% correct in the pending PR and we have to follow it in this new query.

@luisramos0
Copy link
Contributor

luisramos0 commented Nov 12, 2020

or maybe I should say: please prove me wrong here, I may be wrong, this is a tricky space. But I think you need to do that with examples from prod data, ok? This is for the inclusion of spree_payments vs spree_orders.payment_total in the query.

anyway, I am 100% sure you need to include payment_total (wherever it is coming) from cancelled orders but not those orders total, following this logic:
https://github.com/openfoodfoundation/openfoodnetwork/pull/5031/files#diff-87a07ead94986a81b67425065aaa01286682292f4612ada1b384a91e8ddf1d3bR418

@sauloperez
Copy link
Contributor Author

sauloperez commented Nov 19, 2020

Thanks everyone! It was clear that had to validate my ideas early on with you all 💪

Yes, I think that logic is right @luisramos0 . For the record, I also found out that payment_total is a denormalized column that we populate when the payments are successfully processed in

def process_payments!
raise Core::GatewayError, Spree.t(:no_pending_payments) if pending_payments.empty?
pending_payments.each do |payment|
break if payment_total >= total
payment.process!
if payment.completed?
self.payment_total += payment.amount
end
end
rescue Core::GatewayError => e
result = !!Spree::Config[:allow_checkout_on_gateway_error]
errors.add(:base, e.message) && (return result)
end
It says how much the user paid so far so we can rely on it.

@sauloperez sauloperez force-pushed the fix-customer-balance-n+1 branch 2 times, most recently from c0196b0 to 2e49286 Compare November 20, 2020 11:41
@sauloperez
Copy link
Contributor Author

This last commit should implement that same logic but in SQL @luisramos0

@luisramos0
Copy link
Contributor

I think you are missing the fact that the join is by email not customerid. Guest orders will not have customer id.

@Matt-Yorkley
Copy link
Contributor

Red build... looks like it needs a few tweaks.

@sauloperez
Copy link
Contributor Author

sauloperez commented Nov 21, 2020

Yes, that's why it is still draft. It's not yet done. I need to adapt things now to this new query.

@sauloperez
Copy link
Contributor Author

I think you are missing the fact that the join is by email not customerid. Guest orders will not have customer id.

that's right, but then, there's no customer for whom to show the balance, right? oh wait, we do have a "phantom" customer created every time they buy. Got it.

@luisramos0
Copy link
Contributor

luisramos0 commented Nov 21, 2020 via email

@sauloperez
Copy link
Contributor Author

I think this should all be grouped by email so the balance is real.

That's exactly what I thought. Since they are identified by email grouping by that will do the trick.

@sauloperez
Copy link
Contributor Author

sauloperez commented Nov 24, 2020

For the record: in a guest order, there's no customer_id provided by the request. In Spree::Order#associate_customer we look up any customer record with the order email and associate it with the order. If none was found, in Spree::Order#ensure_customer we create one with the provided email. In upcoming checkouts, it'll be associated with the new order due in #associate_customer. The resulting order has an email and a customer_id.

It's also worth noting that the customers DB table was introduced back on June 4th, 2015, and data was never migrated to the new schema. Hence, there are 4209 non-cart orders in AU that have no associated customer although they do have an order email.

@sauloperez sauloperez force-pushed the fix-customer-balance-n+1 branch from 448dd2a to d5b1da4 Compare November 24, 2020 16:43
tag_rule_mapping: tag_rule_mapping,
customer_tags: customer_tags_by_id
render json: @collection,
each_serializer: ::Api::Admin::CustomerWithBalanceSerializer,
Copy link
Contributor Author

@sauloperez sauloperez Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the serializer's class name is a clear indication that we should extract an API endpoint out of this, together with the if branch of #collection #butNotToday.

@sauloperez sauloperez added the bug-s2 The bug is affecting any of the non-critical features described in S1 and there is no workaround. label Nov 25, 2020
@sauloperez sauloperez force-pushed the fix-customer-balance-n+1 branch 2 times, most recently from 80ab1ca to a1fbd27 Compare November 25, 2020 16:51
As is, `payment_total` is only increased after successfully processing
a payment and never updated. This inconsistency breaks
`CustomerWithBalance` which relies on it.

Needless to say that if we keep this denormalized column, we better make
it consistent. I investigated current Spree's master branch (709e686cc0)
and they also realized it was broken. Now `Payment` runs the following
from the `after_save` `update_order` callback.

```rb
order.updater.update_payment_total if completed? || void?
```

I also took the chance to rearrange tests a bit.
We only care about non-cart orders and skipping carts, saves PostgreSQL
query planner to go through thousands of records in production use cases
(my food hub).

We go from

```sql
 ->  Index Scan using index_spree_orders_on_customer_id on spree_orders  (cost=0.42..12049.45 rows=152002 width=15) (actual time=0.015..11.703 rows=13867 loops=1)
```

to

```sql
->  Index Scan using index_spree_orders_on_customer_id on spree_orders  (cost=0.42..12429.46 rows=10802 width=15) (actual time=0.025..17.705 rows=9954 loops=1)
```
This query object is meant to be reusable but those includes are
context-specific and will likely not be needed when reusing the query
elsewhere. If we keep them there, chances are next dev might not notice
it and will introduce a performance regression.
This makes it possible to deploy it without releasing it to users since
the toggle is not enabled for anyone.

It aims to make the balance calculation consistent across pages.
These order states are already taken care of because they fall in the
`WHEN state IS NOT NULL` case.
These states are reached when the order is complete and shipped. An
admin can create a new return authorization, which will set the order in
`awaiting_return` state. It's only after, when we call
`return_authorization#receive` that the return authorization moves to
`received` state and the order to `returned`.

You can do so from the UI by editing the return authorization and
clicking on Receive. However, we didn't find any order in such state in
UK, FR and AU. The UX is quite obfuscated. That must be why no users
clicked on it.

The `returned` state cannot count for the balance as is, since some of
the products are returned to the shop. That's enough for now.
@sauloperez sauloperez force-pushed the fix-customer-balance-n+1 branch from 71e6978 to 33c72ec Compare January 11, 2021 14:56
@sauloperez
Copy link
Contributor Author

This was lacking a rebase to get the various beta_testers pieces that got merged. 33c72ec should fix it.

@filipefurtad0 filipefurtad0 added the pr-staged-uk staging.openfoodnetwork.org.uk label Jan 12, 2021
@filipefurtad0
Copy link
Contributor

filipefurtad0 commented Jan 12, 2021

Allright @sauloperez !

As discussed and as mentioned on the testing notes, I tested this in two steps.

Step 1) testing feature is toggle

Significantly better performance in the admin/customers page will indicate that the feature is toggled.

To verify this, I logged onto the server and tailed the log file by running tail -f log/staging.log | grep 'ActiveRecord' - this seems like a reasonable performance metric. Here are some values, for a Hub with 71 customers (Tamar Valley Food Hub Wholesale). The first output line relates to the page request and the second line relates actual rendering of the customers and their balance (these are separated steps, as superadim)

a) Before staging and toggling the feature (this was ran several times, so these can be seen as average or representative values):

[2021-01-12 13:56:36#30737] INFO -- : Completed 200 OK in 267ms (Views: 212.3ms | ActiveRecord: 23.4ms)
[2021-01-12 13:56:59#30737] INFO -- : Completed 200 OK in 615ms (Views: 445.8ms | ActiveRecord: 147.3ms)

b) After staging the PR (feature not toggled)

[2021-01-12 14:47:34#15577] INFO -- : Completed 200 OK in 240ms (Views: 183.3ms | ActiveRecord: 26.0ms)
[2021-01-12 14:47:47#15577] INFO -- : Completed 200 OK in 554ms (Views: 386.8ms | ActiveRecord: 148.1ms)

c) After staging the PR and toggling the feature - by adding the user as a beta tester

[2021-01-12 15:03:12#19935] INFO -- : Completed 200 OK in 270ms (Views: 211.8ms | ActiveRecord: 25.1ms)
[2021-01-12 15:03:22#19935] INFO -- : Completed 200 OK in 361ms (Views: 271.2ms | ActiveRecord: 51.8ms)

The conclusion is: toggling the feature works!! And it works only after staging and after enabling: we can see a significant reduction in c) only, both on the ActiveRecord and the actual request time.
This is the expected result, so we are good here - now we can proceed with the test cases you point out, namely, all that concerns customer balance - Step 2) in the next comment.

@filipefurtad0
Copy link
Contributor

Step 2) Now, to the actual test cases. I added another user as a beta tester, a hub manager - and placed orders this hub as both logged in and as a guest customer. Customer balance in the /admin/customers was compared with /account#/transactions#<hub_name>, while different adjustments / order edits on made on the placed orders.

i) In all cases, it was verified that incomplete orders resulted in customer balance = 0; the customer is added to the /admin/customers page once the order is in the cart state, but a balance can only differ from zero if the order is complete

ii) after placing an order with cash:

  • the "balance due" corresponded to the expected value
  • after payment capture, the balance was = 0

iii) after placing an order with StripeSCA/Paypal:

  • payment capture is immediate, so the balance was always = 0

iv) editing orders

  • partial refunds on paid orders (balance = 0): adding/removing some items, or a line item -> the order turns to the "balance due" / "credit owed", respectively.
  • partially refunding cash or Stripe -> changes the balance accordingly; refunds chance the state of the order on the customer side: an order which is "Paid" turns into "Not Paid" (this is known and current behaviour on master - one could argue it's a bug, since a partially refunded order is paid for -> I could not find an issue for this one, but I'm pretty sure I've seen it reported somewhere.
  • voiding payments: voiding a payment from an order with a fully captured payment -> a paid order returns to the "balance due" state
  • noticed voiding partial SCA refunds does not work - this is known Voiding partial refunds does not work for Stripe payment method (both SCA and Connect) #5830
  • adding new payments: adding new payments on the payments section of the order
  • cancelling the order: removes the balance respective of that particular order

In all the test cases i) ii) iii) and iv) the customer balance as seen from the Hub matched the values seen on the customer's account.

Reports seem to work as before - payments report /admin/reports/payments is blowing up, I'll need to stage master after this test, just to make sure this PR did not introduce it. Other than that, I'd say this is good to go.

@filipefurtad0
Copy link
Contributor

Yup, after staging master, the report blows up as well, for dates not related with this PR. So, this looks good to me. If you feel there is that should be tested @sauloperez please let me know - moving to ready to go!

@filipefurtad0 filipefurtad0 removed the pr-staged-uk staging.openfoodnetwork.org.uk label Jan 12, 2021
@sauloperez
Copy link
Contributor Author

sauloperez commented Jan 13, 2021

that's an amazing work @filipefurtad0 ! thanks a lot for the patience 😅 . I agree this is more than enough to ship it 🚢 . What I'll do now is make sure the automated tests cover what you checked manually so that the test suite gives us the same confidence as your thorough manual tests. Don't hesitate to suggest any improvements on that regard.

Yup, after staging master, the report blows up as well, for dates not related with this PR.

Let me know if I can help you find the root cause. I'd take the chance to improve that report now that I'll be touching it.

@sauloperez sauloperez merged commit 0b686bb into openfoodfoundation:master Jan 13, 2021
@sauloperez sauloperez deleted the fix-customer-balance-n+1 branch January 13, 2021 09:25
@sauloperez
Copy link
Contributor Author

sauloperez commented Jan 13, 2021

i) In all cases, it was verified that incomplete orders resulted in customer balance = 0; the customer is added to the /admin/customers page once the order is in the cart state, but a balance can only differ from zero if the order is complete

ii) after placing an order with cash:

  • the "balance due" corresponded to the expected value. All the possible states of the balance value are tested in spec/services/customers_with_balance_spec.rb.
  • after payment capture, the balance was = 0: Not directly tested in a feature spec. Needs more investigation. We're probably checking from the state a payment capture leads to. First, payment.capture! gets called which communicates with the payment gateway if any. When successful, we transition to state = 'complete' so testing payment objects in that state seems enough. The rest is covered by the payments' tests.

iii) after placing an order with StripeSCA/Paypal:

  • payment capture is immediate, so the balance was always = 0: same as above

iv) editing orders

@sauloperez
Copy link
Contributor Author

sauloperez commented Jan 19, 2021

For the record, this is the monitoring dashboard we should keep an eye on. Toggling it on should have a deep impact on those response times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-s2 The bug is affecting any of the non-critical features described in S1 and there is no workaround.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants