Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make db timestamps nullable #5663

Merged
merged 2 commits into from
Jun 25, 2020

Conversation

luisramos0
Copy link
Contributor

@luisramos0 luisramos0 commented Jun 23, 2020

What? Why?

Closes #5662

This makes the timestamps nullable for now, it will close 5662.
We can then investigate further and see why the timestamps are not being populated in some cases.

What should we test?

We can run a sanity check of the app.

Release notes

Changelog Category: Changed
Make timestamps nullable so the app doesnt blow up when they are not populated by raisl for some reason.

@luisramos0 luisramos0 changed the title Result of running db:migrate Make db timestamps nullable Jun 24, 2020
@luisramos0 luisramos0 self-assigned this Jun 24, 2020
@luisramos0 luisramos0 marked this pull request as ready for review June 24, 2020 09:40
@Matt-Yorkley
Copy link
Contributor

The second version of the schema seems to only have indentation changes...?

@luisramos0
Copy link
Contributor Author

luisramos0 commented Jun 24, 2020

yes, but if you see the first commit you will see that the second commit removes the constraints on the timestamps which is what really matters in this PR.

Basically, the constraints were there just not in db/schema 🙈 and this PR removes them.

Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is confusing. I mean the situation, your commits are good (except maybe one typo?).

OFN v2 on Rails 3.2

# Migration
    create_table :enterprises do |t|
...
      t.timestamps

# Schema
  create_table "enterprises", :force => true do |t|
...
    t.datetime "created_at",                                   :null => false
    t.datetime "updated_at",                                   :null => false
...

# Result in the database:
 created_at               | timestamp without time zone |           | not null | 
 updated_at               | timestamp without time zone |           | not null | 

That's consistent and I guess it was the default with Rails 3 because most migrations don't specify null.

OFN v3 on Rails 4.0

  create_table "enterprises", :force => true do |t|
...
    t.datetime "created_at"
    t.datetime "updated_at"
...

# Result with db:reset:
 created_at               | timestamp without time zone |           |          | 
 updated_at               | timestamp without time zone |           |          | 

# Result with db:migrate from old database
 created_at               | timestamp without time zone |           | not null | 
 updated_at               | timestamp without time zone |           | not null | 

The reason is commit 7640c7b which changed the schema but not the database. So basically the first commit in this pull request is reverting that incomplete commit and the second commit is doing it properly.

Luis, it looks like you identified a problem with null: false in the past.

Rails 4.2 has the default null: true and it looks like our version of Rails 4.0 has that as well. Rails 5 will default to null: false again which means that we should probably find out why timestamps are not set and then go back.

Deprecated add_timestamps and t.timestamps without passing the :null option. The default of null: true will change in Rails 5 to null: false.
https://guides.rubyonrails.org/4_2_release_notes.html#active-record-deprecations

db/migrate/20200624091611_make_timestamps_nullable.rb Outdated Show resolved Hide resolved
@sauloperez
Copy link
Contributor

The reason is commit 7640c7b which changed the schema but not the database. So basically the first commit in this pull request is reverting that incomplete commit and the second commit is doing it properly.

That's right. I'm moving on with this solution until we find out what's the root cause.

@filipefurtad0 filipefurtad0 self-assigned this Jun 25, 2020
@filipefurtad0 filipefurtad0 added the pr-staged-fr staging.coopcircuits.fr label Jun 25, 2020
@filipefurtad0
Copy link
Contributor

Hi @luisramos0 ,

I checked basic functionality around checkout issues:

  • Tested different payment methods, both as guest or logged in customer
  • Checked out with different shipping addresses vs. billing addresses

Smoke tested the app. Found nothing unusual.

Moving to ready to go.

@filipefurtad0 filipefurtad0 removed the pr-staged-fr staging.coopcircuits.fr label Jun 25, 2020
@sauloperez sauloperez merged commit 7338f19 into openfoodfoundation:master Jun 25, 2020
sauloperez added a commit that referenced this pull request Jun 25, 2020
@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 26, 2020

Ultimately we want to revert this, right? The constraint should be there, and we don't want any records with NULL values in timestamp fields, as that might cause even worse problems? Or no?

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 26, 2020

Some possible clues; suggesting a transaction rollback whilst saving a parent object (possibly due to another validation failure?) causing subsequent problems with child objects:
rails/rails#24219 (comment)
https://stackoverflow.com/a/59470833

@luisramos0 luisramos0 deleted the constraints branch June 26, 2020 09:14
@luisramos0
Copy link
Contributor Author

luisramos0 commented Jun 26, 2020

Ultimately we want to revert this, right? The constraint should be there, and we don't want any records with NULL values in timestamp fields, as that might cause even worse problems? Or no?

Yes, we can revert this, it would be better to have the timestamps, I wonder if we will actually get empty timestamps.
But I was no aware that this could cause "even worse problems". What exactly do you think this could cause?
In terms of being worse, I am still thinking this caused the stripe issue where an order was charges 12 times, it's very bad.

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 26, 2020

What exactly do you think this could cause?

We're introducing a change that means theoretically any object can have nil as the timestamp. There's lots of code that uses those timestamps in various places, right? And it's not written at all defensively for situations where persisted objects might not have a created_at value (for example).

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 26, 2020

Looking in Bugsnag at the errors and their timings, we can see these errors actually come in pairs, where the first error looks like this:

PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block : SELECT "spree_shipping_rates".* FROM "spree_shipping_rates" WHERE "spree_shipping_rates"."shipment_id" = $1 AND "spree_shipping_rates"."selected" = 't' ORDER BY "spree_shipping_rates"."id" ASC LIMIT 1

and the second error looks like this:

ActiveRecord::StatementInvalid · PG::NotNullViolation: ERROR: null value in column "created_at" violates not-null constraint DETAIL: Failing row contains (<details removed -matt>, null, null, 62, 175, null, null, null, null, null). : UPDATE "spree_addresses" SET "created_at" = $1, "updated_at" = $2 WHERE "spree_addresses"."id" = 36413 

Going back to this comment: rails/rails#24219 (comment) I'd say it looks like the error we focussed on was a symptom and not the cause. My rough theory is: something goes wrong with shipping_rates part (callbacks hell?), then the transaction is rolled back in a disorderly manner, which causes the NULL values problem in associated objects (which are being updated as part of the same transaction).

@luisramos0
Copy link
Contributor Author

hmm, I see, that makes sense and, if correct, it's not good news... any idea what's the root cause and how to solve it?

The only rollback/save code we have, I think, is in the payment

    after_rollback :persist_invalid

    def persist_invalid
      return unless ['failed', 'invalid'].include?(state)
      state_will_change!
      save
    end

Is your logic/order of the pair "first and second" correct? It's a stack trace with "Caused by:". I'd say the order is the opposite.

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 26, 2020

Is your logic/order of the pair "first and second" correct?

Hmmm... the first couple of pairs I looked at were in that order, but looking again it seems the order is random, it's the other way around in some cases. 😞

@Matt-Yorkley
Copy link
Contributor

I dug a bit deeper and it looks the transaction rollback issue is being caused by something here:

Screenshot from 2020-06-26 13-00-28

@sauloperez
Copy link
Contributor

sauloperez commented Jun 29, 2020

I've been checking those errors in Bugsnag and the issue seems to be that once the order was placed, the #set_default_ship_address fails to associate the address to the user record because the new_address contains "created_at"=>nil, "updated_at"=>nil. The update attempts to set them null. I'm wondering if this is exactly #5694 but these errors are happneing in the checkout controller.

Seeing the number of failures and their timings, I think the user retrying many times, the order can be placed successfully, this step fails and the DB transaction is rolled back. Do we do any special treatment to the payment so that it gets rollback as well in Stripe @luisramos0 ?

@luisramos0
Copy link
Contributor Author

I wonder if set_default_ship_address is still within the same order workflow/payment transaction. Do you think that everything would have to be roll backed if that post checkout action fails?
Why does the new_address has null created_at?

@sauloperez
Copy link
Contributor

sauloperez commented Jun 29, 2020

Do you think that everything would have to be roll backed if that post checkout action fails?

I need to play with locally to confirm it but by the backtrace, it looks like it.

Why does the new_address has null created_at?

This is why

new_order = order.ship_address.clone.attributes # order being a complete one
=> {"id"=>nil, (...) , "created_at"=>nil, "updated_at"=>nil, (...) }
irb(main):008:0> order.created_at
=> Mon, 01 Jun 2020 17:11:54 CEST +02:00
irb(main):009:0> order.updated_at
=> Mon, 01 Jun 2020 17:41:46 CEST +02:00

And it seems that #clone got moved in a different module in Rails 4.0.2 👉 https://apidock.com/rails/v4.0.2/ActiveRecord/Core/clone. Maybe it changed implementation too?

@sauloperez
Copy link
Contributor

sauloperez commented Jun 29, 2020

re that #persist_invalid, it was familiar but I didn't find why. Now I remember 👉 #5678 (comment). We have that callback disabled in OFN.

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 29, 2020

And it seems that #clone got moved in a different module in Rails 4.0.2

Do we want #dup instead of #clone here?

https://github.com/rails/rails/blob/4-0-stable/activerecord/lib/active_record/core.rb#L217-L220 😬

I'm 99% sure it's this call to #update_attributes in #set_ship_address_attributes that's triggering the transaction rollback (and subsequent mess).

@sauloperez
Copy link
Contributor

Yes, I think that's what we want instead. Passing the timestamps columns as nil to #update_attributes does exactly that :trollface:

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 29, 2020

Also I think #update_attributes is deprecated anyway, in favour of #update.

Wait, it looks like #dup has changed as well in Rails 4...? 🙈

You'd think they would have mentioned potentially breaking changes to commonly-used methods in the upgrade guide... 😒

@sauloperez
Copy link
Contributor

sauloperez commented Jun 29, 2020

given everything changes, I would just "reimplement" this bit of logic to whatever the Rails 4-way might be. I still think that we'll need to blacklist the two timestamps columns so we don't provide a value for them when updating the user or the customer objects' address.

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 29, 2020

This is the preceding bit of the stack trace:

Screenshot from 2020-06-29 17-57-40

We're doing @order.ship_address.clone.attributes just before it.

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 29, 2020

Interestingly this seems to only apply to ship_address and not bill_address...

@sauloperez
Copy link
Contributor

sauloperez commented Jun 29, 2020 via email

@Matt-Yorkley
Copy link
Contributor

I don't know why we didn't have any failing specs for this...

@mkllnk
Copy link
Member

mkllnk commented Jun 30, 2020

I created an issue for the outstanding work discussed here: #5702
And I'm wondering if we should fix it before we roll out v3 to the big instances.

@Matt-Yorkley Matt-Yorkley mentioned this pull request Jun 30, 2020
@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 30, 2020

All the checkout errors were using this option in shipping details (which I think has to be manually selected):

Screenshot from 2020-06-30 08-59-36

...and we don't seem to have a single feature spec that covers this for checkout 😭

@sauloperez
Copy link
Contributor

Oh 💩 . That's useful for Eugeni too. I'll let him know.

@Matt-Yorkley
Copy link
Contributor

Matt-Yorkley commented Jun 30, 2020

Okay, so I think the reason we didn't get any failing specs before was the issue (mentioned earlier in this thread) where the migrations and the schema were not totally in sync because of some mistake in the upgrade branch. When we fixed that problem and made timestamps nullable at the same time, it covered up the loud failures associated with the underlying checkout/addresses problem.

Making timestamps not-nullable again, but with the migrations and schema in sync gives a red build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v3] Checkout error with DB error
5 participants