Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-788 Fix prior_size issues when replacing an embedded object in a list #4844

Merged
merged 6 commits into from
Aug 12, 2021

Conversation

jbreams
Copy link
Contributor

@jbreams jbreams commented Aug 11, 2021

This is similar to RCORE-734, but with lists. If you have a list of embedded objects a = {b: [{c: [1, 2]}]} and update them in the sdk like this a.b[0] = { c: [ 3, 4 ] }, you'll end up replacing the object at a.b[0] without clearing the old one, so any arrays in the sub-object won't be empty when they get updated on other devices.

In RCORE-734 we fixed this by setting the object to null before doing the replacement. We can't set the array element to null, so instead this transforms the Update into an ArrayErase/ArrayInsert.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)

@jbreams jbreams linked an issue Aug 11, 2021 that may be closed by this pull request
@jbreams jbreams requested review from ironage and simonask August 11, 2021 14:30
@jbreams jbreams marked this pull request as ready for review August 11, 2021 14:31
@tkaye407 tkaye407 self-requested a review August 11, 2021 14:33
Copy link
Contributor

@tkaye407 tkaye407 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach here looks good to me. I will add a test in the applier now to make sure that this will work (though I suspect it wont be a problem) and I can work on adding a new E2E test that would have caught this

@@ -371,6 +371,16 @@ Obj LnkLst::create_and_insert_linked_object(size_t ndx)
Obj LnkLst::create_and_set_linked_object(size_t ndx)
{
Table& t = *get_target_table();
if (t.is_embedded() && m_list.get(ndx) != ObjKey{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks good to me. The other option would be to just clear out the existing contents of the object here and then just update it with the new object, but not sure if that is possible

src/realm/list.cpp Outdated Show resolved Hide resolved
src/realm/list.cpp Outdated Show resolved Hide resolved
@jbreams jbreams requested a review from tgoyne August 11, 2021 18:16
// object gets cleared, otherwise you'll only see the Update ObjectValue instruction, which is idempotent,
// and that will lead to corrupted prior size for array operations inside the embedded object during
// changeset application.
if (value.is_type(type_Link, type_TypedLink) && list.get_target_table()->is_embedded()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type is a list of mixed, then list.get_target_table() will return null. In that case, the mixed value will be a typed link and we should get the target table from value.get_link().get_table_key(). Could you add a test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I now handle type_Link and type_TypedLink separately.

@nirinchev
Copy link
Member

Not familiar with the code, so might be a dumb question, but are dictionaries susceptible to the same problem? I.e. if we have: a = {dict: {"some-key": {c: [1, 2]}}} and we replace the value at some-key like a.dict["some-key"] = { c: [ 3, 4 ] }, would we end up in the same situation? Or does the applier handle setting a dictionary value differently? And if it does, would it make sense to add a test to verify it?

@jbreams
Copy link
Contributor Author

jbreams commented Aug 11, 2021

@nirinchev, if you look at the code in this PR, I added a test for dictionaries. Is it insufficient?

@tgoyne
Copy link
Member

tgoyne commented Aug 11, 2021

Dictionary instructions don't have the prior_size field and this bug is specifically related to failing to reset that to 0, so this isn't applicable to Dictionaries (or Sets).

It's not particularly clear to me why Array instructions have prior_size. There's a whole lot of code which validates it and carefully merges it, but there doesn't appear to be anything which actually uses it. It seems like something which was just accidentally left in the sync instruction log when sync started using a different instruction format from change notifications? I guess it provides some validation that the merge converged correctly?

@nirinchev
Copy link
Member

@jbreams I can see it now - I got confused by the naming and the general C++ obnoxiousness.

The fact that things just work for dicts, coupled with @tgoyne's comment, makes me wonder if I understand the issue. If I read the description correctly, the prior_size bug affects the array contained inside the embedded object (i.e. c here), not the array containing the embedded objects (i.e. b).

@jbreams
Copy link
Contributor Author

jbreams commented Aug 11, 2021

@nirinchev , you are correct. My description sucks. Sorry. It's not the prior_size of the array containing the embedded objects, it's the arrays inside the embedded object. So if you have a nested object like object.parts[0].comments[3].blob and do object.parts[0] = { comments: [ { blob: "foo" } ], the prior_size of the comments array instructions will be corrupted. It's not just the prior_size though, if you removed that validation, you'd end up with the comments array containing both the old comments and the new because there's nothing in the instructions to tell the InstructionApplier that object.parts[0] was actually replaced and not just updated. The prior_size information in the changesets is there to catch exactly this kind of bug.

I don't know why this doesn't effect dictionaries. I will look into it and make sure my test case - that so far has worked out of the box - is actually testing the right thing. However, this is a P2 bug for a customer so I don't want to drag this out too far.

@nirinchev
Copy link
Member

Okay, this makes a lot more sense now, thanks!

Agreed, no need to delay that, but it'll definitely be interesting to come back to dictionaries at some point and figure out why they're working since, intuitively, they should suffer from the same issue. I did briefly look at SyncReplication::dictionary_update in instruction_replication.cpp and it doesn't seem to be sufficiently different from the buggy SyncReplication::list_set to explain why lists of embedded objects exhibit the issue, while dicts don't.

@jbreams
Copy link
Contributor Author

jbreams commented Aug 12, 2021

@nirinchev, I think the implementation for Dictionary::insert is sufficiently different from lists and regular links that it could explain the diverging behavior - see here. I'm willing to do follow-up work to investigate.

@jbreams jbreams requested review from tkaye407 and ironage August 12, 2021 02:48
@jbreams
Copy link
Contributor Author

jbreams commented Aug 12, 2021

I think I've addressed all comments. We're trying to get this merged so that we can unblock a customer so I'd appreciate a review ASAP. Thanks!

Copy link
Member

@tgoyne tgoyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole thing makes me a bit uneasy since it seems to be pointing at a bug in the protocol or merge algorithm that we're just working around, but that's not a reason not to ship the workaround.

@jbreams
Copy link
Contributor Author

jbreams commented Aug 12, 2021

@tgoyne, totally agree. When we introduced the ObjectValue{} payload type to indicate that an object should be created at a path, we made it idempotent. It turns out we need like a ClearObjectValue{} that isn't idempotent to indicate that we want to replace the embedded object rather than update its fields. For regular links, we can set them to null - and this is what the server side Atlas translator does - but for arrays we can't do that. I think there will be follow-up work to this, but it will involve bumping the protocol version and forcing everyone to do a client reset, and I don't want to do that under the gun when there's a reasonable workaround available.

@jbreams jbreams merged commit 7fb546d into master Aug 12, 2021
@jbreams jbreams deleted the jbr/RCORE-788 branch August 12, 2021 18:30
@tkaye407
Copy link
Contributor

Agreed with everyone here. The ObjectValue being idempotent has been a bit tricky to work with and there are definitely a few issues with it. I think that it would definitely be interesting to add a new Payload type that clears an object because having to set to null or erase and re-insert is a bit hacky but sadly what has to happen now

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bad Changeset w/incorrect prior size in client bootstrapping
5 participants