Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helper function for deleting orphaned embedded objects within migration #7145

Closed
DominicFrei opened this issue Feb 25, 2021 · 18 comments
Closed

Comments

@DominicFrei
Copy link
Contributor

Current Problem

After fixing a problem in the migration of Object to EmbeddedObject it is unclear to users what happens with orphaned objects (see discussion about throwing an error vs. deleting orphaned objects).
At the moment they have to be deleted (or solved) manually to not raise an error.

Expected Results

Add a helper function to RLMMigration for deleting orphaned objects within the migration block.

@sipersso
Copy link

This helper function would be super helpful. I am pretty sure that there might be orphaned objects laying around for my app and wouldn't want it to crash in production because of this.

I am pretty sure I might have orphaned objects for some users.

@DominicFrei DominicFrei removed their assignment Apr 1, 2021
@sipersso
Copy link

sipersso commented Apr 21, 2021

@DominicFrei any updates on the helper function or documentation? Is there any sample code on how I can safeguard my app from crashing on launch when I release the migration to production?

Since the objects are now child objects, I assume that I won't be able to use queries to find them? Then how can I delete them?

I noticed that @saltyskip had managed to do this, but it is not clear how you should be handling this issue. Any guidance?

@DominicFrei
Copy link
Contributor Author

@sipersso This ticket is on our to do list but since it's a convenience function it's ranked behind more important tasks. Therefore I cannot state any ETA at the moment.

Regarding the deletion of orphaned embedded objects and taking care of multiple links:
The change to embedded is happening after the migration is done (see https://github.com/realm/realm-core/blob/master/src/realm/object-store/object_store.cpp#L834) so that you can still use the migration block of this change to work with the old non-embedded objects and take care of everything (a.k.a. making sure that every object is linked once and only once).

@sipersso
Copy link

@DominicFrei I don't mind doing the migration myself, but it is very unclear how you are supposed to be able to do this in practice. How am I supposed to be able to find the orphaned child objects?

migration.enumerateObjects(ofType: ChildObject.className()) { (oldObject, newObject) in

     let parentsAttempt1 = oldObject?["linkedParentsProperty"]  as? List<MigrationObject> //Doesn't work.
     let parentAttempt2 = oldObject?.dynamicList("linkedParentsProperty")  as? List<MigrationObject> //Doesn't work
     

It doesn't seem that the "linkedParentsProperty" can't be accessed from the child object at all? Do you have any suggestions on how to identify orphaned children?

@DominicFrei
Copy link
Contributor Author

You could enumerate both and check their count which has to be one:

migration.enumerateObjects(ofType: ChildObjectClass.className()) { (oldChildObject, newChildObject) in
    int parentCount = 0;
    migration.enumerateObjects(ofType: ParentObjectClass.className()) { (oldChildObject, newChildObject) in
        // Look for the child in the parent and increase counter if so found.
    }
    if (parentCount == 0) {
        // Delete (maybe save necessary information first)
    } else if (parentCount > 1) {
        // Remove from all but one parent (maybe introducing copies of it).
    }
}

How you actually handle it is up to you. Making all this more convenient it what this ticket is for. :)

@sipersso
Copy link

Thanks @DominicFrei! Looking at the api, I was afraid that enumeration would be the only way. Unfortunately this will not be feasible in my case as there might be tens of thousands of grandchildren and thousands of parentes in the new structure.

The helper function won’t help at all in my case and I will have to find an alternative to making this migration, like possibly moving the data to a new property.

@sipersso
Copy link

sipersso commented May 7, 2021

This is a much more complicated problem than it first appears to be. In my case the structure is like the following (simplified)

public class ChildObject:Object {}
public class ParentObject:Object {
    var children = List<ChildObject>()
}
public class GrandParentObject:Object {
    var children = List<ParentObject>()
}

There can be more than 100k ChildObjects
There can be more than 10k ParentObjects
There can be more than 1k GrandParentObject

In the example, using enumerations, each child would have to loop through each of its parent objects and each parent would have to loop through its grandparents. On each level there can be orphaned objects. I have tried other migration strategies, like moving the properties to new classes, but this isn't practical and very risky as it affects the whole codebase. In addition, removing objects with linked properties is problematic (#3686).

Since this migration might be heavy I guess I should do the migration off the UI thread?

I assume that I could do the following?
1: Spin off a new thread on app launch and call perform migration on a background thread.
2: Start by enumerating ParentObjects and GrandParentObjects to detect orphaned ParentObject instances.
3: Now enumerate the ChildObjects and ParentObjects again to detect orphaned ChildObject instances.

In this case, that would mean that step 2 could be 10k* 1k = 10 million loops, and step 3 would mean 100k * 10k = 100 million loops. Is this acceptable? Is there any better way?

@sipersso
Copy link

sipersso commented May 7, 2021

To answer my own question. I just tested this approach using a unit test an a typical database file containing 2 MB of data. This size is fairly normal for my use case, but could be 5x larger if not more for some users. In this case, the migration took about 5 minutes to complete before it crashed because I had missed to delete an orphan.

Even thought the enumeration listed here might work for sample data, I doubt the approach will be suitable for any real world use. Is this really the only way to do it?

I have also tried to migrate the data over to new properties, but as mentioned before, this is a very huge effort. And I will have to deal with the effort of removing objects with linked properties, which seems to be problematic and there does seem to be differences in how Android and how the swift clients behave during migration.

@sipersso
Copy link

sipersso commented May 7, 2021

I was able to reduce the migration time from 5 minutes to 1.2 seconds using the following approach

//First find map all child identifiers to parent identifiers
var parentMap:[String:String] = [:]
migration.enumerateObjects(ofType: ParentObjectClass.className()) { (oldParentObject, newParentObject) in
        // Look for the child in the parent and increase counter if so found.
    let parentId = oldParentObject!["idProperty"] as! String

    oldParentObject!.dynamicList("chilren").forEach { oldChildObject, newChildObject in
    	let childId = oldChildObject!["idProperty"] as! String
    	parentMap[childId] = parentId
    }

}

//Now loop through the children and lookup their parent id and delete orphaned objects
migration.enumerateObjects(ofType: ChildObjectClass.className()) { (oldChildObject, newChildObject) in
    
    let childId = oldChildObject["idProperty"] as! String

    if(parentMap[childId] == null){
    	migration.deleteObject(newChildObject)
    }
}

This migration does not take the "more than one parent" into consideration. But this could be detected in the first loop.

Any pitfalls with this approach @DominicFrei?

@jsflax jsflax assigned jsflax and DominicFrei and unassigned jsflax May 7, 2021
@DominicFrei
Copy link
Contributor Author

@sipersso Thank you for adding an example to this. Your implementation looks good to me but shows that this ticket is valid and creating a helper function would be good to make sure our users can handle the deletion more easily and efficient.

About the more than one parent part: The same way you check if there is a child with an id that cannot be found in your parentMap you could check if there are multiple entries in the parentMap with the same childId. These need to be handled (most likely by creating a new child with the same values).

Let's continue this part in the other issue you created: #7248
I think that's a good idea to keep this ticket on the topic of creating the helper function and talk about problems with the current implementation in the other ticket. 👍

@sipersso
Copy link

@DominicFrei any progress on the helper function? The documentation is really lacking on how to handle this migration. I think I am done with the migration, but I am scared about what will happen if I failed to handle any of the edge cases. Is there any guidance on how you can test your migration to cover all cases?

If I create a realm file using the configuration from before the migration, and ensure that I have both orphaned objects and objects containing multiple backlinks and the migration takes care of both these cases. Is that all edge cases I need to think about for migrating from object to embedded object?

On a side note, I first posted about this migration on the forums in november last year. I still haven't dared to do the migration in production :/ If I make a mistake, the app will crash at launch for 10s of thousands of users, so the migration has to be done right.

@DominicFrei
Copy link
Contributor Author

@sipersso I am currently not working on the Cocoa framework so I cannot give you an estimation on that. It's not a focus at the moment though afaik.

@jsflax can maybe elaborate on that. 👍

Regarding your question: Yes, you only need to take care of 0 backlinks (orphaned) and more than 1 backlink. If those two cases are covered, migrating to EmbeddedObject should work.

Regarding the documentation: Can you point out specifically where you had trouble finding the help you were looking for and what could have been done better? @cbush will surely appreciate the feedback to improve this part of the documentation. 👍

@sipersso
Copy link

The migration example for converting from an object to an embedded object here (that you referred to in another ticket) https://github.com/realm/realm-cocoa/blob/master/examples/ios/swift/Migration/Examples/Example_v5.swift contains the following example for migrating from object to embedded object

if oldSchemaVersion < 5 {
        // Nothing to do here. The `Address` gets migrated to a `LinkingObject` automatically if
        // it has only one linked object.
        // See https://github.com/realm/realm-cocoa/issues/7060
    }

So it doesn't say anything on how to actually handle the edge cases.

The documentation does have a section here https://docs.mongodb.com/realm/sdk/ios/examples/modify-an-object-schema/#convert-from-object-to-embeddedobject which actually says what you are required to do (fix orphaned objects and more than 1 backlink), but there is no info on how and no info on how to make sure that the implementation is correct. But it is a great first step to clarify that I only need to handle the two edge cases. Makes it a lot easier

My current plan:
1: Copy my old pre migration Realm Configuration into a separate project.
2: Create one or more sample realm file using that project and ensure that would trigger crashes if migration is incorrect (it should contain both orphaned objects and object with multiple backlinks for every object that will be converted from Object to EmbeddedObject).
3: Copy the generated realms to the app projects (iOS and Android projects should be able to use the same files) and use these for unit testing the migrationBlock (iOS) and RealmMigration subclass (Android)

Thanks for the clarifications. I think this is the info I need to be confident that the migration works.

@sipersso
Copy link

The more than 1 backlink seems to be even trickier. To prevent data loss I wanted, whenever I find an object that already have a backlink I copy it's value and replace the backlink with a copy. However, there doesn't seem to be an API to do that is the swift SDK. On Android I can use dynamicRealm.createEmbedded which would solve the problem, but the migration object in swift doesn't seem to have this option.

@rhys-rant
Copy link

Having encountered this problem on a legacy project, where I've migrated some Object types to EmbeddedObject, finding a number of cases or orphaned objects... some helper function to delete them would be incredible 🙏🏻
Thank you @sipersso and @DominicFrei for your workarounds for now, though!

@sipersso
Copy link

sipersso commented Mar 11, 2022

In Swift you can only solve the case where you have orphaned object. If you need to create embedded objects to solve the case with multiple back links this is currently not possible in the Swift API because of #7508. So unless the helper function is in Realm Core it won’t be able to solve all migration corner cases.

I have been blocked on the migration for about a year now

@sipersso
Copy link

@DominicFrei I guess this one is no longer relevant since migration is now automatic?

@dianaafanador3
Copy link
Contributor

Hi @sipersso, you are right, this is already handled by the migrations, after this PR was merged.
realm/realm-core#5737
In case of a TopLevel -> EmbeddedObject migration, orphaned EmbeddedObjects will be deleted.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants