Big performance issues with v10.24.2 and Synced Realm #7743
Comments
OK I just rebuilt using v10.19.0 and the problem goes away so something has been broken that is causing a major slow down with queries in 10.24.2. |
Hmm, so I am unable to build v10.21.0 or v10.22.0 with Xcode 13.3 using SPM. v10.23.0 builds OK, but now the Realm App we use for testing has become unresponsive and never responds to the initial client sync request. I have logged an issue with Atlas, but since it is a Shared Free environment I doubt they will look at it. Perhaps someone from the MongoDB team can check it out, as it is most likely a bug that needs fixing as well. So I can either blow away that broken environment and build a new one to continue testing the SDK versions for the performance issues, or keep the environment there for someone to check why it has become unresponsive. Here is the trace from a node.js client trying to connect the first time. `Apr 11 2022 16:04:01 : run() Connected to endpoint '13.54.209.90:443' (from '10.0.1.20:56013') WebSocket::handle_http_response_received() Connection[1]: Negotiated protocol version: 3` |
And then it just hangs indefinitely |
Hi @duncangroenewald , thank you for the info. It will help to diagnose the problem. We are working on this. At the moment you can remain on v10.19.0 |
Ok, let me know if you want any testing done. |
I am not sure if it's related, but I use synced realm too. previous my version was |
Hi @jlavyan thanks for informing about the issue, I'll try to check why this is happening |
Just tested with 10.26.0 and same issue exists |
@duncangroenewald could you see if the problem starts with 10.21.1? If so, it's probably the same issue as in #7734 |
@AdamGerthel - I tried to build 10.21.0 using SPM but it fails to build with Xcode 13.4 so I was unable to test. I also tried to clone the repo and build that but get the same build error. |
@AdamGerthel - hang on let me just double check it was 10.21.0 that has the build issue not 10.20.0 |
Yes 10.21.0 won't build ....ates.noindex/Realm.build/Release/RealmSwift.build/Objects-normal/arm64/Sync.o -o /Users/duncangroenewald/Development/realm-cocoa/build/DerivedData/Realm/Build/Intermediates.noindex/Realm.build/Release/RealmSwift.build/Objects-normal/arm64/ThreadSafeReference.o -o /Users/duncangroenewald/Development/realm-cocoa/build/DerivedData/Realm/Build/Intermediates.noindex/Realm.build/Release/RealmSwift.build/Objects-normal/arm64/Util.o
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var |
v10.20.2 appears to be OK. I will try the prebuilt binaries from here to see if they are compatible with Xcode 13.4 |
I don't have any binaries and I don't even know how to use realm-swift :D So whether or not it will build is not something I really have any knowledge of. We're having performance issues over in realm/realm-js#4383 and I'm trying to figure out if this issue is related, because it seems there are at least two issues pointing towards 10.21.1 being the culprit (and possibly because of realm-core 11.8.0 being introduced in that version). |
Oh, OK, sorry, I thought you were one of the Realm folks. The prebuilt binaries are built with an older version of Swift so they don't work with Xcode 13.4, and I can't build 10.21.0 because of errors. Perhaps @jsflax can help with getting 10.21.0 and 10.21.1 to build under Xcode 13.4 in order to test this. |
@duncangroenewald if you use the xcframework script here on v10.21.0 it should build for Xcode 13.4 |
@leemaguire - ah, thanks I will try that |
@leemaguire - I used the command "sh build.sh xcframework" but get what looks like the same error /Realm/Build/Intermediates.noindex/Realm.build/Release/RealmSwift.build/Objects-normal/arm64/ThreadSafeReference.o -o /Users/duncangroenewald/Development/realm-cocoa/build/DerivedData/Realm/Build/Intermediates.noindex/Realm.build/Release/RealmSwift.build/Objects-normal/arm64/Util.o
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var |
This version will require an older version of Xcode to build, I can build it on Xcode 13.0, You'll also need to enable library evolution mode when building the XCFramework. |
Not sure I understand any of that. Why is it possible to build 10.19.0 and 10.20.2 but not 10.21.0 on Xcode 13.4? How does one enable library evolution mode? |
We introduced some new code which worked at the time; then a regression was introduced in the next version of the Swift compiler. That code is not present in versions of Realm older than 10.21.0. To build with evolution mode from Terminal: |
Thanks, btw what does evolution mode do ? And just so I don't waste my time can you confirm that I need to do the following:
|
It allows a library built with an older version of the swift compiler to be forward compatible with newer versions. https://www.swift.org/blog/library-evolution/ Yes, but use the script I attached above (just copy paste it to the build.sh) for the version you're building. Also don't forget to do |
OK thanks. So is that compiler error actually a bug in the compiler and not an error in the code ? Is there not a workaround to fix that code so it does compile rather than having to install Xcode 13 ? |
Yes, it was fixed in 10.23.0 |
Why are the prebuilt binaries not being built with evolution mode - then they would work with Xcode 13.4 no ? |
@leemaguire - ok I installed Xcode 13.2.1 and was able to build Realm-Swift but when I include this framework in the Xcode 13.4 project and try and build I still get the same (or similar) error.
|
I will just rebuild the entire project with Xcode 13.2.1 and test the performance with RealmSwift v10.21.0 and 10.21.1... |
@AdamGerthel @leemaguire @dianaafanador3 OK RealmSwift v10.21.0 appears to be working fine and v10.21.1 does not - same performance issue as with 10.24.x. Hope this helps. |
v10.27.0 partially fixes this, and in the most extreme cases is ~10x faster than v10.26.0. It will still be slower than v10.21.0, as the root cause of the slowdown is a fix for potential data corruption if a device powers off at exactly the wrong point in committing a write transaction, and we're trying to keep that fix. If 10.27.0's write performance is still a problem, using async commits may be a good solution. |
@tgoyne - thanks - we are not doing any writes, just reading data and generating reports. I just tested this with v10.27.0 and the report now takes 580 seconds when using a Synced Realm vs 330 seconds with the older RealmSwift v10.21.0, so it is still nearly twice as slow. Note that when using a local realm with v10.27.0, performance is identical to the older version (synced or local) at 330 seconds. This is substantially better than the hour or more it was taking with 10.21.1 or greater, but it is still 10 minutes for the report to generate vs 5 minutes. What I don't understand is that this is a query-only process, with no writes being performed during this operation, so why is there such a big impact on read performance, and why the difference in performance between Synced and Local realms for reads? Now I am wondering if there is something else we can do for the reports. We currently run these from a background thread; perhaps we could open the realm as read-only or something? Any ideas to get the performance back? |
@tgoyne - quite a few other queries that support updating the UI information are still noticeably slower (from sub second to > 2 seconds) than the pre v10.21.1 SDK. Given this optimisation is apparently to handle an edge case of power failure during a write transaction - could we have this as an optional configuration setting. Our app runs on desktops and laptops so power failure during a write transaction is an unlikely event. It hasn't happened in 4 years. If it does happen what is the worst case scenario with the corrupt database ? Will it crash the app or synchronise the corrupted data ? Thanks |
v10.27.0 should have no difference from v10.26.0 for pure read scenarios. Since you're seeing a difference only on sync Realms, that suggests that the sync client's background writes are causing problems. They don't block other threads from reading, but if the work you're doing is i/o-bound then perhaps additional writing on the background thread could slow down disk reads on your thread. A dumb workaround could be to call The most likely result of a crash at the wrong time is that the next time you try to open the file it hits an assertion failure, and the file is irrecoverable without manually fiddling with it in a hex editor. If you're really unlucky, the file will open successfully and just read corrupted data. Async commits (particularly with |
@tgoyne - thanks for the explanation. The test scenario we are using is a single client connected to a MongoDB Realm App, and there are no realm writes being performed at any time during the running of the reports, so I don't think background writes can be causing the problem. For this pure read scenario there is a substantial improvement in performance with v10.27.0 over v10.26.0, at a guess at least 10x faster. I will test using syncSession.suspend() to see if that prevents the problem. I don't really want to go down the reduced durability path at this stage. I am happy to have a closer look at what is happening in the database, but it would be helpful if you could provide an explanation of what the specific changes were. I can obviously look into the commit code changes, but a higher level explanation of how things worked before and how they work now would be helpful to get started. Also, any pointers as to what is different with a synced realm for pure read scenarios that might have been affected by the changes to improve durability would help. I assume some call made frequently during reads on synced realms is taking considerably longer than it used to. Thanks |
@tgoyne - I tried suspending sync but that doesn't help. What I did find is that the versions since 10.21.1 are completely destroying thread performance, which is the main cause of the problem. Our reports break up the work into segments and then run each segment's queries on different threads, typically N-2 threads where N = number of CPU cores. When running with versions prior to 10.21.1, all threads run at nearly 100% regardless of whether a synced or local realm is being used. Since 10.21.1, and including 10.27.0, something is killing the thread performance, and performance of the threads declines to zero as per the attached graphs. So it seems something is progressively blocking threads. |
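For readers trying to picture the workload described above, here is a minimal sketch of splitting work into segments and running them on N-2 workers. It is not the reporter's code: `segments(of:workers:)` and `runSegmented` are hypothetical helper names, the integer IDs stand in for report rows, and no Realm calls are made.

```swift
import Foundation
import Dispatch

/// Splits `items` into at most `workers` contiguous segments of near-equal size.
/// (Hypothetical helper, not part of any SDK.)
func segments<T>(of items: [T], workers: Int) -> [ArraySlice<T>] {
    let count = max(1, min(workers, items.count))
    let base = items.count / count
    let extra = items.count % count
    var result: [ArraySlice<T>] = []
    var start = 0
    for i in 0..<count {
        // The first `extra` segments take one additional item.
        let size = base + (i < extra ? 1 : 0)
        result.append(items[start..<start + size])
        start += size
    }
    return result
}

/// Runs `work` on every segment in parallel and returns the results in segment order.
func runSegmented<T, R>(_ items: [T], workers: Int, work: (ArraySlice<T>) -> R) -> [R] {
    let parts = segments(of: items, workers: workers)
    var results = [R?](repeating: nil, count: parts.count)
    let lock = NSLock()
    DispatchQueue.concurrentPerform(iterations: parts.count) { i in
        let r = work(parts[i])   // a real worker would run its queries here
        lock.lock()
        results[i] = r           // guarded write into the shared result array
        lock.unlock()
    }
    return results.map { $0! }
}

// N-2 workers, where N is the number of active cores (never fewer than one).
let workers = max(1, ProcessInfo.processInfo.activeProcessorCount - 2)

let ids = Array(1...1000)
let partialSums = runSegmented(ids, workers: workers) { $0.reduce(0, +) }
let total = partialSums.reduce(0, +)
```

With this shape, any per-call cost inside `work` that involves a shared lock is paid by every worker at once, which is one way a small serial cost can show up as idle threads.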
I've successfully reproduced a significant performance regression in multithreaded read-only usage and I'm now looking into the cause. |
@tgoyne - thanks, let me know if there is anything I can do to help. |
It turns out I accidentally actually just compared the performance of The thing I'm testing is variations on the following, which is attempting to simulate a read-heavy workflow on many threads at once (using 14 threads as I'm testing on an M1 Ultra with 16 performance cores):
var config: Realm.Configuration!
try autoreleasepool {
    let user = try logInUser(for: basicCredentials())
    let realm = try openRealm(partitionValue: #function, user: user)
    config = realm.configuration
    for _ in 0..<5 {
        try realm.write {
            for _ in 0..<100000 {
                realm.add(SwiftPerson(firstName: "a", lastName: "b"))
            }
        }
    }
    waitForUploads(for: realm)
}
measure {
    let queue = DispatchQueue.global()
    let group = DispatchGroup()
    for _ in 0..<14 {
        group.enter()
        queue.async {
            for _ in 0..<5 {
                autoreleasepool {
                    var str = ""
                    let realm = try! Realm(configuration: config)
                    for obj in realm.objects(SwiftPerson.self) {
                        str += obj.firstName
                    }
                }
                autoreleasepool {
                    var str = ""
                    let realm = try! Realm(configuration: config)
                    for obj in realm.objects(SwiftPerson.self) {
                        str += obj.lastName
                    }
                }
            }
            group.leave()
        }
    }
    group.wait()
}
Running this on sync and local Realms on both v10.20.0 and v10.28.1 gives very similar results, and all of them sit around 1400% CPU during the testing phase. This of course is a very simple test case, so it's unsurprising that it's failing to hit problems. Are you using frozen objects anywhere? With no frozen objects, no encryption, no writes being performed, and the sync session suspended, very little involves acquiring locks or any sort of cross-thread coordination. Initializing new Realm instances does, so are you possibly initializing and then throwing away a very large number of Realm instances? Since you're seeing blocked threads, enabling "Record Waiting Threads" in File -> Recording Options in Instruments may be informative. This makes it so that the threads which should be running on the idle cores report the spot where they're blocked as time spent, which should reveal something useful. |
@tgoyne - OK, thanks, let me see if I can find out where the threads are being blocked. |
@tgoyne - not sure I really understand Instruments, but it seems to be blocking on the call to SyncManager.get_current_user(). HOWEVER, we have a deep hierarchy and the report traverses the entire hierarchy, and what was happening was that right near the deepest part of the hierarchy we were trying to open a new Realm to access an object using its ID instead of just reusing the parent object's realm. This would be called many times, which is obviously inefficient. With this fixed, the performance problem in the report goes away (testing with 10.27.0). However, it is still a bit unclear why this call would suddenly block in the newer version. I suspect we have the same issue in some of the background threads performing calculations for the UI, as they still appear slow, so I will investigate whether there are more unnecessary calls to Realm() that are blocking. Am I mistaken in the understanding that there should be no additional overhead to calling Realm() more than once from the same thread? I seem to recall this coming up in discussion a long time ago, and that Realm would simply return the same realm if one is already open on the same thread. 1.45 min 38.7% 0 s -[RLMApp currentUser] |
@tgoyne - it seems the same issue on the other background threads where we are opening another realm inside the query from the same thread which seems to cause blocking. These UI updates take around 3 seconds normally and are taking 10 seconds or more. We just have a static function that gets called from anywhere to fetch a reference record from Realm and we don't pass in a realm we just open one in the static function. In the earlier versions of the SDK this appears to have no performance impact whereas now it is quite a severe impact. Workaround is to just pass in a realm from the extension method call rather than opening a new realm. Still you might want to check this out since it should just pass back the realm that is already open on the same thread - I think. 544.35 ms 0.1% 0 s GlobalVars.activeLaunchPeriod.getter |
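The workaround described above (pass the already-open realm into the helper instead of opening one inside it) can be sketched with a toy model. This is deliberately not the Realm API: `Store`, `OpenCounter`, `fetchReopening`, and `fetch(id:in:)` are hypothetical names, and the counter only stands in for whatever per-open cost the SDK incurs.

```swift
/// Counts how many times a handle is opened (stand-in for the per-open cost).
final class OpenCounter { var count = 0 }

/// Hypothetical stand-in for a database handle; NOT the Realm API.
struct Store {
    let records: [Int: String]
    init(records: [Int: String], counter: OpenCounter) {
        counter.count += 1          // every open pays this cost
        self.records = records
    }
    func record(id: Int) -> String? { records[id] }
}

let data = [1: "alpha", 2: "beta", 3: "gamma"]

// Pattern from the thread: a helper that opens its own handle on every call.
func fetchReopening(id: Int, counter: OpenCounter) -> String? {
    Store(records: data, counter: counter).record(id: id)
}

// Workaround: the caller passes in the handle it already holds.
func fetch(id: Int, in store: Store) -> String? {
    store.record(id: id)
}

let reopenCounter = OpenCounter()
var reopenResult: String? = nil
for _ in 0..<1000 { reopenResult = fetchReopening(id: 2, counter: reopenCounter) }

let sharedCounter = OpenCounter()
let shared = Store(records: data, counter: sharedCounter)
var sharedResult: String? = nil
for _ in 0..<1000 { sharedResult = fetch(id: 2, in: shared) }
```

Both loops return the same record, but the first pays the open cost 1000 times and the second pays it once. In the real app the equivalent change would be giving the static lookup function a realm parameter, as the comment above describes.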
Not sure if this will be of any use but the basic code to test would be this
|
Thanks, I think I see what the actual problem is now. The Swift sync config type only stores the partition key but not the file path derived from the partition, and we compute the file path each time a Realm instance is obtained. This isn't a particularly expensive operation, but it ends up being the bulk of the runtime of The weird thing though is that this isn't anything new and I see the same behavior in v10.20.0, so I'm not sure why it used to be faster. |
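To illustrate the general shape of the cost described above, here is a toy model of a cache keyed by file path, comparing a config that derives the path on every lookup with one that stores it once. All names (`DerivingConfig`, `StoredPathConfig`, `HandleCache`) and the path layout are hypothetical, and this is not a description of how the SDK is implemented or of what any later fix changed.

```swift
/// Counts how many times a path is derived from a partition value.
final class Counter { var count = 0 }

/// Toy config that stores only the partition and derives the path on demand.
struct DerivingConfig {
    let partition: String
    let derivations: Counter
    var path: String {
        derivations.count += 1      // paid on every access
        return "/hypothetical/realms/\(partition).realm"
    }
}

/// Toy config that derives the path once at construction and stores it.
struct StoredPathConfig {
    let partition: String
    let path: String
    init(partition: String, derivations: Counter) {
        self.partition = partition
        derivations.count += 1      // paid once
        self.path = "/hypothetical/realms/\(partition).realm"
    }
}

/// Toy cache of open handles keyed by file path.
final class HandleCache {
    private var handles: [String: Int] = [:]
    func handle(forPath path: String) -> Int {
        if let existing = handles[path] { return existing }
        let created = handles.count + 1
        handles[path] = created
        return created
    }
}

let cache = HandleCache()

let derivingCounter = Counter()
let deriving = DerivingConfig(partition: "reports", derivations: derivingCounter)
var lastDeriving = 0
for _ in 0..<1000 { lastDeriving = cache.handle(forPath: deriving.path) }

let storedCounter = Counter()
let stored = StoredPathConfig(partition: "reports", derivations: storedCounter)
var lastStored = 0
for _ in 0..<1000 { lastStored = cache.handle(forPath: stored.path) }
```

Every lookup is a cache hit after the first, yet the deriving config still recomputes the key each time. A cheap derivation repeated on every cached lookup can end up dominating the lookup itself.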
Thanks for the explanation - it is probably worth understanding why it has become slower. I am happy to do more testing with the two versions of the SDK if you have any suggestions on specific things to test/measure and the best way to do that. |
#7857 fixes the problem of Realm cache lookups being really slow for synchronized Realms. |
OK I am going to close this now |
How frequently does the bug occur?
All the time
Description
We have just encountered quite significant performance issues with generating reports from a Synced Realm.
The same issue does not occur when the same realm is opened as a local realm.
I am going to revert to earlier versions to see if I can find out when this became an issue.
I believe we might have previously used v10.19.0 but updated because of the Xcode 13.3 support in the built binaries with 10.24.
In the meantime if anyone has any ideas please let me know. I will revert to 10.19.0 to confirm that version does not have the issue.
Stacktrace & log output
Can you reproduce the bug?
Yes, always
Reproduction Steps
Generating standard reports usually takes around 111 seconds but now takes upwards of 30 minutes.
Version
10.24.2
What SDK flavour are you using?
MongoDB Realm (i.e. Sync, auth, functions)
Are you using encryption?
No, not using encryption
Platform OS and version(s)
macOS 12.0
Build environment
Xcode version: 13.3
Dependency manager and version: Prebuilt Binaries