Corrupted Database #1212

Closed
youens opened this issue Dec 11, 2014 · 15 comments

Comments

@youens

youens commented Dec 11, 2014

Our app crashed, presumably during a write transaction. Ever since, it crashes on launch every time. Luckily, I was able to get the device tethered and grab the Realm file. The file appears to be corrupted, since it also crashes the desktop app when I try to open it. Here it is if you want to try: http://j.youens.com/Yv4S

As for the crash, it was in tightdb::LangBindHelper commit_and_continue_as_read. It happens as soon as the Realm tries to load. (I set the default realm path to the file, then use [RLMRealm defaultRealm].)

Any ideas would be helpful. We're about to submit this app, and this has me a bit paranoid, since there's no real recovery path for our users. 😮

@timanglade
Contributor

Thanks for passing the file along and our apologies for the issue. We’ll investigate this asap.

@timanglade
Contributor

@youens forgot to ask, can you share some code here (or privately to [email protected]) that recreates the issue?

@youens
Author

youens commented Dec 11, 2014

I'd be happy to, though there's not really any code that runs before this other than setting the path...

// all is good here
NSString *filePath = [self databasePath];
[RLMRealm setDefaultRealmPath:filePath];

// then on the first call to this... crash
RLMRealm *realm = [RLMRealm defaultRealm];

The path is valid and points to the corrupted file (linked above) in my code. It is in a shared app container, though access to the file isn't the problem. (I am able to copy it from the path to Documents, etc.)

If you're asking how to recreate getting it in the corrupted state in the first place... I'm not able to replicate. I can only assume it was an ill-timed crash or some other such thing.

PS: Using Realm (0.88.0)

@jpsim
Contributor

jpsim commented Dec 11, 2014

There's not much we'll be able to do without a reproducible case. Sharing your code would still be invaluable, since we don't even have an occasionally reproducible code sample right now.

@youens
Author

youens commented Dec 11, 2014

Do you mean code to reproduce the crash that led to the corrupted database? I can't reproduce that... it happened one time, to a beta tester in my office. I do know which step it was on, and the only write code there is this:

// uses the same realm of the group
RLMRealm *realm = self.group.realm;

[realm beginWriteTransaction];

self.group.name = self.nameCell.text;

// where "people" is a normal NSArray of RLMObject
[self.group.people removeAllObjects];
[self.group.people addObjects:people];

[realm commitWriteTransaction];

Perhaps there's something odd with that or I'm using it wrong? We have a few hundred testers on this, and it's the only case so far... but I hope you'll agree that any corrupted file is one too many. :)

@timanglade
Contributor

We definitely agree one corrupted file is too many! Which is why we’re very eager to get to the bottom of this. We’ve gotten 1-2 other reports of files getting corrupted, all happening under similarly vague circumstances (i.e. only one tester sees it on one device, and then that case can never be reproduced).

Are you using an iOS 8 extension by any chance?
Do you have concurrent read/writes happening?
Is there any way you could share the full code for the app privately? (We can sign an NDA if necessary.)

@youens
Author

youens commented Dec 11, 2014

We do have a Share extension, but it was not and had not been running in the same testing session. (That was my first question as that's been an issue as well!)

We may occasionally make some concurrent read/writes, but it would be very rare, and very unlikely in this piece of code.

I may follow up via email about sharing code, but I'm not sure on that point. Either way, I will certainly keep an eye out and report back any further occurrences.

PS: I'm now making a copy of the .realm file to documents on clean app exits... so at least if it's corrupted I can recover. Though, the user will probably try deleting and reinstalling first... but at least I tried. ;-)
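A minimal sketch of the backup-on-clean-exit idea above, using only Foundation. The helper name `BackupFileReplacingExisting` and both paths are hypothetical, not taken from the app in question. It assumes the Realm file is not being written to while the copy runs, which is why doing it on a clean exit matters.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Realm): copies the file at sourcePath to
// backupPath, replacing any earlier backup. Returns NO and sets *error on failure.
static BOOL BackupFileReplacingExisting(NSString *sourcePath,
                                        NSString *backupPath,
                                        NSError **error) {
    NSFileManager *fm = [NSFileManager defaultManager];
    NSString *tempPath = [backupPath stringByAppendingPathExtension:@"tmp"];

    // Clear any stale temp file left behind by an interrupted earlier backup.
    [fm removeItemAtPath:tempPath error:NULL];

    // Copy to a temporary name first, so a crash in the middle of the copy
    // never overwrites the last good backup.
    if (![fm copyItemAtPath:sourcePath toPath:tempPath error:error]) {
        return NO;
    }

    // Swap the finished copy into place.
    if ([fm fileExistsAtPath:backupPath]) {
        return [fm replaceItemAtURL:[NSURL fileURLWithPath:backupPath]
                      withItemAtURL:[NSURL fileURLWithPath:tempPath]
                     backupItemName:nil
                            options:0
                   resultingItemURL:NULL
                              error:error];
    }
    return [fm moveItemAtPath:tempPath toPath:backupPath error:error];
}
```

On launch, the app could fall back to the backup if opening the primary file fails. Whether a given corruption is detectable before the crash is a separate question that this sketch does not address.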

@timanglade
Contributor

For what it’s worth, here’s a fixed version of your realm file https://dl.dropboxusercontent.com/u/348446/fixed.realm We’re investigating what could be causing this.

Would you be open to using a slightly different (slower) version of Realm that would help us log more details about what could be causing this?

@jpsim
Contributor

jpsim commented Dec 19, 2014

Hi @youens, we just released Realm 0.89, which fixes a few bugs and adds quite a few consistency assertions and validation checks.

I strongly encourage you to update your apps to 0.89 as soon as possible. If corruption or crashes happen again, the backtrace is likely to have more information so we can identify what happened and increase the odds of fixing it.

@jacobsantos

I can give you a reproduction example. It usually occurs with dispatch_async, whether using the global queue or a separate group. You need existing data in the database before the crash, of course.

  1. In dispatch_async 1: Delete what is in the database.
  2. In dispatch_async 2: Attempt to Read what is in the database.

The best explanation I can think of is that RLMObject appears to read attributes lazily. The read retrieved the rows before the delete, so the object is still expected to exist. When the attribute is actually read after the delete, the access fails and crashes on the assertion that the index is less than the number of rows.

This crash seems to corrupt the database preventing reads or writes.
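The two steps above can be sketched roughly as follows. This is only an illustration of the access pattern described, not a confirmed reproduction: the `SketchItem` model and the function name are hypothetical, and `dispatch_group_async` stands in for plain `dispatch_async` so that a caller can wait for both blocks to finish. Each block opens its own `RLMRealm`, since Realm instances are confined to the thread that created them.

```objc
#import <Foundation/Foundation.h>
#import <Realm/Realm.h>

// Hypothetical model, used only for this sketch.
@interface SketchItem : RLMObject
@property NSString *name;
@end

@implementation SketchItem
@end

// Schedules a delete and a read on a concurrent queue so they may overlap.
// Returns the group so the caller can wait for both blocks to finish.
static dispatch_group_t RunDeleteAndReadConcurrently(void) {
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    // Step 1: delete what is in the database.
    dispatch_group_async(group, queue, ^{
        @autoreleasepool {
            RLMRealm *realm = [RLMRealm defaultRealm];
            [realm beginWriteTransaction];
            [realm deleteObjects:[SketchItem allObjectsInRealm:realm]];
            [realm commitWriteTransaction];
        }
    });

    // Step 2: attempt to read what is in the database.
    dispatch_group_async(group, queue, ^{
        @autoreleasepool {
            RLMRealm *realm = [RLMRealm defaultRealm];
            for (SketchItem *item in [SketchItem allObjectsInRealm:realm]) {
                // Property values are read lazily, at the point of access.
                NSLog(@"%@", item.name);
            }
        }
    });

    return group;
}
```

The database needs to contain some `SketchItem` objects before this runs. The relative timing of the two blocks is up to the scheduler, so any single run may or may not interleave them.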

@jacobsantos

I think, technically, any crash taking place during a read or write that leaves the database still open might corrupt the database, but that is just a guess.

Does Realm do any Journaling?

If you need code, then I should be able to create a project fairly quickly (a couple of hours maybe).

@timanglade
Contributor

Sorry for the delay. This got stuck in my inbox over the holidays.

A crash should always leave the database in a valid state. The latest commit should either be fully complete or not done at all. Hence, a corrupt database indicates an error which we would like very much to investigate.

Re journaling: we don’t technically do any — we do something different with a similar effect.

A repro case is always much appreciated. You can send it here or to [email protected]

@semireg

semireg commented Jan 15, 2015

I wrote two emails on January 14th highlighting a reproducible crash that corrupts the database. Pasted here for analysis:

We’re seeing crashes that corrupt the Realm DB. We’re using the latest available version 0.89.2.

Realm is powering our graphing and trend analysis on data coming in from bluetooth sensors. This works great, so long as there is only one trend object writing to the database.

Since Realm can’t yet generate fine-grained notifications, the solution is architected in a self-admitted funky way.

We have a trend object that accumulates trend points. We call a trend’s addValue: method, and it adds the trend to the correct time bucket, trims any old trend points, and creates relations for any trend points added (to the front)/deleted (from the end). If the trend adds/deletes any points, that method will send a notification with that trend’s primary key.

Then, on the graphing side, the trend graph receives the notification, compares it to its own trend’s key, and reads the trend object. It updates the chart with the added/deleted trend points (which is a critical step for us - we need to know exactly how the trend has changed). Then, it removes all added/deleted trend points from those RLMArrays (relationships).

Right now we modify all trends on one NSOperationQueue. If we disable these blocks and run them straightaway, it crashes faster. Setting the queue’s maxConcurrentOperationCount to 1 lets us run with almost no crashes… that is, until one happens.
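For reference, the serialization described above can be sketched with Foundation alone. The `NSOperationQueue` property is `maxConcurrentOperationCount`; the helper names `MakeSerialWriteQueue` and `EnqueueWrite` are hypothetical and not from the app in question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a queue that runs at most one operation at a time.
static NSOperationQueue *MakeSerialWriteQueue(void) {
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = 1;
    return queue;
}

// Hypothetical helper: funnels a block of write work through the serial queue.
// Inside writeBlock, the caller would obtain the Realm for the current thread
// and wrap its changes in beginWriteTransaction / commitWriteTransaction.
static void EnqueueWrite(NSOperationQueue *queue, void (^writeBlock)(void)) {
    [queue addOperationWithBlock:writeBlock];
}
```

This only guarantees that blocks submitted through the queue never overlap each other. Writes made elsewhere, such as from the UI side, are not covered by it.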

What do you think?

Caylan

p.s. These are almost always EXC_BAD_ACCESS

here’s an example of a crash on write:

#0 0x0045f8dc in tightdb::Array::init_from_mem(tightdb::MemRef) ()
#1 0x0045ff20 in tightdb::Array::update_from_parent(unsigned long) ()
#2 0x004d3bba in tightdb::Table::update_from_parent(unsigned long) ()
#3 0x004ddade in tightdb::Group::update_refs(unsigned long, unsigned long) ()
#4 0x004536ba in std::__1::unique_ptr<tightdb::SharedGroup, std::__1::default_delete<tightdb::SharedGroup> >::operator*() const [inlined] at /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:2602
#5 0x004536b2 in -[RLMRealm commitWriteTransaction] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMRealm.mm:496
#6 0x00401c16 in __55-[PPPSensorReadingTrendPlugin updateTrendsWithObject:]_block_invoke at /Users/developer/Dev/App2/PPPSensorKit/PPPSensorKit/PPPSensorReadingTrendPlugin.m:182
#7 0x28ec40fc in __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ ()
#8 0x28e2efc4 in -[NSBlockOperation main]
#9 0x28e21844 in -[__NSOperationInternal _start:]
#10 0x28ec6a56 in __NSOQSchedule_f ()
#11 0x01234e1c in _dispatch_queue_drain ()
#12 0x0122f2f4 in _dispatch_queue_invoke ()
#13 0x01236558 in _dispatch_root_queue_drain ()
#14 0x01237880 in _dispatch_worker_thread3 ()
#15 0x35fdee24 in _pthread_wqthread ()

here’s another example of a crash on read:

#0 0x004bf950 in tightdb::Table::connect_opposite_link_columns(unsigned long, tightdb::Table&, unsigned long) ()
#1 0x004c0162 in tightdb::Table::refresh_column_accessors(unsigned long) ()
#2 0x004ce07c in tightdb::Group::do_get_table(unsigned long, bool (*)(tightdb::Spec const&)) ()
#3 0x004c00ec in tightdb::Table::refresh_column_accessors(unsigned long) ()
#4 0x004ce07c in tightdb::Group::do_get_table(unsigned long, bool (*)(tightdb::Spec const&)) ()
#5 0x004c00ec in tightdb::Table::refresh_column_accessors(unsigned long) ()
#6 0x004ce07c in tightdb::Group::do_get_table(unsigned long, bool (*)(tightdb::Spec const&)) ()
#7 0x004ce14c in tightdb::Group::do_get_table(tightdb::StringData, bool (*)(tightdb::Spec const&)) ()
#8 0x00421ee4 in tightdb::StringData::StringData(char const*) [inlined] at /Users/realm/workspace/objc_ios/tightdb_objc/core/include/tightdb/group.hpp:628
#9 0x00421ebe in RLMTableForObjectClass(RLMRealm*, NSString*) at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMRealm_Private.hpp:60
#10 0x0042233a in -[RLMObjectSchema table] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMObjectSchema.mm:250
#11 0x004263f2 in RLMGetObjects(RLMRealm*, NSString*, NSPredicate*) at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMObjectStore.mm:404
#12 0x0041f232 in +[RLMObject objectsWithPredicate:] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMObject.mm:189
#13 0x003a585a in +[PPPSensorReading(Realm) realmSensorForSensorName:sensorClassString:createIfMissing:inRealm:] at /Users/developer/Dev/App2/PPPSensorKit/PPPSensorKit/PPPSensorReading+Realm.m:27
#14 0x003a5c42 in +[PPPSensorReading(Realm) realmReadingForSensorName:sensorClassString:sensorKey:readingName:unitString:createIfMissing:inRealm:] at /Users/developer/Dev/App2/PPPSensorKit/PPPSensorKit/PPPSensorReading+Realm.m:74
#15 0x003f2596 in __55-[PPPSensorReadingTrendPlugin updateTrendsWithObject:]_block_invoke at /Users/developer/Dev/App2/PPPSensorKit/PPPSensorKit/PPPSensorReadingTrendPlugin.m:150

and here’s the ugly message when we restart the app:

#0 0x0047e950 in tightdb::Table::connect_opposite_link_columns(unsigned long, tightdb::Table&, unsigned long) ()
#1 0x0047f162 in tightdb::Table::refresh_column_accessors(unsigned long) ()
#2 0x0048d07c in tightdb::Group::do_get_table(unsigned long, bool (*)(tightdb::Spec const&)) ()
#3 0x0047f0ec in tightdb::Table::refresh_column_accessors(unsigned long) ()
#4 0x0048d07c in tightdb::Group::do_get_table(unsigned long, bool (*)(tightdb::Spec const&)) ()
#5 0x0047f0ec in tightdb::Table::refresh_column_accessors(unsigned long) ()
#6 0x0048d07c in tightdb::Group::do_get_table(unsigned long, bool (*)(tightdb::Spec const&)) ()
#7 0x0047f138 in tightdb::Table::refresh_column_accessors(unsigned long) ()
#8 0x0048d07c in tightdb::Group::do_get_table(unsigned long, bool (*)(tightdb::Spec const&)) ()
#9 0x0048d496 in tightdb::Group::do_get_or_add_table(tightdb::StringData, bool (*)(tightdb::Spec const&), void (*)(tightdb::Table&), bool*) ()
#10 0x003e2bfe in tightdb::StringData::StringData(char const*) [inlined] at /Users/realm/workspace/objc_ios/tightdb_objc/core/include/tightdb/group.hpp:653
#11 0x003e2bd2 in RLMTableForObjectClass(RLMRealm*, NSString*, bool&) [inlined] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMRealm_Private.hpp:55
#12 0x003e2ba6 in RLMRealmCreateTables(RLMRealm*, RLMSchema*, bool) at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMObjectStore.mm:169
#13 0x00402780 in (anonymous namespace)::createTablesInTransaction(RLMRealm*, RLMSchema*) at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMRealm.mm:137
#14 0x004020c6 in +[RLMRealm realmWithPath:readOnly:inMemory:dynamic:schema:error:] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMRealm.mm:389
#15 0x00401af6 in +[RLMRealm realmWithPath:readOnly:error:] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMRealm.mm:291
#16 0x00401a36 in +[RLMRealm defaultRealm] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMRealm.mm:279
#17 0x003dda72 in +[RLMObject allObjects] at /Users/realm/workspace/objc_ios/tightdb_objc/Realm/RLMObject.mm:161
#18 0x0032c666 in +[PPPRealmTrend nextKey] at /Users/developer/Dev/App2/PPPSensorKit/PPPSensorKit/PPPRealmTrend.m:32
#19 0x0032c7a0 in +[PPPRealmTrend trendWithNumberOfPoints:secondsBetweenPoints:] at /Users/developer/Dev/App2/PPPSensorKit/PPPSensorKit/PPPRealmTrend.m:42
#20 0x0016abf0 in -[PPPTrendGraphViewController defaultPreferences] at /Users/developer/Dev/App2/App/App/VCs/PPPTrendGraphViewController.m:530
#21 0x00179a56 in -[PPPWidgetViewController setWidget:] at /Users/developer/Dev/App2/App/App/VCs/PPPWidgetViewController.m:344
#22 0x0016a99c in -[PPPTrendGraphViewController setWidget:] at /Users/developer/Dev/App2/App/App/VCs/PPPTrendGraphViewController.m:519
#23 0x0031b4be in -[PPPWidgetFlowLayoutViewController cellForWidget:] at /Users/developer/Dev/App2/App/App/VCs/PPPWidgetFlowLayoutViewController.m:266
#24 0x0031b126 in -[PPPWidgetFlowLayoutViewController flowLayoutView:cellAtIndex:] at /Users/developer/Dev/App2/App/App/VCs/PPPWidgetFlowLayoutViewController.m:242
#25 0x004ddacc in -[IGFlowLayoutView arrangeItems] at /Users/Shared/Jenkins/Home/jobs/NucliOS_20142_Installer/workspace/Source/IG/IG/FlowLayout/IGFlowLayoutView.m:889
#26 0x004dc168 in -[IGFlowLayoutView updateSize:prevSize:prevOffset:] at /Users/Shared/Jenkins/Home/jobs/NucliOS_20142_Installer/workspace/Source/IG/IG/FlowLayout/IGFlowLayoutView.m:503
#27 0x004dbebe in -[IGFlowLayoutView setBounds:] at /Users/Shared/Jenkins/Home/jobs/NucliOS_20142_Installer/workspace/Source/IG/IG/FlowLayout/IGFlowLayoutView.m:466

And then... another email a few hours later:

I have some more evidence that may be of use in tracking down the reason for the crashes.

I refactored the notifications so that the graph UI doesn't need to modify any Realm relationships. This didn't mitigate the crashes. I can reliably run multiple trends on multiple sensors at any given time without any crashes. It seems to crash only when I incorporate additional writes on a trend that's active. So, multiple trends updating is no problem... but modifying a property on one of those active trends reliably causes a crash.

Does anyone have any advice on minimizing Realm fragility? I've had such a great experience with Realm up to this point.

Thoughts?

We're working on a github project to aggravate this condition. Stay tuned.

@timanglade
Contributor

Thanks @semireg. We should be tackling these emails now. Anything reproducible is good news for us (and for you), and those crashes are our top priority. Please do send a project along if you can; otherwise, stay tuned for a response from us on those emails.

@alazier
Contributor

alazier commented Jan 23, 2015

We fixed some issues which could have potentially caused corruption in release 0.90.0. Closing this for now but if you continue to have issues please re-open.

@alazier alazier closed this as completed Jan 23, 2015
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 18, 2024
6 participants