Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

[core] Update SQLite schema with WAL journal mode and normal sync #5796

Merged
merged 2 commits into from
Aug 23, 2016

Conversation

friedbunny
Copy link
Contributor

@friedbunny friedbunny commented Jul 26, 2016

This pull request changes our offline/ambient cache database schema to use write-ahead logging and normal sync, which greatly increases the speed that resources can be written and read.

Journal modes

SQLite offers several journal modes — we were using the default, DELETE, which continuously writes a “rollback journal” that is then deleted at the end of the transaction. This mode offers good data safety and is atomic, but is subsequently somewhat slow.

Write-head logging (WAL) is a different type of journaling that offers better performance but marginally reduced data safety. WAL effectively writes to a separate cache and periodically flushes back to the main database.

The SQLite docs go into great detail about the pros and cons of WAL, so I won’t rehash them here.

Synchronous modes

SQLite also offers several modes of write synchronization. FULL is the default (which we were using) and is quite safe, but slow. NORMAL syncs less often (but still at “critical moments”) and is faster, but comes with a “non-zero” chance of data loss. NORMAL is the suggested mode to be used in conjunction with WAL and is what this PR uses.

NORMAL only offers a minimal performance increase, so far as I could tell.

Performance tests with offline resources

Bottom-line: WAL offers 2× to 5× performance for us on iOS, depending on device vintage. Speeds appear to scale linearly — two rounds of tests were done with ~1000 resources and ~6000 resources. Initial writes/downloads were on my home internet connection, which was not saturated.

iPhone 6s (iOS 9.3.2, SQLite 3.8.10.2) iPhone 5 (iOS 10b2, SQLite 3.13.0)
DELETE + FULL 2.5 MB/s writing, 4.0 MB/s verifying 1.2 MB/s writing, 1.8 MB/s verifying
WAL + NORMAL 4.4 MB/s writing, 21.9 MB/s verifying 1.2 MB/s writing, 4.2 MB/s verifying

The iPhone 5 appeared to be severely CPU-bound and was pegged at 100%.

Availability

WAL was first supported in SQLite 3.7.0, which is available from iOS 5 and Android API 11 — so it should be available everywhere.

Unknowns

  • This PR will increase the chances of data loss, but it’s unknown whether or not this increase is significant.
  • If data loss were to occur, the failure mode is unknown.
  • Android performance is untested.
  • The impact on other platforms?

/cc @mapbox/gl

@friedbunny friedbunny added performance Speed, stability, CPU usage, memory usage, or power usage offline Core The cross-platform C++ core, aka mbgl labels Jul 26, 2016
@mention-bot
Copy link

@friedbunny, thanks for your PR! By analyzing the annotation information on this pull request, we identified @jfirebaugh, @kkaefer and @brunoabinader to be potential reviewers

@kkaefer
Copy link
Member

kkaefer commented Jul 26, 2016

This PR will increase the chances of data loss, but it’s unknown whether or not this increase is significant.

When I experimented with WAL dbs, I never saw any sort of data loss. Even if it happens, we're writing a cache, so if we lose a couple of records, it's not a big deal.

If data loss were to occur, the failure mode is unknown.

Our current layer is implemented so that it deletes the cache and starts over if the DB file is unreadable.

@zugaldia
Copy link
Member

Our current layer is implemented so that it deletes the cache and starts over if the DB file is unreadable.

Considering that the cached tiles and the offline tiles share the same database, wouldn't this delete as well any offline tiles the user has previously downloaded, not just cached ones?

@friedbunny
Copy link
Contributor Author

Considering that the cached tiles and the offline tiles share the same database, wouldn't this delete as well any offline tiles the user has previously downloaded, not just cached ones?

This is a good point and is a shortcoming of our combined ambient+offline approach. If the risk of data loss is not meaningfully increased by WAL+Normal, then this won’t be any more of a problem than it already is.

It’s also worth noting that Core Data switched to WAL for its backing SQLite database in iOS 7 and OS X 10.9 in 2013, so this is a well-tested configuration on Apple systems.

The test failures on CI look legit — I’ll investigate updating them for this PR.

@mention-bot
Copy link

mention-bot commented Jul 26, 2016

@friedbunny, thanks for your PR! By analyzing this pull request, we identified @jfirebaugh to be potential reviewers.

FB: this was me testing @mention-bot.

@friedbunny
Copy link
Contributor Author

Test failures were because OfflineDatabase.MigrateFromV2Schema had a hard-coded expectation of schema v3.

@incanus
Copy link
Contributor

incanus commented Jul 27, 2016

Couple notes based on reviewing this, some discussed offline:

  • I'm not seeing much mention of data loss—@friedbunny where are you reading up on that?

  • Docs mention adverse performance:

    WAL might be very slightly slower (perhaps 1% or 2% slower) than the traditional rollback-journal approach in applications that do mostly reads and seldom write.

    I wonder if we should concretely test this (also on multiple platforms) for the more general use case of normal caching.

  • We should flag in code comments notes around read-only media just so we don't get bitten in future.

    Hence, SQLite databases should always be converted to PRAGMA journal_mode=DELETE prior to being transferred to read-only media.

  • Should we also consider TRUNCATE mode, which might be a more middle-ground compromise between the default DELETE and WAL?

    https://www.sqlite.org/pragma.html#pragma_journal_mode

  • Lastly, and sorry if I missed this elsewhere, but can we get a crosslink to a master ticket for the main driver behind this, presumably offline performance on older devices? I know another issue (Ask HTTP implementations not to do content decoding #3886) that is holding up better offline performance. If we can ticket this holistically and make measurable gains over time, it will be easier to gauge priorities on individual tickets.

@friedbunny
Copy link
Contributor Author

friedbunny commented Aug 2, 2016

I'm not seeing much mention of data loss—@friedbunny where are you reading up on that?

Many of the examples here mention WAL as a potential aggravator. There are some circumstances where WAL is an improvement (e.g., out of order writes), though.

Docs mention adverse performance

I haven’t seen slower performance in any circumstance (read or write) so far, but yeah, we should test the ambient caching use case more throughly.

Should we also consider TRUNCATE mode, which might be a more middle-ground compromise between the default DELETE and WAL?

I’d prefer to go whole-hog and use WAL, which seems like it should be the best option for us, but I haven’t tested TRUNCATE yet.

Lastly, and sorry if I missed this elsewhere, but can we get a crosslink to a master ticket for the main driver behind this

There’s no general ticket related to database performance AFAIK. This pull request offers a fairly significant performance increase for everyone, but there’s no particular impetus other than that. Do we need more than GitHub’s “offline” + “performance” tags?

@friedbunny friedbunny self-assigned this Aug 2, 2016
@jfirebaugh
Copy link
Contributor

This sounds good to me.

The iPhone 5 appeared to be severely CPU-bound and was pegged at 100%.

This is due to #3886.

@friedbunny friedbunny added this to the ios-v3.4.0 milestone Aug 22, 2016
@friedbunny
Copy link
Contributor Author

friedbunny commented Aug 22, 2016

Cool! On iOS we’re feeling good about getting this in v3.4.0 and would like to merge this week.

This will affect other platforms, so it’s important that we get sign-off from the Android and Qt folks. Any concerns about switching to WAL, @tmpsantos @brunoabinader @zugaldia @tobrun?

@zugaldia
Copy link
Member

@friedbunny 👍

@incanus
Copy link
Contributor

incanus commented Aug 22, 2016

Is it possible at all to have some sort of baseline test for speed so we can attach numbers to this?

@tmpsantos
Copy link
Contributor

Any concerns about switching to WAL?

There is one corner case that we need to keep in mind which is this only works with write access to the cache, so if you have a read-only offline database on a read-only partition, it won't work.

That said, :shipit:. This is a fairly easy change to revert in a custom build if we ever need to address this corner case.

@friedbunny
Copy link
Contributor Author

Thanks everybody, sounds like we’re good to go. I’m going to rebase this, let the tests pass, and then merge today.

@friedbunny friedbunny force-pushed the fb-sqlite-wal branch 2 times, most recently from 2cf843d to 0cf9e54 Compare August 23, 2016 17:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Core The cross-platform C++ core, aka mbgl offline performance Speed, stability, CPU usage, memory usage, or power usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants