Extremely slow file handling with archives in Android #42
Comments
I think I've narrowed it down to whatever lies behind this code in the bindings. The app hangs (on Android) for a ridiculously long time, even with Ray Charles, on the JS invocation of that code, which strongly suggests that the problem is in libzim, or in the interaction between the file representation on Android and libzim. Note that this code runs AFTER the archive has been passed to the Web Worker, so problems with passing the ArrayBuffer are definitively ruled out.
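For reference, here is a minimal sketch of what the worker-side loading path roughly looks like. This is not the actual binding code referred to above: the `FS`/`WORKERFS` calls are standard Emscripten APIs (assumed to be exposed by the build), while `Module.loadArchive` is a stand-in for whatever exported function wraps `zim::Archive` construction in the real bindings.

```js
// Hypothetical sketch only: Module.loadArchive is a stand-in for the real
// exported binding; FS and WORKERFS are standard Emscripten filesystem APIs.
self.onmessage = function (event) {
  const file = event.data.file; // File object forwarded from the UI thread

  // Mount the picked file into the virtual filesystem without copying it.
  FS.mkdir('/work');
  FS.mount(WORKERFS, { files: [file] }, '/work');

  // This is the step that hangs on Android: constructing the archive triggers
  // libzim's initial reads (header, title/path indexes, cluster pointers).
  const archive = Module.loadArchive('/work/' + file.name);

  self.postMessage({ loaded: true });
};
```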
Perhaps @mgautierfr's analysis of search performance in openzim/libzim#418 is relevant to this issue, in particular the observation that most of the "lost" time is spent on I/O. However, the real bottleneck we're experiencing is in the function that loads the ZIM archive. Unless some of the caching ideas regarding Xapian search have been implemented, just loading the archive should not even initiate Xapian code, right?
This comment from mossroy is pertinent, but using pools of Web Workers wouldn't solve the loading bottleneck:
I don't know how your "system" is architected. Where is the I/O happening: when you access libzim, or when libzim accesses the file? You mention full-text search. Full-text search is done by the Xapian library itself, and I don't know how it works internally, but it seems that loading the Xapian database needs a lot of I/O (we can see that even locally in C++). Once the database is loaded, subsequent searches are pretty quick. Maybe it is the same problem, but amplified by I/O being a bit slower in JS/Service Worker/Web Worker (even if 90 s seems like a lot, even for that). Your test for access to a large file seems to read only one byte (at different offsets). How does it behave when you try to read several bytes (a few KB, a few MB, a few hundred MB)? How does it behave when you try to access random offsets (not only increasing offsets)?
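To make the suggested test concrete, here is a rough sketch (mine, not from the thread) of timing reads of various sizes at random offsets directly against the picked File, outside libzim, so that raw file I/O cost can be separated from libzim's own behaviour. The sizes and offsets are illustrative.

```js
// Sketch of a read benchmark against the File object itself (no libzim).
async function timeRandomReads(file, sizes = [1, 4096, 1 << 20, 100 << 20]) {
  for (const size of sizes) {
    const offset = Math.floor(Math.random() * Math.max(1, file.size - size));
    const t0 = performance.now();
    const buf = await file.slice(offset, offset + size).arrayBuffer();
    const ms = performance.now() - t0;
    console.log('read ' + buf.byteLength + ' bytes at offset ' + offset + ': ' + ms.toFixed(1) + ' ms');
  }
}
```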
@mgautierfr The bottleneck is entirely inside the libzim WASM from what I can tell. It's not the Xapian I/O (inside libzim) that is causing the bottleneck, as we are simply loading the ZIM archive into libzim at this point. I presume it starts to read metadata, establish caches, etc. when it runs. File reading on Android is always slow, but not this slow with an archive like Ray Charles! Our legacy back end, which emulates libzim in JavaScript, can load the full English Wikipedia on Android, a bit slowly but acceptably. Hence I'm surprised that simply loading Ray Charles in libzim WASM takes so long on Android, and that archives larger than about 500MB never seem to finish loading. I guess we're a bit stuck with this issue if there's no scope for reducing the I/O, or whatever it is that is taking so long. I'll keep investigating...
Can you get a trace of all the I/O made by libzim (how many bytes are read, at which offsets)?
Good idea -- although I probably can't do that for the WASM (which is effectively machine code), I can probably do it in the asm.js version, which is a subset of JavaScript and so vaguely human-readable and traceable/debuggable.
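One way that trace could be obtained, assuming the asm.js build exposes the Emscripten FS object, is to wrap FS.read so that every read performed on libzim's behalf logs its length and file position. A sketch:

```js
// Assumes the Emscripten FS object is reachable from the worker/page scope.
const originalRead = FS.read;
FS.read = function (stream, buffer, offset, length, position) {
  const bytesRead = originalRead.call(FS, stream, buffer, offset, length, position);
  const where = (position !== undefined) ? position : stream.position;
  console.log('FS.read ' + stream.path + ': ' + bytesRead + ' bytes at ' + where);
  return bytesRead;
};
```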
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
The issue was initially discussed and diagnosed over at kiwix/kiwix-js-pwa#343. There, I thought the issue was with Emscripten's WORKERFS. However, using the test case in this repo for large file access, and debugging on a Chromium instance on a midrange Samsung Android, WORKERFS has no problem loading and reading (instantly) a 92GB Wikipedia ZIM from a microSD card. See the screenshot below. If we can read bytes from the end of the file nearly instantly, why is javascript-libzim getting such awful performance? Loading Ray Charles into the WASM on Android takes 30 seconds in Samsung Internet (a Chromium browser), and nearly 90 seconds in Chrome. I was unable to load anything larger than 500MB into either instance.
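For context, a sketch of the kind of end-of-file check described: after mounting the large ZIM through WORKERFS (as in the earlier sketch), read a handful of bytes from near the end of the file through the Emscripten FS. The path and the availability of FS here are assumptions about this repo's test harness, not its actual code.

```js
// Read `count` bytes from the tail of a file mounted in the Emscripten FS.
function readTailBytes(path, count) {
  const size = FS.stat(path).size;
  const buffer = new Uint8Array(count);
  const stream = FS.open(path, 'r');
  const bytesRead = FS.read(stream, buffer, 0, count, size - count);
  FS.close(stream);
  console.log('read ' + bytesRead + ' bytes at offset ' + (size - count));
  return buffer;
}

// e.g. readTailBytes('/work/wikipedia_en_all_maxi.zim', 16); // hypothetical path
```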
The slowdown mostly relates to instantiating the archive. Once the archive has registered, full-text searching is reasonable. This is why I initially thought the issue was to do with passing the ArrayBuffer to the Web Worker, but the big-file test does this instantly. @mgautierfr would you have any thoughts on what could be going on here?
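To clarify what "passing the ArrayBuffer" involves, here is a minimal hand-off sketch (illustrative names, not the app's actual code). Both variants complete effectively instantly, which is why the hand-off itself has been ruled out as the bottleneck.

```js
const worker = new Worker('libzim-worker.js'); // illustrative worker script name

// Option A: post the File object itself; the worker can mount it via WORKERFS
// without copying the data.
worker.postMessage({ file: pickedFile });

// Option B: transfer an ArrayBuffer; ownership moves to the worker, so no copy is made.
worker.postMessage({ buffer: zimArrayBuffer }, [zimArrayBuffer]);
```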
NB we can't test on Firefox because Firefox on Android attempts to copy picked archives into memory (or possibly an internal file system), and crashes on anything larger than about 2GB.