Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak when using libzim wasm #872

Closed
mossroy opened this issue Jun 6, 2022 · 8 comments
Closed

Memory leak when using libzim wasm #872

mossroy opened this issue Jun 6, 2022 · 8 comments
Milestone

Comments

@mossroy
Copy link
Contributor

mossroy commented Jun 6, 2022

If you watch the memory consumption of the browser (tested with Chromium) on branch https://github.com/kiwix/kiwix-js/tree/libzim-experiments, you see that it gradually raises when browsing.

It's more obvious when reading large contents, like ZIM files that have videos (I tested with dirtybiology ones)

After opening a few videos, it crashes with the following error message in the log:

Cannot enlarge memory, asked to go up to 2213478400 bytes, but the limit is 2147483648 bytes!

So I suspect that the ZIM contents are not deleted from memory after they are used, probably in the C bindings

@Jaifroid
Copy link
Member

Jaifroid commented Jun 6, 2022

I haven't looked closely at the glue you implemented, but when I did the ZSTD decoding glue, I ran into a similar issue because I thought I had to malloc and de-allocate memory between each call to the Emscripten VM. It turned out that de-allocation never entirely got rid of the memory structures, and so each new allocation made the memory grow until it crashed.

The solution I arrived at is summarized in the comment here -- essentially I assigned the control structures and memory space once, and then just re-used them for every operation.

Of course this was for a single decoder, not for a whole application like libzim. So the experience here may not be relevant.

@mossroy mossroy modified the milestones: v3.5, v3.6 Aug 4, 2022
@Jaifroid Jaifroid modified the milestones: v3.6, v3.7 Nov 9, 2022
@kelson42
Copy link
Collaborator

kelson42 commented Nov 13, 2022

Here a few comments about how other Kiwix readers deal with videos:

  • Videos are not compressed within the ZIM because this is not necessary. Like for pictures, the native compression algorithm, webp for example, is good enough.
  • Considering that videos can be large files, their decompression - if they were compressed - would take anyway a large amount of CPU/Time and memory. This would be a problem in term of usability (at least). This would probably not be possible to do that at all on low-end devices.
  • For the same reason, its not recommended to ask the libzim to read the cluster to exctract videos and transmit them as a big blob to the Kiwix reader.

The recommend way is to request the information from the libzim, where exactly the video starts and how long it is. Then the reader should open itself a file/video handle on that portion of data and read it directly.

@Jaifroid
Copy link
Member

The recommend way is to request the information from the libzim, where exactly the video starts and how long it is. Then the reader should open itself a file/video handle on that portion of data and read it directly.

That's interesting, so we need custom support in the reader for videos. We currently do this of course, but we would need to do it with the correct libzim API. I imagine there could be some difficulty with split ZIM archives, or rather, the reader would need to handle the offsets taking into account the split.

@kelson42
Copy link
Collaborator

kelson42 commented Nov 13, 2022

The recommend way is to request the information from the libzim, where exactly the video starts and how long it is. Then the reader should open itself a file/video handle on that portion of data and read it directly.

That's interesting, so we need custom support in the reader for videos. We currently do this of course, but we would need to do it with the correct libzim API. I imagine there could be some difficulty with split ZIM archives, or rather, the reader would need to handle the offsets taking into account the split.

There is an API for that, but can not tell which one exactly, but browsing through the API doc should help to answer that question. There is no problem with split ZIM files, because this should never happen in the middle of a cluster (this is exactly for this use case that we don't allow random cut position anymore).

@mossroy
Copy link
Contributor Author

mossroy commented Nov 13, 2022

From what I remember, this memory leak issue was not specifically related to videos.
The fact that videos take a lot of space simply made the problem more obvious.
I suspect the problem is in our C/emscripten glue code to call the libzim.

Regarding videos, I had opened #869 to extract them in small chunks. It's probably enough, and I currently don't see how to do something more in SW mode.

@mossroy
Copy link
Contributor Author

mossroy commented Nov 19, 2022

Maybe this ticket should be transferred in https://github.com/openzim/javascript-libzim/
Because it's probable that it's where the problem is (but not 100% sure)

@kelson42
Copy link
Collaborator

I will try to do so without scrambling everything...

@Jaifroid
Copy link
Member

I couldn't work out how to transfer this issue to a different namespace (openzim). So instead, I posted openzim/javascript-libzim#34 with a link back to this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants