kiwix-js + IPFS = unstoppable wikipedia mirror #49

Closed

lidel
Member

@lidel lidel commented Apr 23, 2020

This is an early draft of a devgrant and my notes on potential work that could benefit IPFS, distributed-wikipedia-mirror and Kiwix projects.

Publishing here as a draft so we can iterate on details.

cc @kelson42 @jnthnvctr @mossroy ipfs/distributed-wikipedia-mirror#42, ipfs/distributed-wikipedia-mirror#71, ipfs/distributed-wikipedia-mirror#69

Project Description

In this devgrant we are looking at adding permalink and IPFS support to the kiwix-js project, making it a viable option for browsing the distributed Wikipedia mirror from a regular, static web page.

Version for Reading

TODO

  • Gather feedback about the draft
  • Find someone with bandwidth/interest to do this
  • Figure out funding (should this be paid work, or could it be volunteer work, with funding going toward pinning ZIMs etc.)

License: MIT
Signed-off-by: Marcin Rataj <[email protected]>
@lidel lidel added type:open-grant Open Grant Tracking (https://github.com/protocol/ipfs-grants/blob/master/open-grants/README.md) type:targeted-grant A grant made directly to a specific recipient labels Apr 23, 2020
@momack2 momack2 removed the type:open-grant Open Grant Tracking (https://github.com/protocol/ipfs-grants/blob/master/open-grants/README.md) label May 1, 2020
@mossroy

mossroy commented May 2, 2020

Hi, thanks for sharing this draft.
Here is some feedback.

Near the beginning, the draft says:

In its current form kiwix-js is unable to read huge files from user's machine, unless the reader is wrapped in a browser extension.

It looks wrong to me. Wrapping kiwix-js in a browser extension makes no difference on its ability to read huge files.
And, in its current state, kiwix-js is able to read ZIM files of all sizes.
If you were referring to this comment from @kelson42, ipfs/distributed-wikipedia-mirror#42 (comment), he was probably talking about an enhancement of kiwix-js that we're trying to implement: recompiling libzim/kiwix-lib with emscripten in order to use them in kiwix-js (instead of the current custom JavaScript implementation, which is hard to maintain). That work was indeed suffering from kiwix/kiwix-js#513. The corresponding issue on the emscripten side has been fixed, but we have not had time to check and go further on this enhancement.

In any case, it's true that the IPFS implementation will need to be tested with big ZIM files, because there are common pitfalls with them.

Regarding permalinks, kiwix-js already has an imperfect and incomplete implementation of something similar, which adds the article title to the current URL. But it currently only works in jQuery mode, and there is no code to read this title back and fetch the article if you copy/paste the URL. There's at least one good reason for that: in a standard browser context, we cannot open a local file without user interaction (browsers offer no API for that, for security reasons), which is a blocker for this kind of feature. In ServiceWorker mode, updating the window URL can probably be achieved by adding an event listener on the iframe.
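
To make the "write the title into the URL, then read it back" idea concrete, here is a minimal sketch. It is not kiwix-js code: the function names and the `zim`/`title` fragment keys are assumptions made up for illustration, and it deliberately ignores the local-file restriction described above.

```javascript
// Hypothetical helpers, not part of kiwix-js.
// The "zim" and "title" keys in the URL fragment are an assumed convention.

// Store the archive name and article title in the URL fragment.
function buildPermalink(baseUrl, zimName, articleTitle) {
  const url = new URL(baseUrl);
  const params = new URLSearchParams({ zim: zimName, title: articleTitle });
  url.hash = params.toString(); // URLSearchParams percent-encodes both values
  return url.toString();
}

// Read the archive name and title back from a pasted URL.
// Returns null when the fragment does not carry both keys.
function parsePermalink(href) {
  const url = new URL(href);
  const params = new URLSearchParams(url.hash.replace(/^#/, ''));
  const zim = params.get('zim');
  const title = params.get('title');
  return zim && title ? { zim, title } : null;
}
```

Keeping the state in the fragment means it is never sent to the server, so a static page could restore the article purely on the client once the archive itself is reachable.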

Regarding the different milestones, I'd suggest keeping us informed during each one, i.e. not waiting for milestone 4 to discover what has been implemented.

@ikreymer

ikreymer commented Jul 3, 2020

I have an experimental prototype for client-side loading of ZIM files using the replayweb.page system

Here's a comment with more details about it:
kiwix/kiwix-js#595 (comment)

In theory, this should also support loading from IPFS as long as range requests are supported. This is not Wikipedia-specific, though, and is thus far designed as a generic replay of any web page(s). It seems like it may be possible with this approach, though it is not using kiwix-js at the moment.
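
For readers unfamiliar with range requests, a rough sketch of reading a slice of a remote file over HTTP follows. The helper names are hypothetical and this is not how replayweb.page or kiwix-js implement it; it only assumes a server that answers `Range` headers with `206 Partial Content`.

```javascript
// Hypothetical helpers for illustration only.

// Build the Range header value for `length` bytes starting at `offset`.
function rangeHeader(offset, length) {
  if (!Number.isInteger(offset) || !Number.isInteger(length) || offset < 0 || length <= 0) {
    throw new RangeError('offset must be >= 0 and length must be > 0');
  }
  return `bytes=${offset}-${offset + length - 1}`; // the end position is inclusive
}

// Fetch one slice of a remote file. `fetchImpl` can be swapped out for testing.
async function readRange(url, offset, length, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url, { headers: { Range: rangeHeader(offset, length) } });
  if (res.status !== 206) {
    // A 200 here would mean the server sent the whole file instead of a slice.
    throw new Error(`range request not honoured (status ${res.status})`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

Treating anything other than 206 as an error guards against accidentally downloading an entire multi-gigabyte ZIM when a server ignores the header.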

@parkan
Contributor

parkan commented Sep 22, 2020

closing this in favor of an alternate implementation

@parkan parkan closed this Sep 22, 2020
@lidel
Member Author

lidel commented Oct 15, 2020

Quick notes on next steps:

  • Research into reading ZIM over HTTP and range requests continues upstream in kiwix/kiwix-js#595 ("Can Kiwix-JS deal with an online ZIM file available through HTTP?")
    • Additional metadata most likely needs to be published/added to ZIM, to ensure random access does not require fetching the entire ZIM and is fast enough.
  • We will work with the Kiwix project towards publishing .zim files on IPFS as part of their ZIM build pipeline
    • Everyone will be able to pin a specific ZIM and keep it alive, even when upstream no longer pays for its hosting
    • When that happens, one could read them either via range requests to one of the CORS-friendly gateways,
      or directly from IPFS via ipfs.cat(ipfsPath, { offset, length })
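
The direct-from-IPFS option could look roughly like the sketch below. It assumes `ipfs` is a js-ipfs-style node whose `cat()` returns an async iterable of `Uint8Array` chunks; `concatChunks` and `readZimSlice` are hypothetical helper names, not existing kiwix-js or IPFS functions.

```javascript
// Hypothetical helpers for illustration only.

// Join a list of Uint8Array chunks into one contiguous Uint8Array.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let pos = 0;
  for (const chunk of chunks) {
    out.set(chunk, pos);
    pos += chunk.length;
  }
  return out;
}

// Read `length` bytes at `offset` from a ZIM published on IPFS.
// Assumes ipfs.cat(path, { offset, length }) yields Uint8Array chunks.
async function readZimSlice(ipfs, ipfsPath, offset, length) {
  const chunks = [];
  for await (const chunk of ipfs.cat(ipfsPath, { offset, length })) {
    chunks.push(chunk);
  }
  return concatChunks(chunks);
}
```

A reader built this way only needs a "give me N bytes at offset X" primitive, so the same ZIM parsing code could sit on top of either the gateway path or the native IPFS path.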
