Deploying zh(Chinese) version of Wikipedia shows 'failed to parse input: OutOfBounds' #73

Closed
FledgeXu opened this issue Jan 24, 2021 · 8 comments
Labels
dif/expert Extensive knowledge (implications, ramifications) required effort/days Estimated to take multiple days, but less than a week kind/bug A bug in existing code (including security flaws) need/analysis Needs further analysis before proceeding

Comments

@FledgeXu
Contributor

I'm attempting to deploy the zh (Chinese) version of Wikipedia, and the script fails with 'failed to parse input: OutOfBounds'.
OS version:

Linux localhost 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux

Rustc version:

rustc 1.49.0 (e1884a8e3 2020-12-29)

logs:

root@localhost:~/distributed-wikipedia-mirror# ./mirrorzim.sh --languagecode=zh --wikitype=wikipedia

Download the zim file...
base64: invalid input
--2021-01-23 22:01:06--  https://download.kiwix.org/zim/wikipedia/wikipedia_zh_all_maxi_2021-01.zim
Resolving download.kiwix.org (download.kiwix.org)... 195.154.156.115
Connecting to download.kiwix.org (download.kiwix.org)|195.154.156.115|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ftpmirror.your.org/pub/kiwix/zim/wikipedia/wikipedia_zh_all_maxi_2021-01.zim [following]
--2021-01-23 22:01:06--  https://ftpmirror.your.org/pub/kiwix/zim/wikipedia/wikipedia_zh_all_maxi_2021-01.zim
Resolving ftpmirror.your.org (ftpmirror.your.org)... 204.9.55.82, 2001:4978:1:420::cc09:3752
Connecting to ftpmirror.your.org (ftpmirror.your.org)|204.9.55.82|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.


Remove tmp directory ./tmp/wikipedia_zh_all_maxi_2021-01 before run ...
Unpack the zim file into ./tmp/wikipedia_zh_all_maxi_2021-01...
thread 'main' panicked at 'failed to parse input: OutOfBounds', src/bin/extract_zim.rs:56:36
stack backtrace:
   0:     0x55d36d1e8360 - std::backtrace_rs::backtrace::libunwind::trace::h04d12fdcddff82aa
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:100:5
   1:     0x55d36d1e8360 - std::backtrace_rs::backtrace::trace_unsynchronized::h1459b974b6fbe5e1
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55d36d1e8360 - std::sys_common::backtrace::_print_fmt::h9b8396a669123d95
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x55d36d1e8360 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he009dcaaa75eed60
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x55d36d209aec - core::fmt::write::h77b4746b0dea1dd3
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/fmt/mod.rs:1078:17
   5:     0x55d36d1e49f2 - std::io::Write::write_fmt::heb7e50902e98831c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/io/mod.rs:1518:15
   6:     0x55d36d1ea965 - std::sys_common::backtrace::_print::h2d880c9e69a21be9
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x55d36d1ea965 - std::sys_common::backtrace::print::h5f02b1bb49f36879
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x55d36d1ea965 - std::panicking::default_hook::{{closure}}::h658e288a7a809b29
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:208:50
   9:     0x55d36d1ea608 - std::panicking::default_hook::hb52d73f0da9a4bb8
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:227:9
  10:     0x55d36d1eb101 - std::panicking::rust_panic_with_hook::hfe7e1c684e3e6462
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:593:17
  11:     0x55d36d1eac47 - std::panicking::begin_panic_handler::{{closure}}::h42939e004b32765c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:499:13
  12:     0x55d36d1e881c - std::sys_common::backtrace::__rust_end_short_backtrace::h9d2070f7bf9fd56c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:141:18
  13:     0x55d36d1eaba9 - rust_begin_unwind
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:495:5
  14:     0x55d36d207a51 - core::panicking::panic_fmt::ha0bb065d9a260792
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/panicking.rs:92:14
  15:     0x55d36d207873 - core::option::expect_none_failed::h7e1dd0a94971eb61
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/option.rs:1268:5
  16:     0x55d36d0feef0 - extract_zim::main::h0d770a376a8e6eab
  17:     0x55d36d0f9bd3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h5ecc56c6658a80dd
  18:     0x55d36d0fa599 - std::rt::lang_start::{{closure}}::hb0d654310eb3e6ce
  19:     0x55d36d1eb617 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h57e2a071d427b24c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ops/function.rs:259:13
  20:     0x55d36d1eb617 - std::panicking::try::do_call::h81cbbe0c3b30a28e
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:381:40
  21:     0x55d36d1eb617 - std::panicking::try::hbeeb95b4e1f0a876
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:345:19
  22:     0x55d36d1eb617 - std::panic::catch_unwind::h59c48ccb40a0bf20
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panic.rs:396:14
  23:     0x55d36d1eb617 - std::rt::lang_start_internal::ha53ab63f88fee728
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/rt.rs:51:25
  24:     0x55d36d100ba2 - main
  25:     0x7f3fc854609b - __libc_start_main
  26:     0x55d36d0f80da - _start
  27:                0x0 - <unknown>
@FledgeXu FledgeXu added the need/triage Needs initial labeling and prioritization label Jan 24, 2021

@FledgeXu
Contributor Author

FledgeXu commented Jan 24, 2021

I checked #66.
It seems extract_zim cannot extract the newest snapshots.

@kelson42

kelson42 commented Jan 24, 2021

The problem is probably that extract_zim does not handle the zstd compression introduced to the ZIM format in early 2020.
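
One way to check this hypothesis against a downloaded file is to read the compression code of a cluster directly. The following is a minimal sketch, not part of this repository: the function name `first_cluster_compression` is hypothetical, and the header offsets and compression codes (1 = none, 4 = LZMA2, 5 = zstd) are taken from my reading of the openZIM file format description, so verify them against the spec before relying on the result.

```python
import io
import struct

# Magic number 0x044D495A; on disk the first four bytes are b"ZIM\x04".
ZIM_MAGIC = 72173914

# Compression codes stored in the low 4 bits of a cluster's first byte
# (assumed from the openZIM format description).
COMPRESSION_NAMES = {1: "none", 4: "lzma2 (xz)", 5: "zstd"}


def first_cluster_compression(f) -> str:
    """Return the compression name of the first cluster in an open ZIM file.

    `f` must be a seekable binary file object.
    """
    header = f.read(80)
    if len(header) < 80:
        raise ValueError("file too short to hold a ZIM header")

    (magic,) = struct.unpack_from("<I", header, 0)
    if magic != ZIM_MAGIC:
        raise ValueError("not a ZIM file (bad magic number)")

    # clusterCount is a uint32 at offset 28, clusterPtrPos a uint64 at offset 48.
    (cluster_count,) = struct.unpack_from("<I", header, 28)
    (cluster_ptr_pos,) = struct.unpack_from("<Q", header, 48)
    if cluster_count == 0:
        raise ValueError("ZIM file has no clusters")

    # The cluster pointer list holds one uint64 file offset per cluster.
    f.seek(cluster_ptr_pos)
    (cluster_offset,) = struct.unpack("<Q", f.read(8))

    # The first byte of a cluster carries the compression code in its low
    # 4 bits; the upper bits hold other flags and are masked off here.
    f.seek(cluster_offset)
    info = f.read(1)[0]
    code = info & 0x0F
    return COMPRESSION_NAMES.get(code, f"unknown ({code})")
```

To try it, open the snapshot in binary mode, for example `open("wikipedia_zh_all_maxi_2021-01.zim", "rb")`, and pass the file object to the function. It only inspects the first cluster, so a result of "zstd" would be consistent with the explanation above but would not by itself prove where the parser fails.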

@lidel lidel added dif/expert Extensive knowledge (implications, ramifications) required effort/days Estimated to take multiple days, but less than a week kind/bug A bug in existing code (including security flaws) need/analysis Needs further analysis before proceeding and removed need/triage Needs initial labeling and prioritization labels Jan 25, 2021
@lidel
Member

lidel commented Jan 25, 2021

The fix requires switching to zimtools – #66

@lidel
Member

lidel commented Feb 8, 2021

@FledgeXu
Contributor Author

FledgeXu commented Feb 9, 2021

@lidel Thanks, using zimdump works for me now.

@FledgeXu FledgeXu closed this as completed Feb 9, 2021
@lidel
Member

lidel commented Feb 9, 2021

@FledgeXu if you want to give it a try and generate the zh version, make sure you use the updated scripts from #77; the old ones won't work correctly with the version produced by zimdump.

@FledgeXu
Contributor Author

FledgeXu commented Feb 9, 2021

Thanks, @lidel.
I have tested it manually on my machine, and I will try the updated scripts on the server. If I run into any bugs, I will report them under #77.

3 participants