- updates to use [email protected] - a major result-format change
- renames bin cmd to `wiki2mongo`
- supports use from cli, or use via javascript `require()`
- support `--plaintext` flag
- add try/catch
- support `--skip_redirects` && `--skip_disambig`
- add a 3s 'break' to avoid build-up of mongo inserts
- add new `--verbose` and `--skip_first` options
- MASSIVE SPEEDUP! full re-write by @devrim 🙏 to fix #59
- rename from `wikipedia-to-mongo` to `dumpster-dive`
- use wtf_wikipedia v3 (a big re-factor too!)
- use `line-by-line` and `worker-nodes` to run parsing in parallel
- fix connection time-outs & improve logging output
- change default collection name to `pages`
- add `.custom()` function support
- update to wtf_wikipedia v4.2.0
- support passing-in arbitrary functions to worker
- bugfix for runtime parsing error
- update deps, wtf library improvements
- relicense as MIT
- use latest mongo api
- ⚠️ remove `.infoboxes` and `.citations` from the top-level result. This was duplicate data; find them both in `section[i].templates`
- improve handling of redirect pages
- refactor encoding logic
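Since infoboxes and citations no longer appear at the top level, code that read them from there has to walk each section's templates instead. Below is a minimal sketch, assuming a stored page shaped like `{ sections: [{ templates: [...] }] }` where each template object has a `template` name field. The field names, the `collectTemplates` helper, and the `infobox`/`cite` name-prefix matching are all hypothetical illustrations, not dumpster-dive's confirmed schema.

```javascript
// Hypothetical helper: gather templates from every section of a stored page.
// ASSUMPTION: the page has a `sections` array, and each section has a
// `templates` array of plain objects with a `template` name field.
function collectTemplates(page, predicate = () => true) {
  const found = [];
  for (const section of page.sections || []) {
    for (const tmpl of section.templates || []) {
      if (predicate(tmpl)) {
        found.push(tmpl);
      }
    }
  }
  return found;
}

// Made-up example document, used only to show the lookup.
const page = {
  title: 'Toronto',
  sections: [
    { title: '', templates: [{ template: 'infobox settlement', name: 'Toronto' }] },
    { title: 'History', templates: [{ template: 'cite web', url: 'https://example.com' }] },
  ],
};

// ASSUMPTION: infoboxes and citations are recognised by template-name prefix.
const infoboxes = collectTemplates(page, t => String(t.template).startsWith('infobox'));
const citations = collectTemplates(page, t => String(t.template).startsWith('cite'));
console.log(infoboxes.length, citations.length); // 1 1
```

Passing a predicate keeps one traversal for both cases; drop it to get every template on the page.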