Simple Namuwiki Extractor, an extension of Namu Wiki Extractor
This module strips namu mark (Namuwiki's markup syntax) from a Namuwiki document and extracts only the plain text.
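To make "stripping namu mark" concrete, here is a minimal sketch that handles only a handful of common constructs (links, bold/italic quotes, strike/underline markers, headings). It is not the code this repo uses; `strip_namu_mark` is a hypothetical name, and keeping the inner text of struck-through spans is an assumption made for illustration.

```python
import re

def strip_namu_mark(text):
    """Hypothetical helper: remove a few namu mark constructs, keep the text."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # '''bold''' and ''italic'' -> drop the quote runs
    text = re.sub(r"'{2,3}", "", text)
    # ~~strike~~, --strike--, __underline__ -> keep the inner text (assumption)
    text = re.sub(r"(~~|--|__)(.*?)\1", r"\2", text)
    # == Heading == -> Heading
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.M)
    return text

sample = "'''Namu''' is a [[wiki|Wiki]] with ~~old~~ text"
print(strip_namu_mark(sample))  # Namu is a Wiki with old text
```

Real documents also contain macros, tables, footnotes and nested markup, which a few regular expressions like these do not cover.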
Requirements:
- Python 2 or 3
- tqdm
Clone this repo:
    git clone https://github.com/j-min/Easy-Namuwiki-Extractor
Download the Namuwiki JSON dump into the repo directory:
    wget http://file2.unofficialnis.ga/namuwiki_161031.json
You can find the latest dumps here
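For orientation, the sketch below shows one way to read such a dump. It assumes the dump is a single JSON array of objects with "title" and "text" keys; that layout, and the name `iter_documents`, are assumptions for illustration and are not taken from this repo's code. An in-memory sample stands in for the real file.

```python
import io
import json

def iter_documents(fp):
    """Yield (title, text) pairs from an open dump file.

    Assumes the dump is one JSON array of objects with "title" and "text" keys.
    """
    documents = json.load(fp)  # loads the whole dump into memory at once
    for doc in documents:
        yield doc.get("title", ""), doc.get("text", "")

# Tiny in-memory stand-in for the real dump file
sample = io.StringIO(u'[{"title": "Example", "text": "body"}]')
docs = list(iter_documents(sample))
```

With the real file, the same function could be called on `io.open("namuwiki_161031.json", encoding="utf-8")`. Because `json.load` reads everything at once, expect memory use to grow with the size of the dump.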
Run the extractor:
    python Run_extractor.py -i input_json_file -o outputfile_name
Options:
    --input (-i) : input filename
    --output (-o) : output filename
    --multiprocess (-m) : run with the multiprocessing module
    --title (-t) : include document titles in the extracted output
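As an illustration of how these options could be wired up, here is a sketch using the standard `argparse` module. How `Run_extractor.py` actually parses its arguments is not shown here, so treat this as an assumption: in particular, `build_parser` is a hypothetical name, and making `-i`/`-o` required and `-m`/`-t` boolean switches are guesses.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(description="Extract plain text from a Namuwiki dump")
    parser.add_argument("-i", "--input", required=True, help="input filename")
    parser.add_argument("-o", "--output", required=True, help="output filename")
    # Assumed to be on/off switches that take no value
    parser.add_argument("-m", "--multiprocess", action="store_true",
                        help="run with the multiprocessing module")
    parser.add_argument("-t", "--title", action="store_true",
                        help="include document titles in the output")
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["-i", "namuwiki_161031.json", "-o", "out.txt", "-t"])
```

Under these assumptions, `args.title` is `True` and `args.multiprocess` is `False` for the argument list above.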