Document extraction app added

lintool · Jul 26, 2013 · b7c327d
1 parent db38e53
Showing 1 changed file with 4 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -102,10 +102,10 @@ $ hadoop jar clueweb-tools-0.3-SNAPSHOT-fatjar.jar \
 ```
 
 The parameters are:
-+`docidsfile`: a file with one docid per line; all docids are extracted from the WARC input files
-+`input`: list of WARC files
-+`keephtml`: parameter that is either `true` (keep the HTML source of each document) or `false` (parse the documents, remove HTML)
-+`output`: folder where the documents' content is stored - one file per docid
++ `docidsfile`: a file with one docid per line; all docids are extracted from the WARC input files
++ `input`: list of WARC files
++ `keephtml`: parameter that is either `true` (keep the HTML source of each document) or `false` (parse the documents, remove HTML)
++ `output`: folder where the documents' content is stored - one file per docid
 
 
 Retrieval runs
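
The four parameters in the hunk above can be read as a small filter-and-write procedure. The Python sketch below only illustrates that reading and is not the repository's Hadoop implementation: `extract_documents`, the in-memory `records` pairs standing in for parsed WARC input, and the regex-based tag stripping are all assumptions made for this example.

```python
import os
import re


def extract_documents(docids_file, records, keep_html, output_dir):
    """Write one file per requested docid into output_dir.

    `records` is a hypothetical stand-in for the WARC input: an iterable
    of (docid, html) pairs. Returns the docids that were written.
    """
    # docidsfile: one docid per line; blank lines are ignored.
    with open(docids_file, encoding="utf-8") as f:
        wanted = {line.strip() for line in f if line.strip()}

    os.makedirs(output_dir, exist_ok=True)
    written = []
    for docid, html in records:
        if docid not in wanted:
            continue
        if keep_html:
            # keephtml=true: keep the HTML source unchanged.
            content = html
        else:
            # keephtml=false: drop tags with a crude regex (an assumption,
            # not the tool's parser) and collapse whitespace.
            content = " ".join(re.sub(r"<[^>]+>", " ", html).split())
        # output: one file per docid, named after the docid.
        with open(os.path.join(output_dir, docid), "w", encoding="utf-8") as out:
            out.write(content)
        written.append(docid)
    return written
```

Under these assumptions, a docid that appears in `docidsfile` but in none of the input records simply produces no output file.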