Home

Categories: use only categories
Types: use only types
Subtypes: use only subtypes
Filtered: use filtered categories

This wiki documents the development process for my master's thesis, named Entity and relation extraction from web content.

First, the HAREM dataset was used to perform NER using available tools, namely Stanford NER, NLTK, OpenNLP and spaCy.

Main repository folders

brat: annotation tool and annotated SIGARRA's news
datasets: Keeps the datasets used
scripts:
- extra: scripts not yet used
- evaluation: scripts to compute the evaluation of all tools, using the conlleval script
- filter-harem: scripts to manipulate HAREM dataset
  - harem-to-opennlp: transform HAREM in opennlp input format
  - harem-to-standoff: transform HAREM in standoff format, used in spacy
  - harem-to-stanford: transform HAREM in conll format, used in stanfordNER
  - src: source files for scripts
  - run-scripts: commands to run scripts
tools:
- nltk: folder to keep nltk related data/scripts
- open-nlp: folder to keep open-nlp related data/scripts
- spacy: folder to keep spacy related data/scripts
- stanford-ner: folder to keep stanford NER related data/scripts