CorNet

Prerequisites

Pretrained word embeddings in gensim format

Preprocess the EUR-Lex dataset, which is already tokenized:

./scripts/preprocess_eurlex.sh

Or preprocess the other datasets, which need to be tokenized using NLTK:

./scripts/preprocess_others.sh
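For readers unfamiliar with NLTK tokenization, here is a minimal sketch of what tokenizing one raw document can look like. Which NLTK tokenizer the preprocessing uses, and whether the script runs it for you, is not stated here; this example picks `wordpunct_tokenize` only because it is regex-based and needs no extra model download. The sample sentence is made up.

```python
from nltk.tokenize import wordpunct_tokenize

# Hypothetical raw text, standing in for one document of a non-EUR-Lex dataset.
raw_text = "The Council adopted Regulation (EC) No 1/2003."

# Split into runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(raw_text)

# Join with spaces to get a whitespace-tokenized line, one document per line.
tokenized_line = " ".join(tokens)
print(tokenized_line)
```

A different tokenizer such as `nltk.word_tokenize` would split some cases differently and requires downloading its tokenizer data first.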

Train and evaluate

./scripts/run_models.sh

The code for the baseline models is adapted from the following repositories: XML-CNN, BERT, MeSHProbeNet, and AttentionXML.