A wrapper around magpie for Inspire that provides trained models and functions to learn from the High Energy Physics corpus.
$ git clone https://github.com/inspirehep/inspire-magpie.git
$ cd inspire-magpie
$ pip install .
A Flask-based UI and REST API are included; you can run them with:
$ python wsgi.py
Access the UI at http://localhost:5051 and the REST interface under http://localhost:5051/api. For example:
$ curl -i -X POST -H 'Content-Type: application/json' -d '{"corpus": "keywords", "positive": ["lhc"]}' http://localhost:5051/api/word2vec
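The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the server started with wsgi.py is listening on localhost:5051. The helper names build_word2vec_request and query_word2vec are illustrative and not part of inspire-magpie, and since the response format is not described here, the sketch simply returns the raw response body.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if the server runs elsewhere.
API_URL = "http://localhost:5051/api/word2vec"


def build_word2vec_request(corpus, positive, url=API_URL):
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper, not part of inspire-magpie.
    """
    payload = json.dumps({"corpus": corpus, "positive": positive}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_word2vec(corpus, positive, url=API_URL):
    """Send the request and return the raw response body as text."""
    request = build_word2vec_request(corpus, positive, url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage (requires the running server):
# print(query_word2vec("keywords", ["lhc"]))
```

Splitting request construction from sending keeps the payload easy to inspect without a running server.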
For training, you can use two functions that the API provides: train() and batch_train(). The latter performs out-of-core training, but both of them take the same parameters:
>>> from inspire_magpie.api import batch_train
>>> batch_train('/path/to/the/training/set', test_dir='if/you/have/a/test/set', nn='cnn', nb_epochs=5, batch_size=64, persist=True, no_of_labels=10000, verbose=1)
test_dir - is the path to the test set (optional)
nn - defines the NN model to use for training. Currently supported: cnn and rnn
nb_epochs - how many times should we feed the training set to the NN
batch_size - size of the batch with which the training occurs
persist - whether to save to disk the final model after training (in the log directory)
no_of_labels - number of labels to train the model on. It defines whether we want to train keyword extraction (10k labels), experiment prediction (500 labels) or category assignment (14 labels).
verbose - the same values as in Keras. 1 is the most verbose with a progress bar
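Because no_of_labels selects the task, it can help to derive it from a task name rather than hard-coding the number. The sketch below is one possible way to do that; the task names ("keywords", "experiments", "categories"), the LABELS_PER_TASK table and the training_kwargs helper are hypothetical and not part of inspire-magpie. Only the label counts and parameter names come from the description above.

```python
# Label counts taken from the description of no_of_labels.
# The dictionary keys are illustrative names, not inspire-magpie identifiers.
LABELS_PER_TASK = {
    "keywords": 10000,   # keyword extraction
    "experiments": 500,  # experiment prediction
    "categories": 14,    # category assignment
}


def training_kwargs(task, nn="cnn", nb_epochs=5, batch_size=64, test_dir=None):
    """Assemble keyword arguments for train() or batch_train().

    Hypothetical helper: it only builds a dict and does not start training.
    """
    if task not in LABELS_PER_TASK:
        raise ValueError("unknown task: %r" % (task,))
    if nn not in ("cnn", "rnn"):
        raise ValueError("unsupported model: %r" % (nn,))
    kwargs = {
        "nn": nn,
        "nb_epochs": nb_epochs,
        "batch_size": batch_size,
        "persist": True,
        "no_of_labels": LABELS_PER_TASK[task],
        "verbose": 1,
    }
    # test_dir is optional, so it is only passed when provided.
    if test_dir is not None:
        kwargs["test_dir"] = test_dir
    return kwargs


# Usage (requires inspire-magpie to be installed):
# from inspire_magpie.api import batch_train
# batch_train('/path/to/the/training/set', **training_kwargs("categories", nn="rnn"))
```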
Other configuration variables can be found in the config file.