K-RET is a flexible biomedical relation extraction (RE) system that allows any pre-trained BERT-based model (e.g., SciBERT or BioBERT) to be injected with knowledge in the form of knowledge graphs, drawn from a single source or from multiple sources simultaneously. This knowledge can be applied to all contextualizing tokens or only to the tokens of the candidate relation, for both single- and multi-token entities.
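At its core, the injection step attaches knowledge-graph triples to entity tokens in a sentence before it reaches the BERT-based encoder. The following is a minimal, illustrative sketch of that idea only; the helper name inject_knowledge and the dictionary-based toy graph are hypothetical and do not correspond to the actual K-RET code.

# Illustrative sketch of knowledge injection (hypothetical names, not the
# actual K-RET implementation). KG triples are appended after entity tokens
# before the sentence is fed to the BERT-based encoder.
toy_kg = {
    "aspirin": [("aspirin", "is_a", "anti-inflammatory drug")],
    "warfarin": [("warfarin", "is_a", "anticoagulant")],
}

def inject_knowledge(tokens, kg, entity_spans, entities_only=True):
    """Append KG triples after candidate-entity tokens (or after any
    matching token when entities_only is False)."""
    augmented = []
    for i, tok in enumerate(tokens):
        augmented.append(tok)
        if (not entities_only or i in entity_spans) and tok.lower() in kg:
            for _, rel, obj in kg[tok.lower()]:
                augmented += [rel] + obj.split()
    return augmented

print(inject_knowledge(["Aspirin", "interacts", "with", "warfarin"],
                       toy_kg, entity_spans={0, 3}))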
Our academic paper, which describes K-RET in detail, can be found here.
The uer folder corresponds to an updated version of the toolkit developed by Zhao et al. (2019), available here.
To make predictions on new data, you need both a baseline model and one of our pre-trained models. If you wish to train a new model on your own data, you only need a baseline model, which can be either of the models referenced in our academic paper.
After downloading a baseline model, for instance SciBERT, you need to convert it with the uer toolkit. To do so, run the following example, adapting the paths as needed for a different baseline model or directory layout.
cd K-RET/uer/
python3 convert_bert_from_huggingface_to_uer.py --input_model_path ../models/pre_trained_model_scibert/scibert_scivocab_uncased/pytorch_model.bin --output_model_path ../models/pre_trained_model_scibert/output_model.bin
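The same script applies to other baselines; for instance, a BioBERT conversion might look like the following (the BioBERT directory names are illustrative, so adjust them to wherever you unpacked the download):

python3 convert_bert_from_huggingface_to_uer.py --input_model_path ../models/pre_trained_model_biobert/biobert_v1.1_pubmed/pytorch_model.bin --output_model_path ../models/pre_trained_model_biobert/output_model.bin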
Available versions of the best-performing pre-trained models are as follows:
The training details are described in our academic paper.
Our project includes a code adaptation of the K-BERT model, available here. Use the K-RET image available on Docker Hub to set up the rest of the experimental environment.
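A typical environment setup with the image might look like this (the image name below is a placeholder; use the one listed on the Docker Hub page):

docker pull <dockerhub-user>/k-ret
docker run --gpus all -it -v $(pwd):/K-RET <dockerhub-user>/k-ret /bin/bash

To fine-tune K-RET on the DDI corpus with ChEBI as the knowledge source, run: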
CUDA_VISIBLE_DEVICES='1,2,3' python3 -u run_classification.py \
--pretrained_model_path ./models/pre_trained_model_scibert/output_model.bin \
--config_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json \
--vocab_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt \
--train_path ./datasets/ddi_corpus/train.tsv \
--dev_path ./datasets/ddi_corpus/dev.tsv \
--test_path ./datasets/ddi_corpus/test.tsv \
--class_weights True \
--weights "[0.234, 3.377, 4.234, 6.535, 24.613]" \
--epochs_num 30 \
--batch_size 32 \
--kg_name "['ChEBI']" \
--output_model_path ./outputs/scibert_ddi.bin | tee ./outputs/scibert_ddi.log &
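The --weights values roughly track the inverse frequency of the five DDI classes. If you adapt K-RET to another corpus, comparable weights can be derived with a sketch like the one below (the label column position in train.tsv is an assumption, and inverse class frequency is one common weighting scheme, not necessarily how the paper's weights were computed):

# Sketch: per-class weights as inverse relative frequencies.
# Assumes the label sits in the first tab-separated column of train.tsv;
# adjust label_col to match your corpus.
from collections import Counter

def class_weights(train_path, label_col=0):
    counts = Counter()
    with open(train_path) as f:
        next(f)  # skip the header row; drop this line if your file has none
        for line in f:
            counts[line.rstrip("\n").split("\t")[label_col]] += 1
    total = sum(counts.values())
    n = len(counts)
    # weight_c = total / (n_classes * count_c): rarer classes get larger weights
    return {c: total / (n * k) for c, k in counts.items()}

print(class_weights("./datasets/ddi_corpus/train.tsv"))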
For more options, check run.sh; for additional configuration settings (e.g., max_number_entities and contextual_knowledge), check brain/config.py.
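The exact settings live in brain/config.py in the repository; as a rough illustration only, the two options mentioned above could take this shape (the values here are placeholders, not the shipped defaults):

# Illustrative placeholders -- consult brain/config.py for the real defaults.
max_number_entities = 2        # cap on KG entities injected per sentence
contextual_knowledge = False   # True: inject into contextualizing tokens too;
                               # False: only into the candidate-relation tokens

To evaluate a fine-tuned model on the test set, run: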
CUDA_VISIBLE_DEVICES='0' python3 -u run_classification.py \
--pretrained_model_path ./models/pre_trained_model_scibert/output_model.bin \
--config_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json \
--vocab_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt \
--train_path ./datasets/ddi_corpus/train.tsv \
--dev_path ./datasets/ddi_corpus/dev.tsv \
--class_weights True \
--weights "[0.234, 3.377, 4.234, 6.535, 24.613]" \
--test_path ./datasets/ddi_corpus/test.tsv \
--epochs_num 30 \
--batch_size 32 \
--kg_name "[]" \
--testing True \
--to_test_model ./outputs/scibert_ddi.bin \
| tee ./outputs/ddi_results.log &
Finally, process the raw predictions in the log into a results file aligned with the test set:

python3 src/process_results.py ./outputs/ddi_results.log ./datasets/ddi_corpus/test.tsv ddi_results.tsv
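The real post-processing logic lives in src/process_results.py; as a hedged sketch, a step of this shape pairs each predicted label from the log with the corresponding test instance and writes them to a TSV (the one-label-per-line log format and the column layout are assumptions):

# Sketch of log-to-TSV post-processing (hypothetical log format: one
# predicted label per line; the real parsing lives in src/process_results.py).
import csv
import sys

def merge(log_path, test_path, out_path):
    with open(log_path) as f:
        preds = [line.strip() for line in f if line.strip()]
    with open(test_path) as f:
        rows = list(csv.reader(f, delimiter="\t"))[1:]  # skip the header row
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for row, pred in zip(rows, preds):
            writer.writerow(row + [pred])

if __name__ == "__main__":
    merge(sys.argv[1], sys.argv[2], sys.argv[3])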
- Diana Sousa and Francisco M. Couto. 2022. K-RET: Knowledgeable Biomedical Relation Extraction System. Bioinformatics.