A Natural Language Processing and Understanding (NLP/NLU) tool for biomedical research
- Clone repo:
git clone https://github.com/aivscovid19/covid-19_research_collaboration.git
- Install BioMedBERT:
pip install .
- Syntax:
biomedbert gcp project set <project-id> <project-zone>
- Setting the zone to europe-west4-a:
biomedbert gcp project set ai-vs-covid19 europe-west4-a
- Syntax for training BioMedBERT-Large model:
biomedbert code train model <model_type> <model_dir> <pretrain_dir> <bucket_name> <tpu_name> <train_steps> <train_bs> <eval_bs> <tpu_cores>
- Train BioMedBERT-Large from BERT weights:
biomedbert code train model large biomedbert_large_breathe_wikipedia_bert_weights_vocab pre_trained_biomed_wikipedia_data_bert_vocab ekaba-assets biomedbert 100000 128b
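The training syntax takes nine positional arguments, so a missing value is easy to overlook before an expensive run starts. The following is a minimal sketch of a guard, assuming a POSIX shell. `train_model` and the `DRY_RUN` variable are hypothetical names introduced here; they are not part of the biomedbert CLI.

```shell
# Sketch: refuse to start pre-training unless all nine positional arguments
# from the syntax line are present.
# train_model and DRY_RUN are hypothetical, not part of the biomedbert CLI.
train_model() {
  if [ "$#" -ne 9 ]; then
    echo "expected 9 arguments, got $#" >&2
    echo "usage: train_model <model_type> <model_dir> <pretrain_dir> <bucket_name> <tpu_name> <train_steps> <train_bs> <eval_bs> <tpu_cores>" >&2
    return 2
  fi
  # DRY_RUN defaults to 1, so the command is only printed.
  # Set DRY_RUN=0 to actually run it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "biomedbert code train model $*"
  else
    biomedbert code train model "$@"
  fi
}
```

Call `train_model` with the nine values in the order given by the syntax line. With fewer or more arguments it prints the usage text and returns a non-zero status without contacting the TPU.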
- BioMedBERT-Large from BERT weights (Lagos):
gs://ekaba-assets/biomedbert_large_bert_weights_and_vocab
- BioMedBERT-Large from scratch (Calabar):
gs://ekaba-assets/biomedbert_large_scratch_breathe_bert_vocab
- First, set the GCP zone to europe-west4-a:
biomedbert gcp project set ai-vs-covid19 europe-west4-a
- Syntax:
- Create TPU:
biomedbert gcp vm create tpu <vm-instance> [preemptible]
- Delete TPU:
biomedbert gcp vm delete tpu <vm-instance>
- Create TPUs:
biomedbert gcp vm create tpu biomedbert
- Create Preemptible TPUs:
biomedbert gcp vm create tpu biomedbert-preempt preemptible
- Delete TPUs:
biomedbert gcp vm delete tpu <vm-instance>
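Because a TPU that is never deleted keeps accruing cost, it can help to tie creation and deletion to the lifetime of a single job. The sketch below assumes a POSIX shell. `with_tpu`, `run`, and `DRY_RUN` are hypothetical helpers introduced for illustration; only the `biomedbert` commands inside them come from this README.

```shell
# Sketch: create a TPU, run one job, and delete the TPU afterwards
# even if the job fails.
# with_tpu, run and DRY_RUN are hypothetical helpers, not part of the CLI.
run() {
  # DRY_RUN defaults to 1: print the command instead of executing it.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

with_tpu() {
  tpu_name="$1"; shift
  (
    run biomedbert gcp vm create tpu "$tpu_name" || exit 1
    # The EXIT trap fires when this subshell ends, whether the job
    # succeeded or not, so the TPU is always cleaned up once created.
    trap 'run biomedbert gcp vm delete tpu "$tpu_name"' EXIT
    "$@"
  )
}

# Usage: prints the create, fine-tune and delete commands in dry-run mode.
with_tpu biomedbert run biomedbert squad finetune v1 large ekaba-assets \
  biomedbert_large_bert_weights_and_vocab train-v1.1.json dev-v1.1.json biomedbert 128
```

The trap is installed only after creation succeeds, so a failed create does not trigger a delete of a TPU that does not exist. This sketch does not pass the `preemptible` option; add it to the create command if a preemptible TPU is wanted.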
- Fine-tune SQuAD Syntax:
biomedbert squad finetune (v1|v2) <model_type> <bucket_name> <model_dir> <train_file> <predict_file> <tpu_name> <tpu_cores>
- Evaluate SQuAD Syntax:
biomedbert squad evaluate (v1|v2) <bucket_name> <model_dir> <evaluate_file> <predict_file>
- Fine-tune v1, BioMedBERT-Large from BERT weights:
biomedbert squad finetune v1 large ekaba-assets biomedbert_large_bert_weights_and_vocab train-v1.1.json dev-v1.1.json biomedbert 128
- Fine-tune v2, BioMedBERT-Large from BERT weights:
biomedbert squad finetune v2 large ekaba-assets biomedbert_large_bert_weights_and_vocab train-v2.0.json dev-v2.0.json biomedbert 128
- Fine-tune v1, BioMedBERT-Large from scratch:
biomedbert squad finetune v1 large ekaba-assets biomedbert_large_scratch_breathe_bert_vocab train-v1.1.json dev-v1.1.json biomedbert-preempt 128
- Fine-tune v2, BioMedBERT-Large from scratch:
biomedbert squad finetune v2 large ekaba-assets biomedbert_large_scratch_breathe_bert_vocab train-v2.0.json dev-v2.0.json biomedbert-preempt 128
- Evaluate v1, BioMedBERT-Large from BERT weights:
biomedbert squad evaluate v1 ekaba-assets biomedbert_large_bert_weights_and_vocab evaluate-v1.1.py dev-v1.1.json predictions.json
- Evaluate v2, BioMedBERT-Large from BERT weights:
biomedbert squad evaluate v2 ekaba-assets biomedbert_large_bert_weights_and_vocab evaluate-v2.0.py dev-v2.0.json
- Evaluate v1, BioMedBERT-Large from scratch:
biomedbert squad evaluate v1 ekaba-assets biomedbert_large_scratch_breathe_bert_vocab evaluate-v1.1.py dev-v1.1.json
- Evaluate v2, BioMedBERT-Large from scratch:
biomedbert squad evaluate v2 ekaba-assets biomedbert_large_scratch_breathe_bert_vocab evaluate-v2.0.py dev-v2.0.json
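The SQuAD examples differ only in the version-specific file names and the model directory, so the commands can be generated rather than typed. The sketch below only prints the commands and does not run anything. `squad_files` and `squad_commands` are hypothetical helpers, and the evaluate line follows the four-argument form from the syntax line above.

```shell
# squad_files is a hypothetical helper: it maps a SQuAD version to the
# train file, dev file and evaluation script used in the examples above.
squad_files() {
  case "$1" in
    v1) echo "train-v1.1.json dev-v1.1.json evaluate-v1.1.py" ;;
    v2) echo "train-v2.0.json dev-v2.0.json evaluate-v2.0.py" ;;
    *)  echo "unknown SQuAD version: $1" >&2; return 1 ;;
  esac
}

# Print the fine-tune and evaluate commands for one model directory.
squad_commands() {
  model_dir="$1"; tpu_name="$2"
  for version in v1 v2; do
    # Word splitting is intended here: $1=train file, $2=dev file, $3=eval script.
    set -- $(squad_files "$version")
    echo "biomedbert squad finetune $version large ekaba-assets $model_dir $1 $2 $tpu_name 128"
    echo "biomedbert squad evaluate $version ekaba-assets $model_dir $3 $2"
  done
}

squad_commands biomedbert_large_bert_weights_and_vocab biomedbert
```

Review the printed lines before running them, for example by redirecting them to a file.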
- Fine-tune BioASQ Syntax:
biomedbert bioasq finetune <model_type> <train_file> <predict_file> <bucket_name> <model_dir> <squad_folder> [<tpu_name> <tpu_cores>]
- Evaluate BioASQ Syntax:
biomedbert bioasq evaluate <bucket_name> <model_dir> <train_file> <eval_file> <squad_folder>
Change the <train_file> (BioASQ-train-factoid-4b.json) and <predict_file> (BioASQ-test-factoid-4b-1.json) accordingly.
- Fine-tune from SQuAD v1, BioMedBERT-Large from BERT weights:
biomedbert bioasq finetune large BioASQ-train-factoid-4b.json BioASQ-test-factoid-4b-1.json ekaba-assets biomedbert_large_bert_weights_and_vocab squad_v1 biomedbert 128
- Fine-tune from SQuAD v2, BioMedBERT-Large from BERT weights:
biomedbert bioasq finetune large BioASQ-train-factoid-4b.json BioASQ-test-factoid-4b-1.json ekaba-assets biomedbert_large_bert_weights_and_vocab squad_v2 biomedbert-preempt 128
- Fine-tune from SQuAD v1, BioMedBERT-Large from scratch:
biomedbert bioasq finetune large BioASQ-train-factoid-4b.json BioASQ-test-factoid-4b-1.json ekaba-assets biomedbert_large_scratch_breathe_bert_vocab squad_v1 biomedbert 128
- Fine-tune from SQuAD v2, BioMedBERT-Large from scratch:
biomedbert bioasq finetune large BioASQ-train-factoid-4b.json BioASQ-test-factoid-4b-1.json ekaba-assets biomedbert_large_scratch_breathe_bert_vocab squad_v2 biomedbert-preempt 128
- Evaluate from SQuAD v1, BioMedBERT-Large from BERT weights:
biomedbert bioasq evaluate ekaba-assets biomedbert_large_bert_weights_and_vocab BioASQ-train-factoid-4b.json 4B1_golden.json squad_v1
- Evaluate from SQuAD v2, BioMedBERT-Large from BERT weights:
biomedbert bioasq evaluate ekaba-assets biomedbert_large_bert_weights_and_vocab BioASQ-train-factoid-4b.json 4B1_golden.json squad_v2
- Evaluate from SQuAD v1, BioMedBERT-Large from scratch:
biomedbert bioasq evaluate ekaba-assets biomedbert_large_scratch_breathe_bert_vocab BioASQ-train-factoid-4b.json 4B1_golden.json squad_v1
- Evaluate from SQuAD v2, BioMedBERT-Large from scratch:
biomedbert bioasq evaluate ekaba-assets biomedbert_large_scratch_breathe_bert_vocab BioASQ-train-factoid-4b.json 4B1_golden.json squad_v2
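To switch test batches, the prediction file and the golden file have to change together. The sketch below derives both from a batch number and prints the matching fine-tune and evaluate commands without running them. `bioasq_commands` is a hypothetical helper. Only the batch-1 file names appear in this README; the pattern used for other batches is an assumption extrapolated from them and should be checked against the files in your bucket. The example call assumes a TPU named `biomedbert`.

```shell
# bioasq_commands is a hypothetical helper. The file-name pattern for batches
# other than 1 is an assumption extrapolated from the batch-1 names above.
bioasq_commands() {
  batch="$1"; model_dir="$2"; squad_folder="$3"; tpu_name="$4"
  train_file="BioASQ-train-factoid-4b.json"
  predict_file="BioASQ-test-factoid-4b-${batch}.json"
  eval_file="4B${batch}_golden.json"
  echo "biomedbert bioasq finetune large $train_file $predict_file ekaba-assets $model_dir $squad_folder $tpu_name 128"
  echo "biomedbert bioasq evaluate ekaba-assets $model_dir $train_file $eval_file $squad_folder"
}

# Batch 1, starting from the SQuAD v1 fine-tuned checkpoint.
bioasq_commands 1 biomedbert_large_bert_weights_and_vocab squad_v1 biomedbert
```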
- Fine-tune RE Syntax:
biomedbert re finetune <model_type> <re_dataset> <re_dataset_no> <model_dir> <bucket_name> <tpu_name> <tpu_cores>
- Evaluate RE Syntax:
biomedbert re evaluate <re_dataset> <re_dataset_no> <model_dir> <bucket_name>
- Fine-tune GAD 1, BioMedBERT-Large from BERT weights:
biomedbert re finetune large GAD 1 biomedbert_large_bert_weights_and_vocab ekaba-assets biomedbert-preempt 128
- Fine-tune EU-ADR 1, BioMedBERT-Large from BERT weights:
biomedbert re finetune large euadr 1 biomedbert_large_bert_weights_and_vocab ekaba-assets biomedbert-preempt 128
- Fine-tune GAD 1, BioMedBERT-Large from scratch:
biomedbert re finetune large GAD 1 biomedbert_large_scratch_breathe_bert_vocab ekaba-assets biomedbert 128
- Fine-tune EU-ADR 1, BioMedBERT-Large from scratch:
biomedbert re finetune large euadr 1 biomedbert_large_scratch_breathe_bert_vocab ekaba-assets biomedbert 128
- Evaluate GAD 1, BioMedBERT-Large from BERT weights:
biomedbert re evaluate GAD 1 biomedbert_large_bert_weights_and_vocab ekaba-assets
- Evaluate EU-ADR 1, BioMedBERT-Large from BERT weights:
biomedbert re evaluate euadr 1 biomedbert_large_bert_weights_and_vocab ekaba-assets
- Evaluate GAD 1, BioMedBERT-Large from scratch:
biomedbert re evaluate GAD 1 biomedbert_large_scratch_breathe_bert_vocab ekaba-assets
- Evaluate EU-ADR 1, BioMedBERT-Large from scratch:
biomedbert re evaluate euadr 1 biomedbert_large_scratch_breathe_bert_vocab ekaba-assets
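Evaluation is only meaningful once fine-tuning has finished, so the two relation extraction steps can be chained per dataset. This is a minimal sketch, assuming a POSIX shell. `re_pipeline`, `run`, and `DRY_RUN` are hypothetical names introduced here; the dataset names and the dataset number 1 are taken from the examples above.

```shell
# run and DRY_RUN are hypothetical: with DRY_RUN=1 (the default) commands
# are printed instead of executed.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# re_pipeline is a hypothetical helper: for each dataset, evaluate only if
# fine-tuning succeeded, and stop at the first failure.
re_pipeline() {
  model_dir="$1"; tpu_name="$2"
  for dataset in GAD euadr; do
    run biomedbert re finetune large "$dataset" 1 "$model_dir" ekaba-assets "$tpu_name" 128 &&
      run biomedbert re evaluate "$dataset" 1 "$model_dir" ekaba-assets ||
      { echo "stopping: $dataset failed" >&2; return 1; }
  done
}

re_pipeline biomedbert_large_bert_weights_and_vocab biomedbert-preempt
```

With `DRY_RUN=0` the same call runs the four commands in order and returns a non-zero status as soon as one of them fails.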
- Python >= 2.6 or >= 3.3
MIT licensed. See the bundled LICENSE file for more details.