This repo includes code to assess the robustness of large language models (LLMs) in biomedical question answering (QA). A collection of generalist and specialist LLMs is assessed.
This repo contains only the code for the sampling-based attacks. For running the benchmarks with the variant of discreteZOO, the zeroth-order optimization-based attack used in the manuscript, please see discretezoo-span.
Clone the code repo, go to the qstab-main folder, and install using the following from the command line:

```
pip install .
```
The baseline assesses the zero-shot performance of the LLMs on the specified datasets. Under the qstab-main folder, run the following from the command line.

Run the Flan-T5-large model on MedQA-USMLE drug-mention questions:

```
python ./scripts/run_hf_entity.py -grp="drugs" -nq=6000
```

Run the MedAlpaca-7B model on MedQA-USMLE disease-mention questions:

```
python ./scripts/run_medalpaca_entity.py -grp="diseases" -nq=6000
```
Here, 6000 is simply a number large enough that all instances are run; a smaller number evaluates only a subset of the data.
The substitution attack targets entities in the distractors of a question by constructing adversarial distractors. For each data instance, the attacker selects a replacement entity from a perturbation set using a sampling-based approach and substitutes it for the original entity in the distractor.
- Random sampling
An example of attacking the MedAlpaca-7B model on MedQA-USMLE drug-mention questions using random sampling:

```
python ./scripts/run_medalpaca_entity.py -ptb=True -ptb_samp="random" -grp="drugs" -nq=6000
```
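For orientation, here is a minimal sketch of what a random-sampling substitution does, assuming a precomputed perturbation set of candidate entities. The function and the toy data are illustrative, not the repo's API (the actual logic lives in the `run_*_entity.py` scripts):

```python
import random

def random_substitute(distractor, entity, perturbation_set, rng=random.Random(0)):
    """Replace `entity` in a distractor with a uniformly sampled candidate.

    Illustrative sketch only; the repo's implementation may differ in detail.
    """
    # Exclude the original entity so the substitution always changes the text.
    candidates = [c for c in perturbation_set if c != entity]
    replacement = rng.choice(candidates)
    return distractor.replace(entity, replacement)

# Toy example: perturb a drug mention in one distractor option.
print(random_substitute("Administer ibuprofen", "ibuprofen",
                        ["naproxen", "aspirin", "acetaminophen"]))
```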
- Powerscaled distance-weighted sampling (PDWS)
An example of attacking the MedAlpaca-7B model on MedQA-USMLE drug-mention questions using PDWS:

```
python ./scripts/run_medalpaca_entity.py -ptb=True -ptb_samp="distance" -grp="drugs" -ndist=20 -nq=6000
```
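PDWS is not specified in detail in this README. One plausible reading is that each candidate is sampled with probability proportional to a power of its (embedding) distance to the original entity, with `-ndist` read here as the number of nearest candidates kept; the power exponent would then be the hyperparameter tuned below. A hedged sketch under those assumptions, with illustrative distances:

```python
import numpy as np

def pdws_sample(distances, power, n_dist, rng=np.random.default_rng(0)):
    """Sample a candidate index with probability proportional to a power of
    its distance to the original entity (one plausible reading of PDWS)."""
    # Keep the n_dist nearest candidates; this interpretation of -ndist is
    # an assumption, and the script's exact semantics may differ.
    keep = np.argsort(distances)[:n_dist]
    # Power-scale the kept distances and normalize into probabilities.
    weights = distances[keep] ** power
    probs = weights / weights.sum()
    return int(rng.choice(keep, p=probs))

# Toy example: 100 candidates at random distances, keep 20, power = 2.0
# (larger powers concentrate mass on the farthest of the kept candidates).
d = np.random.default_rng(1).random(100)
print(pdws_sample(d, power=2.0, n_dist=20))
```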
- Hyperparameter tuning for PDWS
An example of tuning the PDWS attacker's hyperparameter for the MedAlpaca-7B model on MedQA-USMLE drug-mention questions:

```
bash ./scripts/medalpaca_tune.sh
```
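The internals of medalpaca_tune.sh are not reproduced here. Conceptually, tuning the PDWS hyperparameter amounts to a grid search over candidate values scored by attack success. A self-contained sketch of that idea, with a toy objective standing in for a real benchmark run:

```python
# `attack_success_rate` is a toy stand-in: a real tuning run would re-run the
# PDWS attack command above for each candidate value and measure the fraction
# of answers the attack flips.
def attack_success_rate(power):
    return 1.0 / (1.0 + (power - 2.0) ** 2)  # toy curve peaking at power=2

def tune_power(candidates=(0.5, 1.0, 2.0, 3.0, 4.0)):
    # Grid-search the exponent; keep the value giving the strongest attack.
    scores = {p: attack_success_rate(p) for p in candidates}
    return max(scores, key=scores.get)

print(tune_power())  # -> 2.0 under the toy objective
```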