[EMNLP 2024] RaTEScore: A Metric for Radiology Report Generation
RaTEScore is a novel, entity-aware metric for assessing the quality of medical reports generated by AI models. It emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, is robust to complex medical synonyms, and is sensitive to negation expressions. Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics.
Here is an illustration of the computation of RaTEScore.
For more details about our pipeline, please refer to our paper.
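To make the entity-aware idea concrete, here is a toy sketch of the score's shape, not the official implementation: the real pipeline extracts typed entities with a NER model and compares them via embedding similarity, whereas this stand-in uses hand-written entity tuples, an illustrative type label, and exact-match similarity.

```python
# Toy sketch of an entity-aware F1-style score (NOT the official RaTEScore code).
# The real pipeline uses a DeBERTa NER model and BioLORD embeddings; here we
# stand in with hand-written (text, type) entities and exact-match similarity.

def entity_f1(pred_entities, gt_entities):
    """pred_entities / gt_entities: lists of (entity_text, entity_type) tuples.
    Returns the harmonic mean of precision- and recall-style match scores."""
    def best_match(ent, pool):
        # Stand-in for cosine similarity between entity embeddings:
        # 1.0 on an exact match of text and type, else 0.0.
        return max((1.0 if ent == other else 0.0) for other in pool) if pool else 0.0

    precision = sum(best_match(e, gt_entities) for e in pred_entities) / max(len(pred_entities), 1)
    recall = sum(best_match(e, pred_entities) for e in gt_entities) / max(len(gt_entities), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# "Abnormality-Absent" is an illustrative type label, not the dataset's exact label set.
pred = [("intracranial hemorrhages", "Abnormality-Absent")]
gt = [("intracranial hemorrhage", "Abnormality-Absent")]
print(entity_f1(pred, gt))  # 0.0 under exact match; real embeddings would score this near 1
```

This is exactly the gap the embedding-based comparison closes: "hemorrhages" vs. "hemorrhage" scores 0 under string matching but nearly 1 in a medical embedding space.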
```shell
pip install RaTEScore
```
Please note that the code in the Python repository may undergo debugging and updates. We kindly remind you to specify the version of RaTEScore you are using to ensure fair comparisons and reproducibility.
```python
from RaTEScore import RaTEScore

pred_report = ['There are no intracranial hemorrhages.',
               'The musculature and soft tissues are intact.']
gt_report = ['There is no finding to suggest intracranial hemorrhage.',
             'The muscle compartments are intact.']
assert len(pred_report) == len(gt_report)

ratescore = RaTEScore()
# Pass visualization_path here if you want to save the visualization result:
# ratescore = RaTEScore(visualization_path = '')
scores = ratescore.compute_score(pred_report, gt_report)
```
RaTE-NER is a large-scale radiological named entity recognition (NER) dataset, built to support the Medical Entity Recognition module of our proposed metric. To download the dataset or find out more, please refer to Hugging Face and our paper.
To effectively measure the alignment between automatic evaluation metrics and radiologists' assessments in medical text generation tasks, we have established a comprehensive benchmark, RaTE-Eval, that encompasses three tasks:
- Sentence-level Human Rating.
- Paragraph-level Human Rating.
- Rating on the Synthetic Reports.
To download the benchmark or find out more, please refer to Hugging Face and our paper.
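Alignment on these tasks boils down to correlating metric scores with radiologists' ratings over the same report pairs. A minimal sketch with Pearson correlation, where both score lists are made-up numbers for illustration:

```python
# Minimal sketch: measuring metric-human alignment via Pearson correlation.
# Both lists below are hypothetical values, purely for illustration.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

metric_scores = [0.91, 0.40, 0.75, 0.22]   # one automatic score per report pair
human_ratings = [4.5, 2.0, 3.8, 1.0]       # e.g. radiologist scores for the same pairs
print(pearson(metric_scores, human_ratings))  # close to 1 when rankings agree
```

Rank correlations (Kendall's tau, Spearman) are common alternatives when only the ordering of ratings is trusted.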
For the Medical Entity Recognition module, we default to our NER model, a version of DeBERTa fine-tuned on the RaTE-NER dataset. It uses the IOB tagging scheme.
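For readers unfamiliar with IOB (Inside-Outside-Beginning) tagging, here is a small, self-contained example; the entity type name is illustrative, not the dataset's exact label set.

```python
# Illustration of the IOB scheme: B- starts an entity, I- continues it, O is outside.
# "Abnormality" is an illustrative type label, not necessarily RaTE-NER's exact labels.
tokens = ["No", "acute", "intracranial", "hemorrhage", "is", "seen", "."]
tags   = ["O", "B-Abnormality", "I-Abnormality", "I-Abnormality", "O", "O", "O"]

def iob_to_entities(tokens, tags):
    """Group B-/I- tagged tokens into (entity_text, entity_type) spans."""
    entities, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                      # close any entity already in progress
                entities.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)              # continue the current entity
        else:
            if current:                      # O tag ends the current entity
                entities.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append((" ".join(current), ctype))
    return entities

print(iob_to_entities(tokens, tags))
# [('acute intracranial hemorrhage', 'Abnormality')]
```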
For the Synonym Disambiguation Encoding module, we default to BioLORD-2023-C; we discuss this choice in our paper. You can also swap in other pretrained BERT-style models.
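The role this encoder plays can be sketched in a few lines: entity names are embedded and compared by cosine similarity, so synonyms land close together even when their surface forms differ. The 3-d vectors below are made up for illustration; real embeddings come from the encoder and have hundreds of dimensions.

```python
# Sketch of embedding-based synonym matching (the role BioLORD-2023-C plays).
# The 3-d vectors are invented for illustration; real embeddings come from the encoder.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

emb = {
    "intracranial hemorrhage": [0.90, 0.10, 0.20],
    "brain bleed":             [0.85, 0.15, 0.25],  # synonym -> nearby vector
    "pleural effusion":        [0.10, 0.90, 0.30],  # unrelated -> distant vector
}
print(cosine(emb["intracranial hemorrhage"], emb["brain bleed"]))       # high (~0.99)
print(cosine(emb["intracranial hemorrhage"], emb["pleural effusion"]))  # low  (~0.27)
```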
If you have any questions, please feel free to contact [email protected].
```bibtex
@inproceedings{zhao2024ratescore,
  title={RaTEScore: A Metric for Radiology Report Generation},
  author={Zhao, Weike and Wu, Chaoyi and Zhang, Xiaoman and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages={15004--15019},
  year={2024}
}
```