Add PubMedQA dataset #740

kurbanrita · 2024-12-27T13:27:11Z

Add PubMedQA dataset

PubMedQA is a biomedical question-answering dataset collected from PubMed abstracts. The task of PubMedQA is to answer research questions with yes/no/maybe. The subset that is used for validation has 1k expert-annotated QA instances.

Add LanguageDataset
Add TextClassification dataset
Add PubMedQA dataset that is loaded from HuggingFace (including adding HuggingFace datasets as a new dependency for the language package)
Add tests
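
As a rough illustration of the yes/no/maybe classification setup described above, here is a minimal sketch of turning one PubMedQA record into a text/target pair. It is not the code from this PR. The class order in `CLASSES`, the `QASample` container, the helper names, the HuggingFace dataset ID `qiaojin/PubMedQA` with config `pqa_labeled`, and the record field names (`question`, `context`, `final_decision`) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Assumed class order; the PR description does not say how labels are indexed.
CLASSES: Tuple[str, ...] = ("no", "yes", "maybe")


@dataclass(frozen=True)
class QASample:
    """Hypothetical container for one classification example."""

    text: str
    target: int


def to_sample(record: Dict[str, Any]) -> QASample:
    """Converts a raw record into a (text, target) pair.

    The field names used here are assumptions about the raw data layout.
    """
    question = record["question"]
    contexts = " ".join(record["context"]["contexts"])
    label = record["final_decision"].strip().lower()
    if label not in CLASSES:
        raise ValueError(f"Unexpected label {label!r}; expected one of {CLASSES}.")
    text = f"Question: {question}\nContext: {contexts}"
    return QASample(text=text, target=CLASSES.index(label))


def load_pubmedqa(split: str = "train") -> List[QASample]:
    """Downloads the data from HuggingFace and converts every record.

    The import is local so that `to_sample` can be used and tested without
    the `datasets` package or network access. The dataset ID and config
    name below are assumptions, not taken from the PR.
    """
    from datasets import load_dataset

    raw = load_dataset("qiaojin/PubMedQA", "pqa_labeled", split=split)
    return [to_sample(record) for record in raw]
```

Keeping the record conversion separate from the download makes it possible to unit test the label mapping on small in-memory records, without hitting the network.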

Remaining questions:

Vision validators could probably be reused for language. Maybe refactor some common functions into core?
This PR won't be merged to main; development will continue on this branch.
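
On the first question above, one possible shape for a validator shared between vision and language is a check that only looks at modality-independent properties, such as the class list and the sample count. The sketch below is purely hypothetical: the function name, its parameters, and its behaviour are invented for illustration and do not describe the existing vision validators.

```python
from typing import Optional, Sequence


def check_dataset_integrity(
    classes: Sequence[str],
    n_samples: int,
    *,
    expected_classes: Sequence[str],
    expected_n_samples: Optional[int] = None,
) -> None:
    """Raises ValueError if the dataset does not match what is expected.

    Hypothetical helper: it only inspects properties that every
    classification dataset has, regardless of modality.
    """
    if list(classes) != list(expected_classes):
        raise ValueError(
            f"Unexpected classes {list(classes)}; expected {list(expected_classes)}."
        )
    if expected_n_samples is not None and n_samples != expected_n_samples:
        raise ValueError(
            f"Unexpected number of samples {n_samples}; expected {expected_n_samples}."
        )


# Example call for a 1k-sample yes/no/maybe dataset (class order is an assumption).
check_dataset_integrity(
    ["no", "yes", "maybe"],
    1000,
    expected_classes=["no", "yes", "maybe"],
    expected_n_samples=1000,
)
```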

ioangatop

Great work @kurbanrita 🎉 here are some initial comments

src/eva/language/data/datasets/classification/base.py

src/eva/language/data/datasets/language.py

src/eva/language/data/datasets/classification/pubmedqa.py

ioangatop

Some more comments 🤗

src/eva/language/data/datasets/classification/pubmedqa.py

nkaenzig

Very nice work @kurbanrita! Left a couple of nitpicks, after those I think this should be ready to go.

tests/eva/language/data/datasets/classification/test_pubmedqa.py

src/eva/language/data/datasets/language.py

src/eva/language/data/datasets/classification/pubmedqa.py

ioangatop

Great start @kurbanrita 🎉 Let's address @nkaenzig's comments and merge :D

Add PubMedQA dataset

cf878b9

kurbanrita self-assigned this Dec 27, 2024

Rita Kurban and others added 3 commits December 27, 2024 13:28

Rename TextClassification

362ac74

Delete src/eva/language/experiment.ipynb

905889b

Linting

4c61b10

kurbanrita force-pushed the nlp_integration branch from 905889b to 4c61b10 December 27, 2024 13:56

Rita Kurban added 7 commits December 27, 2024 14:04

Remove notebook

25d746b

Add datasets package

5d5fbca

nox changes

bff7350

Import order

8a993ac

Add download functionality

515d37d

Fix ruff

7ce4aad

Remove validators for now

6e12c86

kurbanrita requested a review from ioangatop December 30, 2024 13:54

Update language.py

f338d58

kurbanrita marked this pull request as ready for review December 30, 2024 14:01

Rita Kurban added 2 commits December 30, 2024 14:08

Pyright fix

421b9d5

Merge branch 'nlp_integration' of https://github.com/kaiko-ai/eva int…

02e6503

…o nlp_integration

ioangatop reviewed Jan 13, 2025

src/eva/language/data/datasets/classification/base.py (outdated, resolved)

src/eva/language/data/datasets/language.py (outdated, resolved)

src/eva/language/data/datasets/classification/pubmedqa.py (outdated, resolved)

Rita Kurban added 4 commits January 13, 2025 11:43

Addressed comments

aa9ec20

Addressed comments

0f253c6

Update tests

c40eced

Linting

5de4151

ioangatop reviewed Jan 14, 2025

Rita Kurban added 4 commits January 15, 2025 12:49

Address feedback

52717b2

Reduce duplicate code

365f6b3

Renamed

22abe4b

Pyright fix

a2e4191

kurbanrita changed the base branch from main to language January 17, 2025 16:23

kurbanrita requested a review from ioangatop January 17, 2025 16:24

Rita Kurban added 2 commits January 17, 2025 17:07

Fix pyright

3670cdd

Linting

ff2280a

nkaenzig approved these changes Jan 21, 2025

ioangatop approved these changes Jan 21, 2025

Rita added 2 commits January 21, 2025 12:32

Feedback

768aadb

Fix val split

6d3fff6

kurbanrita merged commit f84881c into language Jan 21, 2025
4 checks passed

kurbanrita deleted the nlp_integration branch January 21, 2025 21:32
