Validate Retagging Experimentation #19

Open · wants to merge 26 commits into main
Conversation

agombert

Validate Retagging Experimentation

In this pull request, we validate the retagging experiment results using a newly corrected/retagged file. The goal is to assess the performance of the updated model trained on this new data.

New Data Source

The new data source can be found in /data/raw/retagging, specifically in the file named allMeSH_2021.2016-2021.jsonl. This dataset contains corrected and retagged annotations for various documents. Notably, it includes annotations for five key tags: "Artificial Intelligence," "HIV," "Data Collection," "Mathematics," and "Geography."
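As a quick sanity check, something like the following can be used to peek at the new file and count how often the five tags of interest appear. The `meshMajor` field name follows the usual allMeSH layout and is an assumption here, not something confirmed in this PR.

```python
import json
from collections import Counter

PATH = "data/raw/retagging/allMeSH_2021.2016-2021.jsonl"
TAGS = {"Artificial Intelligence", "HIV", "Data Collection", "Mathematics", "Geography"}

counts = Counter()
with open(PATH) as f:
    for line in f:
        doc = json.loads(line)
        # "meshMajor" is assumed to hold the (re)tagged MeSH labels for each document
        counts.update(tag for tag in doc.get("meshMajor", []) if tag in TAGS)

print(counts)  # rough per-tag frequencies in the retagged file
```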

Environment Setup

Before validating the experiment results, it is essential to set up the environment correctly. Here are the steps to follow (a shell sketch of the corresponding commands is given after the list):

  1. On your local machine or a similar g5.12xlarge instance, ensure that you are on the main branch.
  2. Activate your Python environment using Poetry.
  3. Ensure that you have the latest changes by pulling from the remote repository.
  4. Fetch the data from DVC that is required for the experimentation.
  5. Set your Weights and Biases API key as an environment variable (WANDB_API_KEY).
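A minimal shell sketch of these steps; the exact Poetry workflow and DVC targets may differ in the repository, and the API key is a placeholder:

```bash
git checkout main && git pull          # steps 1 and 3: be on main with the latest changes
poetry install && poetry shell         # step 2: activate the Python environment
dvc pull                               # step 4: fetch the data tracked by DVC
export WANDB_API_KEY=<your-api-key>    # step 5: Weights & Biases credentials
```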

Launching Preprocessing and Training

To validate the experimentation, we will perform preprocessing and training with the following method:

Method: Using DVC

  1. Navigate to the pipelines/bertmesh/ directory.
  2. Reproduce the DVC pipeline to execute preprocessing and training (see the commands sketched below).
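For example (assuming the dvc.yaml for this experiment lives in that directory):

```bash
cd pipelines/bertmesh/
dvc repro      # runs the preprocessing and training stages defined in dvc.yaml
```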

After Training

After initiating the training, please wait until the process completes. Once training is finished, we will proceed with the evaluation of model performance.

The next steps include running example documents with problematic tags through both the model currently in use and the model you have trained. The results should demonstrate an improvement in tagging accuracy and alignment with the newly corrected and retagged dataset.
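A rough sketch of that comparison, assuming both checkpoints load as Hugging Face multi-label sequence-classification models; the actual BertMesh model class, tokenizer, checkpoint paths, and the 0.5 threshold below are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

EXAMPLES = ["This grant is about malaria and HIV"]   # documents with problematic tags
CHECKPOINTS = {"current": "path/to/current-model", "retrained": "path/to/new-checkpoint"}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed tokenizer
inputs = tokenizer(EXAMPLES, return_tensors="pt", truncation=True, padding=True)

for name, path in CHECKPOINTS.items():
    model = AutoModelForSequenceClassification.from_pretrained(path).eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)        # per-tag probabilities
    for i, text in enumerate(EXAMPLES):
        tags = [model.config.id2label[j] for j, p in enumerate(probs[i]) if p > 0.5]
        print(f"{name}: {text!r} -> {tags}")
```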

@agombert
Author

A few comments from the experiments:

  • Here is the training-set loss curve during training: [training loss plot]
  • It looks like there is something wrong with the training: the metrics are always 0 in the logs.
  • The best model is the first saved checkpoint.
  • When applying any checkpoint to a random sample and then applying the sigmoid, the probabilities go from 0.8e-4 through 0.01, 0.015, 0.03, up to 0.055, but remain roughly "uniform", so no signal is captured.
  • When computing the loss over those 100 examples:

| model | loss on sample |
| --- | --- |
| "best" | 0.0038 |
| current | 0.0078 |
| last iteration | 0.0312 |
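For reference, a minimal sketch of how the sigmoid probabilities and such a per-sample loss can be computed from raw model outputs; the logits/labels tensors and the label-space size below are placeholders, not the project's actual evaluation code:

```python
import torch

# Placeholders: in practice these come from running a checkpoint over the sampled documents.
logits = torch.randn(100, 28000)     # (n_examples, n_MeSH_labels) raw model outputs
labels = torch.zeros_like(logits)    # multi-hot gold tags for the same examples

probs = torch.sigmoid(logits)        # the near-uniform probabilities mentioned above
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)

print(probs.max(dim=1).values[:5])   # max probability per example
print(loss.item())                   # comparable to the per-model losses in the table
```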

We tried the latest model from Juan (`git checkout c021da7`) and went through some evaluations, modifying evaluation_model.py to make it work. There was the sigmoid problem there too, as well as other issues: for instance, the variable names had to be modified.

We evaluated a sample of 10 random examples (though some may be present in the training set, as I don't have access to the split and regenerating it would have taken too long)... I think there is also some confusion between ids and labels at some points.

However, here is what I observed:

  • Across the 10 examples there are 25 predictions, meaning the model can predict something!
  • On the example 'This grant is about malaria and HIV', the max probability is 0.49 (id 6424, which is "Colon, Ascending" in the config, though I doubt that label is actually True).
  • On the example 'My name is Arnault and I live in barcelona', the max probability is 0.90 (same id as the previous example).

@@ -30,7 +30,7 @@ class BertMeshTrainingArguments(TrainingArguments):
         default=8
     )  # set to 256 in grants-tagger repo
     per_device_eval_batch_size: int = field(default=8)
-    gradient_accumulation_steps: int = field(default=1)
+    gradient_accumulation_steps: int = field(default=2)
Collaborator


we should not change the defaults ideally, just the params that get passed

Author


put back the default to 1 👍
