
Help with script inputs, training, predicting and evaluation #138

Closed
eedenong opened this issue Nov 10, 2021 · 7 comments

@eedenong

Hi! After following the training and prediction steps highlighted in the README, my evaluation scores are quite poor. I am not sure whether this is due to misformatting the training data, so I would like to seek some help here. These are the steps I took for training, prediction, and evaluation:

All of this was done on Google Colab.

Data preprocessing
I used the FCE dataset to generate the train and dev sets. Specifically:

  1. Use the error.py script from the PIE repo (https://github.com/awasthiabhijeet/PIE/tree/master/errorify) to generate the parallel text files correct.txt and incorrect.txt

  2. Use preprocess_data.py from the GECToR repo to generate the output files (train.txt and dev.txt respectively)
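Before running preprocess_data.py, a quick sanity check on the parallel files can catch misalignment early, since the script pairs the two files line by line. This is only a sketch; the file names follow step 1 above, and the "no empty lines" assumption is mine:

```python
# Sanity-check a parallel corpus: both files must have the same number of
# lines (they are paired line by line), and no line should be empty.
def check_parallel(src_path, tgt_path):
    with open(src_path, encoding="utf-8") as f:
        src = f.read().splitlines()
    with open(tgt_path, encoding="utf-8") as f:
        tgt = f.read().splitlines()
    assert len(src) == len(tgt), f"line counts differ: {len(src)} vs {len(tgt)}"
    for i, (s, t) in enumerate(zip(src, tgt), start=1):
        assert s.strip() and t.strip(), f"empty sentence at line {i}"
    return len(src)
```

Running this on incorrect.txt and correct.txt before preprocessing would have ruled out one common source of garbage training pairs.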

Model training
Then, I trained the model using the generated train.txt and dev.txt. My Google Colab runtime timed out partway through training.

Prediction and evaluation
Afterwards, I ran the prediction script on train_incorr_sentences.txt from the PIE repository (https://github.com/awasthiabhijeet/PIE/tree/master/scratch) to obtain the predictions as preds_output.m2. The model path specified pointed to the best.th file in the model_output folder.

Then, I used the two parallel text files from the same PIE folder (train_incorr_sentences.txt and train_corr_sentences.txt) to generate a reference file, ref_output.m2.

Then, I ran the m2scorer script with the SYSTEM argument set to preds_output.m2 and SOURCE_GOLD set to ref_output.m2.

These were the resulting scores (after a single stage of training, i.e. stage 1):
Precision: 0.0831
Recall: 0.0780
F0.5: 0.0820
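For context, F0.5 weights precision twice as heavily as recall, which is why the reported value sits closer to the precision figure. A minimal computation of the standard F-beta formula, plugged with the numbers reported above:

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

score = f_beta(0.0831, 0.0780)  # ~0.0820, matching the scores above
```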

I am not sure whether I am using the wrong datasets or passing them to the wrong scripts, as there isn't much documentation specifying exactly which files, and in what format, to pass in at each step. It would be a very big help if someone could point me to the specifics of what data I should be using for each step, and whether I am processing it correctly!

I also read that you did 3 stages of training. Is this expected behaviour after only the first stage?

@skurzhanskyi
Collaborator

Hi @eedenong

Please take a look at the corresponding README sections if you want to reproduce the results in the paper.

  1. The data for the first stage could be found here, as it was mentioned in the Dataset section.
  2. It looks like you're using default parameters for training. We explained in detail our parameters at each stage here.
  3. From what I see, you used preprocess_data.py correctly (errorful data as a source and error-free data as a target). You may also want to look at similar issues (Format of SOURCE and TARGET #136, Are source correct txt file and target incorrect txt file in prerprocessing? #104, What kind of data format do you use? #53).
  4. You can take a look at our scores after each stage in Table 4 in the paper.

@eedenong
Author

Thank you, I will take a look at them!

Just to clarify: for the model inference input file, should it be in m2 format or txt format, and should it be a dataset of incorrect sentences to be corrected? In that case, will it suffice to simply use a dataset of incorrect sentences, for example a1_train_incorr_sentences.txt from the PIE synthetic dataset? So far, the issues I have seen only discuss the formats of the text files with regard to preprocessing.

@skurzhanskyi
Collaborator

skurzhanskyi commented Nov 10, 2021

If you're talking about predict.py, its input is the incorrect sentences.
At the prediction stage, the model shouldn't require the correct output as part of the input, so the m2 and preprocess_data.py formats don't fit here.
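In other words, the inference input is plain text with one incorrect sentence per line, with no annotations and no target side. A hypothetical sketch of preparing such a file (the file name and the pre-tokenized, space-separated sentences are my assumptions, not part of the GECToR API):

```python
# Write a prediction input file: plain text, one tokenized incorrect
# sentence per line. No m2 annotations and no corrected side are included.
sentences = [
    "He go to school .",
    "She like cats .",
]
with open("predict_input.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sentences) + "\n")
```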

@eedenong
Author

eedenong commented Nov 11, 2021

For the prediction stage, the model should require correct output as part of the input.

Regarding this, may I know which input you are referring to? Are you referring to the --output_file argument that is passed into predict.py, or do you mean that the correct output should be in the same text file as the --input_file for predict.py?

@skurzhanskyi
Collaborator

Oh, sorry. I meant
For the prediction stage, the model shouldn't require correct output as part of the input

@eedenong
Author

I see, thank you! I have another query:

The data for the first stage could be found here, as it was mentioned in the Dataset section.

May I clarify: am I supposed to generate the 98/2 train/dev split from the single file produced by preprocess_data.py? Or am I supposed to find separate train/dev sets and preprocess them separately to generate train.txt and dev.txt?

@skurzhanskyi
Collaborator

The results will be the same.
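Either way, a reproducible 98/2 split of the preprocessed file can be sketched as a shuffled line-level split. This is only an illustration; the seed, fraction, and function name are my own choices:

```python
import random

def split_lines(lines, dev_fraction=0.02, seed=42):
    """Shuffle and split lines into (train, dev) with the given dev fraction.

    A fixed seed makes the split reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]
```

Writing the two returned lists to train.txt and dev.txt gives the 98/2 split either from one preprocessed file or from separately preprocessed sets.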
