Help with script inputs, training, predicting and evaluation #138
Hi @eedenong, please take a look at the corresponding README sections if you want to reproduce the results in the paper.
Thank you, I will take a look at them! Just to clarify: for the model inference input file, should it be in m2 format or txt format, and should it be a dataset of incorrect sentences to be corrected? In that case, would it suffice to simply use a dataset of incorrect sentences, for example a1_train_incorr_sentences.txt from the PIE synthetic dataset? So far, the issues I have seen only discuss the formats of the text files with regard to preprocessing.
If you're talking about …
Regarding this, may I know which input you are referring to? Do you mean the --output_file argument that is passed to predict.py, or that the correct output should be in the same text file as the one passed to --input_file for predict.py?
Oh, sorry. I meant …
I see, thank you! I have another query:
May I clarify whether I am supposed to generate the 98/2 train/dev split from the single file generated by preprocess_data.py? Or am I supposed to find separate train and dev sets and preprocess them separately to generate train.txt and dev.txt?
The results will be the same either way.
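For anyone reading later, a minimal sketch of the first option (splitting the single preprocessed file 98/2 by line count); the filename output.txt is just a placeholder assumption for whatever preprocess_data.py produced:

```bash
# Split the preprocessed file into 98% train / 2% dev by lines.
# "output.txt" is a placeholder for the preprocess_data.py output.
total=$(wc -l < output.txt)
train_n=$(( total * 98 / 100 ))
head -n "$train_n" output.txt > train.txt
tail -n "+$(( train_n + 1 ))" output.txt > dev.txt
```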
Hi! I have been running into an issue: after following the steps to train and predict as described in the README, the evaluation scores are quite poor. I am not sure whether it is due to misformatted training data, so I would like to seek some help here. These are the steps I took to carry out the training, prediction, and evaluation:
All of this was done on Google Colab.
Data preprocessing
1. I used the FCE dataset to generate the train and dev sets, specifically correct.txt and incorrect.txt.
2. I used preprocess_data.py from the GECToR repo to generate the output files (train.txt and dev.txt respectively); see the command sketch after this list.
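A sketch of the preprocessing call, following the GECToR README (the fce/ paths are placeholders for wherever the FCE files actually live):

```bash
# Convert parallel errorful/corrected text into GECToR's tag format.
# -s is the source (errorful) side, -t is the target (corrected) side.
python utils/preprocess_data.py -s fce/incorrect.txt \
                                -t fce/correct.txt \
                                -o output.txt
```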
Model training
Then, I trained the model using the generated train.txt and dev.txt.
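The invocation looked roughly like this, per the README (model_output is just my directory name):

```bash
# Stage 1 training on the preprocessed FCE data.
python train.py --train_set train.txt \
                --dev_set dev.txt \
                --model_dir model_output
```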
My Google Colab runtime timed out and I got the following:
Prediction and evaluation
1. Afterwards, I ran the prediction script on a txt file, train_incorr_sentences.txt from the PIE repository (https://github.com/awasthiabhijeet/PIE/tree/master/scratch), to obtain the predictions as preds_output.m2. The model path specified pointed to the best.th file in the model_output folder.
2. Then, I used the two parallel text files from the same PIE folder (train_incorr_sentences.txt and train_corr_sentences.txt) to generate a reference file, ref_output.m2.
3. Finally, I ran the m2scorer script with the SYSTEM argument set to preds_output.m2 and SOURCE_GOLD set to ref_output.m2. (A command sketch follows this list.)
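A sketch of these three steps (the vocab path, the ERRANT step for building the reference, and the m2scorer entry-point name are my assumptions; adjust to your setup):

```bash
# 1. Generate corrections for the errorful PIE sentences.
#    predict.py writes plain corrected text, one sentence per line,
#    regardless of the .m2 extension used in the filename here.
python predict.py --model_path model_output/best.th \
                  --vocab_path data/output_vocabulary \
                  --input_file train_incorr_sentences.txt \
                  --output_file preds_output.m2

# 2. Build the gold m2 reference from the parallel PIE files
#    (here via ERRANT's parallel-to-m2 converter).
errant_parallel -orig train_incorr_sentences.txt \
                -cor train_corr_sentences.txt \
                -out ref_output.m2

# 3. Score: SYSTEM first, then SOURCE_GOLD. Note that m2scorer expects
#    the SYSTEM file to be plain corrected text, not m2 format.
./m2scorer preds_output.m2 ref_output.m2
```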
These were the resulting scores (after training once, i.e. after stage 1 only):
Precision: 0.0831
Recall: 0.0780
F0.5: 0.0820
I am not sure whether I am using the wrong datasets or passing them into the wrong scripts, as there isn't much documentation specifying exactly which files, and in what format, each step expects. It would be a big help if someone could point me in the right direction on what data I should be using for each step, and whether I am processing it correctly!
I also read that you did 3 stages of training; are scores like these expected behaviour after only the first stage?