Problems with reproducing zero-shot learning results #67

Open
blazejdolicki opened this issue Apr 16, 2020 · 2 comments

Comments

blazejdolicki commented Apr 16, 2020

I tried to replicate the zero-shot learning results on CLS, but my results don't match those from the paper. Since the script for predicting labels with LASER does not seem to be part of the Multifit repository, I trained LASER on the CLS dataset (only en and de books for now) by adapting the MLDoc script from the LASER repo to CLS. My fork of LASER with these adjustments is [here](https://github.com/blazejdolicki/LASER). For the time being I have only tested on German books. After some hyperparameter tuning on the English training set, my best setup obtains 82.25% accuracy, compared to 84.15% in the Multifit paper. My hyperparams are:

n_epochs=200
lr=0.001
wd=0.0
nhid="10 8"
drop=0.2
seed=1
bsize=12

and I'm using the last 10% of the test set as the validation set.
When I tried to make them more similar to Multifit (n_epochs=8, wd=0.001, bsize=18), the accuracy dropped to around 60%.
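
For context, the classifier trained on top of LASER embeddings is very small. The sketch below assumes (this is not confirmed by either repository) that `nhid="10 8"` lists the widths of two hidden layers, that the inputs are 1024-dimensional LASER sentence embeddings, and that CLS is a two-class task; the helper names are hypothetical. It also shows one way to hold out the last 10% of an evaluation set without shuffling.

```python
"""Sketch: shape of the LASER classifier and a tail-10% validation split.

Assumptions (not taken from the LASER or Multifit code): nhid lists hidden
layer widths, inputs are 1024-d LASER embeddings, CLS has 2 classes.
"""

EMBED_DIM = 1024  # LASER sentence embedding size
N_CLASSES = 2     # assumed: positive / negative


def layer_dims(nhid: str, in_dim: int = EMBED_DIM, n_out: int = N_CLASSES) -> list[int]:
    """Turn a string like "10 8" into [in_dim, 10, 8, n_out]."""
    hidden = [int(h) for h in nhid.split()]
    return [in_dim, *hidden, n_out]


def n_params(dims: list[int]) -> int:
    """Count weights plus biases of a fully connected stack with these widths."""
    return sum(a * b + b for a, b in zip(dims, dims[1:]))


def split_tail(rows: list, frac: float = 0.1) -> tuple[list, list]:
    """Hold out the last `frac` of rows (no shuffling) as a validation set."""
    cut = len(rows) - int(len(rows) * frac)
    return rows[:cut], rows[cut:]


dims = layer_dims("10 8")
print(dims)            # [1024, 10, 8, 2]
print(n_params(dims))  # 10356
```

Under those assumptions the whole network has 10,356 weights and biases.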

Afterwards, I used the best (82.25% acc) LASER classifier (trained on the English training set) to predict labels for German books. Then, in the Multifit repo, I copied the test, training and unsupervised sets from the de-books folder into de-books-laser and replaced the ground-truth labels in the training set with the pseudolabels. Finally, I trained the Multifit classifier on those pseudolabels. My validation accuracy isn't great but is at least similar, yet my test set accuracy is as low as 70% (compared to 89.60% in the paper and here), as you can see in the attached logs.
Multifit CLS zero shot terrible results 15.04.2020.txt
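
The relabeling step can be sketched with Python's standard `csv` module. This is a hypothetical helper, not code from the Multifit repository: it assumes each row holds label, summary and text in that order with no header row, and that the LASER predictions come in the same order as the rows.

```python
"""Sketch: overwrite gold labels in a CLS-style CSV with LASER pseudolabels.

Assumed row layout (hypothetical): label, summary, text; no header row.
"""
import csv
import io


def replace_labels(csv_text: str, pseudolabels: list[str]) -> str:
    """Return the CSV with column 0 replaced by the predicted labels, in order."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) != len(pseudolabels):
        raise ValueError(f"{len(rows)} rows but {len(pseudolabels)} predictions")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row, label in zip(rows, pseudolabels):
        writer.writerow([label, *row[1:]])  # keep summary and text untouched
    return out.getvalue()


gold = '1,Great book,"Gripping, well written"\n0,Boring,Not recommended\n'
print(replace_labels(gold, ["1", "1"]))
```

Raising on a length mismatch guards against silently misaligning predictions and reviews, which would corrupt the pseudolabels without any visible error.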

I did expect some drop due to the issue explained in #63, but such a big difference shows that the unsupervised set size can't be the only factor hurting the results. Other possible reasons for the drop in performance that come to mind are:

  • I used different hyperparameters than you did for training LASER and predicting the pseudolabels?
  • I used a different train-dev split than you did for training LASER and predicting the pseudolabels?
  • Your script loaded the LASER model with the fastai library and trained the classifier with it instead of plain PyTorch?

My fork of Multifit is here; I'm using the ulmfit-original-scripts branch.

I would really appreciate a reply :)

blazejdolicki changed the title from "Implementation details about zero-shot learning" to "Problem with reproducing zero-shot learning results" Apr 16, 2020
blazejdolicki changed the title from "Problem with reproducing zero-shot learning results" to "Problems with reproducing zero-shot learning results" Apr 16, 2020
@eisenjulian (Contributor)

Hey Blazej, I updated the other issue with a solution. Can you let me know whether that fixed the issue or whether you still cannot reproduce the results?

@blazejdolicki (Author)

Thanks for your response. Using more data helped to some extent, but after some more digging I realized what the real issue was. The CLS dataset has three columns: label, summary and the actual review text. Initially, for zero-shot learning, I was discarding the summary column, thinking it was irrelevant. All that adding the summary does is increase the amount of data used for fine-tuning the LM. Yet after I included the summary, to my surprise, the classification test results jumped by ~15%! Without the summary column the LM had 60% validation accuracy in the first epoch (out of 20), while with it the accuracy is 37%. I'm not sure why including summaries, which are usually shorter than the main text, makes such a difference. The LM training time per epoch also changed from 18 seconds to 2 minutes 23 seconds.
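
A minimal sketch of the difference, assuming each review is a `(label, summary, text)` tuple: with the summary column included, every review contributes its summary as well as its body to the language-model fine-tuning corpus. The helper names and the newline separator are hypothetical; Multifit's own preprocessing may join the fields differently.

```python
"""Sketch: build LM fine-tuning text with or without the CLS summary field."""
from typing import Iterable

Review = tuple[str, str, str]  # (label, summary, text); assumed layout


def lm_documents(reviews: Iterable[Review], use_summary: bool) -> list[str]:
    """Return one LM training document per review; the label is never used."""
    docs = []
    for _label, summary, text in reviews:
        docs.append(f"{summary}\n{text}" if use_summary else text)
    return docs


def n_tokens(docs: list[str]) -> int:
    """Crude whitespace token count, only for comparing corpus sizes."""
    return sum(len(d.split()) for d in docs)


reviews = [
    ("1", "Great book", "Gripping from start to finish."),
    ("0", "Boring", "I gave up after fifty pages."),
]
print(n_tokens(lm_documents(reviews, use_summary=False)))  # 11
print(n_tokens(lm_documents(reviews, use_summary=True)))   # 14
```

The label column is ignored in both cases because LM fine-tuning is unsupervised; only the amount and kind of text changes.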

So currently my LASER results are still ~2% lower than those from the paper, and so are my zero-shot Multifit results. The remaining gap therefore comes down to differences between my implementation of CLS on LASER and yours. Do you have access to the script you used to train LASER on CLS? It would be great to compare hyperparameters and check whether they are responsible for this difference.
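
Once both scripts are available, a side-by-side diff of the two configurations makes the comparison concrete. The sketch below uses the values reported earlier in this thread; the `diff_configs` helper is hypothetical, and the "Multifit-like" run only changes the three settings mentioned in that experiment.

```python
"""Sketch: list the hyperparameters that differ between two LASER runs."""

# Best setup reported above (82.25% accuracy on German books).
best = {"n_epochs": 200, "lr": 0.001, "wd": 0.0, "nhid": "10 8",
        "drop": 0.2, "seed": 1, "bsize": 12}
# The run moved closer to Multifit's settings (around 60% accuracy).
multifit_like = {**best, "n_epochs": 8, "wd": 0.001, "bsize": 18}


def diff_configs(a: dict, b: dict) -> dict:
    """Map each key whose value differs (or is missing) to its (a, b) pair."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


for key, (mine, other) in diff_configs(best, multifit_like).items():
    print(f"{key}: {mine} -> {other}")
```

Running it lists `bsize`, `n_epochs` and `wd`, the three settings that were changed in the lower-accuracy run; the same helper would work on a config extracted from the original training script.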
