Code in text datasets? #10
I apologize for the missing code for the text datasets, which was implemented by a collaborator and has not been organized yet. In any case, implementation details such as the link to the text backbone, the data augmentation scheme, the optimizer, and the learning rate have been given in the paper. The loss function is exactly the same as the one used for images.
That's OK, I'm trying to reproduce the results on the text datasets. I noticed that there is a normalization layer, nn.BatchNorm1d, in the backbone for the image datasets. Should my backbone for the text datasets use the same layer, or some other normalization function such as nn.functional.normalize?
Yes, the projection heads should be the same.
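A minimal sketch of what such a shared projection head might look like, assuming a two-layer MLP head with nn.BatchNorm1d as in the image pipeline; the dimensions below are illustrative assumptions, not the repository's actual values:

```python
import torch.nn as nn

# Sketch of a projection head shared between image and text backbones,
# using nn.BatchNorm1d (a learnable, batch-statistics normalization layer)
# rather than the parameter-free nn.functional.normalize call.
# Dimensions are illustrative assumptions, not the repository's values.
class ProjectionHead(nn.Module):
    def __init__(self, in_dim=768, hidden_dim=768, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)
```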
Thanks for your reply! I have trained the first phase on CIFAR-10 and saved checkpoint-999.pth. When I started the boosting phase, the program ended automatically. Do you know what happened? The output was as follows:
$ OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=3 boost.py
/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated warnings.warn(
When I run boost.py with --resume './save/checkpoint-0.pth' it works; I don't have any idea why.
Could you check the values of args.start_epoch and args.epochs to make sure the program enters the training loop at line 270 in boost.py?
Thanks for your reply! To reproduce the results in the paper, I should load the 999th checkpoint and run boost.py, right? So I executed the following command and it works:
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=3 boost.py --resume ./save/checkpoint-999.pth --epoch 1200
Is there a problem with my understanding?
Yes, that is correct. I wonder if you modified the value of args.start_epoch via the misc.load_model function at line 59 in boost.py.
I didn't change the value of args.start_epoch; it was still 1000 when I loaded checkpoint-999.pth. And I changed the value of args.epochs to 1200, which trains for exactly 200 epochs. Is that right?
That is right. But notice that the misc.load_model function would modify args.start_epoch.
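A hedged sketch of why the boost script can exit right away, assuming misc.load_model behaves like a typical resume helper (the repository's actual code may differ): the checkpoint overwrites args.start_epoch, and the training loop only runs while start_epoch < epochs.

```python
import argparse
import torch

# Illustrative resume helper (not the repository's exact misc.load_model):
# the epoch stored in the checkpoint overwrites args.start_epoch.
def load_model(args, model, optimizer):
    checkpoint = torch.load(args.resume, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    args.start_epoch = checkpoint["epoch"] + 1  # 1000 after checkpoint-999.pth

args = argparse.Namespace(start_epoch=1000, epochs=1000)
# With epochs=1000 the range below is empty, so training ends immediately;
# passing --epoch 1200 gives range(1000, 1200), i.e. 200 boosting epochs.
for epoch in range(args.start_epoch, args.epochs):
    print(f"training epoch {epoch}")
```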
Yes, I see it, as follows. Thanks for your help! On the other hand, I want to confirm whether the weak augmentation strategy mentioned in your paper is the "contextual_augment" function from SCCL, as shown below. If so, did you apply the augmentation while loading the text data? SCCL seems to prepare all the augmented data before training, rather than augmenting the data while loading it.
Yes. All augmented data needs to be regenerated at each training epoch.
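A toy sketch of that structure, with a random word-drop standing in for SCCL's contextual_augment (which relies on a masked language model); the point is only that fresh augmented views are generated at the start of every epoch instead of being precomputed once:

```python
import random

# Toy stand-in for SCCL's contextual_augment; the structure is what matters:
# augmented views are regenerated at the start of every training epoch.
def weak_augment(text, drop_prob=0.1):
    words = text.split()
    kept = [w for w in words if random.random() > drop_prob]
    return " ".join(kept) if kept else text

raw_texts = ["how to parse json in python", "segfault when freeing a pointer"]

for epoch in range(2):  # stands in for range(start_epoch, epochs)
    augmented = [weak_augment(t) for t in raw_texts]  # fresh views each epoch
    for original, view in zip(raw_texts, augmented):
        pass  # a real loop would feed the (original, view) pairs to the model
```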
@Yunfan-Li Thanks for your help! I have a question about the boost phase now. When I finished the training phase, the test results were as follows: At the same time, I saved the model's checkpoint-999, which I used for the boost phase. But when I ran the boost phase with the following command:
$ OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=3 boost.py --resume ./save/checkpoint-999.pth --epoch 1200
the test results were as follows: It trains from epoch 1000 to epoch 1200, but why do the results go down? Is there something wrong with my setup?
It seems that only three clusters have pseudo-labels, so the model collapsed. You may need to check the process and quality of pseudo-label generation to see if it matches Fig. 4 in the manuscript.
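A hedged sanity check along those lines (not the repository's code): count how many clusters actually receive pseudo-labels and score those labels against the ground truth on the selected samples.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

# Sanity check on pseudo-label generation (illustrative, not the repo's code):
# if only a few clusters appear among the pseudo-labels, boosting can collapse.
def check_pseudo_labels(pseudo_labels, ground_truth, selected_mask):
    used = np.unique(pseudo_labels[selected_mask])
    print(f"{len(used)} clusters received pseudo-labels: {used}")
    nmi = normalized_mutual_info_score(ground_truth[selected_mask],
                                       pseudo_labels[selected_mask])
    print(f"NMI of pseudo-labels on the selected samples: {nmi:.4f}")
```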
I guess there is something wrong with my training phase. Would it be convenient for you to provide your checkpoint-999.pth on the CIFAR-10 dataset? If that's all right, my email is [email protected]!
Please refer to #4
@Yunfan-Li Thank you for your reply! Please forgive me for being new to PyTorch; I got the following error when loading "checkpoint_cifar.tar". On the other hand, I observe from the logs of the boost phase that the number of clusters begins to decrease from epoch 1040. Finally, I want to confirm how my test results during the training stage compare with yours. Are they normal? If not, I might have to run the training stage again.
About model loading, sorry for the inconvenience; it may be because the model was trained with an earlier version of the code. You could print the keys of the checkpoint to see where the parameters are saved. To check the quality of pseudo-label generation, you could evaluate the clustering metrics on those pseudo-labels. Yes, you can lower alpha or raise gamma to get more precise pseudo-labels. The performance of the training and boosting phases is provided in the original paper.
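For the key inspection suggested above, something like the following works; the "model" key is a guess, so print the top-level keys to see the actual layout of checkpoint_cifar.tar:

```python
import torch

# Inspect where the parameters live in the checkpoint; the 'model' key is a
# guess -- print the top-level keys first to see the actual layout.
checkpoint = torch.load("checkpoint_cifar.tar", map_location="cpu")
print(checkpoint.keys())
state_dict = checkpoint.get("model", checkpoint)  # fall back to a raw state_dict
print(list(state_dict.keys())[:10])               # first few parameter names
```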
@Yunfan-Li Hi, I am reproducing the results on the text dataset. However, I found that each epoch performs data augmentation on the whole dataset once, which takes a long time, about 100 hours. Did you run into this problem when performing data augmentation on the text dataset?
Hi, I'm very interested in your work, but I couldn't find the code for text datasets such as 'stackoverflow'. I want to reproduce the StackOverflow results in your paper. Would it be convenient for you to share the code and parameters for the text datasets? Your reply would help me very much!