Code in text datasets? #10

Open
mrFocusXin opened this issue Mar 1, 2023 · 18 comments

Comments

@mrFocusXin

Hi, I'm very interested in your work, but I couldn't find the code for the text datasets, such as 'stackoverflow'. I want to reproduce the StackOverflow results from your paper. Would it be convenient for you to share the text dataset code and parameters? Your reply would help me very much!

@Yunfan-Li
Owner

I apologize for the missing code for the text datasets; it was implemented by a collaborator and has not been organized yet. Anyway, implementation details such as the link to the text backbone, the data augmentation scheme, the optimizer, and the learning rate have been introduced in the paper. The loss function is exactly the same as the one used for images.

@mrFocusXin
Author

That's OK, I'm trying to reproduce the results on the text datasets. I find that there is a normalization layer "nn.BatchNorm1d" in the backbone used for the image datasets; should my backbone for the text datasets use the same layer, or some other normalization function such as "nn.functional.normalize"?

[screenshot]

@Yunfan-Li
Owner

Yes, the projection heads should be the same.
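
For reference, a minimal sketch of the two projection heads, mirroring the structure printed by the image model later in this thread; the hidden size of 768 is an assumption for a BERT-style text backbone, so adjust it to whatever encoder you use. Note that "nn.functional.normalize" is typically applied to the instance features when computing similarities, which is separate from the BatchNorm1d layers inside the heads.

```python
import torch.nn as nn

# Sketch of the projection heads, following the structure printed by the image
# model below (BatchNorm1d -> ReLU -> Linear -> BatchNorm1d -> ReLU -> Linear).
# hidden_dim=768 is an assumed size for a BERT-style text backbone.
def build_projectors(hidden_dim=768, feat_dim=128, nb_cluster=10):
    instance_projector = nn.Sequential(
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, feat_dim),
    )
    cluster_projector = nn.Sequential(
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, nb_cluster),
    )
    return instance_projector, cluster_projector
```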

@mrFocusXin
Author

mrFocusXin commented Mar 2, 2023

Thanks for your reply! I have trained the first phase on CIFAR-10 and saved checkpoint-999.pth. When I started boosting, the program ended immediately. Do you know what happened? The output is as follows:

$ OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=3 boost.py

/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
| distributed init (rank 0): env://, gpu 0
| distributed init (rank 1): env://, gpu 1
| distributed init (rank 2): env://, gpu 2
[08:15:47.268017] job dir: /home/wangxin/clustering/TCL/Twin-Contrastive-Learning
[08:15:47.268111] Namespace(batch_size=256,
epochs=200,
model='resnet34',
feat_dim=128,
ins_temp=0.5,
clu_temp=1.0,
weight_decay=0.0001,
lr=0.0001,
data_path='./datasets/',
dataset='CIFAR-10',
nb_cluster=10,
output_dir='./save/',
device='cuda',
seed=0,
resume='./save/checkpoint-999.pth',
start_epoch=0,
save_freq=20,
eval_freq=10,
num_workers=10,
pin_mem=False,
dist_eval=False,
world_size=3,
local_rank=0,
dist_on_itp=False,
dist_url='env://',
rank=0,
gpu=0,
distributed=True,
dist_backend='nccl')
/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torchvision/transforms/transforms.py:890: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torchvision/transforms/transforms.py:890: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torchvision/transforms/transforms.py:890: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
/home/wangxin/anaconda3/envs/tcl/lib/python3.9/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
[08:15:48.224142] <torch.utils.data.dataset.ConcatDataset object at 0x7ff6b8501e50>
[08:15:49.080911] <torch.utils.data.dataset.ConcatDataset object at 0x7ff6b8501550>
[08:15:49.876517] <torch.utils.data.dataset.ConcatDataset object at 0x7ff6b85010d0>
[08:15:49.876588] Sampler_train = <torch.utils.data.distributed.DistributedSampler object at 0x7ff6b8501400>
[08:15:50.214733] Load pre-trained checkpoint from: ./save/checkpoint-999.pth
[08:15:50.236392]
[08:15:54.529156] Test: [ 0/235] eta: 0:16:43 time: 4.2706 data: 1.9340 max mem: 7711
[08:15:58.198646] Test: [ 20/235] eta: 0:01:21 time: 0.1834 data: 0.0357 max mem: 11216
[08:16:02.675670] Test: [ 40/235] eta: 0:00:59 time: 0.2237 data: 0.0543 max mem: 11217
[08:16:06.752821] Test: [ 60/235] eta: 0:00:47 time: 0.2038 data: 0.0527 max mem: 11217
[08:16:10.889287] Test: [ 80/235] eta: 0:00:39 time: 0.2067 data: 0.0535 max mem: 11217
[08:16:14.946861] Test: [100/235] eta: 0:00:32 time: 0.2028 data: 0.0538 max mem: 11217
[08:16:19.035355] Test: [120/235] eta: 0:00:27 time: 0.2043 data: 0.0547 max mem: 11217
[08:16:23.133606] Test: [140/235] eta: 0:00:22 time: 0.2048 data: 0.0571 max mem: 11217
[08:16:27.210013] Test: [160/235] eta: 0:00:17 time: 0.2037 data: 0.0596 max mem: 11217
[08:16:31.258333] Test: [180/235] eta: 0:00:12 time: 0.2023 data: 0.0573 max mem: 11217
[08:16:35.236336] Test: [200/235] eta: 0:00:07 time: 0.1988 data: 0.0561 max mem: 11217
[08:16:39.069576] Test: [220/235] eta: 0:00:03 time: 0.1916 data: 0.0548 max mem: 11217
[08:16:41.070576] Test: [234/235] eta: 0:00:00 time: 0.1495 data: 0.0375 max mem: 11217
[08:16:41.232386] Test: Total time: 0:00:50 (0.2169 s / it)
[08:16:41.333241] Feat shape (60000, 128), Label shape (60000,)
[08:16:41.334737] Model = Network(
(resnet): ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=512, out_features=512, bias=True)
)
(instance_projector): Sequential(
(0): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(4): ReLU()
(5): Linear(in_features=512, out_features=128, bias=True)
)
(cluster_projector): Sequential(
(0): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(4): ReLU()
(5): Linear(in_features=512, out_features=10, bias=True)
)
)
[08:16:41.334849] number of params (M): 22.15
[08:16:41.334867] base lr: 1.000e-04
[08:16:41.334879] effective batch size: 768
[08:16:42.777963] Resume checkpoint ./save/checkpoint-999.pth
[08:16:42.915007] With optim!
[08:16:42.926591] Start training for 200 epochs
[08:16:42.945285] Training time 0:00:00

When I run boost.py with --resume './save/checkpoint-0.pth' it works, so I don't have any idea what is wrong.

@Yunfan-Li
Owner

Could you check the values of args.start_epoch and args.epochs to make sure the program actually enters the training loop at line 270 of boost.py?
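
As a sketch of the usual resume pattern (not the exact boost.py code): if load_model restores start_epoch to 1000 from checkpoint-999.pth while args.epochs keeps its default of 200 (as in the Namespace printed above), the epoch range is empty, which would explain the immediate "Training time 0:00:00".

```python
def enters_training_loop(start_epoch: int, epochs: int) -> bool:
    # The training loop is usually of the form
    #   for epoch in range(args.start_epoch, args.epochs):
    # so an empty range means the script finishes without training.
    return start_epoch < epochs

print(enters_training_loop(1000, 200))   # False -> no training, "Training time 0:00:00"
print(enters_training_loop(1000, 1200))  # True  -> 200 boosting epochs
```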

@mrFocusXin
Author

mrFocusXin commented Mar 2, 2023

Thanks for your reply! To reproduce the results in the paper, I should run boost.py with the 999th checkpoint loaded, right? So I executed the following command and it works.

OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=3 boost.py --resume ./save/checkpoint-999.pth --epoch 1200

Is there a problem with my understanding?

@Yunfan-Li
Owner

Yes, that is correct. I wonder whether the value of args.start_epoch was modified by the misc.load_model function at line 59 of boost.py.

@mrFocusXin
Author

mrFocusXin commented Mar 2, 2023

I didn't change the value of "args.start_epoch"; it was still 1000 when I loaded checkpoint-999.pth. And I changed the value of "args.epochs" to 1200, which trains for exactly 200 more epochs. Is that right?

@Yunfan-Li
Owner

That is right. But notice that the misc.load_model function would modify args.start_epoch.
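
For context, a rough sketch of how such a resume helper typically restores the epoch counter (the real misc.load_model may differ in its details), which is why resuming from checkpoint-999.pth starts at epoch 1000:

```python
import torch

def load_model_sketch(args, model, optimizer):
    # Rough sketch of typical resume logic; the real misc.load_model may differ.
    # start_epoch is derived from the checkpoint, so checkpoint-999.pth -> 1000.
    if args.resume:
        checkpoint = torch.load(args.resume, map_location="cpu")
        model.load_state_dict(checkpoint["model"])
        if "optimizer" in checkpoint and "epoch" in checkpoint:
            optimizer.load_state_dict(checkpoint["optimizer"])
            args.start_epoch = checkpoint["epoch"] + 1  # 999 + 1 = 1000
            print("With optim!")
```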

@mrFocusXin
Author

mrFocusXin commented Mar 2, 2023

Yes, I see it as follows. Thanks for your help!

[screenshot]

On the other hand, I want to confirm whether the weak augmentation strategy mentioned in your paper is the "contextual_augment" function from SCCL, as shown below. If so, did you augment the text while loading the data? SCCL seems to prepare all the augmented data before training, rather than loading and augmenting the data at the same time.

[screenshot]

@Yunfan-Li
Owner

Yes. All augmented data needs to be regenerated at each training epoch.
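
In other words, something along these lines (an illustrative sketch only; the augmenter and dataset names are placeholders, with contextual_augment standing in for the SCCL-style weak augmentation):

```python
import torch

def train_with_epochwise_augmentation(model, optimizer, raw_texts, augment_fn,
                                      make_dataset, train_one_epoch, args):
    # Re-augment the whole corpus at the start of every epoch, instead of
    # preparing all augmented data once before training as SCCL does.
    # `augment_fn` stands in for the SCCL-style contextual_augment; the other
    # callables are placeholders for the project's own dataset/training code.
    for epoch in range(args.start_epoch, args.epochs):
        augmented = [augment_fn(t) for t in raw_texts]
        loader = torch.utils.data.DataLoader(
            make_dataset(raw_texts, augmented),
            batch_size=args.batch_size, shuffle=True, num_workers=args.num_workers,
        )
        train_one_epoch(model, loader, optimizer, epoch)
```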

@mrFocusXin
Author

mrFocusXin commented Mar 3, 2023

@Yunfan-Li Thanks for your help! I have a question about the boost phase now. When I finished the training phase, the test results were as follows:

At the same time, I saved the model's checkpoint-999, which I then provided to the boost phase.

[screenshot]

But when I run the boost phase with the following command

$ OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=3 boost.py --resume ./save/checkpoint-999.pth --epoch 1200

The test results were as follows:

[screenshot]

It trains from epoch 1000 to epoch 1200, but why does the performance go down? Is there something wrong on my side?

@Yunfan-Li
Owner

It seems that only three clusters have pseudo-labels, so the model collapsed. You may need to check the process and quality of pseudo-label generation to see whether it matches Fig. 4 in the manuscript.
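
For example, a quick sanity check on the generated pseudo-labels could look like this (the variable names are placeholders for whatever the boost phase produces; ground-truth labels are used for evaluation only):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def check_pseudo_labels(pseudo_labels, selected_idx, true_labels):
    # Placeholders: pseudo_labels = predicted cluster ids for all samples,
    # selected_idx = indices of confident samples chosen for boosting,
    # true_labels = ground truth (for evaluation only).
    selected = np.asarray(pseudo_labels)[selected_idx]
    print("clusters covered:", np.unique(selected))   # should be all 10 on CIFAR-10
    print("coverage:", len(selected_idx) / len(pseudo_labels))
    print("NMI on confident subset:",
          normalized_mutual_info_score(np.asarray(true_labels)[selected_idx], selected))
```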

@mrFocusXin
Author

I guess there is something wrong with my training phase. Would it be convenient for you to provide your checkpoint-999.pth for the CIFAR-10 dataset? If that's all right, my email is [email protected]!

@Yunfan-Li
Owner

Please refer to #4

@mrFocusXin
Author

mrFocusXin commented Mar 3, 2023

@Yunfan-Li Thank you for your reply! Please forgive me for being new to PyTorch; I got the following error when loading "checkpoint_cifar.tar".

[screenshot]

On the other hand, by checking the logs of the boost phase I observe that the number of clusters begins to decrease from epoch 1040.
My debugging suggests there is no obvious problem with pseudo-label generation, but how should I check the quality of the generated pseudo-labels?
It seems that some clusters get no confident pseudo-labels, which causes those clusters to disappear. Should I lower alpha or raise gamma to get more confident pseudo-labels? I know this may not make sense, but I don't have any other ideas.

Finally, I want to confirm how my test results during the training stage compare with yours. Is this a normal result? If not, I might have to run the training stage again.

[screenshot]

@Yunfan-Li
Owner

About model loading, sorry for the inconvenience, it may be due to the model being trained with an earlier version of the code. You could print the keys of the checkpoint to see where the parameters are saved.
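
For example, something like the following (the key names in the comment are only guesses):

```python
import torch

# Inspect the checkpoint structure before loading; an older checkpoint may store
# the weights under a different key (e.g. "model", "state_dict", or "net" --
# these names are guesses, so print the keys to see what is actually there).
ckpt = torch.load("checkpoint_cifar.tar", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
    # Then load the matching sub-dict, e.g.:
    # model.load_state_dict(ckpt["model"], strict=False)
```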

To check the quality of pseudo-label generation, you could evaluate the clustering metrics on those pseudo-labels. Yes, you can lower alpha or raise gamma to get more precise pseudo-labels.

The performance of the training and boosting phases is provided in the original paper.

@mrFocusXin
Author

mrFocusXin commented Mar 6, 2023

@Yunfan-Li Hi, I am reproducing the results on the text dataset. However, I found that each epoch performs data augmentation on the whole dataset once, which takes a very long time, about 100 hours in total. Did you encounter this problem when performing data augmentation on the text dataset?
