90% sparsity QQP result #1 (Open)
kongds opened this issue Jul 12, 2022 · 5 comments

kongds commented Jul 12, 2022

Hello,
I see that the reported result of CAP-m at 90% sparsity on QQP is "91.6/87.7", while CAP-soft is "90.7/87.4" (shown in bold).
Is the CAP-m result correct?
[Screenshot of the reported 90% sparsity QQP results table]

RunxinXu (Owner) commented

Thanks for your interest in our work! The results are correct.
I suppose this is because CAP yields a larger improvement with movement pruning than with soft movement pruning at 90% sparsity.

kongds commented Jul 13, 2022

Thanks for your answer.
Another concern is that the F1 (87.7) does not seem to match the accuracy (91.6) for CAP-m, which would mean the false negatives (FN) and true negatives (TN) are heavily imbalanced compared with the results in the other settings.
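For reference, this can be checked directly from the confusion matrix. A minimal sketch, assuming scikit-learn is installed and that the QQP dev labels and the pruned model's predicted labels have been saved as NumPy arrays (the file names below are placeholders):

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Placeholder file names: dev-set gold labels and the pruned model's
# predicted labels (1 = duplicate, 0 = not duplicate), saved as .npy arrays.
y_true = np.load("qqp_dev_labels.npy")
y_pred = np.load("qqp_dev_preds.npy")

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"acc = {accuracy_score(y_true, y_pred):.4f}")
print(f"f1  = {f1_score(y_true, y_pred):.4f}")
# A disproportionately large FN count relative to FP would indicate the
# imbalance described above.
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")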

kongds commented Jul 14, 2022

Hello, I ran CAP-m with a final threshold of 0.10 (90% sparsity) on QQP based on run_glue_topk_kd.sh, but got the following results (90.5/87.2).

07/14/2022 23:41:19 - INFO - __main__ -   ***** Eval results  *****
07/14/2022 23:41:19 - INFO - __main__ -     acc = 0.904699480583725
07/14/2022 23:41:19 - INFO - __main__ -     acc_and_f1 = 0.888130932998286
07/14/2022 23:41:19 - INFO - __main__ -     eval_avg_entropy = 1.0659542
07/14/2022 23:41:19 - INFO - __main__ -     f1 = 0.871562385412847
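(Note that acc_and_f1 here is simply the mean of acc and f1: (0.9047 + 0.8716) / 2 ≈ 0.8881.)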

The command is:

OUTPUT=cap
TASK=qqp
DATA_DIR=../data/glue_data/QQP
MODEL=bert-base-uncased
BATCH=32
EPOCH=10
LR=3e-5

# pruning
METHOD=topK
MASK_LR=1e-2
WARMUP=11000
INITIAL_TH=1
FINAL_TH=0.10 # 50% -> 0.5 90% -> 0.1 97% -> 0.03

# contrastive
CONTRASTIVE_TEMPERATURE=0.1
EXTRA_EXAMPLES=4096
ALIGNREP=cls
CL_UNSUPERVISED_LOSS_WEIGHT=0.1
CL_SUPERVISED_LOSS_WEIGHT=10

# distill
TEACHER_TYPE=bert
TEACHER_PATH=../teacher/qqp
CE_LOSS_WEIGHT=0.1
DISTILL_LOSS_WEIGHT=0.9
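
# Note: ${GPU} is referenced in the command below but is not set anywhere in
# this snippet; assuming a single-GPU run here (placeholder value).
GPU=0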


CUDA_VISIBLE_DEVICES=${GPU} python masked_run_glue.py \
    --output_dir ${OUTPUT}/${FINAL_TH}/${TASK} \
    --data_dir ${DATA_DIR} \
    --do_train --do_eval --do_lower_case \
    --model_type masked_bert \
    --model_name_or_path ${MODEL} \
    --per_gpu_train_batch_size ${BATCH} \
    --warmup_steps ${WARMUP} \
    --num_train_epochs ${EPOCH} \
    --learning_rate ${LR} --mask_scores_learning_rate ${MASK_LR} \
    --initial_threshold ${INITIAL_TH} --final_threshold ${FINAL_TH} \
    --initial_warmup 2 --final_warmup 3 \
    --pruning_method ${METHOD} --mask_init constant --mask_scale 0.0 \
    --task_name ${TASK} \
    --save_steps 30000 \
    --use_contrastive_loss \
    --contrastive_temperature ${CONTRASTIVE_TEMPERATURE} \
    --cl_unsupervised_loss_weight ${CL_UNSUPERVISED_LOSS_WEIGHT} \
    --cl_supervised_loss_weight ${CL_SUPERVISED_LOSS_WEIGHT} \
    --extra_examples ${EXTRA_EXAMPLES} \
    --alignrep ${ALIGNREP} \
    --use_distill \
    --teacher_name_or_path ${TEACHER_PATH} \
    --teacher_type ${TEACHER_TYPE} \
    --ce_loss_weight ${CE_LOSS_WEIGHT} \
    --distill_loss_weight ${DISTILL_LOSS_WEIGHT}

RunxinXu (Owner) commented

Hi, the performance can also be affected by the teacher model. How well does your teacher model perform?

kongds commented Jul 22, 2022

I use the checkpoint provided by DynaBERT.
Its performance is 90.9 accuracy on QQP.
