As far as I can remember I only changed the following in search_glue_no_trainer.py line 544:
------------------------------------------------
data_collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=(8 if accelerator.use_fp16 else None))
------------------------------------------------
That line raised an error (`AttributeError: 'Accelerator' object has no attribute 'use_fp16'`), so I changed it to:
------------------------------------------------
try:
    pad_to_multiple_of = 8 if accelerator.use_fp16 else None
except AttributeError:
    pad_to_multiple_of = None
data_collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=pad_to_multiple_of)
------------------------------------------------
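As a side note, newer versions of `accelerate` removed `use_fp16` in favor of a `mixed_precision` attribute (`"no"`, `"fp16"`, or `"bf16"`). A sketch of a helper that handles both APIs without a bare try/except (the attribute names on the old and new `Accelerator` are the only assumptions here):

```python
def resolve_pad_multiple(accelerator):
    """Return 8 when fp16 is active (tensor-core friendly padding), else None."""
    if getattr(accelerator, "use_fp16", False):  # old accelerate API
        return 8
    if getattr(accelerator, "mixed_precision", "no") == "fp16":  # newer API
        return 8
    return None
```

Then `DataCollatorWithPadding(tokenizer, pad_to_multiple_of=resolve_pad_multiple(accelerator))` should work across both versions.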
During training, EMoE runs a grid search over seeds (0, 1, 2) and learning rates (2e-5, 3e-5, 5e-5); each combination produces a result. When the grid search finishes, a txt file is saved in the output dir. You may see something like this:
The filename of this txt file contains the best learning rate found during training. In test_glue_no_trainer.py you should see the following lines of code, which extract the best lr from the filename of the txt file. So to find the bug, I think you need to check whether the txt file is saved successfully during training.
Hello, during EMoE's Language training and testing process, when I run the test after training, the following is displayed:
['cola']
Namespace(adaptive_experts=False, add_expert_size=0, aux_loss_weight=0.01, cache_dir='./.cache', capacity_factor=1.5, checkpointing_steps=None, disable_peft=False, expert_repeat=1, gate_noise=1.0, gate_type='top', gradient_accumulation_steps=1, hub_model_id=None, hub_token=None, ignore_mismatched_sizes=False, include_training=False, is_gshard_loss=False, key_gate=False, learning_rates=[2e-05, 3e-05, 5e-05], load_model=None, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, max_expert_num=8, max_length=128, max_train_steps=None, model_name_or_path='/MyData/bert-large-cased', moe_drop=0.1, moe_layers=[10, 11], normalize_one_score_gate=False, num_experts=16, num_train_epochs=10, num_warmup_steps=0, one_score=False, one_score_gate_update_momentum=0.0, output_dir='test', pad_to_max_length=False, per_device_eval_batch_size=32, per_device_train_batch_size=64, push_to_hub=False, random_cluster=False, random_init_gate=False, report_to='tensorboard', resume_from_checkpoint=None, save_model=False, seeds=[0, 1, 2], source_dir='/MyData/bert-large-cased_save/cola', task_name='cola', to_MoE=False, top_k=4, train_file=None, use_fp16=True, use_slow_tokenizer=False, validation_file=None, weight_decay=0.0, with_tracking=True) learn_gate_random_False_repeat16
test
No best results found
What is the problem?