Added P-Tuning method #3488

Merged: ericharper merged 31 commits into main from feature_ptune on Jan 27, 2022

Conversation

yidong72 (Collaborator)

Added the P-Tuning method for adapting large Megatron GPT models to downstream NLP tasks.
The PTune_Sentiment_Analysis.ipynb tutorial notebook shows how to use P-Tuning for financial sentiment analysis; it achieves 92% accuracy with a 344M-parameter GPT model.
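
For readers new to the method, below is a minimal conceptual sketch of the P-Tuning prompt encoder (a BiLSTM + MLP over trainable virtual-token embeddings, per Liu et al., 2021) in plain PyTorch. The class and parameter names are illustrative assumptions, not NeMo's actual implementation.

```python
import torch
import torch.nn as nn


class PromptEncoder(nn.Module):
    """Illustrative P-Tuning prompt encoder; the GPT weights stay frozen
    and only this module is trained. Not NeMo's actual implementation."""

    def __init__(self, num_virtual_tokens: int, hidden_size: int):
        super().__init__()
        # Trainable embeddings for the virtual prompt tokens.
        self.embedding = nn.Embedding(num_virtual_tokens, hidden_size)
        # P-Tuning reparameterizes the prompt with a BiLSTM + MLP head
        # (hidden_size must be even for the bidirectional split below).
        self.lstm = nn.LSTM(
            hidden_size, hidden_size // 2, bidirectional=True, batch_first=True
        )
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
        )
        self.register_buffer("token_ids", torch.arange(num_virtual_tokens))

    def forward(self, batch_size: int) -> torch.Tensor:
        # Returns (batch, num_virtual_tokens, hidden_size) continuous prompts,
        # which are inserted into the frozen GPT's input embedding sequence.
        prompts = self.embedding(self.token_ids)
        prompts = prompts.unsqueeze(0).repeat(batch_size, 1, 1)
        out, _ = self.lstm(prompts)
        return self.mlp(out)
```

Because only the encoder's parameters receive gradients, the downstream task trains orders of magnitude fewer weights than full fine-tuning of the 344M-parameter model.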

lgtm-com bot commented Jan 21, 2022

This pull request introduces 2 alerts when merging 0f8444f into 0bae758 - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method
  • 1 for Redundant assignment
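
For context, this alert fires when a superclass assigns an instance attribute whose name collides with a method defined in a subclass. A minimal repro (illustrative, not the actual code from this PR):

```python
class Base:
    def __init__(self):
        # The superclass stores an instance attribute named `forward` ...
        self.forward = None


class Child(Base):
    # ... which shadows this method: instance attributes take precedence
    # over class attributes (methods) during lookup.
    def forward(self, x):
        return x


Child().forward(1)  # TypeError: 'NoneType' object is not callable
```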

okuchaiev (Member)

/blossom-ci

lgtm-com bot commented Jan 21, 2022

This pull request introduces 2 alerts when merging 0552a40 into 0bae758 - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method
  • 1 for Redundant assignment

lgtm-com bot commented Jan 24, 2022

This pull request introduces 1 alert when merging b1005fe into 7c97e33 - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

lgtm-com bot commented Jan 25, 2022

This pull request introduces 1 alert when merging 593125e into 7c97e33 - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

lgtm-com bot commented Jan 26, 2022

This pull request introduces 1 alert when merging d4e2cdd into 3146fca - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

ericharper (Collaborator) previously approved these changes Jan 26, 2022

LGTM. Thanks!

ericharper (Collaborator)

Please add a CI test as well.

yidong72 (Collaborator, Author)

> Please add a CI test as well.

I added a CI test for the whole p-tuning workflow.

lgtm-com bot commented Jan 26, 2022

This pull request introduces 1 alert when merging b3db907 into 360fa7c - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

lgtm-com bot commented Jan 26, 2022

This pull request introduces 1 alert when merging df55257 into 9dc612e - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

lgtm-com bot commented Jan 26, 2022

This pull request introduces 1 alert when merging f70542c into 9dc612e - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

@yidong72 yidong72 requested a review from titu1994 January 26, 2022 20:50
@@ -471,6 +473,9 @@ def setup_optimization(self, optim_config: Optional[Union[DictConfig, Dict]] = None
             optim_config['sched']['t_num_workers'] = self._trainer.num_processes * self._trainer.num_nodes
         elif self._trainer.accelerator == "ddp":
             optim_config['sched']['t_num_workers'] = self._trainer.num_gpus * self._trainer.num_nodes
+        elif isinstance(self._trainer.accelerator.training_type_plugin, NLPDDPPlugin):
+            app = AppState()
+            optim_config['sched']['t_num_workers'] = app.data_parallel_size
yidong72 (Collaborator, Author)

t_num_workers should be the data-parallel size when model-parallel workers are present; otherwise, the max_steps calculation for the optimizer scheduler will be off.
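
To make the failure mode concrete, here is a hypothetical sketch (illustrative names, not NeMo's actual code) of how the scheduler's step budget depends on t_num_workers. With tensor or pipeline model parallelism, only world_size // (tensor_parallel_size * pipeline_parallel_size) ranks consume distinct batches, so using the raw GPU count inflates the effective global batch and shrinks max_steps:

```python
def compute_max_steps(num_samples, micro_batch, grad_accum, t_num_workers, epochs):
    # The effective global batch grows with the number of data-parallel workers.
    global_batch = micro_batch * grad_accum * t_num_workers
    return (num_samples // global_batch) * epochs


# 8 GPUs with tensor_parallel_size=4 -> data_parallel_size = 8 // 4 = 2.
print(compute_max_steps(10000, 4, 1, 2, 3))  # 3750 steps (correct)
print(compute_max_steps(10000, 4, 1, 8, 3))  # 936 steps: scheduler decays too fast
```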

Collaborator

This is fine from my side.

lgtm-com bot commented Jan 26, 2022

This pull request introduces 1 alert when merging 2856e84 into 9dc612e - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

ericharper (Collaborator) previously approved these changes Jan 26, 2022

LGTM. Thanks!

lgtm-com bot commented Jan 26, 2022

This pull request introduces 1 alert when merging 105f6db into 6b51350 - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

okuchaiev (Member)

/blossom-ci

lgtm-com bot commented Jan 27, 2022

This pull request introduces 1 alert when merging 11cb31b into d8354a2 - view on LGTM.com

new alerts:

  • 1 for Superclass attribute shadows subclass method

@ericharper ericharper merged commit cdb409b into main Jan 27, 2022
@ericharper ericharper deleted the feature_ptune branch January 27, 2022 18:47
nithinraok pushed a commit that referenced this pull request Feb 2, 2022
* init checking of p-tune method

Signed-off-by: Yi Dong <[email protected]>

* training is working

Signed-off-by: Yi Dong <[email protected]>

* refactor to separate prediction and loss computation

Signed-off-by: Yi Dong <[email protected]>

* updated the notebook

Signed-off-by: Yi Dong <[email protected]>

* match the original hyperparameters

Signed-off-by: Yi Dong <[email protected]>

* fixed the loss bug

Signed-off-by: Yi Dong <[email protected]>

* better scheduler

Signed-off-by: Yi Dong <[email protected]>

* notebook runs

Signed-off-by: Yi Dong <[email protected]>

* added neural types

Signed-off-by: Yi Dong <[email protected]>

* updated the doc

Signed-off-by: Yi Dong <[email protected]>

* fixed the notebook

Signed-off-by: Yi Dong <[email protected]>

* updated expected result

Signed-off-by: Yi Dong <[email protected]>

* added accuracy

Signed-off-by: Yi Dong <[email protected]>

* style fix

Signed-off-by: Yi Dong <[email protected]>

* fix reassignment

Signed-off-by: Yi Dong <[email protected]>

* log accuracy

Signed-off-by: Yi Dong <[email protected]>

* load the best checkpoint

Signed-off-by: Yi Dong <[email protected]>

* address PR comments

Signed-off-by: Yi Dong <[email protected]>

* added ci test

Signed-off-by: Yi Dong <[email protected]>

* fixed max_step calculation error due to wrong number of workers

Signed-off-by: Yi Dong <[email protected]>

* add import guard for nlp plugin

Signed-off-by: Yi Dong <[email protected]>

* fixed the metric report issue when using tensor parallel

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
fayejf pushed a commit that referenced this pull request Mar 2, 2022 (same commit message as above).