
Angular loss1.0 #1101

Merged: 14 commits merged into main from angularLoss1.0 on Sep 4, 2020
Conversation

@nithinraok (Collaborator) commented Sep 1, 2020

  • Added angular loss with cosine angle for 1.0
  • Fixed a multi-GPU metric issue by reusing the classification top-k accuracy metric
  • Added support for embedding extraction for speaker diarization

@nithinraok nithinraok requested a review from blisc September 1, 2020 19:49
nemo/utils/exp_manager.py (outdated, resolved)
@blisc (Collaborator) left a comment


Mostly LGTM

nemo/collections/asr/losses/angularloss.py (outdated, resolved)
@blisc blisc requested review from fayejf and titu1994 September 2, 2020 18:46
@lgtm-com (bot) commented Sep 2, 2020

This pull request fixes 2 alerts when merging c6529f6 into 2ab5b64 - view on LGTM.com

fixed alerts:

  • 2 for Unused import

@titu1994 (Collaborator) left a comment


Some minor changes pertinent to the model itself, and some major concerns regarding the logging callbacks.

examples/speaker_recognition/spkr_get_emb.py (resolved)
slice_length = self.featurizer.sample_rate * self.time_length
_, audio_lengths, _, tokens_lengths = zip(*batch)
slice_length = min(slice_length, max(audio_lengths))
shift = 1 * 16000
Reviewer (Collaborator):

Hardcoded sample_rate? Replace with featurizer.sample_rate

@nithinraok (Author):

Thanks, I missed it. Done.
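
For reference, the agreed change in sketch form (context abbreviated, not necessarily the exact merged line):

shift = 1 * self.featurizer.sample_rate  # one second of samples at the configured rate, not a hardcoded 16000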

"""
return {"loss": NeuralType(elements_type=LossType())}

def __init__(self, s=20.0, m=1.35):
Reviewer (Collaborator):

No option to override epsilon for other tasks? Add a default eps=1e-7.

Also, don't use one-character names for variables, and add a docstring to this class.

@nithinraok (Author), Sep 3, 2020:

eps is not a tunable parameter; it's there to avoid division by zero. Yes, I'll add a docstring. s and m are very well-known shorthand in the angular loss literature for scale and margin. If it is compulsory, I will look into it.

        super().__init__()

        self.eps = 1e-7
        self.s = s
Reviewer (Collaborator):

Again, don't save variables with one-character names. If it's from a paper, add a reference section and explain what the variable is supposed to do (better yet, just use a descriptive name).
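
For reference, a minimal sketch of an additive angular margin softmax loss along the lines discussed here, using the descriptive names (scale, margin) the reviewer asks for; the exact formulation merged in the PR may differ:

import torch
import torch.nn as nn

class AngularSoftmaxLoss(nn.Module):
    """Additive angular margin softmax; scale ("s") and margin ("m") follow the literature."""

    def __init__(self, scale=20.0, margin=1.35):
        super().__init__()
        self.eps = 1e-7  # keeps acos away from its undefined endpoints +/-1
        self.scale = scale
        self.margin = margin

    def forward(self, logits, labels):
        # logits are cosine similarities between normalized embeddings and weights
        cos_target = logits[torch.arange(logits.size(0)), labels]
        cos_target = torch.clamp(cos_target, -1.0 + self.eps, 1.0 - self.eps)
        numerator = self.scale * torch.cos(torch.acos(cos_target) + self.margin)
        # the denominator sums over the non-target classes only
        mask = torch.ones_like(logits, dtype=torch.bool)
        mask[torch.arange(logits.size(0)), labels] = False
        excl = logits[mask].view(logits.size(0), -1)
        denominator = torch.exp(numerator) + torch.sum(torch.exp(self.scale * excl), dim=1)
        return -torch.mean(numerator - torch.log(denominator))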

self.loss = CELoss()
if 'angular' in cfg.decoder.params and cfg.decoder.params['angular']:
    logging.info("Training with Angular Softmax Loss")
    s = cfg.loss.s
Reviewer (Collaborator):

The config needs descriptive names, not one-character variable names.
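
A hypothetical rename along the lines the reviewer asks for; the exact keys in the merged config may differ:

if 'angular' in cfg.decoder.params and cfg.decoder.params['angular']:
    logging.info("Training with Angular Softmax Loss")
    scale = cfg.loss.scale    # was: cfg.loss.s
    margin = cfg.loss.margin  # was: cfg.loss.m
    self.loss = AngularSoftmaxLoss(scale=scale, margin=margin)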

self,
feat_in,
num_classes,
emb_sizes=[1024, 1024],
Reviewer (Collaborator):

Don't use a list literal as a default here; default to None, check for None below, and create [1024, 1024] if it is None. See https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments

@nithinraok (Author):

Done
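
For reference, a sketch of the pattern from the linked guide; the class and signature here are abbreviated and partly hypothetical:

import torch.nn as nn

class SpeakerDecoder(nn.Module):  # abbreviated; the real module takes more arguments
    def __init__(self, feat_in, num_classes, emb_sizes=None):
        super().__init__()
        if emb_sizes is None:
            emb_sizes = [1024, 1024]  # a fresh list per instance, never shared across calls
        self.emb_sizes = emb_sizes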

embs.append(emb)

if self.angular:
    for W in self.final.parameters():
        _ = F.normalize(W, p=2, dim=1)
Reviewer (Collaborator):

https://pytorch.org/docs/master/nn.functional.html#torch.nn.functional.normalize

F.normalize is not an in-place op unless you use out=, so what's the point of this loop?

@nithinraok (Author):

It just normalizes the weights before calculating the loss. I missed the W = assignment here; thanks, Som.
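
For reference, one way the write-back could look inside the decoder's forward; this is a sketch, not necessarily the PR's final code:

import torch
import torch.nn.functional as F

if self.angular:
    with torch.no_grad():  # renormalizing weights should not enter the autograd graph
        for W in self.final.parameters():
            W.copy_(F.normalize(W, p=2, dim=1))  # write the normalized result back in place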

batch_idx + 1,
total_batches,
pl_module.loss_value,
pl_module.accuracy,
Reviewer (Collaborator):

Not all modules have accuracy, so this callback will fail for a lot of models. Why not read the log in its entirety and print all of the values in it? Can't we access the log at the end of train_batch_end?

@nithinraok (Author):

Unfortunately not yet; PTL is working on it.


@rank_zero_only
def on_train_batch_end(self, trainer, pl_module, batch, batch_idx, dataloader_idx):
    print_freq = trainer.row_log_interval
Reviewer (Collaborator):

This will print every single batch (since the PTL default is 10 and the NeMo default is 1, not 1.0).

@nithinraok (Author):

Yes, it's provided by the user based on either the percentage of num_batches or the exact number they require.
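
A sketch of the behavior described, supporting both a fraction of num_batches and an exact batch count; the names here are illustrative:

print_freq = trainer.row_log_interval
if isinstance(print_freq, float) and print_freq <= 1.0:
    print_freq = max(1, int(print_freq * total_batches))  # interpret as a fraction of an epoch
if (batch_idx + 1) % int(print_freq) == 0:
    logging.info("Batch %d/%d", batch_idx + 1, total_batches)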

)

def on_validation_epoch_end(self, trainer, pl_module):
    logging.info(
Reviewer (Collaborator):

Same here: not all models have accuracy, so this will crash for them. Is there a way to access the log dictionary itself?
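
A hypothetical sketch of the defensive pattern the reviewer suggests: read only the attributes a module actually exposes instead of assuming every model defines accuracy.

import logging
from pytorch_lightning.utilities import rank_zero_only

@rank_zero_only
def on_validation_epoch_end(self, trainer, pl_module):
    metrics = {
        name: getattr(pl_module, name)
        for name in ('loss_value', 'accuracy')
        if hasattr(pl_module, name)  # skip metrics this model does not define
    }
    logging.info("Validation metrics: %s", metrics)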

@lgtm-com (bot) commented Sep 3, 2020

This pull request introduces 2 alerts and fixes 3 when merging fdd898d into 292e2fb - view on LGTM.com

new alerts:

  • 1 for Testing equality to None
  • 1 for Variable defined multiple times

fixed alerts:

  • 3 for Unused import

@titu1994 (Collaborator) left a comment

Minor comments

@@ -49,12 +39,15 @@
def main(cfg):

    logging.info(f'Hydra config: {cfg.pretty()}')
    trainer = pl.Trainer(logger=False, checkpoint_callback=False)
    if cfg.trainer.gpus > 1:
Reviewer (Collaborator):

Wait, do this only during inference (trainer.test()); otherwise you can't use multi-GPU training.

@nithinraok (Author):

spkr_get_emb.py is only run for inference purposes.

Reviewer (Collaborator):

Ah ok.

examples/speaker_recognition/spkr_get_emb.py (resolved)
nemo/collections/asr/losses/angularloss.py (resolved)
nemo/collections/asr/modules/conv_asr.py (resolved)
nemo/collections/asr/modules/conv_asr.py (resolved)
@titu1994 (Collaborator) left a comment

Looks good to go. I'll let @fayejf look it over for comments, and then let's merge.


@lgtm-com (bot) commented Sep 4, 2020

This pull request introduces 2 alerts and fixes 3 when merging 8e2cd41 into b5ecf8f - view on LGTM.com

new alerts:

  • 1 for Testing equality to None
  • 1 for Variable defined multiple times

fixed alerts:

  • 3 for Unused import

@lgtm-com (bot) commented Sep 4, 2020

This pull request introduces 2 alerts and fixes 3 when merging 8007677 into e9d98c6 - view on LGTM.com

new alerts:

  • 1 for Testing equality to None
  • 1 for Variable defined multiple times

fixed alerts:

  • 3 for Unused import

@fayejf (Collaborator) left a comment

Looks good to me! I just have two minor questions.

examples/speaker_recognition/speaker_reco.py (resolved)
nemo/collections/asr/data/audio_to_label.py (resolved)
@nithinraok nithinraok merged commit c765631 into main Sep 4, 2020
@nithinraok nithinraok deleted the angularLoss1.0 branch September 4, 2020 18:36
@jainal09 commented Oct 2, 2020

Hey @nithinraok, I want to perform speaker diarization: provide an audio file and get a multi-speaker transcript (STT of the identified speakers). How can I do that with this PR?

@nithinraok (Collaborator, Author) commented Oct 2, 2020

You could extract embeddings with this script and use those frame-level embeddings to perform spectral clustering, setting num_speakers as the number of clusters. We don't have a ready-to-go unified script for this yet, but all the individual pieces are already there. I will add one in the coming weeks, with more features.
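
For anyone following along, a minimal sketch of that recipe using scikit-learn; the embeddings file name is illustrative and would come from the extraction script:

import numpy as np
from sklearn.cluster import SpectralClustering

embeddings = np.load('frame_embeddings.npy')  # shape: [num_frames, emb_dim]
num_speakers = 2                              # known or estimated in advance
clustering = SpectralClustering(n_clusters=num_speakers, affinity='nearest_neighbors')
frame_labels = clustering.fit_predict(embeddings)  # one speaker id per frame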

@jainal09 commented Oct 3, 2020

Thanks, I have been waiting for this for a long time. @nithinraok

dcurran90 pushed a commit to dcurran90/NeMo that referenced this pull request Oct 15, 2024