Wer tracker #414

JorisCos · 2021-01-21T16:22:43Z

About this PR

This PR makes it possible to keep track of the transcriptions made by the ASR models.

The transcriptions are saved to a .json file, shown as a screenshot in the original PR.

This PR also adds a jiwer transformation to the measure computation: it removes punctuation and lowercases everything, which leads to a more accurate WER.
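To illustrate why this normalization matters, here is a minimal pure-Python sketch. It does not use jiwer: `normalize` and `wer` are hypothetical stand-ins written for this example, and the sentences are made up. It only shows that casing and punctuation differences count as word errors unless they are removed first.

```python
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def wer(reference_words, hypothesis_words):
    """Word-level edit distance divided by the reference length.

    Assumes a non-empty reference.
    """
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hypothesis_words) + 1))
    for i, ref_word in enumerate(reference_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference_words)


# Made-up example: the hypothesis is correct up to case and punctuation.
reference = "Hello, world! How are you?"
hypothesis = "hello world how are you"

raw_wer = wer(reference.split(), hypothesis.split())
normalized_wer = wer(normalize(reference), normalize(hypothesis))
print(raw_wer, normalized_wer)
```

Without normalization, four of the five reference words are counted as substitutions even though the transcription is correct.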

mpariente

The estimates seem a bit off 😂 but this was clearly missing, thanks!

asteroid/metrics.py

JorisCos · 2021-01-21T17:27:42Z

Haha the estimates are wrong because the model was trained on noisy inputs and the enhancement is making things worse...

mpariente · 2021-01-25T10:58:18Z

Let's include the wav normalization in the WERTracker class?
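The thread does not say which normalization is meant. As a rough sketch, assuming it is peak normalization (rescaling an estimate so that its peak amplitude matches that of a reference signal such as the mixture), a helper could look like the following. The name `peak_normalize` and the sample values are hypothetical.

```python
def peak_normalize(wav, reference, eps=1e-8):
    """Scale `wav` so its peak absolute amplitude matches that of `reference`.

    Assumption for illustration only: signals are plain lists of floats.
    `eps` avoids a division by zero for an all-zero input.
    """
    wav_peak = max(abs(x) for x in wav)
    ref_peak = max(abs(x) for x in reference)
    scale = ref_peak / (wav_peak + eps)
    return [x * scale for x in wav]


# Made-up signals: the estimate is quieter than the mixture.
estimate = [0.1, -0.2, 0.05]
mixture = [0.5, -1.0, 0.25]
normalized = peak_normalize(estimate, mixture)
print(max(abs(x) for x in normalized))
```

Placing such a helper inside the tracker would keep the ASR input consistent regardless of how the caller scaled the audio, at the cost of duplicating a step the evaluation script already performs before saving files.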

popcornell · 2021-01-25T22:14:31Z

What is the ID field in the .json annotation?

JorisCos · 2021-01-26T13:24:20Z

> Let's include the wav normalization in the WERTracker class?

We can, but we have to do it anyway before saving the files in eval.py.

> What is the ID field in .json annotation ?

In general, the ID is something we introduced for librimix to match transcriptions and wav files. For this specific screenshot, the annotation and IDs are taken from CHIME 4. (I will open a PR soon.)

popcornell · 2021-01-26T13:37:01Z

> In a general way the ID is something we introduced for librimix to match transcriptions and wav files . For this specific screen shot this annotation and ID's are taken from CHIME 4. ( I will open a PR soon)

Maybe call them UtteranceID or ExampleID? We might also need speaker IDs.

mpariente · 2021-01-27T19:51:55Z

Also, please make the fields in JSON all lower case: "text_0", "utt_id_0" etc...
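As a hedged illustration of this naming request: the exact layout of the file is not shown in the thread, so the entry below is hypothetical. Only the key names `text_0` and `utt_id_0` come from the comment above; the mixed-case input keys and all values are placeholders.

```python
import json


def lowercase_keys(entry):
    """Return a copy of `entry` with every key lower-cased."""
    return {key.lower(): value for key, value in entry.items()}


# Hypothetical entry with mixed-case keys and placeholder values.
entry = {"Text_0": "placeholder transcription", "Utt_ID_0": "placeholder-id"}

clean_entry = lowercase_keys(entry)
print(json.dumps([clean_entry], indent=2))
```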

mpariente · 2021-01-28T15:23:35Z

asteroid/metrics.py

        self.sample_rate = int(d.data_frame[d.data_frame["name"] == model_name]["fs"])
        self.trans_df = trans_df
        self.trans_dic = self._df_to_dict(trans_df)
        self.mix_counter = Counter()
        self.clean_counter = Counter()
        self.est_counter = Counter()
+        self.transformation = jiwer.Compose([jiwer.ToLowerCase(), jiwer.RemovePunctuation()])


Is this transformation enough?
The default is

[<jiwer.transforms.RemoveMultipleSpaces at 0x7fbc79a75df0>, <jiwer.transforms.Strip at 0x7fbc79a75e20>, <jiwer.transforms.SentencesToListOfWords at 0x7fbc79a75f10>, <jiwer.transforms.RemoveEmptyStrings at 0x7fbc7aa17bb0>]

JorisCos · 2021-01-29

When I tested on CHIME4, these were the two that made a difference, but you are right, let's add the others. It doesn't cost much anyway.
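To make the discussion concrete, here is a pure-Python sketch of what chaining the two added steps with the four default ones would do to a string. The functions only mirror the names of the jiwer transforms listed above; they are simplified stand-ins written for this example, not jiwer's implementations, and `compose` is a hypothetical helper.

```python
import re
import string


def to_lower_case(text):
    return text.lower()


def remove_punctuation(text):
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_multiple_spaces(text):
    # Collapse any run of whitespace into a single space.
    return re.sub(r"\s+", " ", text)


def strip(text):
    return text.strip()


def sentence_to_list_of_words(text):
    return text.split(" ")


def remove_empty_strings(words):
    return [word for word in words if word]


def compose(*steps):
    """Apply each step in order, feeding the output of one into the next."""
    def apply(value):
        for step in steps:
            value = step(value)
        return value
    return apply


transformation = compose(
    to_lower_case,
    remove_punctuation,
    remove_multiple_spaces,
    strip,
    sentence_to_list_of_words,
    remove_empty_strings,
)

words = transformation("  Hello,   World!  ")
print(words)
```

Removing punctuation can leave extra spaces behind, which is one reason to keep the whitespace clean-up steps after it.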

mpariente · 2021-01-28T15:25:03Z

asteroid/metrics.py

+    def all_transcriptions(self):
+        return dict(transcriptions=self.transcriptions)


I don't really see the point of a dict with a single field that just wraps the list.
I'd remove this method entirely.

remove all_transcriptions method

mpariente · 2021-02-02T14:39:09Z

/lint

JorisCos added 3 commits January 21, 2021 16:43

add all_transciptions as json

f908ae9

update doc

aed14a9

black reformated

0b5036d

mpariente reviewed Jan 21, 2021

move transformation in init

46a36ff

lowercase

7de1eee

mpariente reviewed Jan 28, 2021

add transformations

6cd1c90

remove all_transcriptions method

mpariente mentioned this pull request Feb 2, 2021

Add MetricTracker #394

Merged

/lint

f7d9c6b

mpariente merged commit cc2602e into asteroid-team:master Feb 2, 2021
