
calculate and return hypothesis confidence number #42

Open · wants to merge 1 commit into base: master
Conversation

farmnerd

This PR is a proposal for including a confidence number in the HTTP JSON response. The confidence algorithm comes from some commits to sample_full_post_processor: a hypothesis whose Kaldi likelihood is much higher than the next hypothesis's likelihood gets a higher confidence number; hypotheses whose likelihoods are closer together get lower confidence numbers.

A couple of points to note when considering this PR:

  • If a result contains multiple segments, the overall confidence is the average of all segments' confidences (I'm not sure if there's a better or more standard approach there).
  • It's not exposed in the final result, but I am calculating confidences for all the n-best hypotheses for each segment. The algorithm is the same, except that a hypothesis's confidence is capped at the previous hypothesis's confidence. For a 3rd-best hypothesis, for example, it wasn't clear to me whether "confidence" means "I'm confident this is the 3rd best" or "I'm confident this is the right answer overall". I went with the latter, so confidences never increase as you go down the n-best list.

Comments or suggestions welcome - thanks!
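For readers following along, here is a rough sketch of the ratio-based scheme described above. The function name, the `scale` constant, and the exact margin-to-confidence mapping are my assumptions for illustration, not the PR's actual code; only the two stated properties (bigger likelihood gap → higher confidence, and the non-increasing cap down the n-best list) come from the PR description.

```python
import math

def nbest_confidences(likelihoods, scale=0.1):
    """Sketch: map n-best Kaldi log-likelihoods (sorted best-first)
    to confidence numbers in [0, 1]."""
    confidences = []
    prev_conf = 1.0
    for i, lik in enumerate(likelihoods):
        if i + 1 < len(likelihoods):
            # Margin over the next-best hypothesis: a larger gap
            # between likelihoods yields a higher confidence.
            margin = lik - likelihoods[i + 1]
            conf = 1.0 - math.exp(-scale * margin)
        else:
            conf = 0.0  # no next hypothesis to compare against
        # As described in the PR: a later hypothesis is never more
        # confident than an earlier one in the n-best list.
        conf = min(conf, prev_conf)
        confidences.append(conf)
        prev_conf = conf
    return confidences
```

With this mapping, two hypotheses with near-identical likelihoods both get confidences near zero, while a clear winner gets a confidence approaching 1 as its margin grows.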

…od numbers. Confidence algorithm borrowed from the sample_full_post_processor commits 2d1be9d and ea87b6a, and implemented in the main master/worker code.
@alumae
Owner

alumae commented Jun 17, 2016

Sorry for not checking out this PR earlier.
This seems like a good idea. However, it seems to me that the confidence for a multi-segment hypothesis should be the product of the confidences of the individual segments: confidences are like probabilities, and when you combine multiple events, you multiply their probabilities. Or do you have another viewpoint, perhaps from a practical perspective? Of course, this would mean that the confidence of a long multi-segment utterance will be very small, but it seems to me that this reflects reality (after all, you cannot be sure that no word is incorrect).
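The two combination strategies under discussion can be sketched side by side (function names are hypothetical; `math.prod` requires Python 3.8+):

```python
import math

def combine_average(segment_confidences):
    # The PR's current approach: arithmetic mean of per-segment confidences.
    return sum(segment_confidences) / len(segment_confidences)

def combine_product(segment_confidences):
    # The approach suggested here: treat confidences like independent
    # probabilities and multiply them, so the overall confidence reflects
    # the chance that *every* segment is correct.
    return math.prod(segment_confidences)
```

For example, segments with confidences [0.9, 0.8, 0.95] average to about 0.88 but multiply to 0.684, and the gap widens as the utterance gets longer, which is the trade-off discussed above.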
