The easy answer is to remove such sentences, as abbreviations should not have been included in the texts to read. However, there is also a longer, more painful answer...
If you want to keep all readings of this sentence, then an option might be to listen to each one. (I don't know how many there are as I haven't looked.) If a person says it "correctly" in French, which I assume is what you are concerned with, then you can just leave it as is: "PHP".
However, some people might have said it "incorrectly" in French. (I'm not sure how it is said in French, but I'll assume it's like in English with a French accent.) For these people you'd have to transform the text into a transcript of what they actually said.
The French preprocessor fr.py, like all other language-specific preprocessors, allows you to do this per-user transformation of a transcript, as it is passed both the transcript in sentence and the ID of the user who said the sentence in client_id.
The language-specific preprocessors were designed for exactly this use case: a text such as "Room 246" can validly be read in different ways by different people, e.g. "Room two four six" or "Room two hundred forty six", so the text has to be fixed on a user-by-user basis.
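As a rough illustration of the per-user idea, here is a minimal sketch. It only assumes that the preprocessor receives `client_id` and `sentence` as described above; the function name `preprocess`, the lookup table, and the client IDs are all hypothetical and not taken from CorporaCreator itself.

```python
# Hypothetical per-user overrides, keyed by (client_id, original sentence).
# The client IDs below are invented for illustration; real entries would
# come from listening to each recording.
PER_USER_TRANSCRIPTS = {
    ("client-a", "Room 246"): "Room two four six",
    ("client-b", "Room 246"): "Room two hundred forty six",
}


def preprocess(client_id, sentence):
    """Return what this speaker actually said, or the original text if no override exists."""
    return PER_USER_TRANSCRIPTS.get((client_id, sentence), sentence)
```

Any speaker without an entry in the table keeps the original sentence, so a correctly read "PHP" would pass through unchanged.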
As described in common-voice/sentence-collector#169, a common issue is that some words are spelled out (i.e. read letter by letter).
Example: PHP => P H P
What should the output of CorporaCreator be in such situations?