fix invalid_characters bug #681

Closed
wants to merge 1 commit into from

Conversation

ain-soph
Contributor

@ain-soph ain-soph commented Aug 11, 2024

There are 2 issues.

  1. Chinese/English dataset texts often contain "?" or "?". These are currently treated as invalid characters, but character_simplier has no mapping for them.
  2. We should call count_invalid_characters again after _apply_character_map. Otherwise, if the text still contains no tags, execution falls into the next if condition and raises an error because tags are None.

ChatTTS/ChatTTS/norm.py

Lines 194 to 200 in 69aa900

if len(invalid_characters):
    texts, tags = _split_tags(text)
    self.logger.debug("split texts %s, tags %s", str(texts), str(tags))
    texts = [self.reject_pattern.sub("", t) for t in texts]
    self.logger.debug("normed texts %s", str(texts))
    text = _combine_tags(texts, tags) if len(tags) > 0 else texts[0]
    self.logger.debug("combined text %s", text)
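
The second point above (re-counting after the character map is applied) can be illustrated with a minimal, self-contained sketch. This is not the ChatTTS implementation: the allowed character set, the contents of `CHAR_MAP`, and the `normalize` function are hypothetical stand-ins chosen only to show the control flow, and tag handling is left out entirely.

```python
import re

# Hypothetical allowed set: lowercase ASCII letters, space, comma, period.
# Anything else counts as an "invalid" character in this sketch.
INVALID = re.compile(r"[^a-z ,.]")

# Hypothetical character map folding some unsupported punctuation
# into supported punctuation (assumed contents, for illustration only).
CHAR_MAP = str.maketrans({";": ",", ":": ","})


def count_invalid_characters(text):
    """Return the set of characters that fall outside the allowed set."""
    return set(INVALID.findall(text))


def apply_character_map(text):
    """Replace mapped characters with their supported equivalents."""
    return text.translate(CHAR_MAP)


def normalize(text):
    invalid = count_invalid_characters(text)
    if invalid:
        text = apply_character_map(text)
        # Re-count: the map may already have resolved every invalid
        # character, in which case the stripping step below is skipped.
        invalid = count_invalid_characters(text)
    if invalid:
        # Only characters that survived the map are stripped.
        text = INVALID.sub("", text)
    return text
```

With these assumed definitions, `normalize("hello; world")` is fully handled by the map and never reaches the stripping branch, while `normalize("hello? world")` still contains an unmapped character after the map and has it removed.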

@ain-soph
Contributor Author

ain-soph commented Aug 11, 2024

This PR fails the unit test from #655, which needs further investigation.

@ain-soph
Contributor Author

The first change, in character_simplier, does not seem to be the expected behavior according to the #655 test file, which requires the refined text to still contain ?.

Since this ? is an invalid character, meaning it was not seen in the training data, should we still keep it at the inference stage?

@ain-soph
Contributor Author

After checking, it seems that neither of the two concerns reflects the expected behavior. Closing this.

@ain-soph ain-soph closed this Aug 11, 2024
@ain-soph ain-soph deleted the fix-invalid_characters branch August 11, 2024 08:25