community: add hugging face text-to-speech inference API #18880
Conversation
Force-pushed from 82860ef to 5cac9ec
libs/community/langchain_community/tools/audio/huggingface_text_to_speech_inference.py
model: str
api_url: str
huggingface_api_key: SecretStr
format: HuggingFaceSupportedAudioFormat
Let's avoid using the enum here; use Literal instead, and do a run-time check. It saves the user the trouble of having to import enums in their code:
format = Literal['wave']
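The Literal-plus-runtime-check pattern the reviewer suggests could look roughly like this (a minimal sketch; the `validate_format` helper and the `SUPPORTED_FORMATS` tuple are hypothetical illustrations, not the PR's actual code):

```python
from typing import Literal

# Hypothetical sketch of the reviewer's suggestion: callers pass a plain
# string, the Literal hint documents the allowed values statically, and a
# runtime check rejects anything else -- no enum import needed by the user.
SUPPORTED_FORMATS = ("wav",)

def validate_format(file_format: Literal["wav"]) -> str:
    # Runtime check backing up the static Literal hint.
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported audio format {file_format!r}; "
            f"expected one of {SUPPORTED_FORMATS}"
        )
    return file_format
```

Type checkers flag bad literals at analysis time, while the runtime check protects callers who aren't running a type checker.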
Fixed here: d49cf82
I just turned it into a string. The model I was exploring supported only wav; I believe others support other formats, but I wasn't able to find anything about it in the Hugging Face docs. So I think it would be easier to let the user specify the format rather than trying to keep up with supported formats.
output_name = (
    f"{output_name or self._default_output_name()}.{self.format.value}"
)
output_path = os.path.join(self.output_dir, output_name)
This logic forces the user to reinstantiate the tool every time. I think it would be better to have an option to generate an output_name that's a uuid.uuid4. What do you think about making that the default? Users would be allowed to override it to specify a name if they really want.
I thought it would be good for the user to instantiate the tool with a directory, so it doesn't flood the directory it's called from, and then output everything there. That way they can point it at either a tmp directory they will throw out or a place where they want to keep the files.
I am using a timestamp rather than a UUID when the user does not specify a filename. I will switch it to a UUID, though.
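The two naming strategies under discussion could be sketched like this (the `default_output_name` helper is a hypothetical illustration, not the PR's actual code):

```python
import uuid
from datetime import datetime

# Hypothetical sketch of the naming strategies discussed above.
# uuid4 names are effectively collision-free across runs; timestamp names
# are human-readable but can collide if two calls land in the same second.
def default_output_name(strategy: str = "uuid4") -> str:
    if strategy == "uuid4":
        return str(uuid.uuid4())
    if strategy == "timestamp":
        return datetime.now().strftime("%Y%m%d_%H%M%S")
    raise ValueError(f"Unknown naming strategy: {strategy!r}")
```

Making uuid4 the default avoids silent overwrites when the tool is invoked repeatedly in quick succession.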
But yeah, I will remove the output_dir field, and the user can handle that logic by optionally passing in a file name.
Force-pushed from 0287697 to 91701d7
This is ready for another review
def _run(
    self,
    query: str,
    output_base_name: Optional[str] = None,
output_base_name should not be a parameter of the tool input -- without proper validation this exposes a security risk and allows an LLM, or a malicious user driving the LLM, to write content anywhere on the file system.
It's OK to specify a containing folder as part of the initializer of the tool (e.g., directory):
HuggingFaceTextToSpeechModelInference(
    destination_dir='...'
)
We can also add some additional configuration in the initializer that lets the user specify how file names are chosen (e.g., timestamp or UUID) -- if we want to parameterize this aspect of file naming.
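To illustrate the risk the reviewer is pointing at: if the model controls the base name, a relative path can escape the configured directory. The helper below is a hypothetical illustration of the naive join, not the tool's code:

```python
import os

# Hypothetical illustration of the path-traversal risk: naively joining a
# model-supplied name lets "../.." sequences escape destination_dir.
def unsafe_path(destination_dir: str, output_base_name: str) -> str:
    return os.path.join(destination_dir, f"{output_base_name}.wav")

# os.path.normpath("/tmp/audio/../../etc/passwd.wav") resolves to
# "/etc/passwd.wav" -- entirely outside /tmp/audio.
escaped = os.path.normpath(unsafe_path("/tmp/audio", "../../etc/passwd"))
```

This is why fixing the directory at initialization time, and generating the file name inside the tool, closes the hole.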
Added destination_dir and added functionality to name the files using uuid4 or timestamps.
The default will just be uuid4 as well. Additionally, I made it so each run call creates the destination directory if it does not exist.
Is that the right approach, or should it be created on tool initialization? If the directory gets deleted between init and writing the output, the file write will fail.
@eyurtsev Thanks for merging and the review! Good timing for the release of VoiceCraft: HuggingFace Demo
…chain-ai#18880) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <[email protected]> Co-authored-by: Eugene Yurtsev <[email protected]>