community: add hugging face text-to-speech inference API #18880
Conversation
Force-pushed from 82860ef to 5cac9ec
libs/community/langchain_community/tools/audio/huggingface_text_to_speech_inference.py
model: str
api_url: str
huggingface_api_key: SecretStr
format: HuggingFaceSupportedAudioFormat
Let's avoid using the enum here; use Literal instead, and do a run-time check. It saves the user the trouble of having to import enums in their code:
format = Literal['wave']
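The Literal-plus-runtime-check pattern the reviewer suggests could look roughly like this (a minimal sketch; the `validate_format` helper and the `SUPPORTED_FORMATS` tuple are hypothetical illustrations, not the PR's actual code):

```python
from typing import Literal

# Hypothetical sketch of the reviewer's suggestion: callers pass a plain
# string, the Literal hint documents the allowed values statically, and a
# runtime check rejects anything else -- no enum import needed by the user.
SUPPORTED_FORMATS = ("wav",)

def validate_format(file_format: Literal["wav"]) -> str:
    # Runtime check backing up the static Literal hint.
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported audio format {file_format!r}; "
            f"expected one of {SUPPORTED_FORMATS}"
        )
    return file_format
```

Type checkers flag bad literals at analysis time, while the runtime check protects callers who aren't running a type checker.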
Fixed here: d49cf82
I just turned it into a string. The model I was exploring supported only wav; I believe others support other formats, but I wasn't able to find anything about it in the Hugging Face docs. So I think it would be easier to let the user specify the format rather than trying to keep up with supported formats.
output_name = (
    f"{output_name or self._default_output_name()}.{self.format.value}"
)
output_path = os.path.join(self.output_dir, output_name)
This logic forces the user to reinstantiate the tool every time. I think it would be better to have an option to generate an output_name that's a uuid.uuid4. What do you think about making that the default? Users would be allowed to override it to specify a name if they really want.
I thought it would be good for the user to instantiate the tool with a directory, so it doesn't flood the directory it's called from, and then output everything there. That way they can point it at either a tmp directory they will throw out or a place where they want to keep the files.
I am using a timestamp rather than a UUID when the user does not specify a filename. I will switch it to a UUID, though.
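The two naming strategies under discussion could be sketched like this (the `default_output_name` helper is a hypothetical illustration, not the PR's actual code):

```python
import uuid
from datetime import datetime

# Hypothetical sketch of the naming strategies discussed above.
# uuid4 names are effectively collision-free across runs; timestamp names
# are human-readable but can collide if two calls land in the same second.
def default_output_name(strategy: str = "uuid4") -> str:
    if strategy == "uuid4":
        return str(uuid.uuid4())
    if strategy == "timestamp":
        return datetime.now().strftime("%Y%m%d_%H%M%S")
    raise ValueError(f"Unknown naming strategy: {strategy!r}")
```

Making uuid4 the default avoids silent overwrites when the tool is invoked repeatedly in quick succession.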
But yeah, I will remove the output_dir field, and the user can handle that logic by optionally passing in a file name.
Force-pushed from 0287697 to 91701d7
This is ready for another review
def _run(
    self,
    query: str,
    output_base_name: Optional[str] = None,
output_base_name should not be a parameter of the tool input -- without proper validation this exposes a security risk and allows an LLM, or a malicious user driving the LLM, to write content anywhere on the file system.
It's OK to specify a containing folder as part of the initializer of the tool (e.g., directory):
HuggingFaceTextToSpeechModelInference(
    destination_dir='...'
)
We can also add some additional configuration in the initializer that lets the user specify how file names are chosen (e.g., timestamp or UUID) -- if we want to parameterize this aspect of file naming.
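To illustrate the risk the reviewer is pointing at: if the model controls the base name, a relative path can escape the configured directory. The helper below is a hypothetical illustration of the naive join, not the tool's code:

```python
import os

# Hypothetical illustration of the path-traversal risk: naively joining a
# model-supplied name lets "../.." sequences escape destination_dir.
def unsafe_path(destination_dir: str, output_base_name: str) -> str:
    return os.path.join(destination_dir, f"{output_base_name}.wav")

# os.path.normpath("/tmp/audio/../../etc/passwd.wav") resolves to
# "/etc/passwd.wav" -- entirely outside /tmp/audio.
escaped = os.path.normpath(unsafe_path("/tmp/audio", "../../etc/passwd"))
```

This is why fixing the directory at initialization time, and generating the file name inside the tool, closes the hole.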
Added destination_dir and added functionality to name the files using uuid4 or timestamps.
The default will just be uuid4 as well. Additionally, I made it so each run call creates the destination directory if it does not exist.
Is that the right approach, or should it be created on tool initialization? If the directory gets deleted between init and writing the output, the file write will fail.
@eyurtsev Thanks for merging and the review! Good timing for the release of VoiceCraft: HuggingFace Demo
…chain-ai#18880) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <[email protected]> Co-authored-by: Eugene Yurtsev <[email protected]>