[Assistants] Use textToImage task for avatar generation #662
Merged
Commits (11):
db9922e Generate assistants avatar using stablediffusion (nsarrazin)
6d7b229 wording (nsarrazin)
09d39d1 Update +page.server.ts (nsarrazin)
f59f521 Merge branch 'feature/assistants' into feature/use_sd_for_avatar (nsarrazin)
62cea33 Add timeout & controls to avatar generation (nsarrazin)
650c9d0 Add controls for avatar generation in .env (nsarrazin)
5a17527 Merge branch 'feature/assistants' into feature/use_sd_for_avatar (nsarrazin)
e70deb6 Update src/routes/+layout.server.ts (nsarrazin)
70a1363 Update src/lib/components/AssistantSettings.svelte (nsarrazin)
734275a Fix avatar gen feature flag (nsarrazin)
33b748c Can only upload avatar if generate is unchecked (nsarrazin)
@@ -0,0 +1,24 @@
import { HF_TOKEN, TEXT_TO_IMAGE_MODEL } from "$env/static/private";
import { generateFromDefaultEndpoint } from "$lib/server/generateFromDefaultEndpoint";
import { HfInference } from "@huggingface/inference";

export async function generateAvatar(description?: string, name?: string): Promise<File> {
	const queryPrompt = `Generate a prompt for an image-generation model for the following:
Name: ${name}
Description: ${description}
`;
	const imagePrompt = await generateFromDefaultEndpoint({
		messages: [{ from: "user", content: queryPrompt }],
		preprompt:
			"You are an assistant tasked with generating simple image descriptions. The user will ask you for an image, based on the name and a description of what they want, and you should reply with a short, concise, safe, descriptive sentence.",
	});

	const hf = new HfInference(HF_TOKEN);

	const blob = await hf.textToImage({
		inputs: imagePrompt,
		model: TEXT_TO_IMAGE_MODEL,
	});

	return new File([blob], "avatar.png");
}
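Because `name` and `description` are both optional in `generateAvatar`, the template literal above would embed the literal text `undefined` whenever either one is missing. Below is a minimal sketch of one way to guard against that. It assumes a hypothetical helper named `buildAvatarQueryPrompt`, which is not part of this PR, and it assumes that simply omitting empty fields is the desired behavior.

```typescript
// Hypothetical helper, not part of the PR: builds the query sent to the LLM
// while skipping any field that is undefined or blank.
function buildAvatarQueryPrompt(description?: string, name?: string): string {
	const lines: string[] = [
		"Generate a prompt for an image-generation model for the following:",
	];

	const trimmedName = name?.trim();
	if (trimmedName) lines.push(`Name: ${trimmedName}`);

	const trimmedDescription = description?.trim();
	if (trimmedDescription) lines.push(`Description: ${trimmedDescription}`);

	// Keep the trailing newline used by the original template literal.
	return lines.join("\n") + "\n";
}
```

If this approach were adopted, the result could be passed as the `content` of the user message in place of `queryPrompt`.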
@@ -0,0 +1,6 @@
export const timeout = <T>(prom: Promise<T>, time: number): Promise<T> => {
	let timer: NodeJS.Timeout;
	return Promise.race([prom, new Promise<T>((_r, rej) => (timer = setTimeout(rej, time)))]).finally(
		() => clearTimeout(timer)
	);
};
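As written, `timeout` rejects with `undefined` when the timer fires, because `setTimeout` calls `rej` with no argument. Below is a minimal sketch of a variant that rejects with a descriptive `Error` instead. The name `timeoutWithError` and the message text are assumptions made for illustration, and `ReturnType<typeof setTimeout>` is used so the snippet type-checks without Node's type definitions.

```typescript
// Hypothetical variant of the PR's `timeout` helper: same race-and-cleanup
// structure, but the rejection carries an Error describing the timeout.
const timeoutWithError = <T>(prom: Promise<T>, time: number): Promise<T> => {
	let timer: ReturnType<typeof setTimeout> | undefined;
	const expiry = new Promise<T>((_resolve, reject) => {
		timer = setTimeout(() => reject(new Error(`Timed out after ${time}ms`)), time);
	});
	// Whichever promise settles first wins; the timer is cleared either way.
	return Promise.race([prom, expiry]).finally(() => clearTimeout(timer));
};

// Usage: a 200 ms task raced against a 20 ms budget.
const slowTask = new Promise<string>((resolve) => setTimeout(() => resolve("done"), 200));
const outcome: Promise<string> = timeoutWithError(slowTask, 20).catch((err: unknown) =>
	err instanceof Error ? err.message : "unknown error"
);
```

Here `outcome` resolves to the timeout message rather than `"done"`, which gives a caller something concrete to log or show when avatar generation takes too long.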
nit: we could use newer models like https://huggingface.co/latent-consistency/lcm-lora-ssd-1b (assuming they are hosted on the Inference API), which produce higher-quality images faster
Ah that's fair! I picked sd1.5 over sdxl because I assumed it'd be faster, but if we have faster models available, by all means let's change it 😁
I think we should prioritize speed over image quality for this feature. Most of the time the avatar will only be seen as a small thumbnail, so if you have good model recommendations, feel free!
for example, https://huggingface.co/latent-consistency/lcm-lora-sdv1-5 would be faster than runway/sd-1.5. https://huggingface.co/latent-consistency/lcm-lora-sdxl might still be faster than runway/sd-1.5. Maybe @patil-suraj @sayakpaul can confirm?
https://huggingface.co/blog/lcm_lora
Since it's SDXL, it will not be SD speed. You can try out Segmind's SSD-1B.
So https://huggingface.co/latent-consistency/lcm-lora-ssd-1b would be the best, in terms of speed/quality tradeoff?
I would say so. But I would also consider playing with SD Turbo.
What's the license of SD Turbo?
I tried latent-consistency/lcm-lora-ssd-1bz but it doesn't seem to load. I think it would need to be pinned in the API on our side if we go for it. Just a note, not necessarily a blocker 😁
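Since a candidate model may fail to load on the hosted API, one defensive pattern is to try several models in order and keep the first one that responds. The sketch below is illustrative only. `firstSuccessful` is a hypothetical helper that is not part of this PR, the `attempt` callback is a stand-in for the real `hf.textToImage` call, and the model names are copied from this thread rather than verified as repository IDs.

```typescript
// Hypothetical helper: runs `attempt` for each candidate in order and returns
// the first successful result, collecting failure messages along the way.
async function firstSuccessful<T>(
	candidates: string[],
	attempt: (candidate: string) => Promise<T>
): Promise<{ candidate: string; result: T }> {
	const failures: string[] = [];
	for (const candidate of candidates) {
		try {
			return { candidate, result: await attempt(candidate) };
		} catch (err) {
			failures.push(`${candidate}: ${err instanceof Error ? err.message : String(err)}`);
		}
	}
	throw new Error(`All candidates failed:\n${failures.join("\n")}`);
}

// Usage with a stand-in for the image call: the first model "fails to load",
// so the helper falls through to the second one.
const picked = firstSuccessful(
	["latent-consistency/lcm-lora-ssd-1b", "runway/sd-1.5"],
	async (model) => {
		if (model.startsWith("latent-consistency/")) throw new Error("model not loaded");
		return `image from ${model}`;
	}
);
```

A fallback list trades a slower worst case for resilience. Each attempt could also be given a time budget so that a model which never loads does not hold up the request.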