Releases · philsupertramp/factory
v0.1.1
HART support
Current configuration on the cuda device:
- hart-LLM: float16
- Qwen2-VL: int4 (bitsandbytes quantization)

This configuration requires around 3.7 GB of VRAM once the models are loaded, plus an additional ~3 GB during inference (due to Qwen2 text embeddings), peaking at around 7.7 GB of VRAM.
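As a rough rule of thumb, weight memory is parameter count times bytes per parameter, which is why the int4 quantization keeps the footprint low. A minimal sketch of that arithmetic (the parameter counts below are illustrative placeholders, not the actual hart-LLM or Qwen2-VL sizes):

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight memory: parameters * bits per parameter / 8, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# Illustrative placeholder sizes -- not the real model parameter counts.
print(weight_memory_gb(1e9, 16))  # a 1B-parameter model in float16 -> 2.0 GB
print(weight_memory_gb(2e9, 4))   # a 2B-parameter model in int4    -> 1.0 GB
```

Activation memory during inference (such as the Qwen2 text embeddings mentioned above) comes on top of these weight figures.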
Examples
We did not cherry-pick any results; the generated images are the first ones received when prompting the model.
```shell
curl -X 'POST' \
  'http://localhost:8001/models/hart' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "An astronaut riding a horse on the moon, oil painting by Van Gogh.",
    "parameters": {}
  }'
```
```shell
curl -X 'POST' \
  'http://localhost:8001/models/hart' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "A panda that has been cybernetically enhanced.",
    "parameters": {}
  }'
```
Full Changelog: v0.1.0...v0.1.1
v0.1.0
First release
This release exists only to mark a point in time from which to build.
It contains the MVP version of the service.
See #6 for more.
What's Changed
- Bump aiohttp from 3.9.3 to 3.9.4 by @dependabot in #1
- v1 by @philipp-zettl in #6
- Bump certifi from 2024.6.2 to 2024.7.4 by @dependabot in #9
- Bump jupyterlab from 4.2.1 to 4.2.5 by @dependabot in #10
New Contributors
- @dependabot made their first contribution in #1
- @philipp-zettl made their first contribution in #6
Full Changelog: https://github.com/philipp-zettl/factory/commits/v0.1.1