
v0.1.1

@philipp-zettl released this 16 Oct 08:40

HART support

Current configuration on a CUDA device:

  • hart-LLM in float16
  • Qwen2-VL in int4 (bitsandbytes quantization; see the loading sketch below)

This configuration requires around 3.7 GB of VRAM once the models are loaded, plus an additional ~3 GB during inference (due to the Qwen2 text embeddings), maxing out at around 7.7 GB of VRAM.
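
For reference, below is a minimal sketch of how the Qwen2-VL side of this configuration could be loaded with transformers and bitsandbytes. The checkpoint ID and exact settings are illustrative assumptions, not the code used by this release:

# Sketch: loading Qwen2-VL with int4 bitsandbytes quantization.
# The checkpoint ID and settings below are assumptions for illustration.
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # int4 weights, as in the config above
    bnb_4bit_compute_dtype=torch.float16,  # compute in float16
)

qwen = AutoModelForVision2Seq.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",  # assumed checkpoint; adjust to your deployment
    quantization_config=bnb_config,
    device_map="cuda",
)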

Examples

We did not cherry-pick any results; the generated images are the first ones returned when prompting the model.

curl -X 'POST' \
  'http://localhost:8001/models/hart' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "inputs": "An astronaut riding a horse on the moon, oil painting by Van Gogh.",
  "parameters": {}
}'

[Generated image for the astronaut prompt above]

curl -X 'POST' \
  'http://localhost:8001/models/hart' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "inputs": "A panda that has been cybernetically enhanced.",
  "parameters": {}
}'

[Generated image for the panda prompt above]
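
The same request can also be issued from Python. Here is a minimal sketch using requests that mirrors the curl calls above; the response handling is an assumption, since the response schema is not shown here:

# Sketch: calling the HART endpoint from Python; mirrors the curl examples above.
import requests

response = requests.post(
    "http://localhost:8001/models/hart",
    headers={"accept": "application/json"},
    json={
        "inputs": "A panda that has been cybernetically enhanced.",
        "parameters": {},
    },
)
response.raise_for_status()
print(response.json())  # assumed JSON response; inspect your server's actual schema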

Full Changelog: v0.1.0...v0.1.1