DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
Updated Dec 12, 2024 - C
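Because this repository is listed under the guided-decoding topic, here is a minimal, hypothetical sketch of what guided (constrained) decoding means in general: at each generation step, candidate tokens are masked against a constraint before sampling, so the output is forced to follow a target structure. The names `toy_logits` and `allowed_next` and the hard-coded JSON template are illustrative assumptions only and are not part of the DashInfer API.

```python
# Minimal sketch of guided (constrained) decoding with a toy model.
# Assumption: a real engine would supply logits and a grammar/schema;
# here both are stand-ins for illustration.
import math
import random

VOCAB = ["{", "}", '"key"', ":", '"value"', "<eos>"]

def toy_logits(prefix):
    # Stand-in for a real model: uniform scores over the vocabulary.
    return [0.0 for _ in VOCAB]

def allowed_next(prefix):
    # A tiny "grammar": force the output to be the JSON object {"key":"value"}.
    target = ["{", '"key"', ":", '"value"', "}", "<eos>"]
    return {target[len(prefix)]} if len(prefix) < len(target) else {"<eos>"}

def guided_decode(max_steps=10):
    prefix = []
    for _ in range(max_steps):
        logits = toy_logits(prefix)
        allowed = allowed_next(prefix)
        # Mask disallowed tokens, then sample from the renormalized distribution.
        masked = [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]
        weights = [math.exp(l) for l in masked]
        token = random.choices(VOCAB, weights=weights, k=1)[0]
        if token == "<eos>":
            break
        prefix.append(token)
    return "".join(prefix)

print(guided_decode())  # -> {"key":"value"}
```

In a production engine the mask would come from a regex, JSON schema, or grammar compiled into per-step allowed-token sets, but the core idea, zeroing out probability mass for tokens that would violate the constraint, is the same.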