Pinned

- intel/neural-speed (Public archive): An innovative library for efficient LLM inference via low-bit quantization
- intel/intel-extension-for-transformers (Public archive): ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡
- leejet/stable-diffusion.cpp (Public): Stable Diffusion and Flux in pure C/C++
- vllm-fork (Public, forked from HabanaAI/vllm-fork): A high-throughput and memory-efficient inference and serving engine for LLMs (Python)
- intel/neural-compressor (Public): SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime