DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.

C · 233 stars · 24 forks · Updated Feb 13, 2025

pytorch-labs / tokenizers

C++ implementations of various tokenizers (sentencepiece, tiktoken, etc.).

C++ · 11 stars · 2 forks · Updated Feb 20, 2025

Anemll / Anemll

Artificial Neural Engine Machine Learning Library

Python · 199 stars · 4 forks · Updated Feb 16, 2025

microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

C++ · 618 stars · 154 forks · Updated Feb 20, 2025

AmberSahdev / Open-Interface

Control Any Computer Using LLMs.

Python · 1,772 stars · 163 forks · Updated Feb 18, 2025

mzbac / flux.swift

Swift implementation of Flux.1 using mlx-swift

Swift · 76 stars · 7 forks · Updated Dec 12, 2024

spcl / QuaRot

Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models.

Python · 342 stars · 31 forks · Updated Nov 26, 2024

wangkuiyi / huggingface-tokenizer-in-cxx

C++ · 57 stars · 10 forks · Updated Feb 27, 2023

vladkens / macmon

🦀⚙️ Sudoless performance monitoring for Apple Silicon processors. CPU / GPU / RAM usage, power consumption & temperature 🌡️

Rust · 558 stars · 19 forks · Updated Feb 15, 2025

armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

Python · 252 stars · 25 forks · Updated Feb 20, 2025

browser-use / browser-use

Make websites accessible for AI agents

Python · 30,107 stars · 3,111 forks · Updated Feb 20, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python · 20,863 stars · 1,823 forks · Updated Feb 20, 2025

foldl / chatllm.cpp

Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)

C++ · 516 stars · 40 forks · Updated Feb 19, 2025

bytedance / UI-TARS-desktop

A GUI agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

TypeScript · 2,748 stars · 201 forks · Updated Feb 20, 2025

VITA-Group / READ-ME

[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Z…

Python · 6 stars · Updated Dec 16, 2024

BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Python · 17,814 stars · 2,174 forks · Updated Feb 20, 2025