Starred repositories

A framework for few-shot evaluation of language models.

Python 7,867 2,114 Updated Feb 20, 2025

[arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices

Python 101 12 Updated Feb 13, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 10,717 669 Updated Feb 20, 2025

High-speed and easy-to-use LLM serving framework for local deployment

C++ 88 6 Updated Feb 12, 2025

Train transformer language models with reinforcement learning.

Python 11,832 1,592 Updated Feb 20, 2025

My own approach at trying to create a conversational GPT

C++ 1 1 Updated Feb 11, 2025
C++ 1 2 Updated Sep 24, 2024

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.

C 233 24 Updated Feb 13, 2025

C++ implementations for various tokenizers (sentencepiece, tiktoken etc).

C++ 11 2 Updated Feb 20, 2025

Artificial Neural Engine Machine Learning Library

Python 199 4 Updated Feb 16, 2025

Generative AI extensions for onnxruntime

C++ 618 154 Updated Feb 20, 2025

Control Any Computer Using LLMs.

Python 1,772 163 Updated Feb 18, 2025

Swift implementation of Flux.1 using mlx-swift

Swift 76 7 Updated Dec 12, 2024

Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.

Python 342 31 Updated Nov 26, 2024

🦀⚙️ Sudoless performance monitoring for Apple Silicon processors. CPU / GPU / RAM usage, power consumption & temperature 🌡️

Rust 558 19 Updated Feb 15, 2025

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

Python 252 25 Updated Feb 20, 2025

Make websites accessible for AI agents

Python 30,107 3,111 Updated Feb 20, 2025

Fully open reproduction of DeepSeek-R1

Python 20,863 1,823 Updated Feb 20, 2025

Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)

C++ 516 40 Updated Feb 19, 2025

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

TypeScript 2,748 201 Updated Feb 20, 2025

[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Z…

Python 6 Updated Dec 16, 2024

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Python 17,814 2,174 Updated Feb 20, 2025

Mixture-of-Agents Framework Implementation at Distributed Edge Devices with Theoretical Guarantee of Finite Average Latency

Python 5 1 Updated Jan 3, 2025

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

207 9 Updated Dec 31, 2024

Contains libraries for use in making provably-private applications.

Kotlin 32 7 Updated Nov 15, 2024
Jupyter Notebook 188 6 Updated Feb 11, 2025
Python 156 20 Updated Aug 12, 2024

[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents

Python 168 11 Updated Feb 20, 2025