From d167b4724a79aabb1a8598b5856d4445e54aed77 Mon Sep 17 00:00:00 2001
From: Saurabh Srivastava
Date: Sat, 25 Jan 2025 14:36:03 -0600
Subject: [PATCH] Updated readme

---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ee8514e..6c38aaa 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,13 @@ We appreciate your understanding and patience as we work to ensure the best poss
 
 ## Overview
 
-UI-TARS is a next-generation native GUI agent model designed to interact seamlessly with graphical user interfaces (GUIs) using human-like perception, reasoning, and action capabilities. Unlike traditional modular frameworks, UI-TARS integrates all key components—perception, reasoning, grounding, and memory—within a single vision-language model (VLM), enabling end-to-end task automation without predefined workflows or manual rules.
+UI-TARS is a next-generation native GUI agent model that enables seamless interaction with graphical user interfaces (GUIs). It combines **perception, reasoning, grounding, and memory** into a single vision-language model (VLM), allowing for end-to-end task automation without predefined workflows or manual rules.
+
+Key Highlights:
+- **Human-like interaction**: Mimics human perception, reasoning, and action.
+- **Unified framework**: Integrates all components into a single model.
+- **Cross-platform support**: Works across desktop, mobile, and web environments.
+
 ![Local Image](figures/UI-TARS-vs-Previous-SOTA.png)
 ![Local Image](figures/UI-TARS.png)