Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding

About MultiUI

MultiUI is a dataset of 7.3 million samples spanning various UI types and tasks, structured using enhanced accessibility trees and task taxonomies.
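The sketch below illustrates the shape of one such sample: a webpage screenshot paired with a task derived from the page's accessibility tree. The field names are assumptions for illustration, not the official schema.

# Hypothetical MultiUI-style sample (illustrative only; field names are assumptions)
sample = {
    "image": "screenshots/example_page.png",  # rendered webpage screenshot
    "task_type": "element_grounding",         # one entry in the task taxonomy
    "instruction": "Return the bounding box of the 'Sign in' button.",
    "answer": "[0.82, 0.05, 0.95, 0.10]",
}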

Repository Structure

This repository is divided into two parts:

  • Train: contains training code for LLaVA-OneVision, the base model we used.

  • Evaluation: contains evaluation code for all benchmarks we tested in the paper.

Dataset Download

  • MultiUI: Download our 7.3-million-sample training dataset from Hugging Face.
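A minimal download sketch using the huggingface_hub client; the repo id neulab/MultiUI and the local path are assumptions, so substitute the dataset id linked from this README:

# Sketch: fetch the MultiUI training data from the Hugging Face Hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="neulab/MultiUI",   # assumed repo id; check the dataset page
    repo_type="dataset",
    local_dir="data/MultiUI",   # hypothetical local path
)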

Model Checkpoints

| Model Name    | LLM               | Vision Tower              | Checkpoint                |
|---------------|-------------------|---------------------------|---------------------------|
| UIX-Qwen2     | Qwen2-7B-Instruct | siglip-so400m-patch14-384 | neulab/UIX-Qwen2          |
| UIX-Qwen2-M2W | Qwen2-7B-Instruct | siglip-so400m-patch14-384 | neulab/UIX-Qwen2-Mind2Web |
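To fetch a checkpoint locally before evaluation, a minimal sketch with huggingface_hub (the local path is an arbitrary choice, not a path the evaluation scripts require):

# Sketch: download a released checkpoint for local evaluation.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="neulab/UIX-Qwen2",         # or "neulab/UIX-Qwen2-Mind2Web"
    local_dir="checkpoints/UIX-Qwen2",  # hypothetical local path
)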

Run Evaluation

VisualWebBench

To evaluate on VisualWebBench-related tasks:

cd eval/VisualWebBench
bash run.sh

lmms-eval-MultiUI

We evaluate GUI understanding and grounding benchmarks (WebSRC, ScreenQA-short, WidgetCap, ScreenSpot, RefExp), OCR-, document-, and chart-related QA benchmarks (DocVQA, ChartQA, TextVQA, InfoVQA, VisualMRC, OCRBench), and the general grounding benchmark RefCOCO+ with the lmms-eval framework.

To evaluate these datasets:

cd eval/lmms-eval-MultiUI
model=MODEL_NAME
model_type=MODEL_TYPE
task=TASK_NAME  # the lmms-eval task id for one of the benchmarks above
python3 -m accelerate.commands.launch \
         --num_processes=8 \
         -m lmms_eval \
         --model $model_type \
         --model_args pretrained=$model,conv_template=qwen_2 \
         --tasks ${task} \
         --batch_size 1 \
         --log_samples \
         --log_samples_suffix ${task} \
         --output_path eval_logs

Mind2Web Evaluation

Download our processed Mind2Web evaluation dataset from Hugging Face and place it under eval/Mind2Web-SeeAct/src/offline_experiments/screenshot_generation/data.
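A minimal sketch of that step with huggingface_hub; the repo id below is a placeholder, so substitute the dataset id linked from this README:

# Sketch: download the processed Mind2Web evaluation data into the expected folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="MIND2WEB_EVAL_DATASET_ID",  # placeholder, not a real repo id
    repo_type="dataset",
    local_dir="eval/Mind2Web-SeeAct/src/offline_experiments/screenshot_generation/data",
)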

Run inference

cd eval/Mind2Web-SeeAct/src/offline_experiments/

python eval_m2w.py \
--model_name MODEL_NAME \
--model_path MODEL_PATH \
--task_types test_task  # one of: test_task, test_website, test_domain

Calculate metrics

python ./action_generation/metric.py

Dataset Disclaimer

The MultiUI dataset is released for open-source use by the research and developer community. The data is largely sourced from publicly available web content or generated by large language models (LLMs). We constructed this dataset using links from Hugging Face’s FineWeb dataset, which is based on a Common Crawl dump, representing publicly accessible data from the web.

This dataset is intended primarily for research purposes; it may contain inaccuracies, biases, or other unintended issues. We do not intentionally include any copyrighted material, and any resemblance to such content is unintentional.

If you have any concerns regarding specific data or believe that any content should be removed, please contact us, and we will review the request and take appropriate action.