From bd76f1d072b0031a0b29f3599b9df94ae950bf3a Mon Sep 17 00:00:00 2001
From: zclzc <38581401+lkevinzc@users.noreply.github.com>
Date: Tue, 5 Nov 2024 19:48:15 +0800
Subject: [PATCH 1/4] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 9ef2dee..ad3ca97 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/logo.png" height="230" alt="OAT" />
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/logo.png" width=90% alt="OAT" />
 </p>
 
 [![PyPI - Version](https://img.shields.io/pypi/v/oat-llm.svg)](https://pypi.org/project/oat-llm)
@@ -175,4 +175,4 @@ We thank the following awesome projects that have contributed to the development
 
 ## Disclaimer
 
-This is not an official Sea Limited or Garena Online Private Limited product.
\ No newline at end of file
+This is not an official Sea Limited or Garena Online Private Limited product.

From cde0b050920c5f242385d2d4217199066d884be5 Mon Sep 17 00:00:00 2001
From: zclzc <38581401+lkevinzc@users.noreply.github.com>
Date: Tue, 5 Nov 2024 19:50:36 +0800
Subject: [PATCH 2/4] Update README.md

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index ad3ca97..6f1b34e 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ LLM alignment is essentially an online learning and decision making problem wher
 In our [paper](https://arxiv.org/abs/2411.01493), we formalize LLM alignment as a **contextual dueling bandit (CDB)** problem (see illustration below) and propose a sample-efficient alignment approach based on Thompson sampling.
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e0da719024bdc16fb4a993a8405e15cb0cf2b53a/interface.png" height="160"/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e0da719024bdc16fb4a993a8405e15cb0cf2b53a/interface.png" width=75%/>
 </p>
 
 The CDB framework necessitates an efficient online training system to validate the proposed method and compare it with other baselines. Oat 🌾 is developed as part of this research initiative.
@@ -42,7 +42,7 @@ The CDB framework necessitates an efficient online training system to validate t
 Using the CDB framework, existing LLM alignment paradigms can be summarized as follows:
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/acbb25a20dd6c1e7619539b0fa449076ade2f873/compare.png" height="270"/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/acbb25a20dd6c1e7619539b0fa449076ade2f873/compare.png" width=95%/>
 </p>
 
 For more details, please check out our [paper](https://arxiv.org/abs/2411.01493)!
@@ -128,7 +128,7 @@ python -m oat.experiment.main \
 ```
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/example_result.png" height="330"/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/example_result.png" width=60%/>
 </p>
 
 Check out this [tutorial](./examples/) for more examples covering:
@@ -140,11 +140,11 @@ Check out this [tutorial](./examples/) for more examples covering:
 The benchmarking compares oat with the online DPO implementation from [huggingface/trl](https://huggingface.co/docs/trl/main/en/online_dpo_trainer). Below, we outline the configurations used for oat and present the benchmarking results. Notably, oat 🌾 achieves up to **2.5x** computational efficiency compared to trl 🤗.
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/system_configs.png" height="320"/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/system_configs.png" width=90%/>
 </p>
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/bench_results.png" height="300" alt="OAT" />
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/bench_results.png" width=70% />
 </p>
 
 Please refer to [Appendix C of our paper](https://arxiv.org/pdf/2411.01493#page=17.64) for a detailed discussion of the benchmarking methods and results.

From e48203ea940563056b5cc9f4e7cb0114a606474d Mon Sep 17 00:00:00 2001
From: zclzc <38581401+lkevinzc@users.noreply.github.com>
Date: Tue, 5 Nov 2024 19:52:53 +0800
Subject: [PATCH 3/4] Update README.md

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 6f1b34e..f8d8260 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ LLM alignment is essentially an online learning and decision making problem wher
 In our [paper](https://arxiv.org/abs/2411.01493), we formalize LLM alignment as a **contextual dueling bandit (CDB)** problem (see illustration below) and propose a sample-efficient alignment approach based on Thompson sampling.
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e0da719024bdc16fb4a993a8405e15cb0cf2b53a/interface.png" width=75%/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e0da719024bdc16fb4a993a8405e15cb0cf2b53a/interface.png" width=80%/>
 </p>
 
 The CDB framework necessitates an efficient online training system to validate the proposed method and compare it with other baselines. Oat 🌾 is developed as part of this research initiative.
@@ -128,7 +128,7 @@ python -m oat.experiment.main \
 ```
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/example_result.png" width=60%/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/example_result.png" width=50%/>
 </p>
 
 Check out this [tutorial](./examples/) for more examples covering:
@@ -140,11 +140,11 @@ Check out this [tutorial](./examples/) for more examples covering:
 The benchmarking compares oat with the online DPO implementation from [huggingface/trl](https://huggingface.co/docs/trl/main/en/online_dpo_trainer). Below, we outline the configurations used for oat and present the benchmarking results. Notably, oat 🌾 achieves up to **2.5x** computational efficiency compared to trl 🤗.
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/system_configs.png" width=90%/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/system_configs.png" width=95%/>
 </p>
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/bench_results.png" width=70% />
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/bench_results.png" width=60% />
 </p>
 
 Please refer to [Appendix C of our paper](https://arxiv.org/pdf/2411.01493#page=17.64) for a detailed discussion of the benchmarking methods and results.

From 451ebfafe5925a580cfb8b5068af772cc934ba7a Mon Sep 17 00:00:00 2001
From: zclzc <38581401+lkevinzc@users.noreply.github.com>
Date: Tue, 5 Nov 2024 19:53:55 +0800
Subject: [PATCH 4/4] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index f8d8260..22e842d 100644
--- a/README.md
+++ b/README.md
@@ -128,7 +128,7 @@ python -m oat.experiment.main \
 ```
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/example_result.png" width=50%/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/example_result.png" width=55%/>
 </p>
 
 Check out this [tutorial](./examples/) for more examples covering:
@@ -140,11 +140,11 @@ Check out this [tutorial](./examples/) for more examples covering:
 The benchmarking compares oat with the online DPO implementation from [huggingface/trl](https://huggingface.co/docs/trl/main/en/online_dpo_trainer). Below, we outline the configurations used for oat and present the benchmarking results. Notably, oat 🌾 achieves up to **2.5x** computational efficiency compared to trl 🤗.
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/system_configs.png" width=95%/>
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/system_configs.png" width=97%/>
 </p>
 
 <p align="center">
-  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/bench_results.png" width=60% />
+  <img src="https://gist.githubusercontent.com/lkevinzc/98afee30a5141d7068a0b35a88901a31/raw/e23f40d33e8a2fa4220e8122c152b356084b8afb/bench_results.png" width=65% />
 </p>
 
 Please refer to [Appendix C of our paper](https://arxiv.org/pdf/2411.01493#page=17.64) for a detailed discussion of the benchmarking methods and results.