From 9f62da60e489dc17ec047602c51a95a4372f5adb Mon Sep 17 00:00:00 2001
From: zclzc <38581401+lkevinzc@users.noreply.github.com>
Date: Tue, 5 Nov 2024 19:56:42 +0800
Subject: [PATCH] Fix image width (#8)

* Update README.md
* Update README.md
* Update README.md
* Update README.md
---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 9ef2dee..22e842d 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@

- OAT
+ OAT

 [![PyPI - Version](https://img.shields.io/pypi/v/oat-llm.svg)](https://pypi.org/project/oat-llm)
@@ -34,7 +34,7 @@ LLM alignment is essentially an online learning and decision making problem wher
 In our [paper](https://arxiv.org/abs/2411.01493), we formalize LLM alignment as a **contextual dueling bandit (CDB)** problem (see illustration below) and propose a sample-efficient alignment approach based on Thompson sampling.

-
+

 The CDB framework necessitates an efficient online training system to validate the proposed method and compare it with other baselines. Oat 🌾 is developed as part of this research initiative.
@@ -42,7 +42,7 @@ The CDB framework necessitates an efficient online training system to validate t
 Using the CDB framework, existing LLM alignment paradigms can be summarized as follows:

-
+

 For more details, please check out our [paper](https://arxiv.org/abs/2411.01493)!
@@ -128,7 +128,7 @@ python -m oat.experiment.main \
 ```

-
+

 Check out this [tutorial](./examples/) for more examples covering:
@@ -140,11 +140,11 @@ Check out this [tutorial](./examples/) for more examples covering:
 The benchmarking compares oat with the online DPO implementation from [huggingface/trl](https://huggingface.co/docs/trl/main/en/online_dpo_trainer). Below, we outline the configurations used for oat and present the benchmarking results. Notably, oat 🌾 achieves up to **2.5x** computational efficiency compared to trl 🤗.

-
+

- OAT
+

 Please refer to [Appendix C of our paper](https://arxiv.org/pdf/2411.01493#page=17.64) for a detailed discussion of the benchmarking methods and results.
@@ -175,4 +175,4 @@ We thank the following awesome projects that have contributed to the development
 ## Disclaimer
-This is not an official Sea Limited or Garena Online Private Limited product.
\ No newline at end of file
+This is not an official Sea Limited or Garena Online Private Limited product.