AutoRAG-example-tokenizer-benchmark

This is a benchmark of Korean tokenizers for the BM25 retriever. With AutoRAG, you can run this kind of benchmark quickly and easily.
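The benchmark measures how the choice of tokenizer changes BM25 retrieval quality. As a rough illustration, here is a minimal, stdlib-only Okapi BM25 with a pluggable tokenizer. It is not AutoRAG's implementation, and k1=1.5, b=0.75 are assumed common defaults. `toy_particle_tokenize` is a hypothetical and deliberately crude stand-in for morphological analyzers such as Kiwi, Okt, and Kkma: it only strips a few common Korean particles. That is still enough to show why plain whitespace splitting can miss a match such as 서울은 vs. 서울의.

```python
import math
from collections import Counter
from typing import Callable, List

# A few common Korean particles, longest first. Hypothetical toy list, not a real analyzer.
PARTICLES = ("에서", "은", "는", "이", "가", "을", "를", "의", "에")


def space_tokenize(text: str) -> List[str]:
    """Split on whitespace, similar in spirit to the 'space' option in the tables."""
    return text.split()


def toy_particle_tokenize(text: str) -> List[str]:
    """Crude stand-in for a morphological analyzer: strip one trailing particle per word."""
    tokens = []
    for word in text.split():
        for p in PARTICLES:
            if len(word) > len(p) and word.endswith(p):
                word = word[: -len(p)]
                break
        tokens.append(word)
    return tokens


class BM25:
    """Minimal in-memory Okapi BM25 with a pluggable tokenizer (illustrative sketch)."""

    def __init__(self, corpus: List[str], tokenize: Callable[[str], List[str]],
                 k1: float = 1.5, b: float = 0.75):
        self.tokenize, self.k1, self.b = tokenize, k1, b
        self.docs = [Counter(tokenize(doc)) for doc in corpus]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        df = Counter()  # number of documents containing each term
        for d in self.docs:
            df.update(d.keys())
        n = len(self.docs)
        # IDF with +1 inside the log so scores stay non-negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        doc, length = self.docs[i], self.lengths[i]
        total = 0.0
        for term in self.tokenize(query):
            tf = doc.get(term, 0)
            if tf:
                norm = tf + self.k1 * (1 - self.b + self.b * length / self.avgdl)
                total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int) -> List[int]:
        """Indices of the k best-scoring documents; ties keep corpus order."""
        ranked = sorted(range(len(self.docs)), key=lambda i: -self.score(query, i))
        return ranked[:k]


corpus = ["부산은 항구 도시이다", "서울은 한국의 수도이다"]
query = "서울의 위치"
space_bm25 = BM25(corpus, space_tokenize)
toy_bm25 = BM25(corpus, toy_particle_tokenize)
print(space_bm25.top_k(query, 1))  # [0]: no shared tokens, so the tie falls back to corpus order
print(toy_bm25.top_k(query, 1))    # [1]: "서울" matches once particles are stripped
```

Only the tokenizer differs between the two retrievers, which mirrors the idea of the benchmark: the BM25 scoring stays fixed while the way text is split into terms changes what can match.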

Result

top-k 1

Module Name	F1 Score	Recall	Precision	mAP	NDCG
ko_kkma	0.7544	0.7544	0.7544	0.7544	0.7544
ko_kiwi	0.7281	0.7281	0.7281	0.7281	0.7281
space	0.6667	0.6667	0.6667	0.6667	0.6667
ko_okt	0.7982	0.7982	0.7982	0.7982	0.7982
upstage_embed	0.6667	0.6667	0.6667	0.6667	0.6667

top-k 3

Module Name	F1 Score	Recall	Precision	mAP	NDCG
ko_kkma	0.4649	0.9298	0.3099	0.8319	0.8570
ko_kiwi	0.4430	0.8860	0.2953	0.7968	0.8197
space	0.4167	0.8333	0.2778	0.7383	0.7626
ko_okt	0.4781	0.9561	0.3187	0.8684	0.8910
upstage_embed	0.4298	0.8596	0.2865	0.3582	0.4842

top-k 5

Module Name	F1 Score	Recall	Precision	mAP	NDCG
ko_kkma	0.3216	0.9649	0.1930	0.8402	0.8718
ko_kiwi	0.3158	0.9474	0.1895	0.8108	0.8449
space	0.2836	0.8509	0.1702	0.7418	0.7694
ko_okt	0.3216	0.9649	0.1930	0.8706	0.8948
upstage_embed	0.3041	0.9123	0.1825	0.2232	0.3862

top-k 10

Module Name	F1 Score	Recall	Precision	mAP	NDCG
ko_kkma	0.1770	0.9737	0.0974	0.8417	0.8749
ko_kiwi	0.1754	0.9649	0.0965	0.8129	0.8504
space	0.1659	0.9123	0.0912	0.7509	0.7901
ko_okt	0.1786	0.9825	0.0982	0.8731	0.9005
upstage_embed	0.1738	0.9561	0.0956	0.1094	0.2898

top-k 50

Module Name	F1 Score	Recall	Precision	mAP	NDCG
ko_kkma	0.0392	1.0000	0.0200	0.8427	0.8804
ko_kiwi	0.0389	0.9912	0.0198	0.8144	0.8566
space	0.0372	0.9474	0.0189	0.7532	0.7988
ko_okt	0.0392	1.0000	0.0200	0.8743	0.9050
upstage_embed	0.0392	1.0000	0.0200	0.0206	0.1776
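For reading the tables, the sketch below computes the five metrics with textbook binary-relevance definitions for one query's top-k list and then averages over queries. The function names are hypothetical, and AutoRAG's own metric implementations may normalize or handle edge cases differently, so this is not guaranteed to reproduce the exact numbers above.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def retrieval_metrics(retrieved: Sequence[str], relevant: Set[str]) -> Dict[str, float]:
    """Binary-relevance metrics for one query's top-k list (textbook definitions)."""
    k = len(retrieved)
    hits = [1 if doc_id in relevant else 0 for doc_id in retrieved]
    n_hit = sum(hits)
    precision = n_hit / k if k else 0.0
    recall = n_hit / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if n_hit else 0.0
    # Average precision: sum of precision@i at each relevant rank i,
    # divided by the number of relevant documents.
    ap, seen = 0.0, 0
    for i, h in enumerate(hits, start=1):
        if h:
            seen += 1
            ap += seen / i
    ap = ap / len(relevant) if relevant else 0.0
    # NDCG with gain 1 per relevant document and a log2(rank + 1) discount.
    dcg = sum(h / math.log2(i + 1) for i, h in enumerate(hits, start=1))
    idcg = sum(1 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    ndcg = dcg / idcg if idcg else 0.0
    return {"f1": f1, "recall": recall, "precision": precision, "map": ap, "ndcg": ndcg}


def mean_metrics(runs: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average each metric over (retrieved, relevant) pairs, one pair per query."""
    per_query = [retrieval_metrics(r, g) for r, g in runs]
    return {m: sum(q[m] for q in per_query) / len(per_query) for m in per_query[0]}


# Two queries with one relevant passage each at top-k = 1: one hit, one miss.
print(mean_metrics([(["d1"], {"d1"}), (["d5"], {"d2"})]))  # every metric is 0.5
```

Under these definitions, when each query has exactly one relevant passage, a hit at top-k 1 scores 1 on every metric and a miss scores 0, so all five columns coincide, as they do in the top-k 1 table. The other tables are consistent with one relevant passage per query as well, since precision there equals recall / k (for example, 0.9298 / 3 ≈ 0.3099 for ko_kkma at top-k 3).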

Installation

pip install -r requirements.txt

Running the project

Download the dataset to the data folder.
Create a .env file from the .env.template file. (You must specify your Upstage API key.)
Run the evaluator with the following command:

python3 main.py

Check the results in the benchmark folder.

The config file is in the config folder (tokenizer_benchmark.yaml).

You can also specify the project directory if you want.
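The setup steps above ask you to create a .env file from .env.template and add your Upstage API key. The helper below is a hypothetical convenience sketch, not part of this repository: it copies the template if .env does not exist yet and reports keys whose values are still empty. It makes no assumption about the variable names in the template and only understands simple KEY=VALUE lines.

```python
import shutil
from pathlib import Path
from typing import List


def empty_keys(env_text: str) -> List[str]:
    """Return keys from simple KEY=VALUE lines whose value is blank or an empty quoted string."""
    missing = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if not value.strip().strip("\"'"):
            missing.append(key.strip())
    return missing


def prepare_env(project_root: str = ".") -> List[str]:
    """Create .env from .env.template if it is missing, then list keys still left empty."""
    root = Path(project_root)
    env_path = root / ".env"
    if not env_path.exists():
        shutil.copyfile(root / ".env.template", env_path)
    return empty_keys(env_path.read_text(encoding="utf-8"))
```

Run it from the repository root, for example `print(prepare_env())`, and fill in any keys it reports before running `python3 main.py`.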

Marker-Inc-Korea/AutoRAG-example-tokenizer-benchmark

Folders and files

Latest commit

History

Repository files navigation

AutoRAG-example-tokenizer-benchmark

Result

top-k 1

top-k 3

top-k 5

top-k 10

top-k 50

Installation

Running the project

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages