Hard to reproduce the metrics of VES and R-VES. #11

Open
JimXiongGM opened this issue Aug 3, 2024 · 1 comment

Comments

@JimXiongGM

The Valid Efficiency Score (VES) and the Reward-based Valid Efficiency Score (R-VES) are calculated based on the execution time of SQL queries, which means the results depend on the computer hardware. Therefore, it is recommended to release the submitters' DEV prediction files on the official leaderboard so that researchers can make fair comparisons on their own machines.
Thank you.

@bird-bench
Owner

@JimXiongGM Thanks for your interest in our work. That makes sense; we have already added this to the submission guidelines as required files. For previous submissions, you will still need to request the dev files from their authors. That said, although execution time does depend on hardware, we already mitigate this by repeating each query 100 times, removing outliers, and setting a ceiling for each. But this is a nice suggestion. Thanks!
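For readers who want to try this on their own machine, here is a minimal sketch of the mitigation described above: repeated timing, outlier removal, and a ceiling. It is not the official evaluation script. The helper names, the 3-standard-deviation outlier rule, the `ceiling=2.0` value, and the choice to cap the gold/predicted time ratio are all assumptions made for illustration.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, repeats=100):
    """Run `sql` `repeats` times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        durations.append(time.perf_counter() - start)
    return durations


def drop_outliers(values, k=3.0):
    """Keep values within k standard deviations of the mean (assumed rule)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    kept = [v for v in values if abs(v - mean) <= k * std]
    # Fall back to the raw values if the filter removed everything.
    return kept or list(values)


def capped_time_ratio(gold_times, pred_times, ceiling=2.0):
    """Mean gold time over mean predicted time, after outlier removal.

    The ratio is capped at `ceiling` (assumed value) so that one very
    fast predicted query cannot dominate an average over many examples.
    """
    gold = statistics.mean(drop_outliers(gold_times))
    pred = statistics.mean(drop_outliers(pred_times))
    if pred == 0:
        return ceiling
    return min(gold / pred, ceiling)


if __name__ == "__main__":
    # Hypothetical toy database; real use would open a benchmark database file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    gold = time_query(conn, "SELECT COUNT(*) FROM t")
    pred = time_query(conn, "SELECT COUNT(*) FROM t WHERE x >= 0")
    print(capped_time_ratio(gold, pred))
```

Even with these steps the ratio still varies somewhat between machines, which is why having the prediction files available for local re-timing remains useful.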
