In eval, why generate 64 responses per query to estimate pass@1? #22

Open
Hunter-P opened this issue Jan 21, 2025 · 2 comments

Comments

@Hunter-P

In general, a model generates just one response per query to estimate pass@1, so why generate 64 responses per query to estimate it here?

@kaiyliu

kaiyliu commented Jan 21, 2025

Where does it say to generate 64 responses per query to estimate pass@1?

@bdytx5

bdytx5 commented Jan 21, 2025

Had the same question. Was able to pry this out of ChatGPT:

Why Multiple Samples?

When a model generates a single response, that output is influenced by the inherent randomness of sampling methods such as temperature and top-p. Because of this randomness, a single response might not reliably represent the model's true performance. By generating multiple responses (e.g., 64) for each query, researchers can average over this variability and more accurately estimate the probability that a single sampled response is correct, which is what pass@1 measures.
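
As a rough illustration (a minimal sketch with a made-up per-sample success probability, not numbers from any real model), averaging over 64 samples gives a far less noisy pass@1 estimate than judging from a single sample:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3   # hypothetical probability that one sampled response is correct
n = 64         # responses generated per query

# A single sample yields a 0-or-1 estimate of pass@1, so it is extremely noisy.
one_sample_estimate = float(rng.random() < p_true)

# Averaging over 64 samples yields the fraction correct, with much lower variance.
many_sample_estimate = float(np.mean(rng.random(n) < p_true))

print(one_sample_estimate, many_sample_estimate)  # e.g. 0.0 vs. a value near 0.3
```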

Supporting Research

The paper "Evaluating Large Language Models Trained on Code" (available on arXiv) discusses this approach in detail. The authors explain that to evaluate pass@k, they generate multiple samples per task, count the number of correct samples, and compute an unbiased estimator for pass@k. This method provides a more reliable measure of the model's performance.
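
For reference, the unbiased estimator described in that paper can be sketched like this (a transcription of the formula 1 - C(n - c, k) / C(n, k), not code taken from this repo):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k given n samples per task, c of them correct.

    Computes 1 - C(n - c, k) / C(n, k) in a numerically stable product form.
    """
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 64 samples for one query, 20 of them correct, estimating pass@1
print(pass_at_k(64, 20, 1))  # 0.3125, i.e. 20/64
```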
