Any relevant documentation from the QA run should go in this folder.
- Basic info about the run in `qa.json` (a sketch follows this list)
- Progress log with timestamps in `progress.md` (example below)
- Review from the QA tester in `review.md`
- Any other relevant files (e.g. notes from the QA tester, files they created, results from scoring)
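
As a point of reference, `qa.json` might look like the sketch below. No schema is prescribed here, so every field name and value is an assumption; record whatever basic facts identify the run:

```json
{
  "task": "example-task-name",
  "participant": "anonymized-id-01",
  "started_at": "2024-05-01T09:00:00Z",
  "completed_at": "2024-05-01T14:30:00Z",
  "environment": "qa-vm-ubuntu-22.04"
}
```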
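
Similarly, `progress.md` only needs timestamped entries; the format below is illustrative, not required:

```markdown
## 2024-05-01
- 09:00 Read the task instructions and set up the environment
- 10:15 Hit a dependency error on the first attempt; investigating
- 11:40 Resolved the error and resumed the main task
```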
This file is a good place for any miscellaneous information about the QA run. For example:
- Skill level of the human participant in the relevant domain
- Context they had about the task
- Which AI tools and models they used
- If the task involves waiting for something (e.g. training an ML model), a rough estimate of how long that took
- Any discrepancies between the QA environment and the environment an LLM agent would have
- If the task is automatically scored: instructions to run the scoring procedure on the participant's solution (see the example after this list)
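
For automatically scored tasks, the scoring instructions can be as short as a single command. The script name and flags below are hypothetical placeholders for whatever the task actually provides:

```sh
# Hypothetical example: substitute the task's real scoring script and paths.
python score.py --solution ./participant-solution/ --output results.json
```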