This repository contains the data used by the paper "Automated Repair of Code from Language Models". The repository is organized into three top-level folders, described below:
- APR_Patches
- Defects_Classifications
- LMDefects
The LMDefects folder contains the LMDefects dataset, split into two subfolders: Codex_Generated_Solutions and Codex_Generated_Solutions_Ground_Truth. The former holds the solutions as originally generated by Codex; the latter holds fixed versions of those solutions, where such a fix was found.
The APR_Patches folder contains the fault localization information used by both Recoder and TBar, the patches generated by those two tools, and the patches generated by the three versions of Codex-e we used.
The Defects_Classifications sheet contains our classification of each solution: whether it compiles, whether it is plausible, and the type of fix it needs.
.
├── APR_Patches // Contains all patches generated by APR tools
│ ├── Codex_Edit_Patches
│ │ ├── Codex_Edit_Bug // All correct patches produced by Codex_e_bug
│ │ ├── Codex_Edit_Line // All correct patches produced by Codex_e_line
│ │ ├── Codex_Edit_Stmt // All correct patches produced by Codex_e_stmt
│ │ └── raw_data // All patches produced by Codex Edit Mode
│ ├── Codex_fl_result // Fault localization info used by APR tools
│ ├── Recoder_Patches
│ └── TBar_Patches
├── Defects_Classifications // Defect category classifications for the incorrect solutions produced by Codex
├── LMDefects
│ ├── Codex_Generated_Solutions // All solutions generated by Codex
│ └── Codex_Generated_Solutions_Ground_Truth // All constructed ground truth
└── README.md
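
As a quick sanity check after cloning, the layout above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `summarize_layout` is hypothetical, the folder paths are copied from the tree above, and Defects_Classifications is left out because it is described as a sheet rather than a folder. No assumption is made about file names or formats inside each folder; the script only counts files.

```python
from pathlib import Path

# Folder paths copied from the directory tree in this README.
# Everything else in this script is illustrative.
EXPECTED_DIRS = [
    "APR_Patches/Codex_Edit_Patches/Codex_Edit_Bug",
    "APR_Patches/Codex_Edit_Patches/Codex_Edit_Line",
    "APR_Patches/Codex_Edit_Patches/Codex_Edit_Stmt",
    "APR_Patches/Codex_Edit_Patches/raw_data",
    "APR_Patches/Codex_fl_result",
    "APR_Patches/Recoder_Patches",
    "APR_Patches/TBar_Patches",
    "LMDefects/Codex_Generated_Solutions",
    "LMDefects/Codex_Generated_Solutions_Ground_Truth",
]


def summarize_layout(root):
    """Map each expected folder to its file count, or None if it is missing."""
    root = Path(root)
    summary = {}
    for rel in EXPECTED_DIRS:
        folder = root / rel
        if folder.is_dir():
            # Count regular files at any depth below the folder.
            summary[rel] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            summary[rel] = None
    return summary


if __name__ == "__main__":
    # Run from the repository root.
    for rel, count in summarize_layout(".").items():
        status = "missing" if count is None else f"{count} files"
        print(f"{rel}: {status}")
```

Run from the repository root, it prints one line per folder, which makes an incomplete checkout easy to spot.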