prompts
:
de_gpt4_end2end_prompt_utils.py
: prompts used for Section 3 in the paperde_prompt_utils.py
: prompts for German atomic edit extraction and explanation generationzh_prompt_utils.py
: prompts for Chinese atomic edit extraction and explanation generation
fine-tune_llama2-7b
:
fine-tune_llama2-7b.sh
: parameters for fine-tuning the modelqlora.py
: see source code here
rule_based_screening.py
: the heuristic rules for screening out low-level mistakes in atomic edit extraction
SequenceMatcher_rough_edits.py
: use SequenceMatcher from difflib to extract rough edits
fine-tune_data
: the training and test data of LLM fine-tuning for German and Chinese atomic edit extraction. The data is in the format for fine-tuning ChatGPT. Sentence pair
is the source and target sentence; list of edits
are the rough edits extracted by SequenceMatcher; list of labels
are the labels of the edits; content
is the gold atomic edits.
human_annotation_data
: the anonymized raw human annotation data
We modify the paragraph aligner from here to align sentences in the datasets.