Finetuning Qwen2.5-0.5B on QA and Preference Datasets
This project fine-tunes the Qwen2.5-0.5B model, a 0.5-billion-parameter language model, on question-answering (QA) data and applies Direct Preference Optimization (DPO) to improve the quality of its responses. Training uses LoRA (Low-Rank Adaptation) and quantization to reduce VRAM usage, making fine-tuning feasible on limited hardware.