Overestimated results.
Use ROUGE-L score instead of cosine similarity
Cosine similarity is used, e.g., to estimate prompt recovery quality
ML task setting: metric choice for model scoring
Kaggle "LLM Prompt Recovery" competition, KHOI NGUYEN solution
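
A minimal sketch of why cosine similarity can overestimate quality where ROUGE-L does not. It is illustrative only and is not taken from the referenced solution. Assumptions: a bag-of-words cosine stands in for whatever embedding-based cosine is actually used for scoring, tokenization is plain whitespace splitting, and the helper names (lcs_length, rouge_l_f1, bow_cosine) and the example strings are hypothetical.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def bow_cosine(reference, candidate):
    """Cosine similarity of word-count vectors (assumed stand-in for embeddings)."""
    a = Counter(reference.lower().split())
    b = Counter(candidate.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical example: the candidate has the same words in reversed order.
reference = "rewrite this text as a pirate shanty"
candidate = "shanty pirate a as text this rewrite"

print(f"cosine (bag of words): {bow_cosine(reference, candidate):.3f}")
print(f"ROUGE-L F1:            {rouge_l_f1(reference, candidate):.3f}")
```

In this toy case the word-count cosine is essentially 1.0 because it ignores word order, while ROUGE-L F1 drops to 1/7 because only a single token can be matched in order. Embedding-based cosine is not identical to this stand-in, so the size of the gap on real data is not implied here.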