Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language
Word embeddings play a central role in today's Natural Language Processing (NLP) tasks and applications. However, high-quality word embeddings specific to the Italian medical domain remain scarce. This study addresses this gap by proposing a tailored solution that combines Contrastive Learning (CL) methods and Knowledge Graph Embedding (KGE), introducing a new variant of the loss function. Given the limited availability of medical texts and controlled vocabularies in Italian, traditional approaches to word-embedding generation may not yield adequate results. To overcome this challenge, our approach leverages the complementary strengths of CL and KGE techniques. We achieve a significant performance boost over the initial model while using considerably less data. This work establishes a solid foundation for further research aimed at improving the accuracy and coverage of word embeddings in low-resource languages and specialized domains.
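To give a sense of how a contrastive objective and a KGE objective can be combined into a single loss, the following is a minimal, illustrative sketch. It is an assumption, not the paper's actual formulation: it pairs a standard InfoNCE contrastive loss with a TransE-style margin loss and mixes them with a hypothetical weight `alpha`; the paper's new loss variant may differ in all of these choices.

```python
import math

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, tau=0.1):
    # Standard InfoNCE contrastive loss: pull the anchor embedding toward
    # its positive and push it away from the negatives (tau = temperature).
    logits = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    m = max(logits)  # stabilize log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_sum)

def transe_margin(head, rel, tail, neg_tail, margin=1.0):
    # TransE-style KGE loss: the translation h + r should land closer to the
    # true tail t than to a corrupted tail t'.
    d_pos = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))
    d_neg = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, neg_tail)))
    return max(0.0, margin + d_pos - d_neg)

def combined_loss(anchor, positive, negatives,
                  head, rel, tail, neg_tail, alpha=0.5):
    # Weighted sum of the two objectives; the weighting scheme is an
    # assumption made for this sketch.
    return (alpha * info_nce(anchor, positive, negatives)
            + (1.0 - alpha) * transe_margin(head, rel, tail, neg_tail))
```

In such a setup, the contrastive term learns from raw text pairs while the KGE term injects relational structure from a controlled vocabulary, which is one way to compensate for scarce in-domain corpora.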
The data and the pretrained model are available at drive.google.com/medita_embeddings