This repository contains the code and results of our project on automatic metaphor detection using the BERT-base-multilingual-cased model. The goal of the project was to classify sentences as metaphorical or non-metaphorical.
We used two datasets for this project, one with single-annotator labels and the other with labels from multiple annotators. The datasets contain sentences in Italian and English. The label distribution is skewed towards positive labels in the single-annotator dataset and towards negative labels in the multi-annotator dataset. A subset of the multi-annotator dataset also has multiclass labels for 'migrant', 'covid', or 'none'.
- Dataset 1: This dataset consists of sentences annotated with binary labels for metaphor detection. Each sentence is annotated by a single annotator, and the labels are skewed towards positive labels, i.e., the presence of a metaphor.
Metaphor | Count |
---|---|
Yes | 172 |
None | 47 |
- Dataset 2: This dataset is a multi-annotator dataset with sentences annotated by 2 to 5 annotators. The labels include binary classifications for metaphor detection as well as multiclass labels indicating the type of metaphor. The binary labels are skewed towards negative labels, i.e., the absence of a metaphor, and there are 41 instances where the annotations are evenly split, so no agreement can be established by majority vote (a small aggregation sketch follows the table below).
aggregated_label_binary | Count |
---|---|
No | 407 |
Yes | 156 |
Equal split: yes, no | 41 |
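As a rough illustration of how the aggregated binary label can be derived from the individual annotations (the function and its input format are assumptions for the sketch, not the actual dataset schema), a majority vote might look like this:

```python
from collections import Counter

def aggregate_binary(annotations):
    """Majority vote over a list of 'yes'/'no' annotations from 2-5 annotators.

    Returns 'yes' or 'no', or None when the vote is evenly split
    (the 41 'Equal split' cases in the table above).
    """
    counts = Counter(a.strip().lower() for a in annotations if a)
    if counts["yes"] > counts["no"]:
        return "yes"
    if counts["no"] > counts["yes"]:
        return "no"
    return None  # tie: no majority can be established

print(aggregate_binary(["yes", "no", "yes"]))  # -> 'yes'
print(aggregate_binary(["yes", "no"]))         # -> None (equal split)
```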
To tackle metaphor detection as a binary classification problem, we merge both datasets. Because the two datasets are skewed in opposite directions, merging lets them complement each other. After merging, the label distribution is as follows:
Label | Count |
---|---|
No | 454 |
Yes | 328 |
For training and evaluating the model, we perform random undersampling to balance the dataset, and we use an 80-10-10 random stratified train-dev-test split with a fixed seed for reproducibility.
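A minimal sketch of this preprocessing with pandas and scikit-learn (the file names, column names, label values, and seed are assumptions for illustration, not the project's actual configuration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 42  # fixed seed for reproducibility (the concrete value is an assumption)

# Hypothetical file and column names; the real dataset schema may differ.
df_single = pd.read_csv("dataset1.csv")  # columns: text, label ('yes'/'no')
df_multi = pd.read_csv("dataset2.csv")   # columns: text, label (majority-voted)
df = pd.concat([df_single, df_multi], ignore_index=True)

# Map string labels to integers (assumed label values): no = 0, yes = 1.
df["label"] = df["label"].str.lower().map({"no": 0, "yes": 1})

# Random undersampling: downsample the majority class to the minority-class size.
min_count = df["label"].value_counts().min()
balanced = (
    df.groupby("label", group_keys=False)
      .apply(lambda g: g.sample(n=min_count, random_state=SEED))
      .reset_index(drop=True)
)

# 80-10-10 random stratified train/dev/test split.
train, rest = train_test_split(
    balanced, test_size=0.2, stratify=balanced["label"], random_state=SEED
)
dev, test = train_test_split(
    rest, test_size=0.5, stratify=rest["label"], random_state=SEED
)
```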
We used the 'bert-base-multilingual-cased' model, fine-tuned on the 80-10-10 random stratified train-dev-test split described above. We performed a hyperparameter search and then 10 training runs with randomly initialized classification heads.
The training procedure uses the HuggingFace Transformers library and consists of the following steps (a code sketch follows the list):
- Initialization: We start with a pre-trained BERT model, specifically 'bert-base-multilingual-cased'. This model has been trained on a large corpus of multilingual data and has learned to generate useful language representations.
- Adding a classification head: On top of the pre-trained BERT model, we add a new classification layer. This head is a simple linear layer followed by a softmax activation function, which outputs class probabilities.
- Fine-tuning: We fine-tune the entire model (both the pre-trained BERT part and the new classification head) on our specific task using our training dataset.
- Hyperparameter search: We search over possible hyperparameter values (learning rate, number of training epochs, batch size, etc.) to find the combination that performs best on the development set.
- Multiple runs: To account for variance due to the randomness in neural network training, we repeat the training process with the best hyperparameters 10 times. Each run starts with a different randomly initialized head.
- Aggregating results: At the end, we have 10 slightly different models and their performance on the development set. We report the average performance and standard deviation across these 10 runs to get a robust estimate of the model's performance.
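The sketch below condenses these steps with the HuggingFace Transformers Trainer. It reuses the `train` and `dev` splits from the preprocessing sketch above; the hyperparameter values are placeholders standing in for those found by the search, and the output directory name is hypothetical.

```python
import numpy as np
from datasets import Dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, set_seed)

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Integer labels (no = 0, yes = 1) as prepared in the preprocessing sketch.
train_ds = Dataset.from_pandas(train).map(tokenize, batched=True)
dev_ds = Dataset.from_pandas(dev).map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}

# Repeat training 10 times; each run gets a different seed, so the
# classification head added on top of mBERT is initialized differently.
dev_f1 = []
for seed in range(10):
    set_seed(seed)  # seed before model creation so the head init varies
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=2)  # adds a fresh, randomly initialized head
    args = TrainingArguments(
        output_dir=f"metaphor-bert-seed{seed}",
        learning_rate=2e-5,              # placeholder hyperparameters,
        num_train_epochs=3,              # standing in for the values found
        per_device_train_batch_size=16,  # by the search on the dev set
        seed=seed,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=dev_ds,
                      tokenizer=tokenizer, compute_metrics=compute_metrics)
    trainer.train()
    dev_f1.append(trainer.evaluate()["eval_f1"])

# Aggregate the 10 runs into a mean and standard deviation.
print(f"dev F1: {np.mean(dev_f1):.4f} +/- {np.std(dev_f1):.4f}")
```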
The results for the binary classification problem were as follows:
- Undersampling:
  - F1 score: 76.91
  - Accuracy: 77.95
We have also implemented a baseline using TF-IDF features and a Support Vector Machine classifier, which achieves an F1 score of 75.31.
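For reference, such a baseline can be put together in a few lines with scikit-learn; this sketch reuses the splits from the preprocessing sketch above, and LinearSVC with default settings is one possible SVM choice, not necessarily the configuration behind the reported score.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# TF-IDF features fed into a linear SVM, trained and evaluated on the
# 'train' and 'test' splits (integer labels: no = 0, yes = 1).
baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
baseline.fit(train["text"], train["label"])
preds = baseline.predict(test["text"])
print("F1:", f1_score(test["label"], preds))
```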
- Extending the current dataset would significantly aid the learning process and help exploit the strengths of neural networks.
- 'covid' metaphors may be too scarce to obtain a comparably large sample.