Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis
The dataset is named "Motamot" (মতামত), the Bengali word for "opinion". It was compiled from a range of online newspapers covering political events and conversations during Bangladeshi elections. We collected the data by scraping articles and opinion pieces from reputable news sources, aiming for a diverse and representative sample of political discourse. "Motamot" offers a broad view of the opinions and conversations that shape Bangladesh's political environment. The dataset can be accessed from here.
Samples | Train | Test | Validation |
---|---|---|---|
Total | 5647 | 706 | 705 |
Positive | 3306 | 413 | 413 |
Negative | 2341 | 293 | 292 |

Train | Positive | Negative |
---|---|---|
Count | 3306 | 2341 |

Test | Positive | Negative |
---|---|---|
Count | 413 | 293 |

Validation | Positive | Negative |
---|---|---|
Count | 413 | 292 |
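
The split sizes above imply a fairly stable class ratio across splits. The following minimal sketch recomputes that ratio, together with the accuracy of a trivial classifier that always predicts the majority class, which gives a floor for reading the model results. The counts are copied from the tables above; the function and variable names are illustrative and not part of the dataset release.

```python
# Class counts copied from the dataset statistics tables above.
SPLITS = {
    "train": {"positive": 3306, "negative": 2341},
    "test": {"positive": 413, "negative": 293},
    "validation": {"positive": 413, "negative": 292},
}


def positive_share(counts):
    """Fraction of samples labelled positive in one split."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total


def majority_baseline_accuracy(counts):
    """Accuracy of a classifier that always predicts the larger class."""
    total = counts["positive"] + counts["negative"]
    return max(counts.values()) / total


for name, counts in SPLITS.items():
    total = sum(counts.values())
    print(f"{name:<10} total={total:<5} positive_share={positive_share(counts):.4f}")

print(f"majority baseline on test: {majority_baseline_accuracy(SPLITS['test']):.4f}")
```

Each split is roughly 58.5% positive, so always predicting "Positive" would already score about 0.585 accuracy on the test split.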
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
BanglaBERT | 0.8204 | 0.8222 | 0.8204 | 0.8203 |
Bangla BERT Base | 0.6803 | 0.6907 | 0.6812 | 0.6833 |
DistilBERT | 0.6320 | 0.6358 | 0.6320 | 0.6317 |
mBERT | 0.6427 | 0.6496 | 0.6428 | 0.6153 |
sahajBERT | 0.6708 | 0.6791 | 0.6709 | 0.6707 |
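
The four metrics in the table can be computed from predicted and true labels as in the sketch below. The source does not state which averaging scheme was used for precision, recall, and F1. Recall tracks accuracy closely in every row, which is what support-weighted averaging would produce (weighted recall is mathematically equal to accuracy), so the sketch assumes weighted averaging. The function name and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true))
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_i = tp / predicted if predicted else 0.0
        r_i = tp / support[label]
        f_i = 2 * p_i * r_i / (p_i + r_i) if (p_i + r_i) else 0.0
        weight = support[label] / n  # assumption: weight each class by its support
        precision += weight * p_i
        recall += weight * r_i
        f1 += weight * f_i
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not samples from Motamot.
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "pos"]
print(weighted_metrics(y_true, y_pred))
```

If the reported numbers were instead macro-averaged, the per-class values would be averaged with equal weights rather than by support.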
LLMs | Metric | Zero-shot | 5-shot | 10-shot | 15-shot |
---|---|---|---|---|---|
GPT 3.5 Turbo | Accuracy | 0.8500 | 0.8900 | 0.9133 | 0.9400 |
GPT 3.5 Turbo | Precision | 0.8467 | 0.8867 | 0.9200 | 0.9467 |
GPT 3.5 Turbo | Recall | 0.8533 | 0.8926 | 0.9079 | 0.9342 |
GPT 3.5 Turbo | F1-Score | 0.8495 | 0.8896 | 0.9139 | 0.9404 |
Gemini 1.5 Pro | Accuracy | 0.8608 | 0.8981 | 0.9200 | 0.9633 |
Gemini 1.5 Pro | Precision | 0.8931 | 0.8846 | 0.9333 | 0.9667 |
Gemini 1.5 Pro | Recall | 0.8477 | 0.9205 | 0.9091 | 0.9603 |
Gemini 1.5 Pro | F1-Score | 0.8698 | 0.9022 | 0.9211 | 0.9635 |
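
The zero-shot and k-shot columns differ only in how many labelled demonstrations precede the text to classify. The sketch below shows one way such prompts could be assembled and how a model reply could be mapped back to a label. The prompt wording, the helper names `build_prompt` and `parse_label`, and the placeholder texts are assumptions for illustration; the actual prompts used for GPT 3.5 Turbo and Gemini 1.5 Pro are not given here, and no model API is called.

```python
def build_prompt(text, examples=()):
    """Assemble a k-shot classification prompt, where k = len(examples)."""
    lines = [
        "Classify the sentiment of the Bengali political text as Positive or Negative.",
        "",
    ]
    # Each demonstration is a (text, label) pair shown before the query.
    for example_text, label in examples:
        lines += [f"Text: {example_text}", f"Sentiment: {label}", ""]
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)


def parse_label(reply):
    """Map a free-form model reply to 'Positive', 'Negative', or None."""
    lowered = reply.strip().lower()
    if lowered.startswith("positive"):
        return "Positive"
    if lowered.startswith("negative"):
        return "Negative"
    return None


# Placeholder demonstrations; real runs would draw these from the training split.
demos = [("<training text 1>", "Positive"), ("<training text 2>", "Negative")]
print(build_prompt("<test text>", demos))
```

Calling `build_prompt(text)` with no examples yields the zero-shot form; passing 5, 10, or 15 pairs yields the corresponding few-shot forms.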
For any questions, collaboration opportunities, or further inquiries, please feel free to reach out:
- Fatema Tuj Johora Faria
  - Email: [email protected]
- Mukaffi Bin Moin
  - Email: [email protected]
- Rabeya Islam Mumu
  - Email: [email protected]
- Md Mahabubul Alam Abir
  - Email: [email protected]
- Abrar Nawar Alfy
  - Email: [email protected]
@INPROCEEDINGS{10752197,
author={Johora Faria, Fatema Tuj and Moin, Mukaffi Bin and Mumu, Rabeya Islam and Alam Abir, Md Mahabubul and Alfy, Abrar Nawar and Alam, Mohammad Shafiul},
booktitle={2024 IEEE Region 10 Symposium (TENSYMP)},
title={Motamot: A Dataset for Revealing the Supremacy of Large Language Models Over Transformer Models in Bengali Political Sentiment Analysis},
year={2024},
volume={},
number={},
pages={1-8},
keywords={Sentiment analysis;Analytical models;Accuracy;Voting;Large language models;Transformers;Market research;Few shot learning;Portals;IEEE Regions;Political Sentiment Analysis;Pre-trained Language Models;Large Language Models;Gemini 1.5 Pro;GPT 3.5 Turbo;Zero-shot Learning;Few-shot Learning;Low-resource Language},
doi={10.1109/TENSYMP61132.2024.10752197}}