Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis

Abstract

Proposed Methodology

Dataset Availability

The dataset is named "Motamot" (Bengali: মতামত, meaning "opinion"). It was compiled from online newspapers covering political events and discussion during Bangladeshi elections. We scraped articles and opinion pieces from reputable news sources to obtain a diverse and representative sample of political discourse. "Motamot" captures the range of opinions and conversations that shape Bangladesh's political environment. The dataset can be accessed from here.

Specifics of the Core Data:

	Train	Test	Validation
Total	5647	706	705
Positive	3306	413	413
Negative	2341	293	292

Train Data:

	Positive	Negative
Count	3306	2341

Test Data:

	Positive	Negative
Count	413	293

Validation Data:

	Positive	Negative
Count	413	292
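
As a quick consistency check on the split tables above, the following minimal sketch confirms that the class counts add up to the reported totals and prints the share of positive samples per split. The counts are copied from the tables; the dictionary layout and the `positive_share` helper are illustrative names, not part of the repository.

```python
# Split sizes copied from the "Specifics of the Core Data" table.
splits = {
    "train": {"positive": 3306, "negative": 2341, "total": 5647},
    "test": {"positive": 413, "negative": 293, "total": 706},
    "validation": {"positive": 413, "negative": 292, "total": 705},
}


def positive_share(counts):
    """Return the fraction of positive samples in one split."""
    return counts["positive"] / (counts["positive"] + counts["negative"])


for name, counts in splits.items():
    # Each split's class counts should add up to its reported total.
    assert counts["positive"] + counts["negative"] == counts["total"]
    print(f"{name}: {counts['total']} samples, {positive_share(counts):.1%} positive")

grand_total = sum(counts["total"] for counts in splits.values())
print(f"all splits: {grand_total} samples")
```

The positive share is close to 58.5% in every split, so the three splits have nearly the same class balance.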

Results

Comparative Analysis of Pre-trained Language Models for Different Performance Metrics

Model	Accuracy	Precision	Recall	F1-Score
BanglaBERT	0.8204	0.8222	0.8204	0.8203
Bangla BERT Base	0.6803	0.6907	0.6812	0.6833
DistilBERT	0.6320	0.6358	0.6320	0.6317
mBERT	0.6427	0.6496	0.6428	0.6153
sahajBERT	0.6708	0.6791	0.6709	0.6707
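
The README does not state how precision, recall, and F1 were averaged over the two classes. Recall is equal or very close to accuracy in each row, which is consistent with support-weighted averaging (weighted recall is mathematically identical to accuracy). The sketch below assumes that scheme; it is an illustration in plain Python, not the authors' evaluation code, and `weighted_metrics` is a hypothetical helper name.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_l = tp / (tp + fp) if tp + fp else 0.0
        r_l = tp / (tp + fn) if tp + fn else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if p_l + r_l else 0.0
        # Weight each class by its share of the true labels.
        weight = support[label] / n
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (1 = Positive, 0 = Negative); not taken from the dataset.
metrics = weighted_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(metrics)
```

If the authors used macro or binary averaging instead, the precision, recall, and F1 columns would need a different computation, while accuracy would be unaffected.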

Comparative Analysis of Large Language Models for Different Performance Metrics

LLMs	Metric	Zero-shot	5-shot	10-shot	15-shot
GPT 3.5 Turbo	Accuracy	0.8500	0.8900	0.9133	0.9400
GPT 3.5 Turbo	Precision	0.8467	0.8867	0.9200	0.9467
GPT 3.5 Turbo	Recall	0.8533	0.8926	0.9079	0.9342
GPT 3.5 Turbo	F1-Score	0.8495	0.8896	0.9139	0.9404
Gemini 1.5 Pro	Accuracy	0.8608	0.8981	0.9200	0.9633
Gemini 1.5 Pro	Precision	0.8931	0.8846	0.9333	0.9667
Gemini 1.5 Pro	Recall	0.8477	0.9205	0.9091	0.9603
Gemini 1.5 Pro	F1-Score	0.8698	0.9022	0.9211	0.9635
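
The zero-shot and k-shot columns above differ only in how many labeled examples are placed in the prompt before the text to classify. The sketch below shows one way such a prompt could be assembled. The instruction wording, the `Text:`/`Sentiment:` layout, and the `build_prompt` helper are assumptions for illustration; the prompts actually used with GPT 3.5 Turbo and Gemini 1.5 Pro are not given in this README.

```python
def build_prompt(text, examples=()):
    """Assemble a zero-shot (no examples) or k-shot classification prompt.

    `examples` is a sequence of (text, label) pairs drawn from labeled data.
    """
    lines = [
        "Classify the sentiment of the Bengali political text as Positive or Negative.",
        "",
    ]
    # Each demonstration shows the model one input with its gold label.
    for example_text, example_label in examples:
        lines += [f"Text: {example_text}", f"Sentiment: {example_label}", ""]
    # The query comes last, with the label left for the model to complete.
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)


# Placeholder demonstrations; real ones would come from the training split.
demos = [(f"<training text {i}>", "Positive" if i % 2 else "Negative") for i in range(1, 6)]

zero_shot = build_prompt("<text to classify>")
five_shot = build_prompt("<text to classify>", demos)
print(five_shot)
```

Passing 10 or 15 pairs in `examples` gives the corresponding 10-shot and 15-shot prompts.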

Contact Information

For any questions, collaboration opportunities, or further inquiries, please feel free to reach out:

Fatema Tuj Johora Faria
- Email: [email protected]
Mukaffi Bin Moin
- Email: [email protected]
Rabeya Islam Mumu
- Email: [email protected]
Md Mahabubul Alam Abir
- Email: [email protected]
Abrar Nawar Alfy
- Email: [email protected]

Citation

@INPROCEEDINGS{10752197,
  author={Johora Faria, Fatema Tuj and Moin, Mukaffi Bin and Mumu, Rabeya Islam and Alam Abir, Md Mahabubul and Alfy, Abrar Nawar and Alam, Mohammad Shafiul},
  booktitle={2024 IEEE Region 10 Symposium (TENSYMP)}, 
  title={Motamot: A Dataset for Revealing the Supremacy of Large Language Models Over Transformer Models in Bengali Political Sentiment Analysis}, 
  year={2024},
  volume={},
  number={},
  pages={1-8},
  keywords={Sentiment analysis;Analytical models;Accuracy;Voting;Large language models;Transformers;Market research;Few shot learning;Portals;IEEE Regions;Political Sentiment Analysis;Pre-trained Language Models;Large Language Models;Gemini 1.5 Pro;GPT 3.5 Turbo;Zero-shot Learning;Few-shot Learning;Low-resource Language},
  doi={10.1109/TENSYMP61132.2024.10752197}}
