Mukaffi28/Bengali-Political-Sentiment-Analysis

Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis

Abstract

Proposed Methodology

Methodology

Dataset Availability

We named the dataset "Motamot" (মতামত), the Bengali word for "opinion". It was compiled from online newspapers covering political events and conversations during Bangladeshi elections. We collected the data by scraping articles and opinion pieces from reputable news sources to obtain a diverse and representative sample of political discourse. "Motamot" offers a broad view of the opinions and conversations that shape Bangladesh's political environment. The dataset can be accessed from here.

Specifics of the Core Data:

| | Train | Test | Validation |
| --- | --- | --- | --- |
| Total | 5647 | 706 | 705 |
| Positive | 3306 | 413 | 413 |
| Negative | 2341 | 293 | 292 |

Train Data:

| | Positive | Negative |
| --- | --- | --- |
| Count | 3306 | 2341 |

Test Data:

| | Positive | Negative |
| --- | --- | --- |
| Count | 413 | 293 |

Validation Data:

| | Positive | Negative |
| --- | --- | --- |
| Count | 413 | 292 |
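
As a quick sanity check on the split tables above, the following minimal sketch recomputes each split's total and its share of positive samples. The counts are copied from the tables; the names `splits`, `class_balance`, and `summary` are illustrative and not part of the repository.

```python
# Counts copied from the split tables above: (positive, negative) per split.
splits = {
    "train": (3306, 2341),
    "test": (413, 293),
    "validation": (413, 292),
}


def class_balance(positive: int, negative: int) -> tuple[int, float]:
    """Return (total, positive share) for one split."""
    total = positive + negative
    return total, positive / total


summary = {name: class_balance(p, n) for name, (p, n) in splits.items()}

for name, (total, pos_share) in summary.items():
    print(f"{name:<10} total={total:5d}  positive={pos_share:.1%}")
```

The recomputed totals agree with the "Total" row, and the positive share stays between 58% and 59% in every split, so the class ratio is nearly identical across train, test, and validation.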

Results

Comparative Analysis of Pre-trained Language Models for Different Performance Metrics

| Model | Accuracy | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| BanglaBERT | 0.8204 | 0.8222 | 0.8204 | 0.8203 |
| Bangla BERT Base | 0.6803 | 0.6907 | 0.6812 | 0.6833 |
| DistilBERT | 0.6320 | 0.6358 | 0.6320 | 0.6317 |
| mBERT | 0.6427 | 0.6496 | 0.6428 | 0.6153 |
| sahajBERT | 0.6708 | 0.6791 | 0.6709 | 0.6707 |
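
This README does not state how precision, recall, and F1 are averaged over the two classes. The sketch below assumes support-weighted averaging, under which recall equals accuracy by construction; several rows above (for example BanglaBERT and DistilBERT) show that pattern, but this remains an assumption rather than the authors' documented procedure. The function `weighted_metrics` and the toy labels are hypothetical and are not taken from Motamot.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support[label]
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[label] / n  # class weight = share of true samples
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With macro averaging instead, each class would receive equal weight regardless of its size, and recall would generally differ from accuracy on an imbalanced test set.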

Comparative Analysis of Large Language Models for Different Performance Metrics

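| LLMs | Metric | Zero-shot | 5-shot | 10-shot | 15-shot |
| --- | --- | --- | --- | --- | --- |
| GPT 3.5 Turbo | Accuracy | 0.8500 | 0.8900 | 0.9133 | 0.9400 |
| GPT 3.5 Turbo | Precision | 0.8467 | 0.8867 | 0.9200 | 0.9467 |
| GPT 3.5 Turbo | Recall | 0.8533 | 0.8926 | 0.9079 | 0.9342 |
| GPT 3.5 Turbo | F1-Score | 0.8495 | 0.8896 | 0.9139 | 0.9404 |
| Gemini 1.5 Pro | Accuracy | 0.8608 | 0.8981 | 0.9200 | 0.9633 |
| Gemini 1.5 Pro | Precision | 0.8931 | 0.8846 | 0.9333 | 0.9667 |
| Gemini 1.5 Pro | Recall | 0.8477 | 0.9205 | 0.9091 | 0.9603 |
| Gemini 1.5 Pro | F1-Score | 0.8698 | 0.9022 | 0.9211 | 0.9635 |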
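
To make the comparison between the two result tables concrete, this small sketch computes how much accuracy each LLM gains from zero-shot to 15-shot prompting and how far its 15-shot accuracy sits above the best pre-trained model (BanglaBERT). All numbers are copied from the tables above; the names `llm_accuracy`, `gains`, and `report` are illustrative only.

```python
# Accuracy values copied from the two result tables above.
best_plm_accuracy = 0.8204  # BanglaBERT
llm_accuracy = {
    "GPT 3.5 Turbo": {0: 0.8500, 5: 0.8900, 10: 0.9133, 15: 0.9400},
    "Gemini 1.5 Pro": {0: 0.8608, 5: 0.8981, 10: 0.9200, 15: 0.9633},
}


def gains(by_shots, baseline):
    """Accuracy gain from fewest to most shots, and over a baseline."""
    shots = sorted(by_shots)
    lowest, highest = by_shots[shots[0]], by_shots[shots[-1]]
    return {
        "few_shot_gain": round(highest - lowest, 4),
        "gain_over_plm": round(highest - baseline, 4),
    }


report = {model: gains(acc, best_plm_accuracy)
          for model, acc in llm_accuracy.items()}

for model, g in report.items():
    print(model, g)
```

Both LLMs already exceed BanglaBERT's 0.8204 accuracy in the zero-shot setting, and the margin widens as more examples are added to the prompt.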

Contact Information

For any questions, collaboration opportunities, or further inquiries, please feel free to reach out:

Citation

@INPROCEEDINGS{10752197,
  author={Johora Faria, Fatema Tuj and Moin, Mukaffi Bin and Mumu, Rabeya Islam and Alam Abir, Md Mahabubul and Alfy, Abrar Nawar and Alam, Mohammad Shafiul},
  booktitle={2024 IEEE Region 10 Symposium (TENSYMP)}, 
  title={Motamot: A Dataset for Revealing the Supremacy of Large Language Models Over Transformer Models in Bengali Political Sentiment Analysis}, 
  year={2024},
  volume={},
  number={},
  pages={1-8},
  keywords={Sentiment analysis;Analytical models;Accuracy;Voting;Large language models;Transformers;Market research;Few shot learning;Portals;IEEE Regions;Political Sentiment Analysis;Pre-trained Language Models;Large Language Models;Gemini 1.5 Pro;GPT 3.5 Turbo;Zero-shot Learning;Fewshot Learning;Low-resource Language},
  doi={10.1109/TENSYMP61132.2024.10752197}}
