
2022.0285


A Fusion Pre-Trained Approach for Identifying the Cause of Sarcasm Remarks

This archive is distributed in association with the INFORMS Journal on Computing under the MIT License.

The software and data in this repository are a snapshot of the software and data that were used in the research reported on in the paper A Fusion Pre-Trained Approach for Identifying the Cause of Sarcasm Remarks by Q. Li, D. Xu, H. Qian, L. Wang, M. Yuan and D. Zeng.

Cite

To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.

https://doi.org/10.1287/ijoc.2022.0285

https://doi.org/10.1287/ijoc.2022.0285.cd

Below is the BibTeX for citing this snapshot of the repository.

@misc{Li2024sarcasm,
  author =        {Li, Q. and Xu, D. and Qian, H. and Wang, L. and Yuan, M. and Zeng, D.},
  publisher =     {INFORMS Journal on Computing},
  title =         {A Fusion Pre-Trained Approach for Identifying the Cause of Sarcasm Remarks},
  year =          {2024},
  doi =           {10.1287/ijoc.2022.0285.cd},
  url =           {https://github.com/INFORMSJoC/2022.0285},
}

Description

This repository provides the data for the problem and the code for the method. The main folders are 'data', 'src', 'scripts', and 'results'.

'data': This folder includes the Reddit data and the Twitter data. A detailed description can be found in the README in this folder.

'src': This folder includes the code for training and testing.

'scripts': This folder provides the running scripts.

'results': This folder provides the results reported in the paper.

Building

The following packages should be installed before you run our model.

python >= 3.8.13
pytorch >= 1.8.0
transformers >= 4.7.0
huggingface-hub >= 0.0.8
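The version constraints above can be captured in a requirements file; the sketch below simply restates them using PyPI package names (e.g. torch for PyTorch) and can be installed with `pip install -r requirements.txt` under Python >= 3.8.13.

```
# requirements.txt -- sketch matching the versions listed above
torch>=1.8.0
transformers>=4.7.0
huggingface-hub>=0.0.8
```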

Replicating

You should download the pre-trained bert-base-uncased model and place its files in the ./models folder.
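One way to fetch the pre-trained weights is through the transformers library. This is a sketch, not the repository's own download code; the subfolder name ./models/bert-base-uncased is an assumption about the expected layout.

```python
def download_bert(target_dir: str = "./models/bert-base-uncased") -> None:
    """Fetch bert-base-uncased from the Hugging Face Hub and save it locally.

    Requires network access. The target path mirrors the ./models layout
    described above; the exact subfolder name is an assumption.
    """
    # Lazy import so the sketch is readable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    AutoModel.from_pretrained("bert-base-uncased").save_pretrained(target_dir)
    AutoTokenizer.from_pretrained("bert-base-uncased").save_pretrained(target_dir)
```

Calling `download_bert()` once is enough; the training scripts can then load the model from the local folder without re-downloading.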

Then the model can be trained using the scripts.

cd scripts
bash run.sh

This script will execute three Python programs.

First, the data is split into training, validation, and test sets using five-fold cross-validation.

python kfold_split.py
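The contents of kfold_split.py are not reproduced here; a minimal sketch of a five-fold index split looks like the following (carving a validation set out of each training fold is omitted for brevity).

```python
import random

def five_fold_indices(n_samples, n_folds=5, seed=42):
    """Shuffle indices and partition them into n_folds disjoint test folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [i for f in folds if f is not folds[k] for i in f]
        splits.append((train, test))
    return splits

splits = five_fold_indices(10)  # five (train, test) pairs over 10 samples
```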

Then, the model is trained on each training set. The file train_classifier_linear.py is the entry point for training. The file framework.py contains the main framework for training and evaluating the model, and the model itself is defined in cross_encoder.py. The parameter --test_prefix can be set to different data ratios to train the model on partial data, and the parameter --attention_head sets the number of attention heads.

python train_classifier_linear.py
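The exact argument handling in train_classifier_linear.py is not shown in this README; a sketch of how the two documented flags might be declared with argparse follows (the defaults and help strings are illustrative, not taken from the repository).

```python
import argparse

def build_parser():
    # Hypothetical declarations of the two flags documented above.
    p = argparse.ArgumentParser(description="Train the fusion classifier")
    p.add_argument("--test_prefix", type=str, default="reddit",
                   help="Data-ratio or subset prefix to train on")
    p.add_argument("--attention_head", type=int, default=8,
                   help="Number of attention heads")
    return p

# Example: train on 50% of the data with 4 attention heads.
args = build_parser().parse_args(["--test_prefix", "0.5", "--attention_head", "4"])
```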

Finally, the evaluation metrics are averaged over the five folds.

python avg.py
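avg.py itself is not reproduced here; averaging per-fold metrics might look like the following sketch (the metric names and values are illustrative assumptions).

```python
def average_metrics(fold_metrics):
    """Average each metric over folds; fold_metrics is a list of dicts."""
    keys = fold_metrics[0].keys()
    n = len(fold_metrics)
    return {k: sum(m[k] for m in fold_metrics) / n for k in keys}

# Hypothetical per-fold results for two of the five folds.
folds = [
    {"precision": 0.80, "recall": 0.70, "f1": 0.747},
    {"precision": 0.82, "recall": 0.68, "f1": 0.743},
]
avg = average_metrics(folds)  # avg["precision"] is approximately 0.81
```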

For testing, the following script infers results from the trained model. To test on cross-subreddit and cross-platform data, set the parameters --data_path and --test_prefix to the corresponding subreddit or platform name.

cd scripts
bash test.sh
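Inference produces predictions that are scored with the Precision, Recall, and F1 reported in the Results section below. This is a generic sketch of that scoring, not the repository's evaluation code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute binary Precision, Recall, and F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one true positive, one false positive, one false negative.
p, r, f = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])  # each is 0.5
```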

Results

The results folder contains the tables and figures reported in the paper.

Table 3 shows the overall Precision, Recall, and F1 of the proposed model, and Appendix A reports the corresponding results for each subreddit.

Table 4 presents the Precision, Recall, and F1 on cross-subreddit and cross-platform data.

Table 5 and Appendix B show the evaluation metrics of the variant models.

Figure 2 shows the F1 scores for different training-data ratios.

Table 6 and Appendix C report the results for different numbers of attention heads.

For a more detailed analysis, see Section 4.4 and the Appendix of the paper.
