This directory shares the main files and code used to build the data tables for the article "Application of Machine Learning Techniques for Fake News Classification", published in the journal "Measurement: Interdisciplinary Research and Perspectives."
Abstract - Fake news is false news disseminated through various social and digital media, such as newspapers, television networks, and the internet. It is not a new phenomenon in human behavior, but its current dissemination is very different from what happened in the past. Social networks and the contemporary world allow lies to spread quickly and even intentionally. This causes serious problems whose impacts can be felt in the real world. Identifying fake news can be useful in several contexts, for example as a news filter in the virtual space. The present work therefore aims to propose and evaluate strategies for processing and for applying machine learning models, in order to improve the performance of classifiers in identifying fake news in Brazilian news.
The news articles used in this project come from the Fake.br-Corpus, available on GitHub. From this corpus, we built the term matrices used to apply the models; these matrices are available in the data folder, which also contains other databases used for exploratory analysis. The exploratory data analysis script is in the exploratory_data_analysis file. The models and selecao_variavel scripts contain the models used and the code for selecting the added variables.
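
As a rough illustration of what a term matrix is, the sketch below builds a small document-term count matrix in plain Python. It is not the code used in this repository: the function name `build_term_matrix`, the whitespace tokenization, and the two toy sentences are all assumptions made for the example, and the actual matrices in the data folder may have been built with different preprocessing or weighting.

```python
from collections import Counter


def build_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i.

    Hypothetical helper for illustration only. Tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Sorted vocabulary gives every term a stable column index.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Toy texts invented for this example, not taken from Fake.br-Corpus.
docs = ["governo anuncia nova medida", "nova medida choca o governo"]
vocabulary, matrix = build_term_matrix(docs)
```

Each row of `matrix` corresponds to one news text and each column to one term of `vocabulary`; a matrix of this shape, paired with a fake/true label per row, is the kind of input a classifier can be trained on.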