paper revision
caimeng2 committed Feb 21, 2024
1 parent 4968c35 commit 3f35864
Showing 2 changed files with 21 additions and 4 deletions.
13 changes: 13 additions & 0 deletions paper/paper.bib
@@ -70,6 +70,19 @@ @article{Olsen_Fenhann_2008
pages={2819–2830}
}

@article{Pukelis_Bautista-Puig_Statulevičiūtė_Stančiauskas_Dikmener_Akylbekova_2022,
title={OSDG 2.0: a multilingual tool for classifying text data by UN Sustainable Development Goals (SDGs)},
url={http://arxiv.org/abs/2211.11252},
DOI={10.48550/arXiv.2211.11252},
abstractNote={Despite concrete indicators and targets, monitoring the progress of the UN Sustainable Development Goals (SDGs) remains a challenge, given the many different actors, initiatives, and institutions involved. OSDG, an open-source classification tool aims to help navigate the SDG related ambiguities through a simple and easy to use application. The tool allows to map and connect activities to the SDGs by identifying SDG -relevant content in any text. This paper presents OSDG 2.0, a new iteration of the partnership’s work, which marks a significant improvement in the tool’s methodology, as well as support for content in 15 languages.},
note={arXiv:2211.11252 [cs]},
number={arXiv:2211.11252},
publisher={arXiv},
author={Pukelis, Lukas and Bautista-Puig, Nuria and Statulevičiūtė, Gustė and Stančiauskas, Vilius and Dikmener, Gokhan and Akylbekova, Dina},
year={2022},
month=nov
}

@misc{Rawat_2022,
type={Jupyter Notebook},
title={SDG-Classifier},
12 changes: 8 additions & 4 deletions paper/paper.md
@@ -43,7 +43,7 @@ affiliations:
- name: Environmental Science and Policy Program, Michigan State University, East Lansing, MI 48823, United States
index: 8

date: 8 November 2023
date: 21 February 2024
bibliography: paper.bib

---
@@ -56,19 +56,23 @@ bibliography: paper.bib

Sustainability is an important topic in contemporary discourse. However, the delineation and interpretation of this concept are often different across disciplines [@Salas-Zapata_Ríos-Osorio_Cardona-Arias_2017], which hinders effective communication, causes inconsistencies in research and practice, and impedes measurable actions to achieve sustainability [@Waseem_Kota_2017; @Yamada_Kanoi_Koh_Lim_Dove_2022]. With the increasing popularity of text-based assessments [@Amini_Bienstock_Narcum_2018; @Olsen_Fenhann_2008; @Singh_Meena_Khandelwal_Dangayach_2023], these issues have become more prominent, as the criteria vary for evaluating sustainability commitments and contributions.

`seesus`, based on the United Nations (UN) Sustainable Development Goals (SDGs), addresses the critical need in text analysis to capture the concept of sustainability with a rigorous and credible definition. The SDGs provide an international framework and a shared understanding of what it means to be sustainable, balancing the environmental, economic, and social dimensions of sustainability [@UN_2015]. Automated text analysis to align and classify statements according to the SDGs can help identify the focal points for sustainable development strategies and facilitate data-driven decision-making processes in pursuit of the SDGs. `seesus` identifies expressions regarding achieving the 17 SDGs and their associated 169 targets within a text and labels whether the expressions pertain to social, environmental, or economic sustainability. Unlike other SDG text-mining packages, it is designed to identify not only terms related to the SDGs but also the attainment of SDGs.
`seesus`, based on the United Nations (UN) Sustainable Development Goals (SDGs), addresses the critical need in text analysis to capture the concept of sustainability with a rigorous and credible definition. The SDGs provide an international framework and a shared understanding of what it means to be sustainable, balancing the environmental, economic, and social dimensions of sustainability [@UN_2015]. Automated text analysis to align and classify statements according to the SDGs can help identify the focal points for sustainable development strategies and facilitate data-driven decision-making processes in pursuit of the SDGs. `seesus` identifies expressions regarding the 17 SDGs and their associated 169 targets within text and labels whether the expressions pertain to social, environmental, or economic sustainability.

`seesus` achieves an accuracy rate of 75.5%, as determined by alignment with manual coding. Detailed information on the accuracy evaluation and manual refinement can be found in `SDGdetector` [@Li_Frans_Song_Cai_Zhang_Liu_2023], our R package employing the same matching logic as `seesus`. In an era of large language models, `seesus` chooses to use predefined regular expression patterns instead of machine learning for text classification, because this method is more transparent, replicable, and controllable. Users of `seesus` can examine the matching logic and customize the syntax if necessary. In addition, compared to other text classifiers based on the SDGs in Python, including `SDG-Classifier` [@Rawat_2022], `SDG Auto Labeller` [@Glass_2020], `UN-SDG-Classifier` [@Lamichaney_2021], `EUR-SDG-Mapper` [@Jelicic_van_der_Vorst_Ranjbar_Mijnhardt_2022], `seesus` is the only one that covers all the SDGs and is fine-tuned to the target level.
In an era of large language models, `seesus` chooses to use predefined regular expression patterns instead of machine learning for text classification, because this method is more transparent, replicable, and controllable. Users of `seesus` can examine the matching logic and customize the syntax if necessary, so users can always understand and maintain control over the results. In addition, compared to other text classifiers based on the SDGs in Python, including `SDG-Classifier` [@Rawat_2022], `SDG Auto Labeller` [@Glass_2020], `UN-SDG-Classifier` [@Lamichaney_2021], `EUR-SDG-Mapper` [@Jelicic_van_der_Vorst_Ranjbar_Mijnhardt_2022], and `OSDG` [@Pukelis_Bautista-Puig_Statulevičiūtė_Stančiauskas_Dikmener_Akylbekova_2022], `seesus` is the only one that covers all the SDGs and is fine-tuned to the target level.

`seesus` achieves an accuracy rate of 76%, as determined by alignment with manual coding. Human intercoder agreement on the same text stands at 83%. Considering the inherent ambiguity and complexity of language, as well as the interconnected nature of the SDGs, the accuracy of `seesus` is rather high. Other SDG text classifiers did not report accuracy evaluations. Detailed information on our accuracy evaluation and manual refinement can be found in `SDGdetector` [@Li_Frans_Song_Cai_Zhang_Liu_2023], our R package employing the same matching logic as `seesus`.

Given the interdisciplinary nature of the sustainability concept, the usage of this package is not confined to a specific scientific context. It has a wide application in research based on text analysis across various domains. For example, sustainability scientists can use `seesus` to label academic publications to quantify which dimension of sustainability receives the most attention. Policy analysts can utilize `seesus` to conduct large-scale scans of planning documents to assess efforts toward urban sustainability and track the changes over time. Scholars in business research engaged in environmental, social, and governance reporting can employ `seesus` to evaluate the alignment of corporate messaging with the SDGs. In K12 education, teachers and students can use this tool to delve into community sustainability studies. Individuals who are actively engaged in civic participation may leverage this tool to examine local sustainability plans and efforts. In addition, `seesus` can be used in combination with translation software to support text analysis in languages other than English.

It is worth noting the limitations of `seesus`. Because of regular expressions’ limited logic capability and lack of context awareness, `seesus` is not able to capture negative connotations. In other words, it identifies if an expression is related to the SDGs and their targets, but it cannot distinguish whether the expression is about achieving the SDGs or failing to do so. Following the best practices of open-source software, we welcome and encourage users to contribute to improving `seesus`. We also recommend users cross-validate their results by different text analysis tools and manual checking.
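
The regular-expression approach and the limitation described above can be illustrated with a minimal sketch. It is not the `seesus` implementation: the `PATTERNS` dictionary, the `match_sdgs` helper, and the three patterns are hypothetical and far simpler than the package's own syntax. The sketch only shows why pattern matching is transparent and easy to customize, and why it cannot tell a commitment from a failure.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the
# patterns shipped with seesus, which are far more detailed.
PATTERNS = {
    "SDG6": re.compile(r"\b(clean|safe|drinking)\s+water\b", re.IGNORECASE),
    "SDG7": re.compile(r"\b(renewable|clean|solar|wind)\s+energy\b", re.IGNORECASE),
    "SDG13": re.compile(r"\bclimate\s+(change|action|resilience)\b", re.IGNORECASE),
}


def match_sdgs(text):
    """Return the sorted labels whose pattern occurs anywhere in the text."""
    return sorted(
        label for label, pattern in PATTERNS.items() if pattern.search(text)
    )


positive = "The city will expand access to safe water and renewable energy."
negative = "The city failed to expand access to safe water."

# Both statements match SDG6: the pattern sees "safe water" in each
# and has no notion of whether the goal is being met or missed.
print(match_sdgs(positive))
print(match_sdgs(negative))
```

Because every rule is an explicit pattern, a user can read or edit the dictionary directly, but the second statement is still labeled as SDG-related even though it reports a failure.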

# Functionality

`seesus` currently has four main functions: (1) evaluating whether a statement aligns with the concept of sustainability; (2) identifying SDGs and associated targets in a statement; (3) classifying a statement into social, environmental, and economic sustainability; (4) customizing match syntax.

# Acknowledgements

Funded by the European Union (ERC, scAInce, 101087218). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. MC was supported by the Michigan State University Dissertation Completion Fellowship. VFF was supported by the National Science Foundation Graduate Research Fellowship Program (Fellow ID: 2018253044) and the Michigan State University Enrichment Fellowship.
MC was supported by the Michigan State University Dissertation Completion Fellowship. VFF was supported by the National Science Foundation Graduate Research Fellowship Program (Fellow ID: 2018253044) and the Michigan State University Enrichment Fellowship. This project was partly funded by the European Union (ERC, scAInce, 101087218). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

![](LOGO_ERC-FLAG_FP.png){ width=30% }
