Bulk metadata corrections 2025-02-08 (#4588)
* Process metadata corrections for 2025.genaidetect-1.10 (closes #4579)

* Process metadata corrections for 2025.mcg-1.4 (closes #4578)

* Process metadata corrections for 2023.emnlp-main.212 (closes #4576)

* Process metadata corrections for 2024.findings-acl.220 (closes #4572)

* Process metadata corrections for 2024.findings-eacl.156 (closes #4571)

* Process metadata corrections for 2025.finnlp-1.30 (closes #4570)

* Process metadata corrections for 2022.findings-acl.21 (closes #4567)

* Process metadata corrections for 2025.comedi-1.6 (closes #4564)

* Process metadata corrections for 2025.coling-main.535 (closes #4563)

* Process metadata corrections for 2022.emnlp-main.788 (closes #4562)

* Process metadata corrections for 2024.acl-long.191 (closes #4556)

* Process metadata corrections for 2024.acl-srw.29 (closes #4555)

* Process metadata corrections for 2024.conll-1.17 (closes #4554)

* Process metadata corrections for 2024.emnlp-main.59 (closes #4551)

* Process metadata corrections for 2020.nlposs-1.2 (closes #4550)

* Process metadata corrections for 2022.naacl-main.13 (closes #4548)

* Process metadata corrections for 2025.genaidetect-1.31 (closes #4544)

* Process metadata corrections for 2024.ltedi-1.16 (closes #4543)

* Process metadata corrections for 2024.wnut-1.5 (closes #4542)

* Process metadata corrections for 2024.figlang-1.8 (closes #4521)

* Handle errors in script

- No title or abstract in frontmatter
- Print issue number when JSON fails
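For context on the error-handling fixes: correction issues carry their changes in a fenced JSON block in the issue body. Below is a minimal sketch of the new parsing path, with a hypothetical issue body, issue number, and field names (it is not the script itself):

import json
import re
import sys

issue_number = 4551  # hypothetical issue number, for illustration only
issue_body = (
    "Some text from the metadata-correction issue template\r\n"
    "```json\n"
    '{"anthology_id": "2024.emnlp-main.59", "title": "A corrected title"}\n'
    "```"
)

# the issue body arrives with \r\n line endings, so strip the \r first
issue_body = issue_body.replace("\r", "")

json_block = None
if (match := re.search(r"```json\n(.*?)\n```", issue_body, re.DOTALL)) is not None:
    try:
        json_block = json.loads(match[1])
    except json.decoder.JSONDecodeError as e:
        # the failure is now reported with the issue number instead of being swallowed
        print(f"Failed to parse JSON block in #{issue_number}: {e}", file=sys.stderr)

print(json_block)

Moving the try/except to the call site means a malformed block is logged with its issue number and the issue is then skipped, rather than failing silently inside the parser.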
mjpost authored Feb 9, 2025
1 parent 4ad7489 commit 6d3e96c
Showing 18 changed files with 79 additions and 68 deletions.
61 changes: 36 additions & 25 deletions bin/process_bulk_metadata.py
@@ -90,13 +90,10 @@ def _parse_metadata_changes(self, issue_body):
         # For some reason, the issue body has \r\n line endings
         issue_body = issue_body.replace("\r", "")

-        try:
-            if (
-                match := re.search(r"```json\n(.*?)\n```", issue_body, re.DOTALL)
-            ) is not None:
-                return json.loads(match[1])
-        except Exception as e:
-            print(f"Error parsing metadata changes: {e}", file=sys.stderr)
+        if (
+            match := re.search(r"```json\n(.*?)\n```", issue_body, re.DOTALL)
+        ) is not None:
+            return json.loads(match[1])

         return None

@@ -119,19 +116,20 @@ def _apply_changes_to_xml(self, xml_repo_path, anthology_id, changes):
             raise Exception(f"-> Paper not found in XML file: {xml_repo_path}")

         # Apply changes to XML
-        for key in ["title", "abstract"]:
-            if key in changes:
-                node = paper_node.find(key)
-                if node is None:
-                    node = make_simple_element(key, parent=paper_node)
-                # set the node to the structure of the new string
-                try:
-                    new_node = ET.fromstring(f"<{key}>{changes[key]}</{key}>")
-                except ET.XMLSyntaxError as e:
-                    print(f"Error parsing XML for key {key}: {e}", file=sys.stderr)
-                    raise e
-                # replace the current node with the new node in the tree
-                paper_node.replace(node, new_node)
+        if paper_id != "0":
+            # frontmatter has no title or abstract
+            for key in ["title", "abstract"]:
+                if key in changes:
+                    node = paper_node.find(key)
+                    if node is None:
+                        node = make_simple_element(key, parent=paper_node)
+                    # set the node to the structure of the new string
+                    try:
+                        new_node = ET.fromstring(f"<{key}>{changes[key]}</{key}>")
+                    except ET.XMLSyntaxError as e:
+                        raise e
+                    # replace the current node with the new node in the tree
+                    paper_node.replace(node, new_node)

         if "authors" in changes:
             """
@@ -234,7 +232,15 @@ def process_metadata_issues(
                 )

             # Parse metadata changes from issue
-            json_block = self._parse_metadata_changes(issue.body)
+            try:
+                json_block = self._parse_metadata_changes(issue.body)
+            except json.decoder.JSONDecodeError as e:
+                print(
+                    f"Failed to parse JSON block in #{issue.number}: {e}",
+                    file=sys.stderr,
+                )
+                json_block = None
+
             if not json_block:
                 if close_old_issues:
                     # for old issues, filed without a JSON block, we append a comment
Expand Down Expand Up @@ -267,7 +273,10 @@ def process_metadata_issues(
continue
else:
if verbose:
print("-> Skipping (no JSON block)", file=sys.stderr)
print(
f"-> Skipping #{issue.number} (no JSON block)",
file=sys.stderr,
)
continue

self.stats["relevant_issues"] += 1
@@ -294,8 +303,10 @@
                     xml_repo_path, anthology_id, json_block
                 )
             except Exception as e:
-                if verbose:
-                    print(e, file=sys.stderr)
+                print(
+                    f"Failed to apply changes to #{issue.number}: {e}",
+                    file=sys.stderr,
+                )
                 continue

             if tree:
@@ -312,7 +323,7 @@
             # Commit changes
             self.local_repo.index.add([xml_repo_path])
             self.local_repo.index.commit(
-                f"Processed metadata corrections (closes #{issue.number})"
+                f"Process metadata corrections for {anthology_id} (closes #{issue.number})"
             )

             closed_issues.append(issue)
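To make the title/abstract handling in _apply_changes_to_xml above easier to follow: the corrected string is parsed into a fresh element and swapped into the <paper> node, and frontmatter entries (paper id "0") are now skipped because they carry neither field. A rough standalone illustration using lxml on a toy snippet (the real script goes through the Anthology's own helpers such as make_simple_element) might look like this:

import lxml.etree as ET

paper_node = ET.fromstring(
    '<paper id="59"><title>Old title</title><pages>1-10</pages></paper>'
)
changes = {"title": "A corrected <fixed-case>HEART</fixed-case> title"}  # hypothetical correction

paper_id = paper_node.get("id")
if paper_id != "0":  # frontmatter (id 0) carries no title or abstract
    for key in ["title", "abstract"]:
        if key in changes:
            # parse the corrected string, markup included, into a new element
            new_node = ET.fromstring(f"<{key}>{changes[key]}</{key}>")
            node = paper_node.find(key)
            if node is None:
                paper_node.append(new_node)
            else:
                # swap the old node for the new one in place
                paper_node.replace(node, new_node)

print(ET.tostring(paper_node, encoding="unicode"))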
2 changes: 1 addition & 1 deletion data/xml/2020.nlposs.xml
@@ -36,7 +36,7 @@
       <bibkey>madeira-etal-2020-framework</bibkey>
     </paper>
     <paper id="2">
-      <title><fixed-case>ARBML</fixed-case>: Democritizing <fixed-case>A</fixed-case>rabic Natural Language Processing Tools</title>
+      <title><fixed-case>ARBML</fixed-case>: Democratizing <fixed-case>A</fixed-case>rabic Natural Language Processing Tools</title>
       <author><first>Zaid</first><last>Alyafeai</last></author>
       <author><first>Maged</first><last>Al-Shaibani</last></author>
       <pages>8–13</pages>
4 changes: 2 additions & 2 deletions data/xml/2022.emnlp.xml
@@ -10383,13 +10383,13 @@
       <doi>10.18653/v1/2022.emnlp-main.787</doi>
     </paper>
     <paper id="788">
-      <title>Attentional Probe: Estimating a Module’s Functional Potential</title>
+      <title>The Architectural Bottleneck Principle</title>
       <author><first>Tiago</first><last>Pimentel</last><affiliation>University of Cambridge</affiliation></author>
       <author><first>Josef</first><last>Valvoda</last><affiliation>University of Cambridge</affiliation></author>
       <author><first>Niklas</first><last>Stoehr</last><affiliation>ETH Zurich</affiliation></author>
       <author><first>Ryan</first><last>Cotterell</last><affiliation>ETH Zürich</affiliation></author>
       <pages>11459-11472</pages>
-      <abstract/>
+      <abstract>In this paper, we seek to measure how much information a component in a neural network could extract from the representations fed into it. Our work stands in contrast to prior probing work, most of which investigates how much information a model's representations contain. This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component. Relying on this principle, we estimate how much syntactic information is available to transformers through our attentional probe, a probe that exactly resembles a transformer's self-attention head. Experimentally, we find that, in three models (BERT, ALBERT, and RoBERTa), a sentence's syntax tree is mostly extractable by our probe, suggesting these models have access to syntactic information while composing their contextual representations. Whether this information is actually used by these models, however, remains an open question.</abstract>
       <url hash="7514164a">2022.emnlp-main.788</url>
       <bibkey>pimentel-etal-2022-attentional</bibkey>
       <doi>10.18653/v1/2022.emnlp-main.788</doi>
2 changes: 1 addition & 1 deletion data/xml/2022.findings.xml
@@ -349,7 +349,7 @@
       <author><first>Andrey</first><last>Chertok</last></author>
       <author><first>Sergey</first><last>Nikolenko</last></author>
       <pages>239-245</pages>
-      <abstract>We present RuCCoN, a new dataset for clinical concept normalization in Russian manually annotated by medical professionals. It contains over 16,028 entity mentions manually linked to over 2,409 unique concepts from the Russian language part of the UMLS ontology. We provide train/test splits for different settings (stratified, zero-shot, and CUI-less) and present strong baselines obtained with state-of-the-art models such as SapBERT. At present, Russian medical NLP is lacking in both datasets and trained models, and we view this work as an important step towards filling this gap. Our dataset and annotation guidelines are available at <url>https://github.com/sberbank-ai-lab/RuCCoN</url>.</abstract>
+      <abstract>We present RuCCoN, a new dataset for clinical concept normalization in Russian manually annotated by medical professionals. It contains over 16,028 entity mentions manually linked to over 2,409 unique concepts from the Russian language part of the UMLS ontology. We provide train/test splits for different settings (stratified, zero-shot, and CUI-less) and present strong baselines obtained with state-of-the-art models such as SapBERT. At present, Russian medical NLP is lacking in both datasets and trained models, and we view this work as an important step towards filling this gap. Our dataset and annotation guidelines are available at <url>https://github.com/AIRI-Institute/RuCCoN</url>.</abstract>
       <url hash="8f620f3a">2022.findings-acl.21</url>
       <bibkey>nesterov-etal-2022-ruccon</bibkey>
       <doi>10.18653/v1/2022.findings-acl.21</doi>
2 changes: 1 addition & 1 deletion data/xml/2022.naacl.xml
@@ -197,7 +197,7 @@
     </paper>
     <paper id="13">
       <title>Two Contrasting Data Annotation Paradigms for Subjective <fixed-case>NLP</fixed-case> Tasks</title>
-      <author><first>Paul</first><last>Rottger</last></author>
+      <author><first>Paul</first><last>Röttger</last></author>
       <author><first>Bertie</first><last>Vidgen</last></author>
       <author><first>Dirk</first><last>Hovy</last></author>
       <author><first>Janet</first><last>Pierrehumbert</last></author>
2 changes: 1 addition & 1 deletion data/xml/2023.emnlp.xml
@@ -2986,7 +2986,7 @@
       <author><first>Soda Marem</first><last>Lo</last></author>
       <author><first>Valerio</first><last>Basile</last></author>
       <author><first>Simona</first><last>Frenda</last></author>
-      <author><first>Alessandra</first><last>Cignarella</last></author>
+      <author><first>Alessandra Teresa</first><last>Cignarella</last></author>
       <author><first>Viviana</first><last>Patti</last></author>
       <author><first>Cristina</first><last>Bosco</last></author>
       <pages>3496-3507</pages>
10 changes: 5 additions & 5 deletions data/xml/2024.acl.xml
@@ -2663,11 +2663,11 @@
     </paper>
     <paper id="191">
       <title><fixed-case>L</fixed-case>lama2<fixed-case>V</fixed-case>ec: Unsupervised Adaptation of Large Language Models for Dense Retrieval</title>
-      <author><first>Chaofan</first><last>Li</last></author>
       <author><first>Zheng</first><last>Liu</last></author>
+      <author><first>Chaofan</first><last>Li</last></author>
       <author><first>Shitao</first><last>Xiao</last></author>
-      <author><first>Yingxia</first><last>Shao</last><affiliation>Beijing University of Posts and Telecommunications</affiliation></author>
-      <author><first>Defu</first><last>Lian</last><affiliation>University of Science and Technology of China</affiliation></author>
+      <author><first>Yingxia</first><last>Shao</last></author>
+      <author><first>Defu</first><last>Lian</last></author>
       <pages>3490-3500</pages>
       <abstract>Dense retrieval calls for discriminative embeddings to represent the semantic relationship between query and document. It may benefit from the using of large language models (LLMs), given LLMs’ strong capability on semantic understanding. However, the LLMs are learned by auto-regression, whose working mechanism is completely different from representing whole text as one discriminative embedding. Thus, it is imperative to study how to adapt LLMs properly so that they can be effectively initialized as the backbone encoder for dense retrieval. In this paper, we propose a novel approach, called <b>Llama2Vec</b>, which performs unsupervised adaptation of LLM for its dense retrieval application. Llama2Vec consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the LLM is prompted to <i>reconstruct the input sentence</i> and <i>predict the next sentence</i> based on its text embeddings. Llama2Vec is simple, lightweight, but highly effective. It is used to adapt LLaMA-2-7B on the Wikipedia corpus. With a moderate steps of adaptation, it substantially improves the model’s fine-tuned performances on a variety of dense retrieval benchmarks. Notably, it results in the new state-of-the-art performances on popular benchmarks, such as passage and document retrieval on MSMARCO, and zero-shot retrieval on BEIR. The model and source code will be made publicly available to facilitate the future research. Our model is available at https://github.com/FlagOpen/FlagEmbedding.</abstract>
       <url hash="e0092648">2024.acl-long.191</url>
@@ -13986,8 +13986,8 @@
     <paper id="29">
       <title>Compromesso! <fixed-case>I</fixed-case>talian Many-Shot Jailbreaks undermine the safety of Large Language Models</title>
       <author><first>Fabio</first><last>Pernisi</last></author>
-      <author><first>Dirk</first><last>Hovy</last><affiliation>Bocconi University</affiliation></author>
-      <author><first>Paul</first><last>R�ttger</last><affiliation>Bocconi University</affiliation></author>
+      <author><first>Dirk</first><last>Hovy</last></author>
+      <author><first>Paul</first><last>Röttger</last></author>
       <pages>245-251</pages>
       <abstract>As diverse linguistic communities and users adopt Large Language Models (LLMs), assessing their safety across languages becomes critical. Despite ongoing efforts to align these models with safe and ethical guidelines, they can still be induced into unsafe behavior with jailbreaking, a technique in which models are prompted to act outside their operational guidelines. What research has been conducted on these vulnerabilities was predominantly on English, limiting the understanding of LLM behavior in other languages. We address this gap by investigating Many-Shot Jailbreaking (MSJ) in Italian, underscoring the importance of understanding LLM behavior in different languages. We base our analysis on a newly created Italian dataset to identify unique safety vulnerabilities in 4 families of open-source LLMs.We find that the models exhibit unsafe behaviors even with minimal exposure to harmful prompts, and–more alarmingly–this tendency rapidly escalates with more demonstrations.</abstract>
       <url hash="5b5c8ec0">2024.acl-srw.29</url>
4 changes: 2 additions & 2 deletions data/xml/2024.conll.xml
@@ -203,10 +203,10 @@
     </paper>
     <paper id="17">
       <title>The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading</title>
-      <author><first>Keren Gruteke</first><last>Klein</last><affiliation>Technion - Israel Institute of Technology, Technion</affiliation></author>
+      <author><first>Keren</first><last>Gruteke Klein</last></author>
       <author><first>Yoav</first><last>Meiri</last></author>
       <author><first>Omer</first><last>Shubi</last></author>
-      <author><first>Yevgeni</first><last>Berzak</last><affiliation>Technion - Israel Institute of Technology, Technion</affiliation></author>
+      <author><first>Yevgeni</first><last>Berzak</last></author>
       <pages>219-230</pages>
       <abstract>The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eyetracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates we find that the prediction of surprisal theory regarding the presence of a linear effect of surprisal on processing times, extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the predictive power of processing times compared to standard surprisals. Further, regime-specific contexts yield near zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.</abstract>
       <url hash="edbdb721">2024.conll-1.17</url>
6 changes: 3 additions & 3 deletions data/xml/2024.emnlp.xml
@@ -826,11 +826,11 @@
     </paper>
     <paper id="59">
       <title><fixed-case>HEART</fixed-case>-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with <fixed-case>LLM</fixed-case>s</title>
-      <author><first>Jocelyn J</first><last>Shen</last><affiliation>Massachusetts Institute of Technology</affiliation></author>
+      <author><first>Jocelyn</first><last>Shen</last></author>
       <author><first>Joel</first><last>Mire</last></author>
-      <author><first>Hae Won</first><last>Park</last><affiliation>Amazon and Massachusetts Institute of Technology</affiliation></author>
+      <author><first>Hae Won</first><last>Park</last></author>
       <author><first>Cynthia</first><last>Breazeal</last></author>
-      <author><first>Maarten</first><last>Sap</last><affiliation>Carnegie Mellon University</affiliation></author>
+      <author><first>Maarten</first><last>Sap</last></author>
       <pages>1026-1046</pages>
       <abstract>Empathy serves as a cornerstone in enabling prosocial behaviors, and can be evoked through sharing of personal experiences in stories. While empathy is influenced by narrative content, intuitively, people respond to the way a story is told as well, through narrative style. Yet the relationship between empathy and narrative style is not fully understood. In this work, we empirically examine and quantify this relationship between style and empathy using LLMs and large-scale crowdsourcing studies. We introduce a novel, theory-based taxonomy, HEART (Human Empathy and Narrative Taxonomy) that delineates elements of narrative style that can lead to empathy with the narrator of a story. We establish the performance of LLMs in extracting narrative elements from HEART, showing that prompting with our taxonomy leads to reasonable, human-level annotations beyond what prior lexicon-based methods can do. To show empirical use of our taxonomy, we collect a dataset of empathy judgments of stories via a large-scale crowdsourcing study with <tex-math>N=2,624</tex-math> participants. We show that narrative elements extracted via LLMs, in particular, vividness of emotions and plot volume, can elucidate the pathways by which narrative style cultivates empathy towards personal stories. Our work suggests that such models can be used for narrative analyses that lead to human-centered social and behavioral insights.</abstract>
       <url hash="204518d9">2024.emnlp-main.59</url>