Parse Errors in Publisher RIS #129

adamsmd · 2023-02-19T14:51:26Z

Hello,

As part of a project I am working on, I have code that ingests RIS data from publishers. Unfortunately, I have found a case for which your code throws an exception while parsing.

How to Reproduce

The example paper is https://content.iospress.com/articles/fundamenta-informaticae/fi65-1-2-02. Clicking the "Cite" button then "Reference manager (RIS)" under "Export to:" downloads the RIS for this paper. Alternatively, you can use the following link to download the RIS directly: https://content.iospress.com/export/ris?stripAbstractHtml=true&entryId=fundamenta-informaticae%2Ffi65-1-2-02.

This produces the following RIS:

TY  - JOUR
AU  - Abbott, Michael
AU  - Altenkirch, Thorsten
AU  - McBride, Conor
AU  - Ghani, Neil
EP  - 28
AB  - This paper and our conference paper (Abbott, Altenkirch, Ghani, and McBride, 2003b) explain and analyse the notion of the derivative of a data structure as the type of its one-hole contexts based on the central observation made by McBride (2001). To make the idea precise we need a generic notion of a data type, which leads to the notion of a container, introduced in (Abbott, Altenkirch, and Ghani, 2003a) and investigated extensively in (Abbott, 2003). Using containers we can provide a notion of linear map which is the concept missing from McBride's first analysis. We verify the usual laws of differential calculus including the chain rule and establish laws for initial algebras and terminal coalgebras.
SP  - 1
TI  - ∂ for Data: Differentiating Data Structures
T2  - Fundamenta Informaticae
VL  - 65
M1  - 1-2
PY  - 2005
PB  - IOS Press
ER  -

Calling toRisRecords() on this text results in the following exception:

Exception in thread "main" java.lang.NumberFormatException: For input string: "1-2"
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.base/java.lang.Long.parseLong(Long.java:692)
	at java.base/java.lang.Long.parseLong(Long.java:817)
	at ch.difty.kris.implementation.RisImport.typeSafeValueFrom(RisImport.kt:60)
	at ch.difty.kris.implementation.RisImport.fillFrom(RisImport.kt:52)
	at ch.difty.kris.implementation.RisImport.access$fillFrom(RisImport.kt:17)
	at ch.difty.kris.implementation.RisImport$process$1$2.emit(RisImport.kt:33)
	at ch.difty.kris.implementation.RisImport$process$1$2.emit(RisImport.kt:28)
	at ch.difty.kris.implementation.RisImport$process$1$invokeSuspend$$inlined$filterNot$1$2.emit(Emitters.kt:223)
	at kotlinx.coroutines.flow.FlowKt__TransformKt$filterNotNull$$inlined$unsafeTransform$1$2.emit(Emitters.kt:223)
	at kotlinx.coroutines.flow.FlowKt__BuildersKt$asFlow$$inlined$unsafeFlow$3.collect(SafeCollector.common.kt:115)
	at kotlinx.coroutines.flow.FlowKt__TransformKt$filterNotNull$$inlined$unsafeTransform$1.collect(SafeCollector.common.kt:113)
	at ch.difty.kris.implementation.RisImport$process$1$invokeSuspend$$inlined$filterNot$1.collect(SafeCollector.common.kt:113)
	at ch.difty.kris.implementation.RisImport$process$1.invokeSuspend(RisImport.kt:28)
	at ch.difty.kris.implementation.RisImport$process$1.invoke(RisImport.kt)
	at ch.difty.kris.implementation.RisImport$process$1.invoke(RisImport.kt)
	at kotlinx.coroutines.flow.SafeFlow.collectSafely(Builders.kt:61)
	at kotlinx.coroutines.flow.AbstractFlow.collect(Flow.kt:230)
	at kotlinx.coroutines.flow.FlowKt__CollectionKt.toCollection(Collection.kt:26)
	at kotlinx.coroutines.flow.FlowKt.toCollection(Unknown Source)
	at kotlinx.coroutines.flow.FlowKt__CollectionKt.toList(Collection.kt:15)
	at kotlinx.coroutines.flow.FlowKt.toList(Unknown Source)
	at kotlinx.coroutines.flow.FlowKt__CollectionKt.toList$default(Collection.kt:15)
	at kotlinx.coroutines.flow.FlowKt.toList$default(Unknown Source)
	at ch.difty.kris.KRis$processList$1.invokeSuspend(KRis.kt:58)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at ch.difty.kris.KRis.processList(KRis.kt:57)
	at ch.difty.kris.KRisExtensionsKt.toRisRecords(KRisExtensions.kt:34)

Cause

The problem is that the RIS from the publisher contains M1 - 1-2, but KRis expects M1 to be a number. Note that M1 is used for the "issue number" in which an article occurs, and in general, issue numbers often contain ranges (e.g., for https://www.sciencedirect.com/science/article/abs/pii/S0020019010000487?via%3Dihub it is issues 8-9), letters (e.g., in https://dl.acm.org/doi/10.1145/242224.242324 the issue number is "4es"), or even letters without numbers (e.g., for https://dl.acm.org/doi/10.1145/3133906 the issue is "OOPSLA"). (Not all of these produce RIS that triggers this bug, but they are examples of the general idea of issue number as used by publishers.)
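
To illustrate the point, here is a minimal Kotlin sketch showing which of the issue values mentioned above would survive a numeric parse. The helper name parsesAsLong is hypothetical and is not part of KRis; it only uses the standard library function toLongOrNull(), which returns null where Long.parseLong (the call in the stack trace) would throw.

```kotlin
// Hypothetical helper for illustration; not part of KRis.
// Returns true only if the whole string is a valid Long.
fun parsesAsLong(value: String): Boolean = value.toLongOrNull() != null

fun main() {
    // Issue values of the kinds mentioned above, plus a plain number for contrast.
    val issueValues = listOf("7", "1-2", "8-9", "4es", "OOPSLA")
    for (value in issueValues) {
        val verdict = if (parsesAsLong(value)) "parses as Long" else "would throw NumberFormatException"
        println("M1 = \"$value\": $verdict")
    }
}
```

Only the purely numeric value passes, which is why a Long-typed M1 cannot represent ranges or lettered issues.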

One could argue that the publisher is at fault for producing RIS that contains M1 - 1-2, but I don't have control over that input source, so my project is forced to deal with such RIS.

Solutions

Aside from manually parsing the RIS myself to remove the offending line, I think there are two solutions KRis could adopt.

The first is to support continuing the parse even if a field is malformed. Exceptions from individual fields could be collected (but not thrown) so that client code can decide what to do with them. For example, bibtex.parser.BibtexParser from BibSonomy does this (see throwAllParseExceptions and getExceptions in https://bitbucket.org/bibsonomy/bibsonomy/src/bfe90976cfb47c9f3fd38a0483495af48f7b7576/bibsonomy-bibtex-parser/src/main/java/bibtex/parser/BibtexParser.java).
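
A rough sketch of what I mean by collecting errors instead of throwing, in Kotlin. All names here (FieldError, LenientResult, parseLeniently, numericTags) are hypothetical and do not exist in KRis, and the line-matching regex is a simplification of the RIS format, so please read this as an illustration of the idea rather than a proposed implementation.

```kotlin
// Hypothetical types for illustration only; none of these exist in KRis.
data class FieldError(val tag: String, val rawValue: String, val cause: Exception)

data class LenientResult(
    val fields: Map<String, Any>,
    val errors: List<FieldError>,
)

// Tags this sketch treats as numeric (an assumption mirroring the M1 behaviour reported above).
val numericTags = setOf("M1")

// Simplified RIS line shape: two-character tag, two spaces, hyphen, optional space, value.
val tagLine = Regex("""^([A-Z][A-Z0-9]) {2}- ?(.*)$""")

fun parseLeniently(lines: List<String>): LenientResult {
    // Simplification: a repeated tag (e.g. AU) overwrites the earlier value.
    val fields = mutableMapOf<String, Any>()
    val errors = mutableListOf<FieldError>()
    for (line in lines) {
        val match = tagLine.matchEntire(line) ?: continue // skip lines that are not tag lines
        val (tag, value) = match.destructured
        if (tag in numericTags) {
            try {
                fields[tag] = value.toLong()
            } catch (e: NumberFormatException) {
                // Record the problem and keep going instead of aborting the whole import.
                errors += FieldError(tag, value, e)
            }
        } else {
            fields[tag] = value
        }
    }
    return LenientResult(fields, errors)
}

fun main() {
    val result = parseLeniently(listOf("TY  - JOUR", "VL  - 65", "M1  - 1-2", "ER  -"))
    println(result.fields)
    result.errors.forEach { println("${it.tag}: could not parse \"${it.rawValue}\"") }
}
```

With this shape, the caller still gets every field that parsed and can inspect the errors list to decide whether a malformed M1 matters.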

The second is to support access to the "raw" contents of a field regardless of whether that field parsed correctly. One way to do this would be to have a "raw" field for every field of RisRecord (e.g., rawNumber: String? in addition to number: Long?). Another would be to wrap each field in a "Raw" type. For example, the type of RisRecord.number would become Raw<Long>?, where "Raw" is something like class Raw<T>(val raw: String, val value: T?). However, that would change the API a bit, and I don't know how open to such changes you are.
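
As a sketch of the wrapper variant in Kotlin: Raw is the hypothetical type from the paragraph above, and rawLong is an equally hypothetical helper I made up for the example. Neither is part of the KRis API.

```kotlin
// Hypothetical wrapper from the proposal above; not part of the KRis API.
// raw always holds the original text; value is null when parsing failed.
class Raw<T>(val raw: String, val value: T?)

// Hypothetical helper: keeps the original text alongside the parsed Long (or null).
fun rawLong(text: String): Raw<Long> = Raw(text, text.toLongOrNull())

fun main() {
    val volume = rawLong("65")
    val issue = rawLong("1-2")
    println("raw=${volume.raw}, value=${volume.value}")
    println("raw=${issue.raw}, value=${issue.value}")
}
```

The trade-off is that every typed accessor becomes one level more indirect (record.number?.value instead of record.number), which is the API change I mentioned.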


ursjoss · 2023-02-19T20:06:14Z

Hi @adamsmd

Thanks for your interest in KRis and your extensive bug report and solution variants. I'm currently on vacation and have no access to a computer.

When I'm back, I could imagine a two-step approach: to unblock you ASAP, I would consider changing the type of M1 to String. After a quick web search, that seems more appropriate anyway.

In the longer term, it seems reasonable to implement more graceful error handling. I like your suggestions, but I need to evaluate them with a better tool than my mobile phone.

What do you think?

adamsmd · 2023-02-20T11:07:28Z

That sounds reasonable to me. Thank you!

ursjoss · 2023-02-26T14:40:21Z

PR #133 is expected to work around the current issue of M1 values that are not purely numeric.

In addition to the request from this ticket, I also deprecated RisRecord.number and RisRecord.typeOfWork, adding the new nullable String properties RisRecord.miscellaneous1 and RisRecord.miscellaneous3 as the new way of handling M1 and M3. The deprecated properties will be removed in a future version of KRis in the context of issue #132.

The more extensive suggestions from the Solutions section of the issue description have been extracted into issue #134.

* Changes the data type of the property holding the M1 content in RisRecord from Long? to String?. This allows the import of ranges (e.g. "M1 - 1-2") or non-numeric chars (e.g. "M1 - 4es").
* Deprecates RisRecord.number, in favor of new RisRecord.miscellaneous1
* Deprecates RisRecord.typeOfWork, in favor of new RisRecord.miscellaneous3 - for consistency.
* Fixes KRisTagTest and KRisTypeTest which were not asserted properly.
* Cleans up the Tag descriptions on the way.

ursjoss · 2023-02-26T16:43:26Z

@adamsmd I'm currently fighting with my publishing pipeline. I hope to resolve that in the next few days and will then publish KRis-0.4.2, which should get you unstuck with your import.

ursjoss · 2023-02-27T08:37:52Z

@adamsmd KRis-0.4.2 has been published. Please report if this works around the issues you have experienced during the import.

adamsmd · 2023-02-27T11:00:54Z

I just tested KRis-0.4.2, and it indeed works around the issue. Thank you!

ursjoss added a commit that referenced this issue Feb 26, 2023

refactor: [#129] Add (disabled) integration test

fbad049

ursjoss added a commit that referenced this issue Feb 26, 2023

fix!: [#129] Rename RisRecord.typeOfWork to RisRecord.miscellaneous3 …

54b4d41

…to capture M3 BREAKING CHANGE: Deprecation for Java users, but breaking change for Kotlin

ursjoss added a commit that referenced this issue Feb 26, 2023

Revert "fix!: [#129] Rename RisRecord.typeOfWork to RisRecord.miscell…

4edf473

…aneous3 to capture M3" This reverts commit 54b4d41.

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Change M1 implementation from RisRecord.number to RisRec…

724125a

…ord.miscellaneous1 using String

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Rename M3 implementation from RisRecord.typeOfWork to Ri…

390f757

…sRecord.miscellaneous3

ursjoss mentioned this issue Feb 26, 2023

Remove deprecated RisRecord#number and RisRecord.typeOfWork #132

Closed

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Change M1 implementation from RisRecord.number to RisRec…

12d0bac

…ord.miscellaneous1 using String

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Rename M3 implementation from RisRecord.typeOfWork to Ri…

ca02e5e

…sRecord.miscellaneous3

ursjoss self-assigned this Feb 26, 2023

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Change M1 implementation from RisRecord.number to RisRec…

4b4e946

…ord.miscellaneous1 using String

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Rename M3 implementation from RisRecord.typeOfWork to Ri…

be9f002

…sRecord.miscellaneous3

This was referenced Feb 26, 2023

feat(import): [#129] Allow importing non-numeric M1 RisTags #133

Merged

Improve Gracefulness of RIS Import #134

Open

ursjoss added a commit that referenced this issue Feb 26, 2023

refactor: [#129] Add (disabled) integration test

0d53e56

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Change M1 implementation from RisRecord.number to RisRec…

d88e7a5

…ord.miscellaneous1 using String

ursjoss added a commit that referenced this issue Feb 26, 2023

feat: [#129] Rename M3 implementation from RisRecord.typeOfWork to Ri…

d7eacb9

…sRecord.miscellaneous3

ursjoss closed this as completed in #133 Feb 26, 2023

ursjoss reopened this Feb 26, 2023

adamsmd closed this as completed Feb 27, 2023
