Parse Errors in Publisher RIS #129

Closed
adamsmd opened this issue Feb 19, 2023 · 6 comments · Fixed by #133

@adamsmd

adamsmd commented Feb 19, 2023

Hello,

As part of a project I am working on, I have code that ingests RIS data from publishers. Unfortunately, I have found a case for which your code throws an exception while parsing.

How to Reproduce

The example paper is https://content.iospress.com/articles/fundamenta-informaticae/fi65-1-2-02. Clicking the "Cite" button then "Reference manager (RIS)" under "Export to:" downloads the RIS for this paper. Alternatively, you can use the following link to download the RIS directly: https://content.iospress.com/export/ris?stripAbstractHtml=true&entryId=fundamenta-informaticae%2Ffi65-1-2-02.

This produces the following RIS:

TY  - JOUR
AU  - Abbott, Michael
AU  - Altenkirch, Thorsten
AU  - McBride, Conor
AU  - Ghani, Neil
EP  - 28
AB  - This paper and our conference paper (Abbott, Altenkirch, Ghani, and McBride, 2003b) explain and analyse the notion of the derivative of a data structure as the type of its one-hole contexts based on the central observation made by McBride (2001). To make the idea precise we need a generic notion of a data type, which leads to the notion of a container, introduced in (Abbott, Altenkirch, and Ghani, 2003a) and investigated extensively in (Abbott, 2003). Using containers we can provide a notion of linear map which is the concept missing from McBride's first analysis. We verify the usual laws of differential calculus including the chain rule and establish laws for initial algebras and terminal coalgebras.
SP  - 1
TI  - ∂ for Data: Differentiating Data Structures
T2  - Fundamenta Informaticae
VL  - 65
M1  - 1-2
PY  - 2005
PB  - IOS Press
ER  - 

Calling toRisRecords() on this text results in the following exception:

Exception in thread "main" java.lang.NumberFormatException: For input string: "1-2"
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.base/java.lang.Long.parseLong(Long.java:692)
	at java.base/java.lang.Long.parseLong(Long.java:817)
	at ch.difty.kris.implementation.RisImport.typeSafeValueFrom(RisImport.kt:60)
	at ch.difty.kris.implementation.RisImport.fillFrom(RisImport.kt:52)
	at ch.difty.kris.implementation.RisImport.access$fillFrom(RisImport.kt:17)
	at ch.difty.kris.implementation.RisImport$process$1$2.emit(RisImport.kt:33)
	at ch.difty.kris.implementation.RisImport$process$1$2.emit(RisImport.kt:28)
	at ch.difty.kris.implementation.RisImport$process$1$invokeSuspend$$inlined$filterNot$1$2.emit(Emitters.kt:223)
	at kotlinx.coroutines.flow.FlowKt__TransformKt$filterNotNull$$inlined$unsafeTransform$1$2.emit(Emitters.kt:223)
	at kotlinx.coroutines.flow.FlowKt__BuildersKt$asFlow$$inlined$unsafeFlow$3.collect(SafeCollector.common.kt:115)
	at kotlinx.coroutines.flow.FlowKt__TransformKt$filterNotNull$$inlined$unsafeTransform$1.collect(SafeCollector.common.kt:113)
	at ch.difty.kris.implementation.RisImport$process$1$invokeSuspend$$inlined$filterNot$1.collect(SafeCollector.common.kt:113)
	at ch.difty.kris.implementation.RisImport$process$1.invokeSuspend(RisImport.kt:28)
	at ch.difty.kris.implementation.RisImport$process$1.invoke(RisImport.kt)
	at ch.difty.kris.implementation.RisImport$process$1.invoke(RisImport.kt)
	at kotlinx.coroutines.flow.SafeFlow.collectSafely(Builders.kt:61)
	at kotlinx.coroutines.flow.AbstractFlow.collect(Flow.kt:230)
	at kotlinx.coroutines.flow.FlowKt__CollectionKt.toCollection(Collection.kt:26)
	at kotlinx.coroutines.flow.FlowKt.toCollection(Unknown Source)
	at kotlinx.coroutines.flow.FlowKt__CollectionKt.toList(Collection.kt:15)
	at kotlinx.coroutines.flow.FlowKt.toList(Unknown Source)
	at kotlinx.coroutines.flow.FlowKt__CollectionKt.toList$default(Collection.kt:15)
	at kotlinx.coroutines.flow.FlowKt.toList$default(Unknown Source)
	at ch.difty.kris.KRis$processList$1.invokeSuspend(KRis.kt:58)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at ch.difty.kris.KRis.processList(KRis.kt:57)
	at ch.difty.kris.KRisExtensionsKt.toRisRecords(KRisExtensions.kt:34)

Cause

The problem is that the RIS from the publisher contains M1  - 1-2, but KRis expects M1 to be a number. Note that M1 is used for the "issue number" in which an article occurs, and issue numbers often contain ranges (e.g., for https://www.sciencedirect.com/science/article/abs/pii/S0020019010000487?via%3Dihub it is issues 8-9), letters (e.g., in https://dl.acm.org/doi/10.1145/242224.242324 the issue number is "4es"), or even letters without numbers (e.g., for https://dl.acm.org/doi/10.1145/3133906 the issue is "OOPSLA"). (Not all of these produce RIS that triggers this bug, but they illustrate how publishers use issue numbers in general.)
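
To illustrate the cause, here is a minimal Kotlin sketch that does not use KRis at all. It only shows how the standard library treats the issue numbers mentioned above: a strict conversion throws NumberFormatException for "1-2", while toLongOrNull() yields null. The names issueNumbers and parsesAsLong are made up for this example.

```kotlin
// Hypothetical sample values mirroring the issue numbers mentioned above.
val issueNumbers = listOf("65", "1-2", "8-9", "4es", "OOPSLA")

// toLongOrNull() returns null for non-numeric input instead of throwing.
fun parsesAsLong(value: String): Boolean = value.toLongOrNull() != null

// Collects every sample value that a strict Long conversion would reject.
fun rejectedIssueNumbers(): List<String> = issueNumbers.filterNot(::parsesAsLong)
```

Under these assumptions, only "65" survives the numeric conversion; the range and the letter-based values are all rejected.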

One could argue that the publisher is at fault for producing RIS that contains M1 - 1-2, but I don't have control over that input source, so my project is forced to deal with such RIS.

Solutions

Aside from manually parsing the RIS myself to remove the offending line, I think there are two solutions KRis could adopt.

The first is to support continuing the parse even if a field is malformed. Exceptions from individual fields could be collected (but not thrown) so that client code can decide what to do with them. For example, bibtex.parser.BibtexParser from BibSonomy does this (see throwAllParseExceptions and getExceptions in https://bitbucket.org/bibsonomy/bibsonomy/src/bfe90976cfb47c9f3fd38a0483495af48f7b7576/bibsonomy-bibtex-parser/src/main/java/bibtex/parser/BibtexParser.java).
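
A rough sketch of what this first option could look like, written independently of KRis: every name here (FieldError, ParseResult, numericTags, parseLeniently) is hypothetical, and the choice of which tags count as numeric is an assumption made only for the example.

```kotlin
// Hypothetical types for illustration; none of these names come from KRis.
data class FieldError(val tag: String, val raw: String, val cause: Exception)

data class ParseResult(
    val values: Map<String, Any>,
    val errors: List<FieldError>,
)

// Assumption for this sketch: these tags are expected to hold numbers.
val numericTags = setOf("M1", "PY")

fun parseLeniently(lines: List<String>): ParseResult {
    val values = mutableMapOf<String, Any>()
    val errors = mutableListOf<FieldError>()
    // A RIS line has the shape "TAG  - value".
    val pattern = Regex("""^([A-Z][A-Z0-9])  - ?(.*)$""")
    for (line in lines) {
        val match = pattern.find(line) ?: continue
        val (tag, raw) = match.destructured
        try {
            // Simplification: repeated tags such as AU overwrite each other.
            values[tag] = if (tag in numericTags) raw.toLong() else raw
        } catch (e: NumberFormatException) {
            // Record the failure and keep going instead of aborting the parse.
            errors += FieldError(tag, raw, e)
        }
    }
    return ParseResult(values, errors)
}
```

With the record from this report, such a parser would return all well-formed fields plus a single collected error for M1, leaving the decision about that error to the caller.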

The second is to support access to the "raw" contents of a field regardless of whether that field parsed correctly. One way to do this would be to have a "raw" field for every field of RisRecord (e.g., rawNumber: String? in addition to number: Long?). Another would be to wrap each field in a "Raw" type. For example, the type of RisRecord.number would become Raw<Long>?, where "Raw" is something like class Raw<T>(val raw: String, val value: T?). However, that would change the API a bit, and I don't know how open you are to such changes.
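
For illustration, the wrapper variant could be sketched roughly as follows. This is not part of the KRis API; Raw follows the shape suggested above, and rawLong is a hypothetical helper added for the example.

```kotlin
// Hypothetical wrapper following the suggestion above; not part of the KRis API.
class Raw<T>(val raw: String, val value: T?)

// Keeps the original text and attempts the typed conversion without throwing.
// value is null whenever the text is not a valid Long.
fun rawLong(text: String): Raw<Long> = Raw(text, text.toLongOrNull())
```

For example, rawLong("1-2") would keep "1-2" in raw while value is null, so client code can still read the publisher's original text.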

@ursjoss
Owner

ursjoss commented Feb 19, 2023

Hi @adamsmd

Thanks for your interest in KRis and your extensive bug report and solution variants. I'm currently on vacation and have no access to a computer.

When back, I could imagine a two-step approach: to unblock you ASAP, I would consider changing the type of M1 to String. That seems more appropriate after a quick web search.

In the longer term, it seems reasonable to implement more graceful error handling. I like your suggestions, but I need to evaluate them with a better tool than my mobile phone.

What do you think?

@adamsmd
Author

adamsmd commented Feb 20, 2023

That sounds reasonable to me. Thank you!

ursjoss added a commit that referenced this issue Feb 26, 2023
…to capture M3

BREAKING CHANGE: Deprecation for Java users, but breaking change for Kotlin
@ursjoss ursjoss self-assigned this Feb 26, 2023
@ursjoss
Owner

ursjoss commented Feb 26, 2023

PR #133 is expected to work around the current issue with M1 values that are not purely numeric.

In addition to the request from this ticket, I also deprecated RisRecord.number and even RisRecord.typeOfWork, adding the new nullable String properties RisRecord.miscellaneous1 and RisRecord.miscellaneous3 as the new way of handling M1 and M3. The deprecated methods will be removed in a future version of KRis in the context of issue #132.

The enhanced suggestions from the Solutions section of the issue description have been extracted into issue #134.

ursjoss added a commit that referenced this issue Feb 26, 2023
* Changes the data type of the property holding the M1 content in RisRecord from Long? to String?.
  This allows the import of ranges (e.g. "M1 - 1-2") or non-numeric chars (e.g. "M1 - 4es").
* Deprecates RisRecord.number, in favor of new RisRecord.miscellaneous1
* Deprecates RisRecord.typeOfWork, in favor of new RisRecord.miscellaneous3 - for consistency.
* Fixes KRisTagTest and KRisTypeTest which were not asserted properly.
* Cleans up the Tag descriptions on the way.
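
As a rough illustration of the deprecation pattern described above, the following sketch uses a simplified stand-in class. SketchRecord is hypothetical, and the behavior of its deprecated number accessor is an assumption made for the example; the actual RisRecord in KRis may handle the old property differently.

```kotlin
// Hypothetical, simplified stand-in for RisRecord; the real class has many more fields.
class SketchRecord(var miscellaneous1: String? = null) {

    // Assumption: the old accessor is kept as a best-effort numeric view of M1.
    @Deprecated(
        message = "M1 is not always numeric; use miscellaneous1 instead",
        replaceWith = ReplaceWith("miscellaneous1"),
    )
    val number: Long?
        get() = miscellaneous1?.toLongOrNull()
}
```

In this sketch, a record holding "1-2" exposes the full text through miscellaneous1, while the deprecated number yields null rather than throwing.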
@ursjoss ursjoss reopened this Feb 26, 2023
@ursjoss
Owner

ursjoss commented Feb 26, 2023

@adamsmd I'm currently fighting with my publishing pipeline. I hope to resolve that in the next few days and will then publish KRis-0.4.2, which should get you unstuck with your import.

@ursjoss
Owner

ursjoss commented Feb 27, 2023

@adamsmd KRis-0.4.2 has been published. Please report whether this works around the issues you experienced during the import.

@adamsmd
Author

adamsmd commented Feb 27, 2023

I just tested KRis-0.4.2, and it indeed works around the issue. Thank you!

@adamsmd adamsmd closed this as completed Feb 27, 2023