Parse Errors in Publisher RIS #129
Hi @adamsmd, thanks for your interest in KRis and for your extensive bug report and solution variants. I'm currently on vacation and have no access to a computer. When I'm back, I could imagine a two-step approach. To unblock you ASAP, I would consider changing the type of M1 to String, which seems more appropriate after a quick web search. In the longer term, it seems reasonable to implement more graceful error handling. I like your suggestions, but I need to evaluate them with a better tool than my mobile phone. What do you think?
That sounds reasonable to me. Thank you!
…to capture M3 BREAKING CHANGE: Deprecation for Java users, but breaking change for Kotlin
…aneous3 to capture M3" This reverts commit 54b4d41.
PR #129 is expected to work around the current issue with `M1`. In addition to the request from this ticket, I also deprecated `RisRecord.typeOfWork` in favor of `RisRecord.miscellaneous3`. The enhanced suggestions from the section …
* Changes the data type of the property holding the M1 content in `RisRecord` from `Long?` to `String?`. This allows the import of ranges (e.g. "M1 - 1-2") or non-numeric characters (e.g. "M1 - 4es").
* Deprecates `RisRecord.number` in favor of the new `RisRecord.miscellaneous1`.
* Deprecates `RisRecord.typeOfWork` in favor of the new `RisRecord.miscellaneous3`, for consistency.
* Fixes `KRisTagTest` and `KRisTypeTest`, which were not asserting properly.
* Cleans up the `Tag` descriptions along the way.
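A deprecation like the ones listed above can be expressed in Kotlin with `@Deprecated` and `ReplaceWith`. The sketch below uses a hypothetical, simplified `SimpleRisRecord` class, not KRis's actual `RisRecord` (whose real definition is not shown in this thread); the idea that the old `number` accessor simply delegates to `miscellaneous1` is an assumption for illustration only.

```kotlin
// Hypothetical, simplified stand-in for a record class; NOT the actual KRis RisRecord.
data class SimpleRisRecord(
    // Holds the M1 content as text, so values like "1-2" or "4es" survive import.
    val miscellaneous1: String? = null,
) {
    // Old accessor kept for source compatibility (assumed to delegate to the new one).
    // The compiler warns at call sites, and IDEs can offer the ReplaceWith quick-fix.
    @Deprecated(
        message = "Use miscellaneous1 instead",
        replaceWith = ReplaceWith("miscellaneous1"),
    )
    val number: String?
        get() = miscellaneous1
}

@Suppress("DEPRECATION") // silence the warning for the deliberate use of `number` below
fun main() {
    val record = SimpleRisRecord(miscellaneous1 = "1-2")
    println(record.miscellaneous1) // prints 1-2
    println(record.number)         // prints 1-2 (deprecated accessor)
}
```

Callers that still use `number` keep compiling with a warning, which gives them time to migrate to the new property.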
@adamsmd I'm currently fighting with my publishing pipeline. I hope to resolve that in the next few days and will then publish KRis-0.4.2, which should get you unstuck with your import.
@adamsmd KRis-0.4.2 has been published. Please report whether this works around the issues you experienced during the import.
I just tested KRis-0.4.2, and it indeed works around the issue. Thank you!
Hello,
As part of a project I am working on, I have code that ingests RIS data from publishers. Unfortunately, I have found a case for which your code throws an exception while parsing.
How to Reproduce
The example paper is https://content.iospress.com/articles/fundamenta-informaticae/fi65-1-2-02. Clicking the "Cite" button then "Reference manager (RIS)" under "Export to:" downloads the RIS for this paper. Alternatively, you can use the following link to download the RIS directly: https://content.iospress.com/export/ris?stripAbstractHtml=true&entryId=fundamenta-informaticae%2Ffi65-1-2-02.
This produces the following RIS:
Calling `toRisRecords()` on this text results in the following exception:

Cause
The problem is that the RIS from the publisher contains `M1 - 1-2`, but KRis expects M1 to be a number. Note that M1 is used for the "issue number" in which an article appears, and in general, issue numbers often contain ranges (e.g., for https://www.sciencedirect.com/science/article/abs/pii/S0020019010000487?via%3Dihub it is issues 8-9), letters (e.g., in https://dl.acm.org/doi/10.1145/242224.242324 the issue number is "4es"), or even letters without numbers (e.g., for https://dl.acm.org/doi/10.1145/3133906 the issue is "OOPSLA"). (Not all of these produce RIS that triggers this bug, but they illustrate how publishers use issue numbers in general.)
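To illustrate why a strict numeric type is too narrow for M1, here is a small Kotlin sketch run over the issue values mentioned above. `parseIssueRange` is a hypothetical helper, not part of KRis; it shows one possible way a client could still extract a numeric range from the raw text when that makes sense, and give up (return `null`) otherwise.

```kotlin
// Issue "numbers" as publishers actually use them (values taken from the examples above).
val issueValues = listOf("1-2", "8-9", "4es", "OOPSLA", "3")

// Hypothetical helper, not part of KRis: read an issue value as a numeric
// range ("1-2" -> 1..2, "3" -> 3..3); return null for anything else.
fun parseIssueRange(raw: String): IntRange? {
    val parts = raw.split("-").map { it.trim() }
    return when (parts.size) {
        1 -> parts[0].toIntOrNull()?.let { it..it }
        2 -> {
            val start = parts[0].toIntOrNull()
            val end = parts[1].toIntOrNull()
            if (start != null && end != null && start <= end) start..end else null
        }
        else -> null
    }
}

fun main() {
    for (value in issueValues) {
        // A Long-typed field can only hold "3"; the other values fail to convert.
        println("$value -> toLongOrNull=${value.toLongOrNull()}, range=${parseIssueRange(value)}")
    }
}
```

Keeping the field as a `String` preserves all of these values, and any numeric interpretation becomes an optional, client-side step.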
One could argue that the publisher is at fault for producing RIS that contains `M1 - 1-2`, but I don't have control over that input source, so my project is forced to deal with such RIS.

Solutions
Aside from manually parsing the RIS myself to remove the offending line, I think there are two solutions KRis could adopt.
The first is to support continuing the parse even if a field is malformed. Exceptions from the various fields could be collected (but not thrown) so that client code can decide what to do with them. For example, `bibtex.parser.BibtexParser` from BibSonomy does this (see `throwAllParseExceptions` and `getExceptions` in https://bitbucket.org/bibsonomy/bibsonomy/src/bfe90976cfb47c9f3fd38a0483495af48f7b7576/bibsonomy-bibtex-parser/src/main/java/bibtex/parser/BibtexParser.java).

The second is to support access to the "raw" contents of a field regardless of whether that field parsed correctly. One way to do this would be to have a "raw" field for every field of `RisRecord` (e.g., `rawNumber: String?` in addition to `number: Long?`). Another would be to wrap each field in a "Raw" type. For example, the type of `RisRecord.number` would become `Raw<Long>?`, where `Raw` is something like `class Raw<T>(val raw: String, val value: T?)`. However, that would change the API a bit, and I don't know how open you are to such changes.
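Both suggestions can be combined, as in the minimal Kotlin sketch below. It is not the KRis API: `Raw`, `FieldError`, `LenientFieldParser`, and `field` are hypothetical names, and the converter lambdas stand in for whatever per-tag conversion a real parser would do. Each field keeps its raw text, and a failed conversion is recorded in a list instead of being thrown, loosely mirroring the idea behind BibSonomy's collected parse exceptions.

```kotlin
// Hypothetical sketch (not the KRis API): keep every field's raw text and
// collect conversion errors instead of throwing them.

// Wrapper as suggested above: the raw text plus the parsed value, if any.
class Raw<T>(val raw: String, val value: T?)

// One recorded problem: which tag failed, its raw text, and why.
data class FieldError(val tag: String, val raw: String, val cause: Exception)

class LenientFieldParser {
    private val collected = mutableListOf<FieldError>()

    // Errors gathered so far; client code decides what to do with them.
    val errors: List<FieldError> get() = collected

    // Convert a raw tag value; on failure, remember the error and keep going.
    fun <T> field(tag: String, raw: String, convert: (String) -> T): Raw<T> =
        try {
            Raw(raw, convert(raw))
        } catch (e: Exception) {
            collected += FieldError(tag, raw, e)
            Raw(raw, null)
        }
}

fun main() {
    val parser = LenientFieldParser()
    val number = parser.field("M1", "1-2") { it.toLong() }  // fails: not a Long
    val year = parser.field("PY", "2005") { it.toInt() }    // succeeds

    println(number.raw)                   // 1-2 (raw text still available)
    println(number.value)                 // null (conversion failed)
    println(year.value)                   // 2005
    println(parser.errors.map { it.tag }) // [M1]
}
```

The design choice here is that parsing never aborts on a single bad tag: the caller can inspect `errors` afterwards and decide whether to reject the record, log a warning, or fall back to the raw text.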