RDF/XML Literals not being parsed properly #75

Closed
jamsden opened this issue Aug 11, 2015 · 22 comments · Fixed by #632

Comments

@jamsden

jamsden commented Aug 11, 2015

Given some RDF/XML that contains:

<oslc:serviceProvider>
    <oslc:ServiceProvider rdf:about="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml">
      <dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>
      <oslc:details rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w"/>
      <jfs_proc:supportLinkDiscoveryViaLinkIndexProvider rdf:parseType="Literal">false</jfs_proc:supportLinkDiscoveryViaLinkIndexProvider>
      <jfs_proc:supportContributionsToLinkIndexProvider rdf:parseType="Literal">true</jfs_proc:supportContributionsToLinkIndexProvider>
      <jfs_proc:globalConfigurationAware rdf:parseType="Literal">compatible</jfs_proc:globalConfigurationAware>
      <jfs_proc:consumerRegistry rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w/links"/>
    </oslc:ServiceProvider>
  </oslc:serviceProvider>

And a query such as:
someKb.the(aServiceProvider, DCTERMS('title'));

returns:

<dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>    

instead of the text. Am I missing something, or is dcterms:title being parsed incorrectly?

@jamsden
Author

jamsden commented Aug 14, 2015

In the this.parseDOM() function, changing:

                        var nv = parsetype.nodeValue;
                        if (nv === "Literal"){
                            frame.datatype = RDFParser.ns.RDF + "XMLLiteral";// (this.buildFrame(frame)).addLiteral(dom)
                               // should work but doesn't
                            frame = this.buildFrame(frame);
                            frame.addLiteral(dom);
                            dig = false;
                        }

to:

                        var nv = parsetype.nodeValue;
                        if (nv === "Literal"){
                            frame.datatype = RDFParser.ns.RDF + "XMLLiteral";// (this.buildFrame(frame)).addLiteral(dom)
                               // should work but doesn't
                            frame = this.buildFrame(frame);
                            frame.addLiteral(dom.lastChild.nodeValue);
                            dig = false;
                        }

to get the actual content of the literal node seems to work. Might this break something else?

@jamsden jamsden closed this as completed Aug 14, 2015
@jamsden
Author

jamsden commented Aug 14, 2015

I didn't mean to close the issue.

@jamsden jamsden reopened this Aug 14, 2015
@jamsden
Author

jamsden commented Aug 14, 2015

It appears the dataType is incorrect:

  { subject: 
     { uri: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml',
       value: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml' },
    predicate: 
     { uri: 'http://purl.org/dc/terms/title',
       value: 'http://purl.org/dc/terms/title' },
    object: 
     { value: 'JKE Banking (Change Management)',
       lang: '',
       datatype: [Object] },
    why: 
     { uri: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/workitems/catalog',
       value: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/workitems/catalog' } },

Should it be:

{ value: 'JKE Banking (Change Management)',
  lang: undefined,
  datatype: undefined }

or somehow a string? Or am I doing this query incorrectly:

    var sp = this.catalog.statementsMatching(undefined, DCTERMS('title'), 'JKE Banking (Change Management)');

Does the string literal object need to be wrapped in this.catalog.literal? I tried that too, still didn't match, and I noticed that wrapping the string as a literal leaves the datatype undefined as shown above.

@jamsden
Author

jamsden commented Aug 15, 2015

I'm making some progress. The 'addLiteral' function of the RDFParser frameFactory adds the datatype
sym('http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral') for literal nodes, while kb.literal('JKE Banking (Change Management)') leaves the datatype undefined, so they never match. If I force the datatype to XMLLiteral, then the match works:

var sp = this.catalog.statementsMatching(undefined, DCTERMS('title'), this.catalog.literal('JKE Banking (Change Management)', undefined, this.catalog.sym('http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral')));

This doesn't seem to match the documentation which says you should be able to just use a JavaScript string. Is this a bug or does it work as intended, and I have to create these literals with the symbol datatype?
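
For readers following along, the mismatch can be reproduced without rdflib.js at all. The sketch below is only an illustration: `makeLiteral` and `sameLiteral` are hypothetical stand-ins, not the library's API, and it assumes that two literals match only when value, language and datatype all agree, which is what the behaviour described above suggests.

```javascript
// Hypothetical stand-ins, not rdflib.js internals: a literal is modelled as
// { value, lang, datatype }, and two literals match only if all three agree.
function makeLiteral(value, lang, datatype) {
  return { value: String(value), lang: lang || "", datatype: datatype || null };
}

function sameLiteral(a, b) {
  return a.value === b.value && a.lang === b.lang && a.datatype === b.datatype;
}

const XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral";

// What the parser stores for a property with rdf:parseType="Literal"
const stored = makeLiteral("JKE Banking (Change Management)", "", XML_LITERAL);

// What a bare JavaScript string becomes when wrapped as a literal
const plain = makeLiteral("JKE Banking (Change Management)");

// The same text with the datatype forced to XMLLiteral
const typed = makeLiteral("JKE Banking (Change Management)", "", XML_LITERAL);

console.log(sameLiteral(stored, plain)); // false: the datatypes differ
console.log(sameLiteral(stored, typed)); // true: value, lang and datatype agree
```

Under that assumption, a plain string can never match a stored XMLLiteral, no matter how the text compares.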

@timbl
Member

timbl commented Aug 15, 2015

The parseType="Literal" syntax in RDF/XML is for quoting pieces of embedded XML literally. I think you probably just want strings. If you just leave out parseType="Literal", I suspect you will have the strings you want.

@jamsden
Author

jamsden commented Aug 15, 2015

Unfortunately I don't control the RDF/XML source; it's from the Rational Team Concert OSLC Service Provider Catalog. So I may have to just deal with RTC's quirk in how it expresses dcterms:title. That's no problem.

However, isn't there still an issue? The RDF/XML source is:

<oslc:serviceProvider>
    <oslc:ServiceProvider rdf:about="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml">
      <dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>
      <oslc:details rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w"/>
      <jfs_proc:supportLinkDiscoveryViaLinkIndexProvider rdf:parseType="Literal">false</jfs_proc:supportLinkDiscoveryViaLinkIndexProvider>
      <jfs_proc:supportContributionsToLinkIndexProvider rdf:parseType="Literal">true</jfs_proc:supportContributionsToLinkIndexProvider>
      <jfs_proc:globalConfigurationAware rdf:parseType="Literal">compatible</jfs_proc:globalConfigurationAware>
      <jfs_proc:consumerRegistry rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w/links"/>
    </oslc:ServiceProvider>
  </oslc:serviceProvider>

Seems like the value of this property should be an XMLLiteral, but it shouldn't include the property element itself, just the value:

JKE Banking (Change Management)  

(is this even valid XML?) not

<dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>  

@jamsden
Author

jamsden commented Aug 17, 2015

I think my patch above is incorrect. The this.parseDOM() function for Literal nodes:

                        var nv = parsetype.nodeValue;
                        if (nv === "Literal"){
                            frame.datatype = RDFParser.ns.RDF + "XMLLiteral";// (this.buildFrame(frame)).addLiteral(dom)
                               // should work but doesn't
                            frame = this.buildFrame(frame);
                            frame.addLiteral(dom);
                            dig = false;
                        }

should normalize the children of the Literal property (so that === on embedded XML works consistently regardless of ordering), and use an XML serializer to create the value of the node which should be XML source, not parsed DOM. I see similar code in the RDFa parser. If this is correct, I can submit a fix.
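
A minimal sketch of what that could look like, assuming DOM-like nodes that expose nodeType, nodeName, nodeValue, attributes and childNodes. `serializeChildren` and the other helper names are hypothetical and not taken from rdflib.js or its RDFa parser. "Normalize" is read here simply as sorting attributes by name, which falls well short of full XML canonicalization (namespace declarations, for instance, are not handled).

```javascript
// Hypothetical helpers operating on DOM-like nodes; not rdflib.js code.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

function serializeNode(node) {
  if (node.nodeType === TEXT_NODE) return escapeText(node.nodeValue);
  if (node.nodeType !== ELEMENT_NODE) return ""; // comments etc. are dropped
  // Sort attributes by name so that attribute order does not affect equality
  const attrs = Array.from(node.attributes || [])
    .map((a) => [a.name, a.value])
    .sort(([x], [y]) => (x < y ? -1 : x > y ? 1 : 0))
    .map(([name, value]) => ` ${name}="${escapeAttr(value)}"`)
    .join("");
  return `<${node.nodeName}${attrs}>${serializeChildren(node)}</${node.nodeName}>`;
}

// Returns the XML source of the children only, without the property element
function serializeChildren(node) {
  return Array.from(node.childNodes || []).map(serializeNode).join("");
}

// Stand-in for the parsed dcterms:title element from the example above
const title = {
  nodeType: ELEMENT_NODE,
  nodeName: "dcterms:title",
  attributes: [{ name: "rdf:parseType", value: "Literal" }],
  childNodes: [{ nodeType: TEXT_NODE, nodeValue: "JKE Banking (Change Management)" }],
};

console.log(serializeChildren(title)); // JKE Banking (Change Management)
```

The literal value is then a string of XML source for the children, so the property element and its rdf:parseType attribute never end up in the value.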

@lonniev

lonniev commented May 30, 2016

Interesting, I have a problem here in May 2016 with Jim's oslc-client being unable to find Service Providers because the statementsMatching method is not finding XMLLiterals that contain the sought CCM Project Name (name only). I wonder if rdflib.js evolved while Jim's OSLC4JS example has not.

@jamsden
Author

jamsden commented May 30, 2016

My patch for XMLLiterals has not been merged into rdflib.js yet.

@lonniev

lonniev commented May 30, 2016

Because the find-the-service-provider-by-name method is only looking for a string in what is likely to be a fairly small set of titles, we could refactor the method to retrieve all ?-title-? statements and then use a simple JS or Lodash collection filter to pick out the pattern "(.*)${serviceProviderTitle}(.*)". That may be good enough versus trying to get the rdflib.catalog to recognize our particular literal value string. What do you think?

@lonniev

lonniev commented May 30, 2016

The following, together with the addition of lodash and escapeStringRegexp, allows the method to find the statement that relates the subject URI to the literal title for the sought serviceProviderTitle.

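        // Fetch every statement that has a dcterms:title, whatever its subject or object
        var haveTitle = this.catalog.statementsMatching(
            undefined,
            DCTERMS('title'),
            undefined );

        // Match titles that contain the escaped name anywhere in the literal value
        const regex = new RegExp( ".*?" + escapeStringRegexp( serviceProviderTitle ) + ".*?" );

        var sp = _.filter( haveTitle,
            (s) =>
            {
                return s.object.value.match( regex );
            }
        );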
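
If the extra dependencies are unwanted, a roughly equivalent filter can probably be written with built-ins only, since an unanchored regex over an escaped string behaves like a substring test. The sketch below assumes each statement's object.value is (or can be coerced to) a string. `findByTitle` and the example.org URIs are hypothetical, and the array stands in for the result of statementsMatching.

```javascript
// Hypothetical stand-in for the array returned by
// catalog.statementsMatching(undefined, DCTERMS('title'), undefined).
const haveTitle = [
  {
    subject: { value: "https://example.org/sp/1" },
    object: { value: "JKE Banking (Change Management)" },
  },
  {
    subject: { value: "https://example.org/sp/2" },
    object: { value: "JKE Banking (Quality Management)" },
  },
];

// Keep the statements whose literal value contains the sought title.
// String.prototype.includes needs no regex escaping for "(" and ")".
function findByTitle(statements, serviceProviderTitle) {
  return statements.filter((s) =>
    String(s.object.value).includes(serviceProviderTitle)
  );
}

const sp = findByTitle(haveTitle, "JKE Banking (Change Management)");
console.log(sp.length); // 1
console.log(sp[0].subject.value); // https://example.org/sp/1
```

This still sidesteps the datatype question entirely, so it is a workaround rather than a fix for the parser.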

@akoptelov

@jamsden probably an even easier fix, without introducing a new dependency:
frame.addLiteral(dom.childNodes)

@jamsden
Author

jamsden commented Mar 28, 2018

frame.addLiteral(dom.childNodes) does indeed work.

DOM such as:
<dcterms:title rdf:parseType="Literal">

JKE Banking (Change Management)

Another paragraph

And another paragraph

</dcterms:title>

would parse as the following literal string of XML source:

JKE Banking (Change Management)

Another paragraph

And another paragraph

So this becomes a one-line code change. I'll implement it in my fork, test it, and create a pull request.

There is about to be a lot of use of rdflib.js in developing OSLC integrations. This defect is a showstopper, however, since OSLC makes a lot of use of parseType="Literal".

@JeffCave

JeffCave commented Nov 9, 2018

This change does not behave nicely in-browser.

The browser's DOMParser handles serialization of NodeLists differently than the library used for Node.js. In the browser, objects get serialized as "[object NameOfDataType]" rather than as the contents of the list.

I would propose that the line

frame.addLiteral(dom.childNodes)

would be better as

//frame.addLiteral(dom.innerHTML);
frame.addLiteral(dom.innerHTML || dom.childNodes);

This both serializes the inner content and preserves its XML content, as required by parseType='Literal'. By checking innerHTML first we use that by default; otherwise we assume we are in Node.js and serialize with the default childNodes handler.

I'm a little fuzzy on how nodejs handles this. I assume xmldom does not have an innerHTML property.
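
To make the difference concrete, here is a small sketch of the proposed fallback using plain stand-in objects rather than real DOM nodes. It assumes, as described above, that a browser NodeList stringifies to "[object NodeList]" while the Node.js library's child list stringifies to XML source. `literalSource`, `browserLike` and `nodeLike` are hypothetical names.

```javascript
// Hypothetical stand-ins for the two environments; not real DOM objects.
// Browser-like element: has innerHTML, and its child list stringifies generically.
const browserLike = {
  innerHTML: "<p>JKE Banking</p>",
  childNodes: { toString() { return "[object NodeList]"; } },
};

// Node-like element: no innerHTML. Assumption: its child list stringifies
// to the XML source of the children.
const nodeLike = {
  childNodes: { toString() { return "<p>JKE Banking</p>"; } },
};

// The proposed fallback: prefer innerHTML, otherwise stringify childNodes.
function literalSource(dom) {
  return String(dom.innerHTML || dom.childNodes);
}

console.log(String(browserLike.childNodes)); // [object NodeList]
console.log(literalSource(browserLike));     // <p>JKE Banking</p>
console.log(literalSource(nodeLike));        // <p>JKE Banking</p>
```

One caveat with this form: an empty innerHTML is falsy, so an element with no content would still fall through to childNodes.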


Issue verified in:

  • Chrome
  • Firefox

https://forum.solidproject.org/t/errors-parsing-xml-with-rdflib-js-in-the-browser/448

@AndreyBespamyatnov

We are facing the same issue. Is it possible to get that fixed or do you have any workarounds? Thanks

@bourgeoa
Contributor

bourgeoa commented Feb 4, 2024

@AndreyBespamyatnov

//frame.addLiteral(dom.innerHTML);
frame.addLiteral(dom.innerHTML || dom.childNodes);

Does this solve your issue? Or are there other issues?
I published an [email protected] on npm with this patch. Does it work for you? Can you test it?

@bourgeoa bourgeoa linked a pull request Feb 4, 2024 that will close this issue
@AndreyBespamyatnov

AndreyBespamyatnov commented Feb 4, 2024

Hi @bourgeoa, let me try the new version; if it does not work, I will come back with more information about the issue and some test data. Thank you.

@paulslauenwhite

@bourgeoa, we had the same issue as this bug in an implementation of the OSLC AM V3 specification using [email protected] and moving to [email protected] resolved the issue with no side effects. Thanks for the fix.

@paulslauenwhite

paulslauenwhite commented Apr 18, 2024

@bourgeoa, this fix is not in [email protected] or [email protected]. When will the next rdflib release containing this fix be published to https://www.npmjs.com/package/rdflib?

@bourgeoa
Contributor

bourgeoa commented Apr 23, 2024

@paulslauenwhite

merged in [email protected]

@paulslauenwhite

Thanks @bourgeoa! Confirmed [email protected] contains this fix. Will https://github.com/linkeddata/rdflib.js/releases be updated with the 2.2.35 release?

@bourgeoa
Contributor
