OpenRefine+Wikidata quick demo #5

Closed
wetneb opened this issue Feb 6, 2017 · 8 comments

Comments

wetneb commented Feb 6, 2017

I have been working on a tool that sounds quite relevant for the event:

https://tools.wmflabs.org/openrefine-wikidata/

It helps align datasets with Wikidata in OpenRefine, a super cool piece of software for dealing with messy data. If you are still looking for lightning talks for the event, I would be happy to give a quick demo of the tool. I'd love it if we could then play with it on some research data (and I'm sure some attendees will know of many interesting datasets).

@Daniel-Mietchen (Collaborator)

Yes, your reconciliation tool looks great and would be a good thing to demo, play with and hack on. We haven't fully figured out how to organize the lightning talks, other than the generic slot for them in the program. We'll update that as things become more concrete.

@Daniel-Mietchen (Collaborator)

I just gave this a try.

I downloaded the results of this SPARQL query

# Find common strings for authors of scientific articles
SELECT * WHERE {
  {
    SELECT ?authorstring (COUNT(?paper) AS ?count) WHERE {
      ?paper wdt:P2093 ?authorstring .
    }
    GROUP BY ?authorstring
  }
  FILTER(?count > 30)
}

and fed them into OpenRefine, which resulted in 351 matching rows. I then converted those to the format of the new author resolver, which gave this list, from which I picked cases to look at in more detail. That led to ca. 1k replacements of P2093 (author name string) statements with the corresponding P50 (author) statements, which looks promising.
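
As a side note (not part of the workflow above), a quick sanity check for such replacements could be a simple count query for a given name string; "J. Doe" below is just a hypothetical placeholder:

# Hypothetical check: how many papers still carry this author name only as a
# plain P2093 string after the P2093 -> P50 replacements.
SELECT (COUNT(?paper) AS ?remaining) WHERE {
  ?paper wdt:P2093 "J. Doe" .
}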

Things that still need attention in this workflow:

  • The transitions from SPARQL to OpenRefine and from there to the author resolver are currently manual, and some information gets lost along the way. For instance, OpenRefine comes up with a good set of potential matches between the P2093 text strings from the query and the item labels of instances of Q5 (human), but the author resolver is not aware of these potential matches and basically repeats the same search (though it struggles with periods in names).

Pinging @magnusmanske

I also tried to tackle another problem by way of a similar pipeline:

# scientific journal (Q5633421) as main subject (P921)
SELECT ?item ?topic ?topicLabel WHERE {
  ?item wdt:P921 ?topic .
  ?topic wdt:P31 wd:Q5633421 . 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?topic

This is where I got stuck. Can the tool be used in any way that would help with replacing those journal items in P921 (main subject) statements with the corresponding items about the actual topics, as per the mapping here?

wetneb (Author) commented Feb 11, 2017

@Daniel-Mietchen Thanks for giving it a spin! But your links to OpenRefine refer to your local instance of the tool, which we cannot access. Could you post screenshots?

Concerning your mapping, I have some ideas to make this work if the mapping is stored on Wikidata (as journal-to-topic statements). I'll add the relevant endpoint and make screenshots to explain how to use it.
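
To illustrate the idea (just a sketch, and the choice of property is an assumption): if each journal item carried, say, a P921 (main subject) statement pointing at its actual topic, the mapping could then be pulled from the query service like this:

# Sketch only: assumes the journal-to-topic mapping is stored as P921 statements
# on the journal items themselves.
SELECT ?journal ?journalLabel ?topic ?topicLabel WHERE {
  ?journal wdt:P31 wd:Q5633421 ;   # instance of: scientific journal
           wdt:P921 ?topic .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}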

@Daniel-Mietchen (Collaborator)

This was my first try with OpenRefine, so I'm still trying to find my way around. Is there no way to make my OpenRefine projects open, perhaps even by default? It would be nice to have them synced with Zenodo or similar for every "release".

In this specific case, though, I don't think it matters too much (and screenshots wouldn't make much of a difference), since I simply took the outputs of both SPARQL queries (in csv format) and imported them into OpenRefine.

@Daniel-Mietchen (Collaborator)

Re lightning talks, we now have #13 to get them organized.

ekoner (Contributor) commented Mar 4, 2017

+1

kshamash commented Mar 4, 2017

Very cool! I'm testing it out on this dataset: https://figshare.com/articles/COAF_Jisc_and_RCUK_APC_data_2013-2015/3462620

wetneb (Author) commented Mar 5, 2017

So, we've done a lot of things on this topic.

So many thanks to all who got involved!
