Creator as part of the bibliographical reference callout #612

Open
kermitt2 opened this issue May 10, 2019 · 1 comment

@kermitt2
Member

Annotations of creator within a bibliographical reference callout seem random:

10.1002%2Fpam.22030.software-mention.xml

Data come from the Integrated Public Use Microdata Series 
(<rs type="software">IPUMS</rs>) database (Ruggles et al., 2010). 

10.1007%2Fs00191-010-0188-y.software-mention.xml

... the estimation method employed here is based on the 'recursive conditioning simulator' 
implemented for <rs type="software">STATA</rs> by Cappellari and Jenkins (2003). 

10.1007%2Fs10290-016-0264-y.software-mention.xml

with a maturity of 10 years found in the 
<rs type="software">Statistical Data Warehouse</rs> 
of the European Central Bank (2014). 

10.1007%2Fs10663-015-9287-1.software-mention.xml

... using <rs type="software">OpenBugs</rs> program of Meyer and Yu (2000).

versus

10.1007%2Fs10683-017-9548-x.software-mention.xml

using the recruitment software <rs id="software-2" type="software">ORSEE</rs> 
(<rs corresp="#software-2" type="creator">Greiner</rs> 
<rs corresp="#software-2" type="version-date">2015</rs>). All sessions were 
programmed with the <rs id="software-3" type="software">z-Tree</rs> 
(<rs corresp="#software-3" type="creator">Fischbacher</rs> 
<rs corresp="#software-3" type="version-date">2007</rs>) software. 

-> Tagging the publication year of the cited bibliographical reference as the version date is not very well-founded, imho.

10.1007%2Fs10258-013-0091-1.software-mention.xml

using the package <rs id="software-2" type="software">MulCom</rs> of 
<rs corresp="#software-2" type="creator">Hansen and Lunde</rs> 
(<rs corresp="#software-2" type="version-date">2010</rs>) written in 
<rs id="software-3" type="software">Ox</rs> 
(<rs corresp="#software-3" type="creator">Doornik</rs> 
<rs corresp="#software-3" type="version-date">2006</rs>). 

10.1007%2Fs11166-011-9127-z.software-mention.xml

The experiment was programmed in <rs id="software-1" type="software">Z-tree</rs> 
(<rs corresp="#software-1" type="creator">Fischbacher</rs> 
<rs corresp="#software-1" type="version-date">2007</rs>). 

@jameshowison
Contributor

Hmmm, yes, I see the issue here. Certainly the dates in a bibliographical reference callout should not be coded as version-date. So we could programmatically remove any version-date codes that overlap with a bibliographical reference callout (assuming that the TEI XML marks the callouts in some way?).
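
A minimal sketch of that programmatic removal, under an assumption the thread leaves open: that each callout is wrapped in a `<ref type="bibr">` element and that the `<rs>` annotations sit directly inside it. The function name `strip_rs_in_callouts` and the helper `unwrap` are hypothetical, and the TEI namespace is ignored for brevity.

```python
import xml.etree.ElementTree as ET


def unwrap(parent, child):
    """Drop child's tag but keep its text and tail in place.

    Assumes child is a leaf element (no nested elements), which holds
    for the <rs> annotations shown in this issue.
    """
    kept = (child.text or "") + (child.tail or "")
    idx = list(parent).index(child)
    if idx == 0:
        parent.text = (parent.text or "") + kept
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + kept
    parent.remove(child)


def strip_rs_in_callouts(root, rs_types):
    """Unwrap <rs> elements of the given types found inside callouts.

    ASSUMPTION: callouts are <ref type="bibr"> elements.
    Returns the number of annotations removed.
    """
    removed = 0
    for ref in list(root.iter("ref")):
        if ref.get("type") != "bibr":
            continue
        for rs in list(ref):  # direct children only
            if rs.tag == "rs" and rs.get("type") in rs_types:
                unwrap(ref, rs)
                removed += 1
    return removed


if __name__ == "__main__":
    sample = (
        '<p>using the recruitment software '
        '<rs id="software-2" type="software">ORSEE</rs> '
        '<ref type="bibr">(<rs corresp="#software-2" type="creator">Greiner</rs> '
        '<rs corresp="#software-2" type="version-date">2015</rs>)</ref>.</p>'
    )
    root = ET.fromstring(sample)
    strip_rs_in_callouts(root, {"version-date"})
    print(ET.tostring(root, encoding="unicode"))
```

Passing `{"creator", "version-date"}` instead would drop both kinds of tag inside callouts while leaving the sentence text unchanged.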

OTOH, it is appropriate to code the names in a bibliographical reference callout as creator (on the logic that having their names there gives them credit for the code). So that seems harder to fix. Perhaps, though, we could programmatically identify any bibliographical reference callout in a sentence with a mention (i.e. a software name). Then we could either a) auto-code any names in those callouts as creator, or b) manually review them to determine which are actually the creators of the software.
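
The identification step could be sketched roughly as follows. This assumes two things not confirmed in this thread: that sentences are segmented as `<s>` elements and that callouts are `<ref type="bibr">` elements. The function name `callouts_near_mentions` is hypothetical, and the TEI namespace is ignored.

```python
import xml.etree.ElementTree as ET


def callouts_near_mentions(root):
    """List sentences that contain both a software mention and a callout.

    ASSUMPTIONS: sentences are <s> elements; callouts are
    <ref type="bibr"> elements. Returns one (software_names,
    callout_texts) tuple per qualifying sentence.
    """
    hits = []
    for s in root.iter("s"):
        names = [
            "".join(rs.itertext())
            for rs in s.iter("rs")
            if rs.get("type") == "software"
        ]
        callouts = [
            "".join(ref.itertext())
            for ref in s.iter("ref")
            if ref.get("type") == "bibr"
        ]
        if names and callouts:
            hits.append((names, callouts))
    return hits


if __name__ == "__main__":
    sample = (
        "<p><s>implemented for "
        '<rs type="software">STATA</rs> by '
        '<ref type="bibr">Cappellari and Jenkins (2003)</ref>.</s></p>'
    )
    for names, callouts in callouts_near_mentions(ET.fromstring(sample)):
        print(names, callouts)
```

The resulting list could feed either option: auto-coding the names (a) or a manual review queue (b).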

For creator, though, I deeply suspect that, without looking specifically at the cited article, a great many of the names in bibliographical reference callouts would not have been marked as creators. I.e. the information needed to decide whether a name in any particular bibliographical reference callout is a creator lies outside the information given to the machine learning system (because we're not giving it the text of the cited paper, which the coders read/skimmed to make their decision). So that argues for dropping any creator tags in a bibliographical reference callout.

In terms of what we need for CiteAs etc., I think the creator is low priority (since we can work via the Grobid-recognized bibliographical reference callout), so I think the best route to consistency is to drop the creator tags in a bibliographical reference callout.
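
The annotated examples earlier in this thread show creator/version-date pairs with no callout markup around them, so finding the tags to drop may need a heuristic. The sketch below flags a creator `<rs>` immediately followed by a version-date `<rs>` that shares its `corresp` target and consists only of a four-digit year. The function name `author_year_pairs` and both regular expressions are assumptions for illustration. A genuine release year would also match, so the output is a list of candidates for review, not a final answer.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: an author-year callout ends in a bare year, 19xx or 20xx.
YEAR = re.compile(r"(?:19|20)\d{2}")
# Only whitespace, commas, or an opening parenthesis may separate the two tags.
GAP = re.compile(r"[\s,(]*")


def author_year_pairs(root):
    """Return (creator, year, corresp) for pairs that look like callouts."""
    pairs = []
    for parent in root.iter():
        children = list(parent)
        for a, b in zip(children, children[1:]):
            if (
                a.tag == "rs"
                and b.tag == "rs"
                and a.get("type") == "creator"
                and b.get("type") == "version-date"
                and a.get("corresp") == b.get("corresp")
                and YEAR.fullmatch((b.text or "").strip())
                and GAP.fullmatch(a.tail or "")
            ):
                pairs.append((a.text, b.text, a.get("corresp")))
    return pairs


if __name__ == "__main__":
    sample = (
        '<p>programmed in <rs id="software-1" type="software">Z-tree</rs> '
        '(<rs corresp="#software-1" type="creator">Fischbacher</rs> '
        '<rs corresp="#software-1" type="version-date">2007</rs>).</p>'
    )
    print(author_year_pairs(ET.fromstring(sample)))
```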
