Consistency: annotate implementation & environment as separate entities? #642
A more granular example is commands or functions within a programming environment. They are close to individually authored scripts, but they are not consistently annotated in the dataset at the moment (although the supporting language environment often is). Some existing examples:
We used the <rs type="software">MATLAB</rs> command <rs type="software">fmin- search</rs> with multiple starting points to compute the maximum likelihood estimate for this value.
linear regression with robust standard errors using the <rs type="software">STATA</rs> command "cluster (cluster variable)"was used-which relaxes the independence assumption and requires only that the observations should be independent across the clusters (STATA 2013)
Would we want to leave them to crowd judgment?
Similarly, the concern about annotating programming language may be addressed in this category of issues because:
Then what about Java in this case?
<p>The Java GUI interface of <rs type="software">FastPval</rs> is shown in <ref type="figure">Supplementary Figure S</ref>2a-c. In the 'Method' field, the user can either choose '<rs type="software">FastPval</rs>' or the traditional 'Exact' method to calculate P-values.
Thinking about future annotation, whether we include these as valid annotations still depends on subjective interpretation: do annotators understand the programming language as some sort of framework? They first need to interpret the function of the programming language as implied by the textual context. We can, though, give some examples to prompt such understanding.
The same issue applies to mentions of unnamed "chunks of code" implemented in a certain software environment (borrowed from #637):
Data analysis and model fitting were performed using <rs type="software">custom scripts</rs> written in <rs id="software-1" type="software">Igor Pro</rs> <rs corresp="#software-1" type="version">6</rs> (<rs corresp="#software-1" type="creator">WaveMetrics</rs>).</p>
Second, since <rs type="software">Matlab</rs> <rs id="software-0" type="software">routines</rs> applying Bayesian methods to the spatial lag, spatial error and spatial ...
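To make the current situation concrete, here is a minimal Python sketch (not part of the project's tooling) that reads the Igor Pro sentence above and groups the `corresp`-linked attributes under their software mention. The function name `group_mentions` and the opening `<p>` added to make the fragment well-formed are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Sentence copied from the Igor Pro example above; an opening <p> is added
# (assumption) so that the fragment parses as well-formed XML.
SENTENCE = (
    '<p>Data analysis and model fitting were performed using '
    '<rs type="software">custom scripts</rs> written in '
    '<rs id="software-1" type="software">Igor Pro</rs> '
    '<rs corresp="#software-1" type="version">6</rs> '
    '(<rs corresp="#software-1" type="creator">WaveMetrics</rs>).</p>'
)


def group_mentions(xml_text):
    """Return one dict per software mention, with its linked attributes."""
    root = ET.fromstring(xml_text)
    mentions = []  # every <rs type="software">, in document order
    by_id = {}     # id -> mention dict, used to resolve corresp links

    for rs in root.iter("rs"):
        if rs.get("type") == "software":
            mention = {"name": rs.text, "id": rs.get("id"), "attributes": {}}
            mentions.append(mention)
            if rs.get("id"):
                by_id[rs.get("id")] = mention

    for rs in root.iter("rs"):
        if rs.get("type") == "software":
            continue
        # corresp holds "#<id>" of the software mention it describes
        target = (rs.get("corresp") or "").lstrip("#")
        if target in by_id:
            by_id[target]["attributes"][rs.get("type")] = rs.text

    return mentions


for mention in group_mentions(SENTENCE):
    print(mention)
```

In this serialization "custom scripts" and "Igor Pro" come out as two unrelated software mentions: the version and creator attach to Igor Pro, and nothing records that the scripts were written in it.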
I think the principle applied here should be socio-technical :) Ultimately we are interested in improving credit for software contributions, including motivating sharing and coalescence. So I see three general categories (which have different names in different ecosystems).
I propose that we do not annotate "Included code" as |
And programming languages or frameworks should be coded (since they are distributed and should be credited).
The reasons why I separated programming language introduced as an aspect of the implementation of a mentioned software, from a programming language as a framework mentioned on its own, are actually very practical:
@kermitt2 Per your second point above, would it make sense to annotate the programming language and the software as two entities? What would be the concern? For example, would it be technically knotty to have the attribute of one entity be another entity in the serialized corpus?
Fan, could you dig out the sociotechnical definition that we came up with,
surrounding distributed code? Perhaps after the paper is in for review we
can improve this situation and for the TagWorks coding we can add a
question about "software framework or language" which would produce a new
annotation linked to a specific software mention.
So the first example would become something like:
```
Data analysis and model fitting were performed using <rs id="software-1"
type="software" sub-type="unnamed">custom scripts</rs> written in <rs
id="software-1" type="framework">Igor Pro</rs> <rs corresp="#software-1"
type="framework-version">6</rs> (<rs corresp="#software-1"
type="creator">WaveMetrics</rs>).</p>
```
Ugh, that example shows how complicated this is, since instantly we have
framework-version and framework-creator ... Perhaps we should rather
implement this as "the code that is actually shared", which is Igor
Pro and not the custom scripts. Of course that doesn't really help when
there is
Perhaps another approach is to say: software explicitly_depends_on
software, so that both the "custom scripts" (software-1) and Igor Pro
(software-2) are software (and potentially have all the annotations
associated with software), but software-1 also has the attribute
explicitly_depends_on="software-2".
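A rough Python sketch of how the explicitly_depends_on idea above might look once serialized, and how a consumer could read the dependency back out. Everything beyond the attribute name is an assumption made for illustration: the exact placement of the attribute, moving Igor Pro to id "software-2", the opening `<p>`, and the helper name `dependency_pairs`.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialization of the proposal (not an agreed schema):
# both mentions are type="software"; the scripts point at Igor Pro.
PROPOSED = (
    '<p>Data analysis and model fitting were performed using '
    '<rs id="software-1" type="software" '
    'explicitly_depends_on="software-2">custom scripts</rs> written in '
    '<rs id="software-2" type="software">Igor Pro</rs> '
    '<rs corresp="#software-2" type="version">6</rs> '
    '(<rs corresp="#software-2" type="creator">WaveMetrics</rs>).</p>'
)


def dependency_pairs(xml_text):
    """Return (dependent, dependency) name pairs found in the fragment."""
    root = ET.fromstring(xml_text)
    # Map each identified software mention to its surface text.
    names = {
        rs.get("id"): rs.text
        for rs in root.iter("rs")
        if rs.get("type") == "software" and rs.get("id")
    }
    pairs = []
    for rs in root.iter("rs"):
        dep = rs.get("explicitly_depends_on")
        if dep is not None:
            # names.get returns None if the target id is not in the fragment
            pairs.append((rs.text, names.get(dep)))
    return pairs


print(dependency_pairs(PROPOSED))
```

With this shape the version and creator stay ordinary attributes of Igor Pro, so no framework-version or framework-creator types are needed; the cost is one extra attribute on the dependent mention.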
On Tue, Nov 26, 2019 at 9:52 AM C. Fan Du wrote:
> @kermitt2 Per your second point above, would it make sense to annotate
> the programming language and the software as two entities? What would be
> the concern? e.g., technically it would be knotty to have the attribute
> of one entity as another entity in the serialized corpus?
This example is from #637
Here the R package proCIs is annotated as a single entity.
In most cases, however, the package and the environment are annotated separately. For instance:
(In the final one, the package name is missing its annotation :)
To me it's reasonable to annotate the software environment and the package separately.