Create Dataset: Add "replication data for" to title, make "related publication" metadata field required #3679

Open
sbarbosadataverse opened this issue Mar 8, 2017 · 13 comments
Labels
Feature: Metadata Type: Feature a feature request User Role: Curator Curates and reviews datasets, manages permissions UX & UI: Design This issue needs input on the design of the UI and from the product owner

Comments

@sbarbosadataverse

sbarbosadataverse commented Mar 8, 2017

Merce requested.
The whole purpose of replication data is to be able to find the article related to the data, and this field is not being filled out as needed because it's not currently required. This feature would make the "related publication" field required and ideally should include a place for a link to the article.

@djbrooke
Contributor

djbrooke commented May 4, 2017

@pameyer - let's discuss this in terms of the discussion from the quarterly meeting.

@pameyer
Contributor

pameyer commented May 4, 2017

This came up in the quarterly meeting during a discussion of customizing the 'add "replication data for" to title' button, in relation to both customizing the text of that button and referring to an identifier in a metadata field.
Initial thoughts (for future discussion):

  • Metadata field with an external identifier (PDB id, PMID, DOI, others?); this probably depends on a more semantic metadata field / DatasetFieldType
  • Possibly a structured way to indicate that this option has been selected
  • Validation check that the designated external identifier field is present (and possibly verification of the identifier itself)
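
As a rough illustration of the presence and format check in the last bullet, here is a minimal Python sketch. It is not Dataverse code: the function name `validate_external_id` and the regular expressions are assumptions (loose shape checks for DOIs, PMIDs, and 4-character PDB ids), and they only test the format, not whether the identifier actually resolves.

```python
import re

# Hypothetical shape checks; these patterns are assumptions, not Dataverse rules.
ID_PATTERNS = {
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
    "pmid": re.compile(r"^[1-9]\d{0,7}$"),
    "pdb": re.compile(r"^[1-9][A-Za-z0-9]{3}$"),
}


def validate_external_id(id_type, value):
    """Return True if value is present and matches the expected shape for id_type."""
    pattern = ID_PATTERNS.get(id_type.lower())
    if pattern is None:
        raise ValueError(f"unsupported identifier type: {id_type}")
    # A missing or blank value fails the presence check.
    return bool(value) and pattern.fullmatch(value.strip()) is not None
```

Verification (for example, resolving the DOI) would be a separate step on top of this and is left out here.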

@djbrooke
Contributor

Thanks @pameyer! We'll discuss soon.

@pdurbin
Member

pdurbin commented May 23, 2017

Related: #3838.

@pdurbin pdurbin added the UX & UI: Design This issue needs input on the design of the UI and from the product owner label Jun 8, 2017
@pdurbin
Member

pdurbin commented Jun 25, 2017

@sbarbosadataverse does the fact that we addressed #3838 help?

@pdurbin pdurbin added the User Role: Curator Curates and reviews datasets, manages permissions label Jul 5, 2017
@jggautier
Contributor

jggautier commented Jul 24, 2017

I've been recording the number of published datasets uploaded to Harvard Dataverse that include any, each and all of the related publication metadata. It's been a month since #3838 (where related publication fields were brought up to the dataset create page in the 4.6.2 release), and the number of deposits with that metadata has increased very little - not enough, I think, to say that #3838 has helped, but it's an important first step.

In Harvard Dataverse, datasets with "Replication Data for" in the title make up about 22 percent of published datasets, so making the field required would definitely help. If Dataverse pulled publication metadata from other sources, the way Dryad, OpenICPSR, and Zotero do, it would make including this metadata easier for depositors, but only when the publication has already been given a persistent ID (e.g. DOI).

Using PIDs to pull metadata from other sources wouldn't help in cases where the article hasn't been assigned a PID yet. In Harvard Dataverse, the publicationID is the least used of the related publication fields (of all published, non-harvested datasets), I think because depositors don't have the PID when they deposit, and don't go back to include a PID when the article has been assigned one. Solving this problem would also help, maybe more.

Edit: There are also cases where the related publication may never get a PID, although that might be rare.

@djbrooke djbrooke assigned jggautier and unassigned djbrooke Aug 2, 2017
@djbrooke
Contributor

djbrooke commented Aug 2, 2017

@jggautier - can you work with @pameyer on this to determine a potential solution that will meet the original request from Merce and Pete's need? Happy to help on this, but it seems like you've given this some good thought and research.

@jggautier
Contributor

Can do. We're meeting today.

@jggautier
Contributor

jggautier commented Aug 14, 2017

Spoke with @pameyer, who wrote in an earlier comment about SBGrid's need to create dataset titles composed of other metadata. To simplify things, I think another ticket should be created for that functionality, which might satisfy other use cases.

We spoke about making metadata fields required based on the type of data being deposited:

  • If the dataset is "replication data," Related Pub metadata would be required.
  • If the dataset is one of the several data types in SBGrid's Data Type metadata field, certain metadata fields in a custom metadata block would be required based on the data type.

But we have to consider a difference in deposit workflows, and perhaps work on solutions for these in stages:

  • For replication data, we've been considering a self-curation workflow. The depositor creates and publishes the dataset.
  • For SBGrid's data, the depositor saves the dataset, but a curator publishes it once the required metadata is available and entered.

For replication data, we want to encourage researchers to deposit data even if they don't have metadata for the related article. The trick is getting that metadata once it's available, which can take months as a title-less and PID-less manuscript and its data move through a journal's review and publication process.

For SBGrid data -- and maybe for repositories and dataverses where a curator, and not the depositor, publishes the data -- a depositor can save a dataset without the required metadata, but the curator can't publish it without that metadata. In other words, the metadata is optional when the depositor is saving the dataset, but required before the curator can publish it.
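
To make the "optional on save, required on publish" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the field name `relatedPublicationCitation`, the `REQUIRED_AT_PUBLISH` tuple, and the `blocking_fields` function are all hypothetical and do not reflect how Dataverse currently validates metadata.

```python
# Hypothetical field name; real metadata blocks may name this differently.
REQUIRED_AT_PUBLISH = ("relatedPublicationCitation",)


def blocking_fields(metadata, action):
    """Return the fields that block the given action ("save" or "publish")."""
    if action not in ("save", "publish"):
        raise ValueError(f"unknown action: {action}")
    if action == "save":
        # The depositor can always save a draft, even with metadata missing.
        return []
    # At publish time, every listed field must be present and non-blank.
    return [
        name
        for name in REQUIRED_AT_PUBLISH
        if not str(metadata.get(name) or "").strip()
    ]
```

With this shape, a depositor's save always goes through, and the curator's publish returns the list of fields still to be filled in.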

@mheppler mheppler changed the title Feature: for anything tagged as "replication data for" make "related publication field" required Create Dataset: Add "replication data for" to title, make "related publication" metadata field required Feb 10, 2018
@pdurbin
Member

pdurbin commented Jun 28, 2018

I appreciate all the thought @jggautier and @pameyer have put into this issue but it's still not clear to me what the definition of done is for this issue. I think it would be a tricky one to estimate. We made some changes to the code in #3838. That issue was well defined and clear.

@pameyer
Contributor

pameyer commented Jun 28, 2018

It's probably easier to split our specific use case out from the original. For the original use case, the definition of done seems fairly clear to me - but this functionality would require some relatively in-depth changes to the metadata / DatasetFieldType infrastructure (hidden boolean for "is-replication-data-for"; having a DatasetFieldType required or optional conditional on the value of another DatasetFieldType; enforcing "required" at publication time rather than creation/edit time, probably others).
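
A minimal sketch of the "required conditional on the value of another field" piece, in Python for brevity rather than in Dataverse's own code. Everything here is hypothetical: the `ConditionalRequirement` class, the hidden `isReplicationData` flag, and the field names are assumptions chosen to mirror the list above, not existing DatasetFieldType behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalRequirement:
    field: str             # field that becomes required
    depends_on: str        # field whose value triggers the requirement
    trigger_value: object  # value of depends_on that turns the rule on

    def is_violated(self, metadata):
        """True when the rule is triggered and the required field is blank."""
        triggered = metadata.get(self.depends_on) == self.trigger_value
        value = metadata.get(self.field)
        return triggered and not str(value or "").strip()


# Hypothetical rule: replication datasets need a related publication citation.
RULES = [
    ConditionalRequirement("relatedPublicationCitation", "isReplicationData", True),
]


def violations(metadata, rules=RULES):
    """Return the names of conditionally required fields that are missing."""
    return [rule.field for rule in rules if rule.is_violated(metadata)]
```

When to run `violations` (at creation/edit time or only at publication time) is a separate decision from how the rules are expressed.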

@pameyer pameyer removed their assignment Mar 11, 2019
@jggautier
Contributor

jggautier commented Jul 16, 2019

Removing myself from this issue for now. I haven't worked on it since @pameyer's comment last summer. I still think that because depositors often don't have the most important information about the related publication, e.g. article PID or title, requiring Related Publication metadata for datasets whose titles start with "Replication data for" might result in depositors:

  • Keeping the dataset in draft state because they can't publish it without related publication metadata, which delays publication of an otherwise publishable dataset
  • Entering filler text in the required metadata fields, and not returning to the dataset to enter related publication metadata when it's available. This problem exists now (query publicationCitation:forthcoming in Harvard Dataverse (query) or UNC Dataverse (query))
  • Not pressing the "Replication data for" button (if they're told or realize that pressing that button is what makes the Related Publication metadata fields required)

I think what would be helpful is something that gets someone or some system to add the related publication metadata when it becomes available:

  • Dataverse can send reminders to depositors or help curators identify datasets missing this metadata and contact depositors
  • A system such as journal publishing software can send the metadata after publication (as part of an editorial system integration like the one between OJS and Dataverse), or a discovery system can suggest and/or display published works it thinks are related to the dataset. OpenAIRE has or had something like this, but I can't find examples now.
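
For the first bullet, a minimal Python sketch of how datasets might be flagged for a reminder. This is an assumption-heavy illustration: the dictionary keys `title` and `publicationCitation`, the `needs_reminder` function, and the choice of "forthcoming" as the only filler word (taken from the query mentioned above) are all hypothetical.

```python
# Filler words that suggest the citation was never updated (assumed list).
PLACEHOLDER_WORDS = ("forthcoming",)


def needs_reminder(dataset):
    """Flag replication datasets whose related publication is blank or filler."""
    title = (dataset.get("title") or "").strip().lower()
    if not title.startswith("replication data for"):
        return False
    citation = (dataset.get("publicationCitation") or "").strip().lower()
    if not citation:
        return True
    return any(word in citation for word in PLACEHOLDER_WORDS)
```

A curator report or a scheduled reminder email could then be built from the datasets for which this returns True.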

@cmbz

cmbz commented Sep 9, 2024

2024/09/09: Keeping issue open, this is a useful feature.


7 participants