Create Dataset: Add "replication data for" to title, make "related publication" metadata field required #3679

sbarbosadataverse · 2017-03-08T19:08:51Z

Requested by Merce.
The whole purpose of replication data is to make it possible to find the article related to the data, but the related publication metadata is not being filled out because it is not currently required. This feature would make the "related publication" field required and, ideally, would include a place for a link to the article.

djbrooke · 2017-05-04T16:26:49Z

@pameyer - let's discuss this in terms of the discussion from the quarterly meeting.

pameyer · 2017-05-04T22:06:03Z

This came up in the quarterly meeting during a discussion of customizing the 'add "replication data for" to title' button, both in terms of customizing the text of that button and in terms of having it refer to an identifier in a metadata field.
Initial thoughts (for future discussion):

Metadata field with an external identifier (PDB ID, PMID, DOI, others?); this probably depends on a more semantic metadata field / DatasetFieldType
Possibly a structured way to indicate that this option has been selected
Validation check that the designated external identifier field is present (and possibly verification of the identifier)

djbrooke · 2017-05-10T17:25:39Z

Thanks @pameyer! We'll discuss soon.

pdurbin · 2017-05-23T14:17:29Z

Related: #3838.

pdurbin · 2017-06-25T18:24:09Z

@sbarbosadataverse does the fact that we addressed #3838 help?

jggautier · 2017-07-24T16:01:50Z

I've been recording the number of published datasets uploaded to Harvard Dataverse that include any, each and all of the related publication metadata. It's been a month since #3838 (where related publication fields were brought up to the dataset create page in the 4.6.2 release), and the number of deposits with that metadata has increased very little - not enough, I think, to say that #3838 has helped, but it's an important first step.

In Harvard Dataverse, datasets with "Replication Data for" in the title make up about 22 percent of published datasets, so making the field required would definitely help. If Dataverse pulled publication metadata from other sources, the way Dryad, OpenICPSR, and Zotero do, it would make including this metadata easier for depositors, but only when the publication has already been given a persistent ID (e.g. DOI).

Using PIDs to pull metadata from other sources wouldn't help in cases where the article hasn't been assigned a PID yet. In Harvard Dataverse, the publicationID is the least used of the related publication fields (of all published, non-harvested datasets), I think because depositors don't have the PID when they deposit, and don't go back to include a PID when the article has been assigned one. Solving this problem would also help, maybe more.

Edit: There are also cases where the related publication may never get a PID, although that might be rare.

djbrooke · 2017-08-02T14:03:41Z

@jggautier - can you work with @pameyer on this to determine a potential solution that will meet the original request from Merce and Pete's need? Happy to help on this, but it seems like you've given this some good thought and research.

jggautier · 2017-08-02T15:38:01Z

Can do. We're meeting today.

jggautier · 2017-08-14T15:13:18Z

Spoke with @pameyer, who wrote in an earlier comment about SBGrid's need to create dataset titles composed of other metadata. To simplify things, I think another ticket should be created for that functionality, which might satisfy other use cases.

We spoke about making metadata fields required based on the type of data being deposited:

If the dataset is "replication data," Related Pub metadata would be required.
If the dataset is one of the several data types in SBGrid's Data Type metadata field, certain metadata fields in a custom metadata block would be required based on the data type.

But we have to consider a difference in deposit workflows, and perhaps work on solutions for these in stages:

For replication data, we've been considering a self-curation workflow. The depositor creates and publishes the dataset.
For SBGrid's data, the depositor saves the dataset, but a curator publishes it once the required metadata is available and entered.

For replication data, we want to encourage researchers to deposit data even if they don't have metadata for the related article. The trick is getting that metadata once it's available, which can take months as a title-less and PID-less manuscript and its data move through a journal's review and publication process.

For SBGrid data -- and maybe for repositories and dataverses where a curator, and not the depositor, publishes the data -- a depositor can save a dataset without the required metadata, but the curator can't publish it without that metadata. In other words, the metadata is optional when the depositor is saving the dataset, but required before the curator can publish it.

pdurbin · 2018-06-28T22:23:31Z

I appreciate all the thought @jggautier and @pameyer have put into this issue but it's still not clear to me what the definition of done is for this issue. I think it would be a tricky one to estimate. We made some changes to the code in #3838. That issue was well defined and clear.

pameyer · 2018-06-28T23:26:14Z

It's probably easier to split our specific use case out from the original. For the original use case, the definition of done seems fairly clear to me - but this functionality would require some relatively in-depth changes to the metadata / DatasetFieldType infrastructure (hidden boolean for "is-replication-data-for"; having a DatasetFieldType required or optional conditional on the value of another DatasetFieldType; enforcing "required" at publication time rather than creation/edit time, probably others).
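The three changes listed above can be sketched in a few lines. This is only an illustration in Python, not Dataverse code: the `Dataset` and `ConditionalRequirement` classes, the `is_replication_data` flag, and the `validate` function are all hypothetical names, and `publicationCitation` is assumed here to be the field that becomes required.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    title: str
    # Hypothetical hidden boolean, set when the depositor presses the
    # 'add "replication data for" to title' button.
    is_replication_data: bool = False
    # Metadata values keyed by field name, e.g. "publicationCitation".
    fields: dict = field(default_factory=dict)


@dataclass
class ConditionalRequirement:
    """Require `required_field` only when `condition_field` is true."""
    condition_field: str
    required_field: str

    def missing(self, dataset: Dataset) -> bool:
        # The rule does not apply unless the controlling flag is set.
        if not getattr(dataset, self.condition_field):
            return False
        value = dataset.fields.get(self.required_field, "")
        return not value.strip()


# Assumed rule: replication datasets need a related publication citation.
RULES = [ConditionalRequirement("is_replication_data", "publicationCitation")]


def validate(dataset: Dataset, action: str) -> list:
    """Return the names of missing fields.

    Conditional rules are enforced only when publishing, so a depositor
    can still save a draft without the related publication metadata.
    """
    if action != "publish":
        return []
    return [rule.required_field for rule in RULES if rule.missing(dataset)]
```

With this shape, saving a draft titled "Replication Data for: ..." returns no errors, while publishing it without a citation reports `publicationCitation` as missing.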

jggautier · 2019-07-16T00:51:35Z

Removing myself from this issue for now. I haven't worked on it since @pameyer's comment last summer. I still think that because depositors often don't have the most important information about the related publication, e.g. article PID or title, requiring Related Publication metadata for datasets whose titles start with "Replication data for" might result in depositors:

Keeping the dataset in draft state because they can't publish it without related publication metadata, which delays publication of an otherwise publishable dataset
Entering filler text in the required metadata fields and not returning to the dataset to enter related publication metadata when it's available. This problem exists now (search for publicationCitation:forthcoming in Harvard Dataverse or UNC Dataverse)
Not pressing the "Replication data for" button (if they're told or realize that pressing that button is what makes the Related Publication metadata fields required)
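For anyone who wants to repeat the filler-text check, here is a rough sketch of how the search URL could be built. It assumes the installation exposes the Search API at `/api/search` with `q`, `type`, and `per_page` parameters; the `filler_search_url` helper and the example base URL are made up for illustration.

```python
from urllib.parse import urlencode


def filler_search_url(base_url: str, term: str, per_page: int = 10) -> str:
    """Build a search URL for datasets whose citation field contains `term`.

    `base_url` is the root of a Dataverse installation (hypothetical here).
    """
    params = {
        "q": f"publicationCitation:{term}",  # field-scoped query from the comment above
        "type": "dataset",                   # assumed: restrict results to datasets
        "per_page": per_page,
    }
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    print(filler_search_url("https://dataverse.example.org", "forthcoming"))
```

Other placeholder words could be checked the same way by changing `term`.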

I think what would be helpful is something that gets someone or some system to add the related publication metadata when it becomes available:

Dataverse can send reminders to depositors or help curators identify datasets missing this metadata and contact depositors
A system such as journal publishing software can send the metadata after publication (as part of an editorial system integration like the one between OJS and Dataverse), or a discovery system can suggest and/or display published works it thinks are related to the dataset. OpenAIRE has or had something like this, but I can't find examples now.
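As a rough idea of what the reminder option might look like, the sketch below picks out published replication datasets that still lack related publication metadata after a grace period. Everything in it is an assumption for discussion: the `PublishedDataset` record, the placeholder list, the 90-day default, and the rule that a missing citation or a missing PID triggers a reminder.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PublishedDataset:
    pid: str
    title: str
    published_on: date
    depositor_email: str
    publication_citation: str = ""
    publication_id: str = ""


# Assumed filler values that should count as "no citation yet".
PLACEHOLDERS = {"", "forthcoming", "tbd", "n/a"}


def needs_reminder(ds: PublishedDataset, today: date, grace_days: int = 90) -> bool:
    """True if a replication dataset still lacks related publication metadata."""
    if not ds.title.lower().startswith("replication data for"):
        return False
    # Give the article time to move through review before nagging.
    if today - ds.published_on < timedelta(days=grace_days):
        return False
    citation_missing = ds.publication_citation.strip().lower() in PLACEHOLDERS
    id_missing = not ds.publication_id.strip()
    return citation_missing or id_missing


def reminder_list(datasets, today: date) -> list:
    """Return (email, pid) pairs for depositors who should be contacted."""
    return [(d.depositor_email, d.pid) for d in datasets if needs_reminder(d, today)]
```

The same list could go to curators instead of depositors. Because some publications may never get a PID, a real version would probably need a way to mark a dataset as "no PID expected" so it stops appearing.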

cmbz · 2024-09-09T17:56:12Z

2024/09/09: Keeping issue open, this is a useful feature.

sbarbosadataverse added the Type: Feature a feature request label Mar 8, 2017

sbarbosadataverse assigned djbrooke Mar 8, 2017

djbrooke assigned pameyer May 4, 2017

pdurbin mentioned this issue May 23, 2017

Add "Related Publications" field to Create Dataset page #3838

Closed

pdurbin added the UX & UI: Design This issue needs input on the design of the UI and from the product owner label Jun 8, 2017

pdurbin added the User Role: Curator Curates and reviews datasets, manages permissions label Jul 5, 2017

djbrooke assigned jggautier and unassigned djbrooke Aug 2, 2017

jggautier mentioned this issue Aug 14, 2017

Dataset titles composed of other metadata #4072

Closed

mheppler mentioned this issue Jan 11, 2018

Dataset Metadata: "Replication Data for:" title prefix vs required field validation #4412

Closed

mheppler changed the title ~~Feature: for anything tagged as "replication data for" make "related publication field" required~~ Create Dataset: Add "replication data for" to title, make "related publication" metadata field required Feb 10, 2018

pdurbin added the Status: Still Interested? label Jun 28, 2018

pdurbin added the Feature: Metadata label Oct 13, 2018

pameyer removed their assignment Mar 11, 2019

jggautier removed their assignment Jul 16, 2019

mreekie removed the Status: Still Interested? label Jan 10, 2023

jggautier mentioned this issue Jan 5, 2024

Create metadata blocks for CAFE's collection of climate and geospatial data IQSS/dataverse.harvard.edu#232

Closed

vkush mentioned this issue Oct 10, 2024

Wish: change button "Replication Data for: xyz" to "Dataset for Publication: xyz" nfdi4cat/repo4cat#23

Open
