Create Dataset: Add "replication data for" to title, make "related publication" metadata field required #3679
Comments
@pameyer - let's discuss this in terms of the discussion from the quarterly meeting. |
This came up at the quarterly meeting during a discussion of customizing the 'add "replication data for" to title' button, both in terms of customizing the text of that button and having it refer to an identifier in a metadata field.
|
Thanks @pameyer! We'll discuss soon. |
Related: #3838. |
@sbarbosadataverse does the fact that we addressed #3838 help? |
I've been recording the number of published datasets uploaded to Harvard Dataverse that include any, each, and all of the related publication metadata fields. It's been a month since #3838 (where the related publication fields were brought up to the dataset create page in the 4.6.2 release), and the number of deposits with that metadata has increased very little. It's not enough, I think, to say that #3838 has helped, but it's an important first step. In Harvard Dataverse, datasets with "Replication Data for" in the title make up about 22 percent of published datasets, so making the field required would definitely help. If Dataverse pulled publication metadata from other sources, the way Dryad, OpenICPSR, and Zotero do, it would make including this metadata easier for depositors, but only when the publication has already been given a persistent ID (e.g. a DOI). Using PIDs to pull metadata from other sources wouldn't help in cases where the article hasn't been assigned a PID yet. In Harvard Dataverse, the publicationID is the least used of the related publication fields (across all published, non-harvested datasets), I think because depositors don't have the PID when they deposit and don't go back to add it once the article has been assigned one. Solving this problem would also help, maybe more. Edit: There are also cases where the related publication may never get a PID, although that might be rare. |
@jggautier - can you work with @pameyer on this to determine a potential solution that will meet the original request from Merce and Pete's need? Happy to help on this, but it seems like you've given this some good thought and research. |
Can do. We're meeting today. |
Spoke with @pameyer, who wrote in an earlier comment about SBGrid's need to create dataset titles composed of other metadata. To simplify things, I think another ticket should be created for that functionality, which might satisfy other use cases. We spoke about making metadata fields required based on the type of data being deposited:
But we have to consider a difference in deposit workflows, and perhaps work on solutions for these in stages:
For replication data, we want to encourage researchers to deposit data even if they don't have metadata for the related article. The trick is getting that metadata once it's available, which can take months as a title-less and PID-less manuscript and its data move through a journal's review and publication process. For SBGrid data -- and maybe for repositories and dataverses where a curator, and not the depositor, publishes the data -- a depositor can save a dataset without the required metadata, but the curator can't publish it without that metadata. In other words, the metadata is optional when the depositor is saving the dataset, but required before the curator can publish it. |
I appreciate all the thought @jggautier and @pameyer have put into this issue but it's still not clear to me what the definition of done is for this issue. I think it would be a tricky one to estimate. We made some changes to the code in #3838. That issue was well defined and clear. |
It's probably easier to split our specific use case out from the original. For the original use case, the definition of done seems fairly clear to me - but this functionality would require some relatively in-depth changes to the metadata / DatasetFieldType infrastructure (hidden boolean for "is-replication-data-for"; having a DatasetFieldType required or optional conditional on the value of another DatasetFieldType; enforcing "required" at publication time rather than creation/edit time, probably others). |
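A minimal sketch of the conditional, publish-time requirement described above, written in Python for brevity. It is not Dataverse code and does not use the real DatasetFieldType infrastructure: the `Dataset` class, its field names, the `is_replication_data` flag, and `validation_errors` are all hypothetical names made up for illustration. The idea is only that related publication metadata stays optional when saving and becomes required at publication, and only for replication data.

```python
from dataclasses import dataclass
from typing import List, Optional

REPLICATION_PREFIX = "replication data for"


@dataclass
class Dataset:
    # Hypothetical, simplified stand-in for a dataset's metadata.
    title: str
    related_publication_citation: str = ""
    related_publication_id: str = ""
    # Hypothetical hidden flag; None means "infer it from the title".
    is_replication_data: Optional[bool] = None


def is_replication(ds: Dataset) -> bool:
    """Use the explicit flag if set, otherwise fall back to the title prefix."""
    if ds.is_replication_data is not None:
        return ds.is_replication_data
    return ds.title.strip().lower().startswith(REPLICATION_PREFIX)


def validation_errors(ds: Dataset, stage: str) -> List[str]:
    """Return validation errors for the given stage ("save" or "publish")."""
    if stage not in ("save", "publish"):
        raise ValueError(f"unknown stage: {stage!r}")
    errors = []
    if not ds.title.strip():
        errors.append("title is required")
    # Conditional requirement, enforced only at publication time.
    if stage == "publish" and is_replication(ds):
        has_publication = bool(
            ds.related_publication_citation.strip()
            or ds.related_publication_id.strip()
        )
        if not has_publication:
            errors.append(
                "related publication is required before publishing replication data"
            )
    return errors


draft = Dataset(title="Replication Data for: An Example Article")
print(validation_errors(draft, "save"))     # saving a draft is allowed
print(validation_errors(draft, "publish"))  # publishing is blocked
```

In this sketch a depositor can save the draft without related publication metadata, while a curator cannot publish it until a citation or identifier has been added. Datasets that are not replication data are unaffected at either stage.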
Removing myself from this issue for now. I haven't worked on it since @pameyer's comment last summer. I still think that because depositors often don't have the most important information about the related publication, e.g. article PID or title, requiring Related Publication metadata for datasets whose titles start with "Replication data for" might result in depositors:
I think what would be helpful is something that gets someone or some system to add the related publication metadata when it becomes available:
|
2024/09/09: Keeping issue open, this is a useful feature. |
Merce requested.
The whole purpose of replication data is to be able to find the article related to the data, and the related publication field is not being filled out as needed because it's not required at this time. This feature would make the "related publication" field required and optimally should include a place for a link to the article.