Crazy idea: make metadata blocks more flexible #6700
Comments
@poikilotherm - having a place to discuss metadata issues sounds useful - the problems and ideas are spread out across many issues and docs. Another google group like that for i18n might be useful. Also, FWIW, Gustavo and I are trying to get a doc together summarizing issues and potential design directions around metadata that we hope to get out for community input in the near future. W.r.t. another repository, while I think separating the set of schemas you can use from Dataverse itself makes sense in the longer term, I'm not sure that a separate tsv repo is as useful. (To be clear, this is a personal opinion rather than a GDCC position. I expect that if there's consensus this is useful, it could be a GDCC repo and I'd help to make it work.) Right now, the tsv files mix the description of what is included with how that should be displayed in Dataverse. And there are different changes you can make to metadata blocks that have different impacts. I think changes to tsv files that affect how things are displayed work with the current API and they don't have too much impact on preservation. Other changes - switching required fields to optional, changing the mapping to community terms are more problematic w.r.t. harvesting, import/export, preservation, etc. And some changes, such as removing a term, just don't work (unless you edit in the database and account for any prior use of that term). While none of these get worse if tsv files are in another repository, I think it may get harder to coordinate if there are changes like those discussed in #6561, or larger changes to separate out the schema from display issues. If there is a separate repository, I think it would be very important, given the way things work today, to continue to review changes for impacts like those above. |
Thank you @mercecrosas and @qqmyers for your input. Wonderful to hear that you are already working with Gustavo on a doc. Is there some kind of ETA for such a huge UI change foreseeable? From my (not so long) experience with folks around here this sounds like end of 2020 minimum. Which is perfectly fine - you are all doing a great job, and this shouldn't be rushed. So my idea was about doing something that can be done easily in a very short time, does not involve many code changes, but helps people get things done now until the big, great and fancy solution is ready. The very same idea was behind #6142. I know that TSV files aren't great and I very much dislike them, too. But it's where we are now. Maybe splitting them out can be a good starting point for moving on to a new format, too, as it's a central place to maintain things. (:wave: @mercecrosas) Jim, you are absolutely right about the multiverse of things you can do to TSVs. But actually that was the whole point why this should be moved into a separate repo. That way we have a non-cluttered, easy to follow change log for this more or less static data. Everyone can pretty much rely on it, but needs to make sure their changes are backported to any upstream change (the normal fork-problem workflow you need to work on). So for everyone who is happy with the current schemas, there is no need to fork, just use things. But for everyone else, it gets much easier to maintain a fork when you don't have to mess with the large main app repo. If you guys feel uncomfortable with removing the files from the main repo - updates to the metadata repo from the main repo can be automated. That way you would need even fewer changes, but still create a place to run to. In my personal experience, I'm less of a mailing list guy, but read GitHub issues and IRC frequently and love to discuss here. Harvesting might become a problem, indeed. But what should we do instead when things like #6561 appear? 
Just this morning we discussed that we don't want all the author id schemas for Jülich DATA but stick with ORCID only. That's a change only doable by changing the TSV when you don't want to fork the code to implement filters or similar. On the other hand we want to keep the maintenance effort to maintain the metadata schemas as low as possible. Don't get me wrong - if there is no interest for this within the community, this is just a skip-able crazy idea. We can simply do this just for us and share scripts with everyone interested. If there is no greater value in creating a place to run to for the community I'll simply cope with what's present. 😉 |
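The ORCID-only case above could be handled by a small script run against a copy of the TSV file instead of a code fork. The following is only a minimal sketch: it assumes the metadata block TSV layout in which a `#controlledVocabulary` header line is followed by tab-separated rows whose second column is the field name and whose third column is the value. The function name `keep_vocabulary_values` and the sample rows are hypothetical, and the sketch only edits the file text. Whether an installation that already uses the dropped values accepts the result is the separate problem raised earlier in this thread.

```python
def keep_vocabulary_values(tsv_text, field_name, allowed_values):
    """Return tsv_text with the controlled vocabulary of one field reduced
    to the allowed values; every other line is passed through unchanged."""
    kept_lines = []
    in_vocabulary_section = False
    for line in tsv_text.splitlines():
        if line.startswith("#"):
            # A line starting with '#' opens a new section of the file.
            in_vocabulary_section = line.startswith("#controlledVocabulary")
            kept_lines.append(line)
            continue
        columns = line.split("\t")
        # Assumed row layout: empty first column, then field name, then value.
        if (in_vocabulary_section and len(columns) >= 3
                and columns[1] == field_name
                and columns[2] not in allowed_values):
            continue  # drop this vocabulary entry
        kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"


# Hypothetical sample rows, not copied from a real citation.tsv.
sample = (
    "#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder\n"
    "\tauthorIdentifierScheme\tORCID\t\t0\n"
    "\tauthorIdentifierScheme\tISNI\t\t1\n"
    "\tsubject\tPhysics\t\t2\n"
)
filtered = keep_vocabulary_values(sample, "authorIdentifierScheme", {"ORCID"})
print(filtered)
```

The sketch does not renumber the `displayOrder` column, so gaps in the ordering would remain after filtering.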
To have a master repo with the default metadatablock files and some helpful scripts would be a good start. I am creating my own repo for our AUSSDA metadatablocks right now, to make them usable for our jenkins tests and for the deployment scripts. So it could be a fork or so in the future. |
Sounds like a great idea to me, @poikilotherm. Decompartmentalization (you don't get to use that word every day but it sounds fitting here) is the way to go, I think. IQSS's wish to maintain a core of metadata is more than understandable: it's good practice documentologically speaking. Without it, say goodbye to interoperability. But with time the number of repositories with specific metadata needs, who come to the fore asking for customization options, only seems to grow… So looking for a workaround like this feels like a great initiative to me. @qqmyers wrote:
That's also a very good point. For instance, while JSON, flexible as ever, immediately incorporates custom metadata blocks in its files, DDI doesn't. I have yet to see if an issue or a topic on the Google Group was already created to mention this. |
Independent of the question of a repository, I just want to make sure that it's clear that there's a difference in the current design between having new metadata blocks and editing existing ones. The former is straightforward and there's no issue with having new ones (e.g. the Darwin Core block discussed in Tromso). Editing an existing block, or different groups using different versions of the same block, is where care is needed to avoid problems, and where guidance is needed about avoiding changes that will require db edits, affect interoperability, etc. |
@poikilotherm, seems there is quite a bit of interest here and we will discuss as @mercecrosas and others mentioned. Quick question, you mentioned a UI change in your comment. Can you provide some more details on what you see as the potential change (or changes)? |
I don't think we need to separate the metadata schemas from the master branch and put them in another GitHub repository, or build another GUI to handle this. The most obvious solution is to create a synchronization tool that can read any schema from a Google Spreadsheet via the Google API and update it in Dataverse directly. Spreadsheets are suitable for collaborative work and can be archived from time to time in the master branch. With a bit of effort this tool can also be integrated into the Dataverse dashboard, so an admin can fill in a form with the Google API credentials and a link to the Spreadsheet and get a metadata schema updated. |
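The core of such a synchronization tool might look roughly like the sketch below. It is not a working Google integration: it assumes the spreadsheet has already been exported as CSV text by some other means, and it assumes the admin endpoint `/api/admin/datasetfield/load` accepts a TSV body with the content type `text/tab-separated-values`, which should be checked against the installation's own API documentation. The helper names `csv_to_tsv` and `build_load_request` are hypothetical, and the request is only built, never sent.

```python
import csv
import io
import urllib.request


def csv_to_tsv(csv_text):
    """Convert CSV text (e.g. a spreadsheet export) to tab-separated text."""
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


def build_load_request(base_url, tsv_text):
    """Build (but do not send) a POST request carrying the TSV block.

    The endpoint path is an assumption; verify it before use.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/datasetfield/load",
        data=tsv_text.encode("utf-8"),
        headers={"Content-Type": "text/tab-separated-values"},
        method="POST",
    )


# Hypothetical two-row export; the block name "custom" is made up.
exported_csv = (
    "#metadataBlock,name,dataverseAlias,displayName\n"
    ',custom,,"Custom, local block"\n'
)
tsv_block = csv_to_tsv(exported_csv)
request = build_load_request("http://localhost:8080/", tsv_block)
print(request.get_method(), request.full_url)
# Sending would be a separate, deliberate step:
# urllib.request.urlopen(request)
```

Keeping the conversion and the upload as separate functions means the generated TSV can be reviewed, or committed to a repository, before anything is loaded into a running installation.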
@djbrooke actually I think for this proposal, there is no UI change needed. Everything can stay as is, this is just about reorganizing the metadata blocks files. The UI changes I mentioned in #6700 (comment) were about the changes that @qqmyers and @scolapasta are investigating and related to an ETA for those changes (my anticipation is that this will take much longer than reorganizing as a temporary workaround). |
Seeing a post referencing this issue at https://groups.google.com/g/dataverse-community/c/RJl4IQcPw30/m/pk1RtA58CgAJ reminds me to mention here that a new "flexible metadata" working group is being formed following the 2020 Dataverse Community Meeting. The "GDCC working groups" announcement can be found here: https://groups.google.com/g/dataverse-community/c/EY0dduRj3Ac/m/EDcEQHLoAwAJ Here's a direct link where people can sign up to the flexible metadata group and other groups: https://docs.google.com/document/d/1LTLjLM5sR07SAEqO7u-QgRp-StO327WS2TdbR2KdPxY/edit?usp=sharing |
This discussion was a dead end. Closing. |
Motivation
A lot of people at Tromso, my colleagues and so many other great folks I've been talking to about Dataverse highlighted one of the outstanding features: custom metadata blocks.
But when I read things like #6561 or discussions about new standards like CodeMeta, I miss two things:
citation.tsv
without forking the whole app
I'm very aware of #4451 and #6030, but those are grand challenges, large tasks that cannot be solved in the near future. So let's talk about what we can do today and on a low-effort basis. (Like I did with Solr and custom metadata, see #6142)
Crazy idea
git submodule
Now what's the benefit?
master
(or whatever) updateable from upstream
Let's talk about this 😸
Mentions
Mentioning a bunch of people I know being interested in metadata blocks (and some other important people): @4tikhonov @skasberger @djbrooke @scolapasta @pdurbin @TaniaSchlatter @BPeuch @youssefOuahalou @vbernabe @RightInTwo @doigl @qqmyers @mercecrosas @bronger
Please share to anyone you might think is interested. Community power! 💪