[Docs update] Write content for Guides/RDL metadata page #149

Closed
odscjen opened this issue Jul 27, 2023 · 12 comments · Fixed by #239

Comments

@odscjen
Contributor

odscjen commented Jul 27, 2023

Currently no content for the RDL metadata page in the Guidance section. This shall be created as part of work on the development of the validation and spreadsheet tools.

@odscjen odscjen added the Docs This issue relates to documentation label Jul 27, 2023
@odscjen
Contributor Author

odscjen commented Aug 17, 2023

Based on some of the recently created examples (see GFDRR/rdls-spreadsheet-template#3 (comment)), do we want a specific section of guidance on any differences in the use of RDLS depending on whether the user is publishing data to an open data catalogue or to an internal-only or access-restricted catalogue?

@duncandewhurst
Contributor

Sounds good. Do you want to propose some content?

We can perhaps include some guidance to explain that certain fields like resource URLs might only be populated once data has actually been added to a catalogue.

@odscjen
Contributor Author

odscjen commented Aug 18, 2023

Great, I'll make a start on that today (Friday) or Monday

@odscrachel
Contributor

Some ideas of what a skeleton might look like, although some of the topics under generating a json file are generic and could warrant a more general advice heading:

How to publish RDLS metadata

Adoption of the metadata schema

Metadata enables datasets to be found by human and machine searches, and so users can easily identify the dataset contents. It is strongly encouraged that any risk dataset being uploaded online has metadata prepared and uploaded with it.

The Risk Data Library Standard defines metadata in JSON format, but it can be translated into table (csv/excel). WIP

  • Option 1. Write directly into JSON file (templates are available at …)

  • Option 2. Use the JSON metadata creation tool. This tool is standalone (not part of DDH) AND UNDER DEVELOPMENT. It uses an xls file containing metadata in a specified structure, and exports a JSON file to be saved with the dataset.
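
As a rough illustration of Option 2, the sketch below turns tabular rows into nested JSON. It is not the tool itself: the column names, the slash-separated path convention, the `unflatten` helper and the top-level `datasets` key are all assumptions made for the example, and arrays are not handled.

```python
import csv
import io
import json


def unflatten(row):
    """Turn {"a/b": "x"} into {"a": {"b": "x"}}, skipping empty cells."""
    nested = {}
    for path, value in row.items():
        if value == "":
            continue  # an empty cell means the field is not published yet
        keys = path.split("/")
        target = nested
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return nested


# Hypothetical sheet content: one row per dataset, one column per field path.
SHEET = """id,title,spatial/scale,description
example-1,Example flood hazard map,national,
"""

datasets = [unflatten(row) for row in csv.DictReader(io.StringIO(SHEET))]
metadata_json = json.dumps({"datasets": datasets}, indent=2)
print(metadata_json)
```

The empty `description` cell is simply left out of the output, which is one way to handle fields that can only be filled in later.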

How to assign a dataset identifier

See #184

Creating a JSON file

Link to package schema - see #203
Clarify the purpose of links - see #187
Ensuring non-RDLS metadata is included within the datasets
Resource URLs see comment

Validate your metadata

See #203

Using the RDLS spreadsheet input template

Possibly add the Read me contents

Publishing to an open data catalogue

World Bank data catalogue

File sharing
Tips and specific advice for DDH

Sharing your data

It would be helpful to include some pointers on sharing and promoting data to encourage use.

Publishing to an internal or access-restricted catalogue

@duncandewhurst
Contributor

Thanks, @odscrachel.

For ease of editing and review, I've copied the skeleton into a Google Doc and restructured it into an overview of the process for publishing RDLS metadata (prepare, check, publish) and how-to guides for specific topics.

@odscjen I've assigned you a couple of comments for sections relevant to your suggestion in #149 (comment)

@odscjen
Contributor Author

odscjen commented Aug 25, 2023

Noting here a comment from #207 (review): the guidance should state that, when creating JSON, the values in the various coordinate fields should be separated by a comma, not by a semi-colon as in the spreadsheet template.
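
A minimal sketch of the difference, assuming a coordinate cell that holds semi-colon separated numbers. The `bbox` field name, the sample values and the `split_coordinates` helper are illustrative assumptions, not taken from the template:

```python
import json


def split_coordinates(cell):
    """Convert a semi-colon separated spreadsheet cell into a list of numbers."""
    return [float(part) for part in cell.split(";") if part.strip()]


bbox_cell = "-10.5;35.0;5.25;45.0"  # hypothetical spreadsheet value
bbox_json = json.dumps({"bbox": split_coordinates(bbox_cell)})
print(bbox_json)
```

Serialising the list with `json.dumps` writes the values as a JSON array, so the comma separators come from the serialiser rather than being typed by hand.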

@odscjen
Contributor Author

odscjen commented Aug 28, 2023

Linking in #56: as part of the 'metadata or not' review, a lot of fields were removed from vulnerability, and it was suggested that the guidance should mention somewhere that users should still include these values in their data even if they're not given in the RDLS metadata. Looking at how the guidance is currently structured, it's unclear where this would fit in. For now I've added a section titled 'Non-RDLS metadata' under 'Prepare your metadata'.

@duncandewhurst
Contributor

duncandewhurst commented Aug 28, 2023

Linking in #56: as part of the 'metadata or not' review, a lot of fields were removed from vulnerability, and it was suggested that the guidance should mention somewhere that users should still include these values in their data even if they're not given in the RDLS metadata. Looking at how the guidance is currently structured, it's unclear where this would fit in. For now I've added a section titled 'Non-RDLS metadata' under 'Prepare your metadata'.

I think this is probably best addressed by adding a sentence at the end of the second paragraph of https://rdl-standard.readthedocs.io/en/dev/rdl/what/ along the lines of:

RDLS does not specify which fields to include within risk datasets. You ought to make sure that your risk datasets include the fields needed to fulfil their intended uses.

Edit: If there's a need to list the specific fields from #56, I think the right place would be a new "what to include in risk datasets" page under how to publish risk datasets.

@duncandewhurst
Contributor

@odscjen let me know when your updates are ready for review.

@odscjen
Contributor Author

odscjen commented Aug 29, 2023

@duncandewhurst please go ahead and review the google doc :)

@stufraser1
Member

Possible workflow diagram for this guides page, showing users how the templates and validation tool work together.
https://docs.google.com/presentation/d/1pKpDUlZ1QlhLx6PgiZDWCzda7O5N3zBI/edit#slide=id.g27aa981260d_0_0

@duncandewhurst
Contributor

I mentioned it briefly in #147 (review), but to reiterate and expand on the reasoning - I strongly suggest that we do not encourage implementers to author JSON data by hand, even using a template.

Even for people who are very familiar with JSON, authoring data by hand is very time-consuming and error-prone. In a standards context, this means that implementers and the people supporting them waste lots of time trying to fix basic JSON errors (missing brackets, commas, incorrect nesting etc.) that have nothing to do with the standard itself.
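
To show the kind of basic error meant here, a small sketch using Python's standard `json` module to catch syntax problems before any schema validation is attempted. The sample documents and the `json_syntax_error` helper are hypothetical:

```python
import json


def json_syntax_error(text):
    """Return a short description of the first syntax error, or None if the text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return None


valid = '{"datasets": [{"id": "example-1", "title": "Example"}]}'
missing_comma = '{"datasets": [{"id": "example-1" "title": "Example"}]}'
missing_bracket = '{"datasets": [{"id": "example-1"}'

for name, text in [("valid", valid), ("missing comma", missing_comma), ("missing bracket", missing_bracket)]:
    print(name, "->", json_syntax_error(text))
```

A check like this only reports whether the text is well-formed JSON. It says nothing about whether the content conforms to RDLS.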

JSON is an appropriate format for exporting data from an existing system or generating data programmatically, but it is not well-suited to authoring data by hand, especially for large and complex data such as RDLS metadata. For implementers who are authoring data by hand, the spreadsheet template is the best approach currently available so we should encourage that.

I've explained this in the guidance on how to prepare RDLS metadata.

We can certainly add a diagram showing the relationship between the spreadsheet template, JSON data and the RDLS Convertor, but I don't think we should promote an RDLS JSON template as an 'option'.
