New version for testing #3
Comments
Some more comments:
|
Example for HAZARD: Fathom country dataset (Thailand) Note:
|
Example for EXPOSURE: GHS builtup dataset (Thailand) Note:
|
Thanks for the feedback! I'll respond to the points that are specific to the template here. @odscjen please can you:
The enum tab is there but hidden to prevent accidental edits. You should be able to unhide it if you need to. Let me know if you run into problems.
Ah, thanks for flagging. I will fix it in the next iteration. I've created an issue: #6 |
'describedby' is used to link to the schema that describes that RDLS metadata so it probably isn't the right code here. What is the relationship between the related publication and the dataset? Does it fit the semantics of |
It is a publication describing the methodology to produce the dataset. |
@matamadio I'm in the process of checking and validating your examples. Could you give me comment access to the spreadsheets? At the moment I only have view access, and I think it would be easiest to leave feedback comments on the relevant columns in the spreadsheets rather than write them out in this issue. |
great, thanks :) |
@duncandewhurst an odd thing in the flatten-tool converted version of the Hazard example
"referenced_by": [
{
"id": "1",
"name": "A high-resolution global flood hazard model",
"authorNames": [
[
"Christopher C. Sampson",
" Andrew M. Smith",
" Paul D. Bates",
" Jeffrey C. Neal",
" Lorenzo Alfieri",
" Jim E. Freer"
]
],
"datePublished": "2015-08-18",
"url": "https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015WR016954",
"doi": "10.1002/2015WR016954"
}
], and "bbox": [
[
96.72037,
5.407344,
105.72785,
20.718323
]
], but it looks right in the spreadsheet? |
@matamadio re comment above
It looks like you did get the branching right :) |
Yep, that's due to the Flatten Tool bug I mentioned in #4. I've opened an issue on Flatten Tool to get it fixed. Looks like the data has been entered correctly in the spreadsheet, albeit with extra whitespace before each author's name. |
ah, cool, I ignored all the other errors reported due to that mentioned bug but hadn't picked up that this was part of the same thing. |
I was mistaken, it's not a bug. The correct syntax is a semicolon-separated list, rather than a comma-separated list. @matamadio I've just pushed a new version of the template (XLSX, GSheets) with the following additions and improvements:
Please can you test it. I think this is all we can add in terms of features, but we can fix any bugs and improve the documentation. |
EDIT: see the comment below for the fix to this confusion. Entering the names as a semicolon-separated list looks like this in the flatten-tool output (from Mat's Exposure example last week): "referenced_by": [
{
"id": "1",
"name": " GHS-BUILT-S R2023A - GHS built-up surface grid, derived from Sentinel2 composite and Landsat, multitemporal (1975-2030)",
"authorNames": [
[
"Pesaresi",
" Martino"
],
[
" Politis",
" Panagiotis"
]
],
"datePublished": "2023",
"url": "http://data.europa.eu/89h/9f06f36f-4b11-47ec-abb0-4f8b7b1d72ea",
"doi": "10.2905/9F06F36F-4B11-47EC-ABB0-4F8B7B1D72EA"
}
] so it's still creating arrays within arrays, which can't be right? |
Realised what's wrong above: it's the combination of commas and semicolons. It's best to give the author names without a comma, so "Jen Harris", "J Harris" or "J. Harris" rather than "Harris, Jen". |
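To make the comma and semicolon interaction concrete, here is a minimal Python sketch that reproduces the two outputs pasted in this thread. It is not flatten-tool's actual code: the function name `split_array_cell` and the splitting rule are assumptions inferred only from the behaviour observed above.

```python
def split_array_cell(cell):
    """Mimic the array splitting observed in this thread.

    Assumption: semicolons separate items, and any comma in the cell
    makes each item a nested array. This is NOT flatten-tool's code.
    """
    parts = cell.split(";")
    if "," in cell:
        # Commas trigger a second level of splitting, hence arrays within arrays.
        return [part.split(",") for part in parts]
    return parts


# "Surname, Forename" entries end up nested, with whitespace preserved.
print(split_array_cell("Pesaresi, Martino; Politis, Panagiotis"))
# [['Pesaresi', ' Martino'], [' Politis', ' Panagiotis']]

# Names without commas give the flat list that the schema expects.
print(split_array_cell("Martino Pesaresi;Panagiotis Politis"))
# ['Martino Pesaresi', 'Panagiotis Politis']
```

Under the same assumed rule, a comma-only author list collapses into a single nested array, which matches the Hazard example earlier in the thread.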
We'll need to be extremely clear on the instructions, or possibly have this auto-validated and corrected (i.e. commas and semicolons are auto-removed from the field?). |
I've asked the devs what the correct syntax is in OpenDataServices/flatten-tool#427. If it isn't possible to include commas or semi-colons within the values, we can update the input guidance for array fields and add a data validation warning if the cell's value includes commas or semi-colons. |
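As a rough illustration of the kind of warning described here, the Python sketch below flags individual array values (for example, single author names) that contain a comma or semicolon. The function name `array_value_warnings` and the return format are hypothetical; the template itself would use spreadsheet data validation rather than Python.

```python
SEPARATORS = (",", ";")  # characters that cannot appear inside an array value


def array_value_warnings(values):
    """Return (position, value, separators found) for each problematic value."""
    warnings = []
    for position, value in enumerate(values, start=1):
        found = [sep for sep in SEPARATORS if sep in value]
        if found:
            warnings.append((position, value, found))
    return warnings


print(array_value_warnings(["J. Harris", "Harris, Jen"]))
# [(2, 'Harris, Jen', [','])]
```

Reporting the position and the offending character, rather than silently stripping it, leaves the decision on how to rewrite the name with the person entering the data.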
Latest split templates and examples for hazard and exposure (real data). https://drive.google.com/drive/folders/1V33k5YmYjcvjFnYpx7chOx-PeSivYRWm?usp=sharing |
@matamadio, a couple of questions on the Global Human Settlement layer (Thailand) example:
|
The reason is that the global dataset is made of a large number of tiles; here, the tiles covering the country were downloaded and merged together. In other cases there could be more processing compared to the source (e.g. change of resolution, value classification, clipping, others). Thus the download will point to the derived dataset, and not the original source.
At the moment, the "cost" of exposure is limited to monetary currencies. I see two options:
|
Ah, I see. So the actual processed/merged dataset is not linked in the example? If so, the value of
I've created an issue on the standard repo, let's follow up there: GFDRR/rdl-standard#194 |
Unfortunately leaving this blank will cause the data to be reported as invalid,
|
@duncandewhurst the autopopulation of the links sheet might not be right, when I converted FTH-snip example it gave "links": [
{
"href": "https://raw.githubusercontent.com/GFDRR/rdl-standard/0__2__0/schema/rdls_schema.json",
"rel": "describedby"
}
] but |
I had the exposure example file uploaded on SharePoint, though the link is only accessible to WB at the moment. |
Ah, ok. That will be fine for the actual data being uploaded to the data catalogue, but for examples I think it'd be best not to use non-open data. As it's obviously useful to create the RDLS record for this data as it will exist, I'd recommend using a dummy URL for the example. That way, when you do upload it to the data catalogue, you've already got the RDLS file and you'll just need to update the URL to the real SharePoint URL. EDIT: I've just seen you've created GFDRR/rdl-standard#195, so we can wait for the resolution of this and hopefully there will be real URLs to use soon :) |
@matamadio regarding the rdls_hzd-FTH-snip example. Converting this to JSON and validating against the schema returns one major error:
The way that (I'll create an additional comment in one of the documentation issues (GFDRR/rdl-standard#149) around how much detail we may want to include based on how the dataset being described will be available, e.g. in an open catalogue or behind some sort of access restriction.) The appropriate example to use if you want to include the FATHOM global flood map is the one you have in rdls_hzd-FTH-THA where it has been used to create this specific datasource. The rdls_hzd-FTH-THA however suffers from the same issues as @duncandewhurst describes for the Global Human Settlement layer (Thailand) example, namely that the Some additional errors in rdls_hzd-FTH-THA: |
Could we host the demo data here on the GH? |
The idea for the "snippet" file was to provide a quick json example of just key metadata in reference to the example figure, without actual data download. For schema reference / Example tabs only, replacing the current ones.
I thought the only difference in metadata would be the subset coverage: Thailand instead of global. But yes, the resource should point to the download in our folder instead of source, but this is WIP. |
Yes, in general this would be true and given your other clarification about there being no change from the source I think actually the only field that doesn't quite make sense is "The Thailand country level dataset taken from the FATHOM flood-hazard model (previously known as SSBN). The FATHOM flood-hazard model is a global gridded dataset of flood hazard produced at the global scale. ..." This just makes it a bit clearer that this isn't the entire FATHOM map.
Ah, okay so in that case you're right to put it in as the dataset fields rather than the source, but is this a dataset that can actually be made available? It looks like FATHOM is only available for a fee? |
I can see the logic in this but my worry is that it makes it seem as though you can use RDLS without providing the data, which is not something you'd want to imply. |
Agree on the need to specify a subset in the details.
Yes it is a commercial product, I chose it because it is the most frequently used for flood analysis across the bank, so it's relevant to show.
Does it mean that examples will also need to have the actual resource download? Could it be just a dummy (empty resource links)? |
A dummy link would be fine but I'd recommend making sure it looks like a dummy link, e.g. use http://example.com/YOUR_EXAMPLE |
The readme is OK; we don't want to replicate the information found in the docs.
Please put the field descriptions in multiple rows instead of one cell for easier reading, as in rdls_hzd_AQD_docsample. |
It isn't possible to include commas or semicolons within array values so I've updated the input guidance for array fields. |
I've made those updates. At the same time, I've moved the readme content from the spreadsheet template to the README.md file in this repository and linked to it from the readme sheet in the template. I did that for three reasons:
|
@odscjen @matamadio I'm going to close this issue as I think that everything related to the template in this issue is now done. If there is anything outstanding relating to the example data or schema, please open an issue on the main rdl-standard repo. If there is anything outstanding relating to the spreadsheet template, please feel free to reopen this issue :-) |
I've pushed an updated version of the template to this repository and I've updated the Google Docs copy too.
@matamadio please use this version for testing. The issues you flagged in the old version should be resolved now. Let me know if you spot any other problems.
In addition to fixing those issues, I also updated the formatting and implemented data validation for id fields so you need only enter each identifier once, in its 'parent' worksheet (e.g. datasets). When referring to an identifier from another worksheet (e.g. resources), you can select the identifier from a drop-down list.
The updated version of the template is based on the schema from GFDRR/rdl-standard#181, which includes a few fixes for issues that I noticed whilst working on the template.
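The drop-down validation for id fields amounts to a referential check between worksheets. A minimal Python sketch of that idea follows; the function name `unknown_references` and the sample identifiers are hypothetical and are not taken from the template.

```python
def unknown_references(parent_ids, references):
    """Return references that match no identifier in the parent worksheet."""
    known = set(parent_ids)
    return [ref for ref in references if ref not in known]


# Hypothetical identifiers entered once in the 'datasets' worksheet...
dataset_ids = ["dataset-1", "dataset-2"]
# ...and the dataset identifiers referred to from the 'resources' worksheet.
resource_dataset_refs = ["dataset-1", "dataset-3"]

print(unknown_references(dataset_ids, resource_dataset_refs))
# ['dataset-3']
```

A drop-down list prevents the mismatch at entry time; a check like this is only useful for data that was typed or pasted in without the validation.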