[Proposal] Metadata for stochastic event catalogues #82

stufraser1 · 2023-05-30T16:28:29Z

Is additional metadata needed to adequately describe a file containing a stochastic event set? Or is the current structure sufficient, using event_set without nesting events under it and instead linking to a data file?

> Also, stochastic event catalogues in cat models are too large to be listed in meta-data.

We won't be listing the whole event set; this would be in a data file. The metadata in the event object aims to describe the RP etc. of an individual hazard map or scenario event. To describe the event set we'd use the event_set object, but we do need to make sure we can be clear whether the events within it are in a data file or have their metadata nested under the event_set.

Originally posted by @stufraser1 in #59 (comment)


johcarter · 2023-06-08T16:07:03Z

The total number of events in the catalogue is a useful number that could be included in meta-data.

johcarter · 2023-06-08T21:17:18Z

Here is an example of the standard ODS event file

Fields
EventId, EventRate, EventDescription
event_dict_p.csv

The event set may be accompanied by an occurrence file to provide event frequency/seasonality/clustering information.
(See #81)

ODS has additional fields to describe any frequency or seasonality distributions associated with the event set. These metadata fields are:

'FrequencyDistribution' - enum {Poisson, Negative Binomial, Occurrence File} (the last one indicates that the frequency info is held in the occurrence file).

'SeasonalityDistribution' - enum {Uniform, Occurrence File} (the last one indicates that the seasonality info is held in the occurrence file).

This information can be used to make further assumptions about event frequency in case there are event rates but no occurrence file.
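As a rough illustration of that fallback, here is a minimal sketch of what can be derived from event rates alone when there is no occurrence file. It assumes EventRate is an annual rate and that FrequencyDistribution is Poisson with independent events. The sample rows and the function name are hypothetical, not taken from a real ODS file.

```python
import csv
import io
import math

# Hypothetical sample content using the three fields listed above.
SAMPLE_EVENT_FILE = """EventId,EventRate,EventDescription
1,0.01,Example event A
2,0.002,Example event B
3,0.0005,Example event C
"""


def summarise_event_rates(csv_text):
    """Summarise an event file under a Poisson frequency assumption.

    Assumption: EventRate is an annual rate and events occur independently,
    so the total number of occurrences per year is Poisson with rate
    equal to the sum of the individual event rates.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total_rate = sum(float(row["EventRate"]) for row in rows)
    return {
        "number_events": len(rows),
        "total_annual_rate": total_rate,
        # P(at least one occurrence in a year) for a Poisson process.
        "prob_at_least_one_event": 1.0 - math.exp(-total_rate),
    }


summary = summarise_event_rates(SAMPLE_EVENT_FILE)
```

With a Negative Binomial assumption the event rates alone would not be enough, since a dispersion parameter would also be needed.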

stufraser1 · 2023-06-09T04:55:03Z

Propose adding these to event_set object.

Title	Field name	Description	Type	Codelist
Hazard type	event_set.hazard_type	The main type of hazard phenomena in the modelled scenario(s).	string	Hazard type from RDLS Hazard taxonomy
Analysis type	event_set.analysis_type	The type of analysis used by the hazard model: probabilistic (multiple potential events characterised by different intensity and occurrence frequency); deterministic (frequency-independent representation, such as mean, median or max); empirical (obtained from real event observations).	string	probabilistic, deterministic, empirical
NumberOfEvents	event_set.number_events	The number of events contained in the event set.	number
FrequencyDistribution	event_set.freq_dist	The frequency distribution used to generate the event set.	string	Poisson, Negative Binomial, User defined
SeasonalityDistribution	event_set.seasonality	The distribution used to generate the event set.	string	Uniform, User defined

Is 'User defined' a helpful part of a codelist?
This information may be in a different file from the list of events (e.g. for an Oasis model) or in the same file as the events (e.g. for an OpenQuake model), so I am reluctant to use 'Occurrence file' per the above example.
The files containing the event set would be named by resources.title and linked by resources.id at the top-level.
Do we need a way to specify where SeasonalityDistribution and FrequencyDistribution information is contained, if not in the same file?

odscjen · 2023-06-12T14:09:35Z

event_set.number_events is something that could easily be calculated from the data by simply counting the number of event.ids nested within event_set. So it's something we'd generally recommend against as an unnecessary extra field.

Rename event_set.freq_dist to event_set.frequency_distribution, in keeping with the minimal style guide being adopted.

johcarter · 2023-06-12T14:42:22Z

The number of events is needed when the list of events is in a resource file because there are too many events to list in the JSON.

odscjen · 2023-06-12T15:07:03Z

Ah, yes, in that case suggest the following change:

Title	Field name	Description	Type	Codelist
Hazard type	event_set.hazard_type	The main type of hazard phenomena in the modelled scenario(s).	string	Hazard type from RDLS Hazard taxonomy
Analysis type	event_set.analysis_type	The type of analysis used by the hazard model: probabilistic (multiple potential events characterised by different intensity and occurrence frequency); deterministic (frequency-independent representation, such as mean, median or max); empirical (obtained from real event observations).	string	probabilistic, deterministic, empirical
NumberOfEvents	event_set.number_events	The number of events contained in the event set. You should only use this field when details of individual events are not included in the RDLS metadata.	number
FrequencyDistribution	event_set.frequency_distribution	The frequency distribution used to generate the event set.	string	Poisson, Negative Binomial, User defined
SeasonalityDistribution	event_set.seasonality	The distribution used to generate the event set.	string	Uniform, User defined
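The rule in the NumberOfEvents description could be sketched as follows: count nested events when they are present in the metadata, and fall back to the declared number_events only when they are not. The "events" key and the function name are assumptions made for illustration; the thread does not fix how nested events are keyed.

```python
def resolve_number_of_events(event_set):
    """Return the number of events described by an event_set dict.

    Assumption: nested events, if any, sit in a list under the
    hypothetical key "events". If that list is non-empty it is counted;
    otherwise the declared number_events is returned (None if absent).
    """
    events = event_set.get("events")
    if events:
        return len(events)
    return event_set.get("number_events")


# Events nested in the metadata: the count is derived, no extra field needed.
nested = {"events": [{"id": "1"}, {"id": "2"}]}

# Events held in a data file: the count has to be declared.
in_file = {"number_events": 50000}

nested_count = resolve_number_of_events(nested)
in_file_count = resolve_number_of_events(in_file)
```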

Are any of the other fields currently sitting under event needed here? #91 already proposed adding triggers to event_set in the case of events being in a file rather than listed individually. Is there a case for calculation_method, geographic_coverage or description also being in event_set, with the same caveat that they need only be used if the information for individual events isn't included in the RDLS metadata?

johcarter · 2023-06-13T09:47:53Z

Thanks @odscjen

I think yes to description of the event set, e.g. '1000 year stochastic event set', 'Historical events between 1951 and 2000'.

I think yes to calculation_method, although this seems very similar to analysis type. The difference between these fields is not obvious to me and therefore I don't know whether having both is necessary.

For a stochastic event set, both analysis_type=Probabilistic and calculation_method=Simulated would be appropriate. For a historical event set, analysis_type=Empirical and calculation_method=Observed would also be appropriate. For synthetic events that are inferred from historical events calculation_method=Inferred might be appropriate.

For geographic_coverage, I don't think this is needed at the event set level if there is sufficient information at the hazard data package root level.

Can I suggest refinements to the fields & descriptions for FrequencyDistribution and SeasonalityDistribution as follows please:

event_set.frequency_distribution
'The frequency distribution assumed for the occurrence of events over a multi-year timeline.'

event_set.seasonality_distribution
'The seasonality distribution assumed for the occurrence of events across a calendar year.'

stufraser1 · 2023-06-15T10:24:16Z

Proposal accounting for:

- the above refinements to the fields & descriptions for FrequencyDistribution and SeasonalityDistribution
- adding calculation_method (agree it's useful to have)

Title	Field name	Description	Type	Codelist
Hazard type	event_set.hazard_type	The main type of hazard phenomena in the modelled scenario(s).	string	Hazard type from RDLS Hazard taxonomy
Analysis type	event_set.analysis_type	The type of analysis used by the hazard model: probabilistic (multiple potential events characterised by different intensity and occurrence frequency); deterministic (frequency-independent representation, such as mean, median or max); empirical (obtained from real event observations).	string	probabilistic, deterministic, empirical
Calculation Method	event_set.calculation_method	The methodology used for the calculation of the event in the modelled scenario(s).	string	simulated, observed, inferred
NumberOfEvents	event_set.number_events	The number of events contained in the event set. You should only use this field when details of individual events are not included in the RDLS metadata.	number
FrequencyDistribution	event_set.frequency_distribution	The frequency distribution assumed for the occurrence of events over a multi-year timeline.	string	Poisson, Negative Binomial, User defined
SeasonalityDistribution	event_set.seasonality	The seasonality distribution assumed for the occurrence of events across a calendar year.	string	Uniform, User defined
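To make the table concrete, here is a minimal sketch of an event_set described with these fields and checked against the proposed codelists. The example values, the flat dict layout and the check_event_set helper are all illustrative assumptions, not part of the RDLS schema. hazard_type is left unchecked because its codelist (the RDLS Hazard taxonomy) is not reproduced in this thread.

```python
# Codelists copied from the proposal table above.
CODELISTS = {
    "analysis_type": {"probabilistic", "deterministic", "empirical"},
    "calculation_method": {"simulated", "observed", "inferred"},
    "frequency_distribution": {"Poisson", "Negative Binomial", "User defined"},
    "seasonality": {"Uniform", "User defined"},
}


def check_event_set(event_set):
    """Return a list of problems found in an event_set dict (empty if none)."""
    problems = []
    for field, allowed in CODELISTS.items():
        value = event_set.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: {value!r} is not in the codelist")
    number_events = event_set.get("number_events")
    if number_events is not None:
        is_number = isinstance(number_events, (int, float)) and not isinstance(number_events, bool)
        if not is_number or number_events < 0:
            problems.append("number_events: expected a non-negative number")
    return problems


# Hypothetical stochastic event set whose events live in a data file.
example_event_set = {
    "hazard_type": "flood",  # illustrative value, not validated here
    "analysis_type": "probabilistic",
    "calculation_method": "simulated",
    "number_events": 50000,
    "frequency_distribution": "Poisson",
    "seasonality": "Uniform",
}

problems = check_event_set(example_event_set)
```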

matamadio changed the title ~~Metadata for stochastic event catalogues~~ [Proposal] Metadata for stochastic event catalogues Jun 6, 2023

matamadio added proposal New feature or request metadata Issues related to common, core metadata labels Jun 12, 2023

stufraser1 mentioned this issue Jun 15, 2023

[Proposal] Hazard schema - Hazard occurrence probability / frequency #59

Closed

odscjen added hazard Issues related to Hazard data and removed metadata Issues related to common, core metadata labels Jul 3, 2023

odscjen mentioned this issue Jul 3, 2023

[Schema] group remaining issues into single PRs based on component #118

Closed

odscrachel self-assigned this Jul 4, 2023

odscrachel mentioned this issue Jul 5, 2023

Update hazard #121

Merged

2 tasks

duncandewhurst closed this as completed in #121 Jul 11, 2023
