
[Proposal] Metadata for stochastic event catalogues #82

Closed
stufraser1 opened this issue May 30, 2023 · 8 comments · Fixed by #121
Labels
hazard Issues related to Hazard data proposal New feature or request

Comments

@stufraser1
Member

Is additional metadata needed to adequately describe a file containing a stochastic event set? Or is the current structure sufficient, using event_set without nesting events under it and instead linking to a data file?

> Also, stochastic event catalogues in cat models are too large to be listed in meta-data.

We won't be listing the whole event set; that would be in a data file. The metadata in the event object aims to describe the return period (RP), etc., of an individual hazard map or scenario event. To describe the event set we'd use the event_set object, but we do need to make it clear whether the events within it are in a data file or have their metadata nested under the event_set.

Originally posted by @stufraser1 in #59 (comment)

@matamadio matamadio changed the title Metadata for stochastic event catalogues [Proposal] Metadata for stochastic event catalogues Jun 6, 2023
@johcarter
Collaborator

johcarter commented Jun 8, 2023

The total number of events in the catalogue is a useful number that could be included in meta-data.

@johcarter
Collaborator

Here is an example of the standard ODS event file.

Fields: EventId, EventRate, EventDescription
event_dict_p.csv
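A minimal sketch of reading such a file, assuming it is comma-separated with a header row containing these three columns and that EventRate is an annual rate. It also shows how the total number of events mentioned above could be derived from the file. The function name `summarise_event_file` is hypothetical and not part of ODS or RDLS.

```python
import csv
import io
from typing import TextIO


def summarise_event_file(f: TextIO) -> dict:
    """Count distinct events and sum their rates in an ODS-style event file.

    Assumes a header row with EventId, EventRate and EventDescription, and
    treats EventRate as an annual rate (an assumption for this sketch).
    """
    reader = csv.DictReader(f)
    event_ids = set()
    total_rate = 0.0
    for row in reader:
        event_ids.add(row["EventId"])
        total_rate += float(row["EventRate"])
    return {"number_events": len(event_ids), "total_rate": total_rate}


# Hypothetical three-event file, for illustration only.
sample = io.StringIO(
    "EventId,EventRate,EventDescription\n"
    "1,0.01,Event A\n"
    "2,0.002,Event B\n"
    "3,0.0005,Event C\n"
)
print(summarise_event_file(sample))
```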

The event set may be accompanied by an occurrence file to provide event frequency/seasonality/clustering information.
(See #81)

ODS has additional fields to describe any frequency or seasonality distributions associated with the event set. These metadata fields are:

'FrequencyDistribution' - enum {Poisson, Negative Binomial, Occurrence File} (the last value indicates that the frequency information is held in the occurrence file).

'SeasonalityDistribution' - enum {Uniform, Occurrence File} (the last value indicates that the seasonality information is held in the occurrence file).

This information can be used to make further assumptions about event frequency in case there are event rates but no occurrence file.
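As a rough illustration of how these assumptions might be used when event rates exist but there is no occurrence file, the sketch below assumes FrequencyDistribution = Poisson and SeasonalityDistribution = Uniform, treats each rate as an annual rate, and samples a multi-year timeline. The function names are hypothetical, a 365-day year is assumed, and the Negative Binomial case is not handled.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small annual means typical of event sets; very large
    lam would underflow exp(-lam).
    """
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_timeline(rates: dict[str, float], n_years: int, seed: int = 0):
    """Return (year, day_of_year, event_id) tuples for n_years simulated years.

    Poisson frequency: the annual count has mean equal to the sum of rates.
    Uniform seasonality: each occurrence falls on a day drawn uniformly from 1..365.
    """
    rng = random.Random(seed)
    event_ids = list(rates)
    weights = [rates[e] for e in event_ids]
    annual_mean = sum(weights)
    timeline = []
    for year in range(1, n_years + 1):
        for _ in range(sample_poisson(annual_mean, rng)):
            # Attribute each occurrence to an event in proportion to its rate.
            event_id = rng.choices(event_ids, weights=weights)[0]
            day = rng.randint(1, 365)
            timeline.append((year, day, event_id))
    return timeline


# Hypothetical rates (events per year), for illustration only.
timeline = simulate_timeline({"1": 0.01, "2": 0.002, "3": 0.0005}, n_years=10_000)
print(len(timeline))
```

Sampling a single Poisson count from the summed rate and then attributing each occurrence by rate is equivalent to sampling each event independently, and keeps the loop short.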

@stufraser1
Member Author

I propose adding these to the event_set object:

| Title | Field name | Description | Type | Codelist |
|---|---|---|---|---|
| Hazard type | event_set.hazard_type | The main type of hazard phenomena in the modelled scenario(s). | string | Hazard type from RDLS Hazard taxonomy |
| Analysis type | event_set.analysis_type | The type of analysis used by the hazard model: probabilistic (multiple potential events characterised by different intensity and occurrence frequency); deterministic (frequency-independent representation, such as mean, median or max); empirical (obtained from real event observations). | string | probabilistic, deterministic, empirical |
| NumberOfEvents | event_set.number_events | The number of events contained in the event set. | number | |
| FrequencyDistribution | event_set.freq_dist | The frequency distribution used to generate the event set. | string | Poisson, Negative Binomial, User defined |
| SeasonalityDistribution | event_set.seasonality | The distribution used to generate the event set. | string | Uniform, User defined |

Is 'User defined' a helpful part of a codelist?
This information may be in a different file from the list of events (e.g. for an Oasis model) or in the same file as the events (e.g. for an OpenQuake model), so I am reluctant to use 'Occurrence file' as in the example above.
The files containing the event set would be named by resources.title and linked by resources.id at the top level.
Do we need a way to specify where the SeasonalityDistribution and FrequencyDistribution information is contained, if not in the same file?

@matamadio matamadio added proposal New feature or request metadata Issues related to common, core metadata labels Jun 12, 2023
@odscjen
Contributor

odscjen commented Jun 12, 2023

event_set.number_events could easily be calculated from the data by counting the event.ids nested within event_set, so we would generally recommend against it as an unnecessary extra field.

Rename event_set.freq_dist to event_set.frequency_distribution, in keeping with the minimal style guide being adopted.

@johcarter
Collaborator

johcarter commented Jun 12, 2023

The number of events is needed when the list of events is in a resource file because there are too many events to list in the JSON.
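A minimal sketch of the two cases discussed here, assuming event_set is held as a Python dict and that nested events sit under a hypothetical `events` key, each with an `id`. The count is derived from the nested events when they are listed, and taken from number_events only when they are not (e.g. because the events live in a resource file).

```python
from typing import Optional


def event_count(event_set: dict) -> Optional[int]:
    """Return the number of events described by an event_set dict.

    The "events" key is a hypothetical name for the nested event list.
    Returns None when neither nested events nor number_events is present.
    """
    events = event_set.get("events")
    if events:
        # Count distinct event ids rather than trusting a declared total.
        return len({e["id"] for e in events})
    return event_set.get("number_events")


print(event_count({"events": [{"id": "e1"}, {"id": "e2"}]}))
print(event_count({"number_events": 100000}))
```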

@odscjen
Contributor

odscjen commented Jun 12, 2023

Ah, yes. In that case, I suggest the following change:

| Title | Field name | Description | Type | Codelist |
|---|---|---|---|---|
| Hazard type | event_set.hazard_type | The main type of hazard phenomena in the modelled scenario(s). | string | Hazard type from RDLS Hazard taxonomy |
| Analysis type | event_set.analysis_type | The type of analysis used by the hazard model: probabilistic (multiple potential events characterised by different intensity and occurrence frequency); deterministic (frequency-independent representation, such as mean, median or max); empirical (obtained from real event observations). | string | probabilistic, deterministic, empirical |
| NumberOfEvents | event_set.number_events | The number of events contained in the event set. You should only use this field when details of individual events are not included in the RDLS metadata. | number | |
| FrequencyDistribution | event_set.frequency_distribution | The frequency distribution used to generate the event set. | string | Poisson, Negative Binomial, User defined |
| SeasonalityDistribution | event_set.seasonality | The distribution used to generate the event set. | string | Uniform, User defined |
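To show how the codelists and the number_events caveat in the revised table might be checked, here is a minimal sketch. It is not an RDLS validator: the function name, the hypothetical `events` key for nested events, and the requirement that number_events be a non-negative integer are all assumptions for illustration.

```python
ANALYSIS_TYPES = {"probabilistic", "deterministic", "empirical"}
FREQUENCY_DISTRIBUTIONS = {"Poisson", "Negative Binomial", "User defined"}
SEASONALITY_DISTRIBUTIONS = {"Uniform", "User defined"}


def check_event_set(event_set: dict) -> list[str]:
    """Return a list of problems found in an event_set dict (empty if none)."""
    problems = []
    codelist_fields = [
        ("analysis_type", ANALYSIS_TYPES),
        ("frequency_distribution", FREQUENCY_DISTRIBUTIONS),
        ("seasonality", SEASONALITY_DISTRIBUTIONS),
    ]
    for field, allowed in codelist_fields:
        value = event_set.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: {value!r} is not in the codelist")
    # number_events is only meant for event sets whose events are not listed.
    if "number_events" in event_set and event_set.get("events"):
        problems.append("number_events should only be used when individual events are not listed")
    n = event_set.get("number_events")
    if n is not None and (isinstance(n, bool) or not isinstance(n, int) or n < 0):
        problems.append("number_events must be a non-negative integer")
    return problems


print(check_event_set({"analysis_type": "probabilistic", "number_events": 100000}))
```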

Are any of the other fields currently sitting under event needed here? #91 already proposed adding triggers to event_set for the case where events are in a file rather than listed individually. Is there a case for calculation_method, geographic_coverage or description also being in event_set, with the same caveat that they need only be used if the information for individual events isn't included in the RDLS metadata?

@johcarter
Copy link
Collaborator

Thanks @odscjen

I think yes to a description of the event set, e.g. '1000 year stochastic event set', 'Historical events between 1951 and 2000'.

I think yes to calculation_method, although this seems very similar to analysis_type. The difference between these fields is not obvious to me, so I don't know whether having both is necessary.

For a stochastic event set, both analysis_type=Probabilistic and calculation_method=Simulated would be appropriate. For a historical event set, analysis_type=Empirical and calculation_method=Observed would also be appropriate. For synthetic events that are inferred from historical events, calculation_method=Inferred might be appropriate.
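The combinations described in this comment could be captured in a small lookup, sketched below. It is illustrative only and not a normative RDLS rule; the function name is hypothetical. Values are lower-cased before lookup because the comment capitalises them (e.g. Probabilistic) while the proposed codelists are lower case.

```python
from typing import Optional

# Combinations described in the comment; illustrative, not exhaustive.
TYPICAL_COMBINATIONS = {
    ("probabilistic", "simulated"): "stochastic event set",
    ("empirical", "observed"): "historical event set",
}


def describe_event_set(analysis_type: str, calculation_method: str) -> Optional[str]:
    """Return a short label for a known combination, else None."""
    key = (analysis_type.lower(), calculation_method.lower())
    if key in TYPICAL_COMBINATIONS:
        return TYPICAL_COMBINATIONS[key]
    if key[1] == "inferred":
        # The comment only specifies calculation_method for this case.
        return "synthetic events inferred from historical events"
    return None


print(describe_event_set("Probabilistic", "Simulated"))
```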

For geographic_coverage, I don't think this is needed at the event set level if there is sufficient information at the hazard data package root level.

Can I suggest the following refinements to the fields and descriptions for FrequencyDistribution and SeasonalityDistribution, please:

event_set.frequency_distribution
'The frequency distribution assumed for the occurrence of events over a multi-year timeline.'

event_set.seasonality_distribution
'The seasonality distribution assumed for the occurrence of events across a calendar year.'

@stufraser1
Member Author

Proposal accounting for:

  1. the above refinements to the fields and descriptions for FrequencyDistribution and SeasonalityDistribution
  2. adding calculation_method (agreed it's useful to have)

| Title | Field name | Description | Type | Codelist |
|---|---|---|---|---|
| Hazard type | event_set.hazard_type | The main type of hazard phenomena in the modelled scenario(s). | string | Hazard type from RDLS Hazard taxonomy |
| Analysis type | event_set.analysis_type | The type of analysis used by the hazard model: probabilistic (multiple potential events characterised by different intensity and occurrence frequency); deterministic (frequency-independent representation, such as mean, median or max); empirical (obtained from real event observations). | string | probabilistic, deterministic, empirical |
| Calculation Method | event_set.calculation_method | The methodology used for the calculation of the event in the modelled scenario(s). | string | simulated, observed, inferred |
| NumberOfEvents | event_set.number_events | The number of events contained in the event set. You should only use this field when details of individual events are not included in the RDLS metadata. | number | |
| FrequencyDistribution | event_set.frequency_distribution | The frequency distribution assumed for the occurrence of events over a multi-year timeline. | string | Poisson, Negative Binomial, User defined |
| SeasonalityDistribution | event_set.seasonality | The seasonality distribution assumed for the occurrence of events across a calendar year. | string | Uniform, User defined |
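For illustration, an event_set object using the field names in this proposal might be serialised as below, for an event set whose individual events are held in a separate resource file. All values are hypothetical; in particular "earthquake" stands in for whatever code the RDLS hazard taxonomy actually uses.

```python
import json

# Hypothetical values throughout; field names follow the proposal table.
event_set = {
    "hazard_type": "earthquake",  # placeholder; must come from the RDLS hazard taxonomy
    "analysis_type": "probabilistic",
    "calculation_method": "simulated",
    "number_events": 100000,  # used because individual events are not listed
    "frequency_distribution": "Poisson",
    "seasonality": "Uniform",
}

print(json.dumps({"event_set": event_set}, indent=2))
```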

@odscjen odscjen added hazard Issues related to Hazard data and removed metadata Issues related to common, core metadata labels Jul 3, 2023
@odscrachel odscrachel self-assigned this Jul 4, 2023
@odscrachel odscrachel mentioned this issue Jul 5, 2023