Missing GSIM class – quality indicators #6

Closed
FlavioRizzolo opened this issue Jan 15, 2022 · 24 comments

@FlavioRizzolo
Collaborator

FlavioRizzolo commented Jan 15, 2022

Quality is important information that drives the statistical business process. The GSBPM Quality Indicators show that all GSBPM sub-processes produce quality information in one form or another, yet there is no GSIM information object to capture it. Referential Metadata Set might be used for this, but doing so would make it difficult to differentiate quality from other types of metadata and would overload Referential Metadata Set.

@FlavioRizzolo FlavioRizzolo changed the title Missing GSIM information objects – quality indicators Missing GSIM class – quality indicators Feb 16, 2022
@FrancineK
Collaborator

A tentative mapping of Referential Metadata
image

  • Referential Metadata Set: An organized collection of referential metadata for a given Referential Metadata Subject.
    Ex: Methodology, Quality
  • Referential Metadata Structure: Defines the structure of an organized collection of referential metadata (Referential Metadata Set).
    Ex: for Methodology, methodology types:
    ERROR_DETECTION, ESTIMATION_COMPILATION, IMPUTATION, VALIDATION, REVISIONS, SAMPLING, ACCURACY, etc.
    Ex: for Quality, six dimensions of quality:
    relevance, accuracy, timeliness, accessibility, interpretability, coherence
  • Referential Metadata Attribute: The role given to a Represented Variable to supply information in the context of a Referential Metadata Structure.
    Ex. Quality Indicator, Status Flag, Methodology Description, Quality Statement, etc.
  • Referential Metadata Subject: Identifies the subject of an organized collection of referential metadata.
    Ex. DataSet, Represented Variable, Data Point
  • Referential Metadata Subject Item: Identifies the actual subject for which referential metadata is reported.
    Ex. Sampling Frame (DataSet), Population Count (Value in Data Point)
  • Referential Metadata Content Item: The content describing a particular characteristic of a Referential Metadata Subject.
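
To make the tentative mapping above concrete, here is a minimal Python sketch of the six classes. The Python class and field names, and the way the classes link to each other, are assumptions made for illustration; they are not taken from the GSIM specification or from the diagram. The content string is a placeholder, not real metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReferentialMetadataAttribute:
    name: str  # e.g. "Quality Indicator", "Quality Statement"


@dataclass
class ReferentialMetadataStructure:
    name: str  # e.g. "Quality"
    # Assumption: the structure lists the attributes it organises.
    attributes: List[ReferentialMetadataAttribute] = field(default_factory=list)


@dataclass
class ReferentialMetadataSubject:
    gsim_class: str  # e.g. "DataSet", "Represented Variable", "Data Point"


@dataclass
class ReferentialMetadataSubjectItem:
    subject: ReferentialMetadataSubject
    label: str  # e.g. "Sampling Frame"


@dataclass
class ReferentialMetadataContentItem:
    attribute: ReferentialMetadataAttribute
    content: str


@dataclass
class ReferentialMetadataSet:
    structure: ReferentialMetadataStructure
    subject_item: ReferentialMetadataSubjectItem
    content_items: List[ReferentialMetadataContentItem] = field(default_factory=list)


# Hypothetical instance: a quality statement reported for a sampling frame.
statement = ReferentialMetadataAttribute("Quality Statement")
quality = ReferentialMetadataStructure("Quality", [statement])
frame = ReferentialMetadataSubjectItem(ReferentialMetadataSubject("DataSet"), "Sampling Frame")
quality_set = ReferentialMetadataSet(quality, frame)
quality_set.content_items.append(
    ReferentialMetadataContentItem(statement, "placeholder text, not real content")
)
```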

@JALinnerud

As far as I remember the referential metadata classes originated from reference metadata classes in SDMX. See https://raw.githubusercontent.com/UKGovLD/publishing-statistical-data/master/specs/src/main/vocab/sdmx-attribute.ttl These also relate to SIMS (Single Integrated Metadata Structure) used to report quality to Eurostat. We are not currently using SIMS for our quality declarations, but we are looking into using it.

@FrancineK
Collaborator

Here is a proposed simplified version of Referential Metadata, that can directly be linked to any GSIM object.
image

@JALinnerud

Why are the descriptions, names and text only in Bilingual text? Shouldn't they be multilingual?

@FrancineK
Collaborator

Why are the descriptions, names and text only in Bilingual text? Shouldn't they be multilingual?

Yes, we will need to rework this model. This was done for StatCan internal use.

@InKyungChoi
Collaborator

InKyungChoi commented Aug 17, 2022

Current GSIM Referential Metadata

image

Mapping of GSIM referential metadata area for the ESS Standard for Quality Reports Structure (ESQRS) and Information Management Set (GSIM Issue from Sweden)

image

Issues:

  1. Referential Metadata Subject is currently constrained by Value Domain. Its explanatory text says "GSIM object type may be Product for which there is a list specified in a Value Domain. The Value Domain specifies the list of actual Products for which reference metadata can be reported or authored using this Referential Metadata Structure." but I think creating a list just to be able to refer to a subject of referential metadata is too much
    -> potential solution: i) Add a relationship with several typical subjects such as Questionnaire, Statistical Program, Data Set; ii) Add a relationship between Referential Metadata Subject and Identifiable Artefact
  2. Referential Metadata Attribute is currently "defined by" Represented Variable. Although the cardinality is 0..1, its definition "role given to a Represented Variable to supply information in the context of a Referential Metadata Structure" seems to imply that it is not optional
    -> potential solution: Change definition (e.g., "characteristic providing qualitative information for a given Referential Metadata Subject") and remove relationship with Represented Variable

As a reference, see how it is done in SDMX:

image

@InKyungChoi
Collaborator

Another mapping example for a documentation of statistical register (from Istat's MWW2022 presentation)
image

@InKyungChoi
Collaborator

InKyungChoi commented Sep 2, 2022

Updated model

(based on discussion in #22)
image

Some remarks:

  • For now, subjects are kept in the model to see how it works as proposed in the last meeting. Removing RM Subject and RM Subject Item has the benefit of simplifying the model, but it might be worth keeping it to make it clear what subject is..?
  • The relationship between RM Structure and RM Attribute used to be composition, but can it be aggregation, so that we can re-use RM Attributes outside the context of one particular RM Structure?
  • I am also not sure about cardinalities, please review them carefully
  • Represented Variable is now removed from the picture, but I wonder if we could also link RM Attribute with Identifiable Artefact?

Proposed definition / explanatory text

  • Referential Metadata Subject

    • Definition: subject for which an organised collection of referential metadata is reported
    • Explanatory text: Referential Metadata Subject identifies the subject of the metadata that can be reported using this Referential Metadata Structure. These subjects may be any GSIM class on which organised set of metadata is needed, such as Statistical Program, Data Set, Statistical Classification.
  • Referential Metadata Structure

    • Definition: structure of an organised collection of referential metadata
    • Explanatory text: Referential Metadata Structure defines a structured list of Referential Metadata Attributes for a given Referential Metadata Subject (e.g., ESS Standard for Quality Reports Structure)
  • Referential Metadata Attribute

    • Definition: particular characteristic of referential metadata OR characteristic that describes or qualifies Referential Metadata Subject (!! note that this definition is completely different from original definition, feel free to propose a new one!)
    • Explanatory text: A set of Referential Metadata Attributes is structured by Referential Metadata Structure to describe Referential Metadata Subject. Examples of Referential Metadata Attributes can be Represented Variable (e.g., "Accuracy", "Timeliness" when describing quality information) or other GSIM class (e.g., Statistical Classification, Contact, Owner)
  • Referential Metadata Content

    • Definition: actual content of Referential Metadata Attribute
    • Explanatory text: Referential Metadata Content can take different formats (e.g., text, number, value from a predefined codelist, table)
  • Referential Metadata Subject Item

    • Definition: actual subject for which referential metadata is reported
    • Explanatory text: Examples are an actual Product such as Balance of Payments and International Investment Position, Australia, June 2013, or a collection of Data Points such as the Data Points for a single region within a Data Set covering all regions for a country.
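
The composition-versus-aggregation remark above can be illustrated with a short Python sketch. It assumes aggregation means that an RM Structure only references RM Attributes that exist independently of it; the class names `RMAttribute` and `RMStructure` and the second structure are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RMAttribute:
    name: str


@dataclass
class RMStructure:
    name: str
    # Aggregation: the structure references attributes it does not own.
    attributes: List[RMAttribute] = field(default_factory=list)


accuracy = RMAttribute("Accuracy")
timeliness = RMAttribute("Timeliness")

esqrs = RMStructure("ESS Standard for Quality Reports Structure", [accuracy, timeliness])
# Hypothetical second structure that re-uses one of the same attributes.
internal_checklist = RMStructure("Internal quality checklist (hypothetical)", [accuracy])

# Dropping one structure leaves the shared attribute usable elsewhere.
del esqrs
shared = internal_checklist.attributes[0]
```

Under composition, by contrast, each structure would hold its own copy of "Accuracy", and the two copies could drift apart.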

@InKyungChoi
Collaborator

I found an old GSIM discussion (from 2018) that has very different interpretations...!!! https://statswiki.unece.org/pages/viewpage.action?pageId=129177198

It seems, in short,

  1. Referential Metadata (RM) parts were applied (or originally, primarily aimed to be applied) to footnotes of tables (using table footnote as RM Attribute - which actually got me even more confused about how Represented Variable plays a role here...)
  2. it got too complicated;
  3. Guillaume suggested "A MetadataStructureDefinition structures a MetadataFlowDefinition and contains one or more MetadataTarget composed of TargetObject. So basically it is a heap of metadata atttributes gathered in a set that targets a flow." which is more similar to what we discussed, and pointed out "The problem in this example is that we are mixing the DataStructure and the MetadataStructure areas without being as complete as the SDMX-IM on the MetadataStructure/Set part";
  4. but the way he applied RM to Single Integrated Metadata Structure (SIMS) is quite different...
  5. in the end, [there were no big changes made to this part] in new version of GSIM (https://statswiki.unece.org/display/gsim/GSIM+v1.2+main+changes) (except "Self-referential relationship name changed from parent/child to parent-child")

@JALinnerud - what do you think? Do you think we are deviating too much from what was originally aimed?
@FrancineK - do you think the new way we use can be applied to footnotes?

If we cannot apply RM to footnotes, we cannot do what we could do before - but to be honest, I am not sure if we COULD do it before....?

@FrancineK
Collaborator

FrancineK commented Oct 26, 2022

Hi @InKyungChoi, I tried to work it out with this page: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210000101&request_locale=en.
The Referential Metadata Set:

  • Frequency, Dataset ID, Release date (these are variables that come from the Data Structure), Classification (Geography) (this is the identifier component), and Footnotes (Referential Metadata ONLY).
    If I was to consider Footnotes only:
  • Referential Metadata Set: Footnotes
  • Referential Metadata Structure: Table Footnote: Footnote1, Footnote2, etc.
  • Referential Metadata Attribute:
    -- Footnote1 statement
    -- What about Frequency, Dataset ID, Release date, which are variables? Could these be referenced through the link to Represented Variable?
  • Referential Metadata Subject: Dataset : Stocks of specified dairy products
  • Referential Metadata Subject Item: Instance of Stocks of specified dairy products with ID 32-10-0001-01
  • Referential Metadata Content Item: Footnote1 actual statement: As of January 1988 Newfoundland and Labrador stocks are included.

I hope I am making sense and also answering your question.
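
As a rough illustration of the footnote mapping above, the same values can be written down as a plain Python dictionary and rendered back into footnote lines. The dictionary keys and the `render_footnotes` helper are hypothetical names; only the titles, the ID and the footnote text come from the comment.

```python
footnote_set = {
    "set": "Footnotes",
    "structure": ["Footnote1"],  # the comment lists "Footnote1, Footnote2, etc."
    "subject": "Dataset",
    "subject_item": {
        "title": "Stocks of specified dairy products",
        "id": "32-10-0001-01",
    },
    "content_items": {
        "Footnote1": "As of January 1988 Newfoundland and Labrador stocks are included.",
    },
}


def render_footnotes(rm_set):
    """Return one display line per footnote, in structure order."""
    lines = []
    for number, attribute in enumerate(rm_set["structure"], start=1):
        text = rm_set["content_items"].get(attribute)
        if text is not None:
            lines.append(f"{number}. {text}")
    return lines


rendered = render_footnotes(footnote_set)
```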

@InKyungChoi
Collaborator

InKyungChoi commented Oct 27, 2022

Updated version based on meeting notes October 5 #27 (relationship between RM Attribute and IA is added)

image

Would this provide a reference mechanism? So, for example, a list of footnotes would be a Referential Metadata Structure, and each footnote a Referential Metadata Attribute that refers to a Data Point (which is an Identifiable Artefact) or a Represented Variable (which is also an IA).

@FrancineK
Collaborator

FrancineK commented Nov 16, 2022

Hopefully, this will finally answer the question!!

image
image

@InKyungChoi
Collaborator

InKyungChoi commented Dec 6, 2022

About Referential Metadata (RM) Subject Item in the example of StatCan "Nursing and residential care facilities" table.

When a subject is a single Represented Variable, we can say the RM Subject is "Total Residents" (with the RM Attribute being a footnote, the RM Content Item being the particular footnote no. 5, and the RM Structure being a simple footnote). Can we then still say the RM Subject Item is "Total Residents", BUT used in the context of a particular data set? Would the RM Subject Item then in fact be an Instance Variable?

I ask because "Total Residents" would already exist as a Represented Variable, for example in a variable catalog. But when we use this variable in this particular table, "Nursing and residential care facilities", we attach this particular footnote no. 5 to the variable.

@flo7894

flo7894 commented Dec 7, 2022

With the latest model there seems to be no link between a ReferentialMetadataSet and its content, which consists of ReferentialMetadataContentItems. Also, the composition relation between ReferentialMetadataContentItem and ReferentialMetadataAttribute seems odd: shouldn't a ReferentialMetadataContentItem be viewed as an instance of a ReferentialMetadataAttribute in the context of a particular IdentifiableArtefact?

@FrancineK
Collaborator

With the latest model there seems to be no link between a ReferentialMetadataSet and its content, which consists of ReferentialMetadataContentItems. Also, the composition relation between ReferentialMetadataContentItem and ReferentialMetadataAttribute seems odd: shouldn't a ReferentialMetadataContentItem be viewed as an instance of a ReferentialMetadataAttribute in the context of a particular IdentifiableArtefact?

You are right @flo7894, on both points. The first one was not obvious to me at first, but I think that the link from ContentItem to Attribute as an instance is what was missing from the original model.

@InKyungChoi
Collaborator

InKyungChoi commented Jan 31, 2023

image

| Object | Definition | Explanatory Text |
|---|---|---|
| Referential Metadata Structure | structure of a Referential Metadata Set | A Referential Metadata Structure defines a structured list of Referential Metadata Attributes for a given Referential Metadata Subject. Examples of Referential Metadata Structure include structures for describing quality information and methodologies information (e.g., ESS Standard for Quality Reports Structure) or characteristics of registers as well as a structure of documentation storing information necessary for internal dataset management (e.g., GDPR status, existence of information on minor). |
| Referential Metadata Set | organised collection of referential metadata for a given Referential Metadata Subject (Item??) | Each Referential Metadata Set uses a Referential Metadata Structure to define a structured list of Referential Metadata Attributes for a given Referential Metadata Subject. |
| Referential Metadata Attribute | characteristic that describes or qualifies Referential Metadata Subject | Represented Variable can often be used to define a Referential Metadata Attribute (e.g., "Accuracy", "Timeliness", "Frequency" when describing quality information), but other GSIM class can also play a role of Referential Metadata Attribute (e.g., Statistical Classification, Contact, Owner). |
| Referential Metadata Subject | subject for which Referential metadata is reported | The Referential Metadata Subject identifies the subject of the metadata that can be reported using this Referential Metadata Structure. These subjects may be any GSIM class on which organised set of metadata is needed, such as Statistical Program Cycle, Data Set, Questionnaire and Statistical Classification. |
| Referential Metadata Subject Item | actual subject for which referential metadata is reported | Examples are an actual Product such as "Balance of Payments and International Investment Position (Australia, June 2013)", or a collection of Data Points such as the Data Points for a single region within a Data Set covering all regions for a country. |
| Referential Metadata Content Item | actual content for Referential Metadata Attribute | Referential Metadata Content Item can take different formats (e.g., text, number, value from a predefined codelist, table) |

Examples (to be used in Specification)

| Object | For quality (ESS Standard Quality Report) |
|---|---|
| Referential Metadata Structure | Structure as specified in the Eurostat metadata content list (1. Contact, 1.1. Contact Organisation, 1.2. Contact unit, 2. Statistical Presentation, 3. Statistical Processing, etc.) |
| Referential Metadata Set | Structured quality report |
| Referential Metadata Attribute | Contact, Represented Variables (e.g., accuracy, timeliness), etc. |
| Referential Metadata Subject | Statistical Program Cycle |
| Referential Metadata Subject Item | Labour Force Survey (2021 Q1) |
| Referential Metadata Content Item | "Eurostat" for Contact, textual descriptions and coefficients for accuracy, timeliness, etc. |

| Object | For register |
|---|---|
| Referential Metadata Structure | 1. Identification information, 2. Main objective, 3. Data source information, etc. |
| Referential Metadata Set | Structured description for a register |
| Referential Metadata Attribute | Maintainer, Data Provider, Represented Variables (e.g., frequency, data source type) |
| Referential Metadata Subject | Register |
| Referential Metadata Subject Item | "Integrated System of Statistical Registers" |
| Referential Metadata Content Item | "Tax authority" for Data Provider and textual descriptions for the frequency, data source type, etc. |

| Object | For data table (footnotes) |
|---|---|
| Referential Metadata Structure | Implicit (footnote 1, footnote 2, etc.) |
| Referential Metadata Set | Structured set of footnotes |
| Referential Metadata Attribute | Table footnote (can be represented by Represented Variable) |
| Referential Metadata Subject | Data Set, Represented Variable |
| Referential Metadata Subject Item | "Nursing facilities, total resident by annual (2020)" (for Data Set), "Total Resident" (for Represented Variable) |
| Referential Metadata Content Item | "the counts in this have been rounded .. to meet the confidentiality requirement" for footnote 1, "Total residents is calculated by ..." for footnote 2, etc. |
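
One way to read the quality example above is that a Referential Metadata Set should only carry Content Items for Attributes that its Structure declares. The Python sketch below checks exactly that. The `undeclared_attributes` helper, the dictionary layout and the placeholder strings are assumptions for illustration; the attribute names, subject, subject item and the "Eurostat" value come from the example.

```python
def undeclared_attributes(structure_attributes, content_items):
    """Return attribute names used in a set but not declared by its structure."""
    declared = set(structure_attributes)
    return sorted(name for name in content_items if name not in declared)


# Attribute names taken from the quality example above.
esqrs_attributes = ["Contact", "Accuracy", "Timeliness"]

quality_report = {
    "subject": "Statistical Program Cycle",
    "subject_item": "Labour Force Survey (2021 Q1)",
    "content_items": {
        "Contact": "Eurostat",
        "Accuracy": "textual description (placeholder)",
    },
}

problems = undeclared_attributes(esqrs_attributes, quality_report["content_items"])

# A hypothetical content item for an attribute the structure does not declare.
bad_items = dict(quality_report["content_items"], Footnote="placeholder")
bad_problems = undeclared_attributes(esqrs_attributes, bad_items)
```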

Questions:

  1. Does this work? (still not clear how to do for a situation where Subject is a data table and certain footnotes (Attributes) are for the entire data table while others are for Represented Variable inside the table)
  2. Referential Metadata Set is "organised collection of referential metadata for a given Referential Metadata Subject Item", not for "given Referential Metadata Subject"

@FlavioRizzolo
Collaborator Author

I haven't finished reviewing the whole thing, but I noticed that Referential Metadata Attribute is missing the optional "is defined by" association to Represented Variable. I think we still need that for two reasons: (i) the Referential Metadata Attribute parallels the Attribute Component in Data Structures, and (ii) in many cases we could use a Represented Variable, as your last example shows.

@FrancineK
Collaborator

FrancineK commented Feb 1, 2023

image
If Content Item is an instance of Attribute, is the parent-child relationship justified for Content Item?
Both "is an instance of" cardinalities seem to be reversed.

@flo7894

flo7894 commented Feb 1, 2023

A ReferentialMetadataSubject refers to a GSIM class, e.g. Dataset, whereas a ReferentialMetadataSubjectItem refers to an instance of Dataset, e.g. "Nursing facilities, total resident by annual (2020)". The IdentifiableArtefact seems more likely to be the instance of Dataset. Maybe there should not be a "refers to" property between IdentifiableArtefact and ReferentialMetadataSubject?

Questions:

1. Does this work? (still not clear how to do for a situation where Subject is a data table and certain footnotes (Attributes) are for the entire data table while others are for Represented Variable inside the table)

Regarding question 1, couldn't we have one ReferentialMetadataSet for the entire data table and other ReferentialMetadataSets for the RepresentedVariables, and then group them together with the DataSet in an InformationSet? The Product using the InformationSet would then get all the footnotes.
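
A minimal Python sketch of this grouping idea, assuming an InformationSet simply holds the DataSet together with any number of footnote sets. The `FootnoteSet` class (standing in for a ReferentialMetadataSet) and the `all_footnotes` method are hypothetical; the footnote texts are quoted, elisions included, from the data table example earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FootnoteSet:
    """Stands in for a ReferentialMetadataSet that holds footnotes only."""
    subject_item: str  # the table or variable the footnotes describe
    footnotes: List[str] = field(default_factory=list)


@dataclass
class InformationSet:
    data_set: str
    metadata_sets: List[FootnoteSet] = field(default_factory=list)

    def all_footnotes(self) -> List[str]:
        """Collect footnotes from every grouped set, in grouping order."""
        return [note for rm_set in self.metadata_sets for note in rm_set.footnotes]


table = "Nursing facilities, total resident by annual (2020)"
table_notes = FootnoteSet(
    table,
    ["the counts in this have been rounded .. to meet the confidentiality requirement"],
)
variable_notes = FootnoteSet("Total Resident", ["Total residents is calculated by ..."])

info = InformationSet(table, [table_notes, variable_notes])
notes_for_product = info.all_footnotes()
```

A Product built on `info` would then see both the table-level and the variable-level footnotes without either set having to know about the other.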

@InKyungChoi
Collaborator

InKyungChoi commented Feb 15, 2023

@flo7894

A ReferentialMetadataSubject refers to a GSIM class, e.g. Dataset, whereas a ReferentialMetadataSubjectItem refers to an instance of Dataset, e.g. "Nursing facilities, total resident by annual (2020)". The IdentifiableArtefact seems more likely to be the instance of Dataset. Maybe there should not be a "refers to" property between IdentifiableArtefact and ReferentialMetadataSubject?

=> DataSet is a sub-type of IdentifiableArtefact, and many of the existing GSIM classes that can be a ReferentialMetadataSubject are sub-types of IdentifiableArtefact (e.g., StatisticalClassification, StatisticalProgram), so instead of listing all classes, IdentifiableArtefact was used..... thinking now if this creates more confusion...

Regarding question 1, couldn't we have one ReferentialMetadataSet for the entire data table and other ReferentialMetadataSets for the RepresentedVariables, and then group them together with the DataSet in an InformationSet? The Product using the InformationSet would then get all the footnotes.

=> this works for me!

@InKyungChoi
Collaborator

InKyungChoi commented Feb 15, 2023

Updated:

image

@FrancineK
Collaborator

FrancineK commented Feb 22, 2023

Change the link between Subject and Identifiable Artefact so that it runs between Subject Item and Identifiable Artefact instead, and add an attribute to Subject to indicate that it can be any GSIM information class.
Change cardinality Subject to Subject Item from 1 - 0..* to 0..1 - 0..*
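
Read literally, the two changes above could look like the following Python sketch. It assumes the 0..1 end sits on the Subject side (a Subject Item may exist without a Subject) and the 0..* end on the Subject Item side; the class names, the `attach` helper and the identifier string are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IdentifiableArtefact:
    identifier: str


@dataclass
class Subject:
    gsim_class: str  # any GSIM information class, per the proposed attribute
    items: List["SubjectItem"] = field(default_factory=list)  # 0..*


@dataclass
class SubjectItem:
    refers_to: IdentifiableArtefact    # link moved from Subject to Subject Item
    subject: Optional[Subject] = None  # 0..1 instead of exactly 1

    def attach(self, subject: Subject) -> None:
        """Link this item to a subject and register it on the subject side."""
        self.subject = subject
        subject.items.append(self)


item = SubjectItem(IdentifiableArtefact("hypothetical-dataset-id"))
standalone = item.subject is None  # allowed under the 0..1 cardinality

data_set_subject = Subject("Data Set")
item.attach(data_set_subject)
```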

@FlavioRizzolo
Collaborator Author

To be implemented in EA UML

5 participants