-
Notifications
You must be signed in to change notification settings - Fork 6
Reference Data
Hoa edited this page Jan 23, 2024
·
1 revision
The 'classic' way to model reference table would typicall only include one reference table, that holds both the reference keys and the descriptive data. It was optional to split attributes into separate reference satellites.
But after years of seeing reference data in action, we would now always recommend to directly split all descriptive reference data into separate satellites. The main advantage is to directly be ahead of future changing requirements regarding historization, privacy regulations or additional source systems.
- reference Hub: An entity that holds the reference key(s) for a reference object.
- reference Satellites: Satellites attached to a reference Hub that store the descriptive data and keep track of changes over time. Similar to regular Satellites, multiple satellites can be attached to one reference Hub.
- reference Tables: A presentation layer that re-unites a reference Hub with all its reference Satellites. Typically this would be done only virtually, but in some cases this object might be materialised as a table. Datavault4dbt allows users to historize these objects in three different ways; store only the latest descriptive data, capture all deltas, or apply a snapshot-based pattern.
You will need to configure three tables/worksheets in Your metadata:
Name | Explanation |
---|---|
Reference_Hub_Identifier | A globally unique identifier of a reference hub. |
Target_Reference_table_physical_name | The name of this reference hub object. |
Source_Table_Identifier | The global identifier of the source. Used to create a relation between reference hub and source. |
Source_Column_Physical_Name | The physical name of the column in the source data. |
Target_Column_Sort_Order | The position of the business key in the sort order for the hashkey generation. Influences the order of composite business keys. |
Name | Explanation |
---|---|
Reference_Satellite_Identifier | A globally unique identifier of a reference satellite. |
Target_Reference_table_physical_name | The name of this reference satellite object. |
Source_Table_Identifier | The global identifier of the source. Used to create a relation between reference satellite and source. |
Parent_Table_Identifier | The unique identifier of the Parent Hub or Link entity defined in the metadata |
Source_Column_Physical_Name | The physical name of the column in the source data. |
Target_Column_Sort_Order | The position of the business key in the sort order for the hashkey generation. Influences the order of composite business keys. |
Name | Explanation |
---|---|
Reference_Table_Identifier | A globally unique identifier of a reference table. |
Target_Reference_table_physical_name | The name of this reference table object. |
Referenced_Hub | The global identifier of the underlying reference hub. |
Referenced_Satellite | The global identifier of the underlying reference satellites. |
Included_Columns | When defining columns to include only the specified columns will be selected from that satellite. When defining multiple columns use ";" (e.g. "ColumnA;ColumnB") |
Excluded_Columns | When defining columns to exclude all columns except the ones specified will be selected from the satellite. When defining multiple columns use ";" (e.g. "ColumnA;ColumnB") |
Historized | Controls how the data in the reference table should be historized. The three allowed values are 'full', 'latest' and 'snapshot'. Further explanation can be found in the datavault4dbt documentation. (https://github.com/ScalefreeCOM/datavault4dbt/wiki/Reference-Table) |
Group_Name | Defines to which business object group this entry belongs, used for the folder structure of the dbt models as well as the visualization of the entities. |
Reference_Hub_Identifier | Target_Reference_table_physical_name | Source_Table_Identifier | Source_Column_Physical_Name | Target_Column_Sort_Order |
RH0001 | nation_tpch_rh | SRC0004 | N_NATIONKEY | 1 |
... | ... | ... | ... | ... |
Reference_Satellite_Identifier | Target_Reference_table_physical_name | Source_Table_Identifier | Parent_Table_Identifier | Source_Column_Physical_Name | Target_Column_Sort_Order |
RS0001 | nation_tpch_rs | SRC0004 | RH0001 | N_COMMENT | 1 |
... | ... | ... | ... | ... | ... |
Reference_Table_Identifier | Target_Reference_table_physical_name | Referenced_Hub | Referenced_Satellite | Included_Columns | Excluded_Columns | Historized | Group_Name |
R0001 | nation_tpch_r | RH0001 | RS0001 | N_COMMENT | latest | nation | |
... | ... | ... | ... | ... | ... | ... | ... |
Table of Content