Skip to content

Reference Data

Hoa edited this page Jan 23, 2024 · 1 revision

Reference Data

The 'classic' way to model reference table would typicall only include one reference table, that holds both the reference keys and the descriptive data. It was optional to split attributes into separate reference satellites.

But after years of seeing reference data in action, we would now always recommend to directly split all descriptive reference data into separate satellites. The main advantage is to directly be ahead of future changing requirements regarding historization, privacy regulations or additional source systems.

  • reference Hub: An entity that holds the reference key(s) for a reference object.
  • reference Satellites: Satellites attached to a reference Hub that store the descriptive data and keep track of changes over time. Similar to regular Satellites, multiple satellites can be attached to one reference Hub.
  • reference Tables: A presentation layer that re-unites a reference Hub with all its reference Satellites. Typically this would be done only virtually, but in some cases this object might be materialised as a table. Datavault4dbt allows users to historize these objects in three different ways; store only the latest descriptive data, capture all deltas, or apply a snapshot-based pattern.

Columns

You will need to configure three tables/worksheets in Your metadata:

ref_hub

Name Explanation
Reference_Hub_Identifier A globally unique identifier of a reference hub.
Target_Reference_table_physical_name The name of this reference hub object.
Source_Table_Identifier The global identifier of the source. Used to create a relation between reference hub and source.
Source_Column_Physical_Name The physical name of the column in the source data.
Target_Column_Sort_Order The position of the business key in the sort order for the hashkey generation. Influences the order of composite business keys.

ref_sat

Name Explanation
Reference_Satellite_Identifier A globally unique identifier of a reference satellite.
Target_Reference_table_physical_name The name of this reference satellite object.
Source_Table_Identifier The global identifier of the source. Used to create a relation between reference satellite and source.
Parent_Table_Identifier The unique identifier of the Parent Hub or Link entity defined in the metadata
Source_Column_Physical_Name The physical name of the column in the source data.
Target_Column_Sort_Order The position of the business key in the sort order for the hashkey generation. Influences the order of composite business keys.

ref_table

Name Explanation
Reference_Table_Identifier A globally unique identifier of a reference table.
Target_Reference_table_physical_name The name of this reference table object.
Referenced_Hub The global identifier of the underlying reference hub.
Referenced_Satellite The global identifier of the underlying reference satellites.
Included_Columns When defining columns to include only the specified columns will be selected from that satellite. When defining multiple columns use ";" (e.g. "ColumnA;ColumnB")
Excluded_Columns When defining columns to exclude all columns except the ones specified will be selected from the satellite. When defining multiple columns use ";" (e.g. "ColumnA;ColumnB")
Historized Controls how the data in the reference table should be historized. The three allowed values are 'full', 'latest' and 'snapshot'. Further explanation can be found in the datavault4dbt documentation. (https://github.com/ScalefreeCOM/datavault4dbt/wiki/Reference-Table)
Group_Name Defines to which business object group this entry belongs, used for the folder structure of the dbt models as well as the visualization of the entities.

Example Reference Hub

Reference_Hub_Identifier Target_Reference_table_physical_name Source_Table_Identifier Source_Column_Physical_Name Target_Column_Sort_Order
RH0001 nation_tpch_rh SRC0004 N_NATIONKEY 1
... ... ... ... ...

Example Reference Satellite

Reference_Satellite_Identifier Target_Reference_table_physical_name Source_Table_Identifier Parent_Table_Identifier Source_Column_Physical_Name Target_Column_Sort_Order
RS0001 nation_tpch_rs SRC0004 RH0001 N_COMMENT 1
... ... ... ... ... ...

Example Reference Table

Reference_Table_Identifier Target_Reference_table_physical_name Referenced_Hub Referenced_Satellite Included_Columns Excluded_Columns Historized Group_Name
R0001 nation_tpch_r RH0001 RS0001 N_COMMENT latest nation
... ... ... ... ... ... ... ...