Figure out why metadata last ingested is not coming through, and fix inconsistencies in reporting of dates #1113

Closed
MatMoore opened this issue Nov 28, 2024 · 1 comment
MatMoore commented Nov 28, 2024

  1. I expected lastUpdated to always be set, but it is not set for Performance Hub. What are we doing differently there?
  2. We report entity.custom_properties.data_summary.last_updated as the data last updated date, but this field does not exist. This should be changed to data_last_modified where we have it.
    • For CaDeT, we could potentially report last_datajob_run_date, but this may mislead users into thinking the data changed just because dbt ran.
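
A minimal sketch of the rule in point 2, assuming a hypothetical `EntityDates` container whose field names mirror the example metadata in this issue. Both the class and the `display_data_last_updated` helper are illustrative, not existing catalogue code. It reports `data_last_modified` when present and deliberately does not fall back to `last_datajob_run_date`:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class EntityDates:
    """Hypothetical container; field names mirror the example metadata."""

    metadata_last_ingested: Optional[datetime] = None
    created: Optional[datetime] = None
    data_last_modified: Optional[datetime] = None
    last_datajob_run_date: Optional[datetime] = None


def display_data_last_updated(dates: EntityDates) -> Optional[datetime]:
    """Return the date to show as "data last updated", or None if unknown.

    last_datajob_run_date is intentionally ignored: a dbt run does not
    prove that the data itself changed.
    """
    return dates.data_last_modified


# Values taken from the cadet_model and govuk_publication examples.
cadet_model = EntityDates(
    last_datajob_run_date=datetime(2024, 11, 22, 6, 15, 40, 540000, tzinfo=timezone.utc),
)
govuk_publication = EntityDates(
    data_last_modified=datetime(2013, 7, 25, 8, 30, tzinfo=timezone.utc),
)
```

With this rule a CaDeT model shows no "data last updated" date at all, which seems safer than showing the dbt run date under that label.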

Example metadata from dev

glue_table

metadata_last_ingested=2024-11-25 11:46:53.032000+00:00
created=None
data_last_modified=None
last_datajob_run_date=None

cadet_model

metadata_last_ingested=2024-11-25 12:34:07.283000+00:00
created=None
data_last_modified=None
last_datajob_run_date=2024-11-22 06:15:40.540000+00:00

performance_hub_metric

metadata_last_ingested=None
created=None
data_last_modified=None
last_datajob_run_date=None

govuk_publication

metadata_last_ingested=2024-11-20 16:48:43.340000+00:00
created=None
data_last_modified=2013-07-25 08:30:00+00:00
last_datajob_run_date=None

cadet_database

metadata_last_ingested=2024-11-28 13:52:48.501000+00:00
created=None
data_last_modified=None

esda

metadata_last_ingested=None
created=None
data_last_modified=None

justice_data_chart

metadata_last_ingested=None
created=None
data_last_modified=2024-09-26 00:00:00+00:00

justice_data_dashboard

metadata_last_ingested=2024-11-28 13:53:57.022000+00:00
created=None
data_last_modified=None

cjs_dashboard

metadata_last_ingested=None
created=None
data_last_modified=None

cjs_chart

metadata_last_ingested=None
created=None
data_last_modified=2024-11-07 00:00:00+00:00
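
To make the gaps easier to see, the dev values above can be summarised with a short script. This is only a sketch: the dictionary is transcribed by hand from the example metadata (recording only whether `metadata_last_ingested` had a value), and `missing_last_ingested` is an illustrative helper.

```python
from typing import Dict, List

# True means metadata_last_ingested had a value in the dev examples above.
HAS_LAST_INGESTED: Dict[str, bool] = {
    "glue_table": True,
    "cadet_model": True,
    "performance_hub_metric": False,
    "govuk_publication": True,
    "cadet_database": True,
    "esda": False,
    "justice_data_chart": False,
    "justice_data_dashboard": True,
    "cjs_dashboard": False,
    "cjs_chart": False,
}


def missing_last_ingested(flags: Dict[str, bool]) -> List[str]:
    """Return the entity types with no metadata_last_ingested, in input order."""
    return [name for name, present in flags.items() if not present]


print(missing_last_ingested(HAS_LAST_INGESTED))
# ['performance_hub_metric', 'esda', 'justice_data_chart', 'cjs_dashboard', 'cjs_chart']
```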

@github-project-automation github-project-automation bot moved this to Todo 📝 in Data Catalogue Nov 28, 2024
@MatMoore MatMoore self-assigned this Nov 28, 2024
@MatMoore MatMoore moved this from Todo 📝 to In Progress 🚀 in Data Catalogue Nov 28, 2024
MatMoore commented Nov 28, 2024

Incompleteness of lastIngested values

  • All the metadata created through proper ingestions has lastIngested timestamps, except for the Justice Data charts. There it is set on the dashboard, but not on the chart.
  • ESDA metadata does not have lastIngested either. I checked the DataHub db, and lastRunId is set to null in systemMetadata for these entities.
  • Performance Hub and CJS dashboard metadata does not have lastIngested either, but in this case lastRunId is set to the default value of no-run-id-provided.
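
The `lastRunId` states described above can be told apart with a small check. A sketch, assuming the value has already been read out of `systemMetadata`; the `classify_run_id` helper and its labels are hypothetical, while `no-run-id-provided` is the default value observed above:

```python
from typing import Optional

# Default value seen on Performance Hub and CJS dashboard metadata.
DEFAULT_RUN_ID = "no-run-id-provided"


def classify_run_id(last_run_id: Optional[str]) -> str:
    """Label a lastRunId value as 'missing', 'default' or 'present'."""
    if last_run_id is None:
        # The ESDA case: lastRunId is null in systemMetadata.
        return "missing"
    if last_run_id == DEFAULT_RUN_ID:
        # Metadata was emitted without an explicit run id.
        return "default"
    return "present"
```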

There are at least two issues here:

  • lastIngested possibly needs adding to chart graphqlQueries
  • systemMetadata needs adding to all our MCPs in the scripts where we call emit(). I don't think this is required for the custom ingestion sources.
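
For the second issue, a minimal sketch of the values each emitted MCP would need to carry. It uses a plain dict rather than the DataHub SDK types so that it stays self-contained. The key names `runId` and `lastObserved` reflect my understanding of DataHub's system metadata fields and should be checked against the SDK version in use; `build_system_metadata` and the run id format are hypothetical.

```python
import time
from typing import Any, Dict, Optional

DEFAULT_RUN_ID = "no-run-id-provided"


def build_system_metadata(run_id: str, now_ms: Optional[int] = None) -> Dict[str, Any]:
    """Build system metadata values for one emitted MCP.

    Rejects an empty run id or the default placeholder, since those are
    the cases where lastIngested was found to be missing.
    """
    if not run_id or run_id == DEFAULT_RUN_ID:
        raise ValueError("a real run id is required")
    if now_ms is None:
        # Milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)
    return {"runId": run_id, "lastObserved": now_ms}


# Hypothetical run id; the timestamp is fixed here only to keep the example deterministic.
system_metadata = build_system_metadata("performance-hub-2024-11-28", now_ms=1732802400000)
```

In the scripts, the equivalent SDK object would be attached to each MCP before calling emit(), ideally with one run id shared across a whole script run.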

@MatMoore MatMoore moved this from In Progress 🚀 to Review 🛂 in Data Catalogue Nov 29, 2024
@MatMoore MatMoore moved this from Review 🛂 to Done ✅ in Data Catalogue Nov 29, 2024
@MatMoore MatMoore closed this as completed by moving to Done ✅ in Data Catalogue Nov 29, 2024