Populate domains drop down with what's been ingested in datahub #407

MatMoore · 2024-06-06T15:50:30Z

Resolves #385

Our hardcoded domain list has drifted from the CaDeT domain model. We now have a way to ingest domains from CaDeT, so we can dynamically populate our domain model in the service.

On dev, this now looks like this (there are some old domains hanging around still)

I've sorted these alphabetically and removed the subdomains drop down for now. This is based on there being no subdomains available to select any more, regardless of which domain you choose.

Also noticed a small bug where domains weren't being displayed for Chart results, so fixed this in the process.

- remove entity which is not currently present
- enable the no_duplicates test (we have fixed this)

home/forms/domain_model.py

tests/conftest.py

Previously we hardcoded the list of domains shown in the search filter, and had different lists per environment. This was useful in alpha when we had some junk domains we wanted to filter out, but now we're at a point where every domain in Datahub should be one we want to use. This commit means we now fetch every domain that has something linked to it, and display that in alphabetical order.
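A minimal sketch of the behaviour described above, assuming the facet query yields one count per domain. `DomainFacet` and `domain_choices` are hypothetical names used for illustration, not the service's actual code, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainFacet:
    """Hypothetical shape of one domain facet returned by the catalogue."""

    urn: str
    label: str
    count: int  # number of entities linked to the domain


def domain_choices(facets: list[DomainFacet]) -> list[tuple[str, str]]:
    """Return (value, label) pairs for non-empty domains, sorted by label."""
    non_empty = [f for f in facets if f.count > 0]
    ordered = sorted(non_empty, key=lambda f: f.label.lower())
    return [(f.urn, f.label) for f in ordered]


# Invented sample data: one domain has nothing linked to it.
facets = [
    DomainFacet("urn:li:domain:prison", "Prison", 12),
    DomainFacet("urn:li:domain:courts", "Courts", 3),
    DomainFacet("urn:li:domain:unused", "Unused", 0),
]
choices = domain_choices(facets)
```

Sorting on the lowercased label keeps the order stable if domain names differ in capitalisation.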

Ideally we would just fetch the facets once per request, but in practice we do this from a few different places.

1. In the view we instantiate a SearchService, which uses the domain model in constructing filters for Datahub.
2. The SearchForm also needs them to know what choices are valid, so we need to pass a callback to the form's ChoiceField. That callback does not share any data with the view.

Caching the value is a quick way to avoid making extra requests for the same data.
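The caching idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: the `cached` decorator, the 5-second lifetime, and `fetch_domain_choices` are all hypothetical, and the real service may well use Django's cache framework instead.

```python
import time
from typing import Callable


def cached(ttl_seconds: float) -> Callable:
    """Cache a zero-argument function's result for ttl_seconds."""

    def decorator(fn):
        state = {"value": None, "expires": 0.0}

        def wrapper():
            now = time.monotonic()
            if now >= state["expires"]:
                # Cache is empty or stale: call through and remember the result.
                state["value"] = fn()
                state["expires"] = now + ttl_seconds
            return state["value"]

        return wrapper

    return decorator


calls = {"n": 0}


@cached(ttl_seconds=5)  # assumed lifetime, chosen only for the example
def fetch_domain_choices() -> list[tuple[str, str]]:
    # Stand-in for the Datahub facets query; counts calls so reuse is visible.
    calls["n"] += 1
    return [("urn:li:domain:courts", "Courts"), ("urn:li:domain:prison", "Prison")]


# 1. The view builds its filters from the domain list.
view_domains = fetch_domain_choices()
# 2. The form resolves its choices through a callback, for example
#    forms.ChoiceField(choices=fetch_domain_choices) in Django.
form_domains = fetch_domain_choices()
```

Because both callers go through the same cached function, they do not need to share any state with each other to avoid the second request.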

This is the case at the moment, because the domain model we've pulled in from CaDeT doesn't have subdomains. This might change later though so I don't want to remove the subdomain code completely.
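One way to express "hide the drop down when no subdomains exist" is a single check over the whole domain model. The function name and the dictionary shape below are assumptions made for the example.

```python
def show_subdomain_filter(subdomains_by_domain: dict[str, list[str]]) -> bool:
    """Return True only if at least one domain has a subdomain defined."""
    return any(subdomains_by_domain.values())


# Invented data: the current model has no subdomains anywhere.
current_model = {"Courts": [], "Prison": []}
# Invented data: a later model where one domain gains a subdomain.
future_model = {"Courts": ["Family courts"], "Prison": []}

hide_now = not show_subdomain_filter(current_model)
show_later = show_subdomain_filter(future_model)
```

Keeping the check data-driven means the subdomain code can stay in place and the drop down reappears as soon as subdomains are ingested.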

lib/datahub-client/CHANGELOG.md

murdo-moj · 2024-06-11T15:29:02Z

Because of how our search is structured, only domains that contain at least one of the data types we have defined are being pulled through. Is this the intended behaviour? E.g. electronic_monitoring: https://datahub-catalogue-dev.apps.live.cloud-platform.service.justice.gov.uk/domain/urn:li:domain:electronic_monitoring/Entities?is_lineage_mode=false

Previously it was only returning domains with tables in. We should include any that show as non-empty in Find MOJ Data.

MatMoore · 2024-06-11T15:46:03Z

Sort of, but there was a bug - it's supposed to pull through anything non-empty. This means that if nothing in the domain is tagged to dc_display_in_catalogue, the domain will be hidden as well.

By default it was filtering on result type = table so we were missing a few, but fixed now.
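The effect of the two filters discussed here can be shown on a toy data set. Everything below is invented for illustration (entity records, type names, and the `non_empty_domains` helper); only the `dc_display_in_catalogue` tag name comes from the discussion.

```python
from collections import Counter
from typing import Optional

# Invented sample entities; the real catalogue contents will differ.
entities = [
    {"domain": "prison", "type": "TABLE", "tags": ["dc_display_in_catalogue"]},
    {"domain": "electronic_monitoring", "type": "CHART", "tags": ["dc_display_in_catalogue"]},
    {"domain": "courts", "type": "TABLE", "tags": []},
]


def non_empty_domains(
    entities: list[dict],
    entity_types: Optional[set[str]] = None,
    required_tag: str = "dc_display_in_catalogue",
) -> list[str]:
    """List domains with at least one tagged entity of an allowed type."""
    counts = Counter(
        e["domain"]
        for e in entities
        if required_tag in e["tags"]
        and (entity_types is None or e["type"] in entity_types)
    )
    return sorted(counts)


# Filtering on tables only drops domains whose tagged entities are other types.
tables_only = non_empty_domains(entities, entity_types={"TABLE"})
# Without the type restriction, every domain with a tagged entity appears.
all_types = non_empty_domains(entities)
```

In this toy data `courts` stays hidden in both cases because nothing in it carries the tag, which matches the behaviour described in the comment above.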

sentry-io · 2024-06-12T12:55:38Z

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

- ‼️ CatalogueError: Unable to execute facets query (/search)
- ‼️ ConnectivityError (/search)
- ‼️ CatalogueError: Unable to execute facets query (/search)
- ‼️ CatalogueError: Unable to execute facets query (/search)
- ‼️ SystemExit: 1 (/search)


* Add missing domain information from charts
* Update search tests that hit datahub dev
  - remove entity which is not currently present
  - enable the no_duplicates test (we have fixed this)
* Load the list of domains from Datahub
  Previously we hardcoded the list of domains shown in the search filter, and had different lists per environment. This was useful in alpha when we had some junk domains we wanted to filter out, but now we're at a point where every domain in Datahub should be one we want to use. This commit means we now fetch every domain that has something linked to it, and display that in alphabetical order.
* Move domain model to models and remove unused model
* Refacotr: decouple SearchFacetFetcher from DomainModel
* Cache facets fetched from datahub
  Ideally we would just fetch the facets once per request, but in practice we do this from a few different places.
  1. In the view we instantiate a SearchService, which uses the domain model in constructing filters for Datahub.
  2. The SearchForm also needs them to know what choices are valid, so we need to pass a callback to the form's ChoiceField. That callback does not share any data with the view.
  Caching the value is a quick way to avoid making extra requests for the same data.
* Hide subdomains if there aren't any defined
  This is the case at the moment, because the domain model we've pulled in from CaDeT doesn't have subdomains. This might change later though so I don't want to remove the subdomain code completely.
* Include missing domains
  Previously it was only returning domains with tables in. We should include any that show as non-empty in Find MOJ Data.

* add .env.tpl env template file
* Add MOJ internal service header (#405)
* Add MOJ internal service header
  The main links are now in a primary nav component. This should go below the phase banner as the banner is supposed to touch the black header. I've also changed the phase from alpha -> beta, and changed the capitalization in the service name.
* Remove commented out html
* Populate domains drop down with what's been ingested in datahub (#407)
* Add missing domain information from charts
* Update search tests that hit datahub dev
  - remove entity which is not currently present
  - enable the no_duplicates test (we have fixed this)
* Load the list of domains from Datahub
  Previously we hardcoded the list of domains shown in the search filter, and had different lists per environment. This was useful in alpha when we had some junk domains we wanted to filter out, but now we're at a point where every domain in Datahub should be one we want to use. This commit means we now fetch every domain that has something linked to it, and display that in alphabetical order.
* Move domain model to models and remove unused model
* Refacotr: decouple SearchFacetFetcher from DomainModel
* Cache facets fetched from datahub
  Ideally we would just fetch the facets once per request, but in practice we do this from a few different places.
  1. In the view we instantiate a SearchService, which uses the domain model in constructing filters for Datahub.
  2. The SearchForm also needs them to know what choices are valid, so we need to pass a callback to the form's ChoiceField. That callback does not share any data with the view.
  Caching the value is a quick way to avoid making extra requests for the same data.
* Hide subdomains if there aren't any defined
  This is the case at the moment, because the domain model we've pulled in from CaDeT doesn't have subdomains. This might change later though so I don't want to remove the subdomain code completely.
* Include missing domains
  Previously it was only returning domains with tables in. We should include any that show as non-empty in Find MOJ Data.
* Cleanup - bring through tags and glossary terms consistently, and remove dead code for data products (#418)
* Extend tags and include glossary terms in search results
* Remove remaining references to data product
  This is currently unused, because we no longer include data products in the search.
* Set chromedriver path to one installed by setup-chromedriver
* Remove metrics ingres config (#425)
  remove allowed subnets
* Show when stuff is an ESDA (#421)
* Show when stuff is an ESDA
  This is only shown on a handful of assets. Also remove metadata fields we have not populated yet (these will always display as blank)
* Correct casing
* Metrics ingress test (#428)
* remove allowed subnets
* change ecr_region from input to var
* Update workflow variable assignments (#431)
* add replaces vars with inputs
* Remove inputs and pull vars from respective environements
* Fmd 366 add dataset lineage link (#416)
* add upstream and downstream lineage to getDatasetDetails graphql query
* refactor parse_relations() helper to handle more relations
* add upstream and downstream lineage to RelationshipType enum
* update parse_relations() input args
* update parse_relations() input args in search
* add has_lineage and lineage_url to dataset details context
* add lineage link to details_table template
* remove redundant block in query for data product relationships
* return entity name for lineage
* have only 1 RelationshipType for lineage
* simplfy `parse_relations()` helper function
* update DatasetDetails to use single lineage type
* align url to rest of table
* update tests
* add default value for url
* design suggestions for lineage label, from Alex and Jess
* spell it right
* suggestions from Mat
* update readme
* remove .env.example

---------

Co-authored-by: Mat <[email protected]>
Co-authored-by: Matt <[email protected]>

MatMoore added 2 commits June 6, 2024 16:18

Add missing domain information from charts

ca84897

Update search tests that hit datahub dev

d855eb5

- remove entity which is not currently present
- enable the no_duplicates test (we have fixed this)

MatMoore had a problem deploying to dev June 6, 2024 15:50 — with GitHub Actions Failure

MatMoore commented Jun 6, 2024

home/forms/domain_model.py (outdated)

MatMoore commented Jun 6, 2024

tests/conftest.py (outdated)

MatMoore force-pushed the align-domains branch from 5beb7c5 to 9bc282f Compare June 10, 2024 09:32

MatMoore had a problem deploying to dev June 10, 2024 09:32 — with GitHub Actions Failure

MatMoore added 3 commits June 10, 2024 10:39

Move domain model to models and remove unused model

ccd1551

Refacotr: decouple SearchFacetFetcher from DomainModel

cc83f48

MatMoore had a problem deploying to dev June 10, 2024 11:06 — with GitHub Actions Failure

Hide subdomains if there aren't any defined

b907229

This is the case at the moment, because the domain model we've pulled in from CaDeT doesn't have subdomains. This might change later though so I don't want to remove the subdomain code completely.

MatMoore force-pushed the align-domains branch from 0242ef0 to b907229 Compare June 10, 2024 11:12

MatMoore temporarily deployed to dev June 10, 2024 11:12 — with GitHub Actions Inactive

MatMoore marked this pull request as ready for review June 10, 2024 11:16

MatMoore changed the title ~~[WIP] Populate domains drop down with what's been ingested in datahub~~ Populate domains drop down with what's been ingested in datahub Jun 10, 2024

murdo-moj reviewed Jun 11, 2024

lib/datahub-client/CHANGELOG.md

Include missing domains

6677ea5

Previously it was only returning domains with tables in. We should include any that show as non-empty in Find MOJ Data.

MatMoore temporarily deployed to dev June 11, 2024 15:44 — with GitHub Actions Inactive

murdo-moj approved these changes Jun 11, 2024


Merge branch 'main' into align-domains

6214177

MatMoore temporarily deployed to dev June 11, 2024 15:52 — with GitHub Actions Inactive

MatMoore merged commit aee5e43 into main Jun 11, 2024
5 checks passed

MatMoore deleted the align-domains branch June 11, 2024 15:52
