modify schema, reindex script and dependencies to support dimensions and population types #202

Merged
merged 13 commits into from
Mar 23, 2023

@DavidSubiros DavidSubiros commented Mar 8, 2023

What

Spike work to prove that the new Elasticsearch model works for dimensions and population types, as defined by this Trello card. The following Trello cards have been implemented as part of the spike:

  • Reindex with models containing population_type and dimensions. Trello card

    • Modify reindex script to use and correctly map the new models for dimensions and population types
    • Improve reindex script flexibility, by adding flags to be able to debug issues without needing to reindex all the data from zebedee and dataset api
    • Add onSuccess and onFailure callbacks to the indexer, because async errors were otherwise not reported.
    • Add unit tests for new functionality in reindex script
  • Update elasticsearch models to include population_type and dimensions. Trello card

    • Modify the Elasticsearch JSON schema (elasticsearch/search-index-settings.json)
  • Update search query templates to allow filtering by population type. Trello card

    • Modify query/templates/search/v710/contentFilters.tmpl to add the population type filter if present in the struct that is executed against the template.
    • Add query/templates/search/v710/populationTypeFilters.tmpl to allow filtering by population type name and/or label.
  • Update search query templates to allow filtering by dimensions. Trello card

    • Modify query/templates/search/v710/contentFilters.tmpl to add the dimensions filter if present in the struct that is executed against the template.
    • Add query/templates/search/v710/dimensionsFilters.tmpl to allow filtering by dimension name, label and/or raw_label.
  • Update search query templates to allow counting / aggregating by one category, while filtering by the other categories. Trello card

    • Add the search templates to allow counting content types while filtering by other categories:
      • query/templates/search/v710/countContentTypeQuery.tmpl
      • query/templates/search/v710/countContentTypeHeader.tmpl
      • query/templates/search/v710/countContentTypeFilters.tmpl
    • Add the search templates to allow counting dimensions while filtering by other categories:
      • query/templates/search/v710/countDimensionsQuery.tmpl
      • query/templates/search/v710/countDimensionsHeader.tmpl
      • query/templates/search/v710/countDimensionsFilters.tmpl
    • Add the search templates to allow counting population types while filtering by other categories:
      • query/templates/search/v710/countPopulationTypeQuery.tmpl
      • query/templates/search/v710/countPopulationTypeHeader.tmpl
      • query/templates/search/v710/countPopulationTypeFilters.tmpl
    • Add the search templates to allow counting topics while filtering by other categories:
      • query/templates/search/v710/countTopicQuery.tmpl
      • query/templates/search/v710/countTopicHeader.tmpl
      • query/templates/search/v710/countTopicFilters.tmpl
  • Other refactors with no functional changes:

    • Refactor the code for the search endpoint a bit to make it more consistent with the release endpoint, including a validator and the way the handler is registered to the api.
    • Refactor the code that creates the search and count query structures from the http.Request query params (CreateRequests func); these structures are then passed to the query package to execute the templates
    • Add Dockerfile.local and reflex so that this repo can be used in dp-compose/v2 search stack

Depends on the following PRs:

How to review

  • Make sure code changes make sense
  • Make sure unit tests pass
  • You may have a look at the spike document attached to the Trello card and check that it makes sense
  • Optionally, run the reindex script against dp-dataset-api in sandbox (ask me if you need more information about this)

Who can review

Anyone

…o using latest models from search data extractor and importer
… add validator and create the request data struct directly from the parameters, which makes the code easier to maintain
…(e.g. dimensions counts may be filtered by population types, content types and topics)
…d search-data-importer, undo 'is_based_on' changes, as they are now in the Metadata document, add unit test and fix lint issues
…AND, not set values that are not provided as query params in the search request
@DavidSubiros DavidSubiros marked this pull request as ready for review March 21, 2023 17:01
@DavidSubiros DavidSubiros force-pushed the feature/add-population-types-and-dimensions branch from 9d22914 to bfdfbfc Compare March 22, 2023 13:28
@DavidSubiros DavidSubiros merged commit 866ff65 into develop Mar 23, 2023
@DavidSubiros DavidSubiros deleted the feature/add-population-types-and-dimensions branch March 23, 2023 10:11