Search Engine- What formal metadata would look like #40

Closed
ShaneMill1 opened this issue Apr 3, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@ShaneMill1
Collaborator

As a result of the EDR-API Sprint, I worked on a basic search engine that lets a user search from the landing page of the NWS EDR implementation and returns a dictionary of links to the collections that match the keywords. Users can then go directly to a link where they can immediately make a query, rather than navigating the structure of the API to find out whether a collection contains the data they are looking for.
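
To make the idea concrete, here is a minimal sketch of that kind of keyword lookup. It is not the code running on the NWS instance: the collection IDs, descriptions, `BASE_URL`, and the `search_collections` helper are all hypothetical, and the matching rule (every keyword must appear in the ID or description) is an assumption.

```python
from typing import Dict

# Hypothetical collection metadata: collection ID -> descriptive long name.
COLLECTIONS: Dict[str, str] = {
    "gfs_isobaric": "Temperature and wind on isobaric surfaces",
    "gfs_surface": "Surface temperature and pressure",
    "ww3_global": "Significant wave height",
}

# Placeholder base URL, not the real EDR endpoint.
BASE_URL = "https://example.org/EDR-API/collections"


def search_collections(query: str,
                       collections: Dict[str, str] = COLLECTIONS,
                       base_url: str = BASE_URL) -> Dict[str, str]:
    """Return {collection_id: link} for collections matching every keyword."""
    keywords = query.lower().split()
    results = {}
    for cid, description in collections.items():
        # Search both the ID and the description, case-insensitively.
        text = f"{cid} {description}".lower()
        if keywords and all(k in text for k in keywords):
            results[cid] = f"{base_url}/{cid}"
    return results
```

Calling `search_collections("temperature isobaric")` on the sample data returns only the `gfs_isobaric` entry, with a link the user could follow straight to that collection.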

This is a nice simple implementation, but we need to work on how formal metadata would look when returned by the search engine.

I have been looking at the pycsw module, which offers implementations of several relevant OGC services, including the OpenSearch Geo and Time extensions as well as CSW. Through the EDR, do we offer some of these services? All of the services? Is it up to the data provider?

@tomkralidis had this comment on the Sprint Issue, and this is an important thing to discuss so this issue serves that purpose:

@ShaneMill1 let's continue the discussion on what/how the formal metadata would look to serve as part of a CSW instance (Dublin Core, ISO, etc.). This will also give us an opportunity to investigate the OGC API - Records work (disclosure: I'm part of this SWG and working on an implementation).

@ShaneMill1
Collaborator Author

ShaneMill1 commented Apr 9, 2020

An update regarding this issue. For now, I have chosen to expose the OpenSearch mode of the pycsw module to add a search engine to our instance of the EDR-API.

Under the hood explanation
During the collection-aggregation ingest process, metadata is gathered for each group of parameters that share common dimensions; for now we are calling such a group a "collection". This metadata is then used to create Zarr datastores so that the EDR-API can query the data much more efficiently.

With these metadata artifacts (let's call them collections.json files), each JSON file containing each collection ID and its metadata, we can power a search engine.

Using the pycsw module, I have implemented a search engine based on OpenSearch. Metadata is extracted from the collections.json files for each dataset and translated into csw:Record documents in XML. The result is a directory of XML records, one per collection, each containing the metadata that the search engine will search on. A pycsw process then ingests these records into a sqlite3 database, at which point pycsw is serving our metadata records.
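
The translation step might look roughly like the sketch below. It is an illustration, not the actual ingest code: the function name, the sample values, and the choice to map LongName to `dc:title` and the EDR link to `dct:references` are assumptions. Only the CSW 2.0.2 and Dublin Core namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for CSW 2.0.2 and Dublin Core.
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def collection_to_csw_record(collection_id: str, long_name: str, edr_url: str) -> str:
    """Build a minimal csw:Record XML string for one collection.

    Assumed mapping (not confirmed by the thread):
    collection ID -> dc:identifier, LongName -> dc:title,
    EDR query endpoint -> dct:references.
    """
    record = ET.Element(f"{{{NS['csw']}}}Record")
    ET.SubElement(record, f"{{{NS['dc']}}}identifier").text = collection_id
    ET.SubElement(record, f"{{{NS['dc']}}}title").text = long_name
    ET.SubElement(record, f"{{{NS['dc']}}}type").text = "dataset"
    ET.SubElement(record, f"{{{NS['dct']}}}references").text = edr_url
    return ET.tostring(record, encoding="unicode")
```

Writing one such string per collection to its own `.xml` file would produce the directory of records that pycsw then loads into its database.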

In Practice

  1. Navigate to https://data-api.mdl.nws.noaa.gov/EDR-API
  2. In the search bar, search for "temperature isobaric"
  3. This will take you to the OpenSearch Geo and Time implementation provided by the pycsw module. Each result returned has two links: one to the CSW record that the search was conducted on, and one to the EDR endpoint where you can query the data.
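
For anyone who wants to hit the search endpoint directly rather than through the search bar, a request URL can be assembled as in the sketch below. The parameter names follow the pattern I understand pycsw's OpenSearch mode to use (`mode=opensearch` with `q`, `bbox`, and `time`), but treat them as an assumption and check them against the pycsw documentation for your version. The base URL is a placeholder, since the path of the CSW endpoint on the NWS server is not given here.

```python
from urllib.parse import urlencode


def build_opensearch_url(base_url, query, bbox=None, time=None):
    """Assemble a pycsw-style OpenSearch GetRecords URL.

    base_url is a placeholder for wherever the pycsw endpoint lives.
    bbox is (minx, miny, maxx, maxy); time is a 'start/end' string.
    """
    params = {
        "mode": "opensearch",
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "elementsetname": "full",
        "typenames": "csw:Record",
        "resulttype": "results",
        "q": query,
    }
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if time is not None:
        params["time"] = time
    # urlencode percent-encodes values and turns spaces into '+'.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; substitute the real pycsw URL.
url = build_opensearch_url("https://example.org/pycsw/csw.py", "temperature isobaric")
```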

Further work

  • We will want to formalize the metadata provided at the CSW record endpoint. Right now, LongName is the only thing supplied and is therefore the only property that you can search on. I think we would want to formalize this endpoint and schema.
  • In the related issue "Should there be 2 levels of returned metadata: basic & extended/verbose?" #44, concerns are raised about how much metadata to expose to a user. We could choose not to expose the CSW record endpoint at all, although the CSW records aren't very complex at the moment. Even though the search engine depends on this metadata, we don't necessarily need to expose it to the user. We can discuss what is most appropriate for the EDR.

@tomkralidis
Collaborator

tomkralidis commented Apr 28, 2020

Nice work @ShaneMill1. We could think about a collection, here, being modeled as a collection of variables at a certain xyzt. We could also associate this with the common view of OGC API - Records of discovery metadata. So a discovery metadata record (say WMO Core Metadata profile) can have a link relation for a real-time data search of a given dataset. This could link to an OGC API - Records implementation of this finer level of granularity (per this issue).

OGC API - Records (discovery metadata)
                 ||
                 vv
OGC API - Records (realtime data search)

To be clear, this doesn't mean that WIS records are the way into a real-time search, but one way that is interoperable.

We can use the OGC API - Records draft record model and extend it for MetOcean needs.

3 participants