Ability to provide search engine using artifacts from Aggregation of Collection IDs #19
So far, I have created a Python module that builds a catalog out of all of the _collection.json files produced by the aggregation process. The catalog is a dictionary whose keys combine the model ID and instance (e.g. 00z_gfs_100), which keeps the keys unique. Each value is another dictionary containing the contents of the associated _collection.json file. To recap, the contents of 00z_gfs_100_collection.json include the collection IDs and their associated dimensions, long_name, level_type, etc.

With the catalog in this form, the module takes an input keyword as a string and searches the long_names for it. It builds a list of matching collection IDs and, from that, the resulting URL list. For example, a search for "ozone" returns the following list:

https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_025_forecast_time0_lat_0_lon_0_Entire_atmosphere_considered_as_a_single_layer/instance/00z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_025_forecast_time0_lat_0_lon_0_lv_ISBL12_Isobaric_surface_Pa/instance/00z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_050_forecast_time0_lat_0_lon_0_Entire_atmosphere_considered_as_a_single_layer/instance/00z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_050_forecast_time0_lat_0_lon_0_lv_ISBL12_Isobaric_surface_Pa/instance/00z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_Entire_atmosphere_considered_as_a_single_layer/instance/00z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_lv_ISBL12_Isobaric_surface_Pa/instance/00z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_Entire_atmosphere_considered_as_a_single_layer/instance/06z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_lv_ISBL12_Isobaric_surface_Pa/instance/06z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_Entire_atmosphere_considered_as_a_single_layer/instance/12z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_lv_ISBL12_Isobaric_surface_Pa/instance/12z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_Entire_atmosphere_considered_as_a_single_layer/instance/18z
https://data-api.mdl.nws.noaa.gov/EDR-API/collections/automated_gfs_100_forecast_time0_lat_0_lon_0_lv_ISBL12_Isobaric_surface_Pa/instance/18z

Although this is currently run from the command line, the next step is to import the module into wow_point_server.py and add an interface, either through a /search URL (Flask endpoint) or by modifying the root.html template to include a Search header and an associated form.
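The catalog-build-and-search flow described above could be sketched roughly as follows. This is a minimal illustration, not the actual module: the key scheme (00z_gfs_100), file naming, and base URL follow the comment, but the function names, and the assumed shape of the _collection.json contents (a "collections" list with "collection_id" and "long_name" entries), are my own assumptions.

```python
import glob
import json
import os

BASE_URL = "https://data-api.mdl.nws.noaa.gov/EDR-API/collections"


def build_catalog(json_dir):
    """Build a catalog keyed by instance and model ID (e.g. 00z_gfs_100)."""
    catalog = {}
    for path in glob.glob(os.path.join(json_dir, "*_collection.json")):
        # e.g. 00z_gfs_100_collection.json -> key "00z_gfs_100"
        key = os.path.basename(path).replace("_collection.json", "")
        with open(path) as f:
            catalog[key] = json.load(f)
    return catalog


def search(catalog, keyword):
    """Return collection URLs whose parameter long_name contains keyword."""
    urls = []
    for key, contents in catalog.items():
        instance = key.split("_")[0]  # e.g. "00z"
        for collection in contents.get("collections", []):
            if keyword.lower() in collection.get("long_name", "").lower():
                urls.append(
                    f"{BASE_URL}/{collection['collection_id']}"
                    f"/instance/{instance}"
                )
    return urls
```

The case-insensitive substring match is what produces the behavior shown above: one keyword fans out to every collection, across models and instances, whose long_name contains it.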
We now have the search engine (by GRIB long name) implemented on our instance: https://data-api.mdl.nws.noaa.gov/EDR-API You can search a keyword, and the collections whose parameters have a long name containing that keyword will be shown. Clicking one of the links takes you to the point where you can make a query. The links are returned in an unordered dictionary, so my next move is to order that dictionary. Additional work would be to match keywords against other metadata attributes, such as the dimension names (e.g. ISBL for isobaric).
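Ordering the returned dictionary is a small change; one sketch, assuming the results are held in a plain dict keyed by collection ID (the helper name is mine):

```python
def order_results(results):
    """Return a new dict with keys in sorted order.

    Plain dicts preserve insertion order in Python 3.7+, so sorting the
    items once at construction time is enough to keep the rendered list
    of links stable and alphabetical.
    """
    return dict(sorted(results.items()))
```

For example, `order_results({"gfs_100_b": "url2", "gfs_100_a": "url1"})` yields the keys in the order `gfs_100_a`, `gfs_100_b`.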
Building on the last comment, I added the ability to search by the GRIB parameter long name as well as the dimension long name, so a user can search for isobaric,temperature to further narrow the results.

In a previous meeting, we discussed utilizing the OpenSearch geo and time extensions. I did some research and found the pycsw module, which offers a Python implementation of OGC CSW as well as the OpenSearch geo and time extensions. Looking at the documentation, I was able to incorporate pycsw into our EDR-API implementation following this approach: Then, following the steps provided at the link below, I was able to create a compliant blank SQLite database to start from: Finally, you can see the beginning of our implementation at the following endpoint: My next steps will be to connect the dots between how I create metadata in the aggregation-of-collections software and how that metadata gets incorporated into these services.
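The comma-separated, multi-field search described above (e.g. isobaric,temperature) can be sketched as an AND match across all of a collection's long names. This is an illustration only; the function name and the idea of flattening parameter and dimension long names into one haystack are my assumptions, not the actual implementation.

```python
def multi_keyword_match(keywords, long_names):
    """True if every comma-separated keyword appears in at least one of
    the collection's long names (parameter or dimension).

    keywords   -- user input such as "isobaric,temperature"
    long_names -- all long_name strings attached to one collection
    """
    haystack = " ".join(long_names).lower()
    return all(
        kw.strip().lower() in haystack
        for kw in keywords.split(",")
        if kw.strip()
    )
```

Requiring every keyword to match (all, not any) is what narrows the results as more terms are added.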
After discussion with Mark, Chris, and Steve, I will continue to work with the pycsw module and will work on populating the SQLite database with metadata that comes from the aggregation software. The OpenSearch geo and time extensions provide simple queries and responses for the search engine, so the focus will be on this aspect for now. The pycsw module is very extensible, so we can pick which other services we want to expose; for now, though, the focus will be on OpenSearch.
@ShaneMill1 let's continue the discussion on what/how the formal metadata would look to serve as part of a CSW instance (Dublin Core, ISO, etc.). This will also give us an opportunity to investigate the OGC API - Records work (disclosure: I'm part of this SWG and working on an implementation). |
@tomkralidis Absolutely. I am going to create an issue in the EDR SWG GitHub to continue this discussion; all of that sounds good to me! It is located here: opengeospatial/ogcapi-environmental-data-retrieval#40
As a result of Issue #14, the aggregation-of-collections software provides a JSON output that contains the metadata for each dataset.
For example, 00z_gfs_100_collections.json (1.00 Degree GFS) contains:
We have this information for all of the datasets, so we want to create a catalog of this metadata.
We want to start simple, so as a prototype, I want to allow the user to search by the GRIB long_name (e.g. 'Temperature') and have every collection ID that contains 'Temperature', and therefore a link to each such collection, returned.
Then, the collection ID should be informative enough to tell the user what type of Temperature the collection contains:
So for Temperature, possible returns would be the links for:
Therefore, from the collection IDs, a user would see that gfs_100_forecast_time0_lat_0_lon_0_lv_ISBL0_Isobaric_surface_Pa contains the isobaric temperature, could go to that collection, and could select the 500 mb temperature (50000 Pa).
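The collection ID itself carries the model, resolution, and level information a user would read off. A small parse of that naming pattern (taken from the example IDs above; the helper and the returned field names are my own, hypothetical) could look like:

```python
def describe_collection_id(collection_id):
    """Split a collection ID such as
    gfs_100_forecast_time0_lat_0_lon_0_lv_ISBL0_Isobaric_surface_Pa
    into its informative pieces, using the "lv" token as the marker
    that level information follows.
    """
    parts = collection_id.split("_")
    info = {"model": parts[0], "resolution": parts[1]}
    if "lv" in parts:
        i = parts.index("lv")
        info["level_type"] = parts[i + 1]             # e.g. "ISBL0"
        info["level_name"] = " ".join(parts[i + 2:])  # e.g. "Isobaric surface Pa"
    return info
```

With something like this, the search results page could display "GFS 1.00 degree, Isobaric surface (Pa)" instead of the raw ID.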