Source of truth for band (dimension) metadata #86

Closed
soxofaan opened this issue Oct 24, 2019 · 5 comments

@soxofaan
Member

While working on band (metadata) related issues in the Python client and driver, I'm a bit confused about the "best" source for band metadata.

From the collection metadata example in the openEO spec:

                "properties": {
                  "cube:dimensions": {
                    ...
                    "spectral": {
                      "type": "bands",
                      "values": [
                        "B1",
                        "B2",
                  ...
                  "eo:bands": [
                    {
                      "name": "B1",
                      "common_name": "coastal",
                      "center_wavelength": 0.4439,
                    },
                    {
                      "name": "B2",
                      "common_name": "blue",
                      "center_wavelength": 0.4966,
                    },

There is:

  • properties > cube:dimensions which can have a dimension of type "bands", called "spectral" in the example, listing the names of the bands under values
  • properties > eo:bands which is a list of objects with band name, common name and even wavelength
  • note the duplication of band names
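To make the two locations concrete, here is a minimal Python sketch that reads the band names from each of them. It assumes the collection metadata has already been parsed into a dict shaped like the excerpt above; the helper names `band_names_from_dimensions` and `band_names_from_eo_bands` are hypothetical and not part of any openEO client.

```python
def band_names_from_dimensions(collection: dict) -> list:
    """Return the values of the first dimension of type "bands" (empty list if none)."""
    dimensions = collection.get("properties", {}).get("cube:dimensions", {})
    for dimension in dimensions.values():
        if dimension.get("type") == "bands":
            return list(dimension.get("values", []))
    return []


def band_names_from_eo_bands(collection: dict) -> list:
    """Return the band names by walking the whole eo:bands array."""
    eo_bands = collection.get("properties", {}).get("eo:bands", [])
    return [band["name"] for band in eo_bands]


# Reduced version of the excerpt above (elided parts left out).
collection = {
    "properties": {
        "cube:dimensions": {
            "spectral": {"type": "bands", "values": ["B1", "B2"]},
        },
        "eo:bands": [
            {"name": "B1", "common_name": "coastal", "center_wavelength": 0.4439},
            {"name": "B2", "common_name": "blue", "center_wavelength": 0.4966},
        ],
    }
}

print(band_names_from_dimensions(collection))
print(band_names_from_eo_bands(collection))
```

Both calls yield the same list for this excerpt, which is exactly the duplication noted in the last bullet.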

This metadata is used in processes, e.g.:

  • In the filter_bands process there is a (recommended) argument common_names, for which the common names from properties > eo:bands must be used (I assume, I'm not sure if this is specified somewhere).
  • In the reduce process there is a dimension argument, for which a key from the properties > cube:dimensions object should be used (as specified in spec)

Is there a way to better centralize this band/dimension metadata and have a single source of truth without duplication/redundancy?

related: Open-EO/openeo-api#208, Open-EO/openeo-python-client#76, Open-EO/openeo-python-client#77, Open-EO/openeo-python-client#93, Open-EO/openeo-python-driver#25

@m-mohr
Member

m-mohr commented Oct 25, 2019

To clarify that directly: It's the way it is because STAC is designed this way. Some things overlap, because the data cube extension can't define all details about some specific fields. cube:dimensions should NEVER diverge from other metadata for the same collection. I'm currently trying to improve the data cube extension for STAC in PR radiantearth/stac-spec#607 though.
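Since the two places must not diverge, a client or back-end could verify that mechanically. The following is only a sketch, assuming the metadata is available as a parsed dict; `band_name_mismatch` is a hypothetical helper, and the sample collection is made up to show a divergence.

```python
def band_name_mismatch(collection: dict) -> dict:
    """Report band names that appear in only one of the two metadata locations."""
    properties = collection.get("properties", {})

    # Names listed under any dimension of type "bands".
    dimension_names = set()
    for dimension in properties.get("cube:dimensions", {}).values():
        if dimension.get("type") == "bands":
            dimension_names.update(dimension.get("values", []))

    # Names listed in the eo:bands array.
    eo_names = {band["name"] for band in properties.get("eo:bands", [])}

    return {
        "only_in_cube_dimensions": sorted(dimension_names - eo_names),
        "only_in_eo_bands": sorted(eo_names - dimension_names),
    }


# Made-up metadata in which "B3" is missing from eo:bands.
collection = {
    "properties": {
        "cube:dimensions": {
            "spectral": {"type": "bands", "values": ["B1", "B2", "B3"]},
        },
        "eo:bands": [
            {"name": "B1", "common_name": "coastal"},
            {"name": "B2", "common_name": "blue"},
        ],
    }
}

print(band_name_mismatch(collection))
```

Two empty lists in the result would mean the band names agree.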

While working on band (metadata) related issues in the Python client and driver, I'm a bit confused about the "best" source for band metadata.

cube:dimensions is the metadata to look at when doing data cube related operations. Look at eo:bands if you need further information to work with specific bands.

  • In the filter_bands process there is a (recommended) argument common_names, for which the common names from properties > eo:bands must be used (I assume, I'm not sure if this is specified somewhere).

Yes, you are correct. Not an optimal constellation, but hard to avoid as common names are not necessarily unique.
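Because common names are not necessarily unique, a lookup from common name to band name has to return a list rather than a single value. A minimal sketch, assuming eo:bands has been parsed into a list of dicts; `bands_by_common_name` is a hypothetical helper and the third band in the sample data is invented purely to show a duplicate common name.

```python
from collections import defaultdict


def bands_by_common_name(eo_bands: list) -> dict:
    """Map each common name to all band names that carry it."""
    index = defaultdict(list)
    for band in eo_bands:
        common_name = band.get("common_name")
        if common_name is not None:  # bands without a common name are skipped
            index[common_name].append(band["name"])
    return dict(index)


eo_bands = [
    {"name": "B1", "common_name": "coastal"},
    {"name": "B2", "common_name": "blue"},
    {"name": "B2b", "common_name": "blue"},  # invented band, for illustration only
]

index = bands_by_common_name(eo_bands)
print(index)
```

How to treat a common name that resolves to several bands (select all of them, or raise an error) is a design choice this sketch leaves open.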

  • In the reduce process there is a dimension argument, for which a key from the properties > cube:dimensions object should be used (as specified in spec)

Indeed, reduce doesn't know anything about bands, just dimensions. That's different from what you do in filter_bands.
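To illustrate the difference: the value passed as the dimension argument is a dimension name, i.e. a key of cube:dimensions, not a band name. A small sketch of looking up such names by dimension type, assuming parsed metadata; `dimension_names_of_type` is a hypothetical helper and the "t" dimension in the sample is illustrative, not taken from the excerpt in this issue.

```python
def dimension_names_of_type(collection: dict, dimension_type: str) -> list:
    """Return the cube:dimensions keys whose "type" equals dimension_type."""
    dimensions = collection.get("properties", {}).get("cube:dimensions", {})
    return [
        name
        for name, dimension in dimensions.items()
        if dimension.get("type") == dimension_type
    ]


collection = {
    "properties": {
        "cube:dimensions": {
            "t": {"type": "temporal"},  # illustrative extra dimension
            "spectral": {"type": "bands", "values": ["B1", "B2"]},
        }
    }
}

band_dimensions = dimension_names_of_type(collection, "bands")
print(band_dimensions)
```

With this metadata the name to hand to reduce for a reduction over bands would be "spectral", whereas filter_bands works with the entries inside that dimension.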

Is there a way to better centralize this band/dimension metadata and have a single source of truth without duplication/redundancy?

I don't think so. The only redundancy (for bands) is the band names. But cube:dimensions is basically only an index of the names available in the bands, which you can't get easily without walking through the whole bands array.

So except for #77, I don't know how I could really improve the situation. On the other hand, I don't really see a big problem here. Maybe it needs better documentation in some places. In that case, please tell me where you'd like to see a clarification.

@soxofaan
Member Author

... It's the way it is because STAC is designed ...
... I don't think so.

I was afraid that would be the answer :)

I don't really see a big problem here. Maybe it needs better documentation in some places.

Indeed, it can probably be alleviated with documentation, e.g.:

  • The reduce process already refers to where to find the dimension name, but maybe it could be specified explicitly that one should use one of the keys of that cube:dimensions object

  • For the "common names" of filter_bands I don't think there is already an explicit reference to the "eo:bands" metadata

@m-mohr
Member

m-mohr commented Oct 25, 2019

  • The reduce process already refers to where to find the dimension name, but maybe it could be specified explicitly that one should use one of the keys of that cube:dimensions object

I just recently changed the API description so that the keys in objects are named. For example, the cube:dimensions keys are now named "Dimension Name" in the API spec. I hope that gets through to the clients and they name it accordingly. Then we don't need to use the more technical term (object) "key", which we can't be sure users understand. If a user reads "key" while the Web Editor or R client, for example, shows a nice table with the header "Dimension Name", then the user will be confused about the "key". And I think we should usually just use "Dimension Name" everywhere instead of relying on something that depends on the data structure used.

  • For the "common names" of filter_bands I don't think there is already an explicit reference to the "eo:bands" metadata

Probably not in the process description. It seems I need to find a place where I can document the connection between processes and the discovery metadata for implementers. I feel that making this connection for the user is more up to the client.

m-mohr transferred this issue from Open-EO/openeo-api on Oct 29, 2019
@m-mohr
Member

m-mohr commented Oct 29, 2019

Transferred as I see more room for improvement for the processes, not the API itself.

m-mohr added this to the v1.0 milestone on Nov 18, 2019
@m-mohr
Member

m-mohr commented Nov 18, 2019

Made minor clarifications.
