Handling STAC extension version upgrades #448
Comments
Some questions:
Speaking from a stactools perspective, the best case scenario would be a PySTAC that came with "batteries included"; it would work with all core extensions without needing to depend on additional repositories. The version thing is a little trickier -- I think it's non-ideal if downstream code has to add logic to deal with different extension versions, that feels like a PySTAC thing to me. I think Option 3 (Only support latest version) is my initial instinct, with maybe some additional super-powers to "bring forward" older versions to the latest version when it makes sense. |
Answers:
PySTAC already migrates things to the latest supported version, including extensions. That's part of the hooks for an extension - it provides the extension the ability to upgrade an object to the latest version. I agree Option 3 is the way to go; the additional maintenance burden and code spread of Option 1 I think would outweigh the burden to upgrade to new versions of extensions as they are released. PySTAC has gone with an approach of "use the latest version to read any version, but write the latest version. If you need to write older versions, use older versions of PySTAC". I think that also applies here. Option 2 might be a better solve, but would be a lot more complex and potentially an even greater maintenance burden, and would confuse the "always upgrade to latest" approach PySTAC takes now - if we do this for extensions, why not for core objects? Maybe Option 2 is a direction we should explore if there's enough use cases where upgrading to the latest (or needing to upgrade PySTAC to a newly released version of an extension) is a big issue for users. |
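For illustration, the "read any version, write the latest version" approach could look roughly like the sketch below. It uses plain dicts rather than PySTAC's actual extension hooks, and the schema URIs, the `example:` field names, and `migrate_to_latest` are all hypothetical:

```python
import copy

# Hypothetical schema URIs for an imaginary "example" extension.
OLD_SCHEMA = "https://example.com/extensions/example/v1.0.0/schema.json"
LATEST_SCHEMA = "https://example.com/extensions/example/v2.0.0/schema.json"


def migrate_to_latest(item: dict) -> dict:
    """Return a copy of ``item`` upgraded to the latest extension version."""
    migrated = copy.deepcopy(item)
    extensions = migrated.get("stac_extensions", [])
    if OLD_SCHEMA not in extensions:
        return migrated  # nothing to migrate

    # Assumed change between versions: a field was renamed.
    props = migrated.setdefault("properties", {})
    if "example:old_name" in props:
        props["example:new_name"] = props.pop("example:old_name")

    # Point the item at the latest schema so it is written out as v2.0.0.
    migrated["stac_extensions"] = [
        LATEST_SCHEMA if uri == OLD_SCHEMA else uri for uri in extensions
    ]
    return migrated
```

The input is left untouched, so a caller that still needs the older representation can keep the original dict around.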
That all makes sense to me. If we're comfortable with Option 3, at least for now, I won't include this in the RC1 milestone. I mostly wanted to make sure we decided on any breaking changes prior to that release. I've been a little uncomfortable with the fact that we automatically migrate objects when deserializing them (although I also see the benefit of it). I think that deserves a discussion of its own, however, so I'll open a separate issue for that. |
@duckontheweb to answer your question in the previous discussion: in DotNetStac, I implemented Option 3 with a versioning mechanism that reads previous versions (if implemented) and uses upgrade functions to bring them to the latest version. |
FYI: I basically do the same in STAC Browser, all data is automatically converted to the latest version (although STAC Browser only does read and no write, of course). This is done through stac-migrate, but I have not implemented, for example, updating file extension 1.0 to 2.0 (+ raster 1.0). |
👋 That's an interesting discussion. From a user POV, right now I'm a bit worried. I wanted to use pystac+extensions but I'm afraid of having to keep an eye on both the pystac version and the version of the extension within each extension submodule. I was a bit surprised to find all extensions from stac-extensions available in pystac, and while this is nice it might get messy in the future. (IMO, I would have deferred the management of each extension to each repo but some maintainers might not use python at all). I'm 👍 on Option 3, because it seems the simplest (😬 I'm not a contributor so you can ignore me). |
One possible hiccup to option 3 is the case where we cannot fully migrate fields from a previous extension version into the most recent version. The File Info Extension may be an example of this (see discussion in #472). In that case, some of the v1.0.0 fields were dropped in v2.0.0 and moved into the Raster Extension. However, there was no guarantee that those fields were being applied to a raster file originally so it may not be appropriate to create a Raster Band for them. This does not necessarily break anything because they can still access the fields directly via |
@duckontheweb resurfacing this issue as something that should probably be addressed as part of a v2.0 release. As the STAC ecosystem matures the domain of extension versions will only grow, and if we want people to use PySTAC's extension API (instead of going directly to |
I think that exact version match approach will have significant negative impact. Effectively this makes it really hard to evolve extensions. One would need to upgrade an entire collection every time a new version of an extension your collection uses changes. Even if that change is a pure addition of new optional fields. Even if I could downgrade/upgrade installed extension versions independently per extension, it would not solve the problem of using multiple data sources within the same environment, not all collections will be in sync and sometimes you want data from many data sources. If my currently installed software supports extension Newer code/older data case is certainly possible, even if you have no constraints at all on format evolution, in the worst case you have older implementations included as renamed copies in the newer code that newer code can dispatch to, and possibly include on-the-fly translations (extra code to support migration will be needed). Older code/newer data case is harder/more dangerous. For that to work one would need to agree up front on what changes are allowed as you go from At the very least it should be possible to force extension code to interpret whatever data is present even if it's not quite the right version without having to patch |
It sounds like there is pretty broad interest in supporting multiple extension versions within a single version of PySTAC, and I tend to agree that we should not force an extension upgrade behind the scenes. Adding support for new extensions and new versions of extensions is often mixed in with other bug fixes and feature enhancements as part of a single release. This means that if someone upgrades their PySTAC version to take advantage of some new feature or fix they may also be railroaded into using a newer version of one of the extensions. Users may have a good reason for sticking with an older version of an extension (e.g. maintaining consistency across an existing API/catalog) and shouldn't have to go to great lengths to do so just because they updated their PySTAC version. All of that being said, I'm still struggling with how to implement this in a maintainable way within the library. It seems like we would need separate |
I agree that explicitly enumerating supported extension versions would be unwieldy and wouldn't really solve the problem of unsupported extension versions in the wild unexpectedly blowing up in a user's face. One idea could be to leverage extension prefixes to collect any fields not explicitly supported by PySTAC's supported extension version; this could be coupled with
foo = FooExtension.ext(item)  # warning: unsupported extension field(s): `foo:bar, foo:cheers`
assert foo.additional_attributes == {"foo:bar": "baz", "foo:cheers": True}
...or something like that? |
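A rough sketch of how collecting fields by prefix might work, assuming plain property dicts. `collect_additional_attributes` and `KNOWN_FOO_FIELDS` are made-up names for illustration, not part of PySTAC:

```python
import warnings

# Fields this hypothetical FooExtension implementation knows about.
KNOWN_FOO_FIELDS = {"foo:name"}


def collect_additional_attributes(properties: dict, prefix: str, known: set) -> dict:
    """Collect fields that use ``prefix`` but are not explicitly supported."""
    extra = {
        key: value
        for key, value in properties.items()
        if key.startswith(prefix + ":") and key not in known
    }
    if extra:
        names = ", ".join(sorted(extra))
        warnings.warn(f"unsupported extension field(s): {names}")
    return extra
```

Fields belonging to other prefixes are ignored, so each extension would only report on its own namespace.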
I'm not familiar with PySTAC codebase so can't comment on the implementation options, but from the user perspective I would like to see the following, note that this is a "wish list" and "an ideal scenario for evolution". The ideal ideal scenario is that everything is defined so well at the start that no change is needed at all, but that's not a thing that anyone can deliver. At the very least these are the things to keep in mind when suggesting a format change or designing an implementation. On the software side of things
On the format side
|
I've also recently run into this again when writing stuff for a stactools package where I had to use older (file) and newer (raster) extension versions. I'm also having this issue in the JS ecosystem (stac-migrate). A solution should be discussed in the stac-spec community in general, I think. What should not happen is that reading an older version throws an error. In that case PySTAC should at least handle it as an unknown extension (via stac_extensions and extra_fields directly) instead of using the (incompatible) extension interface. But ideally it would try to migrate somehow, I think. Isn't that already done with core STAC? |
newer version of raster extension breaks tests. We'll need to work around overly strict version checking in pystac library or just not use extension classes and look for data in dicts directly, it's not realistic to expect all data and all software to always use compatible versions in any given installation. stac-utils/pystac#448
I've been messing around with the interface from objects to extensions (#1051) and I think that if we change the invocation, option 2 becomes a little more attractive. In that world the user really doesn't care what the Extension class is called at least. With the idea of registering extensions we could theoretically open the door for custom extensions to write their own classes and register them with the pystac objects. I think there isn't a big risk of namespace collision because the prefix probably has to be pretty unique already and that would be the namespace. |
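To make the registration idea a bit more concrete, here is a minimal sketch of a registry that dispatches on the schema URI to a version-specific class, in the spirit of Option 2. It works on plain dicts, assumes schema URIs shaped like `.../<name>/v<version>/schema.json`, and `register`, `ext`, and `EOExtension_1_1_0` are hypothetical names rather than existing PySTAC API:

```python
import re
from typing import Callable, Dict, Tuple

# Registry keyed by (extension name, version); filled by the decorator below.
_REGISTRY: Dict[Tuple[str, str], type] = {}

# Assumes schema URIs shaped like ".../<name>/v<version>/schema.json".
_SCHEMA_URI = re.compile(r"/(?P<name>[^/]+)/v(?P<version>\d+\.\d+\.\d+)/schema\.json$")


def register(name: str, version: str) -> Callable[[type], type]:
    """Class decorator that registers a version-specific extension class."""
    def decorator(cls: type) -> type:
        _REGISTRY[(name, version)] = cls
        return cls
    return decorator


@register("eo", "1.0.0")
class EOExtension_1_0_0:
    def __init__(self, properties: dict) -> None:
        self.properties = properties


@register("eo", "1.1.0")
class EOExtension_1_1_0(EOExtension_1_0_0):
    pass


def ext(item: dict, name: str):
    """Return an instance of the class registered for the item's schema version."""
    for uri in item.get("stac_extensions", []):
        match = _SCHEMA_URI.search(uri)
        if match and match["name"] == name:
            key = (name, match["version"])
            if key not in _REGISTRY:
                raise KeyError(f"no class registered for {name} v{match['version']}")
            return _REGISTRY[key](item.get("properties", {}))
    raise KeyError(f"item does not declare the {name} extension")
```

With something like this the caller only asks for `ext(item, "eo")` and never names the versioned class, and a custom extension could register its own classes under its own name.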
If we needed more motivation to fix the extension situation: opendatacube/odc-stac#105. |
Some thoughts from today's stac-utils working group meeting:
|
yeah I think that's right. We can go ahead and close this. |
Moving here from stac-spec Discussions thread.
Now that STAC extension versions are no longer tied to the core STAC version or to each other, we may run into situations where a user is working with catalogs that implement some combination of a core spec version and extension versions that are not all supported by the same version of PySTAC.
Here are a few ideas on how to handle this, but other suggestions are welcome. It would also be good to get feedback from maintainers of downstream libraries on what would work best.
1. Separate Extension Repos
STAC extension implementations would each have their own repo that would follow its own versioning (e.g. pystac-extension-label, pystac-extension-file, etc.). This would probably also require creating a pystac-extensions-base package that would be a dependency of each extension implementation.
Pros
Cons
stac-utils org
pystac-extensions-base
all-extensions extra in the pystac install that grabs all of them...)
2. Version-specific Extension Classes
Instead of a single set of extension classes (e.g. EOExtension, ItemEOExtension, AssetEOExtension), we create a set of classes for each extension version (e.g. EOExtension_1_0_0, etc.). We still have a top-level extension class (e.g. EOExtension) with an ext method that would handle parsing the extension schema URI and delegating to the right version-specific class.
Pros
ext stays mostly the same as it is now
Cons
EOExtension.ext and getting an instance of EOExtension_1_0_0 back might be non-intuitive).
3. Only Support Latest Version (Status Quo)
We continue to update the existing extension code to support each new extension version as it is released and do not provide support for multiple extension versions within the same version of PySTAC.
Pros
Cons
cc @lossyrob @m-mohr @gadomski @jbants @matthewhanson