Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors architecture to better enable STAC 1.0 extension changes #309

Merged
merged 54 commits into from
May 6, 2021

Conversation

lossyrob
Copy link
Member

@lossyrob lossyrob commented Apr 22, 2021

Related Issue(s): #306

Description:

This PR will contain work to refactor some core architectural pieces in preparation for the STAC 1.0 release. One of the most disruptive changes is that Extensions were moved off of a distinction between "core" and custom extensions, and the usage of a short ID in the stac_extensions field has been removed in favor of identifying extensions only by their JSON schema URL. PySTAC use these IDs as a core component of how it handled extensions, and so this will need to change significantly moving forward. After discussions with some other PySTAC developers and users, it also became clear that the current extension implementation is complex and clunky. I believe now's a good time to rethink how we're doing extensions, and performing a few core refactors of PySTAC prior to the 1.0 release that should make things more simple and better suited to the most recent version of the spec.

I'm putting up a WIP PR early to let folks know what's happening and elicit any feedback as this work occurs. My plan is to approach the work like this:

  • Add type annotations to everything. This will cause breakages due to circular imports, but will make refactoring easier and will likely expose some type organization issues that will be good to clean up (and will solve Add type annotations to PySTAC #260)
  • Attempt to resolve circular imports. This may take a bit of refactoring, and it may be useful to do the extensions rework while this happens, but ideally I can get to a point where tests pass again with full annotations.
  • Rework the extensions
  • Fix tests
  • Implement IO changes
  • Reformat codebase using black (Move to using black as the formatter #316)

Left TODO -

  • Update CHANGELOG

Docstring/Documentation updates, as well as bringing test coverage back up to 93% and some additional implementations/fixes for Collection/Catalog summaries and extensions are encapsulated in issues and left to follow up work.

Extensions have a new API, e.g.

item: pystac.Item = ...
item.ext.enable("eo")
item.ext.eo.cloud_cover = 0.7
print(item.ext.projection.bbox)

is now

from pystac.extensions.eo import EOExtension

item = ...

EOExtension.add_to(item)
eo_ext = EOExtension.ext(item)
eo_ext.cloud_cover = 0.7
print(eo_ext.cloud)

PR Checklist:

  • Code is formatted (run scripts/format)
  • Tests pass (run scripts/test)
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

Fixes #293
Fixes #308
Fixes #311
Fixes #313
Fixes #316
Fixes #266
Fixes #260
Fixes #242
Fixes #205
Fixes #132
Fixes #30

@lossyrob lossyrob force-pushed the feature/rde/1.0-refactors branch from cde0c68 to d531cc2 Compare April 22, 2021 15:27
@codecov-commenter
Copy link

codecov-commenter commented Apr 25, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@b005962). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #309   +/-   ##
=======================================
  Coverage        ?   88.60%           
=======================================
  Files           ?       36           
  Lines           ?     4625           
  Branches        ?        0           
=======================================
  Hits            ?     4098           
  Misses          ?      527           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b005962...63f83f4. Read the comment docs.

@lossyrob lossyrob force-pushed the feature/rde/1.0-refactors branch from 75a8c21 to f2f7242 Compare April 25, 2021 05:54
@lossyrob lossyrob force-pushed the feature/rde/1.0-refactors branch from f2f7242 to c78f832 Compare April 25, 2021 05:58
Introdues "ExtensionHooks" which are not necessary for an extension
but are registered if the extension needs to modify the behavor of
PySTAC in it's processes. The two places extensions can currently hook
into are during object resolution, where they can add link reltypes
that indicate STACObject links based on the extension; also migration
- the logic for migrating extensions will be moved from the migrate
subpackage to each extension itself.
@lossyrob lossyrob mentioned this pull request Apr 26, 2021
This extension is in transition - there's upcoming work to either
remove this extension or transfer it to be based on
ItemCollections. Removing it for now to remove maintenance burden.
Copy link
Member

@gadomski gadomski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new stac_io stuff makes sense and looks good to me. I did my best to go through each file's functional changes but it was hard to differentiate formatting/typing changes from the real deal; no obvious WTFs so 👍🏽 .

I will make one comment that's more of a STAC issue than a PySTAC issue, but I still don't love when I see, e.g., get_children returning Catalogs. I understand (I think) why Collection does not inherit from Catalog in the spec but it does here in pystac, and I don't think it matters for the vast majority of users, but it still scans funny during a review. Don't know what the answer is though (without adding like a Container supertype that encompasses both Catalogs and Collectionss and then we're just spamming types).

pystac/asset.py Outdated Show resolved Hide resolved
@lossyrob
Copy link
Member Author

lossyrob commented May 4, 2021

@gadomski I believe we can solve that issue via types by using

def get_children(self): Iterable[Union[Catalog, Collection]]:
   ...

which doesn't mean a lot for the type checker since Collection inherits from Catalog, but would be more clear to a user. Would that resolve the issue you brought up around the get_children confusion? If so I'll make that change to all of the child methods on Catalog.

@gadomski
Copy link
Member

gadomski commented May 4, 2021

Would that resolve the issue you brought up around the get_children confusion?

In that case, yes -- I think there were some other cases where you did a cast(Catalog) in the body of a function, and my static-typed brain thought we were losing information when in reality you're just making the type checker happy. It's not a huge deal, but yes I do think at least in function signatures being explicit about returning a Union[Catalog, Collection] aligns better with the spec (IMO).

lossyrob added 3 commits May 4, 2021 10:04
This makes it more explicit that children are either a Catalog or a
Collection, even though in PySTAC a Collection inherits from a Catalog.
@lossyrob lossyrob dismissed schwehr’s stale review May 4, 2021 16:36

Comments were resolved

lossyrob added 2 commits May 4, 2021 18:06
…tors

While resolving merge conflicts, reorder imports and make slight edits
to docstring text for consistency
@matthewhanson
Copy link
Member

This all looks good to me, although I primarily looked through the STAC_IO changes.

I'm not sure I understand the value in deprecating the STAC_IO class though. None of those functions will ever be called by the new code, and if someone overrides a function like before they won't get an deprecation warning or anything.

Is there value in keeping the code around? Also, I'm not found of the file being stac_io.py while the new class is StacIO, which breaks the naming style. I think it would be fine to completely replace the old STAC_IO class with the new one.

I might have some more feedback when updating pystac-client, but see no reason to not merge this, and I'll get pystac-client working with the main branch in the next few days.

@lossyrob
Copy link
Member Author

lossyrob commented May 6, 2021

I'm not sure I understand the value in deprecating the STAC_IO class though. None of those functions will ever be called by the new code, and if someone overrides a function like before they won't get an deprecation warning or anything.

I've definitely used STAC_IO in various other projects as a convenient way to piggyback on PySTAC functionality; I think there is usage of it directly in stactools. The deprecation allows anyone who was doing that to avoid breaking their codebases while upgrading.

Is there value in keeping the code around? Also, I'm not found of the file being stac_io.py while the new class is StacIO, which breaks the naming style. I think it would be fine to completely replace the old STAC_IO class with the new one.

I think this does follow the convention? e.g. MediaType and media_type.py. I'd consider the "io" as a separate word so the underscore would separate them in the file name.

Let me know what you think on the above two points; if it makes sense to merge as-is I'll do that once you give a final +1

@matthewhanson
Copy link
Member

@lossyrob
Ah-ha, I see it may be used downstream. Ok, sounds good then.

Also, you are right on the convention, in fact it didn't follow the standard Python convention for Class naming using CapWords before, so carry on. +1!

@lossyrob
Copy link
Member Author

lossyrob commented May 6, 2021

SoundsGoodToMe :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment