Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection link points to local path #838

Open
m-mohr opened this issue Jul 6, 2022 · 11 comments
Open

Collection link points to local path #838

m-mohr opened this issue Jul 6, 2022 · 11 comments
Milestone

Comments

@m-mohr
Copy link
Contributor

m-mohr commented Jul 6, 2022

While I see a lot of code that normalizes asset hrefs and make them relative whenever required, it seems I always end up with a local file path for the collection link. I think the collection link should be normalized as well and ideally be relative to the self link.

@m-mohr m-mohr added the bug Things which are broken label Jul 6, 2022
@b4l
Copy link

b4l commented Jul 12, 2022

I experienced the same issue playing around with the qgis stac browser plugin. It only happens when the root catalog / landing page is served at the root, for example http://localhost/ but not if served at a path like http://localhost/somepath/.

@gadomski
Copy link
Member

Isn't this solved by saving Collections with CatalogType.SELF_CONTAINED? I don't know if this is a bug per se -- maybe more of a documentation/best practices issue?

E.g. we generate a lot of self contained Collections for stactools-packages examples:

@m-mohr
Copy link
Contributor Author

m-mohr commented Jan 31, 2023

That is likely the case, but if I load a Collection from disk and create an Item for it, it's a bit weird that I have to set something in the Collection to get a relative link in the item, right? Can I specify the catalog type when loading from disk? Otherwise, I assume that if I save the Collection after the Item, the Item is not updated accordingly, right?

I still have to verify, but I think that's what I did back then (i.e. load collection from disk -> create item -> save item -> not sure whether I saved the collection again, likely not). So maybe what's missing is a way to specify the catalog type when calling Collection.from_file()? @gadomski

@gadomski gadomski added this to the 1.7 milestone Jan 31, 2023
@gadomski gadomski self-assigned this Jan 31, 2023
@gadomski
Copy link
Member

Yup, you're right -- reading a catalog with relative hrefs products pystac objects with absolute self hrefs: #972. I've opened that draft PR to demonstrate the problem and will use it to make a fix.

@m-mohr
Copy link
Contributor Author

m-mohr commented Jan 31, 2023

Thanks. Just for reference, this is where it occurred to me: https://github.com/stactools-packages/noaa-mrms-qpe/blob/main/src/stactools/noaa_mrms_qpe/commands.py#L135

@gadomski
Copy link
Member

@m-mohr circling back to this one, I'm now unsure of what the issue is. I implemented your workflow, as described in #838 (comment), here: 247d27c. That test doesn't fail -- the item is saved with relative hrefs. I must be misunderstanding your scenario -- can you help clarify? Thanks.

@m-mohr
Copy link
Contributor Author

m-mohr commented Feb 22, 2023

@gadomski I'm not 100% sure about the test (not sure what prebuilt_catalog_1.get_all_collections() does), but it was essential for this to happen that the Collection is loaded from disk, not created on the fly. Is that the case in the tests? Also, I did not use add_item as I did not had to add the item to the collection (stactools package where you create an item for a collection, but don't add it to it directly).

@gadomski
Copy link
Member

it was essential for this to happen that the Collection is loaded from disk, not created on the fly. Is that the case in the tests?

Yes, the prebuilt catalog is created by reading https://github.com/stac-utils/pystac/blob/ba073056e420a99bab0e5287687246fa4ab8f320/tests/data-files/catalogs/test-case-1/catalog.json.

I did not use add_item as I did not had to add the item to the collection

Hm, then how did you add the item link to the collection?

@m-mohr
Copy link
Contributor Author

m-mohr commented Feb 22, 2023

I haven't added the item link to the collection to keep the collection clean as I assumed it would not be relevant if you ingest into MS PC and generate links dynamically anyway. See https://github.com/stactools-packages/noaa-mrms-qpe/blob/main/src/stactools/noaa_mrms_qpe/commands.py#L135 for details.

@gadomski
Copy link
Member

I think I've figured out the issue, thanks for your patience and your iterations. You're bumping into some complexities around how pystac resolves link hrefs. Specifically, link resolution is absolute by default, and only uses relative hrefs if the link's owner has a root and the root is a "relative" Catalog type:

pystac/pystac/link.py

Lines 169 to 185 in 8db9afb

if (
transform_href
and href
and is_absolute_href(href)
and self.owner
and self.owner.get_root()
):
root = self.owner.get_root()
rel_links = [
*HIERARCHICAL_LINKS,
*pystac.EXTENSION_HOOKS.get_extended_object_links(self.owner),
]
# if a hierarchical link with an owner and root, and relative catalog
if root and root.is_relative() and self.rel in rel_links:
owner_href = self.owner.get_self_href()
if owner_href is not None:
href = make_relative_href(href, owner_href)

I'm not sure if this a bug per se; it might just a complex and relatively undocumented system. @m-mohr what would be your preferred resolution?

Examples

This test fails:

def test_collection_link_is_relative(item: Item, collection: Collection) -> None:
    collection.set_self_href("http://pystac.test/collection.json")
    item.set_self_href("http://pystac.test/item.json")
    item.set_collection(collection)
    link = item.get_single_link("collection")
    assert link
    assert link.href == "./collection.json"  # link.href is "http://pystac.test/collection.json"

In order to get your desired behavior (relative collection href), you need to make a couple changes:

def test_collection_link_is_relative(item: Item, collection: Collection) -> None:
    collection.set_self_href("http://pystac.test/collection.json")
    collection.catalog_type = CatalogType.SELF_CONTAINED  # ensure the collection is a "relative" root (self contained or relative published)
    item.set_self_href("http://pystac.test/item.json")
    item.set_collection(collection)
    item.set_root(collection)  # the item needs a root to enable relative hrefs
    link = item.get_single_link("collection")
    assert link
    assert link.href == "./collection.json"

@gadomski
Copy link
Member

gadomski commented Mar 1, 2023

Moving this to the 1.8 milestone so it doesn't hold up 1.7.

@gadomski gadomski modified the milestones: 1.7, 1.8 Mar 1, 2023
@gadomski gadomski removed this from the 1.8 milestone Mar 31, 2023
@gadomski gadomski removed their assignment Mar 31, 2023
@gadomski gadomski removed the bug Things which are broken label Mar 31, 2023
@gadomski gadomski added this to the v1.12 milestone Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants