New pickling methods loads all absolute links #1336

Closed
emmanuelmathot opened this issue May 2, 2024 · 6 comments · Fixed by #1337

Comments

@emmanuelmathot
Contributor

Since PR #1285, and more specifically this change, it seems that a deepcopy of an item tries to load every link that has an absolute href.
Is this intended, or am I missing something?
This causes a problem when loading assets with the get_assets method, which first makes a deep copy of the STAC object, and that deep copy goes through the new pickling methods.
When I load an item with unreachable links (e.g. an s3 URL with no custom IO reader set) and try to list the assets, it raises an error:

    self.assets = list(
rio_tiler/io/stac.py:149: in _get_assets
    for asset, asset_info in stac_item.get_assets().items():
venv/lib/python3.11/site-packages/pystac/asset.py:300: in get_assets
    return {
venv/lib/python3.11/site-packages/pystac/asset.py:301: in <dictcomp>
    k: deepcopy(v)
/usr/lib/python3.11/copy.py:172: in deepcopy
    y = _reconstruct(x, memo, *rv)
/usr/lib/python3.11/copy.py:271: in _reconstruct
    state = deepcopy(state, memo)
/usr/lib/python3.11/copy.py:146: in deepcopy
    y = copier(x, memo)
/usr/lib/python3.11/copy.py:231: in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
/usr/lib/python3.11/copy.py:161: in deepcopy
    rv = reductor(4)
venv/lib/python3.11/site-packages/pystac/item.py:179: in __getstate__
    d["links"] = [
venv/lib/python3.11/site-packages/pystac/item.py:180: in <listcomp>
    link.to_dict() if link.get_href() else link for link in d["links"]
venv/lib/python3.11/site-packages/pystac/link.py:181: in get_href
    and self.owner.get_root()
venv/lib/python3.11/site-packages/pystac/stac_object.py:326: in get_root
    root_link.resolve_stac_object()
venv/lib/python3.11/site-packages/pystac/link.py:330: in resolve_stac_object
    obj = stac_io.read_stac_object(target_href, root=root)
venv/lib/python3.11/site-packages/pystac/stac_io.py:234: in read_stac_object
    d = self.read_json(source, *args, **kwargs)
venv/lib/python3.11/site-packages/pystac/stac_io.py:205: in read_json
    txt = self.read_text(source, *args, **kwargs)
venv/lib/python3.11/site-packages/pystac/stac_io.py:282: in read_text
    return self.read_text_from_href(href)
venv/lib/python3.11/site-packages/pystac/stac_io.py:300: in read_text_from_href
    with urlopen(req) as f:
/usr/lib/python3.11/urllib/request.py:216: in urlopen
    return opener.open(url, data, timeout)
/usr/lib/python3.11/urllib/request.py:519: in open
    response = self._open(req, data)
/usr/lib/python3.11/urllib/request.py:541: in _open
    return self._call_chain(self.handle_open, 'unknown',
/usr/lib/python3.11/urllib/request.py:496: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <urllib.request.UnknownHandler object at 0x74a76915ba50>
req = <urllib.request.Request object at 0x74a768cd8dd0>

    def unknown_open(self, req):
        type = req.type
>       raise URLError('unknown url type: %s' % type)
E       urllib.error.URLError: <urlopen error unknown url type: s3>
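
The traceback shows `copy.deepcopy` reaching `__getstate__` through `reductor(4)`, so anything that hook does, including I/O, runs on every copy. A minimal standard-library sketch of that mechanism follows. `FakeLink` and `FakeItem` are hypothetical stand-ins, not pystac classes, and the `OSError` only simulates the unreachable s3 read.

```python
import copy


class FakeLink:
    """Hypothetical stand-in for a link whose href needs I/O to resolve."""

    def __init__(self, href: str) -> None:
        self.href = href

    def get_href(self) -> str:
        # Simulates the failing read of an unreachable s3:// target.
        if self.href.startswith("s3://"):
            raise OSError(f"unknown url type: {self.href}")
        return self.href


class FakeItem:
    """Hypothetical stand-in for an object with a custom pickling hook."""

    def __init__(self, links: list) -> None:
        self.links = links

    def __getstate__(self) -> dict:
        # deepcopy() reaches this hook through __reduce_ex__, so any
        # work done here is repeated on every copy.
        state = self.__dict__.copy()
        state["links"] = [link.get_href() for link in self.links]
        return state


item = FakeItem([FakeLink("s3://bucket/catalog.json")])
try:
    copy.deepcopy(item)
    outcome = "copied"
except OSError as err:
    outcome = f"failed: {err}"
print(outcome)
```

The copy fails before any new object is built, which matches the shape of the traceback above: the error surfaces inside the state hook, not in the caller's own code.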
@gadomski
Member

gadomski commented May 2, 2024

It looks to me like we've gotten bitten by get_href()'s default of transform_href=True again (for prior art, see #960). I'll open a PR with a fix.

I'm not sure this is true; see the follow-on comment for more info.

@gadomski
Member

gadomski commented May 2, 2024

@emmanuelmathot can you provide a minimal reproducible example so I can be sure I'm testing against the same problem? I was not able to reproduce the behavior you described with this test:

import copy
import pystac
from pystac import Item

def test_non_existent_link_during_deepcopy(item: Item) -> None:
    item.add_link(pystac.Link("non-existent-asset", "../not-a-dir/not-a-file"))
    item = copy.deepcopy(item)
    assert item.get_single_link("non-existent-asset").href == "../not-a-dir/not-a-file"

@emmanuelmathot
Contributor Author

Sure, please find the test in this branch: https://github.com/emmanuelmathot/pystac/blob/pickle/tests/test_item.py#L686

@gadomski
Member

gadomski commented May 2, 2024

@emmanuelmathot do you have an example that includes creating that test file? I'd like to be able to dig into the process that's actually doing the href modifications.

@emmanuelmathot
Contributor Author

emmanuelmathot commented May 2, 2024

No, I do not, but a very simple item with a single link that has an absolute s3 href triggers the error.

This is really similar to what you mentioned here:

It looks to me like we've gotten bitten by get_href()'s default of transform_href=True again (for prior art, see #960).

When I pass transform_href=False in the __getstate__ method:

    d["links"] = [
        link.to_dict(transform_href=False)
        if link.get_href(transform_href=False)
        else link
        for link in d["links"]
    ]

the error no longer occurs.
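
To illustrate why the flag matters, here is a small sketch that assumes, as the traceback suggests, that the transforming path is the one that consults the owner's root. `FakeOwner` and `FakeLink` are hypothetical stand-ins rather than pystac's implementation. Only the `transform_href` parameter name and the list comprehension are taken from this thread, and the `"related"` rel value is made up for the example.

```python
from typing import Optional


class FakeOwner:
    """Hypothetical owner whose root lookup would require a network read."""

    def __init__(self) -> None:
        self.root_lookups = 0

    def get_root(self) -> object:
        # Count calls instead of performing real I/O.
        self.root_lookups += 1
        return object()


class FakeLink:
    """Hypothetical link; only the transform_href flag mirrors the thread."""

    def __init__(self, href: str, owner: Optional[FakeOwner] = None) -> None:
        self.href = href
        self.owner = owner

    def get_href(self, transform_href: bool = True) -> str:
        # Only the transforming path needs to consult the owner's root.
        if transform_href and self.owner is not None:
            self.owner.get_root()
        return self.href

    def to_dict(self, transform_href: bool = True) -> dict:
        return {"rel": "related", "href": self.get_href(transform_href)}


owner = FakeOwner()
links = [FakeLink("s3://bucket/other.json", owner)]

# Same shape as the comprehension above: no root lookup is triggered.
serialized = [
    link.to_dict(transform_href=False)
    if link.get_href(transform_href=False)
    else link
    for link in links
]
print(serialized, owner.root_lookups)
```

With the flag left at its default, the same comprehension would call `get_root()` for every link, which is where the unreachable read happens in the traceback.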

@gadomski
Member

gadomski commented May 2, 2024

@emmanuelmathot got it, thanks. The fix is in #1337, which we'll release after merging.
