As a data custodian, I want the Deep Archive to work around invalid URLs in the Registry #162

Closed
nutjob4life opened this issue Mar 26, 2024 · 8 comments

Comments

@nutjob4life
Member

nutjob4life commented Mar 26, 2024

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

Users like the ones in NASA-PDS/operations#476

💪 Motivation

The Registry API seems to be loaded with some bad data, namely file paths like

'ops:Data_File_Info.ops:file_ref': ['https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023//data/collection_data.csv']

with a // between cassini_uvis_solarocc_beckerjarmak2023 and data. This causes the Deep Archive to output Submission Information Packages with double-slashes in them too, causing validation errors.

📖 Additional Details

See NASA-PDS/operations#476 for a specific example.

Acceptance Criteria

Given a document in OpenSearch containing double-slashes in the URL path
When I perform pds-deep-registry-archive on the bundle containing that document
Then I expect the file paths and URLs output to be "cleaned" up to single slashes
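
As a rough illustration of the cleanup described in the acceptance criteria, here is a minimal sketch in Python. It is not the Deep Archive's actual fix: `collapse_slashes` is a hypothetical helper name, and the approach (collapsing repeated slashes in the path component only, so the `//` after the scheme is left alone) is an assumption about what "cleaned up" should mean.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in a URL's path; scheme, host, and query are untouched.

    Hypothetical helper for illustration only, not part of pds-deep-archive.
    """
    parts = urlsplit(url)
    # Replace two or more consecutive slashes in the path with a single slash
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


bad = (
    "https://pds-rings.seti.org/pds4/bundles/"
    "cassini_uvis_solarocc_beckerjarmak2023//data/collection_data.csv"
)
print(collapse_slashes(bad))
```

Splitting the URL first matters: a plain string replace of `//` with `/` would also damage the `https://` prefix.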

⚙️ Engineering Details

No response

@jordanpadams jordanpadams changed the title As a data custodian, I want the Deep Archive to work around bad data in the Registry As a data custodian, I want the Deep Archive to work around invalid URLs in the Registry Apr 19, 2024
@github-project-automation github-project-automation bot moved this to Release Backlog in B15.0 Apr 19, 2024
@jordanpadams
Member

@nutjob4life I just realized I never triaged this and we probably need this cleaned up ASAP to unblock that operations ticket.

@nutjob4life
Member Author

@jordanpadams on it!

@jordanpadams
Member

Thanks @nutjob4life 🎉

@gxtchen

gxtchen commented Jul 14, 2024

@nutjob4life what url should I use to run the deep-registry-archive? I got "ValueError: 🤷‍♀️ The bundle urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.1 cannot be found in the registry at https://pds.nasa.gov/api/search/1.0/"

@nutjob4life
Member Author

Hi @gxtchen, I don't know the answer to this.

I believe the URL is correct but perhaps the registry is missing some data? @tloubrieu-jpl @jordanpadams could you take a peek? When I run it, I get the same thing:

mirasol 209 % .v/bin/pds-deep-registry-archive --site PDS_RNG urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.1
INFO 👟 PDS Deep Registry-based Archive, version 1.3.0
ERROR 💥 We got an unexpected error; sorry it didn't work out
Traceback (most recent call last):
  File "/Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/deep-archive/src/pds2/aipgen/registry.py", line 375, in main
    generatedeeparchive(args.url, args.bundle, args.site, not args.include_latest_collection_only)
  File "/Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/deep-archive/src/pds2/aipgen/registry.py", line 350, in generatedeeparchive
    prefixlen, bac, title = _comprehendregistry(url, bundlelidvid, allcollections)
  File "/Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/deep-archive/src/pds2/aipgen/registry.py", line 224, in _comprehendregistry
    raise ValueError(f"🤷‍♀️ The bundle {bundlelidvid} cannot be found in the registry at {url}")
ValueError: 🤷‍♀️ The bundle urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.1 cannot be found in the registry at https://pds.nasa.gov/api/search/1.0/
INFO 👋 Thanks for using this program! Bye!

@jordanpadams
Member

jordanpadams commented Jul 15, 2024

@gxtchen you cannot test this with the public registry until the multi-tenancy migration has completed: NASA-PDS/registry#185

@jordanpadams
Member

You can try downloading that data, loading it into a local registry, and testing with that.

@tloubrieu-jpl
Member

@gxtchen can wait to test that until the API is up again.

Projects
Status: 🏁 Done
Development

No branches or pull requests

4 participants