archive status does not change for all bundle members #109

Closed
plawton-umd opened this issue Nov 11, 2024 · 14 comments · Fixed by NASA-PDS/registry-common#113

@plawton-umd

plawton-umd commented Nov 11, 2024

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I set the archive status to archived, I noticed that only the bundle product's status changed. All members
of the bundle remained staged.

🕵️ Expected behavior

I expected all products of the bundle to have their archive_status updated to archived.

📜 To Reproduce

  1. Upload bundle including at least one collection and one data product
  2. check archive_status
    pds-registry-client -v -p -d '{"query":{"simple_query_string":{"query":"<your bundle LID here>*"}}}' '/<your registry>-registry/_search'
  3. confirm "staged" status
  4. update archive status
    registry-manager set-archive-status \
    -auth <your auth file here> \
    -es file:<your xml file here> \
    -lidvid <your bundle LIDVID here> \
    -status archived
  5. check archive_status
    pds-registry-client -v -p -d '{"query":{"simple_query_string":{"query":"<your bundle LID here>*"}}}' '/<your registry>-registry/_search'
  6. confirm all products now have archived status
    ...
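The checks in steps 2 and 5 come down to counting the archive status of every hit in the search response. A minimal sketch of that tally, assuming the response has the usual `hits.hits[]._source` layout and that the status lives in a field named `ops:Tracking_Meta/ops:archive_status` (the field name and the sample data are assumptions, not taken from this report):

```python
from collections import Counter

# Assumed field name for the archive status in each hit's _source.
STATUS_FIELD = "ops:Tracking_Meta/ops:archive_status"


def count_archive_statuses(response, field=STATUS_FIELD):
    """Count archive status values across the hits of a search response."""
    counts = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        value = hit.get("_source", {}).get(field, "missing")
        # Values may be stored as a one-element list or as a plain string.
        if isinstance(value, list):
            value = value[0] if value else "missing"
        counts[value] += 1
    return counts


# Hypothetical response: the bundle was updated, its members were not.
sample = {
    "hits": {
        "hits": [
            {"_id": "urn:nasa:pds:example_bundle::1.0",
             "_source": {STATUS_FIELD: ["archived"]}},
            {"_id": "urn:nasa:pds:example_bundle:data::1.0",
             "_source": {STATUS_FIELD: ["staged"]}},
            {"_id": "urn:nasa:pds:example_bundle:data:product_a::1.0",
             "_source": {STATUS_FIELD: ["staged"]}},
        ]
    }
}

print(dict(count_archive_statuses(sample)))
```

If the update had reached every member, the tally would contain a single key, `archived`.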

🖥 Environment Info

  • development registry
  • Registry Manager version: 5.0.2
    Build time: 2024-10-16T22:51:47Z
  • Operating System: MacOSX
    ...

🦄 Related requirements

🦄 #112
🦄 #113

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

@tloubrieu-jpl
Member

Thanks @plawton-umd for reporting this.

For now, a workaround is to run registry-mgr on each collection in the bundle. The archive statuses of each collection's member products should then be updated to match the archive status of the collection.

@plawton-umd
Author

@tloubrieu-jpl

Same problem with collections. Only the collection product's archive status is changed, not the members' archive status.

@tloubrieu-jpl
Member

Thanks @plawton-umd, the collection members not being updated is a regression, I think.

@al-niessner
Contributor

@plawton-umd @tloubrieu-jpl

This is by design. When given a lidvid, registry-mgr acts on that lidvid alone. It does not inspect the lidvid and operate on its references. For this we would have to add a --recursive switch to know if you wanted to act on this lidvid alone or not.

To change all items from a harvest of a bundle or collection, use -packageId. I believe this was recently introduced instead of --recursive because it solves nearly the same problem (very good proxy) or an even more relevant problem.
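To make the difference concrete, here is a toy sketch of the two selection modes on in-memory records. It is not registry-mgr code: the record layout, the `package_id` key, and both function names are hypothetical and only illustrate "exact lidvid" versus "everything from one harvest".

```python
import copy

# Toy records. "package_id" is a hypothetical key standing in for whatever
# tag the registry attaches to everything loaded by one harvest run.
RECORDS = [
    {"lidvid": "urn:nasa:pds:example_bundle::1.0",
     "package_id": "pkg-1", "status": "staged"},
    {"lidvid": "urn:nasa:pds:example_bundle:data::1.0",
     "package_id": "pkg-1", "status": "staged"},
    {"lidvid": "urn:nasa:pds:example_bundle:data:product_a::1.0",
     "package_id": "pkg-1", "status": "staged"},
    {"lidvid": "urn:nasa:pds:other_bundle::1.0",
     "package_id": "pkg-2", "status": "staged"},
]


def set_status_by_lidvid(records, lidvid, status):
    """Update only the record with this exact lidvid; return what changed."""
    changed = []
    for rec in records:
        if rec["lidvid"] == lidvid:
            rec["status"] = status
            changed.append(rec["lidvid"])
    return changed


def set_status_by_package(records, package_id, status):
    """Update every record loaded under one package id; return what changed."""
    changed = []
    for rec in records:
        if rec["package_id"] == package_id:
            rec["status"] = status
            changed.append(rec["lidvid"])
    return changed


by_lidvid = set_status_by_lidvid(
    copy.deepcopy(RECORDS), "urn:nasa:pds:example_bundle::1.0", "archived")
by_package = set_status_by_package(copy.deepcopy(RECORDS), "pkg-1", "archived")
print(len(by_lidvid), "record(s) by lidvid;", len(by_package), "by package id")
```

The lidvid mode touches the bundle record alone, which matches the behavior reported above, while the package mode reaches the collection and the data product loaded in the same harvest.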

@jordanpadams
Member

@al-niessner tracked down and added the related requirements that were closed, but we have introduced a regression at some point.

@al-niessner
Contributor

@jordanpadams

Because of how lid/lidvids are done, we can never get this right. Arguably everything is version 1.0 so the error is small. However, would it be acceptable to say --recurse looks up the lidvid, extracts its packageID and then uses it to recurse? Seems it should be given how people harvest a bundle.

Either way, we are going to make mistakes. The question is which is easier to recover from. Given a bundle that points to a collection via a lid, which do you pick when there is more than one lidvid for that lid? All of them? The latest? All but the latest? You get the idea. In most cases we do latest, but this is unique because almost all use cases are going to go to archived for all versions or all older versions. If we do crazy checks and twists and turns with the lidvid to do anything other than all, then it may not be reversible. If we do packageId and the user does not like it, then they can just do it again with the old state. That is why I am leaning that way and asking. It will produce weird results: --recurse on an end product will also update the bundle, so recurse goes both ways. Thoughts? It is really a question of use case setup (how harvest is used) and what they want to update (last harvest or thread of lidvids).
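The three choices (all, latest, all but the latest) can be sketched as a policy argument when resolving a lid to its known lidvids. This is only an illustration: the `resolve` helper, the policy names, and the sample identifiers are hypothetical, and it assumes lidvids of the form `lid::major.minor`.

```python
def parse_lidvid(lidvid):
    """Split 'lid::major.minor' into (lid, (major, minor))."""
    lid, _, vid = lidvid.partition("::")
    major, minor = vid.split(".")
    return lid, (int(major), int(minor))


def resolve(lid, known_lidvids, policy="all"):
    """Return the lidvids of `lid` selected by `policy`, oldest first."""
    versions = sorted(
        (lv for lv in known_lidvids if parse_lidvid(lv)[0] == lid),
        key=lambda lv: parse_lidvid(lv)[1],  # numeric sort, so 10.0 > 2.0
    )
    if policy == "all":
        return versions
    if policy == "latest":
        return versions[-1:]
    if policy == "all_but_latest":
        return versions[:-1]
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical registry contents.
known = [
    "urn:nasa:pds:example_bundle:data::10.0",
    "urn:nasa:pds:example_bundle:data::1.0",
    "urn:nasa:pds:example_bundle:data::2.0",
    "urn:nasa:pds:example_bundle:other::1.0",
]

for policy in ("all", "latest", "all_but_latest"):
    print(policy, resolve("urn:nasa:pds:example_bundle:data", known, policy))
```

Only the "all" policy is trivially reversible in this model, since re-running it with the old status restores every record it touched.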

@jordanpadams
Member

jordanpadams commented Nov 25, 2024

@al-niessner aren't the *-registry-refs indexes supposed to capture the membership information we need for this?

@jordanpadams
Member

@al-niessner per the requirements referenced above, this has been done before and was working as expected, so it is possible from the current registry metadata.

@al-niessner
Contributor

@jordanpadams

*-registry-refs does not contain all the information needed. A lot of the gap comes from lid/lidvid problems, and the rest comes from bundles containing the collection ref lids. The *-registry-refs index does not contain the information that says collection 2.0 contains this set of lidvids but collection 3.0 contains this other set of lidvids. I suspect that nobody really cares, but it is why lid and lidvid translations are always a problem: lid to lidvid conversions are not well defined. I also suspect that the real requirement is to change every reference, ignoring all versions for aggregate lids (bundles and collections), not to follow references of a specific version (way harder).

I think I will implement this with the default recursion level being 3 for bundles, 2 for collections, and 1 for products, ignoring versions.
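One way to read those defaults is as a depth-limited walk over lid references, where level 1 is the starting product itself. The sketch below follows that reading; the `refs` mapping, the function names, and the sample identifiers are hypothetical, and it is not the ProductService implementation.

```python
# Default number of levels to visit, counting the starting product as level 1.
DEFAULT_DEPTH = {"bundle": 3, "collection": 2, "product": 1}


def lid_of(identifier):
    """Drop the '::version' suffix, if any, so versions are ignored."""
    return identifier.split("::", 1)[0]


def collect_targets(start, kind, refs, depth=None):
    """Return the lids reachable from `start` within `depth` levels.

    `refs` maps a lid to the lids or lidvids it references.
    """
    if depth is None:
        depth = DEFAULT_DEPTH[kind]
    seen = set()
    frontier = [lid_of(start)]
    for _ in range(depth):
        next_frontier = []
        for lid in frontier:
            if lid in seen:
                continue
            seen.add(lid)
            next_frontier.extend(lid_of(m) for m in refs.get(lid, []))
        frontier = next_frontier
    return sorted(seen)


# Hypothetical reference data: one bundle, one collection, one product
# that exists in two versions.
refs = {
    "urn:nasa:pds:example_bundle": ["urn:nasa:pds:example_bundle:data"],
    "urn:nasa:pds:example_bundle:data": [
        "urn:nasa:pds:example_bundle:data:product_a::1.0",
        "urn:nasa:pds:example_bundle:data:product_a::2.0",
    ],
}

print(collect_targets("urn:nasa:pds:example_bundle::1.0", "bundle", refs))
print(collect_targets("urn:nasa:pds:example_bundle:data::1.0", "collection", refs))
```

Because versions are stripped before the walk, both versions of `product_a` collapse into one lid, so an update keyed on that lid would apply to every version.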

@al-niessner
Contributor

@jordanpadams

It seems all the original code is still in place for doing bundles and collections:
https://github.com/NASA-PDS/registry-common/blame/main/src/main/java/gov/nasa/pds/registry/common/es/service/ProductService.java

Let me debug this because it may be a super simple fix...

@jordanpadams
Member

@al-niessner copy that

@plawton-umd
Author

@al-niessner @jordanpadams
I do not understand the comment "everything is version 1.0". Everything is not version 1.0.
If you need examples, please let me know.

@tloubrieu-jpl
Member

Hi @plawton-umd, I believe @al-niessner meant that the vast majority of lids exist only as version 1.0, but we know some have additional versions, and @al-niessner's fix will cover that.

@tloubrieu-jpl
Member

tloubrieu-jpl commented Dec 3, 2024

The set-archive-status only updates the primary members of a collection, not the secondary. That is the expected behavior.
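As an illustration of that rule, the sketch below filters a collection inventory down to its primary members. It assumes each inventory entry is a (member status, lidvid) pair with `P` for primary and `S` for secondary; the function name and the sample entries are hypothetical.

```python
def primary_members(inventory):
    """Return the lidvids flagged as primary ('P') in a collection inventory."""
    return [
        lidvid
        for flag, lidvid in inventory
        if flag.strip().upper() == "P"
    ]


# Hypothetical inventory: two primary members and one secondary member
# that belongs to a different bundle.
inventory = [
    ("P", "urn:nasa:pds:example_bundle:data:product_a::1.0"),
    ("P", "urn:nasa:pds:example_bundle:data:product_b::1.0"),
    ("S", "urn:nasa:pds:other_bundle:data:product_z::1.0"),
]

to_update = primary_members(inventory)
print(to_update)
```

Under this reading, the secondary member keeps whatever status it was given through the collection where it is primary.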

The apparent inconsistency with harvest behavior is due to the fact that harvest loads all data in directories without analyzing the bundle/collection structures.

To make it less confusing, we could remove the bundle configuration option from harvest.

@github-project-automation github-project-automation bot moved this from ToDo to 🏁 Done in EN Portfolio Backlog Dec 11, 2024