
set-archive-status does not update the same number of products each time with packageId argument #115

Closed
tloubrieu-jpl opened this issue Dec 2, 2024 · 5 comments · Fixed by #116
Labels: bug (Something isn't working), s.high, sprint-backlog

Comments

tloubrieu-jpl (Member) commented Dec 2, 2024

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

I’m now testing the registry-manager tool and getting what seem to be inconsistent results. For instance, this command:

/registry-manager -es file:///home/psaops/.auth/opensearch-configuration.xml -auth /home/psaops/.auth/registry-auth.txt set-archive-status -status archived -packageId 384cfba1-2752-49d7-ad2b-5afbce35c8b0

The first time, I get 27 elements updated. The second time I get 0, which is expected. But when I change the status back to “staged”, the command may return just 15, or some other inconsistent number.

Also, this package ID is associated with just three documents: a bundle and two collections:

[SUMMARY] Summary:
[SUMMARY] Skipped files: 0
[SUMMARY] Loaded files: 3
[SUMMARY] Product_Bundle: 1
[SUMMARY] Product_Collection: 2
[SUMMARY] Failed files: 0
[SUMMARY] Package ID: 384cfba1-2752-49d7-ad2b-5afbce35c8b0

So I assume the tool is updating the status of these context products AND of the products contained in the collections, right?

🕵️ Expected behavior

I expected the command to update the same number of products each time, since there are no other changes in the registry.

📜 To Reproduce

Reported by Jose Osinde from ESA/PSA.

🖥 Environment Info

No response

📚 Version of Software Used

registry-mgr 5.0

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

Acceptance Criteria

Given
When I perform
Then I expect

🎉 Integration & Test

No response

tloubrieu-jpl added the bug (Something isn't working), sprint-backlog, and s.high labels Dec 2, 2024
tloubrieu-jpl changed the title "set-archive-status does not update the same number of products each time it runs" to "set-archive-status does not update the same number of products each time with packageId argument" Dec 2, 2024
al-niessner (Contributor) commented

@tloubrieu-jpl

I can reproduce this behavior. My counts are 41201, 41175, and 41407, when I think it should be 49097. The most interesting part is that when I run it again, it says 0. If I wait 10 minutes and run it again, it still says 0. There seem to be two problems:

  1. How is packageId computed?
    a. Can two harvest runs produce the same packageId?
    b. Can harvest change the packageId during ingestion?
  2. Why does the search give up before all items are found or, in your case, keep going past the end of the list (although the end of the list may vary depending on 1a)?

josinde commented Dec 3, 2024

Problem detected at PSA with version 5.0.2.

$ ~/software/registry-manager/bin/registry-manager --version
Registry Manager version: 5.0.2
Build time: 2024-10-16T22:51:47Z

I will try to reproduce the problem with the latest available version.

josinde commented Dec 3, 2024

Same results with latest official version (5.0.3):

Test 1:
$ registry-manager ... set-archive-status -status staged -packageId 384cfba1-2752-49d7-ad2b-5afbce35c8b
updated 17 documents associated with package ID ...

Test 2 and 3:
$ registry-manager ... set-archive-status -status staged -packageId 384cfba1-2752-49d7-ad2b-5afbce35c8b
updated 0 documents

Test 4:
$ registry-manager ... set-archive-status -status archived -packageId 384cfba1-2752-49d7-ad2b-5afbce35c8b
updated 9 documents

Test 5:
$ registry-manager ... set-archive-status -status archived -packageId 384cfba1-2752-49d7-ad2b-5afbce35c8b
updated 0 documents

Test 6:
$ registry-manager ... set-archive-status -status staged -packageId 384cfba1-2752-49d7-ad2b-5afbce35c8b
updated 12 documents

al-niessner (Contributor) commented Dec 3, 2024

@josinde @tloubrieu-jpl

Doh! I reloaded my bundle into an empty OpenSearch today. There are 41097 total products. Now I understand. It has to do with the index refresh. AOSS (Amazon OpenSearch Serverless) does not support refresh, and I have it turned off locally. This means that the batch jobs run and update some of the products to the new status, and the process then waits 1 second to give OpenSearch enough time to refresh. Apparently one second is not quite enough: the next query still finds some of the lidvids that have already been updated but not yet refreshed, so they are updated and counted again. Rinse and repeat. Thus the numbers are always larger than the actual number of products (my Doh! moment).
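The refresh race described above can be illustrated with a toy model. This is a minimal sketch, not registry-manager code: the class `LaggyIndex`, the function `set_status`, and the `refresh_every` parameter are hypothetical, and a real OpenSearch refresh is driven by time rather than by loop rounds. Queries only see the last refreshed view, so when the wait is too short the same lidvids are found and counted again, and the reported total depends on timing.

```python
"""Toy model of a batch update loop running against a lagging search view.

Assumption: this is NOT the registry-manager implementation. It models an
index whose searchable view only catches up with writes on refresh, and a
fixed wait that only lets a refresh complete every `refresh_every` rounds.
"""


class LaggyIndex:
    def __init__(self, lidvids: list[str], status: str) -> None:
        self.stored = {lv: status for lv in lidvids}  # latest written values
        self.searchable = dict(self.stored)           # what queries can see

    def search_not_in_status(self, status: str, size: int) -> list[str]:
        # Queries see the last refreshed view, not the latest writes.
        hits = sorted(lv for lv, s in self.searchable.items() if s != status)
        return hits[:size]

    def bulk_update(self, lidvids: list[str], status: str) -> None:
        for lv in lidvids:
            self.stored[lv] = status

    def refresh(self) -> None:
        self.searchable = dict(self.stored)


def set_status(index: LaggyIndex, status: str, batch_size: int,
               refresh_every: int) -> int:
    """Query, update, wait, repeat; return the naive number of updates."""
    updated = 0
    rounds = 0
    while True:
        batch = index.search_not_in_status(status, batch_size)
        if not batch:
            return updated
        index.bulk_update(batch, status)
        updated += len(batch)  # stale hits are counted again
        rounds += 1
        if rounds % refresh_every == 0:  # the wait was long enough this time
            index.refresh()


if __name__ == "__main__":
    for refresh_every in (1, 2, 3):
        idx = LaggyIndex([f"urn:example:p{i}::1.0" for i in range(10)], "staged")
        print(refresh_every, set_status(idx, "archived", 4, refresh_every))
```

With ten products and batches of four, the naive count comes out as 10, 20, or 30 depending on how often the wait was long enough, while an immediate second run reports 0, which matches the symptoms in this issue.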

I increased the delay to 5 seconds and the symptoms persisted. At 10 seconds, the symptoms changed to a repeatable 41105, which still means 8 products were counted twice over 41 batch updates (1000 per batch). At 20 seconds the result was again 41105; now I am doubting the harvest count. The problem with increasing the delay too much is that a large packageId, say a million products, needs 1000 delays. At one second, that is nearly 17 minutes of just waiting; at 10 seconds, nearly 3 hours.
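The cost estimate is simple arithmetic: with one fixed delay after each batch of 1000 updates, the total wait is `ceil(N / 1000) × delay` seconds. A small sketch, assuming that loop shape (`total_wait_seconds` is a hypothetical helper):

```python
import math


def total_wait_seconds(n_products: int, batch_size: int = 1000,
                       delay_s: float = 1.0) -> float:
    """Fixed wait after every batch (assumed loop shape): batches x delay."""
    batches = math.ceil(n_products / batch_size)
    return batches * delay_s


for delay in (1, 10, 20):
    secs = total_wait_seconds(1_000_000, delay_s=delay)
    print(f"{delay:>2} s delay: {secs / 60:.1f} min ({secs / 3600:.2f} h)")
# ->  1 s delay: 16.7 min (0.28 h)
# -> 10 s delay: 166.7 min (2.78 h)
# -> 20 s delay: 333.3 min (5.56 h)
```

The wait grows linearly with both the package size and the delay, which is why raising the delay scales worst of the options below.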

Annoyed with the delay, I reset it back to 1 second and implemented a Set that keeps only the unique lidvids, which removes the double counting. The downside is more memory usage: roughly a quarter GB for a million products. The unique count is 41105.
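The Set-based fix amounts to counting distinct lidvids across all batches instead of summing batch sizes. Below is a minimal Python sketch of that idea; the PR's actual Java change may differ, and `count_updated` is a hypothetical name. Memory grows with the number of distinct lidvids; the quarter-GB-per-million estimate above works out to roughly 250 bytes per lidvid.

```python
from typing import Iterable


def count_updated(batches: Iterable[list[str]]) -> tuple[int, int]:
    """Return (naive_count, unique_count) over all batches of updated lidvids."""
    seen: set[str] = set()
    naive = 0
    for batch in batches:
        naive += len(batch)   # what summing batch sizes reports
        seen.update(batch)    # a lidvid re-found in a stale batch is stored once
    return naive, len(seen)


batches = [["urn:example:a::1.0", "urn:example:b::1.0"],
           ["urn:example:a::1.0", "urn:example:b::1.0"],  # stale re-read
           ["urn:example:c::1.0"]]
print(count_updated(batches))  # -> (5, 3)
```

The unique count no longer depends on how long the refresh takes, at the price of holding every updated lidvid in memory until the run finishes.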

OpenSearch _count also reports 41105. So much for the harvest count. 41105 agrees with the files on disk as well:

find . -name \*.xml -exec grep 'logical_identifier' {} \; | wc -l
41105
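The on-disk check can also be mirrored in Python, for example on a machine without a Unix shell. This sketch uses a hypothetical helper, `count_logical_identifier_lines`. Like `grep | wc -l`, it counts matching lines, not files, so it equals the product count only if each label mentions `logical_identifier` on exactly one line.

```python
from pathlib import Path


def count_logical_identifier_lines(root: str) -> int:
    """Count lines containing 'logical_identifier' in *.xml files under root.

    Mirrors `find . -name \\*.xml -exec grep 'logical_identifier' {} \\; | wc -l`:
    grep prints every matching line, so lines (not files) are counted.
    """
    total = 0
    for path in Path(root).rglob("*.xml"):
        if not path.is_file():  # find -name also matches directories
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            total += sum(1 for line in fh if "logical_identifier" in line)
    return total


if __name__ == "__main__":
    print(count_logical_identifier_lines("."))
```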

There are 3 options available to you. I implemented the third in the attached PR. If you want option 1, ignore the PR. If you want option 3, accept the PR and call it good. I do not recommend option 2 at all because it scales the worst.

  1. Accept the count as random numbers
  2. Increase the delay to 10 seconds
  3. Use a Set to determine the number of unique lidvids

josinde commented Dec 4, 2024

@al-niessner
Dear Albert,

Thanks for looking into this issue. I can't tell which option is best here. In our case, maybe we can live with random numbers, or delay the checks on the new data by some hours to mitigate this problem. Is this possible?

Here are the results of a different test, which I think is related to the same problem, just for your information:

We use this command to check a previous ingestion:

$ pds-registry-client --pretty "/psa-registry/_search" -d @${DATA_FILE}

where the content of DATA_FILE is:

{
  "size" : "10000",
  "_source" : ["_package_id", "lidvid", "ops:Tracking_Meta/ops:archive_status", "ops:Label_File_Info/ops:file_ref"], 
  "query": {
    "match": {
      "_package_id": "be847797-79dc-472c-8796-d5e60e743016"
    }
  }
}

This command also returns different values, always increasing, until it reaches the expected result. So I assume the same problem affects not only the count process but any other query in the system.
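Because search results only catch up with writes after a refresh, a verification script could poll until the count stops changing before trusting it. A minimal sketch, assuming a caller-supplied `count_fn` (for example, one that runs the query above or an OpenSearch `_count` request); `wait_for_stable_count` and its parameters are hypothetical, and a stable count is a heuristic, not a guarantee that every document is visible.

```python
import time
from typing import Callable


def wait_for_stable_count(count_fn: Callable[[], int], stable_reads: int = 3,
                          interval_s: float = 5.0, max_reads: int = 100) -> int:
    """Poll count_fn until it returns the same value stable_reads times in a row."""
    last = None
    streak = 0
    for _ in range(max_reads):
        value = count_fn()
        if value == last:
            streak += 1
        else:
            last, streak = value, 1  # value changed: restart the streak
        if streak >= stable_reads:
            return value
        time.sleep(interval_s)
    raise TimeoutError(f"count did not settle after {max_reads} reads (last={last})")
```

Requiring several identical reads in a row filters out the "always increasing" phase, while `max_reads` bounds how long a check can block.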

github-project-automation bot moved this from ToDo to 🏁 Done in EN Portfolio Backlog Dec 16, 2024