set-archive-status does not update the same number of products each time with packageId argument #115
Comments
I confirm your behavior. Mine is 41201, 41175, 41407 when I think it should be 49097. The most interesting part is that when I run it again, it says 0. Wait 10 minutes and 0 again. Seems like two problems:
|
Problem detected at PSA with version 5.0.2.
I will try to reproduce the problem with the latest available version. |
Same results with latest official version (5.0.3): Test 1: Test 2 and 3: Test 4: Test 5: Test 6: |
Doh! I had to reload my bundle into an empty OpenSearch today. There are 41097 total products.

Now I understand. It has to do with the refresh. AOSS does not support refresh and I have it turned off locally. What this means is that the batch jobs execute to update some of the products to the new status. The process then waits 1 second to give OpenSearch enough time to refresh. Apparently one second is not quite enough. It then runs the query and finds some of the same lidvids that have been updated but not yet refreshed. Rinse and repeat. Thus the numbers are always larger than the actual number of products (my Doh! moment).

I changed the delay to 5 seconds and the symptoms persisted. Changed it to 10 and the symptoms changed to a repeatable 41105. Still, 8 products were replicated over 41 batch updates (1000 in a batch). Changed it to 20 seconds, which also resulted in 41105; now I am doubting the harvest count.

The problem with increasing the delay too much: for large packageIds, like a million products, that is 1000 delays. At one second, it is nearly 17 minutes of just waiting. At 10 seconds, it is nearly 3 hours of waiting.

Annoyed with the delay, I reset it back to 1 second and implemented a Set that keeps just the unique lidvids, removing the double counting. However, it means more memory usage, about a quarter GB for a million products. The result with the unique list is 41105. Used opensearch
There are 3 options available to you. I implemented the third in the attached PR. If you want option 1, ignore the PR. If you want option 3, then accept the PR and call it good. I do not recommend option 2 at all because it scales the worst.
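To make the double counting concrete, here is a minimal Python sketch of the update-then-requery loop described in this comment. It is not the registry-mgr code. It assumes a toy model in which a fixed number of just-updated documents (`stale_per_batch`, a hypothetical parameter) still look unchanged to the next query, and it compares the naive running total with a set of unique IDs. `total_wait_seconds` reproduces the waiting-time arithmetic. All names are illustrative.

```python
import math
from collections import deque
from typing import Tuple


def simulate_status_update(total: int, batch_size: int, stale_per_batch: int) -> Tuple[int, int]:
    """Return (naive_count, unique_count) for a toy update loop.

    Assumption: after each batch, the last `stale_per_batch` documents that
    were just updated are not yet refreshed, so the next query returns them again.
    """
    if not 0 <= stale_per_batch < batch_size:
        raise ValueError("stale_per_batch must be in [0, batch_size)")

    remaining = deque(range(total))  # IDs that truly still need the update
    stale = []                       # IDs already updated but still visible as old
    naive_count = 0                  # what a per-batch running total reports
    unique_ids = set()               # what a Set of unique IDs reports

    while remaining or stale:
        # The query returns the stale hits plus fresh documents, up to one batch.
        fresh_slots = min(batch_size - len(stale), len(remaining))
        fresh = [remaining.popleft() for _ in range(fresh_slots)]
        hits = stale + fresh

        naive_count += len(hits)
        unique_ids.update(hits)

        # Only freshly updated documents can lag behind the refresh.
        stale = fresh[-stale_per_batch:] if stale_per_batch else []

    return naive_count, len(unique_ids)


def total_wait_seconds(products: int, batch_size: int, delay_seconds: float) -> float:
    """Total time spent sleeping when one delay follows every batch."""
    return math.ceil(products / batch_size) * delay_seconds


if __name__ == "__main__":
    print(simulate_status_update(10, 4, 1))                  # (13, 10)
    print(total_wait_seconds(1_000_000, 1000, 1) / 60)       # minutes at 1 s per batch
    print(total_wait_seconds(1_000_000, 1000, 10) / 3600)    # hours at 10 s per batch
```

In this model the naive total overshoots whenever any document lags behind the refresh, while the unique count stays at the true total. The trade-off is the memory needed to hold every ID in the set.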
|
@al-niessner Thanks for looking into this issue. I can't tell you which option is best here. In my case, maybe we can live with the random numbers, or delay the checks on the new data for some hours to mitigate this problem. Is this possible? Here are the results of a different test, as I think they are related to the same problem, just for your information: We use this command to check a previous ingestion:
Where DATA_FILE content is:
This command also returns different values, always increasing, until it gets the expected result. So, I assume the same problem affects not only the count process but any other query in the system. |
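If checks on newly ingested data have to tolerate this lag, one client-side option is to poll until the reported count stops changing. The sketch below is only an illustration under that assumption: `get_count` is a hypothetical callable standing in for whatever query returns the number, and the interval and thresholds are arbitrary values, not settings of any registry tool.

```python
import time
from typing import Callable


def wait_for_stable_count(
    get_count: Callable[[], int],
    interval_seconds: float = 5.0,
    stable_reads: int = 3,
    max_reads: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_count() until it returns the same value `stable_reads` times in a row."""
    last = None
    streak = 0
    for _ in range(max_reads):
        current = get_count()
        if current == last:
            streak += 1
        else:
            # Value changed (or first read): restart the streak.
            last = current
            streak = 1
        if streak >= stable_reads:
            return current
        sleep(interval_seconds)
    raise TimeoutError(f"count did not stabilize within {max_reads} reads")


if __name__ == "__main__":
    # Simulated readings that keep increasing and then settle.
    readings = iter([10, 20, 30, 41, 41, 41])
    print(wait_for_stable_count(lambda: next(readings), sleep=lambda s: None))
```

A stable value only shows that the index has stopped changing between reads. It does not prove that the value is the expected one, so it is worth comparing against a known total where one exists.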
Checked for duplicates
No - I haven't checked
🐛 Describe the bug
I’m now testing the registry-manager tool and I get some results that seem inconsistent. For instance, with this command:
/registry-manager -es file:///home/psaops/.auth/opensearch-configuration.xml -auth /home/psaops/.auth/registry-auth.txt set-archive-status -status archived -packageId 384cfba1-2752-49d7-ad2b-5afbce35c8b0
I get 27 elements updated. The second time I got 0, which is expected. But when I change the status back to “staged”, for instance, the command can return just 15 or some other inconsistent number.
Also, this package ID is associated with just three documents: a bundle and two collections:
[SUMMARY] Summary:
[SUMMARY] Skipped files: 0
[SUMMARY] Loaded files: 3
[SUMMARY] Product_Bundle: 1
[SUMMARY] Product_Collection: 2
[SUMMARY] Failed files: 0
[SUMMARY] Package ID: 384cfba1-2752-49d7-ad2b-5afbce35c8b0
So I assume the tool here is updating the status for these context products AND the products contained in the collections, right?
🕵️ Expected behavior
I expected the command to update the same number of products each time, since there are no other changes in the registry.
📜 To Reproduce
Reported by Jose Osinde from ESA/PSA.
🖥 Environment Info
No response
📚 Version of Software Used
registry-mgr 5.0
🩺 Test Data / Additional context
No response
🦄 Related requirements
🦄 #xyz
Acceptance Criteria
Given
When I perform
Then I expect
🎉 Integration & Test
No response