New records harvested in the registry don't have the expected Node value #186
Comments
Created a clear requirement to tie to this as well:
Yes, this is to be expected. As discussed previously, the node name is now identified through the registry name. This is done here: harvest/src/main/java/gov/nasa/pds/harvest/cfg/ConfigManager.java Lines 60 to 64 in ae58da7
It always returns a name. Do you prefer to throw an error when not found in this table? harvest/src/main/java/gov/nasa/pds/harvest/cfg/ConfigManager.java Lines 21 to 34 in ae58da7
As you can see from the function, it is mangling your registry name to get the node name, since you are not following the convention you gave me. That convention was used to create the connections here: We can do a couple of things if this is a set of integration test nodes (maybe for some users' initial review) but not full deployment nodes (ones that follow the original naming rules):
I would suggest 2 and 4. We can and should add more connections, both cognito and direct, to support more registries. We can then add them to the table above as needed with whatever name is desired.
Need answering for outstanding questions
Hi @al-niessner, Sorry, after taking a bit more time to think about it, I believe you are right: it does not make sense to keep a hardcoded mapping in the code, because we would need to update the harvest code if we add a new discipline node. Although that might never happen, it is an unnecessary hassle. Besides, I am realizing we don't want to redundantly manage the harvesting node in harvest, since we know who harvested a document from the index the document is in. In the documents it shows as Unless we have cases where someone would like to ingest into an index, e.g. geo-registry, but be authored as a different node, for example PDS_SBN, or a more specific entity, for example PDS_GEO_MARS ... I don't believe we need that, but I am leaving that decision to @jordanpadams. If we confirm we don't need the flexibility of having different harvesting nodes in the same index, I would completely remove this update from the client side, in harvest, and move it server side, into sweeper or an OpenSearch configuration (I have not found which, though). @al-niessner, @alexdunnjpl, @jordanpadams, let me know your thoughts. Thanks,
@tloubrieu-jpl @al-niessner @alexdunnjpl I agree that it is not “clean” to have this mapping hardcoded in the client, and we can go with a server-side solution, as long as it will not take a significant amount of time or effort. The creation of new indexes / nodes should not happen often (albeit we will have some international partners being added soon :-)). In the event that the mapping is updated and a new node/index is needed, the user requesting a new index can be required to install a new version of harvest.
Given the solution you outlined above, it is quite obvious that I cannot communicate my ideas to you. I am basically saying the opposite. The argument about the hard-coded registry name <-> node name mapping is empty. No matter what, because the naming rules for both registry and node names are neither defined nor consistent, the mapping has to be hard coded somewhere, client or server, in a table, an algorithm, or both. Putting the hard coding in the client is cheapest in the long term because, once it is right, fewer and fewer corrections are needed. If the ability to force updates existed (it is possible), this would not be a problem; sweeper could be run once. Typos in the index name are a serious problem, since the user can run create DB and then harvest to fill up pda-registry instead of psa-registry. Hence, users should only have used app://. Doing so also ties the mapping to a fixed set of registry indices. The table and app:// provided in registry-common were self-consistent as well, preventing this issue altogether. Go back to the purpose of the node name to drive the design. The purpose of the node name seems to be to reduce the search space to a specific index. In other words, a user can say "I just want to search the node name == PDS_SBN stuff and not the universe", which, for us, maps to an index. If it has more purpose or a different purpose, then we should work with its intended purpose. For instance, if the intent is to record who authored it, then we should embed the cognito credentials, not some enum that cannot be verified; which is to say, with an enum a mars author can claim to be an sbn author while putting data into the psa node. "Clean" is not possible. See above in this comment for why hard coding must happen. You should write all of the tools to check whether a new version is available before running and then refuse to run until the versions match. Getting this out would save you so much money in development time and database sweeping.
There may still be a bit of an open window to slip it into this release, but that window is barely open and closing. Anyway, I doubt I did any better at communicating my position, so just tell me what you want implemented and I will do it.
@al-niessner per the new requirement now developed, we need a very consistent
I 100% agree we need something like this. We are spending way too much time fixing issues with older/bad versions of tools. The problems I foresee here are that (1) the tool should key off the major version and ignore the minor, and (2) we have to use semantic versioning correctly when it comes to changes to metadata.
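One way the major-version check could look, as a rough Java sketch. `VersionGate` and its methods are hypothetical names, not part of harvest, and how a tool would learn the required version is left open because the thread does not say.

```java
// Hypothetical sketch: compare only the major component of two version strings.
final class VersionGate {
    // Extracts the major number from strings such as "v4.0.1" or "4.0.1-SNAPSHOT".
    static int major(String version) {
        String v = version.startsWith("v") ? version.substring(1) : version;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    // Minor and patch differences are ignored on purpose.
    static boolean compatible(String installed, String required) {
        return major(installed) == major(required);
    }

    // Refuses to continue when the major versions differ.
    static void requireCompatible(String installed, String required) {
        if (!compatible(installed, required)) {
            throw new IllegalStateException(
                "Installed version " + installed + " is not compatible with required version "
                + required + "; please upgrade before running.");
        }
    }
}
```

This only helps if metadata-breaking changes always bump the major version, which is point (2) above.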
@tloubrieu-jpl will describe the table to be hardcoded. Anything outside the table should throw an error. Create dev name.
Hi @al-niessner, The expected mapping is:
Anything different should raise an error saying: "Index not supported: either fix it in your configuration by using one of the supported or request an upgrade of harvest to support your new index by submitting a ticket on https://github.com/NASA-PDS/harvest/issues". Thanks,
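A minimal Java sketch of the strict lookup described above, assuming a small hardcoded table. The class name `NodeNameTable`, the method `nodeNameFor`, and the index name `en-registry` are hypothetical; only the node names PDS_GEO and PDS_ENG and the index geo-registry appear in this thread, and the full mapping is not reproduced here.

```java
import java.util.Map;

// Hypothetical sketch: a fixed index-to-node table that rejects anything not listed.
final class NodeNameTable {
    // Illustrative rows only; "en-registry" is an assumed index name.
    private static final Map<String, String> INDEX_TO_NODE = Map.of(
        "geo-registry", "PDS_GEO",
        "en-registry", "PDS_ENG");

    // Returns the node name for a supported index, otherwise throws.
    static String nodeNameFor(String indexName) {
        // Map.of rejects null keys, so guard before the lookup.
        String node = indexName == null ? null : INDEX_TO_NODE.get(indexName);
        if (node == null) {
            throw new IllegalArgumentException(
                "Index not supported: either fix it in your configuration by using one of the "
                + "supported or request an upgrade of harvest to support your new index by "
                + "submitting a ticket on https://github.com/NASA-PDS/harvest/issues");
        }
        return node;
    }
}
```

With a table like this, a mistyped index such as pda-registry fails immediately instead of being mangled into a node name.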
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
In the registry, we can find documents in the *-registry indexes with the property ops:Harvest_Info/ops:node_name = geo or en
🕵️ Expected behavior
I expected the values PDS_GEO or PDS_ENG.
📜 To Reproduce
Harvest data as described in the manual https://docs.google.com/document/d/12W1DyaRYh4yYnw4p_-qFLQ0kmn_w14Sr/edit
🖥 Environment Info
No response
📚 Version of Software Used
Using harvest v4.0.1
🩺 Test Data / Additional context
The expected node values are:
🦄 Related requirements
🦄 #187
⚙️ Engineering Details
It sounds related to the change in the code which made the node name in the harvest configuration obsolete.
🎉 Integration & Test
No response