Remove rcf jar and fix zip fetching for AD and JS #497

Closed
wants to merge 5 commits into from

amitgalitz
Member

@amitgalitz amitgalitz commented Apr 8, 2022

Description

  1. Removed the checked-in RCF jars and updated the build to fetch RCF 3.0-rc2 from Maven
  2. Added zip and folder deletion before fetching zips for bwc and integCluster so there aren't multiple zips
  3. Added bwc back to CI
  4. Changed V1JsonToV2StateConverter usage to V1JsonToV3StateConverter
  5. Added testDeserialize based on the 1.3 RCF model
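
Item 2 above (clearing old zips and folders before fetching new ones) can be illustrated with a minimal Java sketch. The PR itself makes this change in the build script, so the class name `StaleArtifactCleaner`, the method `clean`, and the directory layout below are assumptions for illustration only, not the PR's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: not part of the PR, which does the cleanup in the build script.
final class StaleArtifactCleaner {

    /**
     * Deletes every *.zip file and every sub-folder directly under dir,
     * and returns the number of top-level entries removed.
     */
    static int clean(Path dir) {
        if (!Files.isDirectory(dir)) {
            return 0; // nothing has been fetched yet
        }
        try {
            List<Path> entries;
            try (Stream<Path> listing = Files.list(dir)) {
                entries = listing.collect(Collectors.toList());
            }
            int removed = 0;
            for (Path entry : entries) {
                boolean isZip = entry.getFileName().toString().endsWith(".zip");
                if (isZip || Files.isDirectory(entry)) {
                    deleteRecursively(entry);
                    removed++;
                }
            }
            return removed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Calling `StaleArtifactCleaner.clean(dir)` before each download would leave at most one freshly fetched zip in the folder, which is the property item 2 is after.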

Issues Resolved

resolves #433

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@amitgalitz amitgalitz requested a review from a team April 8, 2022 21:45
@opensearch-trigger-bot opensearch-trigger-bot bot added backport 1.x infra Changes to infrastructure, testing, CI/CD, pipelines, etc. labels Apr 8, 2022
@codecov-commenter

Codecov Report

Merging #497 (c0fb117) into main (ee057e2) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main     #497      +/-   ##
============================================
+ Coverage     78.14%   78.17%   +0.03%     
- Complexity     4155     4160       +5     
============================================
  Files           296      296              
  Lines         17654    17654              
  Branches       1877     1877              
============================================
+ Hits          13795    13801       +6     
+ Misses         2963     2959       -4     
+ Partials        896      894       -2     
Flag Coverage Δ
plugin 78.17% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../main/java/org/opensearch/ad/ml/CheckpointDao.java 70.19% <ø> (+0.64%) ⬆️
.../java/org/opensearch/ad/AnomalyDetectorPlugin.java 96.53% <100.00%> (ø)
...ansport/handler/AnomalyResultBulkIndexHandler.java 69.35% <0.00%> (-1.62%) ⬇️
.../main/java/org/opensearch/ad/cluster/HashRing.java 81.37% <0.00%> (-0.41%) ⬇️
...ain/java/org/opensearch/ad/model/ModelProfile.java 70.90% <0.00%> (ø)
...rch/ad/transport/AnomalyResultTransportAction.java 80.82% <0.00%> (+0.68%) ⬆️
...va/org/opensearch/ad/AnomalyDetectorJobRunner.java 64.17% <0.00%> (+0.78%) ⬆️
...ava/org/opensearch/ad/task/ADHCBatchTaskCache.java 90.12% <0.00%> (+1.23%) ⬆️

String json = Files.readString(Paths.get(filePath), Charset.defaultCharset());
Map map = gson.fromJson(json, Map.class);
String model = (String) ((Map) ((Map) ((ArrayList) ((Map) map.get("hits")).get("hits")).get(0)).get("_source")).get("modelV2");
ThresholdedRandomCutForest forest = checkpointDao.toTrcf(model);
Collaborator

Can you add checks on the specific fields of the TRCF? Please see my other tests for examples.
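
Related to the snippet under review: the chain of casts that digs out `modelV2` fails with a bare ClassCastException or NullPointerException if the fixture's shape ever changes. A minimal sketch of a more defensive lookup follows, using only the standard library. The class `CheckpointJson` and its methods are hypothetical names, not part of the plugin; the path `hits.hits[0]._source` is taken from the snippet itself.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for test fixtures; not part of the plugin.
final class CheckpointJson {

    /** Returns hits.hits[0]._source.<field>, or throws naming the step that failed. */
    static String firstHitSourceField(Map<String, Object> response, String field) {
        Map<?, ?> outer = require(response.get("hits"), Map.class, "hits");
        List<?> hits = require(outer.get("hits"), List.class, "hits.hits");
        if (hits.isEmpty()) {
            throw new IllegalArgumentException("hits.hits is empty");
        }
        Map<?, ?> hit = require(hits.get(0), Map.class, "hits.hits[0]");
        Map<?, ?> source = require(hit.get("_source"), Map.class, "hits.hits[0]._source");
        return require(source.get(field), String.class, "_source." + field);
    }

    // Checks the runtime type before casting so a wrong shape gives a readable error.
    private static <T> T require(Object value, Class<T> type, String path) {
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException("expected " + type.getSimpleName() + " at " + path);
        }
        return type.cast(value);
    }
}
```

With the model string extracted this way, the test can go on to assert on individual fields of the deserialized forest rather than only on the absence of an exception.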

@@ -154,7 +157,7 @@
private GenericObjectPool<LinkedBuffer> serializeRCFBufferPool;
private RandomCutForestMapper mapper;
private ThresholdedRandomCutForestMapper trcfMapper;
- private V1JsonToV2StateConverter converter;
+ private V1JsonToV3StateConverter converter;
Collaborator

There is also a V2StateToV3ForestConverter. We need to convert both from v1 to v3 and from v2 to v3. If v2 and v3 are incompatible, we need to add a modelV3 field to the checkpoint.
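
One way to picture the suggestion above is a read-only lookup that prefers the newest model field present in a checkpoint and reports which version it found, so the caller can pick the matching converter. This is a sketch under assumptions: `modelV2` appears in the test under review and `modelV3` is the field proposed in the comment, but the v1 field name `model` and the class `CheckpointVersionResolver` are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the plugin's real checkpoint handling lives in CheckpointDao.
final class CheckpointVersionResolver {

    enum ModelVersion { V1, V2, V3 }

    static final class Resolved {
        final ModelVersion version;
        final String payload;

        Resolved(ModelVersion version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    // Newest first. "model" as the v1 field name is an assumption.
    private static final String[] FIELDS = { "modelV3", "modelV2", "model" };
    private static final ModelVersion[] VERSIONS = {
        ModelVersion.V3, ModelVersion.V2, ModelVersion.V1
    };

    /** Returns the newest non-empty model payload in the checkpoint source, if any. */
    static Optional<Resolved> resolve(Map<String, ?> source) {
        for (int i = 0; i < FIELDS.length; i++) {
            Object value = source.get(FIELDS[i]);
            if (value instanceof String && !((String) value).isEmpty()) {
                return Optional.of(new Resolved(VERSIONS[i], (String) value));
            }
        }
        return Optional.empty();
    }
}
```

Because the lookup never writes to the map, older fields stay untouched, which matters when nodes of different versions share the same checkpoint index.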

// .parseReader(new FileReader(new File(getClass().getResource(labalFileName).toURI()), Charset.defaultCharset()))

public void testDeserializeRCFModel() throws Exception {
// Model in file 1_3_0_rcf_model.json not passed initialization yet
Collaborator

So this means the model is empty? If so, have you tested with a fully initialized model?

@@ -590,6 +607,9 @@ dependencies {
compileOnly "org.opensearch:opensearch-job-scheduler-spi:${job_scheduler_version}"
implementation "org.opensearch:common-utils:${common_utils_version}"
implementation "org.opensearch.client:opensearch-rest-client:${opensearch_version}"
implementation 'software.amazon.randomcutforest:randomcutforest-parkservices:3.0-rc2'
Collaborator

@kaituo kaituo Apr 8, 2022

Could you list the tests you have done to upgrade to rc2? What I can think of:

  • real time single stream and HCAD (including create/start detector, see detector emit results; then stop cluster and restart it to see if results continue to show)
  • historical single stream and HCAD
  • backward compatibility tests (v1 to v3 model, v2 to v3 model): the v1 model in the checkpoint should not be overridden, and we add v2 and v3 models to the checkpoint if necessary. You can start with a v1 cluster, start real-time single-stream and HCAD detectors, and let them run and produce results. Then upgrade some nodes to v2 and see whether anything happens when v1 and v2 nodes coexist in the cluster (this simulates our blue/green deployment in service). Then remove the v1 nodes and let v2 run for a while, upgrade some nodes to v3 and see whether v2 and v3 can coexist, and finally remove the v2 nodes and let the v3 nodes run.
  • Does the memory size formula still hold (check MemoryTracker)? It doesn't need to be exactly the same; we want to verify that the ballpark number is similar.

v1 refers to the checkpoint version we used until #149. v2 is what we have had until your PR. v3 is what your PR tries to bring.
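
The "ballpark" check in the last bullet can be made concrete as a relative-error comparison between the estimate and a measured size. The sketch below does not reproduce MemoryTracker's formula; the class `BallparkCheck`, the method name, and the tolerance parameter are all hypothetical.

```java
// Hypothetical helper; it compares two numbers and knows nothing about RCF internals.
final class BallparkCheck {

    /**
     * Returns true when measuredBytes is within the given relative tolerance
     * of estimatedBytes (for example, 0.25 allows a 25% deviation).
     */
    static boolean withinBallpark(long estimatedBytes, long measuredBytes, double tolerance) {
        if (estimatedBytes <= 0 || measuredBytes <= 0 || tolerance < 0) {
            throw new IllegalArgumentException("sizes must be positive and tolerance non-negative");
        }
        double relativeError = Math.abs((double) measuredBytes - estimatedBytes) / estimatedBytes;
        return relativeError <= tolerance;
    }
}
```

A test could feed it the formula's estimate and a size measured from a serialized or profiled model, with a tolerance the team agrees on.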

Member Author

@amitgalitz amitgalitz Apr 8, 2022

I tested single-stream real time and historical; I'll do the same for HCAD now. The BWC test is the one I added, based on Yaliang's code, to check deserializing a model from 1.3. I also changed the conversion test to go from v1 to v3; I didn't add a v2 to v3 test.

Collaborator

@kaituo kaituo Apr 9, 2022

The unit test for BWC is not enough. It only checks that the conversion throws no exception. We will need to do some e2e testing.

Member Author

I manually tested this: after starting both single-stream and HCAD detectors (real time and historical) with the current RCF and getting results, I can switch to RCF 3.0-rc2 with no issues, and the detectors continue to run the same.

Member Author

After letting the detector run longer, I do hit one issue: Failed to generate checkpoint, java.lang.IllegalStateException: There is discepancy in indices

@@ -1000,4 +1003,14 @@ public void testFromEntityModelCheckpointWithEntity() throws Exception {
}
return point;
}
// .parseReader(new FileReader(new File(getClass().getResource(labalFileName).toURI()), Charset.defaultCharset()))
Collaborator

unused code

Member Author

will remove

@amitgalitz
Member Author

Closing in favor of a new PR that fetches RCF 3.0-rc1 instead of 3.0-rc2. There are some compatibility issues that need further investigation before AD can transition to rc2.

@amitgalitz amitgalitz closed this Apr 13, 2022
Labels
infra Changes to infrastructure, testing, CI/CD, pipelines, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Plugin contains checked in JARs with unclear provenance
3 participants