Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing MetricsLogger and OpenSearch analytics cluster #515

Conversation

gregschohn
Copy link
Collaborator

@gregschohn gregschohn commented Feb 20, 2024

Description

Kill the OpenSearch analytics domain and its bastion host that came along with the otel-collector, leaving only the collector (which will shoot metrics and traces to AWS).
I've also updated the README and have made the cdk.context.json set otelCollectorEnabled = true, though maybe the default should be true & this should be opt-out.
All mentions of MetricsLogger have also been removed.

I've also updated the Kafka broker for CDK container deployment and docker-compose to 3.8 and now use the new Apache Kafka image (which has recently come into being). MSK still uses 3.6, which is the latest that MSK supports at present. We no longer rely upon zookeeper for Kafka containers. Instead, KRaft is enabled, making the deployments a bit simpler and giving us one less thing to debug. Some other Kafka related stuff (like volumes) were also cleaned up to simplify the developer experience. The aws otel collector was also upgraded to 0.38.0.

  • Category Maintenance, as Otel metrics and Traces via contexts replace the old MetricsLogger stuff.
  • Why these changes are required? The OpenSearch analytics domain and the metrics that it was tracking are obviated by the cloudwatch metrics that the otel-collector is now sending out.
  • What is the old behavior before changes and new behavior after changes? CDK will no longer deploy an extra domain and bastion.

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-1475

Testing

Deployed to my AWS account with this cdk.context.json

{
  "demo-deploy": {
    "stage": "dev",
    "openAccessPolicyEnabled": true,
    "domainRemovalPolicy": "DESTROY",
    "enableDemoAdmin": true,
    "trafficReplayerEnableClusterFGACAuth": true,
    "captureProxyESServiceEnabled": true,
    "fetchMigrationEnabled": true,
    "otelCollectorEnabled": true,
    "kafkaBrokerServiceEnabled": true,
    "elasticsearchServiceEnabled": true,
    "sourceClusterEndpoint": "https://capture-proxy-es.migration.dev.local:19200",
    "osContainerServiceEnabled": true
  },
  "availability-zones:account=897272129541:region=us-east-1": [
    "us-east-1a",
    "us-east-1b",
    "us-east-1c",
    "us-east-1d",
    "us-east-1e",
    "us-east-1f"
  ],
  "availability-zones:account=897272129541:region=us-east-2": [
    "us-east-2a",
    "us-east-2b",
    "us-east-2c"
  ]
}

I ran cdk --c contextId=demo-deploy --require-approval never --concurrency 3 deploy \*

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@AndreKurait
Copy link
Member

is something weird with the commit history here? All my commits are showing up as unverified

@gregschohn gregschohn changed the title Removing open search analytics cluster Removing MetricsLogger and OpenSearch analytics cluster Feb 20, 2024
…main.

Strip opensearch analytics from the docker solution and the CDK.
Use the AWS Distro for Otel's collector as the base image for our otel collector instead of the custom built one.
Kill the OpenSearch analytics domain and its bastion host that came along with the otel-collector, leaving only the collector (which will shoot metrics and traces to AWS).
Updated the README and have made the cdk.context.json set otelCollectorEnabled = true, though maybe the default should be true & this should be opt-out.

Signed-off-by: Greg Schohn <[email protected]>
@gregschohn gregschohn force-pushed the RemovingOpenSearchAnalyticsCluster branch from c799e72 to 6850984 Compare February 21, 2024 14:14
@@ -1,14 +1,4 @@
FROM public.ecr.aws/a0w2c5q7/otelcol-with-opensearch:amd-latest
#FROM public.ecr.aws/aws-observability/aws-otel-collector:v0.37.0
FROM public.ecr.aws/aws-observability/aws-otel-collector:v0.37.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for this

Copy link

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.48%. Comparing base (7eaf632) to head (d28dd2e).
Report is 2 commits behind head on main.

❗ Current head d28dd2e differs from pull request most recent head df70719. Consider uploading reports for the commit df70719 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #515      +/-   ##
============================================
- Coverage     76.88%   76.48%   -0.41%     
+ Complexity     1386     1370      -16     
============================================
  Files           158      154       -4     
  Lines          6105     5923     -182     
  Branches        532      532              
============================================
- Hits           4694     4530     -164     
+ Misses         1059     1041      -18     
  Partials        352      352              
Flag Coverage Δ
unittests 76.48% <100.00%> (-0.41%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@lewijacn lewijacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up! A few additional minor things we should do:

  • Remove the analytics options from the default-values.json
  • Add the otel collector enabled option to the default-values.json
  • Remove the accessAnalyticsDashboard.sh file

@@ -11,6 +11,7 @@
"trafficReplayerEnableClusterFGACAuth": true,
"captureProxyESServiceEnabled": true,
"fetchMigrationEnabled": true,
"otelCollectorEnabled": true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mentioned elsewhere, but agree that I would prefer this in the default-values.json

@@ -179,22 +179,7 @@ export class StackComposer {
const sourceClusterEndpoint = this.getContextForType('sourceClusterEndpoint', 'string', defaultValues, contextJSON)
const osContainerServiceEnabled = this.getContextForType('osContainerServiceEnabled', 'boolean', defaultValues, contextJSON)

const migrationAnalyticsServiceEnabled = this.getContextForType('migrationAnalyticsServiceEnabled', 'boolean', defaultValues, contextJSON)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an additional section in the options.md file with these options, can you remove that specific section as well

}

const stacks = createStackComposer(contextOptions)

const services = [CaptureProxyESStack, CaptureProxyStack, ElasticsearchStack, MigrationConsoleStack,
TrafficReplayerStack, KafkaBrokerStack, KafkaZookeeperStack, MigrationAnalyticsStack]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should replace with OtelCollectorStack right?

@@ -15,7 +15,7 @@ prepare_source_nodes_for_capture () {
# Substitute @ to be used instead of ',' for cases where ',' would disrupt formatting of arguments, i.e. AWS SSM commands
kafka_brokers=$(echo "$kafka_brokers" | tr ',' '@')
kafka_sg_id=$(aws ssm get-parameter --name "/migration/$deploy_stage/default/trafficStreamSourceAccessSecurityGroupId" --query 'Parameter.Value' --output text)
otel_sg_id=$(aws ssm get-parameter --name "/migration/$deploy_stage/default/analyticsDomainSGId" --query 'Parameter.Value' --output text)
otel_sg_id=$(aws ssm get-parameter --name "/migration/$deploy_stage/default/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this line got cut off from what I can tell

… and set the default to deploy the otel-collector service.

Signed-off-by: Greg Schohn <[email protected]>
# Conflicts:
#	TrafficCapture/dockerSolution/src/main/docker/docker-compose.yml
#	deployment/cdk/opensearch-service-migration/lib/service-stacks/capture-proxy-es-stack.ts
#	deployment/cdk/opensearch-service-migration/lib/service-stacks/migration-console-stack.ts
#	deployment/cdk/opensearch-service-migration/lib/service-stacks/migration-otel-collector-stack.ts
#	deployment/cdk/opensearch-service-migration/lib/stack-composer.ts
…ions.

Normalize the opensearch container deployment to look more like the managed domain so that the migration console and traffic replayer can treat them the same (and work).
Make the opensearch container that gets automatically setup as a single container use the demo admin.  Nobody should be deploying a single opensearch node for production anyway and this is one more way to simplify the developer experience.

Signed-off-by: Greg Schohn <[email protected]>
… apache/kafka:3.7.0 (from bitnami).

Environment variables were changed to support the new election-leader scheme that KRaft has and to match the new container.  The environment variables deployed by compose and for the experimental CDK both use variables recommended by the repo where the Kafka image resides.

Signed-off-by: Greg Schohn <[email protected]>
…ful given what we now know about testing/experimenting with Kafka.

Signed-off-by: Greg Schohn <[email protected]>
@@ -13,6 +13,7 @@ export interface CaptureProxyESProps extends StackPropsExt {
readonly vpc: IVpc,
readonly streamingSourceType: StreamingSourceType,
readonly otelCollectorEnabled: boolean,
readonly fargateCpuArch: CpuArchitecture,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is odd that I am seeing this, did you remove and then add back?

depends_on:
- zookeeper
# see https://github.com/apache/kafka/blob/3.7/docker/examples/jvm/single-node/plaintext/docker-compose.yml
KAFKA_NODE_ID: 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of settings we've not used before here, hopefully this will give us a better of how Kafka is working. I do have a slight reservation whether we will always be able to have the same functionality in MSK but I don't have an example of this yet

constructor(scope: Construct, id: string, props: OpenSearchContainerProps) {
super(scope, id, props)

const deployId = props.addOnMigrationDeployId ? props.addOnMigrationDeployId : props.defaultDeployId
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't passed in an addOnMigrationDeployId prop to this class, but it would make sense to do so if we want to allow this stack to be used for multiple replay scenarios

Signed-off-by: Greg Schohn <[email protected]>
@@ -351,6 +351,7 @@ export class StackComposer {
stage: stage,
defaultDeployId: defaultDeployId,
fargateCpuArch: fargateCpuArch,
addOnMigrationDeployId: addOnMigrationDeployId,
enableDemoAdmin: true,
...props,
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple minor things to avoid conflicts here:

  1. Can you change L347 to opensearch-container-${deployId}
  2. Can you change L349 to use ${stage}-${region}-${deployId} instead of just stage and region

Signed-off-by: Greg Schohn <[email protected]>

# Conflicts:
#	deployment/cdk/opensearch-service-migration/lib/opensearch-domain-stack.ts
@gregschohn gregschohn merged commit dfc1be0 into opensearch-project:main Mar 7, 2024
4 of 5 checks passed
@gregschohn gregschohn deleted the RemovingOpenSearchAnalyticsCluster branch April 3, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants