Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Add Basic Support for OpenSearch Ingestion controls from the Migration Console #621

Merged
merged 12 commits into from
May 1, 2024

Conversation

lewijacn
Copy link
Collaborator

@lewijacn lewijacn commented Apr 30, 2024

Description

This change introduces basic support for creating/starting/stopping an OpenSearch Ingestion pipeline.

  • It provides an experimental option for enabling OSI on the migration console. It does not prevent creating the OSI python tool on the Migration Console, but rather controls creating the IAM roles and policies related to OSI
  • It sets up necessary IAM policies for the Migration Console to assume as a 'Manager" of the OSI pipeline. It also creates a pipeline role that will be assumed by OSI to communicate with the source and target clusters
  • Adds the osiMigration.py script to the migration console for controlling OSI pipelines. This script supports starting/stopping a pipeline as well as creating a pipeline from provided command arguments or creating the pipeline from inferred settings based on the associated deployment.

See README update for details on executing. Once this is merged, I expect to copy and paste something similar to the wiki.

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-1659

Testing

Manual AWS testing on cloud

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Apr 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.88%. Comparing base (4994898) to head (30dc9a8).
Report is 6 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #621      +/-   ##
============================================
+ Coverage     75.77%   75.88%   +0.10%     
- Complexity     1533     1534       +1     
============================================
  Files           168      168              
  Lines          6395     6395              
  Branches        570      570              
============================================
+ Hits           4846     4853       +7     
+ Misses         1172     1164       -8     
- Partials        377      378       +1     
Flag Coverage Δ
unittests 75.88% <ø> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 22 to 30
SECRET_CONFIG_TEMPLATE = """
pipeline_configurations:
aws:
secrets:
source-secret-config:
secret_id: <SOURCE_SECRET_ID_PLACEHOLDER>
region: <AWS_REGION_PLACEHOLDER>
sts_role_arn: <STS_ROLE_ARN_PLACEHOLDER>
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three can be replaced done with standard string interpolation, because the variables are already filled above. This would ensure that changing the variable name only has to happen in one place, or if we move away from these placeholders (to env vars or something else), this doesn't have to be changed. (same applies to lines 35-39)

Suggested change
SECRET_CONFIG_TEMPLATE = """
pipeline_configurations:
aws:
secrets:
source-secret-config:
secret_id: <SOURCE_SECRET_ID_PLACEHOLDER>
region: <AWS_REGION_PLACEHOLDER>
sts_role_arn: <STS_ROLE_ARN_PLACEHOLDER>
"""
SECRET_CONFIG_TEMPLATE = f"""
pipeline_configurations:
aws:
secrets:
source-secret-config:
secret_id: {SOURCE_SECRET_ID_PLACEHOLDER}
region: {AWS_REGION_PLACEHOLDER}
sts_role_arn: {STS_ROLE_ARN_PLACEHOLDER}
"""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I slightly misunderstood how this was being used. I still think an f-string is a better way to do it, but will elaborate below

Comment on lines 236 to 240
secret_config = SECRET_CONFIG_TEMPLATE.replace(SOURCE_SECRET_ID_PLACEHOLDER, source_auth_secret)
secret_config = secret_config.replace(STS_ROLE_ARN_PLACEHOLDER, source_pipeline_role_arn)
secret_config = secret_config.replace(AWS_REGION_PLACEHOLDER, aws_region)
pipeline_config = pipeline_config.replace(AWS_SECRET_CONFIG_PLACEHOLDER, secret_config)
pipeline_config = pipeline_config.replace(SOURCE_AUTH_OPTIONS_PLACEHOLDER, SOURCE_BASIC_AUTH_CONFIG_TEMPLATE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm not sure I understand why you went with this construction method and all the placeholders, over e.g. a generate_secret_config function that takes these variables and returns a secret config string (likely using f-strings internally, but could be whatever you want). Am I missing a reason why that wouldn't work?

I think I would basically make it look something like this:

def generate_secret_config(source_auth_secret, source_pipeline_role_arn, aws_region):
	return f"""
pipeline_configurations:
  aws:
    secrets:
      source-secret-config:
        secret_id: {source_auth_secret}
        region: {aws_region}
        sts_role_arn: {source_pipeline_role}
"""

And then something pretty similar for pipeline config. It looks much more straightforward and testable to me, but maybe I'm missing something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, if you want to "declare" the config/python templates somewhere outside of a function, you can do this sort of thing:

SECRET_CONFIG_FRAGMENT = """
    secrets:
      source-secret-config:
        secret_id: {source_auth_secret}
        region: {aws_region}
        sts_role_arn: {source_pipeline_role}
"""
def generate_secret_config(**kwargs):
	return SECRET_CONFIG_FRAGEMENT.format(**kwargs) # or make a dictionary with the named function params

But either of these seems like a much more canonical python way to do what you're doing "manually"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out and the examples, this went through a few iterations and ended up in a usable but unsavory state. I've refactored this a good bit to not use any placeholders internally but have left placeholders in the actual template file. My assumption is that users may want to add OSI specific settings to the template, and this seems to be the easiest way to allow users to optionally add to the template. As we actually use this though, we may find that this isn't the case and users are better off if we just completely generate the pipeline template file ourselves.

logger = logging.getLogger(__name__)
DEFAULT_PIPELINE_NAME = "migration-assistant-pipeline"
DEFAULT_PIPELINE_CONFIG_PATH = "./osiPipelineTemplate.yaml"
AWS_SECRET_CONFIG_PLACEHOLDER = "<AWS_SECRET_CONFIG_PLACEHOLDER>"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor note, but if we stick with the PLACEHOLDER thing, can you add a blank line to create a distinction between the "real" variables whose value likely matters to the user, and the placeholders that are less relevant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Comment on lines +276 to +278
cw_log_options = {
'IsLoggingEnabled': False
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: right now we won't get any logs from the pipeline?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, from my testing there was issue with OSI both to auto-create log group (didn't seem possible with error), and to provide an existing log-group (was looking for a very specific name, and when matched still produced issue). Need to get these resolved, but that's the current state

# Documentation lists this as a requirement but does not seem to be the case from testing:
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/osis/client/create_pipeline.html
# BufferOptions={
# 'PersistentBufferEnabled': False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which value does it default to? Do we care which one it actually is?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want false since we are a pull based source, but I experienced an issue with deployment essentially saying BufferOptions shouldn't exist, yet its a requirement on the API doc 🤷‍♂️

@@ -34,6 +38,7 @@ RUN chmod ug+x /root/humanReadableLogs.py
RUN chmod ug+x /root/simpleDocumentGenerator.py
RUN chmod ug+x /root/catIndices.sh
RUN chmod ug+x /root/showFetchMigrationCommand.sh
RUN chmod ug+x /root/osiMigration.py
RUN chmod ug+x /root/kafka-tools/kafkaExport.sh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, we should probably start doing all these chmods in one line so we create fewer layers (and maybe the copies too), but I don't think that's an issue for this moment

@mikaylathompson
Copy link
Collaborator

I have some questions about the python implementation, but otherwise LTGM

lewijacn added 2 commits May 1, 2024 16:15
Signed-off-by: Tanner Lewis <[email protected]>
Signed-off-by: Tanner Lewis <[email protected]>
@lewijacn lewijacn merged commit 4beba00 into opensearch-project:main May 1, 2024
9 checks passed
@lewijacn lewijacn deleted the add-osi-migration-console branch October 10, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants