Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingest): support for env variable in cli #3215

Merged

Conversation

aseembansal-gogo
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Copy link
Contributor

@gabe-lyons gabe-lyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @aseembansal-gogo - the good news is env variables are already supported for gms token via gms_token.format(**os.environ):

https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/cli/cli_utils.py#L77

See: https://stackoverflow.com/questions/386934/how-to-evaluate-environment-variables-into-a-string-in-python

If you'd like to also access GMS_HOST via env variable, you could extend the logic to where the gms host is parsed as well.

@aseembansal-gogo
Copy link
Contributor Author

@gabe-lyons Did not notice that.

The approach I have taken with env variable is similar to what AWS cli uses. ENV variables are usually used when people don't want to use config files.

The problem with current approach seems to be that it requires interaction and a config file. When using cli in an automation tool (e.g. on jenkins) we cannot keep on interactively entering the config again and again. We create K8s pods to run each of our jenkins job and then the pods are killed. The config file is never persisted. The current approach seems to be assuming persistence of the config. Can you please suggest if I do not want to interactively enter the details then how would I use the gms_token env variable in the current approach?

Copy link
Contributor

@gabe-lyons gabe-lyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that makes sense!

return None, None


def get_details_from_env() -> Tuple[Optional[str], Optional[str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this would be simpler to model as a Dict[str, str] being returned with keys being GMS_HOST and GMS_TOKEN .. then you don't need the first-non-null checks later on.. you can just pick the value of these variables from the Dict directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still need to use the Optional[str] in Dict because it is quite possible one of the variables was set and not the other. We cannot assume user will either provide both of the variables or none. The first non-null check is for that purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally not a fan of using Tuples .. because there is implied order here (first element is host, second element is token) and can lead to bugs later on if someone misunderstands the order ... with a dictionary like "host" -> value, and "token" -> value, it is harder to make those mistakes.

But I see that the rest of the code in this file has similar things. So I'll let this in and we can do a pass over these later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the implied order. Had not considered from that point.

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shirshanka shirshanka merged commit 97bed71 into datahub-project:master Sep 17, 2021
@aseembansal-gogo aseembansal-gogo deleted the cli_env_var_support branch September 17, 2021 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants