feat(ingest): snowflake using oauth #4647

Merged

Conversation

saxo-lalrishav
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions

github-actions bot commented Apr 12, 2022

Unit Test Results (build & test)

334 tests  ±0   334 ✔️ ±0   3m 8s ⏱️ -7s
  78 suites ±0       0 💤 ±0 
  78 files   ±0       0 ±0 

Results for commit 507de7d. ± Comparison against base commit e62c647.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Apr 12, 2022

Unit Test Results (metadata ingestion)

       5 files  ±  0         5 suites  ±0   1h 41m 52s ⏱️ + 6m 38s
   554 tests +  3     551 ✔️ +  3    3 💤 ±0  0 ±0 
2 547 runs  +15  2 474 ✔️ +15  73 💤 ±0  0 ±0 

Results for commit 507de7d. ± Comparison against base commit e62c647.

♻️ This comment has been updated with latest results.

@maggiehays
Collaborator

Hi @anshbansal please pick up review on this one -- let's see if we can replicate on our Snowflake instance

@anshbansal
Collaborator

anshbansal commented Apr 13, 2022

@saxo-lalrishav A few comments.

I'll try to validate this PR locally after that.

Collaborator

@anshbansal anshbansal left a comment

Needs changes. Please also add docs.

import logging
from typing import Any, Dict, List, Union

import msal
Collaborator

Where is this dependency in setup.py? If we depend on it directly, can we please add it as a direct dependency?

Contributor Author

This is already declared here:

microsoft_common = {"msal==1.16.0"}

Collaborator

@anshbansal anshbansal May 5, 2022

This is added only for powerbi, not for snowflake. This might cause the snowflake connector to fail for people not using powerbi.

    "powerbi": {"orderedset"} | microsoft_common,
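
A minimal sketch of one way to address this, assuming the extras are plain sets combined with `|` as in the line quoted above. Only `microsoft_common` and the `powerbi` entry come from this thread; `snowflake_common` and the `plugins` name are placeholders for whatever the snowflake extra already lists.

```python
# Sketch of a setup.py-style extras table.
microsoft_common = {"msal==1.16.0"}

# Hypothetical stand-in for the snowflake plugin's existing requirements.
snowflake_common = {"snowflake-sqlalchemy"}

plugins = {
    "powerbi": {"orderedset"} | microsoft_common,
    # Including microsoft_common here installs msal with the snowflake extra,
    # so `import msal` does not depend on the powerbi extra being installed.
    "snowflake": snowflake_common | microsoft_common,
}
```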

        return self._get_token(CLIENT_CREDENTIAL, scope, check_cache)

    def get_token_with_secret(
        self, secret: str, scope: Union[str, List[str]], check_cache: bool = False
Collaborator

Can we please change scope to be List[str] only, and make the corresponding changes? I don't see why it needs to be a Union of both. Wherever we pass it, we can pass a List[str].

The name scopes might represent that better.
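
A rough sketch of what the narrowed signature could look like. `TokenProviderSketch` is a hypothetical stand-in for the helper in this PR: it makes no network calls and only shows the `scopes: List[str]` parameter and how a caller with a single scope would pass it.

```python
from typing import Dict, List


class TokenProviderSketch:
    """Hypothetical stand-in for the PR's token helper; makes no network calls."""

    def get_token_with_secret(
        self, secret: str, scopes: List[str], check_cache: bool = False
    ) -> Dict[str, object]:
        # With a single accepted type there is no str-vs-list branching here.
        if not scopes:
            raise ValueError("scopes must contain at least one entry")
        return {"scopes": list(scopes), "check_cache": check_cache}


# A caller that has one scope wraps it in a list at the call site.
single_scope = "https://microsoft.com/f4b353d5-ef8d/.default"
result = TokenProviderSketch().get_token_with_secret(
    "not-a-real-secret", [single_scope]
)
```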

Contributor Author

done

        return token

    def get_public_certificate_thumbprint(self, public_cert_str: str) -> str:
        cert_str = public_cert_str
Collaborator

Why is this here?

Contributor Author

The get_public_certificate_thumbprint function fetches the thumbprint of the certificate, which the msal library later uses to generate the token if use_certificate is true.
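
For readers unfamiliar with the term, a certificate thumbprint is commonly the hex SHA-1 digest of the certificate's DER bytes, and msal's certificate credential takes it alongside the private key. The sketch below computes it with the standard library only. It is not necessarily how this PR does it, and `certificate_thumbprint` and the placeholder PEM are illustrative names and data, not a real certificate.

```python
import hashlib
import ssl


def certificate_thumbprint(public_cert_pem: str) -> str:
    """Return the hex SHA-1 digest of the DER bytes of a PEM certificate."""
    # ssl.PEM_cert_to_DER_cert strips the BEGIN/END CERTIFICATE armor and
    # base64-decodes the body; it does not validate the certificate itself.
    der_bytes = ssl.PEM_cert_to_DER_cert(public_cert_pem.strip())
    return hashlib.sha1(der_bytes).hexdigest()


# Placeholder PEM whose body is just base64 for b"hello".
placeholder_pem = (
    "-----BEGIN CERTIFICATE-----\naGVsbG8=\n-----END CERTIFICATE-----\n"
)
thumbprint = certificate_thumbprint(placeholder_pem)
```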

@@ -69,6 +73,13 @@ class BaseSnowflakeConfig(BaseTimeWindowConfig):

    scheme: str = "snowflake"
    username: Optional[str] = None
    use_certificate: Optional[bool] = False
Collaborator

Should we extract all these oauth settings into their own object? Currently it is hard to know what is required. Three of them are not optional, but with everything declared at the top level it is hard to see which ones those are.
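
A sketch of the grouping idea using a plain dataclass so it runs without extra dependencies. The field names are taken from the configs quoted elsewhere in this thread; the class name `OauthConfigSketch` and the split between required and optional fields are assumptions, not the PR's final model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OauthConfigSketch:
    # Assumed required: no default, so omitting one fails at construction.
    provider: str
    client_id: str
    scopes: List[str]
    authority_url: str
    # Assumed optional: which of these is needed depends on use_certificate.
    use_certificate: bool = False
    client_secret: Optional[str] = None
    encoded_oauth_public_key: Optional[str] = None
    encoded_oauth_private_key: Optional[str] = None


oauth_config = OauthConfigSketch(
    provider="microsoft",
    client_id="882e9831-7ea51cb2b954",
    scopes=["https://microsoft.com/f4b353d5-ef8d/.default"],
    authority_url="https://login.microsoftonline.com/yourorganisation.com",
    client_secret="6Hb9apkbc6HD7",
)
```

With the settings grouped this way, the required fields are visible from the declaration alone, and the top-level config only needs a single optional oauth_config field.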

Contributor Author

done

if values.get("base64_encoded_oauth_private_key") is None:
    raise ValueError(
        f"'base64_encoded_oauth_private_key' was none "
        f"but should be set when using {v} authentication"
Collaborator

This message needs to be changed to say that this field is needed when use_certificate is true.

if values.get("base64_encoded_oauth_public_key") is None:
    raise ValueError(
        f"'base64_encoded_oauth_public_key' was none "
        f"but should be set when using {v} authentication"
Collaborator

This message needs to be changed to say that this field is needed when use_certificate is true.

if values.get("oauth_client_secret") is None:
    raise ValueError(
        f"'oauth_client_secret' was none "
        f"but should be set when using {v} authentication"
Collaborator

This message needs to be changed to say that this field is needed when use_certificate is true.
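
One possible shape for the reworded checks, as a standalone sketch rather than the pydantic validator in the diff. `check_oauth_fields` is a hypothetical name, and it assumes the two keys are needed when use_certificate is true while the client secret is the fallback when it is false; that split is an assumption and should be adjusted if the intended rule differs.

```python
from typing import Any, Dict


def check_oauth_fields(values: Dict[str, Any]) -> None:
    """Raise ValueError naming the missing field and the use_certificate setting."""
    if values.get("use_certificate"):
        for key in (
            "base64_encoded_oauth_public_key",
            "base64_encoded_oauth_private_key",
        ):
            if values.get(key) is None:
                raise ValueError(
                    f"'{key}' was none "
                    f"but should be set when use_certificate is true"
                )
    elif values.get("oauth_client_secret") is None:
        # Assumed rule: the secret is only required on the non-certificate path.
        raise ValueError(
            "'oauth_client_secret' was none "
            "but should be set when use_certificate is false"
        )
```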

@anshbansal anshbansal changed the title from "snowflake using oauth" to "feat(ingest): snowflake using oauth" Apr 26, 2022
@anshbansal
Collaborator

@saxo-lalrishav Is this something that you are still working on?

@saxo-lalrishav
Contributor Author

@anshbansal please take a look at this and provide your feedback.

@anshbansal
Collaborator

#4647 (comment): this comment is still open.
Please update the description from "oauth configuration" to include the snowflake doc where the details are present.
Please merge the latest master so that linting passes.
And please add an example configuration, at least in the PR, so we know what the whole config looks like for snowflake with microsoft. Better still would be to add unit tests for the big validator that you have added.

@anshbansal
Collaborator

Hi @saxo-lalrishav Are you still working on this PR? We can merge it after the changes requested in last comment.

@saxo-lalrishav
Contributor Author

> Hi @saxo-lalrishav Are you still working on this PR? We can merge it after the changes requested in last comment.

Yeah, looking into this

@anshbansal
Collaborator

For the lint failures, I suggest re-creating your local venv from scratch so that you have the latest libraries.

SnowflakeConfig.parse_obj(
    {
        "authentication_type": "OAUTH_AUTHENTICATOR",
        "oauth_config":
Collaborator

@anshbansal anshbansal May 24, 2022

This needs to be a Python dictionary; this is not a YAML file. The same applies to all the tests added. So something like:

"oauth_config": {
    "provider": "microsoft",
    "scopes": "[https://microsoft.com/f4b353d5-ef8d/.default]",
    "client_secret": "6Hb9apkbc6HD7",
    "authority_url": "https://login.microsoftonline.com/yourorganisation.com",
}

This is the reason linting is failing.

Contributor Author

Oh sorry, I got it. Fixing it.

Collaborator

@anshbansal anshbansal left a comment

This looks like a bug caught by the unit tests. Are you running these unit tests locally while testing? Please see https://datahubproject.io/docs/metadata-ingestion/developing/#sanity-check-code-before-committing.

>               if values.get("oauth_config").provider is None:
E               AttributeError: 'NoneType' object has no attribute 'provider'

src/datahub/ingestion/source_config/sql/snowflake.py:192: AttributeError
_ test_snwoflake_throws_error_on_encoded_oauth_private_key_missing_if_use_certificate_is_true _

    def test_snwoflake_throws_error_on_encoded_oauth_private_key_missing_if_use_certificate_is_true():
        with pytest.raises(ConfigurationError):
            SnowflakeConfig.parse_obj(
                {
                    "authentication_type": "OAUTH_AUTHENTICATOR",
                    "oauth_config": {
                        "client_id": "882e9831-7ea51cb2b954",
                        "provider": "microsoft",
                        "scopes": "[https://microsoft.com/f4b353d5-ef8d/.default]",
                        "use_certificate": True,
                        "authority_url": "https://login.microsoftonline.com/yourorganisation.com",
>                       "encoded_oauth_public_key": "fkdsfhkshfkjsdfiuwrwfkjhsfskfhksjf==",
                    },
                }
            )
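
A sketch of the kind of guard that would avoid the AttributeError above: check that the nested config is present before reading its attributes, whatever the reason it came through as None. `check_oauth_config` is a hypothetical standalone function, not the validator in snowflake.py, and the error messages are illustrative.

```python
from types import SimpleNamespace
from typing import Any, Dict


def check_oauth_config(values: Dict[str, Any]) -> None:
    oauth_config = values.get("oauth_config")
    # Guard first: values.get(...) returns None when the key is absent,
    # and None has no .provider attribute.
    if oauth_config is None:
        raise ValueError(
            "'oauth_config' is none but should be set "
            "when using OAUTH_AUTHENTICATOR authentication"
        )
    if oauth_config.provider is None:
        raise ValueError(
            "'oauth_config.provider' is none but should be set "
            "when using OAUTH_AUTHENTICATOR authentication"
        )


# Passes: the nested config is present and has a provider.
check_oauth_config({"oauth_config": SimpleNamespace(provider="microsoft")})
```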

Collaborator

@anshbansal anshbansal left a comment

LGTM

@anshbansal anshbansal merged commit 4f82e29 into datahub-project:master Jun 6, 2022
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022