Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RM-34-upgrade-syntax-to-support-airflow-2-0 #193

Merged
merged 50 commits into from
Jan 20, 2023
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
79906fb
upgrade airflow version
ryantimjohn Jan 6, 2023
1ab1506
update airflow, add backport provide
ryantimjohn Jan 9, 2023
6e01812
remove first yield-based test
ryantimjohn Jan 9, 2023
182500f
Merge branch 'RM-34-upgrade-syntax-to-support-airflow-2-0' of https:/…
ryantimjohn Jan 9, 2023
9e9dd8f
remove sqlalchemy bounds
Jan 10, 2023
8e2e9cb
add back airflow providers amazon
Jan 10, 2023
f9846c5
add airflow providers google
Jan 10, 2023
aa93c01
update to new GoogleBaseHook syntax
ryantimjohn Jan 10, 2023
dd39762
update to base_aws hook syntax
ryantimjohn Jan 10, 2023
0e5da80
update patch to base_aws
ryantimjohn Jan 10, 2023
6c2944e
update to new BaseHook import location
ryantimjohn Jan 10, 2023
4ef3156
update to new BaseHook syntax
ryantimjohn Jan 10, 2023
4f257a3
update patches to support new BaseHook import
ryantimjohn Jan 10, 2023
aace5b2
fix typo
ryantimjohn Jan 10, 2023
3be0b11
update to modern pands datatypes
Jan 11, 2023
2a9a94e
RM-34 change integer nullable to parametrize
ryantimjohn Jan 11, 2023
3353f3a
RM-34 update decimal float test to parametrize
ryantimjohn Jan 11, 2023
400b3f4
RM-34 update dtype misc to parametrize
ryantimjohn Jan 11, 2023
a7e363e
RM-34 remove commented code
ryantimjohn Jan 11, 2023
5e81a66
RM-34 add autopep8 changes
ryantimjohn Jan 11, 2023
b80c9b2
ratchet coverage
ryantimjohn Jan 11, 2023
3d45149
RM-34 ratchet bigfiles high watermark
ryantimjohn Jan 11, 2023
3c3e344
RM-34 ignore override
Jan 12, 2023
f0a6bfd
RM-34 ignore signature incompatible gcp cred hook
Jan 12, 2023
b7162fa
RM-34 update to AwsBaseHook syntax
ryantimjohn Jan 12, 2023
4729be0
RM-34 update to AwsBaseHook syntax
ryantimjohn Jan 12, 2023
b0c3af6
RM-34 update to base_aws syntax
ryantimjohn Jan 12, 2023
642e47e
RM-34 revert patch update
ryantimjohn Jan 12, 2023
ce28cd0
RM-34 remove annotation-unchecked typecheck
Jan 12, 2023
772d2a1
RM-34 fix flake8 violations
ryantimjohn Jan 12, 2023
3d36761
RM-34 add noqa flag for Flake8 comment
ryantimjohn Jan 12, 2023
71212b5
RM-34 correct redshift dtypes
Jan 13, 2023
56dc630
RM-34 update to DOUBLE_PRECISION
Jan 13, 2023
e6c223a
RM-34 update expected_df redshift remove w/o timezone
Jan 13, 2023
863b422
RM-34 update postgres to DOUBLE_PRECISION
ryantimjohn Jan 14, 2023
69dcecb
RM-34 postgres singledb update expected column
ryantimjohn Jan 14, 2023
61accce
RM-34 update postgres expected df column types
ryantimjohn Jan 14, 2023
ee3a5ad
RM-34 match column widths exp column types
ryantimjohn Jan 14, 2023
e85f0d1
RM-34 Remove trailing whitespace expected colums
ryantimjohn Jan 14, 2023
3de1488
Merge branch 'master' into RM-34-upgrade-syntax-to-support-airflow-2-0
ryantimjohn Jan 14, 2023
95fbaad
RM-34 reapply minimum version to sqlalchemy
ryantimjohn Jan 17, 2023
292f4e1
RM-34 uncomment BQ tests
ryantimjohn Jan 18, 2023
cbc3cbe
RM-34 dump redshift
ryantimjohn Jan 18, 2023
fe71037
Merge branch 'master' into RM-34-upgrade-syntax-to-support-airflow-2-0
ryantimjohn Jan 19, 2023
443819f
RM-34 readd requires redshift
ryantimjohn Jan 19, 2023
f1f8472
RM-34 reconfigure import
ryantimjohn Jan 19, 2023
6ff71f2
RM-34 remove type ignore
ryantimjohn Jan 19, 2023
cb6bcca
RM-34 reduce mypy coverage
ryantimjohn Jan 19, 2023
efb0c0c
RM-34 fix autopep8 errors
ryantimjohn Jan 19, 2023
acea0ed
Merge branch 'master' into RM-34-upgrade-syntax-to-support-airflow-2-0
ryantimjohn Jan 20, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion metrics/bigfiles_high_water_mark
Original file line number Diff line number Diff line change
@@ -1 +1 @@
1138
1131
2 changes: 1 addition & 1 deletion metrics/coverage_high_water_mark
Original file line number Diff line number Diff line change
@@ -1 +1 @@
93.0000
93.6400
6 changes: 3 additions & 3 deletions records_mover/airflow/hooks/google_cloud_credentials_hook.py
Original file line number Diff line number Diff line change
@@ -1,15 +1,15 @@
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
from typing import Iterable, Optional, TYPE_CHECKING
if TYPE_CHECKING:
# see the 'gsheets' extras_require option in setup.py - needed for this!
import google.auth.credentials # noqa


class GoogleCloudCredentialsHook(GoogleCloudBaseHook):
class GoogleCloudCredentialsHook(GoogleBaseHook):
def get_conn(self) -> 'google.auth.credentials.Credentials':
return self._get_credentials()

def scopes(self) -> Iterable[str]:
def scopes(self) -> Iterable[str]: # type: ignore
ryantimjohn marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried several things here, firstly looked at this:
https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides
And this:
https://airflow.apache.org/docs/apache-airflow-providers-google/6.8.0/_modules/airflow/providers/google/common/hooks/base_google.html#GoogleBaseHook.scopes

Tried to convert the return to a Sequence with no luck, then tried to make the syntax more similar to scopes in GoogleBaseHook to no avail.

Open to suggestions here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird - definitely looks like it's returning something Iterable, whats the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

records_mover/airflow/hooks/google_cloud_credentials_hook.py:12: error: Signature of "scopes" incompatible with supertype "GoogleBaseHook" [override]

scope: Optional[str] = self._get_field('scope', None)
scopes: Iterable[str]
if scope is not None:
Expand Down
6 changes: 3 additions & 3 deletions records_mover/airflow/hooks/records_hook.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
from records_mover.db.factory import db_driver
from records_mover.db import DBDriver
from records_mover.url.resolver import UrlResolver
from airflow.contrib.hooks.aws_hook import AwsHook
from airflow.providers.amazon.aws.hooks import base_aws
from typing import Optional, Union, List, TYPE_CHECKING
import sqlalchemy

Expand All @@ -14,7 +14,7 @@
from airflow.hooks import BaseHook
except ImportError:
# Required for Airflow 2.0
from airflow.hooks.base_hook import BaseHook # type: ignore
from airflow.hooks.base import BaseHook # type: ignore

if TYPE_CHECKING:
from boto3.session import ListObjectsResponseContentType, S3ClientTypeStub # noqa
Expand All @@ -41,7 +41,7 @@ def __init__(self,

def _get_boto3_session(self) -> boto3.session.Session:
if not self._boto3_session:
self._boto3_session = AwsHook(self.aws_conn_id).get_session()
self._boto3_session = base_aws(self.aws_conn_id).get_session()
return self._boto3_session

@property
Expand Down
2 changes: 1 addition & 1 deletion records_mover/airflow/hooks/sqlalchemy_db_hook.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
from airflow.hooks import BaseHook
except ImportError:
# Required for Airflow 2.0
from airflow.hooks.base_hook import BaseHook # type: ignore
from airflow.hooks.base import BaseHook # type: ignore
ryantimjohn marked this conversation as resolved.
Show resolved Hide resolved


class SqlAlchemyDbHook(BaseHook):
Expand Down
6 changes: 3 additions & 3 deletions records_mover/creds/creds_via_airflow.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,12 @@

class CredsViaAirflow(BaseCreds):
def boto3_session(self, aws_creds_name: str) -> 'boto3.session.Session':
from airflow.contrib.hooks.aws_hook import AwsHook
aws_hook = AwsHook(aws_creds_name)
from airflow.providers.amazon.aws.hooks import base_aws
aws_hook = base_aws(aws_creds_name)
return aws_hook.get_session()

def db_facts(self, db_creds_name: str) -> DBFacts:
from airflow.hooks import BaseHook
from airflow.hooks.base import BaseHook
conn = BaseHook.get_connection(db_creds_name)
out: DBFacts = {}

Expand Down
15 changes: 4 additions & 11 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -137,20 +137,13 @@ def initialize_options(self) -> None:
)

airflow_dependencies = [
# Minimum version here is needed to avoid syntax error in setup.py
# in 1.10.0
'apache-airflow>=1.10.1,<2'
'apache-airflow>=2',
ryantimjohn marked this conversation as resolved.
Show resolved Hide resolved
'apache-airflow-providers-amazon',
'apache-airflow-providers-google',
]

db_dependencies = [
# Lower bound (>=1.3.18) is to improve package resolution performance
#
# Upper bound (<1.4) is to avoid 1.4 which has breaking changes and is
# incompatible with python-bigquery-sqlalchemy per
# https://github.com/googleapis/python-bigquery-sqlalchemy/issues/83
# Can lift this once records-mover itself is compatible and
# other packages have appropriate restrictions in place.
'sqlalchemy>=1.3.18,<1.4',
'sqlalchemy',
ryantimjohn marked this conversation as resolved.
Show resolved Hide resolved
]

smart_open_dependencies = [
Expand Down
62 changes: 34 additions & 28 deletions tests/component/records/schema/field/test_dtype.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
)
import numpy as np
import pandas as pd
import pytest


def with_nullable(nullable: bool, fn):
Expand All @@ -31,29 +32,30 @@ def check_dtype(field_type, constraints, expectation):
assert out.dtype == expectation


def test_to_pandas_dtype_integer_no_nullable():
class Test_to_pandas_dtype_integer_no_nullable:
ryantimjohn marked this conversation as resolved.
Show resolved Hide resolved
expectations = {
(-100, 100): np.int8,
(0, 240): np.uint8,
(-10000, 10000): np.int16,
(500, 40000): np.uint16,
(-200000000, 200000000): np.int32,
(25, 4000000000): np.uint32,
(-9000000000000000000, 2000000000): np.int64,
(25, 10000000000000000000): np.uint64,
(-100, 100): pd.Int8Dtype(),
(0, 240): pd.UInt8Dtype(),
(-10000, 10000): pd.Int16Dtype(),
(500, 40000): pd.UInt16Dtype(),
(-200000000, 200000000): pd.Int32Dtype(),
(25, 4000000000): pd.UInt32Dtype(),
(-9000000000000000000, 2000000000): pd.Int64Dtype(),
(25, 10000000000000000000): pd.UInt64Dtype(),
(25, 1000000000000000000000000000): np.float128,
(None, None): np.int64,
(None, None): pd.Int64Dtype(),
}
for (min_, max_), expected_pandas_type in expectations.items():

@pytest.mark.parametrize("expectation", expectations.items())
def test_to_pandas_dtype_integer_no_nullable(self, expectation):
(min_, max_), expected_pandas_type = expectation
constraints = RecordsSchemaFieldIntegerConstraints(
required=True, unique=None, min_=min_, max_=max_
)
yield with_nullable(
False, check_dtype
), "integer", constraints, expected_pandas_type
with_nullable(False, check_dtype("integer", constraints, expected_pandas_type))


def test_to_pandas_dtype_integer_nullable():
class Test_to_pandas_dtype_integer_nullable:
ryantimjohn marked this conversation as resolved.
Show resolved Hide resolved
expectations = {
(-100, 100): pd.Int8Dtype(),
(0, 240): pd.UInt8Dtype(),
Expand All @@ -66,16 +68,17 @@ def test_to_pandas_dtype_integer_nullable():
(25, 1000000000000000000000000000): np.float128,
(None, None): pd.Int64Dtype(),
}
for (min_, max_), expected_pandas_type in expectations.items():

@pytest.mark.parametrize("expectation", expectations.items())
def test_to_pandas_dtype_integer_nullable(self, expectation):
(min_, max_), expected_pandas_type = expectation
constraints = RecordsSchemaFieldIntegerConstraints(
required=True, unique=None, min_=min_, max_=max_
)
yield with_nullable(
True, check_dtype
), "integer", constraints, expected_pandas_type
with_nullable(True, check_dtype("integer", constraints, expected_pandas_type))


def test_to_pandas_dtype_decimal_float():
class Test_to_pandas_dtype_decimal_float():
expectations = {
(8, 4): np.float16,
(20, 10): np.float32,
Expand All @@ -84,10 +87,10 @@ def test_to_pandas_dtype_decimal_float():
(500, 250): np.float128,
(None, None): np.float64,
}
for (
fp_total_bits,
fp_significand_bits,
), expected_pandas_type in expectations.items():

@pytest.mark.parametrize("expectation", expectations.items())
def test_to_pandas_dtype_Tdecimal_float(self, expectation):
(fp_total_bits, fp_significand_bits), expected_pandas_type = expectation
constraints = RecordsSchemaFieldDecimalConstraints(
required=False,
unique=None,
Expand All @@ -96,10 +99,10 @@ def test_to_pandas_dtype_decimal_float():
fp_total_bits=fp_total_bits,
fp_significand_bits=fp_significand_bits,
)
yield check_dtype, "decimal", constraints, expected_pandas_type
check_dtype("decimal", constraints, expected_pandas_type)


def test_to_pandas_dtype_misc():
class Test_to_pandas_dtype_misc():
expectations = {
"boolean": np.bool_,
"string": np.object_,
Expand All @@ -108,8 +111,11 @@ def test_to_pandas_dtype_misc():
"datetimetz": "datetime64[ns, UTC]",
"time": np.object_,
}
for field_type, expected_pandas_type in expectations.items():
yield check_dtype, field_type, None, expected_pandas_type

@pytest.mark.parametrize("expectation", expectations.items())
def test_to_pandas_dtype_misc(self, expectation):
field_type, expected_pandas_type = expectation
check_dtype(field_type, None, expected_pandas_type)


def test_to_pandas_dtype_fixed_precision_():
Expand Down
8 changes: 4 additions & 4 deletions tests/unit/airflow/hooks/test_records_hook.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,9 @@ def setUp(self):

@patch('records_mover.airflow.hooks.records_hook.UrlResolver')
@patch('records_mover.airflow.hooks.records_hook.Records')
@patch('records_mover.airflow.hooks.records_hook.AwsHook')
@patch('records_mover.airflow.hooks.records_hook.base_aws')
def test_get_conn(self,
mock_AwsHook,
mock_base_aws,
mock_Records,
mock_UrlResolver):
conn = self.records_hook.get_conn()
Expand All @@ -24,11 +24,11 @@ def test_get_conn(self,

@patch('records_mover.airflow.hooks.records_hook.UrlResolver')
@patch('records_mover.airflow.hooks.records_hook.Records')
@patch('records_mover.airflow.hooks.records_hook.AwsHook')
@patch('records_mover.airflow.hooks.records_hook.base_aws')
@patch('records_mover.airflow.hooks.records_hook.db_driver')
def test_get_conn_invalid_s3_url(self,
mock_db_driver,
mock_AwsHook,
mock_base_aws,
mock_Records,
mock_UrlResolver):
records_hook = RecordsHook(s3_temp_base_url='foo',
Expand Down
4 changes: 2 additions & 2 deletions tests/unit/airflow/test_google_cloud_credentials_hook.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
from records_mover.airflow.hooks.google_cloud_credentials_hook import GoogleCloudCredentialsHook
from mock import Mock
import unittest
Expand All @@ -7,7 +7,7 @@
class TestGoogleCloudCredentialsHook(unittest.TestCase):
def test_get_conn(self):
mock_init = Mock('__init__')
GoogleCloudBaseHook.__init__ = mock_init
GoogleBaseHook.__init__ = mock_init
mock_init.return_value = None
hook = GoogleCloudCredentialsHook()
mock_get_credentials = Mock('get_credentials')
Expand Down
6 changes: 3 additions & 3 deletions tests/unit/airflow/test_records_hook.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,11 @@


class TestRecordsHook(unittest.TestCase):
@patch('records_mover.airflow.hooks.records_hook.AwsHook')
@patch('records_mover.airflow.hooks.records_hook.base_aws')
def test_validate_and_prepare_target_directory(self,
mock_AwsHook):
mock_base_aws):
target_url = 's3://bluelabs-fake-bucket'
mock_boto3_session = mock_AwsHook.return_value.get_session.return_value
mock_boto3_session = mock_base_aws.return_value.get_session.return_value
mock_s3 = mock_boto3_session.client.return_value
mock_s3.list_objects_v2.return_value.get.return_value =\
[{
Expand Down
12 changes: 6 additions & 6 deletions tests/unit/creds/test_creds_via_airflow.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,14 @@ def setUp(self):
default_aws_creds_name=None,
default_gcp_creds_name=None)

@patch('airflow.contrib.hooks.aws_hook.AwsHook')
def test_boto3_session(self, mock_AwsHook):
@patch('airflow.providers.amazon.aws.hooks.base_aws')
def test_boto3_session(self, mock_base_aws):
mock_aws_creds_name = Mock(name='aws_creds_name')
out = self.creds_via_airflow.boto3_session(mock_aws_creds_name)
mock_AwsHook.assert_called_with(mock_aws_creds_name)
self.assertEqual(mock_AwsHook.return_value.get_session.return_value, out)
mock_base_aws.assert_called_with(mock_aws_creds_name)
self.assertEqual(mock_base_aws.return_value.get_session.return_value, out)

@patch('airflow.hooks.BaseHook')
@patch('airflow.hooks.base.BaseHook')
def test_db_facts_normcoredb(self, mock_BaseHook):
mock_db_creds_name = Mock(name='db_creds_name')
mock_conn = mock_BaseHook.get_connection.return_value
Expand All @@ -37,7 +37,7 @@ def test_db_facts_normcoredb(self, mock_BaseHook):
}
self.assertEqual(expected_db_facts, out)

@patch('airflow.hooks.BaseHook')
@patch('airflow.hooks.base.BaseHook')
def test_db_facts_bigquery_serviceaccount(self, mock_BaseHook):
mock_db_creds_name = Mock(name='db_creds_name')
mock_conn = mock_BaseHook.get_connection.return_value
Expand Down
4 changes: 2 additions & 2 deletions tests/unit/test_records_hook.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,9 @@ class TestRecordsHook(unittest.TestCase):
@patch('records_mover.airflow.hooks.records_hook.Records')
@patch('records_mover.airflow.hooks.records_hook.db_driver')
@patch('records_mover.airflow.hooks.records_hook.UrlResolver')
@patch('records_mover.airflow.hooks.records_hook.AwsHook')
@patch('records_mover.airflow.hooks.records_hook.base_aws')
def test_get_conn(self,
mock_AwsHook,
mock_base_aws,
mock_UrlResolver,
mock_db_driver,
mock_Records):
Expand Down
4 changes: 2 additions & 2 deletions types/stubs/airflow/contrib/hooks/aws_hook/__init__.pyi
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
import boto3
from typing import Optional
from airflow.hooks import BaseHook
from airflow.hooks.base import BaseHook


# https://github.com/apache/airflow/blob/master/airflow/contrib/hooks/aws_hook.py
class AwsHook(BaseHook):
class base_aws(BaseHook):
def __init__(self, conn_id: str):
...

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ if TYPE_CHECKING:
import google.auth.credentials # noqa


class GoogleCloudBaseHook:
class GoogleBaseHook:
def __init__(self, gcp_conn_id: str):
...

Expand Down
2 changes: 1 addition & 1 deletion types/stubs/logging/config.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ class ConvertingDict(dict, ConvertingMixin):

class ConvertingList(list, ConvertingMixin):
def __getitem__(self, key: Any): ...
def pop(self, idx: int = ...): ...
def pop(self, idx: int = ...): ... # type: ignore[override]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types/stubs/logging/config.pyi:31: error: Argument 1 of "pop" is incompatible with supertype "list"; supertype defines the argument type as "SupportsIndex"  [override]
types/stubs/logging/config.pyi:31: note: This violates the Liskov substitution principle
types/stubs/logging/config.pyi:31: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

I'm not actually sure we should ignore the override error here:
https://clifford.readthedocs.io/en/v1.3.0/_modules/typing.html
https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

It seems to me that there's no reason we should be overriding pop here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe my search wasn't accurate but I can't find anywhere ConvertingList is even referenced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've come across things in stubs that are that way. Maybe it's a place Vince used to define interface without actually implementing it?



class ConvertingTuple(tuple, ConvertingMixin):
Expand Down