TDL-14058: Update SDK and api version #105

Merged Feb 10, 2022 · 41 commits · changes shown from 24 commits
bfb9aaf
Updated SDK and API version
savan-chovatiya Dec 17, 2021
e943539
Added expand parameter for more streams
savan-chovatiya Dec 17, 2021
4f8540e
Updated replication key for invoices
savan-chovatiya Dec 17, 2021
9f7f9bb
added backward compability for invoice bookmark
savan-chovatiya Dec 20, 2021
5746fc9
Added unit test and code coverage report in circleci
savan-chovatiya Dec 22, 2021
0f84a64
Resolved unit test failure
savan-chovatiya Dec 22, 2021
9f56ba2
Updated stripe SDK to latest version
savan-chovatiya Dec 22, 2021
f23a0c3
Running all test for testing
savan-chovatiya Dec 27, 2021
ab4af98
updated stripe version in config.yml
savan-chovatiya Dec 27, 2021
566afcc
Running all test for testing
savan-chovatiya Dec 27, 2021
dc137c7
Running all test for testing
savan-chovatiya Dec 27, 2021
23b02f0
Running all test for testing
savan-chovatiya Dec 28, 2021
016c0c4
Updated base class of tap_tester to handle replication key change of …
savan-chovatiya Dec 28, 2021
cd40d52
Merge branch 'master' of https://github.com/singer-io/tap-stripe into…
savan-chovatiya Dec 29, 2021
a284115
Reverted back config.yml to run dependant tests
savan-chovatiya Dec 29, 2021
e797ca9
Reverted back config.yml to run dependant tests
savan-chovatiya Dec 29, 2021
d676147
Updated all_field test case
prijendev Dec 31, 2021
f005a9e
Added back payout stream
prijendev Dec 31, 2021
84b4095
Added comments in test case
prijendev Dec 31, 2021
d111788
Fixed key in bookmark for invoice_line_items
savan-chovatiya Dec 31, 2021
befe058
Fixed key in bookmark for invoice_line_items in get_bookmark
savan-chovatiya Dec 31, 2021
98d9d68
Fixed charge stream for all_field test case
prijendev Jan 3, 2022
6895c02
Fixed payouts stream for all_field test case
prijendev Jan 3, 2022
6d4dfcc
Updated amount value
prijendev Jan 3, 2022
260109b
Resolved circleci error
prijendev Jan 3, 2022
8789910
Fiexed all_field test case…
prijendev Jan 4, 2022
5b4d8f6
Fix the build_daily workflow definition
prijendev Jan 6, 2022
ad1e0ee
Resolved review comment
savan-chovatiya Jan 18, 2022
6f6cd9f
Updated all_field test case
prijendev Jan 18, 2022
098032b
Updated test case for customer and invoice_line_item stream
prijendev Jan 19, 2022
7419e7a
Updated util.py
prijendev Jan 19, 2022
a0a3924
Removed fields from missing fields
prijendev Jan 19, 2022
1223197
fixed subscription_items stream
namrata270998 Jan 19, 2022
f6b96fc
Resolved review comment and added function for get and write bookmark
savan-chovatiya Jan 20, 2022
44abdf8
Updated code comment
savan-chovatiya Jan 20, 2022
95a37d9
Tdl 5894 update invoices stream (#115)
namrata270998 Feb 10, 2022
e2a6fbc
Tdl 6026 Modify 'customer' stream on 'test_all_fields.py' to pass (#109)
prijendev Feb 10, 2022
f26ab5c
Tdl 6597 Modify 'product' stream on 'test_all_fields.py' to pass (#108)
prijendev Feb 10, 2022
23c49c8
TDL-6587 Modify 'invoice_line_items' stream on 'test_all_fields.py' t…
prijendev Feb 10, 2022
2ce41eb
TDL-5992 modify subscription items stream (#114)
namrata270998 Feb 10, 2022
f013574
TDL-17429 Revert back tiers field datatype conversion (#117)
prijendev Feb 10, 2022
5 changes: 3 additions & 2 deletions .circleci/config.yml
@@ -71,7 +71,8 @@ jobs:
name: 'Unit Tests'
command: |
source /usr/local/share/virtualenvs/tap-stripe/bin/activate
nosetests tests/unittests
nosetests --with-coverage --cover-erase --cover-package=tap_stripe --cover-html-dir=htmlcov tests/unittests
coverage html
run_integration_test:
parameters:
file:
@@ -91,7 +92,7 @@ jobs:
source /usr/local/share/virtualenvs/tap-stripe/bin/activate
source /usr/local/share/virtualenvs/tap-tester/bin/activate
source /usr/local/share/virtualenvs/dev_env.sh
pip install 'stripe==2.42.0'
pip install 'stripe==2.64.0'
run-test --tap=${CIRCLE_PROJECT_REPONAME} tests/test_<< parameters.file >>.py
- slack/notify-on-failure:
only_for_branches: master
5 changes: 3 additions & 2 deletions setup.py
@@ -11,12 +11,13 @@
py_modules=["tap_stripe"],
install_requires=[
"singer-python==5.5.1",
"stripe==2.10.1",
"stripe==2.64.0",
],
extras_require={
'test': [
'pylint==2.7.2',
'nose==1.3.7'
'nose==1.3.7',
'coverage'
],
'dev': [
'ipdb',
82 changes: 65 additions & 17 deletions tap_stripe/__init__.py
@@ -46,7 +46,7 @@
'events': 'created',
'customers': 'created',
'plans': 'created',
'invoices': 'date',
'invoices': 'created',
'invoice_items': 'date',
'transfers': 'created',
'coupons': 'created',
@@ -80,6 +80,17 @@
# payouts - these are called transfers with an event type of payout.*
}

# Some fields are not available by default with latest API version so
# retrive it by passing expand paramater in SDK object

Review comment: It should be 'retrieve'.
Author reply: Fixed typo.

STREAM_TO_EXPAND_FIELDS = {
'customers': ['data.sources', 'data.subscriptions'],
'plans': ['data.tiers'],
'invoice_items': ['data.plan.tiers'],
'invoice_line_items': ['data.plan.tiers'],
'subscriptions': ['data.plan.tiers'],
'subscription_items': ['data.plan.tiers']
}

SUB_STREAMS = {
'subscriptions': 'subscription_items',
'invoices': 'invoice_line_items',
@@ -162,7 +173,7 @@ def configure_stripe_client():
# https://github.com/stripe/stripe-python/tree/a9a8d754b73ad47bdece6ac4b4850822fa19db4e#usage
stripe.api_key = Context.config.get('client_secret')
# Override the Stripe API Version for consistent access
stripe.api_version = '2018-09-24'
stripe.api_version = '2020-08-27'
# Allow ourselves to retry retriable network errors 5 times
# https://github.com/stripe/stripe-python/tree/a9a8d754b73ad47bdece6ac4b4850822fa19db4e#configuring-automatic-retries
stripe.max_network_retries = 15
@@ -177,7 +188,7 @@ def configure_stripe_client():
account = stripe.Account.retrieve(Context.config.get('account_id'))
msg = "Successfully connected to Stripe Account with display name" \
+ " `%s`"
LOGGER.info(msg, account.display_name)
LOGGER.info(msg, account.settings.dashboard.display_name)

def unwrap_data_objects(rec):
"""
@@ -371,10 +382,11 @@ def reduce_foreign_keys(rec, stream_name):
return rec


def paginate(sdk_obj, filter_key, start_date, end_date, limit=100):
def paginate(sdk_obj, filter_key, start_date, end_date, stream_name, limit=100):

Review comment: Explain why you have to include stream_name as a new parameter.
Author reply: The expand parameter is passed in the SDK request inside this function, and the required values live in the STREAM_TO_EXPAND_FIELDS map, so stream_name is used to look up the expand values for the stream.
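For readers unfamiliar with the mechanism, a minimal sketch of how the expand parameter behaves in the Stripe Python SDK (the API key is a placeholder; the field paths are the ones the map defines for the customers stream):

import stripe

stripe.api_key = "sk_test_placeholder"  # illustrative only
stripe.api_version = "2020-08-27"

# For the 'customers' stream the map yields ['data.sources', 'data.subscriptions'],
# so each customer in the list response comes back with those related objects
# inlined instead of requiring follow-up API calls.
expand_fields = ['data.sources', 'data.subscriptions']
customers = stripe.Customer.list(limit=100, expand=expand_fields)
for customer in customers.auto_paging_iter():
    # customer.sources and customer.subscriptions are now populated
    print(customer.id)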

# Some fields are not available by default with latest API version so
# retrieve it by passing expand paramater in SDK object
yield from sdk_obj.list(
limit=limit,
stripe_account=Context.config.get('account_id'),
expand=STREAM_TO_EXPAND_FIELDS.get(stream_name, []),

Review comment: Explain this code change.
Author reply: Added code comment for the change.

# None passed to starting_after appears to retrieve
# all of them so this should always be safe.
**{filter_key + "[gte]": start_date,
@@ -391,6 +403,7 @@ def epoch_to_dt(epoch_ts):
return datetime.fromtimestamp(epoch_ts)

# pylint: disable=too-many-locals
# pylint: disable=too-many-statements
def sync_stream(stream_name):
"""
Sync each stream, looking for newly created records. Updates are captured by events stream.
@@ -404,8 +417,16 @@ def sync_stream(stream_name):
replication_key = metadata.get(stream_metadata, (), 'valid-replication-keys')[0]
# Invoice Items bookmarks on `date`, but queries on `created`
filter_key = 'created' if stream_name == 'invoice_items' else replication_key
stream_bookmark = singer.get_bookmark(Context.state, stream_name, replication_key) or \
int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())

# Invoice was bookmarking on `date` but in latest API version, that field is deprecated and replication key changed to `created`
# kept `date` in bookmarking as it as to respect bookmark of active connection too

Review suggestion (Suggested change):
- # kept `date` in bookmarking as it as to respect bookmark of active connection too
+ # kept `date` in bookmarking as it has to respect bookmark of active connection too
Author reply: Fixed typo.

if stream_name == 'invoices':
stream_bookmark = singer.get_bookmark(Context.state, stream_name, 'date') or \
int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())
else:
stream_bookmark = singer.get_bookmark(Context.state, stream_name, replication_key) or \
int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())

bookmark = stream_bookmark

# if this stream has a sub_stream, compare the bookmark
@@ -414,8 +435,16 @@
# If there is a sub-stream and its selected, get its bookmark (or the start date if no bookmark)
should_sync_sub_stream = sub_stream_name and Context.is_selected(sub_stream_name)
if should_sync_sub_stream:
sub_stream_bookmark = singer.get_bookmark(Context.state, sub_stream_name, replication_key) \
or int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())

# Invoices's replication key changed from `date` to `created` in latest API version.
# Invoice line Items write bookmark with Invoice's replication key but it changed to `created`
# so kept `date` in bookmarking as it as to respect bookmark of active connection too.

Review suggestion (Suggested change):
- # so kept `date` in bookmarking as it as to respect bookmark of active connection too.
+ # so kept `date` in bookmarking as it has to respect bookmark of active connection too.
Author reply: Fixed typo.

if sub_stream_name == "invoice_line_items":
sub_stream_bookmark = singer.get_bookmark(Context.state, sub_stream_name, 'date') \
or int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())
else:

Review comment: In the comment it is mentioned as `created`, but in the code it is start_date. It looks like similar code is replicated. Can we write a function which accepts these as parameters and sets the appropriate bookmark?
Author reply: Yes, I added a get_bookmark_for_stream() function to retrieve the bookmark and similarly added write_bookmark_for_stream() for writing the bookmark.
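The helpers mentioned in the reply were added in a later commit and are not part of this diff; a minimal sketch of the idea, reusing the singer, utils and Context names already imported at the top of tap_stripe/__init__.py (the merged implementation may differ):

def get_bookmark_for_stream(stream_name, replication_key):
    # Invoices (and the invoice_line_items sub-stream) historically bookmarked on
    # 'date', so keep reading that key for existing connections; every other
    # stream uses its real replication key.
    if stream_name in ('invoices', 'invoice_line_items'):
        return singer.get_bookmark(Context.state, stream_name, 'date') \
            or int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())
    return singer.get_bookmark(Context.state, stream_name, replication_key) \
        or int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())

def write_bookmark_for_stream(stream_name, replication_key, value):
    # Mirror image of the above: keep writing the legacy 'date' key for invoices.
    if stream_name in ('invoices', 'invoice_line_items'):
        singer.write_bookmark(Context.state, stream_name, 'date', value)
    else:
        singer.write_bookmark(Context.state, stream_name, replication_key, value)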

sub_stream_bookmark = singer.get_bookmark(Context.state, sub_stream_name, replication_key) \
or int(utils.strptime_to_utc(Context.config["start_date"]).timestamp())

# if there is a sub stream, set bookmark to sub stream's bookmark
# since we know it must be earlier than the stream's bookmark
@@ -454,7 +483,7 @@ def sync_stream(stream_name):
stop_window = end_time

for stream_obj in paginate(STREAM_SDK_OBJECTS[stream_name]['sdk_object'],
filter_key, start_window, stop_window):
filter_key, start_window, stop_window, stream_name):

# get the replication key value from the object
rec = unwrap_data_objects(stream_obj.to_dict_recursive())
@@ -488,18 +517,37 @@ def sync_stream(stream_name):
# Update stream/sub-streams bookmarks as stop window
if stop_window > stream_bookmark:
stream_bookmark = stop_window
singer.write_bookmark(Context.state,
stream_name,
replication_key,
stream_bookmark)
# Invoice was bookmarking on `date` but in latest API version,
# that field is deprecated and replication key changed to `created`
# kept `date` in bookmarking as it as to respect bookmark of active connection too.
if stream_name == "invoices":
singer.write_bookmark(Context.state,
stream_name,
'date',
stream_bookmark)
else:
singer.write_bookmark(Context.state,
stream_name,
replication_key,
stream_bookmark)

# the sub stream bookmarks on its parent
if should_sync_sub_stream and stop_window > sub_stream_bookmark:
sub_stream_bookmark = stop_window
singer.write_bookmark(Context.state,
sub_stream_name,
replication_key,
sub_stream_bookmark)

# Invoices's replication key changed from `date` to `created` in latest API version.
# Invoice line Items write bookmark with Invoice's replication key but it changed to `created`
# so kept `date` in bookmarking as it as to respect bookmark of active connection too.

Review suggestion (Suggested change):
- # so kept `date` in bookmarking as it as to respect bookmark of active connection too.
+ # so kept `date` in bookmarking as it has to respect bookmark of active connection too.
Author reply: Fixed typo.

if sub_stream_name == "invoice_line_items":
singer.write_bookmark(Context.state,
sub_stream_name,
'date',
sub_stream_bookmark)
else:
singer.write_bookmark(Context.state,
sub_stream_name,
replication_key,
sub_stream_bookmark)

singer.write_state(Context.state)

6 changes: 6 additions & 0 deletions tap_stripe/schemas/invoices.json
@@ -707,6 +707,12 @@
"integer"
]
},
"application_fee_amount": {
"type": [
"null",
"integer"
]
},
"lines": {
"type": [
"null",
36 changes: 26 additions & 10 deletions tests/base.py
@@ -8,6 +8,7 @@
import decimal
from datetime import datetime as dt
from datetime import timezone as tz
from dateutil import parser

from tap_tester import connections, menagerie, runner

@@ -79,12 +80,7 @@ def expected_metadata(self):
'events': default,
'customers': default,
'plans': default,
'invoices': {
self.AUTOMATIC_FIELDS: {"updated"},
self.REPLICATION_KEYS: {"date"},
self.PRIMARY_KEYS: {"id"},
self.REPLICATION_METHOD: self.INCREMENTAL,
},
'invoices': default,
'invoice_items': {
self.AUTOMATIC_FIELDS: {"updated"},
self.REPLICATION_KEYS: {"date"},
@@ -314,15 +310,31 @@ def split_records_into_created_and_updated(self, records):
'schema': batch['schema'],
'key_names' : batch.get('key_names'),
'table_version': batch.get('table_version')}
created[stream]['messages'] += [m for m in batch['messages']
if m['data'].get("updated") == m['data'].get(bookmark_key)]
# Bookmark key changed for `invoices` from `date` to `created` due to latest API change
# but for `invoices` stream, the `created` field have integer type(epoch format) from starting so
# converting `updated` to epoch for comparison.
if stream == "invoices":
created[stream]['messages'] += [m for m in batch['messages']
if self.dt_to_ts(m['data'].get("updated")) == m['data'].get(bookmark_key)]
else:
created[stream]['messages'] += [m for m in batch['messages']
if m['data'].get("updated") == m['data'].get(bookmark_key)]

Review comment on lines +313 to +322: I don't think this is a best practice for bookmarking. Consistency in the data types for bookmarks is what we want. This may be worth discussing with a dev, if it's unclear.
Author reply: The bookmark is written in epoch format for all streams, as in this STATE message:
{"type": "STATE", "value": {"bookmarks": {"charges": {"created": 1642661551}, "charges_events": {"updates_created": 1609459200}, "events": {"created": 1642661567}, "customers": {"created": 1642661570}, "customers_events": {"updates_created": 1609459200}, "plans": {"created": 1642661586}, "plans_events": {"updates_created": 1609459200}, "invoices": {"date": 1642661602}, "invoice_line_items": {"date": 1642661602}, "invoices_events": {"updates_created": 1609459200}, "invoice_items": {"date": 1642661617}, "invoice_items_events": {"updates_created": 1609459200}, "transfers": {"created": 1642661632}, "transfers_events": {"updates_created": 1609459200}, "coupons": {"created": 1642661647}, "coupons_events": {"updates_created": 1609459200}, "subscriptions": {"created": 1642661662}, "subscription_items": {"created": 1642661662}, "subscriptions_events": {"updates_created": 1609459200}, "balance_transactions": {"created": 1642661677}, "payouts": {"created": 1642661680}, "payout_transactions": {"created": 1642661680}, "payouts_events": {"updates_created": 1609459200}, "disputes": {"created": 1642661696}, "disputes_events": {"updates_created": 1609459200}, "products": {"created": 1642661711}, "products_events": {"updates_created": 1609459200}}}}
In the RECORD messages, however, the replication key of invoices is emitted in epoch format while the other streams emit a date-time. We did not change the data type, to avoid a column split, because the `created` field already existed with an integer type; we handled that in the test case instead.
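To make the type mismatch concrete, a toy illustration (the values are made up; the dt_to_ts helper added at the bottom of base.py performs the same conversion):

from dateutil import parser

# A replicated invoice record: `created` (the new bookmark key) is an epoch
# integer, while `updated` is emitted as an ISO 8601 date-time string.
record = {"created": 1642661602, "updated": "2022-01-20T06:53:22Z"}

# Comparing the raw values can never match, so `updated` is converted to an
# epoch timestamp first; equality then means the record was never updated
# after it was created, i.e. it belongs in the "created" bucket.
is_created = parser.parse(record["updated"]).timestamp() == record["created"]
print(is_created)  # True for this example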

if stream not in updated:
updated[stream] = {'messages': [],
'schema': batch['schema'],
'key_names' : batch.get('key_names'),
'table_version': batch.get('table_version')}
updated[stream]['messages'] += [m for m in batch['messages']
if m['data'].get("updated") != m['data'].get(bookmark_key)]

# Bookmark key changed for `invoices` from `date` to `created` due to latest API change
# but for `invoices` stream, the `created` field have integer type(epoch format) from starting so
# converting `updated` to epoch for comparison.
if stream == "invoices":
updated[stream]['messages'] += [m for m in batch['messages']
if self.dt_to_ts(m['data'].get("updated")) != m['data'].get(bookmark_key)]
else:
updated[stream]['messages'] += [m for m in batch['messages']
if m['data'].get("updated") != m['data'].get(bookmark_key)]
Review comment on lines +329 to +337: I'm a little bit confused here. In the previous section we are converting updated to an integer type to compare with created, which made sense. Why are we doing the comparison with updated and updated here?
Author reply: We are doing similar things in the previous section and in this section. The only difference is that the previous section checks with the == operator to get records that were created, while this section checks with the != operator to get records that were updated.

return created, updated

def select_all_streams_and_fields(self, conn_id, catalogs, select_all_fields: bool = True, exclude_streams=None):
Expand Down Expand Up @@ -523,3 +535,7 @@ def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.start_date = self.get_properties().get('start_date')
self.maxDiff=None


def dt_to_ts(self, dtime):
return parser.parse(dtime).timestamp()