fix!: handle multiple disputes #88

bramrodenburg · 2024-09-09T16:03:20Z

Please provide your name and company

Bram - Gigs

Link the issue/feature request which this PR is meant to address

Detail what changes this PR introduces and how this addresses the issue/feature request linked above.

Although it happens rarely (see here), a payment can be disputed more than once. The existing balance_transactions model cannot handle this scenario. In this PR, that is fixed by aggregating the dispute_[id/reason/amount] before joining the dispute data back.
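As a rough illustration of the problem described above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the join on charge_id, and the sample IDs are simplified assumptions for illustration and are not the package's actual models. SQLite's group_concat stands in for the warehouse aggregation function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables (not the real package schema).
    create table balance_transaction (balance_transaction_id text, charge_id text, amount integer);
    create table dispute (dispute_id text, charge_id text, dispute_reason text, dispute_amount integer);
    insert into balance_transaction values ('txn_1', 'ch_1', 175);
    insert into dispute values
        ('dp_1', 'ch_1', 'fraudulent', 175),
        ('dp_2', 'ch_1', 'product_not_received', 175);
""")

# Joining raw dispute rows repeats the balance transaction once per dispute.
naive_rows = conn.execute("""
    select bt.balance_transaction_id, d.dispute_id
    from balance_transaction bt
    left join dispute d on d.charge_id = bt.charge_id
""").fetchall()

# Aggregating disputes to one row per charge first keeps one row per transaction.
aggregated_rows = conn.execute("""
    with dispute_summary as (
        select charge_id, group_concat(dispute_id) as dispute_ids
        from dispute
        group by charge_id
    )
    select bt.balance_transaction_id, ds.dispute_ids
    from balance_transaction bt
    left join dispute_summary ds on ds.charge_id = bt.charge_id
""").fetchall()

print(len(naive_rows), len(aggregated_rows))
```

In this sketch the naive join returns two rows for the single balance transaction, while the pre-aggregated join returns one.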

How did you validate the changes introduced within this PR?

Followed the steps here and built the dbt_stripe package against our data.

Which warehouse did you use to develop these changes?

BigQuery

Did you update the CHANGELOG?

Yes

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

Yes

Typically there are additional maintenance changes required before this will be ready for an upcoming release. Are you comfortable with the Fivetran team making a few commits directly to your branch?

Yes
No

If you had to summarize this PR in an emoji, which would it be?

🥸


* MagicBot/documentation-updates * Apply suggestions from code review * Apply suggestions from code review --------- Co-authored-by: Jamie Rodriguez <[email protected]>

fivetran-joemarkiewicz

@bramrodenburg thanks for opening this PR to address the issue you raised! This definitely is a change we would like to fold into the broader package; however, I do have a few questions:

Please see my question below regarding the use of array_agg and if we should consider string_agg?
Would you be able to create a CHANGELOG.md entry for these changes? Since this will be changing field names, we will need to make this a breaking change (v0.15.0).

fivetran-joemarkiewicz · 2024-09-09T19:08:23Z

models/stripe__balance_transactions.sql

+        array_agg(dispute_id) as dispute_ids,
+        array_agg(dispute_reason) as dispute_reasons,


For these aggregations, is there a reason you used the array_agg function instead of string_agg? I ask because, in BigQuery and Databricks especially, I know the array_agg function can change the nature of the table. Do you find array_agg easier to leverage in queries down the road? Would there be any concerns with using string_agg instead, or would that impact the usability of these fields?

The main reason for using array_agg is that I find it easier to use the data downstream (e.g. if I need to join back to dispute using the dispute_id). If I used string_agg, I would first have to split the string again. However, this is a hypothetical use case; I am not doing this now :)

I am perfectly fine with using string_agg if that's the preferred way of doing it.

Replaced array_agg with string_agg

bramrodenburg · 2024-09-09T19:42:13Z

Would you be able to create a CHANGELOG.md entry for these changes? Since this will be changing field names, we will need to make this a breaking change (v0.15.0).

Makes sense. Will do.

jsnorthrup · 2024-09-11T18:29:02Z

models/stripe__balance_transactions.sql

+        source_relation,
+        string_agg(dispute_id) as dispute_ids,
+        string_agg(dispute_reason) as dispute_reasons,
+        sum(dispute_amount) as dispute_amount


Since the same amount can get disputed more than once, I think it makes more sense to also string_agg the dispute_amount. Here's an example where I have a balance transaction where the full amount has been disputed twice. Summing the dispute_amount in this case will erroneously double the amount of the transaction.
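The double counting described above can be sketched with a few lines of Python. The amounts are illustrative values chosen for this example, not data from the thread's linked example.

```python
# Illustrative values: one charge whose full amount was disputed twice.
transaction_amount = 175
dispute_amounts = [175, 175]

# Summing treats the two disputes as additive and exceeds the charge itself.
summed_dispute_amount = sum(dispute_amounts)

# Keeping each amount separately (as a delimited string) avoids implying a total.
listed_dispute_amounts = ",".join(str(amount) for amount in dispute_amounts)

print(summed_dispute_amount, listed_dispute_amounts)
```

Here the summed value is twice the transaction amount, which is the misleading result the comment warns about.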

@jsnorthrup thanks so much for chiming in and sharing this level of insight. We definitely would not want to have a scenario where we could possibly be double counting the amount.

It looks like the two disputes you shared have a different "status" where it seems only one was "won". Since there is a "status" field in the source table (via the ERD), do you think it would be appropriate to retain the sum, but only sum disputes that are of status='won'?

I think that would work for when the status is "won", but what about if it hasn't been won yet? Will the dispute_amount be 0? In that case I think the column needs to be renamed to dispute_amount_won to indicate it's not showing dispute amounts in other statuses.

If you go that route, it might be good to add another column dispute_amounts that still has a string_agg of each individual dispute amount to match dispute_ids and dispute_reasons.

For us, we are not only interested in disputes won, but also lost, under_review etc. If we would go down the route of adding a dispute_amount_won, then I feel we should also add similar columns for the other statuses. There are currently 7 different statuses in Stripe for disputes (see here). So you would get something like this:

dispute_amount_won

dispute_amount_lost

dispute_amount_under_review

dispute_amount_needs_response

dispute_amount_warning_closed

dispute_amount_warning_under_review

dispute_amount_warning_needs_response

If we wouldn't do this, then the user of the table needs to join back to the dispute table to get this information, which kind of renders having dispute amounts in this table obsolete. (I actually naively took the sum in PR, because we are actually already doing this)
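One way the per-status columns listed above could be produced is with a conditional sum per status. The following is a minimal sketch run against SQLite through Python's sqlite3 module. The table layout, the grouping by charge_id, and the helper name status_sum_columns are assumptions made for illustration, not the package's implementation.

```python
import sqlite3

# The seven dispute statuses named in the comment above.
DISPUTE_STATUSES = [
    "won", "lost", "under_review", "needs_response",
    "warning_closed", "warning_under_review", "warning_needs_response",
]

def status_sum_columns(statuses):
    """Build one conditional-sum column per dispute status (hypothetical helper)."""
    return ",\n        ".join(
        f"sum(case when dispute_status = '{status}' then dispute_amount else 0 end) "
        f"as dispute_amount_{status}"
        for status in statuses
    )

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    -- Hypothetical, simplified dispute table.
    create table dispute (dispute_id text, charge_id text, dispute_status text, dispute_amount integer);
    insert into dispute values
        ('dp_1', 'ch_1', 'lost', 175),
        ('dp_2', 'ch_1', 'won', 175);
""")

query = f"""
    select
        charge_id,
        {status_sum_columns(DISPUTE_STATUSES)}
    from dispute
    group by charge_id
"""
amounts_by_status = dict(conn.execute(query).fetchone())
print(amounts_by_status)
```

With one lost and one won dispute, each amount lands in its own status column and the remaining status columns are 0, so no single column double counts the charge.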

If we wouldn't do this, then the user of the table needs to join back to the dispute table to get this information, which kind of renders having dispute amounts in this table obsolete.

@bramrodenburg I agree with this idea. I also worry if we go the string_agg route for the dispute information then it won't provide much analytical value. Whereas, if we take the approach in creating distinct aggregate sums for each status, these values can be used more easily for any downstream analysis. Since we can confirm from Stripe's documentation that there are only 7 statuses, I'd request we take the approach of having distinct dispute aggregate records for each status like you suggest.

My only remaining question here - Is it possible for Stripe to provide multiple entries for a dispute status that could still result in us double counting? For example, could there be some reason the same dispute amount has two won records in the underlying table? If not, then I'm comfortable with the above approach.

Hey @jsnorthrup - In that scenario, would you want to only look at the most recent dispute amount for won and lost disputes (or any other statuses)? So in the above screenshot, if you won Dispute 2, the model would only include the second $175 in the dispute_amount_won sum.

If so, I think this would be possible by doing something like row_number() over (partition by charge_id, dispute_status, source_relation order by dispute_created_at desc) = 1 or dispute_status not in ('won', 'lost') as is_dispute_included, and then filtering based on that.

Otherwise, yes, perhaps joining the balance_transactions model with dispute downstream is the most sensible option.

@bramrodenburg @jsnorthrup I suppose my bigger question is how do you two think about Disputes? What is the bottom line you'd like to analyze or see surfaced?

Also, are there cases where a single transaction's disputes can have different dollar amounts? So, in the above example, what if the second dispute only pertained to some items in the charge and therefore had a dispute value of $100 instead of $175? Assuming you won Dispute 2, would you want the dispute_amount_won to reflect the max amount or the most recent amount (or both, which would lead me to think that just joining in dispute downstream may make more sense)?

Multiple scenarios are possible. If different parts of the total are disputed, the amounts could be additive. If the total amount is disputed multiple times, you'd only want the max.

I think doing the latest might be a good compromise. Remember, this is a very rare occurrence. I have exactly one instance of multiple disputes on the same transaction. If we have the dispute columns only refer to the latest dispute (in which case I recommend prefixing with latest_), it will represent the total disputed amount in the vast majority of cases, and in those rare cases where there are multiple disputes, we could have a dispute_count field to signal that you should reference the dispute table for more information.

I like that! @bramrodenburg if this compromise sounds good to you, I'll go ahead and merge your PR into my own working branch and add this extra code to stripe__balance_transactions, as it's a bit involved.

Specifically, I'll:

Add code to select the latest dispute in each status to account for the rare (but possible) case of multiple disputes per transaction: row_number() over (partition by charge_id, dispute_status, source_relation order by dispute_created_at desc) = 1 as is_latest_dispute.

This filter will be leveraged to create the following dispute_amount fields split by status:

latest_dispute_amount_won

latest_dispute_amount_lost

latest_dispute_amount_under_review

latest_dispute_amount_needs_response

latest_dispute_amount_warning_closed

latest_dispute_amount_warning_under_review

latest_dispute_amount_warning_needs_response

Add a count(distinct dispute_id) as dispute_count field that indicates whether there are multiple disputes for a transaction. If so, users should then join in the dispute table for more details
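The plan above can be sketched as follows, again using Python's sqlite3 module (window functions require SQLite 3.25 or newer). The table layout and sample rows are assumptions for illustration, only two of the seven status columns are shown, and the final code in the package may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    -- Hypothetical, simplified dispute table with made-up rows.
    create table dispute (
        dispute_id text, charge_id text, source_relation text,
        dispute_status text, dispute_amount integer, dispute_created_at text
    );
    insert into dispute values
        ('dp_1', 'ch_1', 'stripe', 'won', 175, '2024-01-01'),
        ('dp_2', 'ch_1', 'stripe', 'won', 100, '2024-02-01'),
        ('dp_3', 'ch_1', 'stripe', 'lost', 175, '2024-03-01');
""")

summary = conn.execute("""
    with ranked as (
        select
            *,
            -- Flag the most recent dispute within each charge and status.
            row_number() over (
                partition by charge_id, dispute_status, source_relation
                order by dispute_created_at desc
            ) = 1 as is_latest_dispute
        from dispute
    )
    select
        charge_id,
        source_relation,
        sum(case when is_latest_dispute and dispute_status = 'won'
                 then dispute_amount else 0 end) as latest_dispute_amount_won,
        sum(case when is_latest_dispute and dispute_status = 'lost'
                 then dispute_amount else 0 end) as latest_dispute_amount_lost,
        -- More than 1 signals that the dispute table holds further detail.
        count(distinct dispute_id) as dispute_count
    from ranked
    group by charge_id, source_relation
""").fetchone()

print(dict(summary))
```

In this sketch only the later of the two won disputes contributes to latest_dispute_amount_won, and dispute_count is 3, which flags the charge for a closer look in the dispute table.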

Sorry for the late reply. @jsnorthrup is completely right that there can be multiple scenarios.

The suggested approach sounds great. Thanks for merging this into your working branch @fivetran-jamie

Joe's going on PTO and I'm making additional changes to this in a working branch before merging to main

…nsactions (#92) * fix!: handle multiple disputes (#88) * Documentation Standard Updates (#85) * MagicBot/documentation-updates * Apply suggestions from code review * Apply suggestions from code review --------- Co-authored-by: Jamie Rodriguez <[email protected]> * fix!: handle multiple disputes * replace array_agg with string_agg * update changelog * bump version --------- Co-authored-by: Joe Markiewicz <[email protected]> Co-authored-by: Jamie Rodriguez <[email protected]> * changes * aws and changelog * validations * Update models/stripe__balance_transactions.sql Co-authored-by: Renee Li <[email protected]> * renee feedback * docs * update package ref * changelog --------- Co-authored-by: bramrodenburg <[email protected]> Co-authored-by: Joe Markiewicz <[email protected]> Co-authored-by: Renee Li <[email protected]>

fivetran-joemarkiewicz and others added 2 commits September 5, 2024 14:35

Documentation Standard Updates (fivetran#85)

1d46ea3

* MagicBot/documentation-updates * Apply suggestions from code review * Apply suggestions from code review --------- Co-authored-by: Jamie Rodriguez <[email protected]>

fix!: handle multiple disputes

29bd230

fivetran-joemarkiewicz previously requested changes Sep 9, 2024


bramrodenburg added 3 commits September 10, 2024 08:12

replace array_agg with string_agg

4a3717f

update changelog

a4c2b30

bump version

091f776

jsnorthrup reviewed Sep 11, 2024


fivetran-jamie changed the base branch from main to bugfix/handle-multiple-disputes September 27, 2024 23:24

fivetran-jamie changed the base branch from bugfix/handle-multiple-disputes to releases/v0.15.latest September 27, 2024 23:32

fivetran-jamie changed the base branch from releases/v0.15.latest to release/v0.14.0 September 28, 2024 00:01

fivetran-jamie merged commit 78ee45e into fivetran:release/v0.14.0 Sep 28, 2024

fivetran-jamie mentioned this pull request Sep 30, 2024

Continuation of #88 - handle multiple disputes in stripe__balance_transactions #92

Merged

7 tasks
