Skip to content

Commit

Permalink
Merge pull request #68 from fivetran/feature/incremental-mar-update
Browse files Browse the repository at this point in the history
Feature/incremental mar update
  • Loading branch information
fivetran-joemarkiewicz authored Jan 3, 2023
2 parents e0c74db + 6cb6996 commit 226deff
Show file tree
Hide file tree
Showing 18 changed files with 132 additions and 82 deletions.
1 change: 0 additions & 1 deletion .buildkite/scripts/run_models.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
#!/bin/bash

set -euo pipefail

apt-get update
Expand Down
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,14 @@
- `dbt_utils.surrogate_key` has also been updated to `dbt_utils.generate_surrogate_key`. Since the method for creating surrogate keys differ, we suggest all users do a `full-refresh` for the most accurate data. For more information, please refer to dbt-utils [release notes](https://github.com/dbt-labs/dbt-utils/releases) for this update.
- `packages.yml` has been updated to reflect new default `fivetran/fivetran_utils` version, previously `[">=0.3.0", "<0.4.0"]` now `[">=0.4.0", "<0.5.0"]`.

[PR #68](https://github.com/fivetran/dbt_fivetran_log/pull/68) includes the following breaking changes:
- The `active_volume` source (and accompanying `stg_fivetran_log__active_volume` model) has been deprecated from the Fivetran Log connector. In its place, the `incremental_mar` table (and accompanying `stg_fivetran_log__incremental_mar` model) has been added. This new source has been swapped within the package to reference the new source table.
- This new source table has enriched data behind the paid and free MAR across Fivetran connectors within your destinations.
- Removed the `monthly_active_rows` field from the `fivetran_log__mar_table_history` and `fivetran_log__usage_mar_destination_history` models. In it's place the following fields have been added:
- `free_mothly_active_rows`: Detailing the total free MAR
- `paid_mothly_active_rows`: Detailing the total paid MAR
- `total_mothly_active_rows`: Detailing the total free and paid MAR

# dbt_fivetran_log v0.6.4
## Fixes
- Added second qualifying join clause to `fivetran_log__usage_mar_destination_history` in the `usage` cte. This join was failing this test to ensure each `destination_id` has a single `measured_month` :
Expand Down
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
- Produces staging models in the format described by [this ERD](https://fivetran.com/docs/logs/fivetran-log#schemainformation) which clean, test, and prepare your Fivetran Log data from [Fivetran's free connector](https://fivetran.com/docs/applications/fivetran-log) and generates analysis ready end models.
- The above mentioned models enable you to better understand how you are spending money in Fivetran according to our [consumption-based pricing model](https://fivetran.com/docs/getting-started/consumption-based-pricing) as well as providing details about the performance and status of your Fivetran connectors and transformations. This is achieved by:
- Displaying consumption data at the table, connector, destination, and account levels
- Providing a history of measured monthly active rows (MAR), credit consumption, and the relationship between the two
- Providing a history of measured free and paid monthly active rows (MAR), credit consumption, and the relationship between the two
- Creating a history of vital daily events for each connector
- Surfacing an audit log of records inserted, deleted, an updated in each table during connector syncs

Expand All @@ -27,7 +27,7 @@ Refer to the table below for a detailed view of all models materialized by defau
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [fivetran_log__connector_status](https://fivetran.github.io/dbt_fivetran_log/#!/model/model.fivetran_log.fivetran_log__connector_status) | Each record represents a connector loading data into a destination, enriched with data about the connector's data sync status. |
| [fivetran_log__transformation_status](https://github.com/fivetran/dbt_fivetran_log/blob/main/models/fivetran_log__transformation_status.sql) | Each record represents a transformation, enriched with data about the transformation's last sync and any tables whose new data triggers the transformation to run. |
| [fivetran_log__mar_table_history](https://fivetran.github.io/dbt_fivetran_log/#!/model/model.fivetran_log.fivetran_log__mar_table_history) | Each record represents a table's active volume for a month, complete with data about its connector and destination. |
| [fivetran_log__mar_table_history](https://fivetran.github.io/dbt_fivetran_log/#!/model/model.fivetran_log.fivetran_log__mar_table_history) | Each record represents a table's free, paid, and total volume for a month, complete with data about its connector and destination. |
| [fivetran_log__usage_mar_destination_history](https://fivetran.github.io/dbt_fivetran_log/#!/model/model.fivetran_log.fivetran_log__usage_mar_destination_history) | Table of each destination's usage and active volume, per month. Includes the usage per million MAR and MAR per usage. Usage either refers to a dollar or credit amount, depending on customer's pricing model. Read more about the relationship between usage and MAR [here](https://www.fivetran.com/legal/service-consumption-table). |
| [fivetran_log__connector_daily_events](https://fivetran.github.io/dbt_fivetran_log/#!/model/model.fivetran_log.fivetran_log__connector_daily_events) | Each record represents a daily measurement of the API calls, schema changes, and record modifications made by a connector, starting from the date on which the connector was set up. |
| [fivetran_log__schema_changelog](https://fivetran.github.io/dbt_fivetran_log/#!/model/model.fivetran_log.fivetran_log__schema_changelog) | Each record represents a schema change (altering/creating tables, creating schemas, and changing schema configurations) made to a connector and contains detailed information about the schema change event. |
Expand Down
4 changes: 2 additions & 2 deletions dbt_project.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,13 +10,13 @@ models:
staging:
+schema: stg_fivetran_log
tmp:
+materialized: view
+materialized: view

vars:
fivetran_log:
account: "{{ source('fivetran_log', 'account') }}"
account_membership: "{{ source('fivetran_log', 'account_membership') }}"
active_volume: "{{ source('fivetran_log', 'active_volume') }}"
incremental_mar: "{{ source('fivetran_log', 'incremental_mar') }}"
connector: "{{ source('fivetran_log', 'connector') }}"
credits_used: "{{ source('fivetran_log', 'credits_used') }}"
destination: "{{ source('fivetran_log', 'destination') }}"
Expand Down
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/run_results.json

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions integration_tests/dbt_project.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ vars:
fivetran_log_schema: fivetrans_logs_integration_tests
fivetran_log_account_identifier: "account"
fivetran_log_account_membership_identifier: "account_membership"
fivetran_log_active_volume_identifier: "active_volume"
fivetran_log_incremental_mar_identifier: "incremental_mar"
fivetran_log_connector_identifier: "connector"
fivetran_log_credits_used_identifier: "credits_used"
fivetran_log_usage_cost_identifier: "usage_cost"
Expand All @@ -41,10 +41,10 @@ seeds:
+column_types:
activated_at: timestamp
joined_at: timestamp
active_volume:
incremental_mar:
+column_types:
measured_at: timestamp
monthly_active_rows: "{{ 'int64' if target.type == 'bigquery' else 'bigint' }}"
measured_date: timestamp
incremental_rows: "{{ 'int64' if target.type == 'bigquery' else 'bigint' }}"
connector:
+column_types:
signed_up: timestamp
Expand Down
Binary file modified integration_tests/seeds/.DS_Store
Binary file not shown.
3 changes: 0 additions & 3 deletions integration_tests/seeds/active_volume.csv

This file was deleted.

11 changes: 11 additions & 0 deletions integration_tests/seeds/incremental_mar.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
connector_id,destination_id,free_type,measured_date,schema_name,sync_type,table_name,updated_at,_fivetran_synced,incremental_rows
this_connector,111eee,PAID,2022-08-31 00:00:00,this_connector,UNKNOWN,my_table,2022-09-01 6:12:13,2022-11-09 20:09:26,46
that_connector,222rrr,PAID,2022-05-22 00:00:00,that_connector,UNKNOWN,drop_table,2022-05-22 8:29:22,2022-11-07 20:10:24,1
that_connector,222rrr,PAID,2022-03-01 00:00:00,that_connector,UNKNOWN,drop_table,2022-03-01 21:52:24,2022-11-07 20:10:15,8
that_connector,222rrr,PAID,2022-03-04 00:00:00,that_connector,UNKNOWN,drop_table,2022-03-04 16:19:05,2022-11-07 20:10:16,1
that_connector,222rrr,PAID,2022-07-29 00:00:00,that_connector,UNKNOWN,drop_table,2022-07-29 23:05:17,2022-11-07 20:10:33,4
that_connector,222rrr,PAID,2022-05-14 00:00:00,that_connector,UNKNOWN,drop_table,2022-05-14 11:07:37,2022-11-07 20:10:24,1
that_connector,222rrr,PAID,2021-09-19 00:00:00,that_connector,UNKNOWN,drop_table,2021-09-19 12:11:38,2022-11-07 20:09:55,0
that_connector,222rrr,PAID,2022-06-08 00:00:00,that_connector,UNKNOWN,drop_table,2022-06-08 21:29:07,2022-11-07 20:10:25,2
that_connector,222rrr,PAID,2022-03-05 00:00:00,that_connector,UNKNOWN,drop_table,2022-03-05 10:55:00,2022-11-07 20:10:16,1
that_connector,222rrr,PAID,2022-07-01 00:00:00,that_connector,UNKNOWN,drop_table,2022-07-01 21:56:37,2022-11-07 20:10:30,1
26 changes: 20 additions & 6 deletions models/fivetran_log.yml
Original file line number Diff line number Diff line change
Expand Up @@ -100,8 +100,14 @@ models:
- name: last_measured_at
description: Timestamp of when the last MAR measurement for the month was made.

- name: monthly_active_rows
description: The number of active rows measured in the table for the month.
- name: free_monthly_active_rows
description: The number of free active rows measured in the table for the month.

- name: paid_monthly_active_rows
description: The number of paid active rows measured in the table for the month.

- name: total_monthly_active_rows
description: The total number of active rows measured in the table for the month.

- name: connector_type
description: The kind of connector (ie google sheets, webhooks).
Expand Down Expand Up @@ -133,11 +139,14 @@ models:
- name: destination_name
description: Name of the destination as it appears in the UI.

- name: consumption
description: The number of credits or dollar amount used by the destination for the given month.
- name: free_monthly_active_rows
description: The number of free active rows measured in the destination for the month.

- name: monthly_active_rows
description: The number of total active rows measured in the destination for the month.
- name: paid_monthly_active_rows
description: The number of paid active rows measured in the destination for the month.

- name: total_monthly_active_rows
description: The total number of active rows measured in the destination for the month.

- name: credits_spent_per_million_mar
description: The ratio of credits spent that month per every million active rows synced.
Expand All @@ -151,6 +160,11 @@ models:
- name: mar_per_amount_spent
description: The ratio of the month's active volume to amount spent.

- name: credits_spent
description: The number of credits used by the destination for the given month.

- name: dollars_spent
description: The dollar amount used by the destination for the given month.

- name: fivetran_log__transformation_status
description: >
Expand Down
28 changes: 17 additions & 11 deletions models/fivetran_log__mar_table_history.sql
Original file line number Diff line number Diff line change
@@ -1,12 +1,10 @@
with active_volume as (
with incremental_mar as (

select
*,
{{ dbt.date_trunc('month', 'measured_at') }} as measured_month
{{ dbt.date_trunc('month', 'measured_date') }} as measured_month

from {{ ref('stg_fivetran_log__active_volume') }}

where schema_name != 'fivetran_log' -- it's free!
from {{ ref('stg_fivetran_log__incremental_mar') }}

),

Expand All @@ -29,14 +27,20 @@ ordered_mar as (
schema_name,
table_name,
destination_id,
measured_at,
measured_date,
measured_month,
monthly_active_rows,
incremental_rows,
case when lower(free_type) = 'paid'
then incremental_rows
end as paid_monthly_active_rows,
case when lower(free_type) != 'paid'
then incremental_rows
end as free_monthly_active_rows,

-- each measurement is cumulative for the month, so we'll only look at the latest date for each month
row_number() over(partition by table_name, connector_name, destination_id, measured_month order by measured_at desc) as n
row_number() over(partition by table_name, connector_name, destination_id, measured_month order by measured_date desc) as n

from active_volume
from incremental_mar

),

Expand All @@ -47,8 +51,10 @@ latest_mar as (
table_name,
destination_id,
measured_month,
date(measured_at) as last_measured_at,
monthly_active_rows
date(measured_date) as last_measured_at,
free_monthly_active_rows,
paid_monthly_active_rows,
(free_monthly_active_rows + paid_monthly_active_rows) as total_monthly_active_rows

from ordered_mar
where n = 1
Expand Down
16 changes: 10 additions & 6 deletions models/fivetran_log__usage_mar_destination_history.sql
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,9 @@ destination_mar as (
measured_month,
destination_id,
destination_name,
sum(monthly_active_rows) as monthly_active_rows
sum(free_monthly_active_rows) as free_monthly_active_rows,
sum(paid_monthly_active_rows) as paid_monthly_active_rows,
sum(total_monthly_active_rows) as total_monthly_active_rows
from table_mar
group by 1,2,3
),
Expand All @@ -48,13 +50,15 @@ join_usage_mar as (
destination_mar.destination_name,
usage.credits_spent,
usage.dollars_spent,
destination_mar.monthly_active_rows,
destination_mar.free_monthly_active_rows,
destination_mar.paid_monthly_active_rows,
destination_mar.total_monthly_active_rows,

-- credit and usage mar calculations
round( cast(nullif(usage.credits_spent,0) * 1000000.0 as {{ dbt.type_numeric() }}) / cast(nullif(destination_mar.monthly_active_rows,0) as {{ dbt.type_numeric() }}), 2) as credits_spent_per_million_mar,
round( cast(nullif(destination_mar.monthly_active_rows,0) * 1.0 as {{ dbt.type_numeric() }}) / cast(nullif(usage.credits_spent,0) as {{ dbt.type_numeric() }}), 0) as mar_per_credit_spent,
round( cast(nullif(usage.dollars_spent,0) * 1000000.0 as {{ dbt.type_numeric() }}) / cast(nullif(destination_mar.monthly_active_rows,0) as {{ dbt.type_numeric() }}), 2) as amount_spent_per_million_mar,
round( cast(nullif(destination_mar.monthly_active_rows,0) * 1.0 as {{ dbt.type_numeric() }}) / cast(nullif(usage.dollars_spent,0) as {{ dbt.type_numeric() }}), 0) as mar_per_amount_spent
round( cast(nullif(usage.credits_spent,0) * 1000000.0 as {{ dbt.type_numeric() }}) / cast(nullif(destination_mar.total_monthly_active_rows,0) as {{ dbt.type_numeric() }}), 2) as credits_spent_per_million_mar,
round( cast(nullif(destination_mar.total_monthly_active_rows,0) * 1.0 as {{ dbt.type_numeric() }}) / cast(nullif(usage.credits_spent,0) as {{ dbt.type_numeric() }}), 0) as mar_per_credit_spent,
round( cast(nullif(usage.dollars_spent,0) * 1000000.0 as {{ dbt.type_numeric() }}) / cast(nullif(destination_mar.total_monthly_active_rows,0) as {{ dbt.type_numeric() }}), 2) as amount_spent_per_million_mar,
round( cast(nullif(destination_mar.total_monthly_active_rows,0) * 1.0 as {{ dbt.type_numeric() }}) / cast(nullif(usage.dollars_spent,0) as {{ dbt.type_numeric() }}), 0) as mar_per_amount_spent
from destination_mar
left join usage
on destination_mar.measured_month = cast(usage.measured_month as timestamp)
Expand Down
26 changes: 16 additions & 10 deletions models/staging/src_fivetran_log.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,27 +12,33 @@ sources:
warn_after: {count: 72, period: hour}
error_after: {count: 96, period: hour}

tables:
- name: active_volume
identifier: "{{ var('fivetran_log_active_volume_identifier', 'active_volume')}}"
tables:
- name: incremental_mar
identifier: "{{ var('fivetran_log_incremental_mar_identifier', 'incremental_mar')}}"
description: >
Table of **monthly active row (MAR)** measurements made by table per date.
Each measurement is calculated cumulatively for the month.
Each measurement is calculated cumulatively for the month and includes all types of mar (paid, free, etc.)
columns:
- name: id
description: System-generated unique ID of the active volume measurement.
- name: connector_id
description: The *name* of the connector being measured. Note - this is erroneously named and will be fixed soon by Fivetran.
- name: destination_id
description: Foreign key referencing the `destination` whose table is being measured.
- name: measured_at
- name: free_type
description: If it is free MAR, the value indicates the type of free MAR. For paid MAR, the value is `PAID`.
- name: measured_date
description: Timestamp of when the MAR measurement was made.
- name: monthly_active_rows
description: The number of active rows cumulatively synced this month for this table.
- name: incremental_rows
description: The number of new distinct primary keys on the current day synced for the connector.
- name: sync_type
description: This defines whether the sync for which MAR calculated is HISTORICAL or INCREMENTAL. Currently, the available value is UNKNOWN.
- name: schema_name
description: The name of the connector's schema that houses the measured table.
- name: table_name
description: The name of the table whose MAR was measured.
- name: updated_at
description: Timestamp of when the record was last updated.
- name: _fivetran_synced
description: Timestamp of when the record was last synced.

- name: connector
identifier: "{{ var('fivetran_log_connector_identifier', 'connector')}}"
Expand All @@ -46,7 +52,7 @@ sources:
- name: connecting_user_id
description: Foreign key referencing the `user` who set up the connector.
- name: connector_name
description: Individual name of the connector. Note that this could be different from the `active_volume.schema_name`.
description: Individual name of the connector. Note that this could be different from the `incremental_mar.schema_name`.
# Other tables lack a true `connector_id` and will have to join with `connector_name`
- name: connector_type
description: The kind of connector (ie google sheets, webhooks).
Expand Down
Loading

0 comments on commit 226deff

Please sign in to comment.