
BUG - Too many partitions produced by query, allowed 4000, query produces at least 4272 partitions #39

Closed
csaroff opened this issue Sep 2, 2021 · 19 comments
Labels
bug Something isn't working

Comments


csaroff commented Sep 2, 2021

Are you a current Fivetran customer?
Yes; Chaskin Saroff, Data Engineer at Bstock

Describe the bug
Too many partitions produced by int_zendesk__field_calendar_spine and int_zendesk__field_history_pivot.

Database Error in model int_zendesk__field_calendar_spine (models/ticket_history/int_zendesk__field_calendar_spine.sql)
  Too many partitions produced by query, allowed 4000, query produces at least 4272 partitions
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_calendar_spine.sql

Database Error in model int_zendesk__field_history_pivot (models/ticket_history/int_zendesk__field_history_pivot.sql)
  Too many partitions produced by query, allowed 4000, query produces at least 4022 partitions
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_history_pivot.sql

Steps to reproduce
Have more than 4,000 days' worth of field history data in Zendesk while using BigQuery as your data warehouse.

Expected behavior
No error should occur.

Project variables configuration

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'bstock'
version: '0.0.1'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
models:
  +persist_docs:
    relation: true
    columns: true
target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
    - "target"
    - "dbt_modules"
# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the marketplace/ directory
# as tables. These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
  marketplace:
      # Applies to all files under models/marketplace/
      marketplace:
          materialized: table


vars:
    'dbt_date:time_zone': 'America/Los_Angeles'

Package Version

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4
  - package: fivetran/zendesk
    version: 0.4.0
  - package: fivetran/salesforce
    version: 0.3.1
  - package: fishtown-analytics/spark_utils
    version: 0.1.0
  - package: calogica/dbt_date
    version: 0.2.5

Warehouse

  • BigQuery
  • Redshift
  • Snowflake
  • Postgres
  • Databricks
  • Other (provide details below)

Please indicate the level of urgency
Not urgent, but the bug is frustrating because my build fails constantly if I leave this package installed.

Are you interested in contributing to this package?

  • Yes, I can do this and open a PR for your review.
  • Possibly, but I'm not quite sure how to do this. I'd be happy to do a live coding session with someone to get this fixed.
  • No, I'd prefer if someone else fixed this. I don't have the time and/or don't know what the root cause of the problem is.
@csaroff csaroff added the bug Something isn't working label Sep 2, 2021

csaroff commented Sep 2, 2021

It might not hurt to simply ignore data that is more than 4,000 days old when the data warehouse is BigQuery.
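A filter along the lines of this suggestion could be sketched as follows (purely illustrative -- the column name `date_day` is an assumption, not the package's actual code):

```sql
-- hypothetical guard to stay under BigQuery's 4,000-partition limit
-- by dropping rows older than 4,000 days
where date_day >= date_sub(current_date(), interval 4000 day)
```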


fivetran-jamie commented Sep 2, 2021

hey there @csaroff -- you can expect a more formal solution released in the next week or so, but for the time being, this branch of the package should hopefully work for you without having to exclude any data

# packages.yml
  - git: https://github.com/fivetran/dbt_zendesk.git
    revision: bq-partitions
    warn-unpinned: false  

basically, i just changed the granularity of the partition_by config clause to week. the final solution may be different, but let me know if this works for you!

fivetran-jamie commented:

^ did this work for you btw? @csaroff


csaroff commented Sep 7, 2021

@fivetran-jamie I haven't gotten a chance to try it out yet, but will try soon and let you know. Thanks!


csaroff commented Sep 8, 2021

@fivetran-jamie Looks like there might be a bug with this.

Here's what I'm seeing when I do a dbt run

Completed with 2 errors and 0 warnings:

Database Error in model int_zendesk__field_calendar_spine (models/ticket_history/int_zendesk__field_calendar_spine.sql)
  PARTITION BY expression must be DATE(<timestamp_column>), DATE(<datetime_column>), DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR), a DATE column, TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR), DATE_TRUNC(<date_column>, MONTH/YEAR), or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>[, <int64_value>]))
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_calendar_spine.sql

Database Error in model int_zendesk__field_history_pivot (models/ticket_history/int_zendesk__field_history_pivot.sql)
  PARTITION BY expression must be DATE(<timestamp_column>), DATE(<datetime_column>), DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR), a DATE column, TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR), DATE_TRUNC(<date_column>, MONTH/YEAR), or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>[, <int64_value>]))
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_history_pivot.sql

Looks like the same error is showing on your circleci build for that branch.

fivetran-jamie commented:

@csaroff ack, somehow missed the CircleCI error 🤦 it should be working now -- week is not actually a supported granularity level, so i switched it to month
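For reference, the granularity change described here is expressed in a dbt BigQuery model config roughly like this (a sketch; the column name is illustrative, not necessarily what the package uses):

```sql
{{ config(
    materialized='incremental',
    partition_by={
        "field": "date_day",      -- illustrative column name
        "data_type": "date",
        "granularity": "month"    -- for date columns, BigQuery supports month/year via DATE_TRUNC; "week" is rejected
    }
) }}
```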


csaroff commented Sep 9, 2021

No worries. I'll try again today. Thanks!


csaroff commented Sep 9, 2021

Looks like it cleared up the previous error, but now I'm getting:

Database Error in model int_zendesk__field_calendar_spine (models/ticket_history/int_zendesk__field_calendar_spine.sql)
  Query exceeded resource limits. This query used 19447 CPU seconds but would charge only 15M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 19447 CPU seconds were used, and this query must use less than 3800 CPU seconds.
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_calendar_spine.sql

fivetran-jamie commented:

Ah shoot -- and are you trying to run a full refresh?

Also it might be more efficient to troubleshoot this over a call if you'd like! You can book an office hours session here if that sounds good

fivetran-jamie commented:

oh one idea .... this may be because int_zendesk__calendar_spine is ephemeral by default (so that whole model is just a CTE in int_zendesk__field_calendar_spine). could you try adding the following to your dbt_project.yml?

models: 
    zendesk:
        utils:
            +materialized: table


csaroff commented Sep 10, 2021

I don't think it ever successfully built int_zendesk__field_calendar_spine so I guess it's doing a full refresh even though I'm not using the --full-refresh flag.

I added the +materialized: table config to my dbt_project.yml as you suggested. Tragically, it's still failing :(

Completed with 1 error and 0 warnings:

Database Error in model int_zendesk__field_calendar_spine (models/ticket_history/int_zendesk__field_calendar_spine.sql)
  Query exceeded resource limits. This query used 18210 CPU seconds but would charge only 20M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 18210 CPU seconds were used, and this query must use less than 5100 CPU seconds.
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_calendar_spine.sql

I can confirm that int_zendesk__calendar_spine is being built as a table rather than a view


fivetran-jamie commented:

Ok got a few more tricks up my sleeve....

currently, int_zendesk__field_calendar_spine associates every ticket with every day since the creation of the ticket, regardless of its age or status. i am going to try adapting our code from the jira dbt package, which caps the timeline for resolved issues at the last date they were updated at. that's a better solution overall and hopefully may address the resources-exceeded error, as the data volume can really explode here.

in the meantime, while i develop the above solution, could you try out this branch again? i've added a cluster_by config to the problematic models, which may help optimize those queries further
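For context, a `cluster_by` config on a BigQuery dbt model looks roughly like this (a sketch -- the clustering column is an assumption, not necessarily what the branch uses):

```sql
{{ config(
    materialized='incremental',
    partition_by={"field": "date_day", "data_type": "date", "granularity": "month"},
    cluster_by=["ticket_id"]    -- assumed column; clustering co-locates related rows so queries scan less data
) }}
```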


csaroff commented Sep 15, 2021

Hmm. Still getting Query Exceeded Resource Limits

Database Error in model int_zendesk__field_calendar_spine (models/ticket_history/int_zendesk__field_calendar_spine.sql)
  Query exceeded resource limits. This query used 27133 CPU seconds but would charge only 15M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 27133 CPU seconds were used, and this query must use less than 3800 CPU seconds.
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_calendar_spine.sql

fivetran-jamie commented:

ok that seems to have made it worse lol -- i will get back to ya soon with the other solution...

fivetran-jamie commented:

@csaroff alright wanna try it out again? 🤞


csaroff commented Sep 19, 2021

I ran dbt clean && dbt deps && dbt run --model zendesk

Completed with 1 error and 0 warnings:

Database Error in model int_zendesk__field_calendar_spine (models/ticket_history/int_zendesk__field_calendar_spine.sql)
  Query exceeded resource limits. This query used 7891 CPU seconds but would charge only 30M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 7891 CPU seconds were used, and this query must use less than 7600 CPU seconds.
  compiled SQL at target/run/zendesk/models/ticket_history/int_zendesk__field_calendar_spine.sql

Sooooo close!

fivetran-jamie commented:

ahhh -- I think at this point it might be best / most efficient to troubleshoot this over a call if you're down! could you book one or two 15 minute slots of our office hours here?

fivetran-jamie commented:

@csaroff if that doesn't work for you tho, i've also gone ahead and introduced a workaround to simply filter out old data. this happens via a ticket_field_history_timeframe variable, which limits ticket field history to however many years you set it to; by default, all data is included (code here).

this change lives on the bq-partitions branch we've been working with - if you would like to try setting ticket_field_history_timeframe to 10 years, i believe that the original too many partitions error will go away 🤞
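Setting that variable would presumably look something like this in `dbt_project.yml` (a sketch -- the variable name comes from the comment above, but the exact scoping should be checked against the branch's documentation):

```yaml
vars:
  zendesk:
    ticket_field_history_timeframe: 10   # keep only the last 10 years of field history
```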

happy to try to troubleshoot other solutions on a call, but i am thinking this is the route we'll probably have to take given the resources exceeded messages we're getting with larger partitions 😢

we are planning to release this at the end of our sprint (end of next week), so if you'd like to try it out/provide any feedback before then that'd be wonderful! thanks and sorry we couldn't come up with a more elegant fix....

fivetran-jamie mentioned this issue Oct 4, 2021
fivetran-joemarkiewicz commented:

@csaroff the latest release v0.7.0 (which includes this fix) has just been cut!! You can expect this release to be live on the dbt hub at the top of the hour.

I will leave this ticket open until the release is live. Once the latest release is live I will close the ticket and please feel free to open another Github issue if you have any further questions or requests for the package. 😄
