Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-328] [Feature] Package Enabled Flag #4837

Closed
1 task done
csoule-shaker opened this issue Mar 7, 2022 · 15 comments
Closed
1 task done

[CT-328] [Feature] Package Enabled Flag #4837

csoule-shaker opened this issue Mar 7, 2022 · 15 comments
Labels
deps dbt's package manager enhancement New feature or request stale Issues that have gone stale

Comments

@csoule-shaker
Copy link

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

the ability to set an enabled flag in the packages.yml file to have dbt deps skip that package. The flag should also be able to be set using a variable.

Describe alternatives you've considered

Thought about using a Macro to iterate over a list variable and creating the package code and placing the macro in the packages.yml file, but deps doesn't seem to support this either.

Who will this benefit?

Some packages reference dependent packages that are only used if you enable them through a variable in your project.yml. If a higher level packages references 10 dependents and I am only using 3, being able to set enabled flags using variables would greatly cur down on project clutter, unnecessary documentation in the docs, and reduce project size in the IDE and at runtime.

This recently caused errors in dbt Cloud due to the IDE not able to handle 50MB of files and resulted in not being able to display the folder/file lists.

Are you interested in contributing this feature?

I'm not sure how deps code works, but I can definitely consult on the solution.

Anything else?

Should be fairly simple to parse the flag out of the project,yml and use it as a filter in the function that lists the packages to install.

This would also be a great feature for package developer ecosystem.

@csoule-shaker csoule-shaker added enhancement New feature or request triage labels Mar 7, 2022
@github-actions github-actions bot changed the title [Feature] Package Enabled Flag [CT-328] [Feature] Package Enabled Flag Mar 7, 2022
@jtcohen6 jtcohen6 added packages Functionality for interacting with installed packages Team:Language labels Mar 7, 2022
@emmyoop
Copy link
Member

emmyoop commented Mar 14, 2022

Thanks for the writeup @csoule-shaker. I'm not sure I understand so I'm going to need a bit of clarification on what you're asking for.

Are you saying you have your packages.yml defined in your project as

packages:
  - package: dbt-labs/adwords
    version: 0.4.0
  - package: dbt-labs/audit_helper
    version: 0.5.0
  - package: dbt-labs/codegen
    version: 0.5.0
  - package: dbt-labs/dbt_external_tables
    version: 0.8.0
  - package: dbt-labs/facebook_ads
    version: 0.8.0

but you're only using 3 of the 5 packages you have defined so you want to be able to say

packages:
  - package: dbt-labs/adwords
    version: 0.4.0
    enabled: true
  - package: dbt-labs/audit_helper
    version: 0.5.0
    enabled: false
  - package: dbt-labs/codegen
    version: 0.5.0
    enabled: true
  - package: dbt-labs/dbt_external_tables
    version: 0.8.0
    enabled: true
  - package: dbt-labs/facebook_ads
    version: 0.8.0
    enabled: false

If this is the correct scenario, can you share what the need to have unused packages defined is? Are you sharing this file across multiple dbt projects?

If I am not understanding correctly, an example of the problem you're trying to resolve might be hugely beneficial to me getting on the same page as you!

@emmyoop emmyoop self-assigned this Mar 14, 2022
@emmyoop emmyoop removed the triage label Mar 14, 2022
@csoule-shaker
Copy link
Author

csoule-shaker commented Mar 14, 2022 via email

@jeremyyeo
Copy link
Contributor

@emmyoop - there are packages that contain many other packages - the dbt_ad_reporting package contains all these dependencies:

packages:
- package: fivetran/pinterest
  version: [">=0.5.0", "<0.6.0"]

- package: fivetran/microsoft_ads
  version: [">=0.4.0", "<0.5.0"]

- package: fivetran/linkedin
  version: [">=0.4.0", "<0.5.0"]

- package: fivetran/google_ads
  version: [">=0.6.0", "<0.7.0"]

- package: fivetran/tiktok_ads
  version: [">=0.1.0", "<0.2.0"]

- package: fivetran/twitter_ads
  version: [">=0.4.0", "<0.5.0"]

- package: fivetran/facebook_ads
  version: [">=0.4.0", "<0.5.0"]

- package: fivetran/snapchat_ads
  version: [">=0.3.0", "<0.4.0"]

Even if a user may only need to use the google_ads and facebook_ads packages for example (in tandem with some core models/macros provided by the higher level dbt_ad_reporting package). It would be useful to have the ability to skip certain lower level dependencies that is not needed - not too sure what the API looks like but perhaps:

packages:
  - package: fivetran/ad_reporting
    version: [">=0.7.0", "<0.8.0"]
    include_only: ["google_ads", "facebook_ads"]

Which would make dbt deps not install all 8 of those sub-packages when trying to use the dbt_ad_reporting package but just what is required (ad_reporting and it's core models / macros, fivetran/google_ads, fivetran/facebook_ads).


For the record, the on-disk install size when you try to use dbt_ad_reporting is 50 MB (this is due to all the integration csv files and the docs files contained within all 8 of those sub-packages.

Generally this is not an issue but the dbt Cloud IDE doesn't play nicely when you have such a large on-disk dbt_packages/ directory. One of the workarounds currently is to fork each of those 8 packages and remove the bloat (https://github.com/jeremyyeo/dbt_ad_reporting) which trims the on-disk size by ~60% (which may be only marginally helpful in the dbt Cloud IDE tbh).

There's probably a wider conversation to be had here around not including large files (csv / json) when using dbt packages which is separate to this particular ask ("allow users to specify what to include / exclude when packages contain other sub-packages") but thought I'd add this to provide a bit more colour to the motivation for this particular issue.

@emmyoop
Copy link
Member

emmyoop commented Mar 15, 2022

@csoule-shaker and @jeremyyeo thank you for all the clarification. I'm on the same page as you now!

I think @jeremyyeo is very close to the real issue with

There's probably a wider conversation to be had here around not including large files (csv / json) when using dbt packages

I think now is the time to have that conversation! If we could cut down on package size by excluding those csv/json files, that would solve the real problem here. Definitely speak up if there a reason other than package size to exclude/include specific packages via an enabled flag!

Banning the use of these large files in packages isn't the right answer. The ad_reporting files here are what allow the docs to exist here which is pretty neat. It's powerful to be able to navigate the package this way without having it installed. The csv files in the project are for integration tests which are another good practice we definitely don't want to discourage or make difficult.

The right answer is to have a way to mark these files/directories as not needing to be downloaded as part of the package you're including in your project (or a dependency of that project). .dbtignore anyone? 😉

In all honestly, I don't actually know the exact solution here. It could involve how we handle packages in the dbt hub, how we cache dependancies or how we handle installation within dbt-core itself. This is all very worthwhile work to solve the underlying issue that we will need to get prioritized.

@emmyoop emmyoop removed their assignment Mar 15, 2022
@csoule-shaker
Copy link
Author

csoule-shaker commented Mar 15, 2022 via email

@jeremyyeo
Copy link
Contributor

@csoule-shaker you're right about the separation of issues here - moved that to #4868 so that we can have that discussion separately.

@emmyoop
Copy link
Member

emmyoop commented Mar 15, 2022

This is great feedback, @csoule-shaker. I obviously missed that detail in the previous replies.

@jeremyyeo opened #4868 (thank you!) to track what I discussed above. I'll leave this one open to track any conversation more directly related to flags to enable specific projects.

There's a complexity with the current architecture where vars are not available in the packages.yml file. I'm going to tag in @gshank to see if she can shed any more light on a possible way to achieve this.

@gshank
Copy link
Contributor

gshank commented Mar 16, 2022

Currently command line vars are available in packages.yml (though there seems to be a bug related to that too) but the vars in the dbt_project file are not available. This is an issue that comes up pretty often in one way or another... I'm wondering if it might not make sense to move vars out into a separate file (vars.yml). Then we could process it before dbt_project.yml, selectors.yml, and packages.yml, and substitute vars. Or we could look for them separately and combine them with cli_vars and achieve a similar thing.

@jtcohen6 I know we've discussed this before , but I think it was in a "might be nice" context. It probably wouldn't be too hard to pull out. And the work that's being done to separate out dbt_project processing might help too.

@jtcohen6 jtcohen6 added deps dbt's package manager and removed packages Functionality for interacting with installed packages labels Mar 30, 2022
@jtcohen6
Copy link
Contributor

I missed these comments a month ago. The extra context helps a lot!

There should be some way to perform "partial" import of a big package such as ad_reporting, akin to extras for Python packages. That could be defined in two places:

  • The monolithic package uses vars to dynamically turn on/off its own dependencies
  • The user specifies which

One way of doing this, which we're describing above, is by having the ad_reporting package define its dependencies dynamically. The ability to use project-defined vars in package.yml would be the most ergonomic way of instrumenting that (fuller discussion in #4938), but it would be possible to use env_var to set those flags today:

packages:
- package: fivetran/pinterest
  version: [">=0.5.0", "<0.6.0"]
  enabled: "{{ env_var('DBT_FIVETRAN_PINTEREST_ENABLED', True) }}"
- package: fivetran/microsoft_ads
  version: [">=0.4.0", "<0.5.0"]
  enabled: "{{ env_var('DBT_FIVETRAN_MICROSOFT_ADS_ENABLED', True) }}"
...

A simpler approach: Could this just look like the end user providing static enabled configs in their packages.yml, which might override the ones defined the ad_reporting package's packages.yml?

packages:
  # i want this
  - package: fivetran/ad_reporting
    version: [">=0.7.0", "<0.8.0"]
  # but not these
  - package: fivetran/pinterest
    enabled: False
  - package: fivetran/microsoft_ads
    enabled: False

Of course, the ad_reporting package would need to ensure it still works if those parent packages are missing. That should be effectively the same as if they're entirely disabled, but we'd want to double-check.

This might also have interesting implications for the Hub API, which captures the upstream dependencies of every Hub-registered package to aid in version resolution. I think that might be fine, but hard to say for sure.

@csoule-shaker
Copy link
Author

csoule-shaker commented Apr 21, 2022 via email

@dataders
Copy link
Contributor

extras for Python packages.

doubling down that this seems the best way to go. This way end users don't have to ignore files they won't use, they simply won't be installed!

@csoule-shaker
Copy link
Author

Hey everyone, Just wondering if there was any traction on this enhancement or if it was on the list yet?

@joellabes
Copy link
Contributor

A related use case: @christineberger is adding a new feature to dbt utils (dbt-labs/dbt-utils#624) and we wanted to be able to use a dbt-expectations test in the CI project.

We weren't able to install it from the Hub:

packages:
  - local: ../
  - package: calogica/dbt_expectations
    version: [">=0.5.0", "<0.6.0"]

because expectations also depends on utils and then there were two utils projects in play.

If we could have said "please don't install dbt-labs/dbt_utils" then we wouldn't have had to copy-paste the code over 😅

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 23, 2023
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps dbt's package manager enhancement New feature or request stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

7 participants