-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-328] [Feature] Package Enabled Flag #4837
Comments
Thanks for the writeup @csoule-shaker. I'm not sure I understand so I'm going to need a bit of clarification on what you're asking for. Are you saying you have your
but you're only using 3 of the 5 packages you have defined so you want to be able to say
If this is the correct scenario, can you share what the need to have unused packages defined is? Are you sharing this file across multiple dbt projects? If I am not understanding correctly, an example of the problem you're trying to resolve might be hugely beneficial to me getting on the same page as you! |
Essentially you have the gist of it. This feature is less for direct end user projects and more for developers that create packages for others. As an example we want to use a package that brings together data from models generated by other packages, making these packages a dependency. This package is called ad_reporting as listed below. Once that package is installed, it also has a package.yml with all the possible dependent package in it.
I only need to use Facebook, Google, and Snapchat ad packages so they already use variables so I can turn off the other models for the other sources. Unfortunately, since the ad_reporting package could use the other sources, we get all the other packages installed as well even if we do not need them, which causes bloat in documentation, space issues, and just clutter in general.
If we were able to leverage those same variables that enable the models on the packages.yml, then we could have the end user only install the packages they need when they need them. This also gives a lot of flexibility to designing packages and I think is a natural feature to introduce.
Feature Behavior:
The default value would be true if the flag was not defined. If the flag is defined it should be able to use variables as well as true/false values.
Below I tried to detail out how I envision using tis flag in conjunction with packages that have optional dependencies.
Example:
Main Projects Project.yml file
vars:
linkedin_ads_enable: False ##Overrides package variable
twitter_ads_enable: False ##Overrides package variable
Main Projects Package.yml file
packages:
- package: dbt-labs/audit_helper
version: 0.5.0
- package: dbt-labs/codegen
version: 0.5.0
- package: dbt-labs/dbt_external_tables
version: 0.8.0
- package: dbt-labs/ad_reporting
version: 0.8.0
Ad_Reporting Project.yml file (Default package behavior)
vars:
linkedin_ads_enable: True
twitter_ads_enable: True
facebook_ads_enable: True
google_ads_enable: True
snapchat_ads_enable: True
Ad Reporting’s Package.yml file
packages:
- package: dbt-labs/facebook_ads
version: 0.5.0
enabled: var{{facebook_ads_enable}}
- package: dbt-labs/google_ads
version: 0.5.0
enabled: var{{google_ads_enable}}
- package: dbt-labs/linkedin_ads
version: 0.5.0
enabled: var{{linkedin_ads_enable}}
- package: dbt-labs/twitter_ads
version: 0.5.0
enabled: var{{twitter_ads_enable}}
- package: dbt-labs/snapchat_ads
version: 0.5.0
enabled: var{{snapchat_ads_enable}}
Chris Soule
InterWorks, Inc. | Data Architect
t: 202/834-8549
www.interworks.com<https://www.interworks.com/>
From: Emily Rockman ***@***.***>
Sent: Monday, March 14, 2022 4:48 PM
To: dbt-labs/dbt-core ***@***.***>
Cc: csoule-shaker ***@***.***>; Mention ***@***.***>
Subject: Re: [dbt-labs/dbt-core] [CT-328] [Feature] Package Enabled Flag (Issue #4837)
Thanks for the writeup @csoule-shaker<https://github.com/csoule-shaker>. I'm not sure I understand so I'm going to need a bit of clarification on what you're asking for.
Are you saying you have your packages.yml defined in your project as
packages:
- package: dbt-labs/adwords
version: 0.4.0
- package: dbt-labs/audit_helper
version: 0.5.0
- package: dbt-labs/codegen
version: 0.5.0
- package: dbt-labs/dbt_external_tables
version: 0.8.0
- package: dbt-labs/facebook_ads
version: 0.8.0
but you're only using 3 of the 5 packages you have defined so you want to be able to say
packages:
- package: dbt-labs/adwords
version: 0.4.0
enabled: true
- package: dbt-labs/audit_helper
version: 0.5.0
enabled: false
- package: dbt-labs/codegen
version: 0.5.0
enabled: true
- package: dbt-labs/dbt_external_tables
version: 0.8.0
enabled: true
- package: dbt-labs/facebook_ads
version: 0.8.0
enabled: false
If this is the correct scenario, can you share what the need to have unused packages defined is? Are you sharing this file across multiple dbt projects?
If I am not understanding correctly, an example of the problem you're trying to resolve might be hugely beneficial to me getting on the same page as you!
—
Reply to this email directly, view it on GitHub<#4837 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AWRLYJ6YRTVV3RFWY2OPC5TU76QY7ANCNFSM5QEEIE4A>.
Triage notifications on the go with GitHub Mobile for iOS<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID: ***@***.******@***.***>>
[CT-328]: https://dbtlabs.atlassian.net/browse/CT-328?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
|
@emmyoop - there are packages that contain many other packages - the packages:
- package: fivetran/pinterest
version: [">=0.5.0", "<0.6.0"]
- package: fivetran/microsoft_ads
version: [">=0.4.0", "<0.5.0"]
- package: fivetran/linkedin
version: [">=0.4.0", "<0.5.0"]
- package: fivetran/google_ads
version: [">=0.6.0", "<0.7.0"]
- package: fivetran/tiktok_ads
version: [">=0.1.0", "<0.2.0"]
- package: fivetran/twitter_ads
version: [">=0.4.0", "<0.5.0"]
- package: fivetran/facebook_ads
version: [">=0.4.0", "<0.5.0"]
- package: fivetran/snapchat_ads
version: [">=0.3.0", "<0.4.0"] Even if a user may only need to use the packages:
- package: fivetran/ad_reporting
version: [">=0.7.0", "<0.8.0"]
include_only: ["google_ads", "facebook_ads"] Which would make For the record, the on-disk install size when you try to use Generally this is not an issue but the dbt Cloud IDE doesn't play nicely when you have such a large on-disk There's probably a wider conversation to be had here around not including large files (csv / json) when using dbt packages which is separate to this particular ask ("allow users to specify what to include / exclude when packages contain other sub-packages") but thought I'd add this to provide a bit more colour to the motivation for this particular issue. |
@csoule-shaker and @jeremyyeo thank you for all the clarification. I'm on the same page as you now! I think @jeremyyeo is very close to the real issue with
I think now is the time to have that conversation! If we could cut down on package size by excluding those csv/json files, that would solve the real problem here. Definitely speak up if there a reason other than package size to exclude/include specific packages via an enabled flag! Banning the use of these large files in packages isn't the right answer. The The right answer is to have a way to mark these files/directories as not needing to be downloaded as part of the package you're including in your project (or a dependency of that project). In all honestly, I don't actually know the exact solution here. It could involve how we handle packages in the dbt hub, how we cache dependancies or how we handle installation within |
Lert me start by saying I definitely see that there is still a reason to still exclude packages using a enabled flag.
Where there is an issue in dbt Cloud IDE not handling large packages/projects due to this specific issue, that does not invalidate the need for the enable flag feature or the benefits of it.
I wanted to keep these two things separate due to the confusion, as I am not trying to solve for dbt Cloud with this feature request, but for giving the package ecosystem a much needed option for allowing end user to not clutter their projects with optional dependent packages they might not use. As an end user of the ad_reporting package myself for a client, it is hard to tout dbt great documentation feature and then tell them to ignore all the other packages that got installed due to this one package even though they are only using 3-8.
I think this feature would be relatively easy to implement and help this aspect of the problem.
For the other aspect of large files in a dbt package, I like the .dbtignore feature for selecting files folders to not include in the package downloads in order to help minimize the file size, but also think with the growth of dbt and dbt packages, we would still need to find a way to handle large project sizes in the IDE if we want the adoption rate to continue. It was a challenge to work in dbt Cloud when it could not load the folder paths for clients.
I do want to re-iterate though. The feature I am requesting with this is not to address the dbt Cloud IDE issue regarding a projects size, even though it is what led me to it. I still think the need is their for package developers to give user an option to not download the optional dependencies of a higher level package if they do not need them. Package providers are already using optional packages in higher package and as of right now the users are forced to download them and have them as part of their build and in their documentation because there is a lack of a feature to handle this.
I would gladly be part of a separate ticket addressing the dbt Cloud IDE issue on Storage size of a project, but this is a separate feature.
Chris Soule
InterWorks, Inc. | Data Architect
t: 202/834-8549
www.interworks.com<https://www.interworks.com/>
From: Emily Rockman ***@***.***>
Sent: Tuesday, March 15, 2022 3:32 PM
To: dbt-labs/dbt-core ***@***.***>
Cc: csoule-shaker ***@***.***>; Mention ***@***.***>
Subject: Re: [dbt-labs/dbt-core] [CT-328] [Feature] Package Enabled Flag (Issue #4837)
@csoule-shaker<https://github.com/csoule-shaker> and @jeremyyeo<https://github.com/jeremyyeo> thank you for all the clarification. I'm on the same page as you now!
I think @jeremyyeo<https://github.com/jeremyyeo> is very close to the real issue with
There's probably a wider conversation to be had here around not including large files (csv / json) when using dbt packages
I think now is the time to have that conversation! If we could cut down on package size by excluding those csv/json files, that would solve the real problem here. Definitely speak up if there a reason other than package size to exclude/include specific packages via an enabled flag!
Banning the use of these large files in packages isn't the right answer. The ad_reporting files here<https://github.com/fivetran/dbt_ad_reporting/tree/main/docs> are what allow the docs to exist here<https://fivetran.github.io/dbt_ad_reporting/#!/overview> which is pretty neat. It's powerful to be able to navigate the package this way without having it installed. The csv files in the project are for integration tests which are another good practice we definitely don't want to discourage or make difficult.
The right answer is to have a way to mark these files/directories as not needing to be downloaded as part of the package you're including in your project (or a dependency of that project). .dbtignore anyone? 😉
In all honestly, I don't actually know the exact solution here. It could involve how we handle packages in the dbt hub, how we cache dependancies or how we handle installation within dbt-core itself. This is all very worthwhile work to solve the underlying issue that we will need to get prioritized.
—
Reply to this email directly, view it on GitHub<#4837 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AWRLYJ4UGCM2DDMPXKLHWYLVADQRLANCNFSM5QEEIE4A>.
Triage notifications on the go with GitHub Mobile for iOS<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID: ***@***.******@***.***>>
[CT-328]: https://dbtlabs.atlassian.net/browse/CT-328?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
|
@csoule-shaker you're right about the separation of issues here - moved that to #4868 so that we can have that discussion separately. |
This is great feedback, @csoule-shaker. I obviously missed that detail in the previous replies. @jeremyyeo opened #4868 (thank you!) to track what I discussed above. I'll leave this one open to track any conversation more directly related to flags to enable specific projects. There's a complexity with the current architecture where vars are not available in the |
Currently command line vars are available in packages.yml (though there seems to be a bug related to that too) but the vars in the dbt_project file are not available. This is an issue that comes up pretty often in one way or another... I'm wondering if it might not make sense to move vars out into a separate file (vars.yml). Then we could process it before dbt_project.yml, selectors.yml, and packages.yml, and substitute vars. Or we could look for them separately and combine them with cli_vars and achieve a similar thing. @jtcohen6 I know we've discussed this before , but I think it was in a "might be nice" context. It probably wouldn't be too hard to pull out. And the work that's being done to separate out dbt_project processing might help too. |
I missed these comments a month ago. The extra context helps a lot! There should be some way to perform "partial" import of a big package such as
One way of doing this, which we're describing above, is by having the packages:
- package: fivetran/pinterest
version: [">=0.5.0", "<0.6.0"]
enabled: "{{ env_var('DBT_FIVETRAN_PINTEREST_ENABLED', True) }}"
- package: fivetran/microsoft_ads
version: [">=0.4.0", "<0.5.0"]
enabled: "{{ env_var('DBT_FIVETRAN_MICROSOFT_ADS_ENABLED', True) }}"
... A simpler approach: Could this just look like the end user providing static packages:
# i want this
- package: fivetran/ad_reporting
version: [">=0.7.0", "<0.8.0"]
# but not these
- package: fivetran/pinterest
enabled: False
- package: fivetran/microsoft_ads
enabled: False Of course, the This might also have interesting implications for the Hub API, which captures the upstream dependencies of every Hub-registered package to aid in version resolution. I think that might be fine, but hard to say for sure. |
Hi Jeremy,
Thanks for taking this up again.
I still think the first method you described would be the best approach here. Putting my end user hat on, it doesn’t require me to break out a bunch of dependent packages that are already listed inside the monolithic package I’m trying to utilize. I know the Fivetran package is what brought this up but also, wanted to put out their that this applies to other vendors ability to produce these large packages as well and not have a bunch a bloat come with it if the consumer is not using certain pieces.
The Ad_reporting package as of today already utilizes the concept of turning off certain models that the client doesn’t need by utilizing variables in their project that are enabled by default and their documentation tells you to add the same variables in your main project.yml and set the value to disabled if you do not need them. So this pattern has already been established and works quite well for models. I think it would be a relatively easy lift for the vendor to incorporate a similar approach to the dependent packages as well.
As for the env_var flags controlling the enabled flag (I didn’t think packages had the ability to use this flag yet either though) that is a good work around or interm approach to the full solution granted the packages.yml file supports the enabled flag. I still think the end result should be to have the project.yml be scanned for variables first, or create a new yml that supports just variables as someone else mentioned, that you can define in the IDE itself (thinking dbt Cloud).
“having the ad_reporting package define its dependencies dynamically”
packages:
- package: fivetran/pinterest
version: [">=0.5.0", "<0.6.0"]
enabled: "{{ env_var('DBT_FIVETRAN_PINTEREST_ENABLED', True) }}"
- package: fivetran/microsoft_ads
version: [">=0.4.0", "<0.5.0"]
enabled: "{{ env_var('DBT_FIVETRAN_MICROSOFT_ADS_ENABLED', True) }}"
...
…________________________________
A simpler approach: Could this just look like the end user providing static enabled configs in their packages.yml, which might override the ones defined the ad_reporting package's packages.yml?
This approach would not be ideal for some of the reasons you mentioned. This takes the responsibility out of the Package vendors hands to support it’s own dependencies and could create more headaches for end users in the end. The vendors should be able to decide which dependent packages are variable and can be turned off, and the above method seems to work around that.
|
doubling down that this seems the best way to go. This way end users don't have to ignore files they won't use, they simply won't be installed! |
Hey everyone, Just wondering if there was any traction on this enhancement or if it was on the list yet? |
A related use case: @christineberger is adding a new feature to dbt utils (dbt-labs/dbt-utils#624) and we wanted to be able to use a dbt-expectations test in the CI project. We weren't able to install it from the Hub: packages:
- local: ../
- package: calogica/dbt_expectations
version: [">=0.5.0", "<0.6.0"] because expectations also depends on utils and then there were two utils projects in play. If we could have said "please don't install dbt-labs/dbt_utils" then we wouldn't have had to copy-paste the code over 😅 |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Is there an existing feature request for this?
Describe the Feature
the ability to set an enabled flag in the packages.yml file to have dbt deps skip that package. The flag should also be able to be set using a variable.
Describe alternatives you've considered
Thought about using a Macro to iterate over a list variable and creating the package code and placing the macro in the packages.yml file, but deps doesn't seem to support this either.
Who will this benefit?
Some packages reference dependent packages that are only used if you enable them through a variable in your project.yml. If a higher level packages references 10 dependents and I am only using 3, being able to set enabled flags using variables would greatly cur down on project clutter, unnecessary documentation in the docs, and reduce project size in the IDE and at runtime.
This recently caused errors in dbt Cloud due to the IDE not able to handle 50MB of files and resulted in not being able to display the folder/file lists.
Are you interested in contributing this feature?
I'm not sure how deps code works, but I can definitely consult on the solution.
Anything else?
Should be fairly simple to parse the flag out of the project,yml and use it as a filter in the function that lists the packages to install.
This would also be a great feature for package developer ecosystem.
The text was updated successfully, but these errors were encountered: