-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support configurable delimiter for seed files, default to comma (#3990) #7186
Support configurable delimiter for seed files, default to comma (#3990) #7186
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ramonvermeulen! I think we'd want a functional test for this as well, guaranteeing that a CSV with a nonstandard (non-comma) delimiter actually works.
From my POV, this doesn't feel super high-priority — dbt's seed
functionality is intentionally limited / narrow in scope, and I hesitate before adding more, given the added maintenance burden — but this is a very straightforward & self-contained change. TSVs are common enough in the wild.
core/dbt/clients/agate_helper.py
Outdated
@@ -135,12 +135,12 @@ def as_matrix(table): | |||
return [r.values() for r in table.rows.values()] | |||
|
|||
|
|||
def from_csv(abspath, text_columns): | |||
def from_csv(abspath, text_columns, delimiter = ","): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Should we add support for more than just delimiter
here? Any/all **kwargs
supported by agate.from_csv
/ Python's stdlib CSV reader?
Thoughts:
- That can be out of scope for now / for this PR
- It would risk locking us further into using
agate
(which I don't love), or at least the Python stdlibcsv
reader (which I don't mind as much)
Hi @jtcohen6, sorry for the late reply, I had a week of vacation. The reason I made this PR is because I encountered customers delivering there "csv" files with for example pipes or tabs as delimiter. Maybe because that is a standard within their organization. I thought it would be nice to have some more control about the configuration of the csv parser for the seed files, but I get your point about that it will increase the maintenance.
I find this a hard one, it is possible to add support for all the parameters of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @ramonvermeulen!
Co-authored-by: Cor <[email protected]>
Conflicts: test/unit/test_contracts_graph_parsed.py
@ramonvermeulen I am rather interested in this PR, is there anything I can do to help get it over the line? |
Hi @eliwoods, I think functional tests are still required to get this PR over the line @jtcohen6? I've been quite busy lately so feel free to pick this up / contribute on this as you wish! |
@jtcohen6 @eliwoods @JCZuurmond I added a couple of functional tests for this feature, and fixed the failing seed related integration tests. Only one integration test called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramonvermeulen Thank you for putting all this work it! I've looked it over and pulled it down for some manual testing. Every looks great ✨ Gonna run it through our CI workflow, and then we should be good to go 🙂
Eek, foiled by mypy. Perhaps I should have run the So there are two classes of mypy issues we're seeing. One is some simple formatting changes, which are simple fixes. The other is the errors like |
merge main into configurableDelimiterSeeds
Thanks for taking a look at my PR! Just pushed a commit merging the upstream |
Absolutely! Thank you for the quick turn around with the fixes. Also, sorry that it took me so long to get to my initial review. It's been a busier few weeks for me than I anticipated 😅 I'm gonna kick off the CI again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-approving 🙂
Codecov Report
@@ Coverage Diff @@
## main #7186 +/- ##
=======================================
Coverage 86.22% 86.22%
=======================================
Files 174 174
Lines 25515 25517 +2
=======================================
+ Hits 22000 22002 +2
Misses 3515 3515
|
Installed the pre-release version and trying this in my
|
Hmmm, not sure what is going wrong. Currently I am leaving for holidays for a week, but will definitely look into this once I am back. |
@saviorand I created a new dbt project using
On location
Works without problems On location
Also works without problems Could you provide more info in reproducing this issue, maybe I am missing something? |
Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#4126 |
Great feature!!. Just one point: What about TSV files? In that case, the delimiter would be |
Good edge case actually that I didn't think about yet, I think as of now it works for any 1 character delimiter. I am currently on a longer break, but might look into this once I get back to see if I can add support for tab delimited files as well. Small Update: when I take a look at the from_csv docs from agate I would expect that it would work with |
resolves #3990
Original description
Added support to configure a custom delimiter for seed files, defaults to comma.
Copilot description
This pull request introduces several changes related to seed files and configuration options. The most important changes include adding new test classes and methods to test different scenarios related to seed files with unique delimiters, adding a new configuration option for specifying a delimiter for seed files, and modifying various functions to support the new delimiter configuration option.
Summary of changes:
tests/adapter/dbt/tests/adapter/simple_seed/test_seed.py
: Added a new test class and several test methods to test seed files with unique delimiters. [1] [2] [3]core/dbt/contracts/graph/model_config.py
: Added a new configuration optiondelimiter
to theSeedConfig
class.tests/functional/artifacts/expected_manifest.py
: Added a new configuration optiondelimiter
to a function inexpected_manifest.py
.tests/unit/test_contracts_graph_parsed.py
: Added new configuration optionsdelimiter
to different functions. [1] [2] [3]Other changes:
tests/adapter/dbt/tests/adapter/simple_seed/fixtures.py
: Added a new model to the list of fixtures..changes/unreleased/Features-20230714-202445.yaml
: Added support for configuring a delimiter for a seed file.tests/functional/list/test_list.py
: Added a new configuration option to a function.core/dbt/context/providers.py
: Added a new parameter to a function.core/dbt/clients/agate_helper.py
: Added a new parameter to a function.Checklist
changie new
to create a changelog entry