delimiters in properties.yml doesn't work correctly #1119

Closed
2 tasks done
salimmoulouel opened this issue Feb 14, 2024 · 7 comments · Fixed by #1122
Labels
bug Something isn't working

Comments

@salimmoulouel
Contributor

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I put the following in properties.yml, the seed fails to load:

delimiter : "|"

Expected Behavior

The table is created from the seed.

Steps To Reproduce

Put delimiter: "|" in properties.yml.

Relevant log output

Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 0; errors: 5; max bad: 0; error percent: 0
  Error while reading data, error message: CSV table references column position 8, but line contains only 1 columns.; line_number: 2 byte_offset_to_start_of_line: 85 column_index: 8 column_name: "timestamp" column_type: STRING
  Error while reading data, error message: CSV table references column position 8, but line contains only 1 columns.; line_number: 3 byte_offset_to_start_of_line: 204 column_index: 8 column_name: "timestamp" column_type: STRING
  Error while reading data, error message: CSV table references column position 8, but line contains only 1 columns.; line_number: 4 byte_offset_to_start_of_line: 311 column_index: 8 column_name: "timestamp" column_type: STRING
  Error while reading data, error message: CSV table references column position 8, but line contains only 1 columns.; line_number: 5 byte_offset_to_start_of_line: 413 column_index: 8 column_name: "timestamp" column_type: STRING
  Error while reading data, error message: CSV table references column position 8, but line contains only 1 columns.; line_number: 6 byte_offset_to_start_of_line: 515 column_index: 8 column_name: "timestamp" column_type: STRING
  You are loading data without specifying data format, data will be treated as CSV format by default. If this is not what you mean, please specify data format by --source_format.
15:25:31  
15:25:31  Done. PASS=3 WARN=0 ERROR=1 SKIP=0 TOTAL=4

Environment

- OS:
- Python:
- dbt: 1.7.7

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@salimmoulouel salimmoulouel added bug Something isn't working triage labels Feb 14, 2024
@dbeatty10
Contributor

Thanks for reporting this @salimmoulouel !

Could you provide more code details to help us reproduce what you are seeing?

For example, something like this:

seeds/my_seed.csv

col_a|col_b|col_c
1|2|3
4|5|6

seeds/properties.yml

seeds:
  - name: my_seed
    config: 
      delimiter: "|"

@salimmoulouel
Contributor Author

salimmoulouel commented Feb 23, 2024

Hello, thank you for replying.
I tried this configuration and it doesn't accept my delimiter: all the fields are loaded as a single field.
dbt-core==1.7.8
dbt-bigquery==1.7.2

properties.yml

version: 2

seeds:
  - name: test_delimiter
    config:
      delimiter: ","

test_delimiter.csv

col_a|col_b|col_c
1|2|3
4|5|6

The result I get is:
image

I commented on the MR that is supposed to solve the problem here:
dbt-labs/docs.getdbt.com#4265 (comment)

When I use | as the delimiter in properties.yml:

properties.yml

version: 2

seeds:
  - name: test_delimiter
    config:
      delimiter: "|"

I get:

Runtime Error in seed test_delimiter (seeds/seed_delimiter/test_delimiter.csv)
  Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 0; errors: 2; max bad: 0; error percent: 0
  Error while reading data, error message: CSV table references column position 2, but line contains only 1 columns.; line_number: 2 byte_offset_to_start_of_line: 18 column_index: 2 column_name: "col_c" column_type: INT64
  Error while reading data, error message: CSV table references column position 2, but line contains only 1 columns.; line_number: 3 byte_offset_to_start_of_line: 24 column_index: 2 column_name: "col_c" column_type: INT64
  You are loading data without specifying data format, data will be treated as CSV format by default. If this is not what you mean, please specify data format by --source_format.
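The "line contains only 1 columns" message is what one would expect if the load step split the rows on "," even though "|" was configured. That is an assumption about the failure mode, not something confirmed in this thread. A minimal sketch with Python's standard csv module (not the BigQuery loader; the `parse` helper is hypothetical) shows the same symptom on the sample data:

```python
import csv
import io

# Same sample data as test_delimiter.csv above.
SEED_CSV = "col_a|col_b|col_c\n1|2|3\n4|5|6\n"


def parse(text, delimiter=","):
    """Parse CSV text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Assumed failure mode: the configured delimiter is ignored and "," is used,
# so every line collapses into a single field.
rows_default = parse(SEED_CSV)

# Intended behaviour: the configured "|" delimiter is applied.
rows_pipe = parse(SEED_CSV, delimiter="|")

print(len(rows_default[1]))  # 1
print(rows_pipe[1])          # ['1', '2', '3']
```

With a comma delimiter each data row has one field, so a three-column schema cannot be satisfied; with "|" each row splits into three fields as intended.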

@graciegoheen

I was able to reproduce this bug.

col_a|col_b|col_c
1|2|3
4|5|6

When I configure my seed delimiter in a properties yml file:

seeds:
  - name: test_delimiter
    config:
      delimiter: "|"

I get a failure:

15:49:06 Completed with 1 error and 0 warnings:
15:49:06   Database Error in seed test_delimiter (seeds/test_delimiter.csv)
  001003 (42000): SQL compilation error:
  syntax error line 1 at position 58 unexpected '|'.

However, I am able to configure this in my dbt_project.yml file:

seeds:
  coalesce_ci_demo_2023:
    test_delimiter:
      +delimiter: "|"

and get a successful dbt seed:

15:50:04 Finished running 1 seed in 0 hours 0 minutes and 4.33 seconds (4.33s).
15:50:04 Completed successfully

We should either update the docs or allow folks to configure the delimiter in the properties yml file.

@graciegoheen

We're only able to reproduce this issue in BigQuery; all other adapters seem to be working fine. I'm going to transfer this over to dbt-bigquery.

@graciegoheen graciegoheen transferred this issue from dbt-labs/dbt-core Feb 23, 2024
@dbeatty10 dbeatty10 removed the triage label Feb 23, 2024
@salimmoulouel
Contributor Author

I opened a PR. It's my first PR, so please let me know if you have any comments or suggestions.

@salimmoulouel
Contributor Author

Hope you're doing well! I wanted to chat about some merge requests I've submitted to dbt. It's been over two months since I sent them in, and I haven't heard a peep back. Any chance you could give me the lowdown on what's up with them?

I get that running an open-source project can be hectic, and I totally respect the hustle. But it's a bit frustrating not knowing where my contributions stand. I've put some real effort into these requests and would love to see them get some love.

If there's anything I need to tweak or if my requests aren't quite hitting the mark, I'm all ears. Just looking for some clarity so I can get back in the game!

Thanks a bunch for your time, and looking forward to hearing from you soon.

@dataders
Contributor

@salimmoulouel I responded to your Community Slack post


4 participants