Consecutive Tab Delimiters Collapsed To A Single Character #84

Closed
jetzerb opened this issue Oct 10, 2019 · 3 comments

@jetzerb
Contributor

jetzerb commented Oct 10, 2019

When querying a tab-delimited file, if there's an empty column, the rest of the row shifts over one position. Works fine as long as there is data in every column.

echo 'a,b,c,d 1,2,3,4 5,,,8 9,A,B,C' | sed 's/,/\t/g; s/ /\n/g;' |trdsql -ih -id '\t' -oat 'select * from -'
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 2 | 3 | 4 |
| 5 | 8 |   |   | <-- value from column "d" is output under column "b"
| 9 | A | B | C |
+---+---+---+---+

For completeness, I tested all the ASCII values 1-126 as delimiters. The problem only occurs for a few values:

  • 9 (Horizontal Tab)
  • 11 (Vertical Tab)
  • 12 (Form Feed)
  • 32 (Space)

for ord in $(seq 1 126)
do
	[ $ord -eq 10 ] && continue; # linefeed
	[ $ord -eq 13 ] && continue; # carriage return
	[ $ord -eq 34 ] && continue; # double quote
	[ $ord -eq 39 ] && continue; # single quote
	[ $ord -eq 49 ] && continue; # the number 1 (our data below)

	delim=$(printf \\$(printf '%03o' $ord));
	[ $ord -eq 92 ] && delim="\\$delim";

	sedDelim=$delim;
	[ "$delim" = "/" ] && sedDelim="\/";
	[ "$delim" = "&" ] && sedDelim="\&";

	output=$(
		echo "1,,1" | sed "s/,/$sedDelim/g;" |
		trdsql -id "$delim" 'select * from -';
	);

	[ "$output" = "1,,1" ] || echo "$output for ord $ord";
done;

1,1 for ord 9
1,1 for ord 11
1,1 for ord 12
1,1 for ord 32

While I've never encountered a file delimited by VT, FF or Space, I regularly encounter tab-delimited files, so this is a fairly important issue.

@noborus
Owner

noborus commented Oct 11, 2019

Thank you for a good issue.

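I think this is because TrimLeadingSpace is set to true.
With that setting, any leading character matching unicode.IsSpace (tab, vertical tab, form feed) is trimmed, not only the space character, so empty fields are swallowed.
This is not the intended behavior and will be fixed.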
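
For anyone who wants to see the effect in isolation, here is a minimal sketch using Go's standard encoding/csv package, which has a TrimLeadingSpace field on its Reader. The helper name readRow is made up for this example, and the snippet is not taken from trdsql; it only assumes that the row "5, , ,8" from the report is parsed with a tab as the delimiter.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readRow is a hypothetical helper for this illustration only.
// It parses a single delimited line with encoding/csv and lets the
// caller toggle csv.Reader.TrimLeadingSpace.
func readRow(line string, delim rune, trim bool) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = delim
	r.TrimLeadingSpace = trim
	return r.Read()
}

func main() {
	// Same shape as the "5,,,8" row from the report, with tabs as delimiters.
	line := "5\t\t\t8"
	for _, trim := range []bool{true, false} {
		row, err := readRow(line, '\t', trim)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("TrimLeadingSpace=%v: %d fields %q\n", trim, len(row), row)
	}
}
```

With trimming enabled, the tabs that follow the first delimiter count as leading white space of the next field, so the row comes back with two fields instead of four. With trimming disabled, the two empty fields are preserved.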

noborus added the bug label Oct 11, 2019
@jetzerb
Contributor Author

jetzerb commented Oct 11, 2019

That fixed the issue for all delimiters except space. But again, I don't think I've ever encountered a space-delimited file, so probably not that big a deal.

@noborus
Owner

noborus commented Oct 11, 2019

Thank you for your reply.
I cannot satisfy everyone, but I will merge PR #86.

noborus added a commit that referenced this issue Oct 11, 2019
Change TrimLeadingSpace set conditions (Fix #84).