Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid promoting inline string types larger than String31 #949

Merged
merged 2 commits into from
Dec 14, 2021
Merged

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Dec 14, 2021

Fixes #945. The core issue here is an inconsistency between the initial
type detection code and the later-on type promotion code while parsing.
The type detection code had a limit setup so that inline string column
types weren't allowed larger than String31. The promotion code,
however, allowed inline string types to continue to promote up to
String255. That means if a column type was at least initiall detected
as some smaller-than-String31 type, then later on a value larger than
31 bytes was parsed, it would continue to promote up to larger inline
string types. The change proposed in this PR is to limit the promotion
code to be more consistent with the type detection code: if a parsed
value "overflows" the String31 type, we'll just promote to a regular
String instead of promoting to larger inline string types.

Fixes #945. The core issue here is an inconsistency between the initial
type detection code and the later-on type promotion code while parsing.
The type detection code had a limit setup so that inline string column
types weren't allowed larger than `String31`. The promotion code,
however, allowed inline string types to continue to promote up to
`String255`. That means if a column type was at least initiall detected
as some smaller-than-`String31` type, then later on a value larger than
31 bytes was parsed, it would continue to promote up to larger inline
string types. The change proposed in this PR is to limit the promotion
code to be more consistent with the type detection code: if a parsed
value "overflows" the `String31` type, we'll just promote to a regular
`String` instead of promoting to larger inline string types.
@codecov
Copy link

codecov bot commented Dec 14, 2021

Codecov Report

Merging #949 (8850300) into main (879f41b) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #949      +/-   ##
==========================================
- Coverage   90.35%   90.27%   -0.09%     
==========================================
  Files           9        9              
  Lines        2302     2303       +1     
==========================================
- Hits         2080     2079       -1     
- Misses        222      224       +2     
Impacted Files Coverage Δ
src/file.jl 95.68% <100.00%> (-0.32%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 879f41b...8850300. Read the comment docs.

Fixes #948. As noted in the included code comment, if the last expected
value to be parsed is `missing`, the delimited file may not include a
final newline, so the EOF, while abrupt, can be treated as valid. In
this specific case, we can avoid emitting the warning and just ensure
that column is marked as having a `missing` value parsed.
@quinnj quinnj merged commit d25992a into main Dec 14, 2021
@quinnj quinnj deleted the jq/945 branch December 14, 2021 06:44
@bkamins
Copy link
Member

bkamins commented Dec 14, 2021

Thank you! Are you planning to make a patch release or you are waiting for some additional changes before this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible string type detection bug
2 participants