-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible string type detection bug #945
Comments
@quinnj I have the same problem in https://github.com/bkamins/PyDataGlobal2020. See file https://github.com/bkamins/PyDataGlobal2020/blob/main/police.ipynb, cell [4]. We create |
quinnj
added a commit
that referenced
this issue
Dec 14, 2021
Fixes #945. The core issue here is an inconsistency between the initial type detection code and the later-on type promotion code while parsing. The type detection code had a limit setup so that inline string column types weren't allowed larger than `String31`. The promotion code, however, allowed inline string types to continue to promote up to `String255`. That means if a column type was at least initiall detected as some smaller-than-`String31` type, then later on a value larger than 31 bytes was parsed, it would continue to promote up to larger inline string types. The change proposed in this PR is to limit the promotion code to be more consistent with the type detection code: if a parsed value "overflows" the `String31` type, we'll just promote to a regular `String` instead of promoting to larger inline string types.
quinnj
added a commit
that referenced
this issue
Dec 14, 2021
Fixes #945. The core issue here is an inconsistency between the initial type detection code and the later-on type promotion code while parsing. The type detection code had a limit setup so that inline string column types weren't allowed larger than `String31`. The promotion code, however, allowed inline string types to continue to promote up to `String255`. That means if a column type was at least initiall detected as some smaller-than-`String31` type, then later on a value larger than 31 bytes was parsed, it would continue to promote up to larger inline string types. The change proposed in this PR is to limit the promotion code to be more consistent with the type detection code: if a parsed value "overflows" the `String31` type, we'll just promote to a regular `String` instead of promoting to larger inline string types.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@quinnj it seems that in some cases
CSV.read
createsString63
columns under default settings (and AFAICT it should not -String
should be used in such a case as we are using InlineStrings.jl up toString31
by default).Here is a notebook https://github.com/JuliaAcademy/DataFrames/blob/main/3.%20Working%20with%20text%20files.ipynb reproducing the issue. See cell [12] for the result and cell [10] for the command I use to read in the data. To get the data use the code in cell [5].
The text was updated successfully, but these errors were encountered: