ENH: add to_records() option to output NumPy string dtypes, not objects #18146
Comments
@jzwinck : Thanks for sharing this! Seems reasonable to add a parameter to |
what would i propose to name this option? |
Well in |
It's different to |
True, only threw it out there because similar names. But yeah, your suggestions are also reasonable. |
This option records the dtype for string arrays as 'Sx', where x is the length of the longest string, instead of 'O'
This option changes DataFrame.to_records() dtype for string arrays to 'Sx', where x is the length of the longest string, instead of 'O'
Adds parameter to allow string-like columns to be cast as fixed-length string-like dtypes for more efficient storage. Closes gh-18146. Originally authored by @qinghao1 but cleaned up by @gfyoung to fix merge conflicts.
… (#22229)
* ENH: Allow fixed-length strings in df.to_records()
  Adds parameter to allow string-like columns to be cast as fixed-length string-like dtypes for more efficient storage. Closes gh-18146. Originally authored by @qinghao1 but cleaned up by @gfyoung to fix merge conflicts.
* Add dtype parameters instead of fix-string-like
  The original parameter was causing a lot of acrobatics with regard to string dtypes between 2.x and 3.x. The new parameters simplify the internal logic and pass the responsibility and motivation of memory efficiency back to the users.
* MAINT: Use is_dict_like in to_records
  More generic than checking whether our mappings are instances of dict. Expands is_dict_like check to include whether it has a __contains__ method.
* TST: Add test for is_dict_like expanded def
* MAINT: Address final comments
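As a rough illustration of the "dtype parameters" approach described in this commit message: in released pandas (0.24 and later) `DataFrame.to_records()` accepts `column_dtypes` and `index_dtypes`. The sketch below assumes such a version; the sample frame and the chosen width `"<U3"` are made up for the example, and the caller is responsible for picking a width that fits the data.

```python
import pandas as pd

# Hypothetical sample data, not taken from the issue.
df = pd.DataFrame({"name": ["a", "bbb"], "n": [1, 2]})

# Ask for a fixed-length Unicode field for the string column only;
# columns not listed in the mapping keep their default dtype.
rec = df.to_records(index=False, column_dtypes={"name": "<U3"})

print(rec.dtype)
```

Because the width is supplied by the user, strings longer than the requested length are truncated by NumPy rather than rejected, so it is worth computing the maximum length first when the data is not known in advance.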
* upstream/master:
  REF/TST: replace capture_stdout with pytest capsys fixture (pandas-dev#24501)
  BUG: fix .iat assignment creates a new column (pandas-dev#24495)
  DOC: add checks on the returns section in the docstrings (pandas-dev#23138) (pandas-dev#23432)
  ENH: Add strings_as_fixed_length parameter for df.to_records() (pandas-dev#18146) (pandas-dev#22229)
  TST: Skip db tests unless explicitly specified in -m pattern (pandas-dev#24492)
  Mix EA into DTA/TDA; part of 24024 (pandas-dev#24502)
  DOC: Fix building of a single API document (pandas-dev#24506)
`DataFrame.to_records()` outputs string columns with the `object` dtype, which is sometimes not efficient (e.g. for short, similar-length strings, or when storing with `np.save()`). I wrote the following function to fix this:

I suggest exposing something like this as an option in `DataFrame.to_records()`. An option to convert to Unicode (`'U'`) would be good too (NumPy's `'S'` is effectively `bytes` in Python 3).
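A minimal sketch of the kind of conversion the issue describes. This is not the author's original function, which is not reproduced here. The helper name `to_records_fixed_strings` and its `unicode` flag are hypothetical, and the sketch assumes that every object-valued column holds only strings (ASCII-only when converting to `'S'`).

```python
import numpy as np
import pandas as pd


def to_records_fixed_strings(df, unicode=False):
    """Hypothetical helper: build a record array from df (index dropped),
    storing string columns as fixed-length 'S' (bytes) or 'U' (Unicode)."""
    kind = "U" if unicode else "S"
    arrays = []
    for name in df.columns:
        values = df[name].to_numpy()
        if values.dtype == object:
            # Assumption: the column contains only strings. Casting to a
            # flexible dtype lets NumPy size the field to the longest value.
            values = values.astype(kind)
        arrays.append(values)
    names = [str(name) for name in df.columns]
    return np.rec.fromarrays(arrays, names=names)


df = pd.DataFrame({"name": ["a", "bbb"], "n": [1, 2]})
rec = to_records_fixed_strings(df)
print(rec.dtype)
```

With `unicode=True` the string field becomes a `'U'` dtype instead, which keeps `str` values in Python 3 at the cost of four bytes per character.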