DISCUSSION: Add format parameter to .astype when converting to str dtype #17211
Comments
A big +1 to add a more convenient way to achieve string formatting. However, IMO it might be cleaner to have it as a separate |
xref #15550 Personally I'm not a huge fan of the overloaded I'm not sure this is the right api, but in #15550 I suggested something like this to split out the numpy behavior and pandas overrides
|
@jorisvandenbossche , a
In addition, if we have a I find @chris-b1 's idea very, very appealing, and it is kind of similar to how An idea: could
This is very similar to how |
FWIW, the |
Sorry, closed by accident. |
From the there seems to be support for the idea, but the API for type conversions has not been settled. This issue is primarily on a Probably the API discussion needs to be finished first, and then the |
adding parameters to
why are you wanting to add complexity to this API? |
A The closest equivalent as far as I know would require an apply: Anyway, I understand that I'm clearly in the minority wrt. adding a |
This is pretty much a special case. |
This is something completely different. This converts the full dataframe to a string representation, while here it is about converting values to formatted string values inside a dataframe. I like having some way to do this (but the question is indeed in what kind of API), but I would also be OK to end the discussion with the decision that it is not important enough to add specialized functionality and that using the |
Yeah, Pandas does offer conversion through I agree on the benefits of agreeing on a "canonical" way for converting numbers to strings and writing up this approved method in the docs, even if |
OK, given the discussion we are having on #18347, I'm more amenable to this. |
Any update on this? (IOW has a conclusion been reached that could be implemented?) |
I don't believe so.
…On Thu, Sep 12, 2019 at 1:59 AM h-vetinari ***@***.***> wrote:
Any update on this?
|
My two cents: if you have a mixed datatype Series
>>> x = pd.Series([np.nan, 1, 2.0, "foo"])
>>> print(x)
0 NaN
1 1
2 2
3 foo
dtype: object
>>> print(x.astype(str))
0 nan
1 1
2 2.0
3 foo
dtype: object
>>> print(x.fillna("").astype(str).str.split())
0 []
1 [1]
2 [2.0]
3 [foo]
dtype: object
None of the above suggestions would let me get a string column where the 2 is formatted like the int. |
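For what it's worth, an `.apply`-based workaround for this mixed-dtype case could look roughly like the sketch below. The helper name `to_clean_str` is made up for illustration and is not part of pandas; it assumes you want NaN to become an empty string and whole-number floats to be printed like ints.

```python
import numpy as np
import pandas as pd


def to_clean_str(value):
    """Hypothetical helper: stringify a value, printing whole floats as ints."""
    if isinstance(value, float):
        if np.isnan(value):
            return ""               # assumption: missing values become ""
        if value.is_integer():
            return str(int(value))  # 2.0 -> "2"
    return str(value)


x = pd.Series([np.nan, 1, 2.0, "foo"])
out = x.apply(to_clean_str)
print(out.tolist())
```

This still goes through `.apply`, so it does nothing for the performance concern raised in this thread; it only shows that the per-element rule has to be written by hand today.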
I am surprised Series.str.format() is not a thing, was that intentional? |
Probably not intentional.
…On Tue, Sep 17, 2019 at 9:20 AM kylekeppler ***@***.***> wrote:
I am surprised Series.str.format() is not a thing, was that intentional?
|
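I'd say it hasn't been intentionally omitted, but it cannot easily be added due to the way the .str-accessor works. You need to have strings before you're even able to call .str, so by the time you get to .str.format(...) the question of the format has already been settled for you (to the default). Of course it would be hypothetically possible to delay the execution of calculating the string representation, and more than that, allow calling .str also for non-string columns (or rather non-object cols for lack of a string-type), but the trend has been going in the opposite direction - as in disabling the .str-accessor for columns that are not strings. |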
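The point about the `.str` accessor can be checked with a small sketch (the data and variable names are made up for illustration): on a numeric Series the accessor is not available at all, so any string method can only run after the default string conversion has already happened.

```python
import pandas as pd

nums = pd.Series([1.5, 2.5])

# Accessing .str on a non-string Series raises AttributeError.
try:
    nums.str
    accessor_available = True
except AttributeError:
    accessor_available = False

# After astype(str) the default representation is already fixed;
# .str methods then only manipulate those strings.
as_str = nums.astype(str)
padded = as_str.str.zfill(5)
print(accessor_available, padded.tolist())
```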
Ah, yes @h-vetinari is correct.
Typically I'd use `series.apply("{}".format)`
…On Tue, Sep 17, 2019 at 1:11 PM h-vetinari ***@***.***> wrote:
I am surprised Series.str.format() is not a thing, was that intentional?
@TomAugspurger <https://github.com/TomAugspurger>: Probably not
intentional.
I'd say it hasn't been intentionally omitted, but it cannot easily be
added due to the way the .str-accessor works. You need to have strings
*before* you're even able to call .str, so by the time you get to
.str.format(...) the question of the format has already been settled for
you (to the default).
Of course it would be hypothetically possible to delay the execution of
calculating the string representation, and more than that, allow calling
.str also for non-string columns (or rather non-object cols for lack of a
string-type), but the trend has been going in the opposite direction - as
in disabling the .str-accessor for columns that are not strings.
|
I propose adding a string formatting possibility to `.astype` when converting to `str` dtype: I think it's reasonable to expect that you can choose the string format when converting to a string dtype, as you're basically freezing a representation of your series, and just using `.astype(str)` for this is often too crude.
This possibility should take the shape of a `format` parameter to `.astype`, that can take a string and can only be used when converting to string dtype. This would lessen the reliance on `.apply` for converting non-strings to more complex strings and make such conversions more readable (IMO) and maybe faster (as we're avoiding `.apply` which is slow, though I'm not too knowledgeable on such optimizations).
The current procedure for converting to a complex string goes like this:
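As an illustration only (this is not the snippet from the original post), an `.apply`-based conversion of this kind might look as follows, assuming a float Series and the `'{:+.1f}'` format string that appears later in this proposal:

```python
import pandas as pd

ser = pd.Series([1.234, -5.678, 0.0])

# Today: push every element through a Python-level format call.
formatted = ser.apply("{:+.1f}".format)
print(formatted.tolist())
```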
I propose to make this possible:
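Since `.astype` has no `format` parameter in pandas, the proposed behaviour can only be emulated. A minimal sketch, assuming a free function named `astype_with_format` (a hypothetical name, not a pandas API) that follows the rules stated in this proposal:

```python
import pandas as pd


def astype_with_format(ser, dtype, format=None):
    """Hypothetical stand-in for the proposed ser.astype(dtype, format=...)."""
    if format is None:
        # No format given: fall back to the existing behaviour.
        return ser.astype(dtype)
    if dtype is not str:
        # The proposal asks for an exception when format is used with a non-str dtype.
        raise ValueError("format can only be used when converting to str")
    return ser.map(format.format)


ser = pd.Series([1.234, -5.678])
result = astype_with_format(ser, str, format="{:+.1f}")
print(result.tolist())
```

Internally this still formats element by element, so it only illustrates the call shape, not any speed-up.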
If the `dtype` parameter is not `str`, setting of the `format` parameter should raise an exception. If `format` is not set, the current behaviour will be used. The proposed change is therefore backward compatible.
Also to consider:
Allowing a placeholder name
Should a placeholder name be available? Then you could do:
(Note that the example above has an implicit parameter on `.astype` with a default value "value", so adding a placeholder name is transparent. Note also the above behaviour is present in `ser.dt.strftime`, but please look at the principle rather than the concrete example.)
A downside to allowing a placeholder name could be the potential for abuse (stuffing too much into the format string) and possibly losing the option to vectorize (though this is not my expertise).
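To make the placeholder-name idea concrete, here is a rough sketch. The function `format_named` and its `name` parameter are hypothetical and only mimic what a named placeholder with default `"value"` could mean; nothing like this exists on `.astype`.

```python
import pandas as pd


def format_named(ser, format, name="value"):
    """Hypothetical: format each element, exposing it under a named placeholder."""
    return ser.map(lambda v: format.format(**{name: v}))


ser = pd.Series([1.5, -2.0])
labelled = format_named(ser, "{value:+.1f} %")
print(labelled.tolist())
```

The named form makes it tempting to pack more text into the format string, which is the abuse risk mentioned above.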
Adding a `.format` method
It could also be considered adding a `.str.format` or `.format` method to DataFrame/Series.
If `.format` is added to the `.str` namespace it would only be usable for string dataframes/series (which I'd be quite ok with, if the `format` parameter is also available on `.astype` for other data types).
Alternatively, such a method could be available directly on all DataFrames/Series. Then you'd do `ser.format('{:+.1f}')` rather than `ser.astype(str, format='{:+.1f}')`. IMO though, it would be inconsistent to have such a string conversion method directly on pandas objects, but not for other types. Why have `.format` but not `.to_numeric` as a dataframes/series method?
IMO therefore, `astype(str, format=...)` combined with a `.str.format` method is better than adding a new `.format` method for this. So: `.astype(str, format=...)` makes it very obvious that we're now changing to string datatype, and `.str.format(...)` makes it clear that we're doing a string manipulation.