Unexpected string->float conversion in DataFrame.groupby().apply() #15421

Closed
yegulalp opened this issue Feb 16, 2017 · 1 comment
Labels: Bug, Duplicate Report

Comments

yegulalp commented Feb 16, 2017

Code Sample, a copy-pastable example if possible

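import pandas as pd

# 'B' mixes numeric-looking and non-numeric strings; 'T' holds timestamps.
df = pd.DataFrame({'A': [10, 20, 30], 'B': ['foo', '3', '4'], 'T': [pd.Timestamp("12:31:22")] * 3})

def get_B(g):
    # Return column 'B' from the first row of each group.
    return g.iloc[0][['B']]

print(df.groupby('A').apply(get_B))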

# Observed output:
      B
A      
10  NaN
20  3.0
30  4.0

Problem description

groupby().apply() unexpectedly converts column 'B' from string to float in the example above: '3' and '4' become 3.0 and 4.0, and 'foo' becomes NaN. The bug is triggered only when both of the following hold:

  1. A column ('B' in the example above) holds string values, some of which can be parsed as numbers and some of which cannot.
  2. Another column ('T' in the example above) in the DataFrame holds timestamps.

Expected Output

      B
A      
10  foo
20  3
30  4
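
For anyone re-running the example on Python 3, the expected output above can be phrased as a small check. This is a sketch only: it assumes a pandas release that contains the fix (the issue was later moved to the 0.20.0 milestone), it swaps the time-only timestamp for a full date to keep parsing unambiguous, and the names `result` and `values` are illustrative rather than taken from the report.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': ['foo', '3', '4'],
    # Assumption: a full date is used here instead of the time-only string.
    'T': [pd.Timestamp("2017-02-16 12:31:22")] * 3,
})

def get_B(g):
    # First row of each group, restricted to column 'B'.
    return g.iloc[0][['B']]

result = df.groupby('A').apply(get_B)

# With the fix in place, the strings in 'B' should come back untouched.
values = result['B'].tolist()
print(values)
```

Recent pandas versions may emit a deprecation warning about apply operating on the grouping column; that does not affect the values checked here.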

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8.1
boto: 2.45.0
pandas_datareader: 0.2.1

@jorisvandenbossche
Member

@yegulalp Thanks for the report!

This is a duplicate of #14849, and also related to #14423

Contributions to try to fix this are always welcome!

jorisvandenbossche added the Bug and Duplicate Report labels Feb 16, 2017
jorisvandenbossche added this to the No action milestone Feb 16, 2017
gwpdt added a commit to gwpdt/pandas that referenced this issue Mar 16, 2017
Rename test_numeric_coercion to
test_apply_numeric_coercion_when_datetime, and add tests for GH pandas-dev#15421
and pandas-dev#14423
jreback modified the milestones: 0.20.0, No action Mar 16, 2017
jreback pushed a commit that referenced this issue Mar 16, 2017
closes #14423
closes #15421
closes #15670

During a group-by/apply on a DataFrame, in the presence of one or more
DateTime-like columns, Pandas would incorrectly coerce the type of all
other columns to numeric. E.g. a String column would be coerced to
numeric, producing NaNs.

Author: Greg Williams

Closes #15680 from gwpdt/bugfix14423 and squashes the following commits:

e1ed104 [Greg Williams] TST: Rename and expand test_numeric_coercion
0a15674 [Greg Williams] CLN: move import, add whatsnew entry
c8844e0 [Greg Williams] CLN: PEP8 (whitespace fixes)
46d12c2 [Greg Williams] BUG: Group-by numeric type-coercion with datetime
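
The coercion described in the commit message above (strings forced to numeric, with NaN for anything unparseable) can be illustrated without pandas. The sketch below is not the pandas implementation; `coerce_to_numeric` is a hypothetical helper written only to show why the reported output was NaN, 3.0, 4.0.

```python
import math

def coerce_to_numeric(values):
    """Hypothetical helper: parse each value as a float, using NaN on failure."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            # Unparseable values such as 'foo' are replaced by NaN.
            out.append(float('nan'))
    return out

column_b = ['foo', '3', '4']
coerced = coerce_to_numeric(column_b)
print(coerced)  # [nan, 3.0, 4.0]
```

The public `pd.to_numeric(..., errors='coerce')` call gives the same values for this input, which is a convenient way to recognise this kind of silent coercion in a result.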
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
mattip pushed a commit to mattip/pandas that referenced this issue Apr 3, 2017