Code Sample, a copy-pastable example if possible

import pandas as pd
from datetime import datetime

data = [
    {'t': datetime(2017, 6, 1, 0), 'x': 1.0, 'y': 2.0},
    {'t': datetime(2017, 6, 1, 1), 'x': 2.0, 'y': 2.0},
    {'t': datetime(2017, 6, 1, 2), 'x': 3.0, 'y': 1.5},
]

df = pd.DataFrame(data)
df = df.set_index('t')

# Perform a resample to get a binned time series DataFrame... this works fine
ts = df.resample('30T').agg({'x': ['mean'], 'y': ['median']})
print(ts['x'].shape)  # (5, 1)

# What if I put a field in there that doesn't exist?
ts = df.resample('30T').agg({'x': ['mean'], 'y': ['median'], 'z': ['sum']})
print(ts['x'].shape)  # (5, 2) ??? I don't understand why the shape isn't (5, 1)
print(ts['x'].values)
# [[ 1.   2. ]
#  [ nan  nan]
#  [ 2.   2. ]
#  [ nan  nan]
#  [ 3.   1.5]]
# Looks like a copy of the full aggregation even though I only requested the 'x' column
# Furthermore, now ts['z'] exists
Output:
(5, 1)
[[ 1.]
[ nan]
[ 2.]
[ nan]
[ 3.]]
(5, 2)
[[ 1. 2. ]
[ nan nan]
[ 2. 2. ]
[ nan nan]
[ 3. 1.5]]
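For reference, a small diagnostic that can be run right after the second aggregation to compare the input frame's columns with what the result actually contains; output omitted here rather than guessed at:

# Diagnostic only: run after the second aggregation in the code sample above.
print(df.columns.tolist())  # columns of the input frame: ['x', 'y']
print(ts.columns.tolist())  # (column, aggregation) pairs in the result, including anything created for 'z'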
Problem description
I am using pandas on records from an Elasticsearch database. The queries are pulling from multiple indices with overlapping column/field names. Most records will have most of their data in common, but when available, I want to know the values of other fields. I'm essentially creating a time series for each column with a specific time-based binning and a per-column aggregation.
I think this should either ignore columns that don't exist or raise an exception. If it silently ignores them, ts['x'] should return the same result as the first example. I spent several hours today on a workaround (sketched below) that checks which columns are actually present before each aggregation and removes the missing ones from my agg dictionary. I feel like the current behavior doesn't have a purpose, but perhaps I'm missing something.
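A minimal sketch of that workaround, assuming the desired per-column aggregations are collected in a dict (the name agg_spec is illustrative, not from my real code):

import pandas as pd
from datetime import datetime

# Same toy frame as in the code sample above; 'z' is intentionally absent.
df = pd.DataFrame([
    {'t': datetime(2017, 6, 1, 0), 'x': 1.0, 'y': 2.0},
    {'t': datetime(2017, 6, 1, 1), 'x': 2.0, 'y': 2.0},
    {'t': datetime(2017, 6, 1, 2), 'x': 3.0, 'y': 1.5},
]).set_index('t')

# Full set of aggregations I want, including fields that may not have come
# back from a given Elasticsearch query.
agg_spec = {'x': ['mean'], 'y': ['median'], 'z': ['sum']}

# Workaround: drop entries whose column is not actually in the frame
# before handing the dict to .agg().
agg_present = {col: funcs for col, funcs in agg_spec.items() if col in df.columns}

ts = df.resample('30T').agg(agg_present)
print(ts['x'].shape)  # (5, 1), matching the first example above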
Expected Output
The same result as the first example, i.e. ts['x'].shape == (5, 1) with the missing 'z' column silently ignored,
OR
an exception pointing out that 'z' does not exist in the DataFrame.
Output of pd.show_versions()
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 0.9.8
Cython: None
numpy: 1.11.3
scipy: 0.18.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 1.5
pytz: 2012d
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None