Merge tag 'v0.10.1' into debian
Version 0.10.1

* tag 'v0.10.1': (195 commits)
  RLS: set released to true
  RLS: Version 0.10.1
  TST: skip problematic xlrd test
  Merging in MySQL support pandas-dev#2482
  Revert "Merging in MySQL support pandas-dev#2482"
  BUG: don't let np.prod overflow int64
  RLS: note changed return type in DatetimeIndex.unique
  RLS: more what's new for 0.10.1
  RLS: some what's new for 0.10.1
  API: restore inplace=TRue returns self, add FutureWarnings. re pandas-dev#1893
  Merging in MySQL support pandas-dev#2482
  BUG: fix python 3 dtype issue
  DOC: fix what's new 0.10 doc bug re pandas-dev#2651
  BUG: fix C parser thread safety. verify gil release close pandas-dev#2608
  BUG: usecols bug with implicit first index column. close pandas-dev#2654
  BUG: plotting bug when base is nonzero pandas-dev#2571
  BUG: period resampling bug when all values fall into a single bin. close pandas-dev#2070
  BUG: fix memory error in sortlevel when many multiindex levels. close pandas-dev#2684
  STY: CRLF
  BUG: perf_HEAD reports wrong vbench name when an exception is raised
  ...
yarikoptic committed Jan 22, 2013
2 parents 88119b2 + 31ecaa9 commit 9201e79
Showing 199 changed files with 10,719 additions and 5,707 deletions.
8 changes: 6 additions & 2 deletions .travis.yml
@@ -3,7 +3,7 @@ language: python
python:
- 2.6
- 2.7
- 3.1 # travis will soon EOL this
# - 3.1 # travis EOL
- 3.2
- 3.3

@@ -15,6 +15,8 @@ matrix:
include:
- python: 2.7
env: VBENCH=true
- python: 2.7
env: LOCALE_OVERRIDE="zh_CN.GB18030" # simplified chinese
- python: 2.7
env: FULL_DEPS=true
- python: 3.2
@@ -45,8 +47,10 @@ before_install:
install:
- echo "Waldo2"
- ci/install.sh
- ci/print_versions.py # not including stats

script:
- echo "Waldo3"
- ci/script.sh

after_script:
- ci/print_versions.py
132 changes: 132 additions & 0 deletions RELEASE.rst
@@ -22,6 +22,138 @@ Where to get it
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
* Documentation: http://pandas.pydata.org

pandas 0.10.1
=============

**Release date:** 2013-01-22

**New features**

- Add data interface to World Bank WDI pandas.io.wb (#2592)

**API Changes**

- Restored inplace=True behavior returning self (same object) with
deprecation warning until 0.11 (GH1893_)
- ``HDFStore``
- refactored HDFStore to deal with non-table stores as objects; will allow future enhancements
- removed keyword ``compression`` from ``put`` (replaced by keyword
``complib`` to be consistent across library)
- warn `PerformanceWarning` if you are attempting to store types that will be pickled by PyTables
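
The restored ``inplace=True`` semantics can be sketched as follows (hypothetical
data; in 0.10.1 the call also returned ``self`` with a ``FutureWarning``, with
the return value slated to change again in 0.11):

```python
import numpy as np
import pandas as pd

# hypothetical frame; fillna(..., inplace=True) mutates the object itself
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
df.fillna(0.0, inplace=True)  # the NaN is replaced in place
print(df["a"].tolist())
```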

**Improvements to existing features**

- ``HDFStore``

- enables storing of multi-index dataframes (closes GH1277_)
- support data column indexing and selection, via ``data_columns`` keyword in append
- support write chunking to reduce memory footprint, via ``chunksize``
keyword to append
- support automagic indexing via ``index`` keyword to append
- support ``expectedrows`` keyword in append to inform ``PyTables`` about
  the expected table size
- support ``start`` and ``stop`` keywords in select to limit the row
selection space
- added ``get_store`` context manager to automatically import with pandas
- added column filtering via ``columns`` keyword in select
- added methods append_to_multiple/select_as_multiple/select_as_coordinates
to do multiple-table append/selection
- added support for datetime64 in columns
- added method ``unique`` to select the unique values in an indexable or data column
- added method ``copy`` to copy an existing store (and possibly upgrade)
- show the shape of the data on disk for non-table stores when printing the store
- added ability to read PyTables flavor tables (allows compatibility with other HDF5 systems)
- Add ``logx`` option to DataFrame/Series.plot (GH2327_, #2565)
- Support reading gzipped data from file-like object
- ``pivot_table`` aggfunc can be anything used in GroupBy.aggregate (GH2643_)
- Implement DataFrame merges in case where set cardinalities might overflow
64-bit integer (GH2690_)
- Raise exception in C file parser if integer dtype specified and have NA
values. (GH2631_)
- Attempt to parse ISO8601 format dates when parse_dates=True in read_csv for
major performance boost in such cases (GH2698_)
- Add methods ``neg`` and ``inv`` to Series
- Implement ``kind`` option in ``ExcelFile`` to indicate whether it's an XLS
or XLSX file (GH2613_)
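
As a minimal sketch of the more flexible ``pivot_table`` aggregation (GH2643_),
using made-up data — any callable accepted by ``GroupBy.aggregate`` can now be
passed as ``aggfunc``:

```python
import pandas as pd

# made-up sales data; aggfunc may be any callable GroupBy.aggregate accepts
df = pd.DataFrame({"region": ["east", "east", "west", "west"],
                   "sales": [10, 20, 30, 40]})
# compute the range (max - min) of sales within each region
table = pd.pivot_table(df, values="sales", index="region",
                       aggfunc=lambda x: x.max() - x.min())
print(table.loc["east", "sales"])
```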

**Bug fixes**

- Fix read_csv/read_table multithreading issues (GH2608_)
- ``HDFStore``

- correctly handle ``nan`` elements in string columns; serialize via the
``nan_rep`` keyword to append
- raise correctly on non-implemented column types (unicode/date)
- correctly handle types passed via ``Term`` (e.g. ``index<1000`` when index
  is ``Int64``) (closes GH512_)
- handle Timestamp correctly in data_columns (closes GH2637_)
- ``contains`` correctly matches on non-natural names
- correctly store ``float32`` dtypes in tables (if not other float types in
the same table)
- Fix DataFrame.info bug with UTF8-encoded columns. (GH2576_)
- Fix DatetimeIndex handling of FixedOffset tz (GH2604_)
- More robust detection of being in IPython session for wide DataFrame
console formatting (GH2585_)
- Fix platform issues with ``file:///`` in unit test (#2564)
- Fix bug and possible segfault when grouping by hierarchical level that
contains NA values (GH2616_)
- Ensure that MultiIndex tuples can be constructed with NAs (seen in #2616)
- Fix int64 overflow issue when unstacking MultiIndex with many levels (#2616)
- Exclude non-numeric data from DataFrame.quantile by default (GH2625_)
- Fix a Cython C int64 boxing issue causing read_csv to return incorrect
results (GH2599_)
- Fix groupby summing performance issue on boolean data (GH2692_)
- Don't bork Series containing datetime64 values with to_datetime (GH2699_)
- Fix DataFrame.from_records corner case when passed columns, index column,
but empty record list (GH2633_)
- Fix C parser-tokenizer bug with trailing fields. (GH2668_)
- Don't exclude non-numeric data from GroupBy.max/min (GH2700_)
- Don't lose time zone when calling DatetimeIndex.drop (GH2621_)
- Fix setitem on a Series with a boolean key and a non-scalar as value (GH2686_)
- Box datetime64 values in Series.apply/map (GH2627_, GH2689_)
- Upconvert datetime + datetime64 values when concatenating frames (GH2624_)
- Raise a more helpful error message in merge operations when one DataFrame
has duplicate columns (GH2649_)
- Fix partial date parsing issue occurring only when code is run at EOM (GH2618_)
- Prevent MemoryError when using counting sort in sortlevel with
high-cardinality MultiIndex objects (GH2684_)
- Fix Period resampling bug when all values fall into a single bin (GH2070_)
- Fix buggy interaction with usecols argument in read_csv when there is an
implicit first index column (GH2654_)
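
The setitem fix (GH2686_) covers patterns like the following sketch, assigning
a non-scalar value through a boolean key (illustrative values):

```python
import pandas as pd

# assigning a list (non-scalar) through a boolean mask used to fail (GH2686)
s = pd.Series([1.0, -2.0, -3.0, 4.0])
s[s < 0] = [0.0, 0.0]  # one replacement value per masked position
print(s.tolist())
```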

.. _GH512: https://github.com/pydata/pandas/issues/512
.. _GH1277: https://github.com/pydata/pandas/issues/1277
.. _GH2070: https://github.com/pydata/pandas/issues/2070
.. _GH2327: https://github.com/pydata/pandas/issues/2327
.. _GH2585: https://github.com/pydata/pandas/issues/2585
.. _GH2599: https://github.com/pydata/pandas/issues/2599
.. _GH2604: https://github.com/pydata/pandas/issues/2604
.. _GH2576: https://github.com/pydata/pandas/issues/2576
.. _GH2608: https://github.com/pydata/pandas/issues/2608
.. _GH2613: https://github.com/pydata/pandas/issues/2613
.. _GH2616: https://github.com/pydata/pandas/issues/2616
.. _GH2621: https://github.com/pydata/pandas/issues/2621
.. _GH2624: https://github.com/pydata/pandas/issues/2624
.. _GH2625: https://github.com/pydata/pandas/issues/2625
.. _GH2627: https://github.com/pydata/pandas/issues/2627
.. _GH2631: https://github.com/pydata/pandas/issues/2631
.. _GH2633: https://github.com/pydata/pandas/issues/2633
.. _GH2637: https://github.com/pydata/pandas/issues/2637
.. _GH2643: https://github.com/pydata/pandas/issues/2643
.. _GH2649: https://github.com/pydata/pandas/issues/2649
.. _GH2654: https://github.com/pydata/pandas/issues/2654
.. _GH2668: https://github.com/pydata/pandas/issues/2668
.. _GH2684: https://github.com/pydata/pandas/issues/2684
.. _GH2689: https://github.com/pydata/pandas/issues/2689
.. _GH2690: https://github.com/pydata/pandas/issues/2690
.. _GH2692: https://github.com/pydata/pandas/issues/2692
.. _GH2698: https://github.com/pydata/pandas/issues/2698
.. _GH2699: https://github.com/pydata/pandas/issues/2699
.. _GH2700: https://github.com/pydata/pandas/issues/2700
.. _GH2694: https://github.com/pydata/pandas/issues/2694
.. _GH2686: https://github.com/pydata/pandas/issues/2686
.. _GH2618: https://github.com/pydata/pandas/issues/2618

pandas 0.10.0
=============

1 change: 0 additions & 1 deletion bench/bench_dense_to_sparse.py
@@ -12,4 +12,3 @@
this_rng = rng2[:-i]
data[100:] = np.nan
series[i] = SparseSeries(data, index=this_rng)

7 changes: 7 additions & 0 deletions bench/bench_get_put_value.py
@@ -4,39 +4,46 @@
N = 1000
K = 50


def _random_index(howmany):
return Index([rands(10) for _ in xrange(howmany)])

df = DataFrame(np.random.randn(N, K), index=_random_index(N),
columns=_random_index(K))


def get1():
for col in df.columns:
for row in df.index:
_ = df[col][row]


def get2():
for col in df.columns:
for row in df.index:
_ = df.get_value(row, col)


def put1():
for col in df.columns:
for row in df.index:
df[col][row] = 0


def put2():
for col in df.columns:
for row in df.index:
df.set_value(row, col, 0)


def resize1():
buf = DataFrame()
for col in df.columns:
for row in df.index:
buf = buf.set_value(row, col, 5.)
return buf


def resize2():
from collections import defaultdict

9 changes: 6 additions & 3 deletions bench/bench_groupby.py
@@ -12,16 +12,19 @@
random.shuffle(foo)
random.shuffle(foo2)

df = DataFrame({'A' : foo,
'B' : foo2,
'C' : np.random.randn(n * k)})
df = DataFrame({'A': foo,
'B': foo2,
'C': np.random.randn(n * k)})

import pandas._sandbox as sbx


def f():
table = sbx.StringHashTable(len(df))
ret = table.factorize(df['A'])
return ret


def g():
table = sbx.PyObjectHashTable(len(df))
ret = table.factorize(df['A'])
44 changes: 26 additions & 18 deletions bench/bench_join_panel.py
@@ -1,49 +1,55 @@
# reasonably effecient
# reasonably efficient


def create_panels_append(cls, panels):
""" return an append list of panels """
panels = [ a for a in panels if a is not None ]
panels = [a for a in panels if a is not None]
# corner cases
if len(panels) == 0:
return None
elif len(panels) == 1:
return panels[0]
elif len(panels) == 2 and panels[0] == panels[1]:
return panels[0]
#import pdb; pdb.set_trace()
# import pdb; pdb.set_trace()
# create a joint index for the axis

def joint_index_for_axis(panels, axis):
s = set()
for p in panels:
s.update(list(getattr(p,axis)))
s.update(list(getattr(p, axis)))
return sorted(list(s))

def reindex_on_axis(panels, axis, axis_reindex):
new_axis = joint_index_for_axis(panels, axis)
new_panels = [ p.reindex(**{ axis_reindex : new_axis, 'copy' : False}) for p in panels ]
new_panels = [p.reindex(**{axis_reindex: new_axis,
'copy': False}) for p in panels]
return new_panels, new_axis
# create the joint major index, dont' reindex the sub-panels - we are appending
# create the joint major index, dont' reindex the sub-panels - we are
# appending
major = joint_index_for_axis(panels, 'major_axis')
# reindex on minor axis
panels, minor = reindex_on_axis(panels, 'minor_axis', 'minor')
# reindex on items
panels, items = reindex_on_axis(panels, 'items', 'items')
# concatenate values
try:
values = np.concatenate([ p.values for p in panels ],axis=1)
values = np.concatenate([p.values for p in panels], axis=1)
except (Exception), detail:
raise Exception("cannot append values that dont' match dimensions! -> [%s] %s" % (','.join([ "%s" % p for p in panels ]),str(detail)))
#pm('append - create_panel')
p = Panel(values, items = items, major_axis = major, minor_axis = minor )
#pm('append - done')
raise Exception("cannot append values that dont' match dimensions! -> [%s] %s"
% (','.join(["%s" % p for p in panels]), str(detail)))
# pm('append - create_panel')
p = Panel(values, items=items, major_axis=major,
minor_axis=minor)
# pm('append - done')
return p



# does the job but inefficient (better to handle like you read a table in pytables...e.g create a LongPanel then convert to Wide)

# does the job but inefficient (better to handle like you read a table in
# pytables...e.g create a LongPanel then convert to Wide)
def create_panels_join(cls, panels):
""" given an array of panels's, create a single panel """
panels = [ a for a in panels if a is not None ]
panels = [a for a in panels if a is not None]
# corner cases
if len(panels) == 0:
return None
@@ -62,16 +68,18 @@ def create_panels_join(cls, panels):
for minor_i, minor_index in panel.minor_axis.indexMap.items():
for major_i, major_index in panel.major_axis.indexMap.items():
try:
d[(minor_i,major_i,item)] = values[item_index,major_index,minor_index]
d[(minor_i, major_i, item)] = values[item_index, major_index, minor_index]
except:
pass
# stack the values
minor = sorted(list(minor))
major = sorted(list(major))
items = sorted(list(items))
# create the 3d stack (items x columns x indicies)
data = np.dstack([ np.asarray([ np.asarray([ d.get((minor_i,major_i,item),np.nan) for item in items ]) for major_i in major ]).transpose() for minor_i in minor ])
data = np.dstack([np.asarray([np.asarray([d.get((minor_i, major_i, item), np.nan)
for item in items])
for major_i in major]).transpose()
for minor_i in minor])
# construct the panel
return Panel(data, items, major, minor)
add_class_method(Panel, create_panels_join, 'join_many')

