Merge commit 'v0.4.1' into debian
* commit 'v0.4.1': (53 commits)
  RLS: Version 0.4.1
  BUG: use int64
  BUG: reverted Series constructor NumPy < 1.6 bug
  TST: wrap up test coverage
  TST: test coverage, minor refactoring
  TST: test coverage and minor bugfix in NDFrame.swaplevel
  DOC: documented reading CSV/table into MultiIndex, address GH pandas-dev#165
  DOC: documented swaplevel, address GH pandas-dev#150
  ENH: better JR join function
  ENH: add join panel function for testing and later integration
  BUG: do not allow appending with different item order
  ENH: don't raise exception when calling remove on non-existent node
  ENH: tinkering with other join impl
  ENH: speed up assert_almost_equal
  BUG: DateRange.copy did not produce well-formed object. fixes GH pandas-dev#168
  DOC: update release notes
  BUG: count_level did not handle zero-length data case, caused segfault with NumPy < 1.6 for some. Fixes GH pandas-dev#169
  ENH: sped up inner/outer_join_indexer cython functions
  ENH: don't boundscheck or wraparound
  ENH: bug fixes, speed enh, benchmark suite to compare with xts
  ...
yarikoptic committed Sep 26, 2011
2 parents 645d611 + cdc607c commit a1ae6f2
Showing 37 changed files with 1,720 additions and 227 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@ MANIFEST
*.pyd
pandas/src/tseries.c
pandas/src/sparse.c
pandas/version.py
doc/source/generated
*flymake*
scikits
99 changes: 84 additions & 15 deletions RELEASE.rst
@@ -1,10 +1,83 @@
========================
pandas 0.4 Release Notes
========================
=============
Release Notes
=============

What is it
This is the list of changes to pandas between each release. For full details,
see the commit logs at http://github.com/wesm/pandas


pandas 0.4.1
============

**Release date:** Not yet released

This is primarily a bug-fix release, but it also includes some new features and
improvements.

**New features / modules**

- Added new `DataFrame` methods `get_dtype_counts` and property `dtypes`
- Setting of values using ``.ix`` indexing attribute in mixed-type DataFrame
objects has been implemented (fixes GH #135)
- `read_csv` can read multiple columns into a `MultiIndex`. DataFrame's
`to_csv` method will properly write out a `MultiIndex` which can be read
back (PR #151, thanks to Skipper Seabold)
- Wrote fast time series merging / joining methods in Cython. Will be
integrated later into DataFrame.join and related functions
- Added `ignore_index` option to `DataFrame.append` for combining unindexed
records stored in a DataFrame
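
The multi-column index feature noted above can be sketched with the modern
pandas API (the sample data below is made up for illustration; only the
`read_csv`/`to_csv` behaviour comes from the release note):

```python
# Sketch of the MultiIndex round-trip described above, using the modern
# pandas API; the sample data here is hypothetical.
import io

import pandas as pd

csv_text = (
    "year,indiv,zit,xit\n"
    "1977,A,1.2,0.6\n"
    "1977,B,1.5,0.5\n"
    "1978,A,0.2,0.06\n"
)

# index_col takes a list of column positions to build a MultiIndex
df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])

# to_csv writes the MultiIndex out so the frame can be read back unchanged
buf = io.StringIO()
df.to_csv(buf)
round_tripped = pd.read_csv(io.StringIO(buf.getvalue()), index_col=[0, 1])
```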

**Improvements to existing features**

- Some speed enhancements in the internal Index type-checking function
- `DataFrame.rename` has a new `copy` parameter which, when set to `False`,
  renames a DataFrame in place
- Enable unstacking by level name (PR #142)
- Enable sortlevel to work by level name (PR #141)
- `read_csv` can automatically "sniff" other kinds of delimiters using
`csv.Sniffer` (PR #146)
- Improved speed of unit test suite by about 40%
- An exception is no longer raised when calling `HDFStore.remove` on a
  non-existent node with a where clause
- Optimized `_ensure_index` function resulting in performance savings in
type-checking Index objects
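
The delimiter "sniffing" improvement above relies on the standard library;
a minimal sketch of the underlying stdlib mechanism:

```python
# csv.Sniffer inspects a text sample and guesses the dialect, including
# the delimiter - the mechanism the release note says read_csv now uses.
import csv

sample = "a|b|c\n1|2|3\n4|5|6\n"
dialect = csv.Sniffer().sniff(sample)
# the sniffed dialect can then be handed to an ordinary csv.reader
rows = list(csv.reader(sample.splitlines(), dialect))
```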

**Bug fixes**

- Fixed DataFrame constructor bug causing downstream problems (e.g. .copy()
failing) when passing a Series as the values along with a column name and
index
- Fixed single-key groupby on DataFrame with as_index=False (GH #160)
- `Series.shift` was failing on integer Series (GH #154)
- `unstack` methods were producing incorrect output in the case of duplicate
hierarchical labels. An exception will now be raised (GH #147)
- Calling `count` with level argument caused reduceat failure or segfault in
earlier NumPy (GH #169)
- Fixed `DataFrame.corrwith` to automatically exclude non-numeric data (GH
#144)
- Unicode handling bug fixes in `DataFrame.to_string` (GH #138)
- Excluded a degenerate OLS unit test case that was causing a
  platform-specific failure (GH #149)
- Skip blosc-dependent unit tests for PyTables < 2.2 (PR #137)
- Calling `copy` on `DateRange` did not copy over attributes to the new object
(GH #168)
- Fixed bug in `HDFStore` in which Panel data could be appended to a Table
  with a different item order, resulting in incorrect data when read back

Thanks
------
- Yaroslav Halchenko
- Jeff Reback
- Skipper Seabold
- Dan Lovell
- Nick Pentreath

pandas 0.4
==========

What is it
----------

**pandas** is a library of powerful labeled-axis data structures, statistical
tools, and general code for working with relational data sets, including time
series and cross-sectional data. It was designed with the practical needs of
@@ -13,14 +13,86 @@ particularly well suited for, among other things, financial data analysis
applications.

Where to get it
===============
---------------

Source code: http://github.com/wesm/pandas
Binary installers on PyPI: http://pypi.python.org/pypi/pandas
Documentation: http://pandas.sourceforge.net

Release notes
=============
-------------

**Release date:** 9/12/2011

@@ -279,12 +352,8 @@ Thanks
- Skipper Seabold
- Chris Jordan-Squire

========================
pandas 0.3 Release Notes
========================

Release Notes
=============
pandas 0.3
==========

This major release of pandas represents approximately 1 year of continuous
development work and brings with it many new features, bug fixes, speed
@@ -293,22 +362,22 @@ change from the 0.2 release has been the completion of a rigorous unit test
suite covering all of the core functionality.

What is it
==========
----------

**pandas** is a library of labeled data structures, statistical models, and
general code for working with time series and cross-sectional data. It was
designed with the practical needs of statistical modeling and large,
inhomogeneous data sets in mind.

Where to get it
===============
---------------

Source code: http://github.com/wesm/pandas
Binary installers on PyPI: http://pypi.python.org/pypi/pandas
Documentation: http://pandas.sourceforge.net

Release notes
=============
-------------

**Release date:** February 20, 2011

77 changes: 77 additions & 0 deletions bench/bench_join_panel.py
@@ -0,0 +1,77 @@
# reasonably efficient

def create_panels_append(cls, panels):
    """Append a list of panels into a single panel."""
    panels = [a for a in panels if a is not None]
    # corner cases
    if len(panels) == 0:
        return None
    elif len(panels) == 1:
        return panels[0]
    elif len(panels) == 2 and panels[0] == panels[1]:
        return panels[0]

    # create a joint index for the axis
    def joint_index_for_axis(panels, axis):
        s = set()
        for p in panels:
            s.update(list(getattr(p, axis)))
        return sorted(list(s))

    def reindex_on_axis(panels, axis, axis_reindex):
        new_axis = joint_index_for_axis(panels, axis)
        new_panels = [p.reindex(**{axis_reindex: new_axis, 'copy': False})
                      for p in panels]
        return new_panels, new_axis

    # create the joint major index; don't reindex the sub-panels - we are appending
    major = joint_index_for_axis(panels, 'major_axis')
    # reindex on the minor axis
    panels, minor = reindex_on_axis(panels, 'minor_axis', 'minor')
    # reindex on items
    panels, items = reindex_on_axis(panels, 'items', 'items')
    # concatenate values along the major axis
    try:
        values = np.concatenate([p.values for p in panels], axis=1)
    except Exception, detail:
        raise Exception("cannot append values that don't match dimensions! -> [%s] %s"
                        % (','.join(["%s" % p for p in panels]), str(detail)))
    return Panel(values, items=items, major_axis=major, minor_axis=minor)



# does the job but is inefficient (better to handle it the way a table is
# read in PyTables, e.g. create a LongPanel and then convert it to wide format)

def create_panels_join(cls, panels):
    """Given a list of panels, create a single panel."""
    panels = [a for a in panels if a is not None]
    # corner cases
    if len(panels) == 0:
        return None
    elif len(panels) == 1:
        return panels[0]
    elif len(panels) == 2 and panels[0] == panels[1]:
        return panels[0]
    d = dict()
    minor, major, items = set(), set(), set()
    for panel in panels:
        items.update(panel.items)
        major.update(panel.major_axis)
        minor.update(panel.minor_axis)
        values = panel.values
        for item, item_index in panel.items.indexMap.items():
            for minor_i, minor_index in panel.minor_axis.indexMap.items():
                for major_i, major_index in panel.major_axis.indexMap.items():
                    try:
                        d[(minor_i, major_i, item)] = values[item_index, major_index, minor_index]
                    except Exception:
                        pass
    # stack the values
    minor = sorted(list(minor))
    major = sorted(list(major))
    items = sorted(list(items))
    # create the 3d stack (items x columns x indices)
    data = np.dstack([np.asarray([np.asarray([d.get((minor_i, major_i, item), np.nan)
                                              for item in items])
                                  for major_i in major]).transpose()
                      for minor_i in minor])
    # construct the panel
    return Panel(data, items, major, minor)

add_class_method(Panel, create_panels_join, 'join_many')

52 changes: 52 additions & 0 deletions bench/bench_take_indexing.py
@@ -0,0 +1,52 @@
import numpy as np

from pandas import *
import pandas._tseries as lib

import timeit

setup = """
from pandas import Series
import pandas._tseries as lib
import random
import numpy as np
n = %d
k = %d
arr = np.random.randn(n, k)
indexer = np.arange(n, dtype=np.int32)
indexer = indexer[::-1]
"""

sizes = [100, 1000, 10000, 100000]
iters = [1000, 1000, 100, 1]

fancy_2d = []
take_2d = []
cython_2d = []

def _timeit(stmt, size, k=5, iters=1000):
    # time `stmt` for the given array size, averaging over `iters` runs
    timer = timeit.Timer(stmt=stmt, setup=setup % (size, k))
    return timer.timeit(iters) / iters

for sz, its in zip(sizes, iters):
    print sz
    fancy_2d.append(_timeit('arr[indexer]', sz, iters=its))
    take_2d.append(_timeit('arr.take(indexer, axis=0)', sz, iters=its))
    cython_2d.append(_timeit('lib.take_axis0(arr, indexer)', sz, iters=its))

df = DataFrame({'fancy': fancy_2d,
                'take': take_2d,
                'cython': cython_2d})

print df

from pandas.rpy.common import r
r('mat <- matrix(rnorm(50000), nrow=10000, ncol=5)')
r('set.seed(12345)')
r('indexer <- sample(1:10000)')
r('mat[indexer,]')
16 changes: 16 additions & 0 deletions doc/data/mindex_ex.csv
@@ -0,0 +1,16 @@
year,indiv,zit,xit
1977,"A",1.2,.6
1977,"B",1.5,.5
1977,"C",1.7,.8
1978,"A",.2,.06
1978,"B",.7,.2
1978,"C",.8,.3
1978,"D",.9,.5
1978,"E",1.4,.9
1979,"C",.2,.15
1979,"D",.14,.05
1979,"E",.5,.15
1979,"F",1.2,.5
1979,"G",3.4,1.9
1979,"H",5.4,2.7
1979,"I",6.4,1.2
5 changes: 0 additions & 5 deletions doc/source/dsintro.rst
@@ -513,11 +513,6 @@ The API for insertion and deletion is the same as for DataFrame.
Indexing / Selection
~~~~~~~~~~~~~~~~~~~~

As of this writing, indexing with Panel is a bit more restrictive than in
DataFrame. Notably, :ref:`advanced indexing <indexing>` via the **ix** property
has not yet been integrated in Panel. This will be done, however, in a
future release.

.. csv-table::
:header: "Operation", "Syntax", "Result"
:widths: 30, 20, 10
19 changes: 13 additions & 6 deletions doc/source/indexing.rst
@@ -291,19 +291,16 @@ than integer locations. Therefore, advanced indexing with ``.ix`` will always
Setting values in mixed-type objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Setting values on a mixed-type DataFrame or Panel is not yet supported:
Setting values on a mixed-type DataFrame or Panel is supported when using scalar
values, though setting arbitrary vectors is not yet supported:

.. ipython:: python

   df2 = df[:4]
   df2['foo'] = 'bar'
   df2.ix[3]
   df2.ix[3] = np.nan
The reason it has not been implemented yet is simply due to difficulty of
implementation relative to its utility. Handling the full spectrum of
exceptional cases for setting values is trickier than getting values (which is
relatively straightforward).
df2
.. _indexing.hierarchical:

@@ -523,6 +520,16 @@ However:
>>> s.ix[('a', 'b'):('b', 'a')]
Exception: MultiIndex lexsort depth 1, key was length 2

Swapping levels with ``swaplevel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``swaplevel`` function can switch the order of two levels:

.. ipython:: python

   df[:5]
   df[:5].swaplevel(0, 1, axis=0)

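
As a standalone sketch (modern pandas API; the small series below is
hypothetical, not the `df` from the surrounding docs), `swaplevel`
exchanges two levels of a `MultiIndex`:

```python
# swaplevel reorders the levels of a MultiIndex without changing the data
import pandas as pd

idx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)],
                                names=["outer", "inner"])
s = pd.Series([10, 20], index=idx)

swapped = s.swaplevel(0, 1)
# the same values are now keyed (inner, outer) instead of (outer, inner)
```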
The ``delevel`` DataFrame function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

18 changes: 18 additions & 0 deletions doc/source/io.rst
@@ -96,6 +96,24 @@ fragile. Type inference is a pretty big deal. So if a column can be coerced to
integer dtype without altering the contents, it will do so. Any non-numeric
columns will come through as object dtype as with the rest of pandas objects.

Reading DataFrame objects with ``MultiIndex``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose you have data indexed by two columns:

.. ipython:: python

   print open('data/mindex_ex.csv').read()

The ``index_col`` argument to ``read_csv`` and ``read_table`` can take a list of
column numbers to turn multiple columns into a ``MultiIndex``:

.. ipython:: python

   df = read_csv("data/mindex_ex.csv", index_col=[0,1])
   df
   df.ix[1978]

Excel 2003 files
----------------
