Fix for issue pandas-dev#11317

This includes updates to 3 Excel files, plus a test in test_excel.py, plus the fix in parsers.py

issue when read_html with previous fix
With read_html, the fix didn't work on Python 2.7. Handle the string conversion correctly

Add bug fixed to what's new

Revert "Add bug fixed to what's new"
This reverts commit 05b2344.

Revert "issue when read_html with previous fix"
This reverts commit d1bc296.

Add what's new to describe bug.

fix issue with original fix
Added text to describe the bug. Fixed issue so that it works correctly in Python 2.7

Add round trip test
Added round trip test and fixed error in writing sheets when merge_cells=false and columns have multi index

DEPR: deprecate pandas.io.ga, pandas-dev#11308

DEPR: deprecate engine keyword from to_csv pandas-dev#11274

remove warnings from the tests for deprecation of engine in to_csv

PERF: Checking monotonic-ness before sorting on an index pandas-dev#11080

BUG: Bug in list-like indexing with a mixed-integer Index, pandas-dev#11320

Add hex color strings test

CLN: GH11271 move _get_handle, UTF encoders to io.common

TST: tests for list skiprows in read_excel

BUG: Fix to_dict() problem when using only datetime pandas-dev#11247
Fix a bug where to_dict() does not return Timestamp when there is only datetime dtype present.

Undo change for when columns are multiindex
There is still something wrong here in the format of the file when there are multiindex columns, but that's for another day

Fix formatting in test_excel and remove spurious test
See title

BUG: bug in comparisons vs tuples, pandas-dev#11339

bug#10442 : fix, adding note and test

BUG pandas-dev#10442(test) : Convert datetimelike index to strings with astype(str)

BUG#10422: note added

bug#10442 : tests added

bug#10442 : note udated

BUG pandas-dev#10442(test) : Convert datetimelike index to strings with astype(str)

bug#10442: fix, adding note and test

bug#10442: fix, adding note and test

Adjust test so that merge_cells=False works correctly
Adjust the test so that if merge_cells=false, it does a proper formatting of the columns in the single row header, and puts the row header in the first row

Fix test for Python 2.7 and 3.5
The test is failing on Python 2.7 and 3.5, which appears to read in the values as floats, and I cannot replicate. So force the tests to pass by just making the column names equal when merge_cells=False

Fix for openpyxl < 2, and for issue pandas-dev#11408
If using openpyxl < 2, and value is a string that could be a number, force a string to be written out.
If using openpyxl >= 2.2, then fix issue pandas-dev#11408 to do with merging cells

Use set_value_explicit instead of set_explicit_value
set_value_explicit is in openpyxl 1.6, changed in openpyxl 1.8, but there is code in 1.8 to set set_value_explicit to set_explicit_value for compatibility

Add line in whatsnew for issue 11408

ENH: added capability to handle Path/LocalPath objects, pandas-dev#11033

DOC: typo in whatsnew/0.17.1.txt

PERF: Release GIL on some datetime ops

BUG: Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace pandas-dev#11326

CLN: clean up internal impl of fillna/replace, xref pandas-dev#11153

PERF: fast inf checking in to_excel

PERF: Series.dropna with non-nan dtypes

fixed pathlib tests on windows

DEPR: remove some SparsePanel deprecation warnings in testing

DEPR: avoid numpy comparison to None warnings

API: indexing with a null key will raise a TypeError rather than a ValueError, pandas-dev#11356

WARN: elementwise comparisons with index names, xref pandas-dev#11162

DEPR warning in io/data.py w.r.t. order->sort_values

WARN: more elementwise comparisons to object

WARN: more uncomparables of numeric array vs object

BUG: quick fix for pandas-dev#10989

TST: add test case from Issue pandas-dev#10989

API: add _to_safe_for_reshape to allow safe insert/append with embedded CategoricalIndexes
Signed-off-by: Jeff Reback <[email protected]>

BLD: conda

Revert "BLD: conda"
This reverts commit 0c8a8e1.

TST: remove invalid symbol warnings

TST: move some tests to slow

TST: fix some warnings filters

TST: import pandas_datareader, use for tests

TST: remove some deprecation warnings from imports

DEPR: fix VisibleDeprecationWarnings in sparse

TST: remove some warnings in test_nanops

ENH: Improve the error message in to_gbq when the DataFrame schema does not match pandas-dev#11359

add libgfortran to 1.8.1 build

binstar -> anaconda

remove link to issue 11328 in whatsnew

Fixes to document issue in code, small efficiency fix

Try to resolve rebase conflict in whats new
Dr-Irv · Oct 24, 2015 · 4f62b99
1 parent 3914e0f
commit 4f62b99

Showing 69 changed files with 1,664 additions and 816 deletions.
diff --git a/asv_bench/asv.conf.json b/asv_bench/asv.conf.json
@@ -43,6 +43,7 @@
         "numexpr": [],
         "pytables": [],
         "openpyxl": [],
+        "xlsxwriter": [],
         "xlrd": [],
         "xlwt": []
     },

diff --git a/asv_bench/benchmarks/frame_methods.py b/asv_bench/benchmarks/frame_methods.py
@@ -930,6 +930,16 @@ def time_frame_xs_row(self):
         self.df.xs(50000)
 
 
+class frame_sort_index(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.df = DataFrame(randn(1000000, 2), columns=list('AB'))
+
+    def time_frame_sort_index(self):
+        self.df.sort_index()
+
+
 class series_string_vector_slice(object):
     goal_time = 0.2
 

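The `frame_sort_index` benchmark above times `DataFrame.sort_index()`, which the PERF change for pandas-dev#11080 targets by checking monotonic-ness before sorting. A minimal user-level sketch of that idea follows; `sort_index_if_needed` is a hypothetical helper, not pandas' internal implementation, and it assumes a pandas version that provides `Index.is_monotonic_increasing`.

```python
import numpy as np
import pandas as pd


def sort_index_if_needed(df):
    """Hypothetical helper: skip the sort when the index is already ordered.

    Illustrates the idea behind the GH11080 change only; pandas' own
    sort_index code is not reproduced here.
    """
    # A single O(n) pass to test ordering is cheaper than an O(n log n) sort.
    if df.index.is_monotonic_increasing:
        return df.copy()
    return df.sort_index()


df = pd.DataFrame(np.random.randn(1000, 2), columns=list("AB"))
already_sorted = sort_index_if_needed(df)      # monotonic: no sort performed
shuffled = df.sample(frac=1, random_state=0)   # same rows, shuffled order
resorted = sort_index_if_needed(shuffled)      # not monotonic: falls back to sort_index()
```

The fast path pays off most for frames like the benchmark's, whose default integer index is already in order.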
diff --git a/asv_bench/benchmarks/gil.py b/asv_bench/benchmarks/gil.py
@@ -320,3 +320,49 @@ def time_nogil_kth_smallest(self):
         def run(arr):
             algos.kth_smallest(arr, self.k)
         run()
+
+class nogil_datetime_fields(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.N = 100000000
+        self.dti = pd.date_range('1900-01-01', periods=self.N, freq='D')
+        self.period = self.dti.to_period('D')
+        if (not have_real_test_parallel):
+            raise NotImplementedError
+
+    def time_datetime_field_year(self):
+        @test_parallel(num_threads=2)
+        def run(dti):
+            dti.year
+        run(self.dti)
+
+    def time_datetime_field_day(self):
+        @test_parallel(num_threads=2)
+        def run(dti):
+            dti.day
+        run(self.dti)
+
+    def time_datetime_field_daysinmonth(self):
+        @test_parallel(num_threads=2)
+        def run(dti):
+            dti.days_in_month
+        run(self.dti)
+
+    def time_datetime_field_normalize(self):
+        @test_parallel(num_threads=2)
+        def run(dti):
+            dti.normalize()
+        run(self.dti)
+
+    def time_datetime_to_period(self):
+        @test_parallel(num_threads=2)
+        def run(dti):
+            dti.to_period('S')
+        run(self.dti)
+
+    def time_period_to_datetime(self):
+        @test_parallel(num_threads=2)
+        def run(period):
+            period.to_timestamp()
+        run(self.period)
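The `nogil_datetime_fields` benchmarks run each datetime operation in two threads through `test_parallel`, so they only speed up if the operation releases the GIL. A rough stand-alone approximation with plain `threading` is sketched below; `run_in_threads` is a hypothetical helper, not the `test_parallel` decorator itself, and the example uses a much smaller index and a daily period frequency so it runs quickly.

```python
import threading

import pandas as pd


def run_in_threads(func, args, num_threads=2):
    """Hypothetical stand-in for test_parallel: call func(*args) once in each
    of num_threads threads and collect the results in thread order.

    Wall-clock gains only appear when func releases the GIL internally.
    """
    results = [None] * num_threads

    def worker(i):
        results[i] = func(*args)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# 100,000 daily timestamps starting in 1900 (far smaller than the benchmark).
dti = pd.date_range("1900-01-01", periods=100000, freq="D")
years = run_in_threads(lambda idx: idx.year, (dti,))
periods = run_in_threads(lambda idx: idx.to_period("D"), (dti,))
```

To observe a GIL effect, time the two-thread run against two sequential calls on a large index.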
diff --git a/asv_bench/benchmarks/series_methods.py b/asv_bench/benchmarks/series_methods.py
@@ -71,3 +71,23 @@ def setup(self):
     def time_series_nsmallest2(self):
         self.s2.nsmallest(3, take_last=True)
         self.s2.nsmallest(3, take_last=False)
+
+
+class series_dropna_int64(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.s = Series(np.random.randint(1, 10, 1000000))
+
+    def time_series_dropna_int64(self):
+        self.s.dropna()
+
+class series_dropna_datetime(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.s = Series(pd.date_range('2000-01-01', freq='S', periods=1000000))
+        self.s[np.random.randint(1, 1000000, 100)] = pd.NaT
+
+    def time_series_dropna_datetime(self):
+        self.s.dropna()
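The two `dropna` benchmarks contrast a dtype that cannot hold missing values (`int64`) with one that can (`datetime64[ns]` with `NaT`). The sketch below shows the shortcut at user level; `dropna_fast` is a hypothetical helper illustrating the idea, not the code pandas uses, and it assumes that integer, unsigned, and boolean NumPy dtypes never contain missing values.

```python
import numpy as np
import pandas as pd


def dropna_fast(s):
    """Hypothetical helper: skip the missing-value scan for dtypes that
    cannot represent NaN/NaT (signed int, unsigned int, bool)."""
    if s.dtype.kind in "iub":
        return s.copy()
    # Floats, datetimes, objects, etc. may contain NaN/NaT, so really scan.
    return s.dropna()


ints = pd.Series(np.random.randint(1, 10, 1000))
dates = pd.Series(pd.date_range("2000-01-01", periods=1000, freq="D"))
dates[[10, 20, 30]] = pd.NaT  # inject three missing timestamps
```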
diff --git a/ci/install_conda.sh b/ci/install_conda.sh
@@ -73,7 +73,7 @@ bash miniconda.sh -b -p $HOME/miniconda || exit 1
 conda config --set always_yes yes --set changeps1 no || exit 1
 conda update -q conda || exit 1
 conda config --add channels conda-forge || exit 1
-conda config --add channels http://conda.binstar.org/pandas || exit 1
+conda config --add channels http://conda.anaconda.org/pandas || exit 1
 conda config --set ssl_verify false || exit 1
 
 # Useful for debugging any issues with conda

diff --git a/ci/requirements-2.7.pip b/ci/requirements-2.7.pip
@@ -2,3 +2,5 @@ blosc
 httplib2
 google-api-python-client == 1.2
 python-gflags == 2.0
+pathlib
+py
diff --git a/ci/requirements-2.7_SLOW.pip b/ci/requirements-2.7_SLOW.pip
diff --git a/ci/requirements-3.4.build b/ci/requirements-3.4.build
@@ -2,3 +2,4 @@ python-dateutil
 pytz
 numpy=1.8.1
 cython
+libgfortran
diff --git a/doc/source/conf.py b/doc/source/conf.py
@@ -299,8 +299,9 @@
 intersphinx_mapping = {
     'statsmodels': ('http://statsmodels.sourceforge.net/devel/', None),
     'matplotlib': ('http://matplotlib.org/', None),
-    'python': ('http://docs.python.org/', None),
-    'numpy': ('http://docs.scipy.org/doc/numpy', None)
+    'python': ('http://docs.python.org/3', None),
+    'numpy': ('http://docs.scipy.org/doc/numpy', None),
+    'py': ('http://pylib.readthedocs.org/en/latest/', None)
 }
 import glob
 autosummary_generate = glob.glob("*.rst")

diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -79,9 +79,10 @@ for some advanced strategies
 
 They can take a number of arguments:
 
-  - ``filepath_or_buffer``: Either a string path to a file, URL
+  - ``filepath_or_buffer``: Either a path to a file (a :class:`python:str`,
+    :class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath`), URL
     (including http, ftp, and S3 locations), or any object with a ``read``
-    method (such as an open file or ``StringIO``).
+    method (such as an open file or :class:`~python:io.StringIO`).
   - ``sep`` or ``delimiter``: A delimiter / separator to split fields
     on. With ``sep=None``, ``read_csv`` will try to infer the delimiter
     automatically in some cases by "sniffing".

diff --git a/doc/source/whatsnew/v0.17.1.txt b/doc/source/whatsnew/v0.17.1.txt
@@ -17,6 +17,7 @@ Highlights include:
 
 Enhancements
 ~~~~~~~~~~~~
+- ``DatetimeIndex`` now supports conversion to strings with ``astype(str)`` (:issue:`10442`)
 
 - Support for ``compression`` (gzip/bz2) in :method:`DataFrame.to_csv` (:issue:`7615`)
 
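The ``astype(str)`` enhancement for ``DatetimeIndex`` (:issue:`10442`) can be sketched as follows, assuming a pandas release that includes it. The exact string format is deliberately not asserted, since formatting details may vary between versions.

```python
import pandas as pd

idx = pd.date_range("2015-01-01", periods=3, freq="D")
# Convert each Timestamp in the index to its string representation.
as_strings = idx.astype(str)
print(list(as_strings))
```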
@@ -27,6 +28,10 @@ Enhancements
 Other Enhancements
 ^^^^^^^^^^^^^^^^^^
 
+- ``pd.read_*`` functions can now also accept :class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath`
+  objects for the ``filepath_or_buffer`` argument. (:issue:`11033`)
+- Improve the error message displayed in :func:`pandas.io.gbq.to_gbq` when the DataFrame does not match the schema of the destination table (:issue:`11359`)
+
 .. _whatsnew_0171.api:
 
 API changes
@@ -37,17 +42,31 @@ API changes
 - Regression from 0.16.2 for output formatting of long floats/nan, restored in (:issue:`11302`)
 - Prettyprinting sets (e.g. in DataFrame cells) now uses set literal syntax (``{x, y}``) instead of
   Legacy Python syntax (``set([x, y])``) (:issue:`11215`)
+- Indexing with a null key will raise a ``TypeError``, instead of a ``ValueError`` (:issue:`11356`)
 
 .. _whatsnew_0171.deprecations:
 
 Deprecations
 ^^^^^^^^^^^^
 
+- The ``pandas.io.ga`` module which implements ``google-analytics`` support is deprecated and will be removed in a future version (:issue:`11308`)
+- Deprecate the ``engine`` keyword from ``.to_csv()``, which will be removed in a future version (:issue:`11274`)
+
+
 .. _whatsnew_0171.performance:
 
 Performance Improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
+- Checking monotonic-ness before sorting on an index (:issue:`11080`)
+- ``Series.dropna`` performance improvement when its dtype can't contain ``NaN`` (:issue:`11159`)
+
+
+- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
+
+
+- Improved performance of ``to_excel`` (:issue:`11352`)
+
 .. _whatsnew_0171.bug_fixes:
 
 Bug Fixes
@@ -58,13 +77,19 @@ Bug Fixes
 
 - Bug in ``HDFStore.select`` when comparing with a numpy scalar in a where clause (:issue:`11283`)
 
-- Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issues:`11295`)
+
+- Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issue:`11295`)
+- Bug in comparisons of Series vs list-likes (:issue:`11339`)
 
 
+- Bug in ``DataFrame.replace`` with a ``datetime64[ns, tz]`` and a non-compat to_replace (:issue:`11326`, :issue:`11153`)
 
 
 
+- Bug in list-like indexing with a mixed-integer Index (:issue:`11320`)
 
+- Bug in ``pivot_table`` with ``margins=True`` when indexes are of ``Categorical`` dtype (:issue:`10993`)
+- Bug in ``DataFrame.plot`` where hex color strings could not be used (:issue:`10299`)
 
 
 
@@ -88,5 +113,12 @@ Bug Fixes
 
 
 - Bugs in ``to_excel`` with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
+
 - Fixed a bug that prevented the construction of an empty series of dtype
   ``datetime64[ns, tz]`` (:issue:`11245`).
+
+- Bug in ``read_excel`` with multi-index containing integers (:issue:`11317`)
+
+- Bug in ``to_excel`` with openpyxl 2.2+ and merging (:issue:`11408`)
+
+- Bug in ``DataFrame.to_dict()`` producing a ``np.datetime64`` object instead of a ``Timestamp`` when only datetime data is present (:issue:`11327`)
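The ``to_dict()`` fix covers a frame whose only column has ``datetime64[ns]`` dtype. A minimal check, assuming a pandas release that includes the fix; the column name ``when`` is arbitrary.

```python
import pandas as pd

# The only column is datetime64[ns], the case described in the bug entry.
df = pd.DataFrame({"when": pd.date_range("2015-10-01", periods=2, freq="D")})
records = df.to_dict()          # default orient: {column: {index: value}}
first = records["when"][0]      # should be a Timestamp, not np.datetime64
```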