Merge branch 'master' of github.com:pandas-dev/pandas
* 'master' of github.com:pandas-dev/pandas: (188 commits)
  Separate out _convert_datetime_to_tsobject (pandas-dev#17715)
  DOC: remove whatsnew note for xref pandas-dev#17131
  BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738)
  DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691)
  CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735)
  Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736)
  TST: add backward compat for offset testing for pickles (pandas-dev#17733)
  remove unused time conversion funcs (pandas-dev#17711)
  DEPR: Deprecate convert parameter in take (pandas-dev#17352)
  BUG: Time Grouper bug fix when applied to list groupers (pandas-dev#17587)
  BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153)
  BUG: Fix unexpected sort in groupby (pandas-dev#17621)
  DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731)
  BUG: Fix series rename called with str altering name rather than index (GH17407) (pandas-dev#17654)
  DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675)
  Doc improvements for IntervalIndex and Interval (pandas-dev#17714)
  BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly
  Last of the timezones funcs (pandas-dev#17669)
  Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712)
  update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713)
  ...
Krzysztof Chomski committed Oct 2, 2017
2 parents 7818486 + a3d538a commit ddc84d9
Showing 305 changed files with 12,203 additions and 7,264 deletions.
6 changes: 6 additions & 0 deletions .github/ISSUE_TEMPLATE.md
@@ -12,6 +12,12 @@

**Note**: Many problems can be resolved by simply upgrading `pandas` to the latest version. Before submitting, please check if that solution works for you. If possible, you may want to check if `master` addresses this issue, but that is not necessary.

For documentation-related issues, you can check the latest versions of the docs on `master` here:

https://pandas-docs.github.io/pandas-docs-travis/

If the issue has not been resolved there, go ahead and file it in the issue tracker.

#### Expected Output

#### Output of ``pd.show_versions()``
6 changes: 3 additions & 3 deletions .travis.yml
@@ -37,7 +37,7 @@ matrix:
- JOB="3.5_OSX" TEST_ARGS="--skip-slow --skip-network"
- dist: trusty
env:
- JOB="2.7_LOCALE" TEST_ARGS="--only-slow --skip-network" LOCALE_OVERRIDE="zh_CN.UTF-8"
- JOB="2.7_LOCALE" LOCALE_OVERRIDE="zh_CN.UTF-8" SLOW=true
addons:
apt:
packages:
@@ -62,7 +62,7 @@ matrix:
# In allow_failures
- dist: trusty
env:
- JOB="2.7_SLOW" TEST_ARGS="--only-slow --skip-network"
- JOB="2.7_SLOW" SLOW=true
# In allow_failures
- dist: trusty
env:
@@ -82,7 +82,7 @@ matrix:
allow_failures:
- dist: trusty
env:
- JOB="2.7_SLOW" TEST_ARGS="--only-slow --skip-network"
- JOB="2.7_SLOW" SLOW=true
- dist: trusty
env:
- JOB="2.7_BUILD_TEST" TEST_ARGS="--skip-slow" BUILD_TEST=true
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -3,6 +3,7 @@ include LICENSE
include RELEASE.md
include README.rst
include setup.py
include pyproject.toml

graft doc
prune doc/build
8 changes: 7 additions & 1 deletion appveyor.yml
@@ -59,7 +59,7 @@ install:

# install our build environment
- cmd: conda config --set show_channel_urls true --set always_yes true --set changeps1 false
- cmd: conda update -q conda
# - cmd: conda update -q conda
- cmd: conda config --set ssl_verify false

# add the pandas channel *before* defaults to have defaults take priority
@@ -74,12 +74,18 @@ install:
# create our env
- cmd: conda create -n pandas python=%PYTHON_VERSION% cython pytest>=3.1.0 pytest-xdist
- cmd: activate pandas
- cmd: pip install moto
- SET REQ=ci\requirements-%PYTHON_VERSION%_WIN.run
- cmd: echo "installing requirements from %REQ%"
- cmd: conda install -n pandas --file=%REQ%
- cmd: conda list -n pandas
- cmd: echo "installing requirements from %REQ% - done"

# add some pip only reqs to the env
- SET REQ=ci\requirements-%PYTHON_VERSION%_WIN.pip
- cmd: echo "installing requirements from %REQ%"
- cmd: pip install -Ur %REQ%

# build em using the local source checkout in the correct windows env
- cmd: '%CMD_IN_ENV% python setup.py build_ext --inplace'

10 changes: 6 additions & 4 deletions asv_bench/asv.conf.json
@@ -117,8 +117,10 @@
// with results. If the commit is `null`, regression detection is
// skipped for the matching benchmark.
//
// "regressions_first_commits": {
// "some_benchmark": "352cdf", // Consider regressions only after this commit
// "another_benchmark": null, // Skip regression detection altogether
// }
"regressions_first_commits": {
".*": "v0.20.0"
},
"regression_thresholds": {
".*": 0.05
}
}
3 changes: 3 additions & 0 deletions asv_bench/benchmarks/categoricals.py
@@ -67,6 +67,9 @@ def time_value_counts_dropna(self):
def time_rendering(self):
str(self.sel)

def time_set_categories(self):
self.ts.cat.set_categories(self.ts.cat.categories[::2])


class Categoricals3(object):
goal_time = 0.2
20 changes: 20 additions & 0 deletions asv_bench/benchmarks/index_object.py
@@ -199,3 +199,23 @@ def time_datetime_level_values_full(self):

def time_datetime_level_values_sliced(self):
self.mi[:10].values


class Range(object):
goal_time = 0.2

def setup(self):
self.idx_inc = RangeIndex(start=0, stop=10**7, step=3)
self.idx_dec = RangeIndex(start=10**7, stop=-1, step=-3)

def time_max(self):
self.idx_inc.max()

def time_max_trivial(self):
self.idx_dec.max()

def time_min(self):
self.idx_dec.min()

def time_min_trivial(self):
self.idx_inc.min()
30 changes: 30 additions & 0 deletions asv_bench/benchmarks/io_bench.py
@@ -1,3 +1,4 @@
import os
from .pandas_vb_common import *
from pandas import concat, Timestamp, compat
try:
@@ -192,3 +193,32 @@ def time_read_nrows(self, compression, engine):
ext = ".bz2"
pd.read_csv(self.big_fname + ext, nrows=10,
compression=compression, engine=engine)


class read_json_lines(object):
goal_time = 0.2
fname = "__test__.json"

def setup(self):
self.N = 100000
self.C = 5
self.df = DataFrame(dict([('float{0}'.format(i), randn(self.N)) for i in range(self.C)]))
self.df.to_json(self.fname, orient="records", lines=True)

def teardown(self):
try:
os.remove(self.fname)
except:
pass

def time_read_json_lines(self):
pd.read_json(self.fname, lines=True)

def time_read_json_lines_chunk(self):
pd.concat(pd.read_json(self.fname, lines=True, chunksize=self.N//4))

def peakmem_read_json_lines(self):
pd.read_json(self.fname, lines=True)

def peakmem_read_json_lines_chunk(self):
pd.concat(pd.read_json(self.fname, lines=True, chunksize=self.N//4))
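
For reference, a minimal sketch of the line-delimited JSON round trip these benchmarks exercise: write records with lines=True, then read them back either whole or in chunks via chunksize. This sketch is illustrative and not part of the commit; the file name example.json is hypothetical.

import pandas as pd
import numpy as np

# Build a small frame and write it as line-delimited JSON (one record per line).
df = pd.DataFrame({'a': np.arange(8), 'b': np.random.randn(8)})
df.to_json('example.json', orient='records', lines=True)

# Read the whole file at once ...
whole = pd.read_json('example.json', lines=True)

# ... or stream it in chunks and concatenate the pieces back into one frame.
streamed = pd.concat(pd.read_json('example.json', lines=True, chunksize=3))
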
88 changes: 88 additions & 0 deletions asv_bench/benchmarks/period.py
@@ -2,6 +2,35 @@
from pandas import Series, Period, PeriodIndex, date_range


class PeriodProperties(object):
def setup(self):
self.per = Period('2012-06-01', freq='M')

def time_year(self):
self.per.year

def time_month(self):
self.per.month

def time_quarter(self):
self.per.quarter

def time_day(self):
self.per.day

def time_hour(self):
self.per.hour

def time_minute(self):
self.per.minute

def time_second(self):
self.per.second

def time_leap_year(self):
self.per.is_leap_year


class Constructor(object):
goal_time = 0.2

@@ -49,6 +78,65 @@ def time_value_counts_pindex(self):
self.i.value_counts()


class Properties(object):
def setup(self):
self.per = Period('2017-09-06 08:28', freq='min')

def time_year(self):
self.per.year

def time_month(self):
self.per.month

def time_day(self):
self.per.day

def time_hour(self):
self.per.hour

def time_minute(self):
self.per.minute

def time_second(self):
self.per.second

def time_is_leap_year(self):
self.per.is_leap_year

def time_quarter(self):
self.per.quarter

def time_qyear(self):
self.per.qyear

def time_week(self):
self.per.week

def time_daysinmonth(self):
self.per.daysinmonth

def time_dayofweek(self):
self.per.dayofweek

def time_dayofyear(self):
self.per.dayofyear

def time_start_time(self):
self.per.start_time

def time_end_time(self):
self.per.end_time

def time_to_timestamp(self):
self.per.to_timestamp()

def time_now(self):
self.per.now()

def time_asfreq(self):
self.per.asfreq('A')


class period_standard_indexing(object):
goal_time = 0.2

69 changes: 66 additions & 3 deletions asv_bench/benchmarks/sparse.py
@@ -1,8 +1,8 @@
from itertools import repeat
import itertools

from .pandas_vb_common import *
import scipy.sparse
from pandas import SparseSeries, SparseDataFrame
from pandas import SparseSeries, SparseDataFrame, SparseArray


class sparse_series_to_frame(object):
@@ -23,6 +23,69 @@ def time_sparse_series_to_frame(self):
SparseDataFrame(self.series)


class sparse_array_constructor(object):
goal_time = 0.2

def setup(self):
np.random.seed(1)
self.int64_10percent = self.make_numeric_array(length=1000000, dense_size=100000, fill_value=0, dtype=np.int64)
self.int64_1percent = self.make_numeric_array(length=1000000, dense_size=10000, fill_value=0, dtype=np.int64)

self.float64_10percent = self.make_numeric_array(length=1000000, dense_size=100000, fill_value=np.nan, dtype=np.float64)
self.float64_1percent = self.make_numeric_array(length=1000000, dense_size=10000, fill_value=np.nan, dtype=np.float64)

self.object_nan_fill_value_10percent = self.make_object_array(length=1000000, dense_size=100000, fill_value=np.nan)
self.object_nan_fill_value_1percent = self.make_object_array(length=1000000, dense_size=10000, fill_value=np.nan)

self.object_non_nan_fill_value_10percent = self.make_object_array(length=1000000, dense_size=100000, fill_value=0)
self.object_non_nan_fill_value_1percent = self.make_object_array(length=1000000, dense_size=10000, fill_value=0)

def make_numeric_array(self, length, dense_size, fill_value, dtype):
arr = np.array([fill_value] * length, dtype=dtype)
indexer = np.unique(np.random.randint(0, length, dense_size))
arr[indexer] = np.random.randint(0, 100, len(indexer))
return (arr, fill_value, dtype)

def make_object_array(self, length, dense_size, fill_value):
elems = np.array(['a', 0.0, False, 1, 2], dtype=np.object)
arr = np.array([fill_value] * length, dtype=np.object)
indexer = np.unique(np.random.randint(0, length, dense_size))
arr[indexer] = np.random.choice(elems, len(indexer))
return (arr, fill_value, np.object)

def time_sparse_array_constructor_int64_10percent(self):
arr, fill_value, dtype = self.int64_10percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)

def time_sparse_array_constructor_int64_1percent(self):
arr, fill_value, dtype = self.int64_1percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)

def time_sparse_array_constructor_float64_10percent(self):
arr, fill_value, dtype = self.float64_10percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)

def time_sparse_array_constructor_float64_1percent(self):
arr, fill_value, dtype = self.float64_1percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)

def time_sparse_array_constructor_object_nan_fill_value_10percent(self):
arr, fill_value, dtype = self.object_nan_fill_value_10percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)

def time_sparse_array_constructor_object_nan_fill_value_1percent(self):
arr, fill_value, dtype = self.object_nan_fill_value_1percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)

def time_sparse_array_constructor_object_non_nan_fill_value_10percent(self):
arr, fill_value, dtype = self.object_non_nan_fill_value_10percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)

def time_sparse_array_constructor_object_non_nan_fill_value_1percent(self):
arr, fill_value, dtype = self.object_non_nan_fill_value_1percent
SparseArray(arr, fill_value=fill_value, dtype=dtype)


class sparse_frame_constructor(object):
goal_time = 0.2

@@ -33,7 +96,7 @@ def time_sparse_from_scipy(self):
SparseDataFrame(scipy.sparse.rand(1000, 1000, 0.005))

def time_sparse_from_dict(self):
SparseDataFrame(dict(zip(range(1000), repeat([0]))))
SparseDataFrame(dict(zip(range(1000), itertools.repeat([0]))))


class sparse_series_from_coo(object):
2 changes: 1 addition & 1 deletion asv_bench/benchmarks/timeseries.py
@@ -56,7 +56,7 @@ def setup(self):
self.no_freq = self.rng7[:50000].append(self.rng7[50002:])
self.d_freq = self.rng7[:50000].append(self.rng7[50000:])

self.rng8 = date_range(start='1/1/1700', freq='B', periods=100000)
self.rng8 = date_range(start='1/1/1700', freq='B', periods=75000)
self.b_freq = self.rng8[:50000].append(self.rng8[50000:])

def time_add_timedelta(self):