ASV: occasional asv failures on xlwt #19779

jreback · 2018-02-20T01:00:19Z

https://travis-ci.org/pandas-dev/pandas-ci/jobs/343616367
This is on current pandas master (the link is just the CI job running it).

excel asv's

The following asvs benchmarks (if any) failed.
[ 43.02%] ··· Running io.excel.Excel.time_read_excel                  1/3 failed
                   xlwt      failed 
DONE displaying failed asvs benchmarks.

I have seen this work as well. Maybe a race condition?

jreback · 2018-02-20T01:05:18Z

from (pandas) bash-3.2$ more asv_bench/benchmarks/io/excel.py

    def setup(...):
        ....
        self.bio_write = BytesIO()
        self.bio_write.seek(0)
        self.writer_write = ExcelWriter(self.bio_write, engine=engine)

    def time_read_excel(self, engine):
        read_excel(self.bio_read)

    def time_write_excel(self, engine):
        self.df.to_excel(self.writer_write, sheet_name='Sheet1')
        self.writer_write.save()

I think time_write_excel should do the .writer_write setup itself.

jorisvandenbossche · 2018-02-21T12:38:31Z

If it is defined in the setup function (as it is now), it should be available in the benchmark function. It would be strange that this solves it.

jreback · 2018-02-26T11:01:28Z

It's passing 1 out of n times: https://travis-ci.org/pandas-dev/pandas-ci/jobs/345996588

jreback · 2018-02-26T11:01:41Z

cc @mroeschke @WillAyd if you have any ideas

WillAyd · 2018-02-26T17:02:11Z

Perhaps we should be explicitly closing the BytesIO objects that are getting created, in a teardown? Given how intermittent it is, I'm wondering if the GC is taking a pass at closing those for us but getting tripped up by the parallelized execution that asv provides.
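A minimal sketch of what such a teardown could look like, using only the standard library. The class name, the placeholder payload, and the benchmark body are hypothetical and stand in for the real Excel benchmark; the only thing taken from asv is its convention of calling setup and teardown methods around each benchmark.

```python
from io import BytesIO


class ExcelLikeBenchmark:
    """Hypothetical asv-style benchmark class (not the pandas one)."""

    params = ["xlwt"]
    param_names = ["engine"]

    def setup(self, engine):
        # Fresh buffers for every benchmark run.
        self.bio_read = BytesIO(b"placeholder workbook bytes")
        self.bio_write = BytesIO()

    def time_read(self, engine):
        # Stand-in for read_excel(self.bio_read).
        self.bio_read.seek(0)
        self.bio_read.read()

    def teardown(self, engine):
        # Close the buffers explicitly instead of leaving it to the GC.
        self.bio_read.close()
        self.bio_write.close()


# Drive the lifecycle by hand, the way a benchmark runner would.
bench = ExcelLikeBenchmark()
bench.setup("xlwt")
bench.time_read("xlwt")
bench.teardown("xlwt")
```

Whether this would have cured the intermittent failure is not established in this thread; it only makes the buffer lifetime explicit.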

mroeschke · 2018-02-26T17:51:57Z

Looks to be specifically a problem with the time_read_excel benchmark.

File "/home/travis/build/pandas-dev/pandas-ci/pandas/asv_bench/benchmarks/io/excel.py", line 29, in time_read_excel
                    read_excel(self.bio_read)
File "/home/travis/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas-0.23.0.dev0+381.g8f1dfa74e-py3.6-linux-x86_64.egg/pandas/util/_decorators.py", line 172, in wrapper
                    return func(*args, **kwargs)
File "/home/travis/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas-0.23.0.dev0+381.g8f1dfa74e-py3.6-linux-x86_64.egg/pandas/util/_decorators.py", line 172, in wrapper
                    return func(*args, **kwargs)
File "/home/travis/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas-0.23.0.dev0+381.g8f1dfa74e-py3.6-linux-x86_64.egg/pandas/io/excel.py", line 315, in read_excel
                    io = ExcelFile(io, engine=engine)
File "/home/travis/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas-0.23.0.dev0+381.g8f1dfa74e-py3.6-linux-x86_64.egg/pandas/io/excel.py", line 391, in __init__
                    self.book = xlrd.open_workbook(file_contents=data)
File "/home/travis/miniconda3/envs/pandas/lib/python3.6/site-packages/xlrd/__init__.py", line 116, in open_workbook
                    with open(filename, "rb") as f:
                TypeError: expected str, bytes or os.PathLike object, not NoneType

Digging into xlrd.open_workbook for the file_contents variable.
http://www.lexicon.net/sjmachin/xlrd.html#xlrd.open_workbook-function

file_contents
... as a string or an mmap.mmap object or some other behave-alike object. If file_contents is supplied, filename will not be used, except (possibly) in messages.

Looks like filename=None is the default as well, but for some reason it's being used despite the note above?
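One possible explanation, offered as an assumption rather than a confirmed reading of xlrd's source: if open_workbook chooses its input with a plain truthiness check on file_contents, then an empty bytes object (which is falsy) is treated like "not supplied", and the function falls back to filename, which is None here. The stand-in below is hypothetical and only mimics that branching; it is not xlrd's code.

```python
def open_workbook_like(filename=None, file_contents=None):
    """Hypothetical stand-in that mimics a truthiness check on file_contents."""
    if file_contents:
        # Non-empty contents: the filename is never touched.
        return file_contents[:4]
    # Empty or missing contents: fall back to opening the filename.
    with open(filename, "rb") as f:
        return f.read(4)


# Non-empty bytes are used directly.
peek = open_workbook_like(file_contents=b"\xd0\xcf\x11\xe0rest")

# Empty bytes are falsy, so the fallback runs and open(None) raises TypeError.
try:
    open_workbook_like(file_contents=b"")
    fallback_error = None
except TypeError as exc:
    fallback_error = exc
```

Under that assumption, the TypeError in the traceback would point to the buffer having already been read to the end before its contents were passed along.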

jreback · 2018-02-27T01:02:45Z

It's funny because it has worked at times. Really odd.

WillAyd · 2018-02-27T06:14:21Z

While it doesn't explain why this is happening, I think that if we add io.seek(0) just before the line below, it will "fix" the issue at hand (at least it did locally for me):

pandas/pandas/io/excel.py

Line 390 in 74dbfd0

data = io.read()
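To illustrate why a seek(0) before that read can matter, here is a small standard-library sketch. It assumes the same buffer ends up being read more than once (for example, if the benchmark function runs repeatedly against one self.bio_read); the thread does not confirm that this is what happens in asv.

```python
from io import BytesIO

buf = BytesIO(b"workbook bytes")

first = buf.read()    # consumes the buffer; the position is now at the end
second = buf.read()   # nothing is left to read from the current position

buf.seek(0)           # rewind to the start
third = buf.read()    # the full contents are available again
```

Here second is empty while third matches first, which is the difference the proposed io.seek(0) would make.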

jreback · 2018-02-27T10:16:35Z

Can you replicate this in a test (and then fix it)?

mcrot · 2018-03-20T15:41:52Z

Hi all,

I'm completely new to pandas development and I've just prepared a working environment following the guide Contributing to pandas, because I wanted to contribute to some other issue. So I'm not fully sure whether this is the right place in order to address the following test failure, but for me it seems to be related to #19926:

When running the tests for pandas.io.excel

pytest pandas/tests/io/test_excel.py

it comes up with three failures (because of 3 parameters) for the test method TestXlrdReader.test_read_from_http_url:

F
pandas/tests/io/test_excel.py:557 (TestXlrdReader.test_read_from_http_url[.xls])
self = <pandas.tests.io.test_excel.TestXlrdReader object at 0x7fc1d823a1d0>
ext = '.xls'

    @tm.network
    def test_read_from_http_url(self, ext):
        url = ('https://raw.github.com/pandas-dev/pandas/master/'
               'pandas/tests/io/data/test1' + ext)
>       url_table = read_excel(url)

test_excel.py:562: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../util/_decorators.py:172: in wrapper
    return func(*args, **kwargs)
../../util/_decorators.py:172: in wrapper
    return func(*args, **kwargs)
../../io/excel.py:315: in read_excel
    io = ExcelFile(io, engine=engine)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.io.excel.ExcelFile object at 0x7fc1d82d50f0>
io = <http.client.HTTPResponse object at 0x7fc1d81e3320>, kwds = {}
err_msg = 'Install xlrd >= 0.9.0 for Excel support'
xlrd = <module 'xlrd' from '/home/mcrot/miniconda3/envs/pandas-dev/lib/python3.6/site-packages/xlrd/__init__.py'>
ver = (1, 1), engine = None

    def __init__(self, io, **kwds):
    
        err_msg = "Install xlrd >= 0.9.0 for Excel support"
    
        try:
            import xlrd
        except ImportError:
            raise ImportError(err_msg)
        else:
            ver = tuple(map(int, xlrd.__VERSION__.split(".")[:2]))
            if ver < (0, 9):  # pragma: no cover
                raise ImportError(err_msg +
                                  ". Current version " + xlrd.__VERSION__)
    
        # could be a str, ExcelFile, Book, etc.
        self.io = io
        # Always a string
        self._io = _stringify_path(io)
    
        engine = kwds.pop('engine', None)
    
        if engine is not None and engine != 'xlrd':
            raise ValueError("Unknown engine: {engine}".format(engine=engine))
    
        # If io is a url, want to keep the data as bytes so can't pass
        # to get_filepath_or_buffer()
        if _is_url(self._io):
            io = _urlopen(self._io)
        elif not isinstance(self.io, (ExcelFile, xlrd.Book)):
            io, _, _, _ = get_filepath_or_buffer(self._io)
    
        if engine == 'xlrd' and isinstance(io, xlrd.Book):
            self.book = io
        elif not isinstance(io, xlrd.Book) and hasattr(io, "read"):
            # N.B. xlrd.Book has a read attribute too
            if hasattr(io, 'seek'):
                # GH 19779
>               io.seek(0)
E               io.UnsupportedOperation: seek

../../io/excel.py:392: UnsupportedOperation

It seems like the HTTPResponse object returned by urllib.request.urlopen does not support seeking, although the seek() method is available.
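The mismatch between "has a seek attribute" and "can actually seek" can be reproduced without a network connection. The class below is a hypothetical stand-in for a streamed response body, built on io.BufferedIOBase; it is not http.client.HTTPResponse itself.

```python
import io


class NonSeekableStream(io.BufferedIOBase):
    """Hypothetical stand-in for a streamed, forward-only response body."""

    def __init__(self, payload):
        self._inner = io.BytesIO(payload)

    def readable(self):
        return True

    def read(self, size=-1):
        return self._inner.read(size)


stream = NonSeekableStream(b"data")

has_seek = hasattr(stream, "seek")   # the method is inherited from the base class
can_seek = stream.seekable()         # the stream reports that it cannot seek

try:
    stream.seek(0)
    seek_error = None
except io.UnsupportedOperation as exc:
    seek_error = exc
```

So hasattr(io, 'seek') alone does not tell you whether the call will succeed; seekable() or catching io.UnsupportedOperation are two ways to find out.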

This fixes the tests for me:

@@ -10,6 +10,7 @@ import os
 import abc
 import warnings
 import numpy as np
+from http.client import HTTPResponse
 
 from pandas.core.dtypes.common import (
     is_integer, is_float,
@@ -387,7 +388,9 @@ class ExcelFile(object):
             self.book = io
         elif not isinstance(io, xlrd.Book) and hasattr(io, "read"):
             # N.B. xlrd.Book has a read attribute too
-            if hasattr(io, 'seek'):
+            #
+            # http.client.HTTPResponse.seek() -> UnsupportedOperation exception
+            if not isinstance(io, HTTPResponse) and hasattr(io, 'seek'):
                 # GH 19779
                 io.seek(0)

Should I create a pull request here, or a new issue, or am I missing something in my setup such that the tests can't run?

My currently installed versions are:

INSTALLED VERSIONS
------------------
commit: 01882ba5b4c21b0caf2e6b9279fb01967aa5d650
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-116-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.23.0.dev0+657.g01882ba
pytest: 3.4.2
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.2
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

WillAyd · 2018-03-20T15:47:01Z

@mcrot I would suggest that you open a new issue for that and then submit a PR referencing the issue

Closes pandas-dev#20434. Back in pandas-dev#19779 a call of a seek() method was added. This call fails on HTTPResponse instances with an UnsupportedOperation exception, so for this case a try..except wrapper was added here.
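A try..except wrapper of the kind described might look roughly like the sketch below. The helper name rewind_if_possible is hypothetical, and this is not the code that was merged into pandas.

```python
import io


def rewind_if_possible(buf):
    """Hypothetical helper: rewind buf, returning False if it cannot seek."""
    try:
        buf.seek(0)
    except io.UnsupportedOperation:
        # Forward-only streams (such as an HTTP response body) end up here.
        return False
    return True


# A BytesIO that was already read to the end can be rewound.
seekable = io.BytesIO(b"abc")
seekable.read()
rewound = rewind_if_possible(seekable)

# A bare BufferedIOBase does not implement seek, so the helper reports False.
not_rewound = rewind_if_possible(io.BufferedIOBase())
```

Compared with an isinstance check against HTTPResponse, catching the exception also covers other objects that expose seek() without supporting it.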

jreback added the Performance (memory or execution speed performance) and IO Excel (read_excel, to_excel) labels Feb 20, 2018

jreback added this to the 0.23.0 milestone Feb 20, 2018

jreback added a commit to jreback/pandas that referenced this issue Feb 21, 2018

ASV: excel asv occasional failure

6198af6

closes pandas-dev#19779

jreback mentioned this issue Feb 21, 2018

ASV: excel asv occasional failure #19811

Merged

jreback closed this as completed in #19811 Feb 21, 2018

jreback added a commit that referenced this issue Feb 21, 2018

ASV: excel asv occasional failure (#19811)

aa59954

closes #19779

jreback reopened this Feb 26, 2018

WillAyd mentioned this issue Feb 27, 2018

Added seek to buffer to fix xlwt asv failure #19926

Merged

TomAugspurger closed this as completed in #19926 Feb 27, 2018

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018

ASV: excel asv occasional failure (pandas-dev#19811)

c767f22

closes pandas-dev#19779

mcrot mentioned this issue Mar 21, 2018

UnsupportedOperation 'seek' when loading excel files from url #20434

Closed

WillAyd mentioned this issue Mar 21, 2018

BUG: Patch for skipping seek() when loading Excel files from url #20437

Merged
