ERR: disallow read_hdf mode='w' #13623

Closed
jkokorian opened this issue Jul 11, 2016 · 3 comments · Fixed by #13858
Labels: Error Reporting (Incorrect or improved errors from pandas), IO HDF5 (read_hdf, HDFStore)

jkokorian commented Jul 11, 2016

Passing mode='w' to pd.read_hdf erases the target file immediately.

I believe this behavior is undesired, because deleting data is a side effect that no method called read_anything should have. The fact that a mode argument is even accepted is not documented anywhere. I just lost some experimental data this way, no backup...

Code Sample

import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.rand(10))

# write some data to a new HDF5 file
df.to_hdf('test.hdf', 'test')

# read back the data with mode='w' (this erases test.hdf)
df = pd.read_hdf('test.hdf', 'test', mode='w')

Actual Output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-5659d158fdb7> in <module>()
----> 1 df = pd.read_hdf('test.hdf','test',mode='w')

C:\Anaconda2\lib\site-packages\pandas\io\pytables.pyc in read_hdf(path_or_buf, key, **kwargs)
    328                                  'multiple datasets.')
    329             key = keys[0]
--> 330         return store.select(key, auto_close=auto_close, **kwargs)
    331     except:
    332         # if there is an error, close the store

C:\Anaconda2\lib\site-packages\pandas\io\pytables.pyc in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    660         group = self.get_node(key)
    661         if group is None:
--> 662             raise KeyError('No object named %s in the file' % key)
    663 
    664         # create the storer and axes

KeyError: 'No object named test in the file'
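
The KeyError is consistent with the file having been recreated empty before the key lookup ran. The same truncate-on-open behavior can be seen with Python's built-in open(), used here purely as an analogy. This sketch does not touch pandas or PyTables, and the function name is made up for illustration.

```python
import os
import tempfile


def demo_truncate_on_write_mode():
    """Return (size_before, size_after) for a file reopened with mode 'w'."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Store some content, standing in for the experimental data.
        with open(path, "w") as fh:
            fh.write("experimental data")
        size_before = os.path.getsize(path)

        # Reopening in 'w' mode truncates the file immediately,
        # even though nothing is written afterwards.
        with open(path, "w"):
            pass
        size_after = os.path.getsize(path)
    finally:
        os.remove(path)
    return size_before, size_after
```

Any later attempt to read from the reopened file finds nothing, which mirrors the "No object named test in the file" error.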

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

jreback commented Jul 11, 2016

This is just a pass-through argument (not in the docstring). No checking is done on mode. Additional arguments are passed through to PyTables to support various ways of opening stores (e.g. in-memory).

I suppose it could be checked, though this will be quite tricky. Stores are allowed to be opened in multiple processes in append/read-only mode. So I guess 'w' could be disallowed if mode is supplied. Getting the tests to pass will be a bit tricky, though.

A pull request would be appreciated.
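
A minimal sketch of the pass-through behavior described in this comment, assuming a simplified reader that forwards every extra keyword argument to the store constructor. FakeStore and read_unchecked are hypothetical names used only for illustration; they are not pandas or PyTables APIs.

```python
class FakeStore:
    """Hypothetical stand-in for a store; it only records how it was opened."""

    def __init__(self, path, mode="a", **kwargs):
        self.path = path
        self.mode = mode
        self.extra = kwargs


def read_unchecked(path, key, **kwargs):
    """Forward all keyword arguments, including mode, without validation."""
    store = FakeStore(path, **kwargs)
    # Nothing between the caller and the store inspects mode,
    # so a destructive value such as 'w' arrives unchanged.
    return store.mode
```

With this shape, read_unchecked('test.hdf', 'test', mode='w') hands 'w' straight to the store, which is why an undocumented argument can still have a destructive effect.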

jreback added the Difficulty Novice, Error Reporting, and IO HDF5 labels on Jul 11, 2016
jreback added this to the Next Major Release milestone on Jul 11, 2016
jreback changed the title from "read_hdf mode='w' argument erases the hdf5 file" to "ERR: disallow read_hdf mode='w'" on Jul 11, 2016
jkokorian commented

I'll be happy to give it a try. However, this is the first time that I'm contributing to pandas, so I might need some guidance.

Am I correct that the check for mode='w' should be added to the read_hdf function in pandas/io/pytables.py (line 268)? How this function ends up as an instance method of DataFrame is still a mystery to me.

jreback commented Jul 12, 2016

Add mode as another named argument in read_hdf itself. You just need to disallow 'w' if it's passed; otherwise pass it through. You need to test passing 'r', 'r+', and 'a' ('a' is the default).

contributing docs are here
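
The suggestion above could be sketched roughly as follows. This is not the code that was eventually merged; read_hdf_checked, the opener parameter, and _fake_open are hypothetical names, and the default of 'a' simply follows the comment above.

```python
def _fake_open(path, key, mode, **kwargs):
    """Hypothetical stand-in for opening a store and selecting a key."""
    return {"path": path, "key": key, "mode": mode}


def read_hdf_checked(path, key, mode="a", opener=None, **kwargs):
    """Reject mode='w' up front; pass every other mode through unchanged."""
    if mode == "w":
        # Fail before anything is opened, so the file is never touched.
        raise ValueError(
            "mode 'w' is not allowed when reading: it would erase the file"
        )
    if opener is None:
        opener = _fake_open
    return opener(path, key, mode=mode, **kwargs)
```

A test along the lines requested would call this with 'r', 'r+', and 'a' and check that each is forwarded, then confirm that 'w' raises ValueError.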
