Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for opening file from memory #652

Merged
merged 12 commits into from
May 1, 2017

Conversation

thehesiod
Copy link
Contributor

@thehesiod thehesiod commented Apr 28, 2017

fixes: #406 and #295, requires: Unidata/netcdf-c#394 to be fixed

@thehesiod
Copy link
Contributor Author

thehesiod commented Apr 28, 2017

btw unittests are failing because it can't find the nc_open_mem method, not sure if this should be dynamically looked up or bump up the minimum netCDF version required

@jswhit
Copy link
Collaborator

jswhit commented Apr 28, 2017

This is great, thanks!

To fix the issue with the tests, take a look at the check_api function in setup.py. You can add a check there to see if the include file has nc_open_mem. setup.py should then modify include/constants.pyx to add DEF HAS_NC_OPEN_MEM = 1. Thenc_open_mem calls in _netCDF4.pyx can then be wrapped with IF HAS_NC_OPEN_MEM:.

@thehesiod
Copy link
Contributor Author

btw, why is netCDF4/_netCDF4.c checked in? it's generated when setup.py is run.

@jswhit
Copy link
Collaborator

jswhit commented Apr 28, 2017

We'll also need a test.

_netCD4.c has to be manually generated. It should be created with all the values in include/constants.pyx set to 0. It's only used if cython is not installed.

@jswhit
Copy link
Collaborator

jswhit commented Apr 28, 2017

Probably we should just require cython and get rid of _netCDF.c in the repo.

@matthew-brett
Copy link
Contributor

Cython is super-easy to install these days - there are wheels for many versions for OSX, Windows, Linux.

@thehesiod
Copy link
Contributor Author

working on updates, btw mind if I PEP'ify some of the files? get a lot of warnings in PyCharm

@thehesiod
Copy link
Contributor Author

ok, made nc_open_mem lookup dynamic and added a test.

@thehesiod
Copy link
Contributor Author

thehesiod commented Apr 28, 2017

btw, I'm not even sure how the .c file here helps because isn't it specific to a python version / platform / cython version? Not sure how that ever worked correctly. The one in this PR is definitely wrong so need to figure out how to create a "correct" one.

@thehesiod
Copy link
Contributor Author

btw let me know how I should fix this new unittest error. I'm guessing it's compiling with all the settings off. Further it won't work until there's a version of netcdf-c with the fix

@shoyer
Copy link
Contributor

shoyer commented Apr 28, 2017

Probably we should just require cython and get rid of _netCDF.c in the repo.

The standard way to do this is to generate C files for upload to pypi, but not include them in the source.
cytoolz has a relatively sane version of this setup: https://github.com/pytoolz/cytoolz/blob/master/setup.py

@@ -1712,7 +1721,7 @@ references to the parent Dataset or Group.
the parent Dataset or Group."""

def __init__(self, filename, mode='r', clobber=True, format='NETCDF4',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make filename optional if memory is provided?

Even if netCDF-C needs a bogus filename argument, I imagine we could generate something at random.

Copy link
Contributor Author

@thehesiod thehesiod Apr 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets reflected in the Dataset.filepath attribute so I think we should keep it. My unittest actually validates this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I guess so. It just seems pretty pointless to me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya, weirdness :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some text should be added to the docstring noting that filename is only used to set the value of filepath when memory is specified.

@jswhit
Copy link
Collaborator

jswhit commented Apr 29, 2017

Looks like the _netCDF.c still uses nc_open_mem, so that will cause a failing test. To generate one that will work with the minimum version of the netcdf library, edit constants.pyx and set all the elements to zero, then cd _netCDF; cython -I../include _netCDF4.pyx, then commit the new version of _netCDF.c.

I'd like to remove _netCDF.c from the repo, but that should be a separate pull request.

Are the other failing tests due to the bug in Unidata/netcdf-c#394?

@thehesiod
Copy link
Contributor Author

thehesiod commented Apr 29, 2017

updated, so I think the tests are failing now because it's running the mem test with the feature disabled?

@jswhit
Copy link
Collaborator

jswhit commented Apr 29, 2017

The test fails on my machine even though I have 4.4.1 (with nc_open_mem). I guess this is because of Unidata/netcdf-c#394? If so, a solution would be to disable the test in run_all.py if __netcdf4libversion__ < 4.4.1.2 (assuming that the fix gets in the next release of netcdf-c).

@thehesiod
Copy link
Contributor Author

btw, it seems like we should be able to use a memoryview insead reading http://docs.cython.org/en/latest/src/userguide/memoryviews.html?highlight=memory%20views.html. Let me know what you guys think

"""
cdef int grpid, ierr, numgrps, numdims, numvars
cdef char *path
cdef char namstring[NC_MAX_NAME+1]

memset(&self._buffer, 0, sizeof(self._buffer))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably ought to add _buffer to _private_atts so it doesn't get mistaken for a netcdf attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@jswhit
Copy link
Collaborator

jswhit commented Apr 30, 2017

Don't see any real benefit to using memoryviews, since we're not slicing or manipulating the buffer in any way.

# so it will compile with versions <= 4.2.
NC_DISKLESS = 0x0008
NC_INMEMORY = 0x8000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like NC_INMEMORY is ever used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

class TestOpenMem(unittest.TestCase):
def test_mem_open(self):
# Needs: https://github.com/Unidata/netcdf-c/pull/400
if netCDF4.__netcdf4libversion__ >= '4.4.1.2':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just add this check in run_all.py so the test is skipped altogether?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually updated the test to test different scenarios of this

@jswhit
Copy link
Collaborator

jswhit commented May 1, 2017

Merging now - thanks again for this contribution!

@jswhit jswhit merged commit 37f582b into Unidata:master May 1, 2017
@thehesiod
Copy link
Contributor Author

thank you for the quick responses! pleasure!

@TWellman
Copy link

Is there a working example/documentation based on this discussion for pulling a dataset via a url into memory using the newest netCDF4 version? I've read material on adding a memory=bytes construct in the init function to use the in-memory read capability from nc_open_mem(). However, my initial attempts have produced file location errors - "No such file or directory". I also ran a portion of the test script provided at https://github.com/Unidata/netcdf4-python/blob/master/test/tst_open_mem.py. An error was raised for netCDF4.netcdf4libversion = '4.4.1.1' (< 4.4.1.2). Do I need to pull from Unidata/netcdf-c#400 and integrate netcdf library changes directly into the package? Clarification is appreciated!

@TWellman
Copy link

I am testing on a Jupyter notebook using Python 3.5 and netCDF4 version 1.3.0, but could switch if needed.

@jswhit
Copy link
Collaborator

jswhit commented Jul 11, 2017

I think you need to link against the updated netcdf-c library. This one should work:

https://github.com/Unidata/netcdf-c/releases/tag/v4.5.0-rc1

@dopplershift
Copy link
Member

Another option is to pass it a filename to an empty file, like from tempfile.NamedTemporaryFile

@TWellman
Copy link

Great, substituting a completely unrelated local netCDF file as the "filename" but also passing a memory file in byte format proved to be successful. The returned file dimensions were consistent with the memory file rather than the unrelated local netCDF file with different dimensions/variables. So that checks out.

I also attempted unsuccessfully to use the temp file option using FILE_NAME = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name, mainly because it is much cleaner in my view. However, using a temp file as prescribed returned an OSError: NetCDF: Unknown file format. Should I format the temp file in a specific way? Thank you.

@thehesiod
Copy link
Contributor Author

I think "filename" just needs to be non-empty string

@TWellman
Copy link

If I just input a text string the code returns an OSError: No such file or directory, e.g. using nc_file = netCDF4.Dataset('temp_file', mode = 'r', memory=bytefile)

@thehesiod
Copy link
Contributor Author

what version of netcdf-c ?

@dopplershift
Copy link
Member

@TWellman I needed it to be a netCDF file (apparently--I guess I forgot about it). This is what I've done for now until netCDF 4.5 is released:

    with NamedTemporaryFile(suffix='.nc') as temp:
        # Create a temporary netCDF file to work around bug in netCDF C 4.4.1.1
        # We shouldn't actually need any file on disk.
        nc_temp = Dataset(temp.name, 'w').close()
        nc =  Dataset(temp.name, memory=mem_buffer)

@TWellman
Copy link

I should probably upgrade as suggested earlier. I am using netCDF4.netcdf4libversion = '4.4.1.1', netCDF4 version 1.3.0.

@TWellman
Copy link

Great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reading Dataset from memory
6 participants