Incomplete Data Reading from URL #836

sgdecker · 2018-08-22T17:26:10Z

Example Code

import matplotlib.pyplot as plt
import numpy as np
import netCDF4

url = ('http://thredds-test.unidata.ucar.edu/thredds/dodsC/'
       'satellite/goes16/GRB16/ABI/CONUS/Channel02/current/'
       'OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc')

ds = netCDF4.Dataset(url)

values = ds.variables['Rad'][:,:]
print(np.count_nonzero(values==-20.289911))

plt.imshow(values)
plt.show()

Problem

I am not sure if this is an issue with netCDF4, the server, or my machine, but I am getting a value on the order of 40 million printing out, and the plot shows incomplete data:

Each time I run the example code, I get slightly different results. The last three runs indicated 42159004, 38724464, and 44907954 missing pixels, respectively.

I would expect either a value of 0 printing out (and no missing data), or some sort of error message indicating there was trouble retrieving the data.

I am using 1.4.0

The text was updated successfully, but these errors were encountered:

dopplershift · 2018-08-22T19:26:37Z

So I ran your code just now and got a complete image with no missing data. It's possible the data file on the server just wasn't complete. Was this the most recent file when you tried?

sgdecker · 2018-08-22T19:56:09Z

Originally, it was the most recent file, but I just reran the code and got 44037412 missing pixels.

When I try the code on a different machine (with 1.3.1), I see the same as you: No missing data.

I tried creating a new barebones conda environment (brings in netCDF4 1.4.0 by default) and I still got missing pixels. Then I tried an environment with netCDF4 1.3.1, since that worked on the other machine, and, lo and behold, there are no missing data!

Which version of netCDF4 are you running? I could try 1.4.1, which I see has been released, I guess through a manual install, if that would be useful.

dopplershift · 2018-08-22T19:59:22Z

My results were with 1.4.1 from conda-forge on my mac.

dopplershift · 2018-08-22T20:02:51Z

Works for me with 1.4.0 on Python 3.6

sgdecker · 2018-08-22T20:08:44Z

OK, I just tried a conda-forge-based environment with 1.4.1, and I am getting missing pixels. Here is the output of conda list for this environment:

# packages in environment at /home/decker/local/anaconda3/envs/nctest3:
#
# Name                    Version                   Build  Channel
blas                      1.1                    openblas    conda-forge
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.8.13            ha4d7672_0    conda-forge
certifi                   2018.8.13                py36_0    conda-forge
cftime                    1.0.1            py36h7eb728f_0    conda-forge
curl                      7.61.0               h93b3f91_1    conda-forge
cycler                    0.10.0                     py_1    conda-forge
dbus                      1.13.0               h3a4f0e9_0    conda-forge
expat                     2.2.5                hfc679d8_1    conda-forge
fontconfig                2.13.0               h65d0f4c_5    conda-forge
freetype                  2.9.1                h6debe1e_1    conda-forge
gettext                   0.19.8.1                      0    conda-forge
glib                      2.55.0               h464dc38_2    conda-forge
gst-plugins-base          1.12.5               hde13a9d_0    conda-forge
gstreamer                 1.12.5               h61a6719_0    conda-forge
hdf4                      4.2.13                        0    conda-forge
hdf5                      1.10.2               hc401514_1    conda-forge
icu                       58.2                 hfc679d8_0    conda-forge
jpeg                      9c                   h470a237_0    conda-forge
kiwisolver                1.0.1            py36h2d50403_2    conda-forge
krb5                      1.14.6                        0    conda-forge
libffi                    3.2.1                hfc679d8_4    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3    conda-forge
libgfortran               3.0.0                         1    conda-forge
libiconv                  1.15                 h470a237_2    conda-forge
libnetcdf                 4.6.1                he6cff42_7    conda-forge
libpng                    1.6.35               ha92aebf_0    conda-forge
libssh2                   1.8.0                h5b517e9_2    conda-forge
libstdcxx-ng              7.2.0                hdf63c60_3    conda-forge
libuuid                   2.32.1               h470a237_0    conda-forge
libxcb                    1.13                 h470a237_2    conda-forge
libxml2                   2.9.8                h422b904_3    conda-forge
matplotlib                2.2.3            py36h8e2386c_0    conda-forge
ncurses                   6.1                  hfc679d8_1    conda-forge
netcdf4                   1.4.1            py36h62672b6_0    conda-forge
numpy                     1.15.1          py36_blas_openblashd3ea46f_0  [blas_openblas]  conda-forge
openblas                  0.2.20                        8    conda-forge
openssl                   1.0.2o               h470a237_1    conda-forge
pcre                      8.41                 h470a237_2    conda-forge
pip                       18.0                     py36_1    conda-forge
pthread-stubs             0.4                  h470a237_1    conda-forge
pyparsing                 2.2.0                      py_1    conda-forge
pyqt                      5.6.0            py36h8210e8a_6    conda-forge
python                    3.6.6                h5001a0f_0    conda-forge
python-dateutil           2.7.3                      py_0    conda-forge
pytz                      2018.5                     py_0    conda-forge
qt                        5.6.2                hf70d934_9    conda-forge
readline                  7.0                  haf1bffa_1    conda-forge
setuptools                40.0.0                   py36_1    conda-forge
sip                       4.18                     py36_1    conda-forge
six                       1.11.0                   py36_1    conda-forge
sqlite                    3.24.0               h2f33b56_0    conda-forge
tk                        8.6.8                         0    conda-forge
tornado                   5.1              py36h470a237_1    conda-forge
wheel                     0.31.1                   py36_1    conda-forge
xorg-libxau               1.0.8                h470a237_6    conda-forge
xorg-libxdmcp             1.1.2                h470a237_7    conda-forge
xz                        5.2.4                h470a237_1    conda-forge
zlib                      1.2.11               h470a237_3    conda-forge

The environment that works on my machine:

# packages in environment at /home/decker/local/anaconda3/envs/nctest2:
#
# Name                    Version                   Build  Channel
blas                      1.0                         mkl  
bzip2                     1.0.6                h14c3975_5  
ca-certificates           2018.03.07                    0  
certifi                   2018.8.13                py36_0  
curl                      7.61.0               h84994c4_0  
cycler                    0.10.0                   py36_0  
dbus                      1.13.2               h714fa37_1  
expat                     2.2.5                he0dffb1_0  
fontconfig                2.13.0               h9420a91_0  
freetype                  2.9.1                h8a8886c_0  
glib                      2.56.1               h000015b_0  
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb453b48_1  
hdf4                      4.2.13               h3ca952b_2  
hdf5                      1.10.1               h9caa474_1  
icu                       58.2                 h9c2bf20_1  
intel-openmp              2018.0.3                      0  
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.0.1            py36hf484d3e_0  
libcurl                   7.61.0               h1ad7b7a_0  
libedit                   3.1.20170329         h6b74fdf_2  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 8.2.0                hdf63c60_1  
libgfortran-ng            7.3.0                hdf63c60_0  
libnetcdf                 4.5.0                hc9a163a_7  
libpng                    1.6.34               hb9fc6fc_0  
libssh2                   1.8.0                h9cfc8f7_4  
libstdcxx-ng              8.2.0                hdf63c60_1  
libuuid                   1.0.3                h1bed415_2  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.8                h26e45fe_1  
matplotlib                2.2.3            py36hb69df0a_0  
mkl                       2018.0.3                      1  
mkl_fft                   1.0.4            py36h4414c95_1  
mkl_random                1.0.1            py36h4414c95_1  
ncurses                   6.1                  hf484d3e_0  
netcdf4                   1.3.1            py36h73d701b_2  
numpy                     1.15.0           py36h1b885b7_0  
numpy-base                1.15.0           py36h3dfced4_0  
openssl                   1.0.2p               h14c3975_0  
pcre                      8.42                 h439df22_0  
pip                       10.0.1                   py36_0  
pyparsing                 2.2.0                    py36_1  
pyqt                      5.9.2            py36h22d08a2_0  
python                    3.6.6                hc3d631a_0  
python-dateutil           2.7.3                    py36_0  
pytz                      2018.5                   py36_0  
qt                        5.9.6                h52aff34_0  
readline                  7.0                  ha6073c6_4  
setuptools                40.0.0                   py36_0  
sip                       4.19.8           py36hf484d3e_0  
six                       1.11.0                   py36_1  
sqlite                    3.24.0               h84994c4_0  
tk                        8.6.7                hc745277_3  
tornado                   5.1              py36h14c3975_0  
wheel                     0.31.1                   py36_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               ha838bed_2

sgdecker · 2018-08-22T20:52:06Z

I don't have a Mac handy, but trying a Windows 10 machine (the other tests were on CentOS 7 machines) makes matters even more confusing: I have missing pixels with both 1.4.1 and 1.3.1.

sgdecker · 2018-08-22T21:53:37Z

As a final test for now, I took the same Windows 10 machine home, so it's on my home network rather than the campus network, and there was no change in the results.

dopplershift · 2018-08-22T21:55:17Z

So we have all this data...and I have no clue what conclusion to draw from it. 😆

sgdecker · 2018-08-23T16:00:14Z

My conclusion: Use a Mac!

Interestingly, I found a different conda environment on CentOS 7 that has missing pixels even with 1.3.1, but it has libnetcdf 4.6.1 (in common with my other broken environment). So my current working hypothesis is that the Python module isn't the issue, libnetcdf is. 4.5.0 is OK (on CentOS anyway), but 4.6.1 is not.

sgdecker · 2018-08-23T16:23:41Z

The combination netcdf4 1.3.1 and libnetcdf 4.5.0 also works on Windows 10. It appears it is libnetcdf 4.6.1 that is at fault.

dopplershift · 2018-08-23T16:25:22Z

Unfortunately for that theory (which might not be completely wrong), 4.6.1 is working fine for me here.

What happens if instead of ds.variables['Rad'][:,:] you plot ds.variables['Rad'][::4, ::4]? That will download 1/16 of the data.

sgdecker · 2018-08-23T16:38:49Z

OK, I made that tweak. Running the revised program on CentOS with 1.4.1/4.6.1 results in a hang. Even Control-C didn't give me a prompt back, but using kill did.

sgdecker · 2018-08-23T16:54:04Z

I ran the program a second time, and after 14 minutes, it finally crashes with:

syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <!DOCTYPE^ HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"><html><head><title>503 Service Unavailable</title></head><body><h1>Service Unavailable</h1><p>The server is temporarily unable to service yourrequest due to maintenance downtime or capacityproblems. Please try again later.</p></body></html>
Traceback (most recent call last):
  File "sat_data_missing_2.py", line 11, in <module>
    values = ds.variables['Rad'][::4,::4]
  File "netCDF4/_netCDF4.pyx", line 4105, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5016, in netCDF4._netCDF4.Variable._get
  File "netCDF4/_netCDF4.pyx", line 1745, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP server error

sgdecker · 2018-08-23T17:00:00Z

However, I see the server is indeed down, so the above may be a red herring.

sgdecker · 2018-08-23T17:30:46Z

Now that the server is back up, I can report the program hangs on Windows 10 with 1.4.1/4.6.1 as well.

ocefpaf · 2018-08-23T17:58:51Z

@sgdecker if I just read this issue quickly and it seems that the issues are all on Windows, right? I'll try to get a Windows machine and test it.

Note that netcdf 4.6.1 does not build properly on Windows* so we had to add tons of patches and they may be the cause of the problems you are experiencing.

* We had issue building it for Python 2.7, backporting patches that fixed OPeNDAP problems, etc.

sgdecker · 2018-08-23T18:02:47Z

@ocefpaf no I am also having the issue on CentOS, which is exhibiting the same behavior for me as Windows 10: 1.3.1/4.5.0 is fine, but 1.3.1/4.6.1 and 1.4.1/4.6.1 are not.

rsignell-usgs · 2018-08-23T18:22:05Z

I think this might help us narrow this problem down.

I have two different linux machines, each with identical versions of netcdf4, libnetcdf and curl from conda-forge:

# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
libnetcdf                 4.6.1                h5e45101_3    conda-forge
netcdf4                   1.4.0            py36h62672b6_1    conda-forge
curl                      7.61.0               h93b3f91_1    conda-forge

One works fine, the other displays the problem!
https://gist.github.com/rsignell-usgs/f3e4452b0bd7445178291e0551aa7bea

@DennisHeimbigner does netcdf4 access of OPeNDAP URLs depend on any other packages or could there be some difference caused by internet connectivity/timeout?

I tried setting HTTP.TIMEOUT=25 in both ~/.daprc and ~/.dodsrc per this conversation but it didn't seem to have any impact.

sgdecker · 2018-08-23T19:00:26Z

I just got my hands on a Mac, and, contrary to @dopplershift 's experience, I am seeing the same behavior (with the original test program) as with the other machines I've tried: 1.4.1/4.6.1 has missing data, but 1.3.1/4.5.0 is fine.

dopplershift · 2018-08-23T19:15:22Z

This makes me think so kind of network issue is involved--I'm cheating because I'm sitting next to the data. 😁

I will say that I do see a problem with the slicing I suggested. It's not technically a hang; it just took > 45 minutes to error out with an IndexError. Which is nuts because with the plain slices it took < 45 minutes and completed with out error (or missing data) for me. I'm pretty sure that's now how server-side subsetting is supposed to work. 😳

Addendum: Oh W.T.F. 😡 (cc @lesserwhirls @jrleeman )

128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7904] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7908] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7912] HTTP/1.1" 200 220 "-" "oc4.6.1" 4
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7916] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7920] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7924] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7928] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7932] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7936] HTTP/1.1" 200 220 "-" "oc4.6.1" 4
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7940] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7944] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7948] HTTP/1.1" 200 220 "-" "oc4.6.1" 4
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7952] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7956] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7960] HTTP/1.1" 200 220 "-" "oc4.6.1" 5
165.230.171.64 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[192][4700] HTTP/1.1" 200 220 "-" "oc4.5.0" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7964] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7968] HTTP/1.1" 200 220 "-" "oc4.6.1" 10

The client is requesting individual grid points, so it's making a round-trip request to download. 3. bytes. 😱 That seems...like something that should be improved.

ocefpaf · 2018-08-23T19:22:43Z

This makes me think so kind of network issue is involved

I just tested many versions of the conda package and pip and I could not figure out a pattern besides that, when I get the whole data it takes a while but when I get the incomplete version it is very fast. So yeah, it may be network related, but why some of the conda versions are "OK" returning the zeroes instead of keep waiting for the data puzzles me. Maybe something changed in latest curl, I'll keep digging...

DennisHeimbigner · 2018-08-23T19:24:12Z

Is someone in a position to run a packet sniffer on this. In particular, to look
for a dropped connection? This may be related to this
Unidata/netcdf-c#1074

dopplershift · 2018-08-23T19:32:05Z

If @sgdecker (or anyone seeing the missing data) can send me their IP address (here or email), I can look at the server logs and see if I see anything.

sgdecker · 2018-08-23T19:32:06Z

I should be able to run tcpdump on my machine, but I've never used it before, so I don't know how exactly I should invoke it to get the data that would be useful. If that is what you have in mind, and there is a recipe I could follow, I can try it out.

sgdecker · 2018-08-23T19:32:47Z

My CentOS machine is 165.230.171.64

DennisHeimbigner · 2018-08-23T20:30:15Z

Sorry, I have not used a packet sniffer in a long time, so I can't help.

dopplershift · 2018-08-23T23:55:27Z

Opened #838 for issue about the strided access taking forever.

dopplershift · 2018-08-24T00:40:47Z

So, I've been able to reproduce on my system without changing my environment, by running from my home (even though I've got a 1Gb/s connection). So here's a log of my opendap download attempts:

8.42.77.33 - - [23/Aug/2018:17:39:00 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 119494799 "-" "oc4.6.1" 10010
8.42.77.33 - - [23/Aug/2018:17:43:44 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 13751
8.42.77.33 - - [23/Aug/2018:17:47:28 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 232728623 "-" "oc4.6.1" 10016
8.42.77.33 - - [23/Aug/2018:17:51:18 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 9032
8.42.77.33 - - [23/Aug/2018:17:51:39 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 8587
8.42.77.33 - - [23/Aug/2018:17:55:53 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.6.1" 8549
8.42.77.33 - - [23/Aug/2018:17:56:13 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 193928279 "-" "oc4.6.1" 10005
8.42.77.33 - - [23/Aug/2018:18:00:16 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 233694335 "-" "oc4.6.1" 10012

The interesting parts to glean are that the request should return 240000223 bytes. For netcdf 4.5.0, this amount of data is always returned. For 4.6.1, this amount only seems to succeed if the time of the connection is less than, it seems, 10000 (ms?). This time effect is even more prevalent on @sgdecker 's downloads:

165.230.171.64 - - [23/Aug/2018:09:54:51 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 54211031 "-" "oc4.6.1" 10000
165.230.171.64 - - [23/Aug/2018:09:55:44 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 27760
165.230.171.64 - - [23/Aug/2018:10:09:41 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 28281
165.230.171.64 - - [23/Aug/2018:10:29:06 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 84729167 "-" "oc4.6.1" 9999
165.230.171.64 - - [23/Aug/2018:13:39:18 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 61339295 "-" "oc4.6.1" 10000

Here we see with 4.6.1, his connections always stop right around 10000, whereas with 4.5.0 it lasts until 20k, and the correct amount of data is returned. On my machine, I can confirm 4.6.1 is problematic both from conda-forge and Homebrew. 4.5.1 seems to have no problem. This is in a conda environment where the only thing changing is libnetcdf (no changes to netcdf4-python or libcurl).

@DennisHeimbigner did something about opendap/curl change in 4.6.1? (cc @WardF )

DennisHeimbigner · 2018-08-24T01:56:19Z

I do not think so, but I will have to review the pull requests.
We recently saw an effect when switching the Tomcat version
for url escape characters. Is is possible that there is some new timeout.
Are the Tomcat/Apache version in these tests the same?

dopplershift · 2018-08-24T17:17:43Z

@DennisHeimbigner This is all using the same running TDS 5.0 instance on http://thredds-test.unidata.ucar.edu .

always use nc_get_vars for strided access over http (issue #836)

Add the ability to set some additional curlopt values via .daprc (aka .dodsrc). This effects both DAP2 and DAP4 protocols. Related issues: [1] re: esupport: KOZ-821332 [2] re: github issue Unidata/netcdf4-python#836 [3] re: github issue #1074 1. CURLOPT_BUFFERSIZE: Relevant to [1]. Allow user to set the read/write buffersizes used by curl. This is done by adding the following to .daprc (aka .dodsrc): HTTP.READ.BUFFERSIZE=n where n is the buffersize in bytes. There is a built-in (to curl) limit of 512k for this value. 2. CURLOPT_TCP_KEEPALIVE (and CURLOPT_TCP_KEEPIDLE and CURLOPT_TCP_KEEPINTVL): Relevant (maybe) to [2] and [3]. Allow the user to turn on KEEPALIVE This is done by adding the following to .daprc (aka .dodsrc): HTTP.KEEPALIVE=on|n/m If the value is "on", then simply enable default KEEPALIVE. If the value is n/m, then enable KEEPALIVE and set KEEPIDLE to n and KEEPINTVL to m.

add 'master_file' kwarg to MFDataset.__init__ (issue #836)

efiring · 2018-08-30T02:58:30Z

Maybe a related problem, maybe different, but the following (as a script):

from netCDF4 import Dataset
d = Dataset('http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/monthly_v2')
u = d.variables['uwnd'][:]

consistently fails for me with conda-forge libnetcdf 4.6.0 and 4.6.1, but not with 4.5.0-3. Output is:

(nc3) ~/Desktop $ python CCMP_failure.py
Traceback (most recent call last):
  File "CCMP_failure.py", line 3, in <module>
    u = d.variables['uwnd'][:]
  File "netCDF4/_netCDF4.pyx", line 4060, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 4950, in netCDF4._netCDF4.Variable._get
  File "netCDF4/_netCDF4.pyx", line 1733, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP failure

This is with a Mac and with Linux. With the URL above, the request works for some time slices, but not for others. With a related URL, I can't even get a single time slice; the following fails:

from netCDF4 import Dataset
d = Dataset('http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2')
print(d.variables['uwnd'].shape)
u = d.variables['uwnd'][0]

ocefpaf · 2018-08-30T03:02:32Z

@dopplershift at this point I'm inclined to revert all of conda-forge's builds to libnetcdf 4.5.0 :-(

It will be quite painful to re-build everything but at least users will get a working environment again. (Note that wheels are still building with 4.4.x, due to Python 2.7 on Windows.)

DennisHeimbigner · 2018-08-30T03:04:54Z

Can someone try this experiment.
Modify the url to look like this:
"[log][show=fetch]http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2"
This assumes that python does not attempt to do anything with the url other than pass it to
nc_open.

efiring · 2018-08-30T03:08:33Z

(py36) ~/Desktop $ python CCMP2.py
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.008 secs
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.das
Log:fetch complete: 0.002 secs
Log:prefetchable: lat=628
Log:prefetchable: lon=1440
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.002 secs
(42212, 628, 1440)
Log:prefetch: lat
Log:prefetch: lon
Log:prefetch.final: lat*[0:627/628],lon*[0:1439/1440]
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?lat,lon
Log:fetch complete: 0.003 secs
Log:prefetch.complete
Log:prefetch.vars:  lat lon

Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?uwnd.uwnd[0][0:627][0:1439]
Log:curl error: Timeout was reached
Log:fetch complete: 10.004 secs
Traceback (most recent call last):
  File "CCMP2.py", line 6, in <module>
    u = d.variables['uwnd'][0]
  File "netCDF4/_netCDF4.pyx", line 4105, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5016, in netCDF4._netCDF4.Variable._get
  File "netCDF4/_netCDF4.pyx", line 1745, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP failure
(py36) ~/Desktop $

DennisHeimbigner · 2018-08-30T03:20:23Z

There may be two problems here.

the curl timeout is set at 10 secs (that may be my fault).
the server is taking as much as 10 sec to compute the response: is that normal?

Let me suggest another experiment.
WRT #1. create a file in your home directory called .daprc (or .dodsrc)
with this line in it:
HTTP.TIMEOUT=time
where time is the timeout in seconds. Zero is supposed to mean never time out, but
I would suggest using an actual large integer such as 1000.

Meanwhile, I will see if I am the one setting the timeout to 10 seconds.

efiring · 2018-08-30T04:05:43Z

OK, now it looks like I am waiting for 1000 s to elapse.

DennisHeimbigner · 2018-08-30T04:16:06Z

That is odd. It implies that the server is taking an very long time to
complete its response. I suppose one could set the HTTP.TIMEOUT value
to something horrific like 100000 (seconds) to try to see if the server
ever finishes its response.

In any case, I do set the default timeout to 10 seconds. Should I change
that to zero so that the connection will never timeout?

DennisHeimbigner · 2018-08-30T04:24:21Z

On the other hand, the uwnd variable is 152692623360 bytes (~152 gig),
so I guess it should take a long time to download.

DennisHeimbigner · 2018-08-30T04:26:45Z

As an aside, I have a fix in to allow the setting of the curl download buffer size
(currently something like 16k? by default) It will allow setting it to is maximum
of 512k. This will probably help some, but with such a large variable, its effect
may be limited.

efiring · 2018-08-30T04:26:57Z

But I was downloading only one time slice, so it should have been quick.

It took over 500 s. It looks like at least part of the problem here is server-side. I'm trying this url now with the earlier libnetcdf, and it is still taking a long time. I'm in contact with the people running the server, so we can look into that.
Was the default timeout changed for 4.6?

py36) ~/Desktop $ python CCMP2.py
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.045 secs
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.das
Log:fetch complete: 0.006 secs
Log:prefetchable: lat=628
Log:prefetchable: lon=1440
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.005 secs
(42212, 628, 1440)
Log:prefetch: lat
Log:prefetch: lon
Log:prefetch.final: lat*[0:627/628],lon*[0:1439/1440]
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?lat,lon
Log:fetch complete: 0.012 secs
Log:prefetch.complete
Log:prefetch.vars:  lat lon

Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?uwnd.uwnd[0][0:627][0:1439]
Log:fetch complete: 578.212 secs

efiring · 2018-08-30T04:30:29Z

As you can see above, the download was only 628x1440 floats, so unless they had to be collected from a huge number of files it should have been very fast. I would be surprised if a single time slice is not in a single file.

efiring · 2018-08-30T04:41:17Z

I'm beginning to think that the problem I originally reported might be an interaction between the timeout value and a badly-behaving (or at least oddly-behaving) server, since the test above took slightly longer (611 s) just now with libnetcdf 4.4.1.1.

dopplershift · 2018-08-30T04:41:59Z

@DennisHeimbigner I'm not sure about @efiring 's server, since it looks like it's a straight DODS server, but the original problem was from our own TDS running on http://thredds.ucar.edu. I would say that 10 seconds is way too quick. If that's the time the connection takes to finish, that's going to have to allow for download time. Speaking as someone on hotel wifi right now, we cannot be assuming that users have a performant network connection.

So if that means setting it to something large like 1800 (half an hour) or just disabling it altogether, I leave that up to the netCDF team or other interested parties to decide. But I can say that with netCDF 4.5, the connections were finishing in 15-25 seconds, so the new 10 seconds is (relative to real-world) ridiculously short.

DennisHeimbigner · 2018-08-30T17:10:08Z

I am going to set the default timeout to a much larger value. I dislike setting it to
infinity. Probably something like 1/2 hour should cover essentially all cases.

The slow response time from the server is baffling. Some possible causes (all unlikely):

the server is taking a long time to compute and assemble the slice ready for shipping.
the http protocol is taking a long time to ship the data (maybe http chunk is small)
there is some kind of proxy thru which the server is sending data.
Looking at the server logs might tell you if NetCDF4 interface compile error #1 is the issue.

ocefpaf · 2018-08-30T17:22:45Z

@DennisHeimbigner while you fix this upstream I'm looking for a short term solution to the users of the packages we are building on conda-forge. My options are:

rebuilding everything against netcdf 4.4.x, which is the last version that "just works" and we have packages on Windows and *nix;
rebuild *nix with 4.5.x and Windows with 4.4.x so we don't break nco users that need 4.5.x features on *nix;
rebuild 4.6.1 with a higher timeout value.

Both 1 and 2 are a lot of work and will take some time. So my question is: is there a way to do 3 with a flag or something? Or, if you point me to a PR when you address it, I can try to backport it.

DennisHeimbigner · 2018-08-30T17:33:00Z

You also can use .daprc to set the timeout as I indicated in a previous comment.
If you want to modify the code to use a longer timeout, then do this:
In libdispatch/dauth.c, about line 33, change
"HTTP.TIMEOUT","10", /*seconds */
to
"HTTP.TIMEOUT","1800", /*seconds */
That will give a 30minute timeout and is the value I plan
to change it to (unless you have a preferable value).
If you change the value to zero, then the timeout should
be infinite, but IMO that is not a good idea.
Let me know what you choose.

ocefpaf · 2018-08-30T17:34:58Z

You also can use .daprc to set the timeout as I indicated in a previous comment.

My guess is that a .daprc is something for the user space, no?

If you want to modify the code to use a longer timeout, then do this:
In libdispatch/dauth.c, about line 33, change "HTTP.TIMEOUT","10", /*seconds */
to "HTTP.TIMEOUT","1800", /*seconds */

I'll give that a try. Thanks.

DennisHeimbigner · 2018-08-30T18:42:55Z

My guess is that a .daprc is something for the user space, no?

Yes, each user would have to set this up.

ocefpaf · 2018-08-30T23:03:05Z

The dataset mentioned in #836 (comment) seems to be gone but I tested the longer time out PR with @efiring's example and, after a long wait, I get the correct data:

libnetcdf 4.6.1: http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/ocefpaf/5f031dc4abf4d65fce273589537fe477/raw/feba58b922ec1ba86437047e64c77a4b3a821d82/netcdf4-test.ipynb

Here are the results for the same test under other versions, sometimes I get all zeroes, sometimes I get the proper data:

marekmarcos · 2020-05-31T07:49:26Z

I am looking for a script to extract the variable T3P1_IntegralProtonFlux from files and create a CSV or SQLITE file from them.
The files are here:

https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/seis-l1b-sgps/2020/01/

Ultimately, I want to combine data from files from 1/1/2019 to today.

jswhit added a commit that referenced this issue Aug 24, 2018

always use nc_get_vars for strided access over http (issue #836)

b07d2a1

jswhit added a commit that referenced this issue Aug 24, 2018

Merge pull request #839 from Unidata/issue836

a5e26f3

always use nc_get_vars for strided access over http (issue #836)

DennisHeimbigner mentioned this issue Aug 26, 2018

Add the ability to set some additional curlopt values Unidata/netcdf-c#1126

Merged

jswhit added a commit that referenced this issue Aug 27, 2018

Merge pull request #837 from Unidata/master_file

4fd3f9d

add 'master_file' kwarg to MFDataset.__init__ (issue #836)

ocefpaf mentioned this issue Aug 30, 2018

re-pin libnetcdf conda-forge/conda-forge-pinning-feedstock#114

Closed

ocefpaf mentioned this issue Aug 30, 2018

Higher timeout conda-forge/libnetcdf-feedstock#53

Merged

ocefpaf mentioned this issue Aug 30, 2018

Low time out values for libnetcdf 4.6.1 AnacondaRecipes/libnetcdf-feedstock#5

Closed

rsignell-usgs mentioned this issue Oct 23, 2018

Problems with distributed and opendap netCDF endpoint pydata/xarray#2503

Closed

Incomplete Data Reading from URL #836

Incomplete Data Reading from URL #836

Comments

sgdecker commented Aug 22, 2018

Example Code

Problem

dopplershift commented Aug 22, 2018

sgdecker commented Aug 22, 2018

dopplershift commented Aug 22, 2018

dopplershift commented Aug 22, 2018

sgdecker commented Aug 22, 2018

sgdecker commented Aug 22, 2018

sgdecker commented Aug 22, 2018

dopplershift commented Aug 22, 2018

sgdecker commented Aug 23, 2018

sgdecker commented Aug 23, 2018

dopplershift commented Aug 23, 2018

sgdecker commented Aug 23, 2018

sgdecker commented Aug 23, 2018

sgdecker commented Aug 23, 2018

sgdecker commented Aug 23, 2018

ocefpaf commented Aug 23, 2018

sgdecker commented Aug 23, 2018

rsignell-usgs commented Aug 23, 2018 • edited Loading

sgdecker commented Aug 23, 2018

dopplershift commented Aug 23, 2018 • edited Loading

ocefpaf commented Aug 23, 2018

DennisHeimbigner commented Aug 23, 2018

dopplershift commented Aug 23, 2018

sgdecker commented Aug 23, 2018

sgdecker commented Aug 23, 2018

DennisHeimbigner commented Aug 23, 2018

dopplershift commented Aug 23, 2018

dopplershift commented Aug 24, 2018

DennisHeimbigner commented Aug 24, 2018 • edited Loading

dopplershift commented Aug 24, 2018

efiring commented Aug 30, 2018

ocefpaf commented Aug 30, 2018

DennisHeimbigner commented Aug 30, 2018

efiring commented Aug 30, 2018

DennisHeimbigner commented Aug 30, 2018 • edited Loading

efiring commented Aug 30, 2018

DennisHeimbigner commented Aug 30, 2018 • edited Loading

DennisHeimbigner commented Aug 30, 2018

DennisHeimbigner commented Aug 30, 2018

efiring commented Aug 30, 2018

efiring commented Aug 30, 2018

efiring commented Aug 30, 2018

dopplershift commented Aug 30, 2018

DennisHeimbigner commented Aug 30, 2018

ocefpaf commented Aug 30, 2018

DennisHeimbigner commented Aug 30, 2018

ocefpaf commented Aug 30, 2018

DennisHeimbigner commented Aug 30, 2018 • edited by dopplershift Loading

ocefpaf commented Aug 30, 2018

marekmarcos commented May 31, 2020

rsignell-usgs commented Aug 23, 2018 •

edited

Loading

dopplershift commented Aug 23, 2018 •

edited

Loading

DennisHeimbigner commented Aug 24, 2018 •

edited

Loading

DennisHeimbigner commented Aug 30, 2018 •

edited

Loading

DennisHeimbigner commented Aug 30, 2018 •

edited

Loading

DennisHeimbigner commented Aug 30, 2018 •

edited by dopplershift

Loading