Problems with reading "big" arrays (>8.1Gb) #383

Closed
durack1 opened this issue Jan 28, 2020 · 10 comments · Fixed by #389
Labels
pending-release Fix is included in a pending release of CDAT metapackage.
Milestone

Comments

@durack1
Member

durack1 commented Jan 28, 2020

Describe the bug
I have hit a reproducible error where big arrays (>8.1Gb) are not read correctly: an array of zeros is returned instead of the real values. I was a little puzzled by this error and got talking with @painter1, who also hit this problem and reported it via email in May 2019. It turns out that the issue affects arrays larger than 8.1Gb, and that the original error was a bug in libnetcdf versions when reading big variables (from @painter1's notes/emails). @dnadeau4 and @doutriaux1 may recall some of the specific details. Note that I may not be using the latest versions of the libraries listed below.

To Reproduce
Steps to reproduce the behavior:

  1. Install CDAT with: cdms2-3.1.4-py37ha6f5e91_3, libnetcdf-4.6.2-h303dfb8_1003, netcdf-fortran-4.4.5-h0789656_1004
  2. Run the code below, which reads progressively larger arrays
  3. Watch as the summary statistics go from real numbers to zeros once the arrays being read exceed ~8Gb. For the demo below this happens at start year 1989 (the 3rd step of the loop), when 26 years of data are requested (the returned array has 300 monthly time steps) on the model's vertical/horizontal grid of 60 vertical levels, 384 lat, 320 lon.
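The switch from real values to zeros lines up with a simple size threshold. The sketch below (plain Python, no CDAT needed) recomputes the element count for each time-axis length printed in the output further down this thread, assuming float32 data (4 bytes per value, which reproduces the printed "Array size" figures). The last good read (288 months) stays below 2**31 - 1 elements, while the first bad read (300 months) exceeds it. This is only an observation consistent with a signed 32-bit count overflowing somewhere in the read path, not a confirmed root cause; `read_size` is a hypothetical helper name.

```python
NLEV, NLAT, NLON = 60, 384, 320  # grid from the report: levels, lat, lon
ITEMSIZE = 4                     # assumed bytes per value (float32)
INT32_MAX = 2**31 - 1            # largest signed 32-bit integer


def read_size(months):
    """Return (elements, MiB, exceeds_int32) for a (months, 60, 384, 320) read."""
    elements = months * NLEV * NLAT * NLON
    mib = elements * ITEMSIZE // (1024 * 1024)  # floored, like the script's %d
    return elements, mib, elements > INT32_MAX


# Time-axis lengths taken from the shapes printed in the reproduction output
for months in (276, 288, 300, 312, 324, 336, 348):
    elements, mib, over = read_size(months)
    print(f"{months:3d} months: {elements:,} elements, {mib} MiB, over int32: {over}")
```

In byte terms the same boundary sits at 2**31 * 4 bytes = 8192 MiB, which falls between the 8100 Mb read that works and the 8437 Mb read that returns zeros.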

Expected behavior
Big arrays should be read correctly, returning the actual data values rather than an array of zeros.

Desktop:

  • OS: RHEL7.x

The code to reproduce this:

# imports
import sys
import cdat_info
import cdms2 as cdm
import numpy as np
from socket import gethostname

#%% Define function
def calcAve(var):
    print('type(var);',type(var),'; var.shape:',var.shape)
    # Start querying stat functions
    print('var.min():'.ljust(21),var.min())
    print('var.max():'.ljust(21),var.max())
    print('np.ma.mean(var.data):',np.ma.mean(var.data)) ; # Not mask aware
    # Problem transientVariable.mean() function
    #print('var.mean():'.ljust(21),var.mean())
    print('-----')

#%% Load subset of variable
f = ['/p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/so/gn/v20190308/so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc']
# Build progressively larger arrays by moving the start year back one year per step
times = np.arange(1991,1984,-1)
print('host:',gethostname())
print('Python version:',sys.version)
print('cdat env:',sys.executable.split('/')[5])
print('cdat version:',cdat_info.version()[0])
print('*****')
for timeSlot in times:
    for filePath in f:
        fH = cdm.open(filePath)
        print('filePath:',filePath.split('/')[-1])
        # Read from the start year through the end year
        start = timeSlot ; end = 2014
        print('times:',start,end,'; total years:',(end-start)+1)
        d1 = fH('so',time=(str(start),str(end)))
        print("Array size: %d Mb" % ( (d1.size * d1.itemsize) / (1024*1024) ) )
        calcAve(d1)
        del(d1)
        fH.close()
    print('----- -----')
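Until a fixed build is available, one hedged workaround is to avoid single reads that cross the size threshold by reading the time axis in chunks and accumulating the statistics. The helper below is a hypothetical sketch (`chunked_stats` and `read_chunk` are illustrative names, not part of cdms2); unlike `np.ma.mean(var.data)` in the script above, it ignores masked cells.

```python
import numpy as np


def chunked_stats(read_chunk, n_time, chunk=120):
    """Mask-aware min/max/mean over all data, reading `chunk` time steps at a time.

    read_chunk(i, j) must return a (possibly masked) array for time steps i..j-1.
    """
    vmin, vmax = np.inf, -np.inf
    total, count = 0.0, 0
    for i in range(0, n_time, chunk):
        block = np.ma.asarray(read_chunk(i, min(i + chunk, n_time)))
        valid = block.compressed()  # 1-D array of the unmasked values only
        if valid.size == 0:
            continue
        vmin = min(vmin, valid.min())
        vmax = max(vmax, valid.max())
        total += valid.sum(dtype=np.float64)  # accumulate in float64
        count += valid.size
    mean = total / count if count else np.nan
    return vmin, vmax, mean
```

Assuming (untested here) that cdms2 file variables can be sliced along the time axis, a call might look like `v = fH['so']; chunked_stats(lambda i, j: v[i:j], v.shape[0])`. With the default 120-month chunk each read is 120 × 60 × 384 × 320 = 884,736,000 elements, well below 2**31.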

@pochedls @muryanto1 @downiec @jasonb5 @gabdulla @gleckler1 @lee1043 ping

@muryanto1
Member

muryanto1 commented Jan 28, 2020

@durack1 I tried running the code with the latest cdms2 in cdat/label/nightly and the latest libnetcdf, and was able to reproduce the problem:
cdat/label/nightly/linux-64::cdms2-3.1.4.2020.01.14.21.45.gee3f0ff-py37h34d3450_0
libnetcdf 4.7.3 nompi_h9f9fd6a_101 conda-forge
netcdf-fortran 4.5.2 nompi_h09cde99_103 conda-forge

$ curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o Miniconda3-latest-MacOSX-x86_64.sh

$ source miniconda3/etc/profile.d/conda.sh
$ conda activate base
$ conda activate nightly_py3.7

on aims1:
$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O Miniconda3-latest-Linux-x86_64.sh
$ bash ./Miniconda3-latest-Linux-x86_64.sh -b -p miniconda3
$ source miniconda3/etc/profile.d/conda.sh
$ conda activate base
$ conda config --set channel_priority strict
$ conda config --add channels conda-forge
$ conda config --add channels cdat/label/nightly

$ conda create -n nightly_py3.7 cdat mesalib easydev nbsphinx myproxyclient testsrunner coverage pytest "python=3.7" -c cdat/label/nightly -c conda-forge
$ conda activate nightly_py3.7

# I put your code into a file: test_big_array.py
$ python ./test_big_array.py
host: aims1.llnl.gov
Python version: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:33:48) 
[GCC 7.3.0]
cdat env: miniconda3
cdat version: 8
*****
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1991 2014 ; total years: 24
Array size: 7762 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (276, 60, 384, 320)
var.min():            6.940156
var.max():            48.25107
np.ma.mean(var.data): 4.2389736e+19
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1990 2014 ; total years: 25
Array size: 8100 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (288, 60, 384, 320)
var.min():            6.940156
var.max():            48.25107
np.ma.mean(var.data): 4.239067e+19
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1989 2014 ; total years: 26
Array size: 8437 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (300, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1988 2014 ; total years: 27
Array size: 8775 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (312, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1987 2014 ; total years: 28
Array size: 9112 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (324, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1986 2014 ; total years: 29
Array size: 9450 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (336, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1985 2014 ; total years: 30
Array size: 9787 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (348, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
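A side note on this output: the ~4.24e+19 means for the good reads come from `np.ma.mean(var.data)`, which, as the script's comment says, is not mask aware. `.data` is the raw buffer, so masked cells (land and below the seafloor) still contribute their fill value. Assuming a fill value of 1e20 (common in CMIP6 output, but not confirmed for this file), a mean of ~4.24e19 implies roughly 42% of the cells are masked. The toy example below shows the effect; the values are made up for illustration.

```python
import numpy as np

FILL = 1e20  # assumed _FillValue; check the file's own attribute

# Two valid ocean points and two masked points that hold the fill value
values = np.array([35.0, 34.0, FILL, FILL])
so = np.ma.masked_array(values, mask=(values == FILL))

raw_mean = np.ma.mean(so.data)  # plain ndarray: fill values are averaged in
masked_mean = np.ma.mean(so)    # masked array: only the two valid points count

print("mean of .data:", raw_mean)
print("mask-aware mean:", masked_mean)
```

For real statistics, calling `np.ma.mean(var)` (or `var.min()`/`var.max()`, which already respect the mask) avoids this.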


@durack1
Member Author

durack1 commented Jan 29, 2020

@muryanto1 thanks for picking up and reproducing this issue. It'd be helpful to know whether @dnadeau4 or @doutriaux1 worked on a fix a while ago, and whether there are any open issues, branches, commits, or web documentation they can point us to for a resolution.

@mzelinka

Thanks for documenting and reproducing this issue. I am also hitting it, and note that it occurs at least as far back as CDAT 2.10.

@jasonb5
Contributor

jasonb5 commented Feb 19, 2020

@durack1 How was this file created?

@durack1
Member Author

durack1 commented Feb 19, 2020

@jasonb5 it’s one of the CMIP6 contributed files. NCAR doesn’t use CMOR, so I’m not 100% sure what software was used to create it.
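Even without knowing the writing software, the on-disk format of a netCDF file can be read from its first bytes (from a shell, `ncdump -k file.nc` reports the same "kind"). The sketch below is a hypothetical stdlib-only helper, `netcdf_kind`, that maps the standard magic numbers to a format name. It assumes the HDF5 signature sits at offset 0, which is typical for netCDF-4 files but not guaranteed (HDF5 allows a user block before it). Nothing in this thread says the file format caused the bug; this only answers "what kind of file is it".

```python
# Magic numbers at the start of netCDF and HDF5 files
SIGNATURES = {
    b"CDF\x01": "netCDF classic",
    b"CDF\x02": "netCDF 64-bit offset",
    b"CDF\x05": "netCDF CDF-5 (64-bit data)",
    b"\x89HDF\r\n\x1a\n": "HDF5 (netCDF-4)",
}


def netcdf_kind(path):
    """Guess the on-disk format of `path` from its leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)  # the longest signature is 8 bytes
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return "unknown"
```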

@durack1
Member Author

durack1 commented Mar 12, 2020

Folks, just an FYI: @jasonb5 determined the issue and found a fix, and @muryanto1 has wrapped this up in the nightly builds. Thanks guys!! So if you want bleeding-edge bug fixes, come and get it.

@lee1043

lee1043 commented Mar 12, 2020

@durack1 great to know the issue has been resolved. Thank you all for the effort!

@pochedls

@jasonb5 and @muryanto1 - Thank you! For those of us who prefer more stability than the nightly build, is this slated for a release? 8.2.x? 8.3?

@muryanto1
Member

@pochedls Yes, but we do not have a time frame yet; we are working on it.

@jasonb5 added this to the 3.1.5 milestone Jul 6, 2020
@jasonb5
Contributor

jasonb5 commented Jul 8, 2020

Linking PR #389; the fix will be available in CDAT 8.2.1.

@jasonb5 linked a pull request Jul 8, 2020 that will close this issue
@jasonb5 added the pending-release label (fix is included in a pending release of CDAT metapackage) Jul 8, 2020
@jasonb5 modified the milestones: 3.1.5, 3.1.6 Jul 16, 2020
@jasonb5 closed this as completed Jul 27, 2020
@jasonb5 modified the milestones: 3.1.5, 8.2.1 Jul 27, 2020