KeyError from xarray in make_GLM_grids.py #22

Closed
jlc248 opened this issue Jan 4, 2019 · 16 comments

Comments

@jlc248

jlc248 commented Jan 4, 2019

I am getting a KeyError from the xarray library when running glmtools/examples/grid/make_GLM_grids.py.

I have installed the glmval environment using the latest lmatools, stormdrain, and glmtools (unifiedgridfile branch).

Here is my command:
(glmval) [jcintineo@fuego glm]$ python glmtools/examples/grid/make_GLM_grids.py -o OUT --fixed_grid --split_events --goes_position east --goes_sector conus --dx=2.0 --dy=2.0 *s20190041729*

And the error:
Traceback (most recent call last):
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/xarray/core/dataset.py", line 896, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'group_id'

This is followed by the message "During handling of the above exception, another exception occurred:" and this traceback:

Traceback (most recent call last):
  File "/data/probsevere//src/glm/libs/glmtools/examples/grid/make_GLM_grids.py", line 301, in <module>
    gridder(glm_filenames, start_time, end_time, **grid_kwargs)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/grid/make_grids.py", line 842, in grid_GLM_flashes
    outputs = list(map(this_proc_each_grid, subgrids))
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/grid/make_grids.py", line 903, in proc_each_grid
    gridder.process_flashes(glm, **process_flash_kwargs_ij)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/grid/make_grids.py", line 394, in process_flashes
    nadir_lon=nadir_lon)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/io/mimic_lma.py", line 82, in read_flashes
    min_events=min_events, min_groups=min_groups)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/io/glm.py", line 293, in subset_flashes
    return self.get_flashes(flash_ids)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/io/glm.py", line 300, in get_flashes
    these_flashes = self.reduce_to_entities('flash_id', flash_ids)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/io/traversal.py", line 217, in reduce_to_entities
    last_entity_ids = dataset[e_var].data
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/xarray/core/dataset.py", line 970, in __getitem__
    return self._construct_dataarray(key)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/xarray/core/dataset.py", line 899, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/xarray/core/dataset.py", line 74, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'group_id'

The data files look fine to me. I'm not quite sure how to begin debugging this, so I first wanted to see if you could replicate my error.

@jlc248
Author

jlc248 commented Jan 28, 2019

Has anyone been able to look into this issue at any depth?

I found that rolling back to xarray v0.10.9 keeps glmtools working (namely, make_GLM_grids.py). This worked for both the master and unifiedgridfile branches. However, I receive a number of FutureWarnings and a RuntimeWarning: invalid value encountered in true_divide. The output looks as expected for the test case I ran (s20190192142).
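Since the failure tracks the installed xarray version (0.10.9 works; 0.11.0 and 0.11.1 fail in this report), a script could check the version before gridding. This is a minimal sketch, assuming the behaviour change begins at 0.11.0 and persists in later releases, which this thread does not establish beyond 0.11.3. `parse_version` and `xarray_drops_multiindex_coords` are hypothetical helpers, not part of glmtools or xarray:

```python
import re
from typing import Tuple

# Assumption: first xarray release that drops MultiIndex level coordinates
# when their dimension is reduced to zero length.
FIRST_AFFECTED = (0, 11, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert '0.11.1' (or '0.11.0rc1') to a comparable (major, minor, patch) tuple."""
    nums = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits; drop 'rc1', '+g123', etc.
        nums.append(int(match.group()) if match else 0)
    nums += [0] * (3 - len(nums))  # pad '0.11' to (0, 11, 0)
    return tuple(nums)


def xarray_drops_multiindex_coords(version: str) -> bool:
    """Return True if this xarray version is assumed to show the empty-selection KeyError."""
    return parse_version(version) >= FIRST_AFFECTED
```

In practice one would pass `xarray.__version__` to `xarray_drops_multiindex_coords` and warn or stop when it returns True.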

@deeplycloudy
Owner

I hadn't encountered this in my testing yet, so thanks for the note about the version dependence. What version of xarray were you using in your first report, @jlc248?

@jlc248
Author

jlc248 commented Jan 28, 2019

I received the error messages with v0.11.1 and v0.11.0 of xarray.

@deeplycloudy
Owner

I just tried the included sample data with 0.11.3, 0.11.1, and 0.11.0, and can't reproduce the bug (I see all the same warnings, and those are unrelated).

The only thing I haven't tried is the specific date/time you reported. The error looks like there is no group_id variable in the dataset, which sure seems odd. I'll look at that file next.

@jlc248
Author

jlc248 commented Jan 29, 2019

I'm guessing the problem may be how I built the package, then? Are the directions here still valid?

https://github.com/deeplycloudy/glmtools/blob/master/docs/index.rst

@deeplycloudy
Owner

Yeah, I just created a fresh environment today using those instructions, and it worked fine.

However, I just got the error on GLM-L2-LCFA-2019-004-17-OR_GLM-L2-LCFA_G16_s20190041729000_e20190041729200_c20190041729226.nc … so now I should be able to track down what's happening.

@jlc248
Author

jlc248 commented Jan 29, 2019

Oh, I tried it several times and kept getting the error, so I figured I goofed something. Maybe it has to do with the level of lightning activity (or lack thereof).

@deeplycloudy
Owner

This issue occurs when the number of flashes in the domain of interest is zero. Specifically, the list of flash IDs passed to GLMDataset.get_flashes is empty ([]).
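Because the failure is triggered only by an empty flash selection, a caller could also sidestep it by checking for that case before anything reaches xarray. This is a caller-side workaround sketch, not the fix adopted in this thread (which removes the `set_index` call); `get_flashes_if_any` is a hypothetical wrapper, and its `get_flashes` argument stands in for a callable such as `GLMDataset.get_flashes`:

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def get_flashes_if_any(flash_ids: Sequence[int],
                       get_flashes: Callable[[Sequence[int]], T]) -> Optional[T]:
    """Call get_flashes only when at least one flash ID was selected.

    Returns None for an empty selection, so the caller can write an empty
    grid or skip the file instead of indexing a zero-length dimension.
    """
    if len(flash_ids) == 0:
        return None  # nothing in the domain; never touch the dataset
    return get_flashes(flash_ids)
```

The trade-off is that every call site has to handle the None case, whereas removing the index fixes the behaviour at its source.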

The attached demo shows that xarray 0.11 started dropping coordinates that are part of a MultiIndex when their dimension is reduced to zero length. In 0.10, those coordinates were retained.

This does not happen when the coordinates are not part of a MultiIndex.

It's possible that I don't even need the MultiIndex (I don't remember my own reason for including it early in development 👎), and could change the dataset.set_index(**idx) call in glmtools.io.glm so that it no longer sets the index.

I think I'd be satisfied with that change if we can demonstrate that the grids are identical with and without the index.
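One way to make "identical" concrete is an exact, cell-by-cell comparison of each gridded field produced with and without the index. With xarray loaded, `Dataset.identical` or `xarray.testing.assert_identical` would be the natural tools. The dependency-free sketch below, using a hypothetical `grids_identical` helper, compares two 2-D grids given as nested lists (for example from `.values.tolist()`). It treats NaN cells as matching, on the assumption that some grid cells may hold NaN where a 0/0 division occurred:

```python
import math
from typing import Sequence


def grids_identical(a: Sequence[Sequence[float]],
                    b: Sequence[Sequence[float]]) -> bool:
    """Return True if two 2-D grids have the same shape and identical values.

    Cells that are NaN in both grids count as equal; a NaN in only one
    grid counts as a difference.
    """
    if len(a) != len(b):
        return False  # different number of rows
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False  # different number of columns
        for x, y in zip(row_a, row_b):
            if math.isnan(x) and math.isnan(y):
                continue  # NaN == NaN is False, so handle it explicitly
            if x != y:
                return False
    return True
```

Exact equality is deliberate here: removing an index should not change any gridded value, so any difference at all is worth investigating.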

xarray-0.10.9.log
xarray-0.11.3.log

import numpy as np
import xarray as xr

print()
print("############# original dataset ###################")
print()

d = xr.open_dataset('GLM-L2-LCFA-2019-004-17-OR_GLM-L2-LCFA_G16_s20190041729000_e20190041729200_c20190041729226.nc')

print(d)

print()
print("############# selection with no index ###################")
print()

# Select an empty set of flashes
reduced_nomulti = d[{'number_of_flashes':np.zeros_like(d.flash_id.data, dtype=bool)}]
print(reduced_nomulti)
print("flash_id is still there")

print()
print("############# single variable index ###################")
print()

# Map each dimension to the single variable that becomes its index
idx = {'number_of_groups': 'group_id',
       'number_of_events': 'event_id',
       'number_of_flashes': 'flash_id'}

idx_d = d.set_index(**idx)
print(idx_d)
print("flash_id is not there - it's always clobbered by setting the index")

# Can no longer even access the flash_id variable; this gives AttributeError
try:
    reduced_idx = idx_d[{'number_of_flashes':np.zeros_like(idx_d.flash_id.data, dtype=bool)}]
except AttributeError:
    print("AttributeError: Can't access flash_id attribute")

print()
print("################ multi index ################")
print()

midx = {'number_of_groups': ['group_parent_flash_id', 'group_id',
                             'group_time_offset',
                             'group_lat', 'group_lon'],
        'number_of_events': ['event_parent_group_id', 'event_id',
                             'event_time_offset',
                             'event_lat', 'event_lon'],
        'number_of_flashes': ['flash_id',
                              'flash_time_offset_of_first_event',
                              'flash_time_offset_of_last_event',
                              'flash_lat', 'flash_lon']}

midx_d = d.set_index(**midx)

reduced_midx = midx_d[{'number_of_flashes':np.zeros_like(midx_d.flash_id.data, dtype=bool)}]
print(reduced_midx)
print("flash_id is not there in 0.11, but is in 0.10")

@deeplycloudy
Owner

For future reference, the set_index call was introduced in this commit.

A quick glance at my usual test dataset shows that removing set_index as described above causes no change to the grids. I'm testing it further today, and checking to see if it fixes the KeyError issue on empty files.

@deeplycloudy
Owner

@jlc248 see the PR above for a fix to this problem. If you have a moment to verify that it works, please let me know; I'll probably merge these changes into master by the end of this week at the latest.

@jlc248
Author

jlc248 commented May 14, 2019

Hm, I checked out the isatss branch of glmtools and updated the master branches of lmatools and stormdrain. After reinstalling my environment, I'm now getting a new error related to pyproj. It's quite possible that I'm screwing something up.

(glmval) [jcintineo@gusto glmtools]$ python examples/grid/make_GLM_grids.py -o OUT --fixed_grid --split_events --goes_position east --goes_sector conus --dx=2.0 --dy=2.0 *s20190411729*

  File "examples/grid/make_GLM_grids.py", line 301, in <module>
    gridder(glm_filenames, start_time, end_time, **grid_kwargs)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/grid/make_grids.py", line 826, in grid_GLM_flashes
    outputs = list(map(this_proc_each_grid, subgrids))
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/grid/make_grids.py", line 890, in proc_each_grid
    gridder.process_flashes(glm, **process_flash_kwargs_ij)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/grid/make_grids.py", line 394, in process_flashes
    nadir_lon=nadir_lon)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/io/mimic_lma.py", line 109, in read_flashes
    results = list(map(chunk_func,flash_chunks))
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/glmtools-0.1.dev0-py3.6.egg/glmtools/io/mimic_lma.py", line 162, in fast_fixed_grid_read_chunk
    np.zeros_like(flash_data.flash_x.data)))
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/lmatools/coordinateSystems.py", line 210, in toECEF
    return proj4.transform(self.fixedgrid, self.ECEFxyz, X, Y, Z)
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/pyproj/transformer.py", line 367, in transform
    xx=x, yy=y, zz=z, tt=tt, radians=radians, errcheck=errcheck
  File "/data/probsevere/ancillary/miniconda3/envs/glmval/lib/python3.6/site-packages/pyproj/transformer.py", line 180, in transform
    inx, iny, inz=inz, intime=intime, radians=radians, errcheck=errcheck
  File "pyproj/_transformer.pyx", line 107, in pyproj._transformer._Transformer._transform
pyproj.exceptions.ProjError: x,y,z, and time must be same size

@mrugna

mrugna commented May 14, 2019

I had the same issue. Downgrading proj4 from version 6.0 to 5.2 in the conda-forge channel seems to do the trick.

@jlc248
Author

jlc248 commented May 14, 2019

Ah, thanks @mrugna! It works fine now, @deeplycloudy.

@deeplycloudy
Owner

Thanks @mrugna and @jlc248 - I ran into the pyproj issue yesterday while helping someone set up glmtools for the first time. I opened a new issue for the proj problem in #31.

@deeplycloudy
Owner

@jlc248 @mrugna are we ok to close this issue?

@jlc248
Author

jlc248 commented Jan 21, 2020

👍


3 participants