Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dot product implementation #4

Merged
merged 3 commits into from
Feb 17, 2022
Merged

dot product implementation #4

merged 3 commits into from
Feb 17, 2022

Conversation

jsadler2
Copy link
Contributor

Starting this pull request. This is code that implements a dot-product approach for doing the aggregation. See #2

This works for my application but I have not run the tests on this yet.

@jsadler2 jsadler2 marked this pull request as draft May 24, 2021 22:51
@ks905383
Copy link
Owner

ks905383 commented Jun 1, 2021

Will look at this more in a bit, but I think the test failures are the result of the xarray structure's variable names no longer being kept through the whole routine.

That might be in line 410 of the modified xagg/core.py, in this line:
ds_combined = xr.Dataset(data_dict)

(which is odd; my first assumption would be that that should preserve variable names)

@rsignell-usgs
Copy link

@jsadler2 , is there a blocker here?

@jsadler2
Copy link
Contributor Author

@rsignell-usgs - no blocker - just haven't had time to spend on this in the past couple of weeks.

@ks905383
Copy link
Owner

ks905383 commented Jul 1, 2021

Finally got a chance to look at this a bit more closely -

So the issue right now is that the current proposed change changes wm.agg (from the output of get_pixel_overlaps()) from a geodataframe that keeps the variable information to DataArrays that don't. The rest of the code is set up so as to add the aggregated variables into that geodataframe, which then more robustly keeps the geographic information in a format that's easier to mess with for export.

I think the way to make this framework compatible with the proposed change would be to

  1. in get_pixel_overlaps() make overlaps_da just a separate part of the weightmap class (with the simplified gdf_in that doesn't need the pixel indices etc. in there
  2. fit the output of wm.agg.dot(var_array) in aggregate() into the wm.agg geodataframe like it is in the original code. This should be relatively easy in terms of indexing, since I assume the order of polygons wouldn't change during the overlaps_da calculation

Then the rest of the code should work without having to change anything else.

@jsadler2 jsadler2 marked this pull request as ready for review December 3, 2021 22:35
@jsadler2
Copy link
Contributor Author

jsadler2 commented Dec 3, 2021

@ks905383 - I finally got around to looking at this again. I think I figured out the compatibility things. Could you run your checks again?

@ks905383
Copy link
Owner

Hm, looks like it's generating a nan in the test that involves processing data when part of the input raster grid is nans.

Currently xagg behavior is that if a polygon only overlaps rasters with nans , then the aggregated value should be all nans, but if only some of the pixels are nans, then the nan pixels are ignored in the aggregating calculation.

This is an issue with a dot product, since in python it always returns nan if there are any nans in the inputted arrays. The way I've dealt with that in other code is to subset the array to only non-nan values before calculating a dot product, but that might be a bit complex here.

More broadly, there should probably be a warning if there are polygons that overlap with nan-valued raster pixels, or some sort of functionality that allows the user to choose whether to return nan if any pixel that overlaps is nan, or only when all pixels that overlap are nan.

@ks905383
Copy link
Owner

ks905383 commented Dec 21, 2021

hm actually more broadly I'm worried about the failure of test_aggregate_with_weights() - I think this just means the current setup of this implementation doesn't include the additional weights functionality (i.e., the ability to add another weight raster, e.g. pop density). (I'm pretty sure the failure of test_get_pixel_overlaps_gdf_wpreexisting_index() is for the same reason)

The weights are dealt with in:

normed_areaweights = normalize(tmp_areas*weights[wm.agg.iloc[poly_idx,:].pix_idxs],drop_na=True)

The best way to fix that is probably to incorporate the weights into this part in a similar way?:
https://github.com/jsadler2/xagg/blob/26affb47069124fe34b81adf6efcd5291c1a3ad1/xagg/core.py#L337

This is getting there though, very excited to see this!

@jsadler2
Copy link
Contributor Author

Thanks for those pointers, @ks905383. I ran the tests locally and they are now passing.

@ks905383 ks905383 merged commit 1e1cd5e into ks905383:main Feb 17, 2022
@jsadler2
Copy link
Contributor Author

@ks905383 - exciting to see this go in.

I probably should have said this earlier, but I found that when I tried to use this branch with a larger dataset, there was a memory error in the weight functionality, somewhere around here :

normed_weights = weights_and_overlaps/weights_and_overlaps.sum(dim = "loc", skipna=True)

Just wanted to give you a heads up.

I can do a little more digging to get more specifics

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants