-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bilinear resampling across dateline gives ValueError #199
Comments
As noted on the mailing list, doing nearest neighbor resampling seems to work so this must be a bug in the way bilinear interpolation gathers/uses multiple neighbors. |
Slight update to the above comment: in looking closer at the results, it looks like while doing nearest neighbor resampling does produce output without throwing an exception, the output is incorrect in that all values to the west of the dateline return 0. Further testing is probably in order, but in the simple case the same test data and code as above can be used to reproduce the issue by simply replacing the bilinear calls with the following nearest_neighbor calls (pretty much copied from the documentation): valid_input_index, valid_output_index, index_array, distance_array = \
pyresample.kd_tree.get_neighbour_info(source_def, target_def, 5000, neighbours=1)
binned_data = pyresample.kd_tree.get_sample_from_neighbour_info('nn', target_def.shape, data['so2'],
valid_input_index, valid_output_index,
index_array) |
I was doing some more debugging on this this morning, and I think I found part of the problem. Specifically, in # Remove illegal values
valid_out = ((target_lons >= -180) & (target_lons <= 180) &
(target_lats <= 90) & (target_lats >= -90)) The problem here is that when you specify the target area as I did in the example, it returns target Lons from -189 to -171. So the above promptly declares all values less than -180 as "invalid", thus only leaving the values to the east of the dateline. Incidentally, this explains the zero values to the west of the dateline when doing nearest neighbor as well. Of course, the difficulty is that the above code makes sense, and may even be required to prevent other problems later. My gut feeling is that the # Or something like this...
target_lons[target_lons < -180] += 360 ...but I could easily see that messing up something else (like trying to wrap around the wrong way again). Dunno. Hope this helps track things down though. |
I managed to hack together a working solution. I could provide a pull request if desired, but I haven't really tested the solution, and kinda think it may not be the best option. It does appear to work for me however. Three changes: In the Then, in Bilinear/init.py, on line 520, I again added the line of code to "translate" invalid longitudes. After these changes, bilinear at least appears to work across the dateline. I haven't tested nearest neighbor yet, but it may fix that too given that the first couple of changes were to the Like I said though, I'm not convinced these are the right changes to make to fix the problem, just the obvious ones that, at first glance, worked for me. |
So the main point of that masking is to avoid querying for pixels that are "outside" the earth on the target projection. The most common case is space pixels in geostationary projections where we are looking at the Earth from a distance but the corners of our "grid" of pixels are in space (@mraspaud or @pnuu may be able to think of other cases) or any other projection that allows for off-earth pixels. We also have to be careful with how we fix this in that this code for -180/180 and -90/90 checks happen in multiple places (at least 2 in kd_tree.py) because of our slow migration to a more dask and xarray friendly version of pyresample (pyresample 2.0). Another complication, typically pyproj (accessing the PROJ C library) would return invalid values (1e30) for off-earth pixels when going from a metered projection space (typically) to lon/lat space. However, for a lon/lat ( I see two possible solutions which have further complications depending on how they are implemented:
The two things to consider:
For these reasons, I think option 1 may be the safer choice because 1e30 shouldn't be sent to |
For the longitude limit check, wouldn't it be sufficient to limit it to +/- 540? More than that and you are wrapping around the earth multiple times, which wouldn't make sense. So you could do something like -540 to -180 (start at the dateline and wrap all the way around to the west), but if you tried to do something like -545 to -185 that would be an error (as it should be specified as -185 to +175 or something). Or am I oversimplifying things? This isn't really my field... :D |
We could do that, I guess my problem with that would be that these numbers are relatively arbitrary. If we use |
Gotcha. Honestly, my preference would be to leave the limit checking as is, but make the area spec always wrap from west to east, so when crossing the dateline the area would be specified as |
@ibrewster I think your situation is a bit of a special case. You want to use degrees to define your extent on this latlong "projection" but the coordinate system for that uses longitudes (x axis) from -180 to 180. By adding the If you were using a mercator projection it would feel odd to me to support a user wanting to define an area from -45000000 meters to 45000000 meters (I made these numbers up) where we would be crossing the anti-meridian of the projection. In this case I would advise a user to change the reference longitude for the projection so that the origin of the coordinate system is closer to their desired region, making the new x limits something like -5000000 meters to 5000000 meters. |
Just a data point to show why this is a real issue: Processing my 500 GB of data files, using my hacked method of enabling pyresample to cross the dateline using latitude/longitude took around 4 hours. Throwing in the extra step of translating the coordinates to some other projection with a shifted antemeridian in the AreaDefinition caused the process to take 16 hours - four times as long. I might be able to bring that back down by pre-filtering the data on latitude and longitude to only include data within the gridding area, but that seems kludgy - and I'm not convinced it would be as efficient as simply not translating the coordinates anyway. |
If you have code changes that made this work for you then a pull request would be a great way to start a discussion and get this updated in the released version of pyresample. |
@djhoese If you look back over this thread, you'll see that I detailed the changes I made to make this work - but both you and I agreed that the changes weren't good ones, i.e. that there should be a better, if non-trivial, way. Yes, the changes make it work, but they're kludgy. Incidentally, I've also discovered a caveat with the changes I made, that being that it leaves a 1 grid square wide swath on either side of the dateline where the data is "reflected" back on itself or something. With smaller grids this effect isn't overly noticeable, but with larger grids it becomes problematic. So my "solution" really is only a partial hack that worked "well enough" for my purposes, but is not a real solution. If what you are saying though is that the only way this will get fixed is if I fix it, I understand. Thank you for your time and an otherwise great product. |
Thanks for providing the summary. This thread is long enough and my time short right now that I didn't re-read it all. I did want to point out that as part of #267 I think some of the operations done with lat/lon grids in pyresample might behave better. I can't guarantee anything, especially with bilinear resampling which I have little experience with, but it might be something to come back to. In new versions of PROJ the Again, this is based on the skimming of this issue that I did. Since none of the core developers have a need for exactly what you are using I can't guarantee that we'll get to this any time soon. |
Sure, I can completely understand that. I guess none of the core developers live in/work with data from Alaska :-) I don't think the I'll figure out something. Perhaps a simple pre-filter will speed things up enough that the projection change isn't an issue. Thanks again! |
Ok I reread the whole issue and I think I understand some of what's going on. You mentioned getting the error with bilinear but nearest neighbor didn't raise an error but did give the wrong output. I think we should ignore bilinear for now and assume there is a bug there. I think part of this wrong output image problem with nearest neighbor stems from what I mentioned earlier where pyresample has some custom The other part of this is defining an AreaDefinition that passes the antimeridian of the coordinate system it is (crossing the -180/180 boundary in a coordinate space whose limits are -180 and 180). Without the All of that said, I think we could probably relax some of the valid pixel checks if we assume that the output pixels are all valid-ish. Side Note: I work with GINA in Alaska a lot. Just not in lat/lon space for outputs. |
Well, that brings up a point - the final output image I want in, in fact, in a different projection - specifically I'm using a mercator projection with a +lon_0, although it really could be just about anything (and there may be better options - projection selection is outside my realm of knowledge, so I just picked a "default" option). The difficulty is the order of operations - using the shifted mercator as the projection in the AreaDef makes total sense, and does give me exactly what I want. Logically, that seems the way to go. The problem is, that requires translating all the coordinates from lat/lon to mercator before binning, which takes quite some time. If, however, I can wait to do any translation until after binning, only working with the binned coordinates, then I am dealing with MUCH less data, and the translation is much faster. Apparently, about 4x faster. So, to summarize: My original data is in standard lat/lon, -180 to 180. My final output is in some projection with a shifted lon_0 - could be just about any, nothing fancy. My goal was/is to be able to perform the translation after the binning, not before, in order to save huge amounts of time - that is, grid in lat/lon, with the grid crossing the dateline. ...and the more I write this, the more I realize that maybe I am trying to do something weird here, and expecting too much from the tools. :-P Yeah, specifying coordinates as, for example -210 to -140 longitude doesn't really make sense to me either. However, it apparently does make sense to the professors I'm writing the software for. Oh well, like I said, I'll figure something out. The more I think about it, the more it seems like this is my problem, not yours. It would be nice if I could say "give me a grid longitude 170 to -140, traveling east", and just have it work, but as you have pointed out, this does appear to be somewhat odd. Sorry for the noise! |
By binning, do you mean the bilinear interpolation? What kind of data are you sampling? If the data is relatively high resolution compared to the output grid, you could try the bucket resampler. It should be much faster, and you could use directly the final grid. One gotcha is, that it doesn't do any interpolation and leaves holes in the result if there are no data in all the target locations. Something like this to calculate the average: from pyresample.bucket import BucketResampler
import dask.array as da
lons = da.from_array(data['longitude'], chunks=1024)
lats = da.from_array(data['latitude'], chunks=1024)
so2_data = da.from_array(data['so2'], chunks=1024)
resampler = BucketResampler(target_def, lons, lats)
average = resampler.get_average(so2_data)
# This triggers the actual computation
ave = average.compute() |
Or nearest neighbor, yes.
Satellite gas data. TROPOMI, to be exact. I've actually been looking at this more over the past hour, and I'm thinking at least some of the slowdown is my fault - I had removed some other filters on the data, and it looks like I accidentally removed the latitude/longitude filters as well. Meaning that the data being resampled was the entire raw dataset, rather than just the (much smaller) area of interest. I haven't yet tested to see how much things improve by simply limiting the data I'm throwing at pyresample. It may actually make this whole thing a moot point. |
Sure, but also this package is here to help. This is exactly the type of data we and our users work with so if a use case is really difficult with the way things currently are then that suggests a change is needed. There is a tropomi_l2 reader in Satpy and Satpy uses dask and xarray to perform these operations quickly. It may be something to look at; especially if what @pnuu suggested looks like something you want to use. As for the coordinates and crossing the anti-meridian: it makes sense in words but when applied to an actual coordinate system it makes a little less sense. It's difficult because we are all so use to treating longitudes like a continuous string of numbers, but the latlong projection isn't that. We just need to find something that is equivalent for what you want in satpy/pyresample. |
Ooooh, satpy has a tropomi_l2 reader now? Sweet! There wasn't one back when I started looking at this data. I'll have to check that out!
I'll have to see how much of a difference my accidentally removed filters make. It may well be that with those filters back in place, letting pyresample do the re-projection is just as fast (or nearly so) as doing the resampling first (in lat lon) then re-projecting. Ideally, the resolution of the gridded data should be similar to the original resolution, so in theory at least the only difference is the area covered, and if I can get that down before resampling, the number of data points dealt with should be similar. At which point I can just use pyresample (or satpy) as intended. In theory. We'll see.
Right. I'm seeing that better now. Took a bit for it to click :-) |
I guess one point in favor of fixing this though: the lat/lon "projection" is kind of unique (to my limited knowledge, at least) in that is is a spherical "projection", that is, not really a projection at all, such that technically, in the real world longitudes do flow smoothly from -180 to +180 (or visa-versa) - there is no "edge" of the world :-). This is different than, say, the mercator projection, which has edges. Of course, that is a viewer perspective, not a computer perspective, so it may not mean much. Just a thought. (shrug) |
It is very likely that removing our custom Proj stuff might just make this work too. Hard to say as I haven't tried it. It also depends how the data is stored on the output. For example, I don't think it is valid to create a geotiff in the lat/lon projection that starts at 170 and ends on the east side of the anti-meridian. I could be wrong though. |
Ok, I realized (again, I'm sure I realized this back when I started this issue) why this is an issue for me. As happens all too often with me, I spoke too soon with declaring the missing filters to be the issue. My workflow is this:
The key here is step 3: if I do any sort of re-projection from the lat/lon projection before step 3, such as when resampling, that re-projection has to apply to all 4,440 data files loaded. If I can wait to re-project until after step 3, I only have to re-project one grid worth of data. So, actually, the fact that it is only 4x slower when having to do 4,000x as many reprojections is kinda impressive... :-) |
Code Sample, a minimal, complete, and verifiable piece of code
Sample data file to use with this code sample:
test_data.pickle.zip
Problem description
When running the above code, with an area definition that crosses the dateline, the get_bil_info function throws a ValueError exception rather than properly binning the data. Attempting other methods of specifying the area extent produce incorrect latitude/longitude bins.
Expected Output
The data should be binned across the dateline properly
Actual Result, Traceback if applicable
ValueError:
Versions of Python, package at hand and relevant dependencies
Using python 3.7 in my testing, with the only dependancies being numpy and pyresample.
The text was updated successfully, but these errors were encountered: