UnicodeDecodeError when run multithreaded #589
Comments
My suspicion is that it has to do with sharing a Here is how it is recommended to use the Otherwise, you can get strange behavior. I ran into this earlier: #384 Are you by chance storing the CRS object itself on the xarray or dask object? |
Yes. I mentioned in my wall of text above that we are putting the CRS in the |
I missed that. I don't know how
|
Interesting. Satpy doesn't currently work with xarray Dataset objects and instead uses collections of DataArray objects. So we need all the CRS information in each DataArray. I assume that's what you meant by a The thing about dask is that it should only be working with the underlying dask array underneath the DataArray. Another possibility I can see is that Satpy might be passing another type of object we use that has a CRS object as one of its instance attributes to a dask delayed function. This would require accessing the object from a dask worker thread. |
Here is an example:
>>> import xarray
>>> import rioxarray
>>> xds = xarray.DataArray(1)
>>> xds
<xarray.DataArray ()>
array(1)
>>> xds_crs = xds.rio.write_crs("EPSG:4326")
>>> xds_crs
<xarray.DataArray ()>
array(1)
Coordinates:
    spatial_ref  int64 0
Attributes:
    grid_mapping:  spatial_ref
>>> xds_crs.spatial_ref
<xarray.DataArray 'spatial_ref' ()>
array(0)
Coordinates:
    spatial_ref  int64 0
Attributes:
    spatial_ref:  GEOGCRS["WGS 84",DATUM["World Geodetic System 1984",ELLIPSO...
    crs_wkt:      GEOGCRS["WGS 84",DATUM["World Geodetic System 1984",ELLIPSO...
Got a little trigger happy. Updated ^^ |
Are the duplicate I looked through the code being run that produces these errors and we have three cases that I can see where a CRS object might cross thread boundaries:
I suppose just to be safe I should switch to what you are doing in rioxarray for storing the coordinate and I should only store the wkt in our AreaDefinition objects. Edit: Another case: We use rasterio |
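To make the "only store the wkt" idea concrete, here is a minimal sketch of a holder object that keeps just the text definition as shared state and builds the CRS object lazily, once per thread. Everything in it is an assumption for illustration: `AreaSketch` and `build_placeholder` are hypothetical names, not Satpy's or pyresample's API, and the builder returns a plain dict so the sketch runs with only the standard library. In real code the builder would presumably be something like `pyproj.CRS.from_wkt`.

```python
import threading
from typing import Any, Callable


def build_placeholder(definition: str) -> dict:
    """Stand-in builder (assumption); real code might pass pyproj.CRS.from_wkt."""
    return {"definition": definition, "built_in": threading.get_ident()}


class AreaSketch:
    """Hypothetical holder: the text definition is the only shared state."""

    def __init__(self, definition: str,
                 builder: Callable[[str], Any] = build_placeholder):
        self.definition = definition
        self._builder = builder
        self._local = threading.local()  # one cached object per thread

    @property
    def crs(self) -> Any:
        # Build lazily, and separately in every thread that asks for it.
        if not hasattr(self._local, "crs"):
            self._local.crs = self._builder(self.definition)
        return self._local.crs


area = AreaSketch("+proj=merc")
main_crs = area.crs  # built in the main thread

seen = []
worker = threading.Thread(target=lambda: seen.append(area.crs))
worker.start()
worker.join()  # the worker built its own object instead of reusing main_crs
```

The trade-off is that the definition gets parsed once per thread rather than once overall, in exchange for never handing the same built object to two threads.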
Probably keep both for a while for backwards compatibility with GDAL.
My guess is that it is similar to |
Sounds good on the spatial_ref stuff. For rasterio CRS, since we are creating a rasterio object, it must be handling it properly. Also, I think I found the problem in our processing: We get the If I were to implement the same strategy for spatial_ref in satpy and geoxarray, do you think that would be a good idea? Would you do anything differently if you could now? |
I think that would be great 👍. The more consistency across libraries the better.
Nothing that I can think of at the moment. |
Also see: opendatacube/datacube-core#837 |
Sorry, I thought I closed this already. This was our fault for using a CRS object with a dask |
@djhoese this may be useful for reference: geopandas/geopandas#1842 |
Fix in #793 - I am unable to replicate your issue, but I think that it will fix this. If you are able to verify that it fixes the issue that would be helpful. |
Even when this did happen, it was random and sometimes more common depending on the CPU and workload of the machine (it seemed). Our code no longer does what was being done to cause this issue (passing CRS objects between dask workers). |
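For anyone landing here later, the general shape of "don't pass CRS objects between workers" can be sketched as follows: hand each task the text definition and build the object inside the task. This is only an illustration under assumptions. It uses a standard-library thread pool as a stand-in for dask's threaded scheduler, and `process_chunk` and `build_placeholder` are hypothetical names, with the builder standing in for a real parser such as `pyproj.CRS.from_wkt`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def build_placeholder(definition: str) -> dict:
    """Stand-in builder (assumption); real code might use pyproj.CRS.from_wkt."""
    return {"definition": definition, "thread": threading.get_ident()}


def process_chunk(chunk, crs_definition: str, builder=build_placeholder):
    """Hypothetical task: receives text, builds its own object, returns plain data."""
    crs = builder(crs_definition)  # created and used inside this thread only
    built_here = crs["thread"] == threading.get_ident()
    return sum(chunk), crs["definition"], built_here


chunks = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=2) as pool:
    # Only lists and strings cross the thread boundary, never a built object.
    results = list(pool.map(lambda c: process_chunk(c, "+proj=merc"), chunks))
```

Each task returns plain values as well, so nothing built in a worker thread leaks back to the caller.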
That makes sense. No worries if you are unable to test it out. |
Code Sample, a copy-pastable example if possible
# TODO: working on it
Problem description
This is something that's been noticed in Satpy specifically and is being tracked here: pytroll/satpy#1114
The bottom line is that a couple of our users have been getting UnicodeDecodeErrors or errors about bad proj definitions. The really annoying bit is that it seems to be some sort of race condition or other multi-threading related issue. We are using xarray+dask and have a pyproj CRS object in the .coords of our DataArrays. We get errors like:
Or:
And other times it will print out the invalid projection with characters mixed in where they shouldn't be. Like very clearly wrong changes where +proj=merc is changed to some odd unicode character in place of the p in proj.
I'm trying my best to reproduce this, but so far have been unsuccessful, which is why I don't have a reproducible example yet. I've only ever noticed this in logs.
Expected Output
No error.
Environment Information
python -m pyproj -v
Specific conda-forge builds:
Installation method
Conda environment information (if you installed with conda):
I mentioned specific conda packages above, but we've seen this now on Ubuntu, Windows, and a CentOS 7 docker container running a conda-pack'd version of a conda-forge environment.