Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add 'compute' keyword argument to 'to_raster' #219

Merged
merged 10 commits into from
Feb 1, 2021

Conversation

djhoese
Copy link
Contributor

@djhoese djhoese commented Jan 26, 2021

See #211 for additional context. The summary is that dask's store function allows you to pass compute=False. Doing this returns a Delayed object which can be computed later. This is really useful if your input dask arrays are used in other outputs (very common in the Satpy library where one input raster may be used for different RGB channels in various composites).

At the time of writing this only implements the functionality and includes some docstrings. Some questions and issues I ran into that I'm looking for help on:

  1. Where would you like tests added for this? I could add more parametrize cases to test_to_raster in test_integration_rioxarray.
  2. This, as I feared in Add 'compute' keyword argument to dask rasterio writer #211, doesn't seem to work on a distributed client (multiprocess). But this doesn't actually seem to work with the existing behavior (compute=True). See below.
  3. Other feedback?

Distributed Issues

Example:

import rioxarray
import dask.diagnostics
import dask.distributed

g = rioxarray.open_rasterio("any_geotiff.tif", chunks=1024)
client = dask.distributed.Client()
with dask.diagnostics.ProgressBar():
    delayed = g.rio.to_raster("/tmp/some_out.tif", windowed=True)

# or
with dask.diagnostics.ProgressBar():
    delayed = g.rio.to_raster("/tmp/some_out.tif", windowed=True, compute=False)
with dask.diagnostics.ProgressBar():
    delayed.compute()
~/miniconda3/envs/satpy_py37/lib/python3.7/site-packages/dask/array/core.py in load_store_chunk()
   3711     try:
   3712         if x is not None:
-> 3713             out[index] = np.asanyarray(x)
   3714         if return_stored and load_stored:
   3715             result = out[index]

~/repos/git/rioxarray/rioxarray/raster_writer.py in __setitem__()
    121
    122         with rasterio.open(self.raster_path, "r+") as rds:
--> 123             rds.write(item, window=Window(chx_off, chy_off, chx, chy), indexes=indexes)
    124
    125     def to_raster(self, xarray_dataarray, tags, windowed, compute, **kwargs):

rasterio/_io.pyx in rasterio._io.DatasetWriterBase.write()

RasterioIOError: Read or write failed. /tmp/c07.tif, band 1: IReadBlock failed at X offset 0, Y offset 1108: TIFFReadEncodedStrip() failed.

Ideas?

Checklist

@djhoese
Copy link
Contributor Author

djhoese commented Jan 26, 2021

Crap, I wanted this to be a draft PR. It isn't done. Can I change it?

@codecov
Copy link

codecov bot commented Jan 26, 2021

Codecov Report

Merging #219 (bacd8ff) into master (e114cec) will increase coverage by 0.99%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #219      +/-   ##
==========================================
+ Coverage   92.11%   93.10%   +0.99%     
==========================================
  Files          11       11              
  Lines        1306     1306              
==========================================
+ Hits         1203     1216      +13     
+ Misses        103       90      -13     
Impacted Files Coverage Δ
rioxarray/raster_array.py 97.55% <100.00%> (ø)
rioxarray/raster_dataset.py 100.00% <100.00%> (ø)
rioxarray/raster_writer.py 93.75% <100.00%> (+16.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e114cec...a659364. Read the comment docs.

@djhoese
Copy link
Contributor Author

djhoese commented Jan 26, 2021

Ah ok the error is due to reading not being dask friendly, right? Seems related to #214 maybe.

If I do a g = g.persist() before trying to_raster then it seems to work (it completes without error). However, the image is corrupt (missing tiles).

image

@snowman2
Copy link
Member

Where would you like tests added for this?

That is a good question. I think it would be good to check the return. For clarity, I am thinking it would make sense to add a separate test. But, if it is a minor tweak to the existing test, then it would probably be fine to add to the parametrized test.

@snowman2
Copy link
Member

I wanted this to be a draft PR. It isn't done. Can I change it?

That would be a nice feature. I haven't found it yet, but if I do I will let you know. But, don't worry, I won't merge it in until you give the go ahead.

@snowman2
Copy link
Member

However, the image is corrupt (missing tiles).

That's not good. I bet it has to do with aquiring the lock before writing. I need to test this on a bigger raster.

@snowman2
Copy link
Member

I think that the implementation you have seems fine. Would you mind also updating the RasterDataset to_raster method as well?

@djhoese
Copy link
Contributor Author

djhoese commented Jan 27, 2021

From the SerializableLock docstring:

    This wraps a normal ``threading.Lock`` object and satisfies the same
    interface.  However, this lock can also be serialized and sent to different
    processes.  It will not block concurrent operations between processes (for
    this you should look at ``multiprocessing.Lock`` or ``locket.lock_file``
    but will consistently deserialize into the same lock.

I can't find an easy way to get multiprocessing lock to work.

Edit: Can't figure out locket either.

rioxarray/raster_array.py Outdated Show resolved Hide resolved
@djhoese djhoese force-pushed the feature-to-raster-compute branch from b3e05ed to cbe4064 Compare January 31, 2021 22:03
@djhoese
Copy link
Contributor Author

djhoese commented Jan 31, 2021

@snowman2 I just merged master and forced push here. I'm noticing that none of the tests actually cover the dask writing. Is that intentional?

@djhoese djhoese requested a review from snowman2 February 1, 2021 02:09
rioxarray/raster_array.py Outdated Show resolved Hide resolved
rioxarray/raster_array.py Outdated Show resolved Hide resolved
@snowman2
Copy link
Member

snowman2 commented Feb 1, 2021

@djhoese, this looks great. Thanks for fixing the tests 👍. Just a couple of tweaks to address.

@snowman2 snowman2 requested a review from justingruca February 1, 2021 13:18
@djhoese
Copy link
Contributor Author

djhoese commented Feb 1, 2021

@snowman2 thanks for the review. All done now.

@snowman2
Copy link
Member

snowman2 commented Feb 1, 2021

@djhoese this looks great. Would you like to squash you commits or do you mind if I squash and merge?

@djhoese
Copy link
Contributor Author

djhoese commented Feb 1, 2021

Feel free to squash and merge.

@snowman2 snowman2 merged commit dc24ca5 into corteva:master Feb 1, 2021
@snowman2
Copy link
Member

snowman2 commented Feb 1, 2021

Thanks @djhoese 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add 'compute' keyword argument to dask rasterio writer
3 participants