Investigations on using GPU for computations in satpy #2990

Open
mraspaud opened this issue Nov 20, 2024 · 0 comments

This issue summarizes some exploratory work on using a GPU for some of the computations in satpy.

Setup

Using a SAR-C scene from Sentinel-1, the idea was to try to speed up image generation without any resampling.
This data is quite complex to process, in that full arrays of noise estimates have to be built from sparse arrays for denoising.
The data is about 20000x10000 pixels.

To work on this, we used a GPU-enabled server and created a new environment that includes cupy, cupy-xarray, satpy and trollimage.

Cupy had to be built from the source of the main branch, as some features, like interpolation, were not available otherwise.

Satpy and Trollimage were installed in editable mode.

Experiment

We switched on the use of cupy-backed arrays in dask with:

    dask.config.set({"array.backend": "cupy"})
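For reference, a minimal sketch (not from the issue) of what this setting changes, assuming a recent dask and a working cupy installation: dask array creation functions then produce cupy-backed chunks instead of numpy-backed ones.

    import dask
    import dask.array as da

    dask.config.set({"array.backend": "cupy"})

    # Creation functions now build chunks with cupy instead of numpy.
    arr = da.ones((4096, 4096), chunks=(1024, 1024))
    print(type(arr._meta))  # cupy.ndarray rather than numpy.ndarray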

Then, multiple places were adjusted to make use of cupy when available.
Satpy didn't need any modifications anywhere other than in the reader.
Trollimage needed some small adjustments in the enhancement part to ensure cupy arrays were preserved; see the sketch below.
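The adjustments were roughly of the following kind (a hedged sketch; `scale_chunk` is an illustrative helper, not the actual trollimage code): pick the array module that matches the input, so that cupy arrays are processed with cupy and stay on the GPU.

    import numpy as np

    def scale_chunk(chunk, low, high):
        """Linear stretch of one chunk, keeping cupy input on the GPU."""
        try:
            import cupy
            xp = cupy.get_array_module(chunk)  # cupy for cupy arrays, numpy otherwise
        except ImportError:
            xp = np
        return xp.clip((chunk - low) / (high - low), 0.0, 1.0)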

Note that reading the data was done on the CPU, as was the writing, so the chunks had to be moved to the GPU for the computations and back again afterwards (see the sketch below).
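A minimal sketch of that round trip, assuming the reader produces a regular numpy-backed dask array (the variable names and shapes are illustrative only):

    import cupy
    import dask.array as da

    # Stand-in for data read on the CPU by the reader.
    cpu_chunks = da.random.random((20000, 10000), chunks=(2000, 2000))

    # Push each chunk to the GPU for the computations...
    gpu_chunks = cpu_chunks.map_blocks(cupy.asarray)

    # ...and bring the result back to host memory before writing.
    back_on_cpu = gpu_chunks.map_blocks(cupy.asnumpy)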

The test script that reads the data, generates the composite and saves it to disk was called with:

    GDAL_NUM_THREADS=ALL_CPUS DASK_NUM_WORKERS=8 DASK_ARRAY__CHUNK_SIZE=32MB /usr/bin/time -v python test_s1.py
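For context, a rough reconstruction of what test_s1.py does (the file path and the composite name are assumptions, not taken from the issue): read the Sentinel-1 SAR-C scene with satpy, generate a composite, and write it to disk as a GeoTIFF.

    from satpy import Scene

    filenames = ["/path/to/S1A_IW_GRDH_...SAFE"]  # placeholder path
    scn = Scene(filenames=filenames, reader="sar-c_safe")
    scn.load(["ocean_color"])  # composite name is an assumption
    scn.save_datasets(writer="geotiff", base_dir="/tmp")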

Results

The data was read, composited and written without problems as far as execution goes.
The performance was, however, slower than in the CPU-only case: about 45 seconds instead of 40 seconds. Memory usage dropped by roughly 15% in the GPU case, and the CPU load was (quite logically) much lower.

on GPU

    User time (seconds): 156.38
    System time (seconds): 9.59
    Percent of CPU this job got: 362%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:45.83
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3819596
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 2351368
    Voluntary context switches: 131790
    Involuntary context switches: 33327
    Swaps: 0
    File system inputs: 0
    File system outputs: 513936
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096

on CPU

    User time (seconds): 252.34
    System time (seconds): 15.85
    Percent of CPU this job got: 655%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:40.94
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 4534916
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1226254
    Voluntary context switches: 134968
    Involuntary context switches: 61213
    Swaps: 0
    File system inputs: 0
    File system outputs: 639520
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096

Reflection

It is likely that the decreased performance is due to the overhead of reading the data on the CPU, then transferring it to the GPU for computation, and then transferring it back to the CPU for writing. The processing of the image data is probably too light for GPU execution to have a significant impact.

At the time of writing, we weren't able to find python libraries that would allow us to read or write geotiff files (the input was also geotiff) directly into GPU memory. It is possible though, and to go further on this topic we should check the kvikio engine for xarray, which can read zarr data directly to the GPU (see the example at https://xarray.dev/blog/xarray-kvikio and the sketch below).
Rasterio uses cython to load the data into a numpy array, and at the moment cupy does not have a stable or documented cython interface.
Another solution would be to go directly through the GPUDirect Storage (GDS) interface, like kvikio does.
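A minimal sketch based on the linked blog post (the store path is a placeholder, and this only applies to zarr input, not the geotiff data used here):

    import cupy_xarray  # noqa: F401  (registers the kvikio backend)
    import xarray as xr

    # Reads the zarr store directly into GPU memory; variables are cupy-backed.
    ds = xr.open_dataset("/path/to/data.zarr", engine="kvikio", consolidated=False)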
