Writing to GCS fails even when CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE=YES #3604

Closed
AndreaGiardini opened this issue Mar 23, 2021 · 8 comments

Comments

@AndreaGiardini
Contributor

AndreaGiardini commented Mar 23, 2021

Expected behavior and actual behavior.

I should be able to read and write TIFF files from/to GCS with GDAL when CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE=YES is set.

Steps to reproduce the problem.

CPL_CURL_VERBOSE=YES CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE=YES CPL_DEBUG='ON' gdal_translate /vsigs_streaming/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508.tif /vsigs/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508_tiled.tif

I used gdal_translate in the example, but the same happens with rasterio.
From the logs, I can see that the file is downloaded and translated correctly, but writing fails. The debug logs don't give me any extra information:

* Connected to storage.googleapis.com (108.177.126.128) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.storage.googleapis.com
*  start date: Feb 23 15:41:05 2021 GMT
*  expire date: May 18 15:41:04 2021 GMT
*  subjectAltName: host "storage.googleapis.com" matched cert's "*.googleapis.com"
*  issuer: C=US; O=Google Trust Services; CN=GTS CA 1O1
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x559bbaee3d70)
> PUT /mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508_tiled.tif HTTP/2
Host: storage.googleapis.com
accept: */*
authorization: Bearer ****

* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
ERROR 1: Error 0: 
* stopped the pause stream!
* Connection #0 to host storage.googleapis.com left intact
ERROR 3: /vsigs/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508_tiled.tif: I/O error
GDAL: GDALClose(/vsigs/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508_tiled.tif, this=0x559b903a7850)
VSICURL: Stop download for https://storage.googleapis.com/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508.tif
GDAL: GDALClose(/vsigs_streaming/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508.tif, this=0x559b9021e970)
GDAL: In GDALDestroy - unloading GDAL shared library.

I tried writing files to GCS using gsutil and that works fine, so I doubt it's a permission problem.

Operating system

Ubuntu 20.04 container running on GKE

GDAL version and provenance

GDAL 3.2.1-v3.2.1
Docker image built from ubuntu-small

@rouault
Member

rouault commented Mar 23, 2021

Does it make a difference if you use /vsigs/ instead of /vsigs_streaming/ as the source? Note that efficient streamed reading of GeoTIFF files assumes a friendly layout (headers and tile offsets/bytecounts arrays at the beginning, 'sequential' ordering of tiles/strips within the file), but an unfriendly layout would be more of a performance issue (the streamed read is reset each time a backward seek is needed) than a functional one.

@AndreaGiardini
Contributor Author

Hey @rouault
No difference, both of them trigger the same error.

Anything else I should look for? The GDAL error is very cryptic...
ERROR 1: Error 0:

@AndreaGiardini
Contributor Author

Another example with rasterio, giving the same problem.

import rasterio as rio

PATH_ORG_TILE = '/vsigs/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508.tif'
PATH_TILED_TILE = '/vsigs/mybucket/temp/0_PNOA_MA_OF_ETRS89_HU30_h50_0508_tiled_test.tif'

src = rio.open(PATH_ORG_TILE, 'r')
profile = src.profile
profile.update(tiled=True)

with rio.Env(CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE='YES', CPL_DEBUG='ON'):
    with rio.open(PATH_TILED_TILE, 'w+', **profile, blockxsize=256, blockysize=256) as dst:
        dst.write(src.read())

The script completes successfully (no exception is raised), but no file is created on GCS.

@rouault
Member

rouault commented Mar 23, 2021

From the log and the code, the following occurs on a curl error:

> * Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
> ERROR 1: Error 0:

Googling for "Connection state changed (MAX_CONCURRENT_STREAMS == 100)" shows a number of occurrences.
You may want to try with the GDAL_HTTP_VERSION=1.1 configuration option / environment variable to see if that makes a difference.
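As a rough illustration of that suggestion, the option can be passed as an environment variable to the same gdal_translate call. This is a minimal sketch: the helper names `gdal_env` and `translate` are hypothetical, and it assumes `gdal_translate` is on the PATH.

```python
import os
import subprocess


def gdal_env(http_version="1.1", base_env=None):
    """Return a copy of the environment with the GDAL options from this thread set.

    `gdal_env` is a hypothetical helper, not part of GDAL.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GDAL_HTTP_VERSION"] = http_version  # force HTTP/1.1 instead of HTTP/2
    env["CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE"] = "YES"
    env["CPL_DEBUG"] = "ON"
    return env


def translate(src, dst, http_version="1.1"):
    """Run gdal_translate with the environment above and return the CompletedProcess."""
    cmd = ["gdal_translate", src, dst]
    return subprocess.run(cmd, env=gdal_env(http_version), check=False)
```

Calling `translate()` with the /vsigs/ source and destination paths from the report would rerun the failing command with HTTP/1.1 forced.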

@AndreaGiardini
Contributor Author

Hey @rouault

I believe that MAX_CONCURRENT_STREAMS is used in the HTTP2 protocol between the client and server to negotiate the number of concurrent streams that can be open. The same happens during the download, so I believe it's just normal behavior.

Setting the environment variable GDAL_HTTP_VERSION=1.1 indeed fixes the problem (at least with gdal_translate; I still need to try it with rasterio). So it looks like vsigs doesn't manage to write to GCS when the protocol used is HTTP2.

  • Is this expected behavior?
  • Any idea how this can be fixed?
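
For the rasterio case, which has not been verified in this thread, the same option could presumably be passed through `rasterio.Env`, which forwards keyword arguments to GDAL as configuration options. A minimal sketch, assuming rasterio is installed; `gdal_options` and `retile` are hypothetical helper names:

```python
def gdal_options(http_version="1.1"):
    """GDAL configuration options used in this thread (hypothetical helper)."""
    return {
        "CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE": "YES",
        "GDAL_HTTP_VERSION": http_version,  # "1.1" avoids the HTTP/2 code path
    }


def retile(src_path, dst_path, options):
    """Copy src_path to dst_path as a tiled GeoTIFF with 256x256 blocks."""
    import rasterio  # imported lazily so the helper above works without rasterio

    with rasterio.Env(**options):
        # Read the source inside the Env so both read and write see the options.
        with rasterio.open(src_path) as src:
            profile = src.profile
            profile.update(tiled=True, blockxsize=256, blockysize=256)
            data = src.read()
        with rasterio.open(dst_path, "w", **profile) as dst:
            dst.write(data)
```

Something like `retile(PATH_ORG_TILE, PATH_TILED_TILE, gdal_options())` would then mirror the earlier script, with the HTTP version pinned.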

@rouault
Member

rouault commented Mar 23, 2021

> Is this expected behavior?

No.

> Any idea how this can be fixed?

Not without further investigation. It might be an issue in curl/libnghttp2 or in the way we use curl.

Can you reproduce on a non-GCE instance? On a GCE instance GDAL selects HTTP2 automatically, but not on non-GCE machines (when testing a few years ago, HTTP2 was found to be slower on non-GCE machines). So you might need to set GDAL_HTTP_VERSION=2 on a non-GCE machine.
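
One way to run that comparison is to repeat the same gdal_translate call once per HTTP version and look at the result of each. The sketch below is only an assumption about how such a check could be scripted: `compare_http_versions` is a hypothetical helper, the destination template with a `{tag}` placeholder is invented for illustration, and a zero exit code is not guaranteed to mean the object was actually written.

```python
import os
import subprocess


def compare_http_versions(src, dst_template, versions=("1.1", "2"),
                          runner=subprocess.run):
    """Run gdal_translate once per HTTP version and return {version: exit code}.

    dst_template must contain "{tag}" so each run writes a distinct object.
    `runner` is injectable so the loop can be exercised without GDAL installed.
    """
    results = {}
    for version in versions:
        env = dict(os.environ)
        env["GDAL_HTTP_VERSION"] = version
        env["CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE"] = "YES"
        env["CPL_CURL_VERBOSE"] = "YES"  # keep the curl trace for comparison
        dst = dst_template.format(tag="http" + version.replace(".", "_"))
        proc = runner(["gdal_translate", src, dst], env=env, check=False)
        results[version] = proc.returncode
    return results
```

Because the exit code may not reflect a failed upload, the curl trace and the bucket contents should still be checked after each run.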

@AndreaGiardini
Contributor Author

I will see what I can do. Unfortunately, I do not have a spare machine outside GCE that I can use for that. Moreover, the bucket is private and authentication is set up using instance metadata, so that's going to be tricky to reproduce.

In the meantime, thank you for your help!

@AndreaGiardini
Contributor Author

The GDAL_HTTP_VERSION=1.1 workaround is no longer needed on newer GDAL versions. Tested on GDAL 3.4.1. I am closing this issue.
