404 when doing a resumable upload POST #623

Open
BigJerBD opened this issue Nov 15, 2021 · 9 comments
@BigJerBD

BigJerBD commented Nov 15, 2021

Image version : latest (v1.30.2)

I'm using Apache Beam to do resumable uploads into a fake GCS bucket (for testing purposes), but I get this error:

"GET  /storage/v1/b/data?alt=json HTTP/1.1\" 200 112" 
"POST /resumable/upload/storage/v1/b/data/o?alt=json&name=aac%2Ftest1%2Fbeam-temp-data-820e4a4e464311ecac030242ac150002%2F18266b8a-3b30-4bf3-bda5-af203113e46d.data.csv&uploadType=resumable HTTP/1.1\" 404 59"

I also confirmed that the path test1 was present:

"GET /storage/v1/b/data/o?maxResults=1&projection=noAcl&prefix=aac%2Ftest1%2F2021103019551635623713%2F&delimiter=%2F&prettyPrint=false HTTP/1.1\" 200 533"

It works with the real GCS service, so I was wondering whether the POST being sent has a version compatibility problem, or whether resumable uploads simply aren't supported yet.

Thanks !
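
For reference, the failing request in the log above can be rebuilt outside Beam. The following is a minimal sketch, not Beam's or fake-gcs-server's own code: `resumable_upload_target` and `probe_status` are hypothetical helper names, and the object name in the usage note is made up for illustration.

```python
from urllib.error import HTTPError
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def resumable_upload_target(bucket: str, name: str) -> str:
    """Build the request target that the log shows for a resumable-upload POST."""
    query = urlencode(
        {"alt": "json", "name": name, "uploadType": "resumable"},
        quote_via=quote,
        safe="",  # percent-encode "/" in object names too, as seen in the log
    )
    return f"/resumable/upload/storage/v1/b/{quote(bucket, safe='')}/o?{query}"


def probe_status(base_url: str, bucket: str, name: str) -> int:
    """POST an empty body to the given server and return the HTTP status code."""
    url = base_url.rstrip("/") + resumable_upload_target(bucket, name)
    request = Request(url, data=b"", method="POST")
    try:
        with urlopen(request) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is what we want here.
        return error.code
```

Assuming the emulator listens on `http://localhost:4443`, calling `probe_status("http://localhost:4443", "data", "aac/test1/example.csv")` would show whether the 404 appears without Beam involved. A real client also sends object metadata and upload headers, so a different status from this probe would not be conclusive on its own.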

@fsouza
Owner

fsouza commented Apr 22, 2022

@BigJerBD hey, would you be able to share a snippet on how to reproduce the issue? I can definitely look into this some time this weekend or early next week.

@fsouza fsouza added the bug label Apr 22, 2022
@wwwjn

wwwjn commented Apr 22, 2022

Hi @BigJerBD, I'm also trying to use Apache Beam FileSystems to upload and download files, but I keep getting the error: HttpError accessing <https://www.googleapis.com/resumable/upload/storage/v1/b/. It seems that Apache Beam keeps accessing www.googleapis.com no matter how I set the environment variable. Could you please share a snippet showing how you do this? Thanks a lot!

@BigJerBD
Author

I'll try this weekend to share a snippet that reproduces the error I had.

It's been a while, so I have probably lost it and will have to reproduce it again 😅

@wwwjn

wwwjn commented Apr 23, 2022

Hi, I monkey-patched Apache Beam to replace www.googleapis.com with fake-gcs-server, and then I got the same error as @BigJerBD (a 404!).
My script is test.py (Apache Beam version: apache-beam==2.36.0):

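import gzip

from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.filesystems import FileSystems

def test_GCS():
    URL = "gs://sample-bucket/test.gz"

    # Write to the test bucket. The payload is gzip-compressed by hand,
    # so Beam is told not to compress it again.
    with FileSystems.create(URL, compression_type=CompressionTypes.UNCOMPRESSED) as f:
        f.write(gzip.compress(b"hello world"))

if __name__ == "__main__":
    # Importing gcsio applies the monkey-patch before any upload happens.
    from .gcsio import *
    test_GCS()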

And this is the gcsio.py file, which is used to monkey-patch Apache Beam:

# Monkey-patch init function of GcsIO
import apache_beam.io.gcp.gcsio
from apache_beam.io.gcp.internal.clients import storage
from apache_beam.internal.gcp import auth
from apache_beam.internal.http_client import get_new_http

from google.auth.credentials import AnonymousCredentials

def new_init(self, storage_client=None):
    # When no client is supplied, build one that targets the local
    # fake-gcs-server instead of www.googleapis.com.
    if storage_client is None:
        storage_client = storage.StorageV1(
            url="http://0.0.0.0:4443/storage/v1/",
            credentials=auth.get_service_credentials(),
            get_credentials=False,
            http=get_new_http(),
            response_encoding='utf8'
        )
    self.client = storage_client
    self._rewrite_cb = None
    self.bucket_to_project_number = {}

# Monkey Patch the GcsIO to upload
apache_beam.io.gcp.gcsio.GcsIO.__init__ = new_init

And I got the following error with the resumable URL:
[image]
And the following output is from the fake-gcs-server Docker container:

time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"GET /storage/v1/b/sample-bucket?alt=json HTTP/1.1\" 200 153"

time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"POST /resumable/upload/storage/v1/b/sample-bucket/o?alt=json&name=test.gz&uploadType=resumable HTTP/1.1\" 404 59"

Thanks a lot for your help, and I hope this helps!
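
When the emulator log is longer than two lines, the failing requests can be picked out mechanically. This is a rough sketch that assumes the log keeps the shape shown above (an access-log entry embedded in `msg="..."` with escaped quotes); `ACCESS_LOG` and `failed_requests` are names made up for this example.

```python
import re

# Matches the embedded access-log fragment: \"METHOD target HTTP/x.y\" status size
# The backslashes before the quotes are optional so unescaped logs also match.
ACCESS_LOG = re.compile(
    r'\\?"(?P<method>[A-Z]+)\s+(?P<target>\S+) HTTP/[\d.]+\\?" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


def failed_requests(log_lines):
    """Return (method, path, status) for each logged request with status >= 400."""
    failures = []
    for line in log_lines:
        match = ACCESS_LOG.search(line)
        if match and int(match["status"]) >= 400:
            # Drop the query string so the route itself is easy to compare.
            path = match["target"].split("?", 1)[0]
            failures.append((match["method"], path, int(match["status"])))
    return failures
```

Run against the two lines above, this reports only the POST to `/resumable/upload/storage/v1/b/sample-bucket/o`, which is the route returning 404.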

@BigJerBD
Author

@wwwjn thank you very much for the snippet! That is indeed similar to what I did when I was trying to use fake-gcs-server.

Apache Beam or not, since this also gave a 404, I was wondering whether this feature is implemented in fake-gcs-server at all.

Thanks ! :)

@wwwjn

wwwjn commented May 10, 2022

Hi @fsouza, is there any progress on this bug? Thanks a lot for your help!

@fsouza
Owner

fsouza commented May 11, 2022

> Hi @fsouza, is there any progress on this bug? Thanks a lot for your help!

Hey, I haven't had a chance to look at it yet, but I assume the fix should be simple. I'll check it out in the coming weeks.

@wwwjn

wwwjn commented May 11, 2022

> > Hi @fsouza, is there any progress on this bug? Thanks a lot for your help!

> Hey, I haven't had a chance to look at it yet, but I assume the fix should be simple. I'll check it out in the coming weeks.

Thanks a lot! If there is anything I could do, feel free to just let me know!

@martinbjeldbak

For anyone like me coming from Google who simply wants to override the URL so that Apache Beam points to a fake-gcs-server URL, there's an issue tracking this here: apache/beam#21255

For now, the solution is still to patch the url in the test. This worked for me:

from unittest import mock

import apache_beam.io.gcp.internal.clients.storage

@mock.patch.object(apache_beam.io.gcp.internal.clients.storage.StorageV1, "BASE_URL",
                   "http://localhost:4443/storage/v1/")
def test_gcs_source():
    pass  # test implementation here should now call the emulator

where http://localhost:4443 is the URL of your fake-gcs-server instance.
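
If the emulator address differs between machines, the same patch can be built from an environment variable. The sketch below is only an illustration of that idea: `StandInClient` is a hypothetical stand-in class so the example runs without Beam installed, `emulator_base_url` and `patch_base_url` are made-up helper names, and reading `STORAGE_EMULATOR_HOST` here is an assumption of this example, not something Beam does by itself.

```python
import os
from unittest import mock


class StandInClient:
    """Hypothetical stand-in for a client class with a class-level BASE_URL."""

    BASE_URL = "https://www.googleapis.com/storage/v1/"


def emulator_base_url(env=None):
    """Derive the storage base URL from STORAGE_EMULATOR_HOST, with a local default."""
    env = os.environ if env is None else env
    host = env.get("STORAGE_EMULATOR_HOST", "http://localhost:4443")
    return host.rstrip("/") + "/storage/v1/"


def patch_base_url(client_cls, env=None):
    """Return a patcher that swaps client_cls.BASE_URL for the emulator URL."""
    return mock.patch.object(client_cls, "BASE_URL", emulator_base_url(env))


# Used as a context manager, the original value is restored on exit.
with patch_base_url(StandInClient, env={}):
    print(StandInClient.BASE_URL)
```

In a real test, `StandInClient` would be replaced by the `StorageV1` class from the snippet above. Whether patching `BASE_URL` also redirects the resumable upload endpoint is not established in this thread.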
