export_workspace: allow setting the region when configuring the workspaces #913

Open
VictorVerhaert opened this issue Oct 22, 2024 · 11 comments

@VictorVerhaert

Check which region is used when setting up the S3 client for a bucket, and allow the region to be set specifically for export_workspace.
Otherwise, workspaces that work on CDSE won't work on OTC, which has a different default S3 endpoint.
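
To make the request concrete, here is a minimal sketch of one possible precedence for choosing the region of a workspace's S3 client. The helper `effective_region`, the precedence order and the region names are assumptions for illustration, not the driver's actual code; `AWS_DEFAULT_REGION` is the standard environment variable read by the AWS SDKs.

```python
import os
from typing import Mapping, Optional


def effective_region(
    workspace_region: Optional[str],
    backend_default: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the region for the S3 client of one workspace.

    Precedence (an assumption for this sketch):
    1. region set on the workspace itself,
    2. AWS_DEFAULT_REGION from the environment,
    3. the backend-wide default.
    """
    if workspace_region:
        return workspace_region
    return env.get("AWS_DEFAULT_REGION") or backend_default


# Hypothetical region names, for illustration only.
print(effective_region("region-b", "region-a", env={}))
print(effective_region(None, "region-a", env={}))
```

With a rule like this, a workspace that pins its own region keeps pointing at the same bucket location regardless of which backend runs the job.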

@JeroenVerstraelen
Contributor

region = endpoint

Option 1:

  • Use the same bucket on OTC and WAW3-1
  • Bucket credentials given in WAW3-1 should be accessible in OTC

Option 2:

  • Create new bucket on OTC
  • Merge buckets over time

@pvbouwel
Contributor

pvbouwel commented Nov 14, 2024

In the draft standard for workspaces (https://github.com/Open-EO/openeo-api/blob/draft/extensions/workspaces/openapi.yaml#L41) each workspace has a workspace provider. This workspace provider could determine which region (endpoint) should be used.

@pvbouwel
Contributor

pvbouwel commented Nov 20, 2024

This relates to MVP1 of https://confluence.vito.be/pages/viewpage.action?spaceKey=EP&title=OpenEO+S3+access

It requires:

@pvbouwel
Contributor

It seems we do not yet have a way to provide config files for a job execution in geopyspark; most of the time environment variables are used. There are already config maps that are created for a Spark application (like the Prometheus config), but these seem to be managed by the Spark operator. It seems we need one of our own as well, to store:

  • the token file
  • the aws profiles
  • bucket to profile mappings (future)
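
As a rough illustration of what such a config file could contain, the sketch below renders one AWS shared-config profile that uses a web identity token file. The keys `region`, `role_arn` and `web_identity_token_file` are standard AWS shared-config settings; the helper name `render_aws_config`, the profile name, the role ARN and the token path are placeholders and not taken from any deployment.

```python
import configparser
import io


def render_aws_config(profile: str, region: str, role_arn: str, token_file: str) -> str:
    """Render one profile of an AWS shared config file as text."""
    config = configparser.ConfigParser()
    # Named profiles live in sections called "profile <name>" in the config file.
    section = f"profile {profile}"
    config[section] = {
        "region": region,
        "role_arn": role_arn,
        "web_identity_token_file": token_file,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# All values below are placeholders, not taken from any real deployment.
text = render_aws_config(
    profile="my-workspace",
    region="region-b",
    role_arn="arn:aws:iam::000000000000:role/example",
    token_file="/opt/job-config/token",
)
print(text)
```

A file like this could be mounted from a config map and selected through the standard `AWS_CONFIG_FILE` environment variable, which would fit the existing habit of configuring jobs through environment variables.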

@pvbouwel
Contributor

First add public and private keys to each of the environments (CDSE), as that is a prerequisite for the web identity tokens.

@pvbouwel
Contributor

pvbouwel commented Nov 22, 2024

Public and private keys for the IDP are staged in Vault. For each environment I created a new version of the Vault object we keep, adding the 2 fields that are generated using:

echo -e "y\n\n\n" | ssh-keygen -t rsa -b 2048 -m PEM -f rsa &&
ssh-keygen -f rsa -e -m PEM > rsa.pub &&
echo -e "  \"idp_private_key\": \"$(cat rsa | tr '\n' '@' | sed 's/@/\\n/g' | sed 's/\\n$//' | sed 's/\\n/\n/'g | base64 | tr -d '\n')\",\n  \"idp_public_key\": \"$(cat rsa.pub | tr '\n' '@' | sed 's/@/\\n/g' | sed 's/\\n$//' | sed 's/\\n/\n/'g | base64 | tr -d '\n')\","
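
For readers who find the `tr`/`sed` chain hard to follow: as far as I can tell (assuming GNU sed), its net effect is to drop the final newline of each key file and base64-encode the rest on a single line. A minimal Python sketch of that encoding step is shown below; the helper names `encode_key` and `vault_fields` are hypothetical, key generation itself is left to `ssh-keygen`, and the dummy bytes stand in for real key material.

```python
import base64
import json


def encode_key(pem: bytes) -> str:
    """Base64-encode PEM bytes after dropping one trailing newline."""
    if pem.endswith(b"\n"):
        pem = pem[:-1]
    return base64.b64encode(pem).decode("ascii")


def vault_fields(private_pem: bytes, public_pem: bytes) -> str:
    """Return the two fields as a JSON object string."""
    return json.dumps(
        {
            "idp_private_key": encode_key(private_pem),
            "idp_public_key": encode_key(public_pem),
        },
        indent=2,
    )


# Dummy bytes stand in for the contents of the `rsa` and `rsa.pub` files.
print(vault_fields(b"-----BEGIN DUMMY-----\nabc\n-----END DUMMY-----\n", b"pub\n"))
```

Unlike the shell one-liner, this prints a complete JSON object rather than two lines to paste into an existing one.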

@pvbouwel
Contributor

Deployed a build version of the STS and S3 proxy with the token trust to cdse-staging, but when I run the artifacts upload workflow it fails if I don't specify the region explicitly. That would be a regression, so I must fix that first.

@pvbouwel
Contributor

Now running with latest build version and with s3 headless:

sh-4.4$ ping s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local
PING s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local (10.153.2.72) 56(84) bytes of data.
^C
--- s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9240ms

sh-4.4$ dig s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local

; <<>> DiG 9.11.26-RedHat-9.11.26-6.el8 <<>> s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28673
;; flags: qr aa rd; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 38cdc92fec70699c (echoed)
;; QUESTION SECTION:
;s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	IN A

;; ANSWER SECTION:
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.2.72
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.1.199
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.0.178
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.0.3
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.1.112
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.3.26
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.4.80
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.0.185
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.2.197
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.2.55
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.1.44
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.4.196
s3-s3proxy-fakes3pp-headless.s3proxy.svc.cluster.local.	15 IN A	10.153.1.4

;; Query time: 3 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Thu Nov 28 12:32:13 UTC 2024
;; MSG SIZE  rcvd: 1005

We also have the S3Proxy available on a host port, but to avoid problems accessing it from Traefik we had to add a security group rule.
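
In the output above the headless service name resolves to 13 A records while ping reports 100% packet loss. ICMP may simply be filtered, so a TCP connect to each resolved address is probably a more telling check. The sketch below does that with the standard library only; the function name `probe_endpoints` is hypothetical and the port the proxy listens on is not stated in this thread, so it is a parameter here. The demo runs against a throwaway local listener so it works outside the cluster.

```python
import socket
from typing import Dict


def probe_endpoints(host: str, port: int, timeout: float = 2.0) -> Dict[str, bool]:
    """Resolve every IPv4 address of `host` and try a TCP connect to each."""
    results: Dict[str, bool] = {}
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    for _family, _type, _proto, _canon, sockaddr in infos:
        address = sockaddr[0]
        try:
            with socket.create_connection((address, port), timeout=timeout):
                results[address] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[address] = False
    return results


# Demo against a throwaway local listener, so the sketch runs anywhere.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    print(probe_endpoints("127.0.0.1", listener.getsockname()[1]))
```

Inside the cluster one would pass the headless service name and the proxy's actual port instead of the local listener.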

@pvbouwel
Contributor

While we will use profiles to enable these special access cases, we will use the region to identify the target where the bucket resides. We can keep the profile name the same as the workspace name, but we do need to track the region in the backend. There is a bigger effort under way to support profiles, but in the meantime I will add region as an optional parameter in the config, as that will ease roll-out at a later stage (allowing different types of config).
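
A minimal sketch of what an optional region in the workspace config could look like, assuming a hypothetical `WorkspaceConfig` class and `parse_workspace` helper; the field names are illustrative and this is not the code that was actually added. Existing configs without a region keep parsing unchanged, and the profile name is derived from the workspace name as described above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class WorkspaceConfig:
    """Hypothetical workspace config; field names are assumptions."""

    name: str
    bucket: str
    region: Optional[str] = None  # None means: fall back to the backend default

    @property
    def profile_name(self) -> str:
        # The profile name is kept identical to the workspace name.
        return self.name


def parse_workspace(name: str, raw: Mapping[str, Any]) -> WorkspaceConfig:
    """Build a config from a raw mapping; `region` may be absent."""
    return WorkspaceConfig(name=name, bucket=raw["bucket"], region=raw.get("region"))


# Placeholder workspace, bucket and region names.
print(parse_workspace("ws-old", {"bucket": "bucket-a"}))
print(parse_workspace("ws-new", {"bucket": "bucket-b", "region": "region-b"}))
```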

@pvbouwel
Contributor

Created #955 just to allow the region to be specified in the config.

@pvbouwel
Contributor

Implemented https://github.com/eu-cdse/openeo-cdse-infra/issues/328 in order to have the OTC setup prepared. Using the token from a local profile works correctly.


5 participants