We have observed some responsiveness issues in the UI of our Kubernetes deployments. After investigating, we discovered that the API calls made to fetch pipelines and jobs were pending and eventually timing out. Digging deeper, this was caused by the controller calling the GCP metadata server too often:
object_store::gcp::credential::fetching token from metadata server
Which results in:
object_store::client::retry::Encountered transport error (error sending request for url (http://metadata/computeMetadata/v1/instance/service-accounts/default/token?audience=https%3A%2F%2Fwww.googleapis.com%2Foauth2%2Fv4%2Ftoken): error trying to connect: operation timed out) backing off for 0.1 seconds, retry 1 of 10
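The log line above suggests each failed token request is retried with exponential backoff. As a rough illustration of why API calls could stay pending for so long, the std-only sketch below computes a backoff schedule under assumed parameters (100 ms initial delay, doubling, 15 s cap, 10 retries). These values are hypothetical, chosen only to match "backing off for 0.1 seconds, retry 1 of 10"; they are not read from object_store's actual configuration, and the real client may also randomise the delays.

```rust
use std::time::Duration;

/// Computes the delay before each retry: `init`, then multiplied by `base`
/// each time, never exceeding `max`. The parameters passed in `main` are
/// assumptions, not object_store's real retry settings.
fn backoff_schedule(init: Duration, base: u32, max: Duration, retries: u32) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(retries as usize);
    let mut current = init;
    for _ in 0..retries {
        delays.push(current.min(max));
        // Grow the delay geometrically, falling back to the cap on overflow.
        current = current.checked_mul(base).unwrap_or(max);
    }
    delays
}

fn main() {
    let delays = backoff_schedule(Duration::from_millis(100), 2, Duration::from_secs(15), 10);
    for (i, d) in delays.iter().enumerate() {
        println!("retry {} of {}: back off {:?}", i + 1, delays.len(), d);
    }
    // Sum of all sleeps, excluding the time spent waiting on connect timeouts.
    let total: Duration = delays.iter().sum();
    println!("total backoff: {:?}", total);
}
```

Under these assumptions the sleeps alone add up to 55.5 s per token request, before counting the connection timeouts themselves, which would be enough to keep an API call pending until it times out.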
In an attempt to reduce the number of object_store::gcp::credential calls to the metadata server, I switched from Kubernetes SA authentication to passing the serialised JSON of the SA credentials in the GOOGLE_SERVICE_ACCOUNT_KEY environment variable. This did reduce the number of calls, but it did not resolve the unresponsive Arroyo API. I also tried hacking object_store into using the S3 client to authenticate to GCS via AWS_DEFAULT_REGION and AWS_ENDPOINT, but had no luck getting authentication to work...
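The general idea behind cutting metadata-server calls is to reuse one token until shortly before it expires, so that many concurrent API requests share a single fetch. The std-only sketch below shows that pattern; `Token`, `TokenCache`, the `fetch` closure (standing in for the HTTP request to the metadata server) and the 60-second refresh margin are all hypothetical, and this is not a description of how object_store or Arroyo actually cache credentials.

```rust
use std::cell::Cell;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical token type: a value plus the instant it stops being valid.
struct Token {
    value: String,
    expires_at: Instant,
}

/// Hypothetical cache: returns the stored token while it has more than
/// `refresh_margin` left, and only calls `fetch` otherwise. Holding the lock
/// during the fetch makes concurrent callers wait for one request instead of
/// each issuing their own.
struct TokenCache<F: Fn() -> Token> {
    fetch: F,
    refresh_margin: Duration,
    current: Mutex<Option<Token>>,
}

impl<F: Fn() -> Token> TokenCache<F> {
    fn new(fetch: F, refresh_margin: Duration) -> Self {
        Self { fetch, refresh_margin, current: Mutex::new(None) }
    }

    fn get(&self) -> String {
        let mut guard = self.current.lock().unwrap();
        let now = Instant::now();
        // A token is reusable only if it outlives the refresh margin.
        let fresh = matches!(&*guard, Some(t) if t.expires_at > now + self.refresh_margin);
        if !fresh {
            *guard = Some((self.fetch)());
        }
        guard.as_ref().unwrap().value.clone()
    }
}

fn main() {
    // Counts how many times the (simulated) metadata server is contacted.
    let calls = Cell::new(0);
    let cache = TokenCache::new(
        || {
            calls.set(calls.get() + 1);
            Token {
                value: format!("token-{}", calls.get()),
                expires_at: Instant::now() + Duration::from_secs(3600),
            }
        },
        Duration::from_secs(60),
    );
    for _ in 0..100 {
        cache.get();
    }
    println!("metadata server calls: {}", calls.get());
}
```

With a one-hour token, the 100 lookups in `main` trigger a single simulated fetch; a token with less than the margin left would be refetched on every call.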
Here are the screenshots of what that looks like in practice:
Happy to help if there's anything I can do!
Still facing weird UI behaviour with the cache mechanism in the v0.11.0 release, unfortunately :/ The workers seem to run as expected, but checkpoint durations suddenly jump from a couple of seconds to several minutes, for all pipelines at the same moment. In the UI, the pipelines page returns partial and varying results each time it is refreshed.