Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set the configured protocol transport for service metadata #9490

Merged
merged 4 commits into from
Aug 26, 2024

Conversation

butonic
Copy link
Member

@butonic butonic commented Jun 27, 2024

This allows using dns or unix as the grpc protocol for services. Required reva changes have been bumped.

related:

With this PR we can configure the grpc clients to use dns:///service with headless services in kubernetes to gracefully handle connection errors when a pod goes down.
To also handle scale up we need to make grpc servers force close connections by setting the keep alive parameter:

@butonic
Copy link
Member Author

butonic commented Jun 27, 2024

when registering the service with dns:/// we may have to register the domain name ... not the ip address. needs rethinking. maybe configure the svc endpoints with dns:///gateway... and skip the micro service registry if the endpoint starts with dns:///. That would allow reverting the if protocol == dns; protocol = 'tcp' hack.

@butonic
Copy link
Member Author

butonic commented Jun 27, 2024

meh but now we need to configure all service names as dns:/// and cannot use the port lookup ... so all occurences of the service name need to be made configurable, eg. the gateway now needs env vars so we can configure the port ....

or we register the service as is and read the port and dns name from metadata? nah how would we know the dns service name from the grpc server config, we need to make the service names configurable for clients ... 🤔

@butonic
Copy link
Member Author

butonic commented Jun 27, 2024

actually, none of the changes are necessary. the transport is ignored by clients anyway. the reva pool will just skip the service lookup if the address has a dns:/// prefix. but we need to make all occurences of the service names for client configuration configurable... preferably as env vars ...

@butonic
Copy link
Member Author

butonic commented Jun 28, 2024

I added a commit that makes the endpoints used by the gateway configurable again. Together with cs3org/reva#4744 we can now test how ocis scales in kubernetes.

To enable retries for cs3 grpc clients (unything that uses the reve pool to get a selector) the endoint must not be configured as a service name, eg. GATEWAY_STORAGE_USERS_ENDPOINT=com.owncloud.api.storage-users (the default), but as a grpc service name including port, eg. GATEWAY_STORAGE_USERS_ENDPOINT=kubernetes:///storage-users:9157. See the upstream docs on how to name grpc services.

More things to test

  • the storage-system service does not need use the service registry to talkh to itself. we can configure the new STORAGE_SYSTEM_ENDPOINT=127.0.0.1:9215 ... maybe that needs a different env name as the service also exposes an http endpoint. done with Omit some lookups in gateway #9715
  • test dns:/// grpc names with headless services? everything I read suggests using the kubernetes api as kuberesolver does is the way to go.
  • for a single node deployment we could use unix sockets and use paths like unix:/run/ocis/{servicename}-{uuid}.sock

@butonic
Copy link
Member Author

butonic commented Jun 28, 2024

Added a commit to fix the startup of ocis by parsing the OCIS_REVA_GATEWAY early.

Double checked this works by starting ocis with a gateway that listens on a unix socket:

					"OCIS_REVA_GATEWAY": "unix:ocis-reva-gateway.sock",
					"GATEWAY_GRPC_ADDR": "ocis-reva-gateway.sock",
					"GATEWAY_GRPC_PROTOCOL": "unix",
					"STORAGE_USERS_GATEWAY_GRPC_ADDR": "unix:ocis-reva-gateway.sock",
					"WEB_GATEWAY_GRPC_ADDR": "unix:ocis-reva-gateway.sock",
  • WEB_GATEWAY_GRPC_ADDR does not pick up OCIS_GATEWAY_GRPC_ADDR. Why?
  • GATEWAY_GRPC_ADDR does pick up OCIS_GATEWAY_GRPC_ADDR but it should not. OCIS_GATEWAY_GRPC_ADDR is used to configure clients. GATEWAY_GRPC_ADDR is used to configure the server listener ... it takes into account the GATEWAY_GRPC_PROTOCOL

moved to dedicated issue: #9718

@butonic butonic force-pushed the set-service-transport branch from fe4f543 to 93a86f1 Compare July 15, 2024 09:53
@butonic butonic self-assigned this Jul 16, 2024
@butonic butonic force-pushed the set-service-transport branch from 93a86f1 to 2b390e0 Compare July 30, 2024 15:23
StorageSharesEndpoint string `yaml:"-"`
AppRegistryEndpoint string `yaml:"-"`
OCMEndpoint string `yaml:"-"`
UsersEndpoint string `yaml:"users_endpoint" env:"GATEWAY_USERS_ENDPOINT" desc:"The USERS API endpoint." introductionVersion:"%%NEXT%%"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are a lot of new envvars. Can't we make it more convenient for devops to configure them? I imagine they don't want to maintain another 13 envvars.

If it's only about the protocol, can't we have a global envvar for that and just add hardcoded service names?

If that is not possible, maybe it makes sense to NOT support env configuration? One could still add them to their yaml file if they so choose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our current default uses the service name. and we need to make this configurable to test dns based load balancing that skips the registry lookup. the grpc client will then use the configured dns:///service or kubernetes:///service name to look up all available endpoints. that is why it has to be 13 different endpoint.

If we decide to change the default for eg unix sockets on single node deployments we need to change the helm charts as well. but since we do not change the default from the existing hardcoded service name this is not breaking the chart.

For testing the load balancing changing env vars in kubernetes is actuallly a lot easier than providing custom yaml files.

So I don't see any inconvenience for devops right now. And if we decide to change the default WE - as in engineering not devops - need to update the helm charts anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then fine for me. But if it is real envvars they need a proper description please. The USERS API endpoint. is not explanatory enough.

@butonic butonic requested a review from kobergj July 31, 2024 13:22
@mmattel
Copy link
Contributor

mmattel commented Aug 1, 2024

Not talking about the endpoint envvars like xxx_AUTH_BASIC_ENDPOINT, but:

Taking your comment from above, it makes imho not that much sense to configure all services individually, see your example below.

"OCIS_REVA_GATEWAY": "unix:ocis-reva-gateway.sock",
"GATEWAY_GRPC_ADDR": "ocis-reva-gateway.sock",
"GATEWAY_GRPC_PROTOCOL": "unix",
"STORAGE_USERS_GATEWAY_GRPC_ADDR": "unix:ocis-reva-gateway.sock",
"WEB_GATEWAY_GRPC_ADDR": "unix:ocis-reva-gateway.sock",

Why not using additional global envvars and add them to each affecting service envvar to define defaults?
Any deviations are usually made by custom settings in the service envvar...

OCIS_GRPC_ADDR:     "ocis-reva-gateway.sock"
OCIS_GRPC_PROTOCOL: "unix"

Note that endpoint envvar descriptions need and update because The AUTH BASIC API endpoint. is by far not suffcient...

@butonic butonic force-pushed the set-service-transport branch 3 times, most recently from db19b78 to 892c51a Compare August 12, 2024 13:07
@butonic
Copy link
Member Author

butonic commented Aug 12, 2024

@kobergj please rereview

Copy link
Contributor

@mmattel mmattel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@butonic

  • Taking the pattern: GATEWAY_service_name_ENDPOINT
    This means that some, but not all servcies need to have an endpoint definition in the gateway service. We have a checklist when a new service is created. This needs to be added as item and a description when applicapable.
  • The README.md of the gateway service needs an update to describe the new settings (with more words) as stated in the changelog: "This allows configuring services to listan on tcp or unix sockets and clients to use the dns, kubernetes or unix protocol URIs instead of service names."
  • GATEWAY_PERMISSIONS_ENDPOINT --> The endpoint of the settings service.
    The naming of the envvar does not match the serivce name.
  • There is a new envvar COLLABORATION_GRPC_PROTOCOL but just for that service. In former commits, we had that scheme in many other services too...?

@butonic
Copy link
Member Author

butonic commented Aug 14, 2024

@butonic

* Taking the pattern: `GATEWAY_service_name_ENDPOINT`
  This means that some, but not all servcies need to have an endpoint definition in the gateway service. We have a [checklist](https://owncloud.dev/services/general-info/new-service-checklist/) when a new service is created. This needs to be added as item and a description when applicable.

When the gateway needs to talk to a new service it needs to follow the pattern. This maght happen when the CS3 API sees changes that affect the way the gateway interacts with other CS3 services. But that is not part of the process of introducing a new service.

* The [README.md](https://github.com/owncloud/ocis/blob/master/services/gateway/README.md) of the gateway service needs an update to describe the new settings (with more words) as stated in the changelog: "This allows configuring services to listen on `tcp` or `unix` sockets and clients to use the `dns`, `kubernetes` or `unix` protocol URIs instead of service names." 

I plan to add that to the documentation with an example that replaces all internal tcp connections with unix sockets. In a subsequent PR. As that requires more testing I did not want to document this change in depth. The idea is to also test the dns and kubernetes protocols for grpc clients instead of the service registry. We need to test all these different options before fully documenting them. To do that we need to make ocis configurable ... so ... more details and recommendations will follow in a subsequent PR.

* `GATEWAY_PERMISSIONS_ENDPOINT` --> `The endpoint of the settings service.`
  The naming of the envvar does not match the serivce name.

well, the settings service implements the gRPC permissions endpoint. The documentation tells you what to set this to, the gateway only needs the permissions endpoint of the service. That is why the description differs.

* There is a new envvar `COLLABORATION_GRPC_PROTOCOL` but just for that service. In former commits, we had that scheme in many other services too...?

Yes, it was missing from the collaboration service. All other services could already be configured to use unix sockets. this can now also be done for the collaboration service.

Sound remarks! Alas, this PR is more intended to make changing and testing the way ocis behaves in kubernetes possible. The results of that testing will come in another PR.

@butonic butonic requested a review from mmattel August 17, 2024 11:18
@mmattel
Copy link
Contributor

mmattel commented Aug 17, 2024

Thanks for the explanations.

  1. With respect that all this gets (well 😅 ) documented in various places, I believe so and it will come - but I will keep a hawk eye on, as always 😄
    I still believe that GATEWAY_service_name_ENDPOINT needs to be part of the new service documentation (maybe I did not get the point why not) but it belongs to the doc part in this item.

  2. COLLABORATION_GRPC_PROTOCOL
    OK, I got it. But when this now gets configurable with more than tcp which is the current default and only value, I strongly advise to generate a global envvar like OCIS_GRPC_PROTOCOL in all services that use it grep -rn _GRPC_PROTOCOL. Imho it really does not make sense to have no global envvar for that. This should be added in this PR.

  3. GATEWAY_PERMISSIONS_ENDPOINT naming
    I disagree on your explanation. It maybe ok for devs but in no way for admins who have to deal with. The naming of the envvar must be GATEWAY_SETTINGS_ENDPOINT because the description text tells it so. Alternatively it could be named GATEWAY_SETTINGS_PERMISSIONS_ENDPOINT which than would meet both requirements.

You can of course overrule my change request (comments), but 2/3 should be fixed here and not in another PR...

@ScharfViktor ScharfViktor mentioned this pull request Aug 20, 2024
21 tasks
@butonic butonic force-pushed the set-service-transport branch from 892c51a to ebb93a1 Compare August 23, 2024 18:45
@butonic
Copy link
Member Author

butonic commented Aug 23, 2024

@mmattel I updated the gateway readme to give a hint at what this can do. but it really is only meant as a way to explore different types of deployment.

I also added a OCIS_GRPC_PROTOCOL env var. I do not use it in the readme, because I left out the collaboration service in my testing.

Lastly, I changed the description of the GATEWAY_PERMISSIONS_ENDPOINT to aling with eg. STORAGE_USERS_OCIS_PERMISSIONS_ENDPOINT which also uses 'permissions service' in the description and uses the com.owncloud.api.settings as the default value. If this needs clarification we can do it in a subsequent PR.

@butonic butonic force-pushed the set-service-transport branch from ebb93a1 to 78c160f Compare August 23, 2024 18:51
services/gateway/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@mmattel mmattel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one small suggestable commit, the rest is 👍

We should, I have first seen that by your comment:
GATEWAY_PERMISSIONS_ENDPOINT and STORAGE_USERS_OCIS_PERMISSIONS_ENDPOINT clarify somehow that permissions is not a service as written - but this is only a docs thing...

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>

Update services/gateway/README.md

Co-authored-by: Martin <[email protected]>

fix env tag

Signed-off-by: Jörn Friedrich Dreyer <[email protected]>
@butonic butonic force-pushed the set-service-transport branch from 81ae9c6 to e552196 Compare August 26, 2024 09:03
@butonic butonic dismissed kobergj’s stale review August 26, 2024 09:04

all adressed

Copy link

@butonic butonic merged commit c8ac43f into master Aug 26, 2024
4 checks passed
@butonic butonic deleted the set-service-transport branch August 26, 2024 10:06
ownclouders pushed a commit that referenced this pull request Aug 26, 2024
set the configured protocol transport for service metadata
mmattel added a commit that referenced this pull request Aug 26, 2024
References: #9490 (set the configured protocol transport for service metadata)

As discussed with @butonic, adding that _configurable service endpoints_ are currently experimental.
@micbar micbar mentioned this pull request Sep 12, 2024
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants