Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISCUSSION: proxy: add CORS headers to /twitcher/, affect all services behind it #450

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tlvu
Copy link
Collaborator

@tlvu tlvu commented Apr 26, 2024

Overview

This PR is rather to start a discuss than ready to merge. That's why there is no CHANGES.md update.

So I needed to add CORS allow headers for Thredds, so our partner javascript webapps running on other domains than pavics.ouranos.ca can hit our Thredds, so we act as the backend for their frontend.

  1. Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

  2. By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header (optional-components/x-robots-tag-header) ! Is that expected? Or the way I add headers to Twitcher is wrong? I was just doing the same thing as all the existing services. The X-Robots-Tag header is important to avoid being hit by crawlers.

birdhouse/config/magpie/config/proxy/conf.extra-service.d/magpie.conf.template
5:        include /etc/nginx/conf.d/cors.include;

birdhouse/deprecated-components/ncwms2/config/proxy/conf.extra-service.d/ncwms2.conf.template
5:    #    include /etc/nginx/conf.d/cors.include;

birdhouse/components/cowbird/config/proxy/conf.extra-service.d/cowbird.conf.template
8:        include /etc/nginx/conf.d/cors.include;

birdhouse/components/weaver/config/proxy/conf.extra-service.d/weaver.conf.template
16:        include /etc/nginx/conf.d/cors.include;

birdhouse/components/stac/config/proxy/conf.extra-service.d/stac.conf.template
12:        include /etc/nginx/conf.d/cors.include;
  1. Is our current CORS headers way too permissive?

This is what we return https://github.com/bird-house/birdhouse-deploy/blob/97ee8da24821391aeef52b13ea9adda28f919085/birdhouse/components/proxy/conf.d/cors.include

Access-Control-Allow-Origin: *                                                                                                                                                           
Access-Control-Allow-Methods: GET, POST, OPTIONS                                                                                                                                         
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range
Access-Control-Expose-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range

This is already enough

Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Authorization
Access-Control-Allow-Methods: POST, GET, OPTIONS

I think perhaps we should not allow-origin * but a list of known partners domain? And trim down the allow-headers list?

I am not security expert so I want to hear from you guys.

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

@huard
Copy link
Collaborator

huard commented Apr 26, 2024

Proposal to white list https://raven.uwaterloo.ca/

@mishaschwartz
Copy link
Collaborator

Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

Unfortunately I don't think it will even do that.

If we want to apply the cors headers everywhere, why not apply it at the proxy level instead of on twitcher. Not all requests will go through the twitcher proxy necessarily.

Some components (geoserver, jupyterhub, all the monitoring components, secure-data-auth) just use twitcher's verify endpoint to check whether a user has access, in those cases, the cors headers wouldn't be included.

Jupyterhub and the monitoring components probably won't matter for cross-site scripts but the others will matter I'm sure.

By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header

Is this when making a cross-origin request or everywhere?

You might have to add the X-Robots-Tag to the headers listed in Access-Control-Allow-Headers and Access-Control-Expose-Headers otherwise they won't be sent (but we'll have to test it to be sure)

Is our current CORS headers way too permissive?

I think perhaps we should not allow-origin * but a list of known partners domain?

This would have to be configurable as not every deployment would want the same domains.
We should also think about whether we would need to set Access-Control-Allow-Credentials as well.

And trim down the allow-headers list?

This one will require some thought. Do you know if there are recommended best-practices for setting these?

@tlvu
Copy link
Collaborator Author

tlvu commented Apr 26, 2024

Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

Unfortunately I don't think it will even do that.

Sorry I meant adding CORS headers to the proxy location /twitcher/ for Twitcher. So the change is actually in Nginx and not in Twitcher but for Twitcher and all services behind it. See my initial code change.

By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header

Is this when making a cross-origin request or everywhere?

Everywhere. But I think I understood how this works. X-Robots-Tag: noindex, nofollow was added to the root location. But all child location directive do not inherit but override all added headers so to keep any headers added by the parent, the child location will have to repeat those headers.

So probably each time that cors.include file was included, we lost X-Robots-Tag header ! So we've been losing that header for magpie, weaver, cowbird, stac and we didn't even know !

I think perhaps we should not allow-origin * but a list of known partners domain?

This would have to be configurable as not every deployment would want the same domains. We should also think about whether we would need to set Access-Control-Allow-Credentials as well.

Agreed it has to be configurable.

I was reading how to make it configurable, it's not so simple. I didn't know "if is evil" in Nginx config and our current cors.include file uses if !

Some interesting read I found:

https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/

http://agentzh.blogspot.com/2011/03/how-nginx-location-if-works.html

https://www.juannicolas.eu/how-to-set-up-nginx-cors-multiple-origins/

And trim down the allow-headers list?

This one will require some thought. Do you know if there are recommended best-practices for setting these?

Have not had time but yes we should follow some newer best practices. That cors.include file was there even before I started at Ouranos.

@fmigneault
Copy link
Collaborator

fmigneault commented Apr 26, 2024

Unfortunately I don't think it will even do that.

If we want to apply the cors headers everywhere, why not apply it at the proxy level instead of on twitcher. Not all requests will go through the twitcher proxy necessarily.

That is my current understanding of what is already applied by the following, which includes cors.include:

include /etc/nginx/conf.d/*.conf;

Setting it under /twitcher/ "should" only redefine it the same way (it WON'T though, see next).

1. Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

2. By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header (optional-components/x-robots-tag-header) ! Is that expected? Or the way I add headers to Twitcher is wrong? I was just doing the same thing as all the existing services. The X-Robots-Tag header is important to avoid being hit by crawlers.

if inside location definitions are big no-no: https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/
It's not a thing about "overriding headers or not", it is just undefined behavior.
if are perfectly valid at the server level (as they are currently applied).

Is our current CORS headers way too permissive?

Maybe?
But good consideration must be given about which set of Origins are defined if set explicitly.
If only the partner's Origin is set, any following partner or browser JavaScript using the server as a backend will suddenly be refused.
Since the purpose of the platform is to provide access to services, I'm not sure adding strict origin control will change much other than make our maintenance harder.
If configurable, I don't mind, as long as the default remains as the current value.

For the Access-Control-Allow-Headers, I think currents ones are good. Some of them are for caching or prechecks, which can help reduce request/response time/content-size if supported by a service. They shouldn't hurt. The only one I don't know is X-CustomHeader.

@mishaschwartz
Copy link
Collaborator

Sorry I meant adding CORS headers to the proxy location /twitcher/ for Twitcher. So the change is actually in Nginx and not in Twitcher but for Twitcher and all services behind it. See my initial code change.

I understand... what I was saying is that there are components that are not behind twitcher that will not be affected by this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants