Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port not provided in URLs produced with url_for #1380

Closed
eric-spitler opened this issue Oct 25, 2018 · 15 comments · Fixed by #2052
Closed

Port not provided in URLs produced with url_for #1380

eric-spitler opened this issue Oct 25, 2018 · 15 comments · Fixed by #2052

Comments

@eric-spitler
Copy link

Describe the bug
The default port for a Sanic app is 8000, which by HTTP standards is a custom port. This port can also be set during calls to app.run().

When using app.url_for('handler.name', _external=True), the resultant URL does not contain the port without manual intervention.

This line is the final step before the application returns the URL. The URL is build using urlunparse which takes a tuple of values as would be producted by urlparse. The snippet below shows that only the first 6 arguments are provided, which does not include the port (unless included as part of netloc).
https://github.com/huge-success/sanic/blob/905c51bef03818c629173938a1183b38dd0ebc65/sanic/app.py#L695

In order to get port, the value of app.confg.SERVER_NAME must be set to the hostname and port

app.config.SERVER_NAME = '{}:{}'.format(host, port)
app.run(host=host, port=port)

Thus, the default behavior for creating the URL is incorrect as it requires manual attribute setting for every single application.

Expected behavior
Calls to app.url_for() with the _external flag should build the correct URL, including the port number, such as http://hostname:8000 without forcing every application to specify the SERVER_NAME config attribute.

Environment (please complete the following information):

  • OS: CentOS 7.5
  • Version 0.7.0
@eric-spitler
Copy link
Author

Thinking on this more last night, I can see the use case for things like Cloud Foundry deploys where the internal port being served would be obfuscated by the cloud instance, so the application should report :80 or :443, even if its running at :8000.

Perhaps a flag then to enable port inclusion?

@sjsadowski
Copy link
Contributor

Port 80 is the default port for HTTP and 443 is default for https, if the external URL is serving something like 'http://www.example.com/blah' then there's no real reason for port inclusion - however in every other case where the port is non-standard, I agree with your assessment that it should be included in the url. Possibly just inclusion of the port for the external uri should be enough. I'm sure one of the better devs will weigh in on it.

@eric-spitler
Copy link
Author

Sorry, when I said "should report :80" I meant that the application would need to generate URLs as if it were on that port (despite running on 8000), meaning the behavior as it stands right now. But this is also just for the case where there is an obfuscation layer or load balancer between callers and the application.

@ahopkins
Copy link
Member

@eric-spitler I think I am understanding now ... You are asking for a setting for a "base URL" that would be used to create an absolute URL that is different than what is being exposed.

For example, I have a sanic instance running as http://auth-service:7777/, but there is a load balancer ahead of it that proxies http://mydomain.com/auth to http://auth-service:7777/. Is this the use case?

@eric-spitler
Copy link
Author

Consider the default use case, a simple app serving an endpoint that returns the result of app.url_for:

from sanic.app import Sanic
from sanic.response import text

app = Sanic()
app.config.SERVER_NAME = 'localhost'

@app.route('/test', name='test', methods=['GET'])
def make_url(request):
        return text(request.app.url_for('test', _external=True))

app.run()

The application starts up on 127.0.0.1 (default host) and port 8000 (also default). Hitting that simple endpoint returns:

> curl http://127.0.0.1:8000/test
http://localhost/test

Butting hitting the URL returned by app.url_for results in:

> curl http://127.0.0.1/test
curl: (7) Failed connect to 127.0.0.1:80; Connection refused

My point in this issue is that the URL generated by app.url_for does not by its nature return the operating port - it requires setting the following in order to return the actual URL:

app.SERVER_NAME = 'localhost:8000'

The more complex use case is where the application is running behind another interface, such as Cloud Foundry or AWS load balancers, in which case the developer must set app.SERVER_NAME anyway to return the outward-visible hostname. In these cases, the internal port doesn't matter, so calls to app.url_for will commonly need to produce a URL without a port - the load balance should be serving it at 80/443. Considering that, the default behavior in Sanic right now seems as though it is intended to be behind another interface since it omits the port by default. However, many uses of Sanic will not be done this way, they will serve on a specific port and the generated URL should reflect that by default.

As it stands, the developer must always set the following, but the documentation does not make that immediately apparent.

app.SERVER_NAME = '{}:{}'.format(hostname, port)
app.run(host=hostname, port=port)

I propose 2 possible updates:

The second bullet will support both the simple and complex use cases I spoke of. Default the value to True such that the simple example I provided works as-is. Setting it to False would be for the complex case where it is generating a URL that should never have the port because it is being another interface.

Hopefully that makes more sense? Sorry if I was unclear before.

@ahopkins
Copy link
Member

Thanks for the explanation @eric-spitler.

I certainly feel a documentation update is in order here. That makes sense.

As for the flag, I am not opposed to having an include_port=True parameter. But, I would default it to False so that the behavior does not suddenly change for users. Yes, that means the "more complex" use case would be default, which seems sort of backwards, but I feel keeping backwards compatibility on this is a bigger priority. Otherwise unsuspecting developers would all of a sudden start getting ports unexpectedly on their URLs if they upgraded without reading the changelog.

@lixxu
Copy link
Contributor

lixxu commented Oct 29, 2018

I suggest not use include_port as parameter name due to current special parameters all starts with _ (may use _port for it, and default to None, and also can get the value from configuration like SERVER_NAME does).
Then in url_for function, check _port, if has value, show it, else don't show port in the url.

@sjsadowski
Copy link
Contributor

@lixxu I don't understand your argument; _var indicates semiprivate which sanic honors - for more info.

@ahopkins
Copy link
Member

@lixxu I think the point @eric-spitler was making is that it should be configurable, and transparent.

@eric-spitler
Copy link
Author

@lixxu If it pulls from configuration, would that config be set implicitly during the call to app.run()?

Having to specify _port as a number as part of the url_for call defeats the transparency idea that @ahopkins mentioned.

@lixxu
Copy link
Contributor

lixxu commented Oct 30, 2018

SERVER_NAME is from configuration which is _server in url_for parameters, I mean that (e.g.) use _port in url_for parameters, then SERVER_PORT in configuration file.

@vltr
Copy link
Member

vltr commented Nov 5, 2018

In that case, I think that having a SERVER_PORT in the configuration, like @lixxu have mentioned, would be the best approach since we have this approach for the SERVER_NAME:

https://github.com/huge-success/sanic/blob/f13f4510846e21456ad93e5cd916564fb32e987a/sanic/app.py#L634-L636

One problem I see is that urllib.parse.urlunparse doesn't have a "port" value to be set, so we'll have to append it to the netloc parameter, knowing that this function does not ignore default port for protocols¹ (scheme):

>>> from urllib.parse import urlunparse 
>>> urlunparse(("https", "localhost", "/hello/world", "", "", ""))
'https://localhost/hello/world'
>>> urlunparse(("https", "localhost:443", "/hello/world", "", "", ""))
'https://localhost:443/hello/world'
>>> urlunparse(("http", "localhost:80", "/hello/world", "", "", ""))
'http://localhost:80/hello/world'
>>> urlunparse(("http", "localhost", "/hello/world", "", "", ""))
'http://localhost/hello/world'

So, the thing is ...

  • Should we need to tackle this, since the "correct" port can be set with SERVER_NAME already?
  • If so, should we "fix" default ports, like stripping 80 for http and 443 for https¹?

[1] I don't think that any URL builder would strip the port value if one is given.

@lixxu
Copy link
Contributor

lixxu commented Nov 8, 2018

I created a PR (#1406) for it.

@Tronic
Copy link
Member

Tronic commented Sep 2, 2019

This is sorted out by #1638, which gets hostname and port always from the same location (proxy headers or if not found, Host header), and thus proper URLs may be generated via request.url_for which, contrary to app.url_for, has access to this data. No fallback to host/port specified at run, or SSL server name identifier / socket port, is needed because Host header is available in all normal circumstances, and it contains this information.

app.url_for, which might be used at layout generation time or otherwise while the app is not yet running, can only rely on config.SERVER_NAME, which should contain the full URL to application root including any non-standard port number.

@eric-spitler Do you believe that the current approach is sufficient?

One shortcoming of this is that in order to use app.url_for, one must manually set app.config.SERVER_NAME = "http://localhost:8000/" to get external URLs with the default development setup, but we cannot give a default value for SERVER_NAME either.

Since app.url_for is likely used before app.run is called, any host/port passed to run is not available to it, and I cannot readily see any easy solution to this.

@eric-spitler
Copy link
Author

That PR resolves this case for situations where the URL is being assembled from with a request context. However, it does not resolve it in other cases, such as another thread that is processing ongoing tasks unrelated to incoming requests. The latter use case is likely uncommon so this PR resolves a large percentage of these situations.

So long as the request.url_for usage properly retains the outward facing hostname/IP (such as when behind an AWS VPC), then that will always be a better way to generate the URL anyway.

I personally am okay with ensuring that the hostname:port combination is set (and already apply this automatically in my service template I use), but the documentation update would make this clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants