Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Host based routing to check path #2263

Closed

Conversation

aminalaee
Copy link
Member

@aminalaee aminalaee commented Sep 1, 2023

Summary

Closes #2248

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@aminalaee aminalaee force-pushed the fix-host-based-routing-to-check-path branch from fab3fdc to edac97d Compare September 1, 2023 15:32
@aminalaee aminalaee marked this pull request as ready for review September 4, 2023 07:02
@aminalaee aminalaee force-pushed the fix-host-based-routing-to-check-path branch from edac97d to 343f735 Compare September 4, 2023 07:02
@aminalaee aminalaee changed the title Fix Host based routing to check path too Fix Host based routing to check path Sep 4, 2023
@aminalaee aminalaee force-pushed the fix-host-based-routing-to-check-path branch from 343f735 to 2f24fea Compare September 6, 2023 14:12
tests/test_routing.py Outdated Show resolved Hide resolved
@aminalaee aminalaee force-pushed the fix-host-based-routing-to-check-path branch from 2f24fea to fdd7d43 Compare September 7, 2023 08:22
@nicolaipre
Copy link

Tested and can confirm this solves #2248

@aminalaee aminalaee force-pushed the fix-host-based-routing-to-check-path branch from 65f9b08 to 6f80445 Compare September 19, 2023 07:40
@nicolaipre
Copy link

Any ETA on merge / next release, @Kludex? 🙂

@Kludex
Copy link
Member

Kludex commented Sep 22, 2023

No. You are not blocked to have this on your project. Subclass Host, and override matches for your case.

@nicolaipre
Copy link

No. You are not blocked to have this on your project. Subclass Host, and override matches for your case.

Thank you for the feedback, and yes I am aware. That is what I am currently doing, but just looking forward to see it added in so I can remove my local patch/override 👍

@nicolaipre
Copy link

Good thing this was not merged in yet, because I just found another bug while testing. It seems the same issue occurs for Mount() objects as well after the patch is applied.

Here is a modified example to test with:

# example.py
from starlette.applications import Starlette
from starlette.routing import Host, Mount, Route, Router, Match as M
from starlette.responses import PlainTextResponse
import uvicorn

# -----------------------------------------------------------------------------------

from starlette.datastructures import Headers
from starlette.types import Scope
import typing
class Host(Host):
    # Overrides matches as according to PR: https://github.com/encode/starlette/pull/2263
    def matches(self, scope: Scope) -> typing.Tuple[M, Scope]:
        if scope["type"] in ("http", "websocket"):
            headers = Headers(scope=scope)
            host = headers.get("host", "").split(":")[0]
            host_match = self.host_regex.match(host)
            path_match = self.routes == [] or any(
                [route.matches(scope)[0] == M.FULL for route in self.routes]
            )
            if host_match and path_match:
                matched_params = host_match.groupdict()
                for key, value in matched_params.items():
                    matched_params[key] = self.param_convertors[key].convert(value)
                path_params = dict(scope.get("path_params", {}))
                path_params.update(matched_params)
                child_scope = {"path_params": path_params, "endpoint": self.app}
                return M.FULL, child_scope
        return M.NONE, {}
# -----------------------------------------------------------------------------------

async def home(request):
    return PlainTextResponse("home\n")

async def foo(request):
    return PlainTextResponse("foo\n")

async def bar(request):
    return PlainTextResponse("bar\n")

ROUTES = [
    Route("/", endpoint=home, name="Home"),
    
    Host("api.example.com", name="Foo Routes", app=Router(routes=[
        #Mount("/foo", name="Foo Mount", routes=[
        Mount("/", name="Foo Mount", routes=[ # 200 OK
            Route("/foo", foo, name="Foo")
        ]),
    ])),
    
    Host("api.example.com", name="Bar Routes", app=Router(routes=[
        Mount("/bar", name="Bar Mount", routes=[ # 404 Not Found
            Route("/bar", bar, name="Bar")
        ]),
    ])),
]

for r in ROUTES:
    print(r)

app = Starlette(routes=ROUTES)

if __name__ == '__main__':
    uvicorn.run("example:app", host='0.0.0.0', port=8000, reload=True)
    
# Run tests:
# curl -i api.example.com:8000/foo # 200 OK
# curl -i api.example.com:8000/bar/bar # 404 Not found

The example above is very similar to the one in #2248 except that patch #2263 is added and the Foo Mount now only mounts to the prefix / instead of /foo. This produces the following results:

GET /        -> 200 OK
GET /foo     -> 200 OK
GET /bar/bar -> 404 Not Found

This can probably be resolved similar to how Host() was fixed, with a quick fix to Mount.matches. I will update this post if I manage to fix it.

@aminalaee
Copy link
Member Author

aminalaee commented Sep 29, 2023

@nicolaipre I don't think this is related to the Host change here as you can try this without Host, if you get the latest release of Starlette and try this:

ROUTES = [
    Route("/", endpoint=home, name="Home"),
    Mount("/", name="Foo Mount", routes=[
            Route("/foo", foo, name="Foo")
    ]),
    Mount("/bar", name="Bar Mount", routes=[ # 404 Not Found
            Route("/bar", bar, name="Bar")
    ]),
]

You will see:

GET /        -> 200 OK
GET /foo     -> 200 OK
GET /bar/bar -> 404 Not Found

Update: This is because the first Mount is more general than the second specific Mount so the first one is tried: https://www.starlette.io/routing/#route-priority
We can also change this PR to just document that the first Host match will be picked up and avoid this change in the behavior.

@nicolaipre
Copy link

@aminalaee You are completely right, and thank you for reminding me of routing priority.

I copied the routes from your last reply and did some testing. It works fine when the more specific Mount() is placed at the top:

ROUTES = [
    Mount("/bar", name="Bar Mount", routes=[  # More specific
            Route("/bar", bar, name="Bar")
    ]),
    Mount("/", name="Foo Mount", routes=[     # Less specific
            Route("/foo", foo, name="Foo")
    ]),
]

In this case you get 200 OK for both /foo and /bar/bar.

Initially I thought this would be the opposite of the example in the documentation, or that it shouldn't really matter since these are only hard-coded routes without variables, so I am still a little confused as to why this happens.

Here is a response from a forum for a different framework where someone had a similar issue: You should always declare hard-coded routes first, because any wild-card (variable) routes would be executed if they're declared before hard-coded routes.

In the documentation there is an example of /users/me and /users/{username} where it makes sense that /users/me is more specific as it is a hard-coded route while the /users/{username} route is less specific since {username} could be anything, but in my example we have two static hard-coded routes...

When you use Mount("/", ), it seems like it acts like a wild-card prefix? I thought that all hard-coded (read: non variable) routes within a Mount() would map out to a hard-coded route prefixed by the Mount prefix? If we use my initial example, wouldn't that just resolve to the following?

  • Foo Mount ("/") + Foo Route ("/foo") = /foo
  • Bar Mount ("/bar") + Bar Route ("/bar") = /bar/bar

Sorry if I am misunderstanding here, but I am confused by why this happens. Wouldn't this just be the same as registering the following two routes when resolved?

Route("/foo", foo, name="Foo Route"),
Route("/bar/bar", bar, name="Bar Route")

What exactly is it that makes Mount("/" less specific than Mount("/bar" other than the length (which doesnt matter if you were to register i.e. /home and /profile)?

When building bigger applications with multiple Host() and Mount() objects you can quickly mess this up if you have a lot of routes.

I think it is good to keep the PR as the proposed fix actually solves the Host() problem raised in #2248, but maybe add some more examples to the Route Priority documentation that also includes Host()and Mount() objects.

@nicolaipre

This comment was marked as outdated.

@aminalaee
Copy link
Member Author

@nicolaipre Yes, please start a discussion for this, as it's not entirely related to this issue, we can link the issues for the reference.

@nicolaipre
Copy link

nicolaipre commented Oct 10, 2023

@nicolaipre Yes, please start a discussion for this, as it's not entirely related to this issue, we can link the issues for the reference.

I agree. Started a separate discussion in #2299. Discovered some interesting behavior when testing again. Seems like it is a bug in the routing or something...

Edit: proposed fix in PR: #2301

@abersheeran
Copy link
Member

This PR makes large sites vulnerable to CC attacks. Imagine a site with hundreds or even thousands of routes, performing hundreds or thousands of regular expression matches at the beginning of each request, which consumes a lot of CPU resources.

We have two other solutions:

  1. Merge identical hosts of same-level Host
  2. Explain in detail in the document why the same host cannot be registered twice, and mark the relevant issues as wontfix

@aminalaee
Copy link
Member Author

I think I've mentioned this before, but I agree with adding it as a limitation in the docs, the same way route priority is done https://www.starlette.io/routing/#route-priority
With the current design, I don't think the fix is going to be a simple one.

@Kludex
Copy link
Member

Kludex commented Dec 22, 2023

I think I've mentioned this before, but I agree with adding it as a limitation in the docs, the same way route priority is done starlette.io/routing/#route-priority With the current design, I don't think the fix is going to be a simple one.

Let's go with the documentation, then. 🙏

@Kludex Kludex closed this Dec 22, 2023
@aminalaee aminalaee deleted the fix-host-based-routing-to-check-path branch December 24, 2023 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Host() routes are not registering correctly with the same subdomain
4 participants