Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index resources in the UrlDispatcher to avoid linear search for most cases #7829

Merged
merged 64 commits into from
Nov 27, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
f00fa23
Index resources in the UrlDispatcher to avoid linear search on hit
bdraco Nov 12, 2023
e76297d
tweak
bdraco Nov 12, 2023
6fe2952
Update aiohttp/web_urldispatcher.py
bdraco Nov 12, 2023
ffbad86
timeline
bdraco Nov 12, 2023
ee3fc18
match templates with /{x}
bdraco Nov 13, 2023
9b07111
add support for subapps
bdraco Nov 13, 2023
4863246
dry
bdraco Nov 13, 2023
7cbd606
Merge branch 'master' into fast_url_dispatcher
bdraco Nov 13, 2023
16747a1
Merge branch 'master' into fast_url_dispatcher
bdraco Nov 13, 2023
3d0b0f9
only search the matched subapp resources linearly
bdraco Nov 14, 2023
68bcb01
only search the matched subapp resources linearly
bdraco Nov 14, 2023
0d9fc5f
Merge branch 'master' into fast_url_dispatcher
bdraco Nov 14, 2023
b6b482a
improve comment
bdraco Nov 14, 2023
85671bd
improve comment
bdraco Nov 14, 2023
4dade9a
improve comment
bdraco Nov 14, 2023
02a9658
improve comment
bdraco Nov 14, 2023
a380b8d
fix resources registered at /
bdraco Nov 14, 2023
6691003
fixes, tests
bdraco Nov 14, 2023
0fa6cf6
fixes, tests
bdraco Nov 14, 2023
7d4f108
mypy
bdraco Nov 14, 2023
764055c
mypy
bdraco Nov 14, 2023
b1c06ea
mypy
bdraco Nov 14, 2023
816aec9
fix: tests
bdraco Nov 14, 2023
a2ea12b
coverage
bdraco Nov 14, 2023
0ee50bd
adjust test
bdraco Nov 14, 2023
edf2f98
adjust docs, add missing resource types
bdraco Nov 14, 2023
d643685
docs
bdraco Nov 14, 2023
9225f9f
Update CHANGES/7829.misc
bdraco Nov 14, 2023
f838b44
tweak
bdraco Nov 14, 2023
ed34628
Merge remote-tracking branch 'bdraco/fast_url_dispatcher' into fast_u…
bdraco Nov 14, 2023
ea0e619
tweak
bdraco Nov 14, 2023
b4dcc32
tweak
bdraco Nov 14, 2023
0fc6be4
tweak
bdraco Nov 14, 2023
164ed78
tweak
bdraco Nov 14, 2023
c35a789
tweak
bdraco Nov 14, 2023
0e94b98
fix link
bdraco Nov 14, 2023
adba241
Update docs/web_quickstart.rst
bdraco Nov 15, 2023
c4d732c
copy to both places so they do not miss it
bdraco Nov 16, 2023
75bc97b
copy to both places so they do not miss it
bdraco Nov 16, 2023
003ea83
Update aiohttp/web_urldispatcher.py
bdraco Nov 26, 2023
ac41ef9
Update docs/web_reference.rst
bdraco Nov 26, 2023
7baae39
Update aiohttp/web_urldispatcher.py
bdraco Nov 26, 2023
d4eebd1
Update docs/web_reference.rst
bdraco Nov 26, 2023
a609032
Update aiohttp/web_urldispatcher.py
bdraco Nov 26, 2023
47dd886
was used below
bdraco Nov 26, 2023
2bac458
drop dupe
bdraco Nov 26, 2023
d126a82
remove release
bdraco Nov 26, 2023
5810f02
optimize dispatcher
bdraco Nov 26, 2023
1f3aff8
preen
bdraco Nov 26, 2023
01e383b
use raw_path
bdraco Nov 26, 2023
0cda69f
preen
bdraco Nov 26, 2023
c83d3a1
Update 7829.misc
Dreamsorcerer Nov 26, 2023
02381ba
remove unreachable code
bdraco Nov 27, 2023
0ff388e
Merge branch 'master' into fast_url_dispatcher
bdraco Nov 27, 2023
42e3f47
speed up 404 case
bdraco Nov 27, 2023
13cc627
Merge remote-tracking branch 'bdraco/fast_url_dispatcher' into fast_u…
bdraco Nov 27, 2023
01c9353
bind to object
bdraco Nov 27, 2023
e23796e
no point in freezing system routes
bdraco Nov 27, 2023
281edbf
freeze
bdraco Nov 27, 2023
5883444
Revert "freeze"
bdraco Nov 27, 2023
8cdeac1
Revert "no point in freezing system routes"
bdraco Nov 27, 2023
4b8045e
Revert "bind to object"
bdraco Nov 27, 2023
8ed3bdc
Revert "speed up 404 case"
bdraco Nov 27, 2023
162a9ea
reduce optimization
bdraco Nov 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGES/7829.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Improved URL handler resolution time by indexing resources in the UrlDispatcher.
For applications with a large number of handlers, this should increase performance significantly.
-- by :user:`bdraco`
80 changes: 67 additions & 13 deletions aiohttp/web_urldispatcher.py
Dreamsorcerer marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -716,13 +716,20 @@ class PrefixedSubAppResource(PrefixResource):
def __init__(self, prefix: str, app: "Application") -> None:
super().__init__(prefix)
self._app = app
for resource in app.router.resources():
resource.add_prefix(prefix)
self._add_prefix_to_resources(prefix)

def add_prefix(self, prefix: str) -> None:
super().add_prefix(prefix)
for resource in self._app.router.resources():
self._add_prefix_to_resources(prefix)

def _add_prefix_to_resources(self, prefix: str) -> None:
router = self._app.router
for resource in router.resources():
# Since the canonical path of a resource is about
# to change, we need to unindex it and then reindex
router.unindex_resource(resource)
resource.add_prefix(prefix)
router.index_resource(resource)

def url_for(self, *args: str, **kwargs: str) -> URL:
raise RuntimeError(".url_for() is not supported " "by sub-application root")
Expand All @@ -731,11 +738,6 @@ def get_info(self) -> _InfoDict:
return {"app": self._app, "prefix": self._prefix}

async def resolve(self, request: Request) -> _Resolve:
if (
not request.url.raw_path.startswith(self._prefix2)
and request.url.raw_path != self._prefix
):
return None, set()
match_info = await self._app.router.resolve(request)
match_info.add_app(self._app)
if isinstance(match_info.http_exception, HTTPMethodNotAllowed):
Expand Down Expand Up @@ -974,27 +976,55 @@ def __contains__(self, route: object) -> bool:

class UrlDispatcher(AbstractRouter, Mapping[str, AbstractResource]):
NAME_SPLIT_RE = re.compile(r"[.:-]")
HTTP_NOT_FOUND = HTTPNotFound()

def __init__(self) -> None:
super().__init__()
self._resources: List[AbstractResource] = []
self._named_resources: Dict[str, AbstractResource] = {}
self._resource_index: dict[str, list[AbstractResource]] = {}
self._matched_sub_app_resources: List[MatchedSubAppResource] = []

async def resolve(self, request: Request) -> UrlMappingMatchInfo:
method = request.method
resource_index = self._resource_index
allowed_methods: Set[str] = set()

for resource in self._resources:
# Walk the url parts looking for candidates. We walk the url backwards
# to ensure the most explicit match is found first. If there are multiple
# candidates for a given url part because there are multiple resources
# registered for the same canonical path, we resolve them in a linear
# fashion to ensure registration order is respected.
url_part = request.rel_url.raw_path
while url_part:
for candidate in resource_index.get(url_part, ()):
match_dict, allowed = await candidate.resolve(request)
if match_dict is not None:
return match_dict
else:
allowed_methods |= allowed
if url_part == "/":
break
url_part = url_part.rpartition("/")[0] or "/"

#
# We didn't find any candidates, so we'll try the matched sub-app
# resources which we have to walk in a linear fashion because they
# have regex/wildcard match rules and we cannot index them.
#
# For most cases we do not expect there to be many of these since
# currently they are only added by `add_domain`
#
for resource in self._matched_sub_app_resources:
match_dict, allowed = await resource.resolve(request)
if match_dict is not None:
return match_dict
else:
allowed_methods |= allowed

if allowed_methods:
return MatchInfoError(HTTPMethodNotAllowed(method, allowed_methods))
else:
return MatchInfoError(HTTPNotFound())
return MatchInfoError(HTTPMethodNotAllowed(request.method, allowed_methods))

return MatchInfoError(self.HTTP_NOT_FOUND)

def __iter__(self) -> Iterator[str]:
return iter(self._named_resources)
Expand Down Expand Up @@ -1050,6 +1080,30 @@ def register_resource(self, resource: AbstractResource) -> None:
self._named_resources[name] = resource
self._resources.append(resource)

if isinstance(resource, MatchedSubAppResource):
# We cannot index match sub-app resources because they have match rules
self._matched_sub_app_resources.append(resource)
else:
self.index_resource(resource)

def _get_resource_index_key(self, resource: AbstractResource) -> str:
"""Return a key to index the resource in the resource index."""
# strip at the first { to allow for variables
return resource.canonical.partition("{")[0].rstrip("/") or "/"

def index_resource(self, resource: AbstractResource) -> None:
"""Add a resource to the resource index."""
resource_key = self._get_resource_index_key(resource)
# There may be multiple resources for a canonical path
# so we keep them in a list to ensure that registration
# order is respected.
self._resource_index.setdefault(resource_key, []).append(resource)

def unindex_resource(self, resource: AbstractResource) -> None:
"""Remove a resource from the resource index."""
resource_key = self._get_resource_index_key(resource)
self._resource_index[resource_key].remove(resource)

def add_resource(self, path: str, *, name: Optional[str] = None) -> Resource:
if path and not path.startswith("/"):
raise ValueError("path should be started with / or be empty")
Expand Down
39 changes: 30 additions & 9 deletions docs/web_reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1865,20 +1865,38 @@ unique *name* and at least one :term:`route`.

:term:`web-handler` lookup is performed in the following way:

1. Router iterates over *resources* one-by-one.
2. If *resource* matches to requested URL the resource iterates over
own *routes*.
3. If route matches to requested HTTP method (or ``'*'`` wildcard) the
route's handler is used as found :term:`web-handler`. The lookup is
finished.
4. Otherwise router tries next resource from the *routing table*.
5. If the end of *routing table* is reached and no *resource* /
*route* pair found the *router* returns special :class:`~aiohttp.abc.AbstractMatchInfo`
1. The router splits the URL and checks the index from longest to shortest.
For example, '/one/two/three' will first check the index for
'/one/two/three', then '/one/two' and finally '/'.
2. If the URL part is found in the index, the list of routes for
that URL part is iterated over. If a route matches to requested HTTP
method (or ``'*'`` wildcard) the route's handler is used as the chosen
:term:`web-handler`. The lookup is finished.
3. If the route is not found in the index, the router tries to find
the route in the list of :class:`~aiohttp.web.MatchedSubAppResource`,
(current only created from :meth:`~aiohttp.web.Application.add_domain`),
and will iterate over the list of
:class:`~aiohttp.web.MatchedSubAppResource` in a linear fashion
until a match is found.
4. If no *resource* / *route* pair was found, the *router*
returns the special :class:`~aiohttp.abc.AbstractMatchInfo`
instance with :attr:`aiohttp.abc.AbstractMatchInfo.http_exception` is not ``None``
but :exc:`HTTPException` with either *HTTP 404 Not Found* or
*HTTP 405 Method Not Allowed* status code.
Registered :meth:`~aiohttp.abc.AbstractMatchInfo.handler` raises this exception on call.

Fixed paths are preferred over variable paths. For example,
if you have two routes ``/a/b`` and ``/a/{name}``, then the first
route will always be preferred over the second one.

If there are multiple dynamic paths with the same fixed prefix,
they will be resolved in order of registration.

For example, if you have two dynamic routes that are prefixed
with the fixed ``/users`` path such as ``/users/{x}/{y}/z`` and
``/users/{x}/y/z``, the first one will be preferred over the
second one.

User should never instantiate resource classes but give it by
:meth:`UrlDispatcher.add_resource` call.

Expand All @@ -1900,7 +1918,10 @@ Resource classes hierarchy::
Resource
PlainResource
DynamicResource
PrefixResource
StaticResource
PrefixedSubAppResource
MatchedSubAppResource


.. class:: AbstractResource
Expand Down
7 changes: 7 additions & 0 deletions tests/test_urldispatch.py
Original file line number Diff line number Diff line change
Expand Up @@ -1264,10 +1264,17 @@ async def test_prefixed_subapp_overlap(app: Any) -> None:
subapp2.router.add_get("/b", handler2)
app.add_subapp("/ss", subapp2)

subapp3 = web.Application()
handler3 = make_handler()
subapp3.router.add_get("/c", handler3)
app.add_subapp("/s/s", subapp3)

match_info = await app.router.resolve(make_mocked_request("GET", "/s/a"))
assert match_info.route.handler is handler1
match_info = await app.router.resolve(make_mocked_request("GET", "/ss/b"))
assert match_info.route.handler is handler2
match_info = await app.router.resolve(make_mocked_request("GET", "/s/s/c"))
assert match_info.route.handler is handler3


async def test_prefixed_subapp_empty_route(app: Any) -> None:
Expand Down
86 changes: 84 additions & 2 deletions tests/test_web_urldispatcher.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@

from aiohttp import web
from aiohttp.pytest_plugin import AiohttpClient
from aiohttp.web_urldispatcher import SystemRoute
from aiohttp.web_urldispatcher import Resource, SystemRoute


@pytest.mark.parametrize(
Expand Down Expand Up @@ -142,7 +142,6 @@ async def test_access_to_the_file_with_spaces(
r = await client.get(url)
assert r.status == 200
assert (await r.text()) == data
await r.release()


async def test_access_non_existing_resource(
Expand Down Expand Up @@ -544,3 +543,86 @@ async def handler(request: web.Request) -> web.Response:
r = await client.get(yarl.URL(urlencoded_path, encoded=True))
assert r.status == expected_http_resp_status
await r.release()


async def test_order_is_preserved(aiohttp_client: AiohttpClient) -> None:
"""Test route order is preserved.

Note that fixed/static paths are always preferred over a regex path.
"""
app = web.Application()

async def handler(request: web.Request) -> web.Response:
assert isinstance(request.match_info._route.resource, Resource)
return web.Response(text=request.match_info._route.resource.canonical)

app.router.add_get("/first/x/{b}/", handler)
app.router.add_get(r"/first/{x:.*/b}", handler)

app.router.add_get(r"/second/{user}/info", handler)
app.router.add_get("/second/bob/info", handler)

app.router.add_get("/third/bob/info", handler)
app.router.add_get(r"/third/{user}/info", handler)

app.router.add_get(r"/forth/{name:\d+}", handler)
app.router.add_get("/forth/42", handler)

app.router.add_get("/fifth/42", handler)
app.router.add_get(r"/fifth/{name:\d+}", handler)

client = await aiohttp_client(app)

r = await client.get("/first/x/b/")
assert r.status == 200
assert await r.text() == "/first/x/{b}/"

r = await client.get("/second/frank/info")
assert r.status == 200
assert await r.text() == "/second/{user}/info"

# Fixed/static paths are always preferred over regex paths
r = await client.get("/second/bob/info")
assert r.status == 200
assert await r.text() == "/second/bob/info"

r = await client.get("/third/bob/info")
assert r.status == 200
assert await r.text() == "/third/bob/info"

r = await client.get("/third/frank/info")
assert r.status == 200
assert await r.text() == "/third/{user}/info"

r = await client.get("/forth/21")
assert r.status == 200
assert await r.text() == "/forth/{name}"

# Fixed/static paths are always preferred over regex paths
r = await client.get("/forth/42")
assert r.status == 200
assert await r.text() == "/forth/42"

r = await client.get("/fifth/21")
assert r.status == 200
assert await r.text() == "/fifth/{name}"

r = await client.get("/fifth/42")
assert r.status == 200
assert await r.text() == "/fifth/42"


async def test_url_with_many_slashes(aiohttp_client: AiohttpClient) -> None:
app = web.Application()

class MyView(web.View):
async def get(self) -> web.Response:
return web.Response()

app.router.add_routes([web.view("/a", MyView)])

client = await aiohttp_client(app)

r = await client.get("///a")
assert r.status == 200
await r.release()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, these aren't necessary. The connection is already released once .text() returns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dreamsorcerer I've been sticking these around all over, at some point. The reason is that some things were leaking into the following tests, whenever responses or sessions weren't closed. Sometimes this led to flaky resource warnings that were hard to spot because they failed a different test than the one causing the leak.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but we know they are closed if there is a .read() call completed. It's the other ones that are actually causing problems. Like, for example, this test failed because in some of the test cases it didn't call .read() and could result in the connection not being closed:
https://github.com/aio-libs/aiohttp/pull/7896/files#diff-e708432f3ba422740bfc04230f149eac4ff3787dde29204b6ee22b622a0e8915R77-R79

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loading