Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent query arg handling between python and C http request parsers with yarl 1.6.0 #4972

Closed
mwhudson opened this issue Sep 25, 2020 · 3 comments · Fixed by #4973
Closed

Comments

@mwhudson
Copy link

🐞 Describe the bug

If you upgrade to yarl 1.6.0 and are using the C HttpRequestParser class, your query arguments are not dequoted (by which I mean, request.query['foo'] is something like %7B%22val%22: 3%7D and not {"val": 3}). If you use the Python HttpRequestParser class, yarl 1.6.0 does not change the behaviour.

💡 To Reproduce

>>> from aiohttp import http_parser
>>> p = http_parser.HttpRequestParserC(1, None); p.feed_data(b'GET /test?b=%22val%22 HTTP/1.1\r\n\r\n')[0][0][0].url.query['b']
'%22val%22'
>>> p = http_parser.HttpRequestParserPy(1, None); p.feed_data(b'GET /test?b=%22val%22 HTTP/1.1\r\n\r\n')[0][0][0].url.query['b']
'"val"'

(with yarl 1.5.1 both return "val").

📋 Your version of the Python

3.6 and 3.8 at least -- don't think it matters

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /usr/local/lib/python3.6/dist-packages
Requires: attrs, async-timeout, yarl, multidict, chardet, typing-extensions, idna-ssl
$ python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /usr/local/lib/python3.6/dist-packages
Requires: 
$ python -m pip show yarl
Name: yarl
Version: 1.6.0
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /usr/local/lib/python3.6/dist-packages
Requires: idna, multidict, typing-extensions

📋 Additional context

The reason for the inconsistency is that the Python parser calls URL(path) whereas the C parser calls URL.build(path=path) and the latter appears to have changed behaviour in 1.6.0.

@danking
Copy link
Contributor

danking commented Sep 28, 2020

This bit us as well.

Here's an example curl

(base) # curl 'localhost:5000/x?q=tag%3Dfoo'

Here's a simple server:

from web_common import setup_common_static_routes
from aiohttp import web

routes = web.RouteTableDef()


@routes.get('/x')
async def get_jobs(request):
    print(f'q {request.query.get("q")}')
    return web.Response()


app = web.Application()

setup_common_static_routes(routes)
app.add_routes(routes)

web.run_app(app,
            host='0.0.0.0',
            port=5000)

aiohttp 3.6.0, yarl 1.6.0 the log shows the %3D was not decoded. aiohttp 3.6.0, yarl 1.5.1 the log shows tag=foo.

danking pushed a commit to danking/hail that referenced this issue Sep 28, 2020
Related issues in upstream projects:

- YARL issue: aio-libs/yarl#509
- aiohttp issue: aio-libs/aiohttp#4972
danking pushed a commit to danking/hail that referenced this issue Sep 28, 2020
Related issues in upstream projects:

- YARL issue: aio-libs/yarl#509
- aiohttp issue: aio-libs/aiohttp#4972
danking added a commit to hail-is/hail that referenced this issue Sep 28, 2020
Related issues in upstream projects:

- YARL issue: aio-libs/yarl#509
- aiohttp issue: aio-libs/aiohttp#4972
@TBBle
Copy link
Contributor

TBBle commented Sep 30, 2020

I've been poking at this. It affects path parameters as well as query parameters.

The problem appears to be yarl 1.6.0 fixed a bug where yarl.URL.build expects unencoded segments, but was accidentally ignoring % and hence passing already-urlencoded segments through unchanged. Now it converts % into %25C, and later the urldecoding gives you the urlencoded segments from the raw HTTP request back, instead of the segments that the client had urlencoded before putting into the raw HTTP request.

This doesn't affect HttpRequestParserPy because it uses yarl.URL, which does support already-encoded URLs. Although we should be passing encoded=True in the call in HttpRequestParserPy.parse_message, so that it knows it's already-encoded, rather than detecting it. It only makes a difference if yarl's urlencoding formula wants to encode something aiohttp has excluded from encoding:

>>> yarl.URL("/test/%val%22",encoded=True)
URL('/test/%val%22')
>>> yarl.URL("/test/%val%22",encoded=False) # The default for encoded
URL('/test/%25val%22')

Edit: Ah, I didn't notice there was an open PR to fix this already.

@dcecile
Copy link

dcecile commented Oct 15, 2020

On October 12, aiohttp 3.6.3 was released which restricts yarl to <1.6.0, see bbceca7

netbsd-srcmastr referenced this issue in NetBSD/pkgsrc Oct 24, 2020
This fixes py-yarl in pkgsrc being too new for py-aiohttp.


3.7.0 (2020-10-24)
==================

Features
--------

- Response headers are now prepared prior to running ``on_response_prepare`` hooks, directly before headers are sent to the client.
  `#1958 <https://github.com/aio-libs/aiohttp/issues/1958>`_
- Add a ``quote_cookie`` option to ``CookieJar``, a way to skip quotation wrapping of cookies containing special characters.
  `#2571 <https://github.com/aio-libs/aiohttp/issues/2571>`_
- Call ``AccessLogger.log`` with the current exception available from ``sys.exc_info()``.
  `#3557 <https://github.com/aio-libs/aiohttp/issues/3557>`_
- `web.UrlDispatcher.add_routes` and `web.Application.add_routes` return a list
  of registered `AbstractRoute` instances. `AbstractRouteDef.register` (and all
  subclasses) return a list of registered resources registered resource.
  `#3866 <https://github.com/aio-libs/aiohttp/issues/3866>`_
- Added properties of default ClientSession params to ClientSession class so it is available for introspection
  `#3882 <https://github.com/aio-libs/aiohttp/issues/3882>`_
- Don't cancel web handler on peer disconnection, raise `OSError` on reading/writing instead.
  `#4080 <https://github.com/aio-libs/aiohttp/issues/4080>`_
- Implement BaseRequest.get_extra_info() to access a protocol transports' extra info.
  `#4189 <https://github.com/aio-libs/aiohttp/issues/4189>`_
- Added `ClientSession.timeout` property.
  `#4191 <https://github.com/aio-libs/aiohttp/issues/4191>`_
- allow use of SameSite in cookies.
  `#4224 <https://github.com/aio-libs/aiohttp/issues/4224>`_
- Use ``loop.sendfile()`` instead of custom implementation if available.
  `#4269 <https://github.com/aio-libs/aiohttp/issues/4269>`_
- Apply SO_REUSEADDR to test server's socket.
  `#4393 <https://github.com/aio-libs/aiohttp/issues/4393>`_
- Use .raw_host instead of slower .host in client API
  `#4402 <https://github.com/aio-libs/aiohttp/issues/4402>`_
- Allow configuring the buffer size of input stream by passing ``read_bufsize`` argument.
  `#4453 <https://github.com/aio-libs/aiohttp/issues/4453>`_
- Pass tests on Python 3.8 for Windows.
  `#4513 <https://github.com/aio-libs/aiohttp/issues/4513>`_
- Add `method` and `url` attributes to `TraceRequestChunkSentParams` and `TraceResponseChunkReceivedParams`.
  `#4674 <https://github.com/aio-libs/aiohttp/issues/4674>`_
- Add ClientResponse.ok property for checking status code under 400.
  `#4711 <https://github.com/aio-libs/aiohttp/issues/4711>`_
- Don't ceil timeouts that are smaller than 5 seconds.
  `#4850 <https://github.com/aio-libs/aiohttp/issues/4850>`_
- TCPSite now listens by default on all interfaces instead of just IPv4 when `None` is passed in as the host.
  `#4894 <https://github.com/aio-libs/aiohttp/issues/4894>`_
- Bump ``http_parser`` to 2.9.4
  `#5070 <https://github.com/aio-libs/aiohttp/issues/5070>`_


Bugfixes
--------

- Fix keepalive connections not being closed in time
  `#3296 <https://github.com/aio-libs/aiohttp/issues/3296>`_
- Fix failed websocket handshake leaving connection hanging.
  `#3380 <https://github.com/aio-libs/aiohttp/issues/3380>`_
- Fix tasks cancellation order on exit. The run_app task needs to be cancelled first for cleanup hooks to run with all tasks intact.
  `#3805 <https://github.com/aio-libs/aiohttp/issues/3805>`_
- Don't start heartbeat until _writer is set
  `#4062 <https://github.com/aio-libs/aiohttp/issues/4062>`_
- Fix handling of multipart file uploads without a content type.
  `#4089 <https://github.com/aio-libs/aiohttp/issues/4089>`_
- Preserve view handler function attributes across middlewares
  `#4174 <https://github.com/aio-libs/aiohttp/issues/4174>`_
- Fix the string representation of ``ServerDisconnectedError``.
  `#4175 <https://github.com/aio-libs/aiohttp/issues/4175>`_
- Raising RuntimeError when trying to get encoding from not read body
  `#4214 <https://github.com/aio-libs/aiohttp/issues/4214>`_
- Remove warning messages from noop.
  `#4282 <https://github.com/aio-libs/aiohttp/issues/4282>`_
- Raise ClientPayloadError if FormData re-processed.
  `#4345 <https://github.com/aio-libs/aiohttp/issues/4345>`_
- Fix a warning about unfinished task in ``web_protocol.py``
  `#4408 <https://github.com/aio-libs/aiohttp/issues/4408>`_
- Fixed 'deflate' compression. According to RFC 2616 now.
  `#4506 <https://github.com/aio-libs/aiohttp/issues/4506>`_
- Fixed OverflowError on platforms with 32-bit time_t
  `#4515 <https://github.com/aio-libs/aiohttp/issues/4515>`_
- Fixed request.body_exists returns wrong value for methods without body.
  `#4528 <https://github.com/aio-libs/aiohttp/issues/4528>`_
- Fix connecting to link-local IPv6 addresses.
  `#4554 <https://github.com/aio-libs/aiohttp/issues/4554>`_
- Fix a problem with connection waiters that are never awaited.
  `#4562 <https://github.com/aio-libs/aiohttp/issues/4562>`_
- Always make sure transport is not closing before reuse a connection.

  Reuse a protocol based on keepalive in headers is unreliable.
  For example, uWSGI will not support keepalive even it serves a
  HTTP 1.1 request, except explicitly configure uWSGI with a
  ``--http-keepalive`` option.

  Servers designed like uWSGI could cause aiohttp intermittently
  raise a ConnectionResetException when the protocol poll runs
  out and some protocol is reused.
  `#4587 <https://github.com/aio-libs/aiohttp/issues/4587>`_
- Handle the last CRLF correctly even if it is received via separate TCP segment.
  `#4630 <https://github.com/aio-libs/aiohttp/issues/4630>`_
- Fix the register_resource function to validate route name before splitting it so that route name can include python keywords.
  `#4691 <https://github.com/aio-libs/aiohttp/issues/4691>`_
- Improve typing annotations for ``web.Request``, ``aiohttp.ClientResponse`` and
  ``multipart`` module.
  `#4736 <https://github.com/aio-libs/aiohttp/issues/4736>`_
- Fix resolver task is not awaited when connector is cancelled
  `#4795 <https://github.com/aio-libs/aiohttp/issues/4795>`_
- Fix a bug "Aiohttp doesn't return any error on invalid request methods"
  `#4798 <https://github.com/aio-libs/aiohttp/issues/4798>`_
- Fix HEAD requests for static content.
  `#4809 <https://github.com/aio-libs/aiohttp/issues/4809>`_
- Fix incorrect size calculation for memoryview
  `#4890 <https://github.com/aio-libs/aiohttp/issues/4890>`_
- Add HTTPMove to _all__.
  `#4897 <https://github.com/aio-libs/aiohttp/issues/4897>`_
- Fixed the type annotations in the ``tracing`` module.
  `#4912 <https://github.com/aio-libs/aiohttp/issues/4912>`_
- Fix typing for multipart ``__aiter__``.
  `#4931 <https://github.com/aio-libs/aiohttp/issues/4931>`_
- Fix for race condition on connections in BaseConnector that leads to exceeding the connection limit.
  `#4936 <https://github.com/aio-libs/aiohttp/issues/4936>`_
- Add forced UTF-8 encoding for ``application/rdap+json`` responses.
  `#4938 <https://github.com/aio-libs/aiohttp/issues/4938>`_
- Fix inconsistency between Python and C http request parsers in parsing pct-encoded URL.
  `#4972 <https://github.com/aio-libs/aiohttp/issues/4972>`_
- Fix connection closing issue in HEAD request.
  `#5012 <https://github.com/aio-libs/aiohttp/issues/5012>`_
- Fix type hint on BaseRunner.addresses (from ``List[str]`` to ``List[Any]``)
  `#5086 <https://github.com/aio-libs/aiohttp/issues/5086>`_
- Make `web.run_app()` more responsive to Ctrl+C on Windows for Python < 3.8. It slightly
  increases CPU load as a side effect.
  `#5098 <https://github.com/aio-libs/aiohttp/issues/5098>`_


Improved Documentation
----------------------

- Fix example code in client quick-start
  `#3376 <https://github.com/aio-libs/aiohttp/issues/3376>`_
- Updated the docs so there is no contradiction in ``ttl_dns_cache`` default value
  `#3512 <https://github.com/aio-libs/aiohttp/issues/3512>`_
- Add 'Deploy with SSL' to docs.
  `#4201 <https://github.com/aio-libs/aiohttp/issues/4201>`_
- Change typing of the secure argument on StreamResponse.set_cookie from ``Optional[str]`` to ``Optional[bool]``
  `#4204 <https://github.com/aio-libs/aiohttp/issues/4204>`_
- Changes ``ttl_dns_cache`` type from int to Optional[int].
  `#4270 <https://github.com/aio-libs/aiohttp/issues/4270>`_
- Simplify README hello word example and add a documentation page for people coming from requests.
  `#4272 <https://github.com/aio-libs/aiohttp/issues/4272>`_
- Improve some code examples in the documentation involving websockets and starting a simple HTTP site with an AppRunner.
  `#4285 <https://github.com/aio-libs/aiohttp/issues/4285>`_
- Fix typo in code example in Multipart docs
  `#4312 <https://github.com/aio-libs/aiohttp/issues/4312>`_
- Fix code example in Multipart section.
  `#4314 <https://github.com/aio-libs/aiohttp/issues/4314>`_
- Update contributing guide so new contributors read the most recent version of that guide. Update command used to create test coverage reporting.
  `#4810 <https://github.com/aio-libs/aiohttp/issues/4810>`_
- Spelling: Change "canonize" to "canonicalize".
  `#4986 <https://github.com/aio-libs/aiohttp/issues/4986>`_
- Add ``aiohttp-sse-client`` library to third party usage list.
  `#5084 <https://github.com/aio-libs/aiohttp/issues/5084>`_


Misc
----

- `#2856 <https://github.com/aio-libs/aiohttp/issues/2856>`_, `#4218 <https://github.com/aio-libs/aiohttp/issues/4218>`_, `#4250 <https://github.com/aio-libs/aiohttp/issues/4250>`_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants