HttpPayloadParser fails with "Not enough data for satisfy transfer length header" on chunked transfer encoding if the data is split exactly where trailers could occur #4630

Closed
JustAnotherArchivist opened this issue Mar 16, 2020 · 7 comments
Labels: bug, need pull request, reproducer: present

@JustAnotherArchivist
Contributor

🐞 Describe the bug
If the response data is fed into HttpPayloadParser.feed_data in a particular way, the parser is unable to successfully parse chunked data. Specifically, this happens when one call to the function contains the last 0\r\n chunk but the following \r\n is supplied in a separate call.

💡 To Reproduce

import aiohttp.http_parser
import io


class Payload:  # A minimal payload implementation
    def __init__(self):
        self.data = io.BytesIO()
        self.exc = None

    def feed_data(self, data, size):
        self.data.write(data)

    def feed_eof(self):
        pass

    def set_exception(self, exc):
        self.exc = exc

    def begin_http_chunk_receiving(self):
        pass

    def end_http_chunk_receiving(self):
        pass


payload = Payload()
parser = aiohttp.http_parser.HttpPayloadParser(payload, length=None, chunked=True, compression=None, code=200, method='GET')
print(repr(parser.feed_data(b'4\r\nasdf\r\n0\r\n')))  # data chunk plus the last-chunk line
eof, data = parser.feed_data(b'\r\n')  # the final CRLF arrives in a separate call
print(repr((eof, data)))
if not eof:
    print(repr(parser.feed_eof()))
print(repr(payload.data.getvalue()))

I added print statements here to debug what exactly aiohttp is returning compared to the simple feed_data(b'4\r\nasdf\r\n0\r\n\r\n') call (which works fine).

💡 Expected behavior
The parser is able to process the data, and the last line produces b'asdf'.
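
For comparison, here is a minimal sketch of the split-independent behaviour expected above. It is a toy decoder written only for illustration, not aiohttp's HttpPayloadParser. The names ChunkedDecoder, feed and decode_in_two_calls are made up, and validation is skipped (it does not check that each chunk is followed by CRLF, and it discards trailer fields). It buffers input until a complete line or chunk is available, so the position of the split should not matter:

```python
class ChunkedDecoder:
    """Toy incremental decoder for chunked bodies (illustration only)."""

    def __init__(self):
        self._buf = bytearray()   # bytes received but not yet consumed
        self._state = "size"      # "size" -> "data" -> ... -> "trailers"
        self._remaining = 0       # size of the chunk currently being read
        self.body = bytearray()
        self.done = False

    def feed(self, data):
        """Consume as much buffered input as possible; return True when done."""
        self._buf += data
        while not self.done:
            if self._state == "size":
                line = self._take_line()
                if line is None:
                    break  # chunk-size line is incomplete, wait for more data
                # Chunk extensions after ';' are ignored in this sketch.
                self._remaining = int(line.split(b";", 1)[0].strip(), 16)
                self._state = "data" if self._remaining else "trailers"
            elif self._state == "data":
                # Wait for the whole chunk plus the two bytes that end it.
                if len(self._buf) < self._remaining + 2:
                    break
                self.body += self._buf[:self._remaining]
                del self._buf[:self._remaining + 2]
                self._state = "size"
            else:  # "trailers": field lines until an empty line
                line = self._take_line()
                if line is None:
                    break  # could be a partial CRLF or a partial trailer
                if line == b"":
                    self.done = True
        return self.done

    def _take_line(self):
        """Pop one CRLF-terminated line from the buffer, or return None."""
        idx = self._buf.find(b"\r\n")
        if idx < 0:
            return None
        line = bytes(self._buf[:idx])
        del self._buf[:idx + 2]
        return line


MESSAGE = b"4\r\nasdf\r\n0\r\n\r\n"


def decode_in_two_calls(message, split_at):
    """Feed the message in two pieces, split at the given offset."""
    decoder = ChunkedDecoder()
    decoder.feed(message[:split_at])
    decoder.feed(message[split_at:])
    return decoder.done, bytes(decoder.body)


# Try every possible split position, including the two from this report.
results = {i: decode_in_two_calls(MESSAGE, i) for i in range(len(MESSAGE) + 1)}
```

With this sketch, every split position yields the same decoded body, which is the behaviour asked for here.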

📋 Logs/tracebacks

(False, b'')
(False, b'')
Traceback (most recent call last):
  File "aiohttp-test.py", line 32, in <module>
    print(repr(parser.feed_eof()))
  File ".../lib/python3.6/site-packages/aiohttp/http_parser.py", line 575, in feed_eof
    "Not enough data for satisfy transfer length header.")
aiohttp.http_exceptions.TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'

📋 Your version of the Python

$ python --version
Python 3.6.10

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp | grep Version
Version: 3.6.2
$ python -m pip show multidict | grep Version
Version: 4.7.5
$ python -m pip show yarl | grep Version
Version: 1.4.2

📋 Additional context
Discovered with aiohttp 2.3.10 due to errors in qwarc, which definitely uses aiohttp in weird, undocumented, and unsupported ways. But I believe the error could also happen in normal aiohttp usage if the data returned from the server is just the right size, namely two bytes over a multiple of the internal buffer size.

@JustAnotherArchivist
Contributor Author

This happens on the split b'0\r\n\r' + b'\n' as well.

Also note that while feed_data may return remaining data in the second element of its return value, it actually only does this when reaching EOF (i.e. when the data stream contains multiple responses), so the fact that I am not handling that properly is not relevant for this bug.

JustAnotherArchivist added a commit to JustAnotherArchivist/qwarc that referenced this issue Mar 16, 2020
@webknjaz
Member

Could you please submit a PR with a test for that? Also, are you relying on the cext version of http_parser or pure-python?

@JustAnotherArchivist
Contributor Author

I'd love to, but it wasn't immediately obvious to me how this needs to be fixed since the parser is not straightforward to understand due to the state transitions and is entirely undocumented, and I don't currently have time to dig into this deeper.

The HttpPayloadParser is Python-only and does not exist in the extension. I haven't tested what the extension HttpResponseParser would do with such a payload.

@socketpair
Contributor

@JustAnotherArchivist Could you provide a PR with the test only, without fixing the bug?

@JustAnotherArchivist
Contributor Author

@socketpair Done: #4651

While writing this, I realised that the error does not occur if there is a trailer field and the split happens before the closing CRLF, but it does happen if the split is between that CR and LF. I also added test cases for both of these.

webknjaz added a commit that referenced this issue Mar 25, 2020
PR #4651 by @JustAnotherArchivist

This change adds tests that demonstrate the failures described in #4630.
They are marked as xfail so that they don't affect the CI status.
Once the issue is fixed, they'll be reported as XPASS and pytest will fail,
which would be a signal that it's time to remove the xfail markers
keeping the contents of the tests to prevent regressions.

(ref: https://pganssle-talks.github.io/xfail-lightning)


Co-Authored-By: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
@webknjaz
Member

FTR the PR with tests is in. Once the defect is fixed, pytest will report them as XPASS.
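
To illustrate the expected-failure workflow mentioned here, the following sketch uses the standard library's unittest.expectedFailure as a stand-in for pytest's xfail marker, so it runs without pytest. The function decode_last_chunk is hypothetical and deliberately buggy. It is not aiohttp code and only mimics a parser that misses a split terminator:

```python
import io
import unittest


def decode_last_chunk(parts):
    """Hypothetical, deliberately buggy stand-in for the code under test.

    It only recognises the end of the body when the whole terminator
    arrives in a single piece.
    """
    return any(part.endswith(b"0\r\n\r\n") for part in parts)


class SplitTerminatorTests(unittest.TestCase):
    def test_unsplit(self):
        self.assertTrue(decode_last_chunk([b"4\r\nasdf\r\n0\r\n\r\n"]))

    @unittest.expectedFailure  # known defect: remove the marker once fixed
    def test_split_before_final_crlf(self):
        self.assertTrue(decode_last_chunk([b"4\r\nasdf\r\n0\r\n", b"\r\n"]))


def run():
    """Run the tests quietly and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(SplitTerminatorTests)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run()
```

While the defect exists, the marked test is recorded as an expected failure and the run still counts as successful. If the stand-in were fixed, unittest would record an unexpected success and the run would no longer count as successful, which is the same signal as a strict XPASS in pytest.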

webknjaz added the need pull request and reproducer: present labels Mar 25, 2020
rhdxmr added a commit to rhdxmr/aiohttp that referenced this issue Jun 7, 2020
HttpPayloadParser waits for trailers indefinitely even if there are no trailers
in the response. This happens when only the last CRLF or the last LF is sent
in a separate TCP segment.

When the connection is keep-alive and this bug occurs, users experience a
response timeout. The problem is not exposed when keep-alive is disabled
because .feed_eof is called.
rhdxmr added a commit to rhdxmr/aiohttp that referenced this issue Jun 8, 2020
If the last CRLF, or only its LF, is received in a separate TCP segment,
HttpPayloadParser wrongly concludes that trailers follow the 0\r\n in the
chunked response body.

In this case, HttpPayloadParser starts waiting for trailers, but the only
data left to receive is the CRLF. HttpPayloadParser therefore waits for
trailers indefinitely, which causes a TimeoutError in user code.

However, if keep-alive is disabled for the connection, the problem does not
occur because the server explicitly shuts down the connection after sending
all data. When the connection is closed, .feed_eof is called, which lets
HttpPayloadParser stop waiting.
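
The decision described in this commit message can be sketched as a small function. This is only an illustration, assuming that buf holds whatever bytes have arrived so far after the 0\r\n line. The name after_last_chunk and the returned labels are hypothetical, and this is not the code of the actual fix:

```python
CRLF = b"\r\n"


def after_last_chunk(buf):
    """Classify the bytes seen so far after the last-chunk line.

    Returns "done" when the body is complete, "need-more" when the
    available bytes cannot be classified yet, and "trailers" when a
    trailer field line appears to start.
    """
    if buf[:2] == CRLF:
        return "done"        # empty trailer section: the body ends here
    if buf in (b"", b"\r"):
        return "need-more"   # nothing, or only half of a CRLF, has arrived
    return "trailers"        # anything else is treated as a trailer line


# The two splits reported in this issue, plus a real trailer for contrast.
cases = {
    "nothing yet": after_last_chunk(b""),
    "only CR": after_last_chunk(b"\r"),
    "full CRLF": after_last_chunk(b"\r\n"),
    "trailer": after_last_chunk(b"X-Test: 1\r\n\r\n"),
}
```

The point of the "need-more" result is that an empty or half-received terminator should not be interpreted as the start of a trailer section; the caller keeps the pending bytes and decides again when more data arrives.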
asvetlov added a commit that referenced this issue Oct 16, 2020
* Parse the last CRLF of chunked response correctly (#4630)

Co-authored-by: JustAnotherArchivist <JustAnotherArchivist@users.noreply.github.com>
Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: Andrew Svetlov <andrew.svetlov@gmail.com>
@asvetlov
Member

Fixed by #4846
Thanks, @rhdxmr !

netbsd-srcmastr referenced this issue in NetBSD/pkgsrc Oct 24, 2020
This fixes py-yarl in pkgsrc being too new for py-aiohttp.


3.7.0 (2020-10-24)
==================

Features
--------

- Response headers are now prepared prior to running ``on_response_prepare`` hooks, directly before headers are sent to the client.
  `#1958 <https://github.com/aio-libs/aiohttp/issues/1958>`_
- Add a ``quote_cookie`` option to ``CookieJar``, a way to skip quotation wrapping of cookies containing special characters.
  `#2571 <https://github.com/aio-libs/aiohttp/issues/2571>`_
- Call ``AccessLogger.log`` with the current exception available from ``sys.exc_info()``.
  `#3557 <https://github.com/aio-libs/aiohttp/issues/3557>`_
- `web.UrlDispatcher.add_routes` and `web.Application.add_routes` return a list
  of registered `AbstractRoute` instances. `AbstractRouteDef.register` (and all
  subclasses) return a list of registered resources.
  `#3866 <https://github.com/aio-libs/aiohttp/issues/3866>`_
- Added properties of default ClientSession params to ClientSession class so it is available for introspection
  `#3882 <https://github.com/aio-libs/aiohttp/issues/3882>`_
- Don't cancel web handler on peer disconnection, raise `OSError` on reading/writing instead.
  `#4080 <https://github.com/aio-libs/aiohttp/issues/4080>`_
- Implement BaseRequest.get_extra_info() to access a protocol transport's extra info.
  `#4189 <https://github.com/aio-libs/aiohttp/issues/4189>`_
- Added `ClientSession.timeout` property.
  `#4191 <https://github.com/aio-libs/aiohttp/issues/4191>`_
- Allow use of SameSite in cookies.
  `#4224 <https://github.com/aio-libs/aiohttp/issues/4224>`_
- Use ``loop.sendfile()`` instead of custom implementation if available.
  `#4269 <https://github.com/aio-libs/aiohttp/issues/4269>`_
- Apply SO_REUSEADDR to test server's socket.
  `#4393 <https://github.com/aio-libs/aiohttp/issues/4393>`_
- Use .raw_host instead of slower .host in client API
  `#4402 <https://github.com/aio-libs/aiohttp/issues/4402>`_
- Allow configuring the buffer size of input stream by passing ``read_bufsize`` argument.
  `#4453 <https://github.com/aio-libs/aiohttp/issues/4453>`_
- Pass tests on Python 3.8 for Windows.
  `#4513 <https://github.com/aio-libs/aiohttp/issues/4513>`_
- Add `method` and `url` attributes to `TraceRequestChunkSentParams` and `TraceResponseChunkReceivedParams`.
  `#4674 <https://github.com/aio-libs/aiohttp/issues/4674>`_
- Add ClientResponse.ok property for checking status code under 400.
  `#4711 <https://github.com/aio-libs/aiohttp/issues/4711>`_
- Don't ceil timeouts that are smaller than 5 seconds.
  `#4850 <https://github.com/aio-libs/aiohttp/issues/4850>`_
- TCPSite now listens by default on all interfaces instead of just IPv4 when `None` is passed in as the host.
  `#4894 <https://github.com/aio-libs/aiohttp/issues/4894>`_
- Bump ``http_parser`` to 2.9.4
  `#5070 <https://github.com/aio-libs/aiohttp/issues/5070>`_


Bugfixes
--------

- Fix keepalive connections not being closed in time
  `#3296 <https://github.com/aio-libs/aiohttp/issues/3296>`_
- Fix failed websocket handshake leaving connection hanging.
  `#3380 <https://github.com/aio-libs/aiohttp/issues/3380>`_
- Fix tasks cancellation order on exit. The run_app task needs to be cancelled first for cleanup hooks to run with all tasks intact.
  `#3805 <https://github.com/aio-libs/aiohttp/issues/3805>`_
- Don't start heartbeat until _writer is set
  `#4062 <https://github.com/aio-libs/aiohttp/issues/4062>`_
- Fix handling of multipart file uploads without a content type.
  `#4089 <https://github.com/aio-libs/aiohttp/issues/4089>`_
- Preserve view handler function attributes across middlewares
  `#4174 <https://github.com/aio-libs/aiohttp/issues/4174>`_
- Fix the string representation of ``ServerDisconnectedError``.
  `#4175 <https://github.com/aio-libs/aiohttp/issues/4175>`_
- Raise RuntimeError when trying to get the encoding from a body that has not been read
  `#4214 <https://github.com/aio-libs/aiohttp/issues/4214>`_
- Remove warning messages from noop.
  `#4282 <https://github.com/aio-libs/aiohttp/issues/4282>`_
- Raise ClientPayloadError if FormData re-processed.
  `#4345 <https://github.com/aio-libs/aiohttp/issues/4345>`_
- Fix a warning about unfinished task in ``web_protocol.py``
  `#4408 <https://github.com/aio-libs/aiohttp/issues/4408>`_
- Fixed 'deflate' compression. According to RFC 2616 now.
  `#4506 <https://github.com/aio-libs/aiohttp/issues/4506>`_
- Fixed OverflowError on platforms with 32-bit time_t
  `#4515 <https://github.com/aio-libs/aiohttp/issues/4515>`_
- Fixed request.body_exists returning the wrong value for methods without a body.
  `#4528 <https://github.com/aio-libs/aiohttp/issues/4528>`_
- Fix connecting to link-local IPv6 addresses.
  `#4554 <https://github.com/aio-libs/aiohttp/issues/4554>`_
- Fix a problem with connection waiters that are never awaited.
  `#4562 <https://github.com/aio-libs/aiohttp/issues/4562>`_
- Always make sure the transport is not closing before reusing a connection.

  Reusing a protocol based on keepalive in headers is unreliable.
  For example, uWSGI will not support keepalive even when it serves an
  HTTP 1.1 request, unless uWSGI is explicitly configured with the
  ``--http-keepalive`` option.

  Servers designed like uWSGI could cause aiohttp to intermittently
  raise a ConnectionResetException when the protocol pool runs
  out and some protocol is reused.
  `#4587 <https://github.com/aio-libs/aiohttp/issues/4587>`_
- Handle the last CRLF correctly even if it is received via separate TCP segment.
  `#4630 <https://github.com/aio-libs/aiohttp/issues/4630>`_
- Fix the register_resource function to validate route name before splitting it so that route name can include python keywords.
  `#4691 <https://github.com/aio-libs/aiohttp/issues/4691>`_
- Improve typing annotations for ``web.Request``, ``aiohttp.ClientResponse`` and
  ``multipart`` module.
  `#4736 <https://github.com/aio-libs/aiohttp/issues/4736>`_
- Fix the resolver task not being awaited when the connector is cancelled
  `#4795 <https://github.com/aio-libs/aiohttp/issues/4795>`_
- Fix a bug "Aiohttp doesn't return any error on invalid request methods"
  `#4798 <https://github.com/aio-libs/aiohttp/issues/4798>`_
- Fix HEAD requests for static content.
  `#4809 <https://github.com/aio-libs/aiohttp/issues/4809>`_
- Fix incorrect size calculation for memoryview
  `#4890 <https://github.com/aio-libs/aiohttp/issues/4890>`_
- Add HTTPMove to ``__all__``.
  `#4897 <https://github.com/aio-libs/aiohttp/issues/4897>`_
- Fixed the type annotations in the ``tracing`` module.
  `#4912 <https://github.com/aio-libs/aiohttp/issues/4912>`_
- Fix typing for multipart ``__aiter__``.
  `#4931 <https://github.com/aio-libs/aiohttp/issues/4931>`_
- Fix for race condition on connections in BaseConnector that leads to exceeding the connection limit.
  `#4936 <https://github.com/aio-libs/aiohttp/issues/4936>`_
- Add forced UTF-8 encoding for ``application/rdap+json`` responses.
  `#4938 <https://github.com/aio-libs/aiohttp/issues/4938>`_
- Fix inconsistency between Python and C http request parsers in parsing pct-encoded URL.
  `#4972 <https://github.com/aio-libs/aiohttp/issues/4972>`_
- Fix connection closing issue in HEAD request.
  `#5012 <https://github.com/aio-libs/aiohttp/issues/5012>`_
- Fix type hint on BaseRunner.addresses (from ``List[str]`` to ``List[Any]``)
  `#5086 <https://github.com/aio-libs/aiohttp/issues/5086>`_
- Make `web.run_app()` more responsive to Ctrl+C on Windows for Python < 3.8. It slightly
  increases CPU load as a side effect.
  `#5098 <https://github.com/aio-libs/aiohttp/issues/5098>`_


Improved Documentation
----------------------

- Fix example code in client quick-start
  `#3376 <https://github.com/aio-libs/aiohttp/issues/3376>`_
- Updated the docs so there is no contradiction in ``ttl_dns_cache`` default value
  `#3512 <https://github.com/aio-libs/aiohttp/issues/3512>`_
- Add 'Deploy with SSL' to docs.
  `#4201 <https://github.com/aio-libs/aiohttp/issues/4201>`_
- Change typing of the secure argument on StreamResponse.set_cookie from ``Optional[str]`` to ``Optional[bool]``
  `#4204 <https://github.com/aio-libs/aiohttp/issues/4204>`_
- Changes ``ttl_dns_cache`` type from int to Optional[int].
  `#4270 <https://github.com/aio-libs/aiohttp/issues/4270>`_
- Simplify README hello world example and add a documentation page for people coming from requests.
  `#4272 <https://github.com/aio-libs/aiohttp/issues/4272>`_
- Improve some code examples in the documentation involving websockets and starting a simple HTTP site with an AppRunner.
  `#4285 <https://github.com/aio-libs/aiohttp/issues/4285>`_
- Fix typo in code example in Multipart docs
  `#4312 <https://github.com/aio-libs/aiohttp/issues/4312>`_
- Fix code example in Multipart section.
  `#4314 <https://github.com/aio-libs/aiohttp/issues/4314>`_
- Update contributing guide so new contributors read the most recent version of that guide. Update command used to create test coverage reporting.
  `#4810 <https://github.com/aio-libs/aiohttp/issues/4810>`_
- Spelling: Change "canonize" to "canonicalize".
  `#4986 <https://github.com/aio-libs/aiohttp/issues/4986>`_
- Add ``aiohttp-sse-client`` library to third party usage list.
  `#5084 <https://github.com/aio-libs/aiohttp/issues/5084>`_


Misc
----

- `#2856 <https://github.com/aio-libs/aiohttp/issues/2856>`_, `#4218 <https://github.com/aio-libs/aiohttp/issues/4218>`_, `#4250 <https://github.com/aio-libs/aiohttp/issues/4250>`_