linkcheck builder: begin using session-based HTTP requests #11340
…nse-content reads are required
…nkchecking This ensures that the close method of the response is called when the response variable goes out-of-scope
…RNINGS configuration
…uring linkchecking" This reverts commit 9ec505f.
…ived" This reverts commit 98df982.
…een performed" This reverts commit 6fd82b3.
…nagers during linkchecking"" This reverts commit 692fdef.
Although it might seem strange, I do expect (and intend) the unit tests to fail with 4305d11 to begin with. Despite the pull request title, the session support (ad12d25) is reverted as of that commit, and so the assertion that only a single HTTP connection is used should fail. Reverting 3bf835c should re-introduce session support, and allow those tests to pass.
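To illustrate the kind of single-connection assertion described above, here is a minimal standard-library sketch. `CountingServer`, `KeepAliveHandler` and `count_connections` are hypothetical names and are not taken from the Sphinx test suite; the sketch assumes that counting accepted sockets in `get_request` is a reasonable proxy for "connections used", and it uses `http.client` as a stand-in for a session-based client.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread


class CountingServer(ThreadingHTTPServer):
    """Hypothetical test server that counts accepted TCP connections."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.connection_count = 0

    def get_request(self):
        # Called once per accepted socket, so this counts connections, not requests.
        request = super().get_request()
        self.connection_count += 1
        return request


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections require HTTP/1.1

    def do_HEAD(self):
        self.send_response(200, "OK")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep test output quiet


def count_connections(n_requests, reuse=True):
    """Send n_requests HEAD requests and return how many connections the server saw."""
    server = CountingServer(("127.0.0.1", 0), KeepAliveHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        for _ in range(n_requests):
            if not reuse:
                # Emulate non-session behaviour: a fresh connection per request.
                conn.close()
                conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("HEAD", "/")
            conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.connection_count
```

With connection reuse the server should see one connection regardless of the request count; without reuse it should see one per request, which is the difference the assertion is meant to detect.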
Intermittent failures on Python 3.10 with DU 0.18 or 0.19; other than that, looks good. A
@@ -424,6 +425,7 @@ def check(docname: str) -> tuple[str, str, int]:
     check_request = self.wqueue.get()
     next_check, hyperlink = check_request
     if hyperlink is None:
+        self._session.close()
Why do we close here?
I wanted to ensure that the requests session's resources are cleaned up when the linkchecker completes -- some warnings appeared during testing to indicate that they weren't -- and this seemed to be a good place for it (the None hyperlink is used as a marker (sentinel?) to indicate that the worker should shut down).
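As a rough illustration of that sentinel-based cleanup, here is a minimal sketch. `FakeSession`, `worker` and `run_checks` are hypothetical names; `FakeSession` is only a stand-in for a `requests` session so that the example has no third-party dependency, and the real worker in the linkcheck builder is more involved.

```python
import queue
import threading


class FakeSession:
    """Stand-in for a requests session; only records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def worker(wqueue, session, results):
    while True:
        hyperlink = wqueue.get()
        if hyperlink is None:
            # Sentinel received: release the session's resources, then stop.
            session.close()
            wqueue.task_done()
            break
        results.append(f"checked {hyperlink}")
        wqueue.task_done()


def run_checks(links):
    wqueue = queue.Queue()
    session = FakeSession()
    results = []
    thread = threading.Thread(target=worker, args=(wqueue, session, results))
    thread.start()
    for link in links:
        wqueue.put(link)
    wqueue.put(None)  # ask the worker to shut down
    thread.join()
    return results, session.closed
```

Closing at the sentinel means the cleanup runs exactly once per worker, after all queued links have been processed.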
Thanks for the review; I'm taking a look into those test failures now. FWIW: I saw similar behaviour using Python 3.11 before adding acec710 - and I think that it's possible that calling
On second thoughts: the asserted header is an auth header from the (linkcheck) client - so that makes the server headers seem less-relevant. Either way: I'll investigate.
There seem to be two problems that contribute to the failures:
I've prepared fixups for each of these -- they're testing well so far -- and may also backport them to #11318 for pedantic consistency.
Maybe it's a better idea not to backport them. That would help to confirm whether these issues affect both individual (non-session) HTTP requests, or only session-based HTTP requests.
Even after adjusting the code to use I'm still relatively confident that sending the
I'm planning to rebase this against the master branch, partly as an exercise in reminding myself what the series of changes applied actually is, and what remains relevant to investigate and consider.
72c5e22 to a30596f
I seem to be running in circles on this one without really figuring out why the remaining unit test failures (all timeouts, I believe) are occurring. Could it be that multiple unit tests share the same I'll try to test that theory in my personal fork of the repository.
🤦 I should read before commenting! The failures don't exhibit timeouts here, but they do exhibit the
…0 protocol (default) to HTTP/1.1
…ting the 'close_connection' handler attribute instead of calling connection.close Ref: https://docs.python.org/3/library/http.server.html#http.server.BaseHTTPRequestHandler.close_connection
…gth response header during redirects
…er capture to before any server-side communication has been initiated by the handler (cherry picked from commit 4d485ae)
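For readers unfamiliar with the `close_connection` attribute mentioned in these commits, here is a minimal sketch of a handler that speaks HTTP/1.1 and asks `http.server` to close the connection after the response, rather than closing the socket directly. `ClosingHandler` and `server_closed_after_response` are hypothetical names, and this is not the handler used in the Sphinx tests.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread


class ClosingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # the default is HTTP/1.0

    def do_GET(self):
        body = b"content"
        self.send_response(200, "OK")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
        # Let the base class close the connection once this request is done.
        self.close_connection = True

    def log_message(self, format, *args):
        pass  # keep output quiet


def server_closed_after_response():
    server = ThreadingHTTPServer(("127.0.0.1", 0), ClosingHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request("GET", "/")
        body = conn.getresponse().read()
        # An empty read means the peer has closed its end of the socket.
        closed = conn.sock.recv(1) == b""
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return body, closed
```

Note that this sketch does not send a `Connection: close` header, so a keep-alive client only learns about the closure when it next touches the socket.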
There was a failure case that I hadn't seen before in https://github.com/sphinx-doc/sphinx/actions/runs/4820054596/jobs/8584030994 from commit 0ed465c here. From the test failure output of
I might rebase in a moment, because I think we can drop #11349 from the changeset here. But I'd like to revisit this failure even if it doesn't reappear immediately.
0ed465c to bd2ed0b
Perhaps a better approach than attempting to merge a set of inter-related changes in a single branch here would be to gradually introduce each one as a separate, simpler pull request. That would probably help to narrow down the cause of any problems. Apologies for the noise @AA-Turner @francoisfreitag, and thanks for the review and support. I'm going to close this and will re-open a few smaller, isolated changes (like adding connection-measurement support in the tests) soon. The goal continues to be to enable session-based linkchecking.
This is an intermittent test failure in
I think that this indicates that the failure occurs within This is confirmed by removing So what's the problem in Well, the test involves a server that will continually redirect the client. Meanwhile, our test client will close the connection when it reaches some threshold number of redirects.
...only as long as the client performs HTTP HEAD requests, though. The HTTP GET handler should not perform a redirect...
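The server behaviour described here (redirect on HEAD, plain response on GET) can be sketched roughly as follows. `RedirectOnHeadHandler` and `probe_statuses` are hypothetical names, the redirect target is an assumption, and `http.client` is used because it does not follow redirects, which makes the raw status codes visible.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread


class RedirectOnHeadHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_HEAD(self):
        # Always redirect HEAD requests back to the same path (assumed target).
        self.send_response(302, "Found")
        self.send_header("Location", self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # GET requests are answered directly, without a redirect.
        body = b"ok"
        self.send_response(200, "OK")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep output quiet


def probe_statuses():
    server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectOnHeadHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        statuses = []
        for method in ("HEAD", "GET"):
            conn.request(method, "/")
            response = conn.getresponse()
            response.read()
            statuses.append(response.status)
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return statuses
```

A client that follows redirects would loop on HEAD until its own redirect limit is reached, which is the situation the test exercises.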
Ok, I think I've found the problem. It is straightforward to replicate a
The problem occurs after:
The last two items may occur in the inverse order and the result will be the same. The I think it'd be possible to update the
Here's an attempt to create a minimal repro case for this problem:

from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import time
from threading import Thread

import pytest


class TestHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        content = b"content"
        self.send_response(200, "OK")
        self.send_header("Content-Length", len(content))
        self.end_headers()
        self.wfile.write(content)
        self.wfile.flush()


class TestServer(Thread):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.server = ThreadingHTTPServer(("localhost", 7777), TestHandler)

    def run(self):
        self.server.serve_forever()

    def join(self):
        self.server.shutdown()
        self.server.server_close()
        super().join()


@contextmanager
def server_context():
    server_thread = TestServer()
    server_thread.start()
    yield server_thread
    server_thread.join()


@pytest.mark.sphinx('linkcheck', testroot='linkcheck-localserver', freshenv=True)
def test_a(app, capsys):
    with server_context():
        app.build()


@pytest.mark.sphinx('linkcheck', testroot='linkcheck-localserver', freshenv=True)
def test_b(app, capsys):
    time.sleep(0.5)
    _stdout, stderr = capsys.readouterr()
    assert stderr == ""

A few notes:
I'm going to try to simplify the conditions further, likely by temporarily removing code from the
Ok, after further reduction of the repro case, I don't think that anything about the
Further notes:
I'm not sure that there's anything else to do for this particular behaviour, given that we can fix it in the unit tests by having the server set It's potentially a slightly-annoying cause of noise in server logs, but everything is behaving to-spec, as far as I can tell (maybe that could indicate an HTTP spec limitation, or under-definition of the spec somewhere - I'm not completely sure).
Feature or Bugfix
Purpose
Detail
Intersphinx requests and image downloading are also migrated to session-based requests
Relates