Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Encode JSON responses on a thread in C #10844

Closed
wants to merge 12 commits into from

Conversation

erikjohnston
Copy link
Member

Currently we use JsonEncoder.iterencode to write JSON responses, which ensures that we don't block the main reactor thread when encoding huge objects. The downside to this is that iterencode falls back to using a pure Python encoder that is much less efficient and can easily burn a lot of CPU for huge responses. To fix this, while still ensuring we don't block the reactor loop, we encode the JSON on a threadpool using the standard JsonEncoder.encode functions, which is backed by a C library.

Doing so, however, requires respond_with_json to have access to the reactor, which it previously didn't. There are two ways of doing this:

  1. threading through the reactor object, which is a bit fiddly as e.g. DirectServeJsonResource doesn't currently take a reactor, but is exposed to modules and so is a PITA to change; or
  2. expose the reactor in SynapseRequest, which requires updating a bunch of servlet types.

I went with the latter as that is just a mechanical change, and I think makes sense as a request already has a reactor associated with it (via its http channel).

There were two issues: 1) `channel` is actually private type so its hard
to type `.site`, and 2) `.site` was actually overwriting an existing
member and so we need to rename it.
@erikjohnston erikjohnston requested a review from a team September 17, 2021 14:46
@clokep
Copy link
Member

clokep commented Sep 17, 2021

Is the slow bit iterating a list of events? I wonder if we could iterate that manually and spit out the [ / ] manually as well, encoding each event with the fast C encoder. (Essentially a custom producer for this case.)

@erikjohnston
Copy link
Member Author

Is the slow bit iterating a list of events? I wonder if we could iterate that manually and spit out the [ / ] manually as well, encoding each event with the fast C encoder. (Essentially a custom producer for this case.)

This isn't specific to events, any large response will suffer from the same problem. Most of the time this will be due to events, but special casing things like /send_join and initial syncs just feels a bit tedious, though would work 🤷

@@ -184,7 +184,7 @@ async def _unsafe_process(self) -> None:

should_notify_at = max(notif_ready_at, room_ready_at)

if should_notify_at < self.clock.time_msec():
if should_notify_at <= self.clock.time_msec():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it has nothing to do with the changes in the rest of the PR right?? WRONG. Without this change the tests would tight loop and then OOM. What I think is going on is that moving the encoding of JSON to a thread has subtly changed the timings of the tests, causing us to hit this line such that should_notify_at is now, causing us to not send the response but then schedule the function to be rerun in 0 seconds, causing the tests to enter a tight loop and never make any progress.

Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me

Comment on lines +827 to +828
Python implementation (rather than the C backend), which is *much* more
expensive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and presumably holds the global interpreter lock, making thread pooling it useless?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ja

@erikjohnston
Copy link
Member Author

Note to self I should probably change this to use byte producer rather than request.write all at once

@richvdh
Copy link
Member

richvdh commented Sep 20, 2021

The downside to this is that iterencode falls back to using a pure Python encoder that is much less efficient and can easily burn a lot of CPU for huge responses.

I'm surprised about that. I'd have thought we'd have checked it before switching to iterencode, and it doesn't fit with my memory of how the json encoder works. I'll assume you've double-checked, though!

@erikjohnston
Copy link
Member Author

The downside to this is that iterencode falls back to using a pure Python encoder that is much less efficient and can easily burn a lot of CPU for huge responses.

I'm surprised about that. I'd have thought we'd have checked it before switching to iterencode, and it doesn't fit with my memory of how the json encoder works. I'll assume you've double-checked, though!

Yeah, it is surprising. FWIW the key bit here is, where _one_shot is always False for iterencode (I think because the C lib does the encoding all at once):

https://github.com/python/cpython/blob/5822ab672a1d26ff1837103c1ed8e4c3c2a42b87/Lib/json/encoder.py#L246-L256

@@ -105,7 +105,7 @@ def _throw(*args):
def _callback(request, **kwargs):
d = Deferred()
d.addCallback(_throw)
self.reactor.callLater(1, d.callback, True)
self.reactor.callLater(0.5, d.callback, True)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because we time out responses that take 1s or longer to process. I'm not 100% why the change in this PR would make this suddenly start failing, but 🤷

@erikjohnston erikjohnston force-pushed the erikj/faster_json_response branch from 1bf359a to 3f89ad7 Compare September 20, 2021 16:35
@erikjohnston erikjohnston force-pushed the erikj/faster_json_response branch from 3f89ad7 to 16e8580 Compare September 21, 2021 07:58
@erikjohnston
Copy link
Member Author

I've updated this with a commit to use a producer rather than writing all the content at once, for the reasons mentioned in #3701

@erikjohnston erikjohnston requested a review from a team September 21, 2021 08:00
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to ask you to break this up. There's a lot changing, including a bunch of prep work, and going through commit-by-commit isn't working, as the implementation has obviously evolved a bit as you've been working on it. sorry.

request.write(json_str)
request.finish()
except RuntimeError as e:
logger.info("Connection disconnected before response was written: %r", e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really the only possible reason that a RuntimeError can be raised?

return NOT_DONE_YET


def _write_json_bytes_to_request(request: Request, json_bytes: bytes) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this specific to being json bytes, or could it be used for any byte sequence?

Comment on lines -709 to +710
# note that this is zero-copy (the bytesio shares a copy-on-write buffer with
# the original `bytes`).
bytes_io = BytesIO(json_bytes)

producer = NoRangeStaticProducer(request, bytes_io)
producer.start()
_write_json_bytes_to_request(request, json_bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changing in this PR? It seems unrelated to how we encode JSON responses?

@erikjohnston
Copy link
Member Author

I've split this up into separate PRs, with the final one being: #10905

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants