Reduces the minimum memory requirement for Prefect Server #16644
Conversation
This brings the minimum memory requirement for Prefect Server from ~400MB to ~345MB in my testing*.

What I discovered was that when we use FastAPI's `include_router`, FastAPI will create _new_ routes rather than using a reference to the previously declared router's routes. This is because additional dependencies may be injected that the original router wasn't aware of, so the new routes are built using the original router as a reference. With Prefect's (likely very commonplace) pattern of declaring module-level routers and then assembling them into a larger FastAPI `app`, this means that an entire copy of the routes is left in memory that will likely never be touched again. Because the routes may include dynamically generated Pydantic models representing the bodies of requests, which may themselves refer to other models that Prefect defines, the complexity of the generated validators can balloon quickly. In our case, something like ~50-55MB is held in references by those routes.

*Note about the testing approach: I used a script that runs `prefect server start` in a `cgroup` to constrain the memory it has access to, by setting `memory.max` and disabling swap with `memory.swap=0`. It then polls the health check for 5 consecutive successes while confirming that the process remains running. From there it performs a binary search over the memory constraint until it finds the minimum that allows for stable startup and health checks.

This new minimum of 345MB is probably too low for practical use, as some additional memory would be required to run an actual flow, receive its logs and events, and run periodic loop services.
@chrisguidry sweet! i think CI is failing bc of the active docker incident
CodSpeed Performance Report: Merging #16644 will not alter performance.
wow surprisingly simple to understand -- thank you @chrisguidry this is great!
🌟
In #16644, we were able to get the standalone Prefect server we run in `prefect server start` from 400MB to 343MB. By running `uvicorn` directly in the parent process, we further reduce that to *266MB*. I measured this using the same process as before, using a `cgroup` with no swap to constrain the memory until I found the minimum.

There are two main changes here:

1. Making `prefect server start` a synchronous command in both the background and the foreground cases. This helps to simplify the handling of signals and removes the need for `anyio` or manually piping the streams of a child process.
2. Splitting the paths between background and foreground servers. In the case of the background server, we run the subprocess as usual and then exit, leaving it running. In the case of the foreground server, we run it with `uvicorn.run` as the final call, letting `uvicorn` handle signals. `uvicorn` already handles `SIGINT` and `SIGTERM` to shut down cleanly, so we don't need to do anything additional here.

I'm removing the two tests that checked what happens when the signals are sent in escalating succession, as the cases aren't materially different now: in both, `uvicorn` will attempt to shut down cleanly. I also updated the tests to reflect that `SIGINT` will lead to a successful exit, while `SIGTERM` will lead to a -15 exit (pretty standard).
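The -15 exit code mentioned above is the standard POSIX convention for death by `SIGTERM` (signal 15), as Python's `subprocess` module reports it. A small self-contained demonstration, using a plain sleeping child process as a stand-in for the server (any process without its own `SIGTERM` handler behaves this way):

```python
import signal
import subprocess
import sys
import time

# Spawn a stand-in child process: a Python interpreter that just sleeps.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
time.sleep(0.5)  # give the child a moment to start

# Terminate it the same way a test (or an operator) would.
proc.send_signal(signal.SIGTERM)
proc.wait()

# subprocess encodes "killed by signal N" as a returncode of -N.
assert proc.returncode == -signal.SIGTERM  # -15
```

A server that installs its own `SIGTERM` handler and exits normally (as `uvicorn` does for a clean shutdown) would instead report a non-negative exit code, which is why the distinction between the `SIGINT` and `SIGTERM` expectations in the tests is worth encoding explicitly.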