
Reduces the minimum memory requirement for Prefect Server #16644

Merged
merged 4 commits into from
Jan 8, 2025

Conversation

chrisguidry
Collaborator

@chrisguidry chrisguidry commented Jan 8, 2025

This brings the minimum memory requirement for Prefect Server from
~400MB to ~345MB in my testing*.

What I discovered was that when we use FastAPI's `include_router`,
FastAPI will create _new_ routes rather than using a reference to the
previously declared router's routes. This is because additional
dependencies may be injected that the original router wasn't aware of,
so the new routes are built using the original router as a reference.

With Prefect's (likely very commonplace) pattern of declaring
module-level routers and then assembling them into a larger FastAPI
`app`, this means that there is an entire copy of the routes left in
memory that will likely never be touched again. Because the routes may
include dynamically generated Pydantic models representing the bodies of
requests, which themselves may be referring to other models that Prefect
defines, the complexity of the generated validators can balloon quickly.
In our case, something like ~50-55MB is held in references by those
routes.
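The duplication pattern can be illustrated in plain Python (a toy sketch, not FastAPI's actual implementation; the `Route`/`Router` classes here are stand-ins for `fastapi.routing.APIRoute`/`APIRouter`):

```python
class Route:
    def __init__(self, path, endpoint):
        self.path, self.endpoint = path, endpoint

class Router:
    def __init__(self):
        self.routes = []

    def add_route(self, path, endpoint):
        self.routes.append(Route(path, endpoint))

    def include_router(self, other):
        # Like FastAPI's include_router, build *new* Route objects
        # from the other router's routes instead of sharing references,
        # since extra dependencies may be injected at include time.
        for r in other.routes:
            self.routes.append(Route(r.path, r.endpoint))

# Prefect's pattern: a module-level router assembled into a larger app.
module_router = Router()
module_router.add_route("/flows", lambda: "ok")

app = Router()
app.include_router(module_router)

# Two Route objects now exist for one logical endpoint; the module-level
# router's copy stays alive for the life of the process but is never used.
assert app.routes[0] is not module_router.routes[0]
```

Dropping the module-level references after inclusion (or avoiding holding them at module scope) lets the duplicated routes, and the Pydantic validators hanging off them, be garbage-collected.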

*Note about the testing approach: I used a script that runs `prefect
server start` in a `cgroup` to constrain the memory it has access to by
setting `memory.max` and by disabling swap memory with `memory.swap=0`.
It then polls the health check for 5 consecutive successes while
confirming that the process remains running. From there it performs
a binary search over the memory constraint until it finds the minimum
that allows for stable startup and health checks.
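The binary-search portion of that script can be sketched roughly like this (a minimal sketch, not the actual measurement script; `check` stands in for the part that writes the cgroup limits and polls the health endpoint):

```python
def find_minimum_memory(check, lo_mb, hi_mb, tol_mb=1):
    """Binary search for the smallest memory limit (in MB) at which the
    server starts and stays healthy.

    `check(limit_mb)` is a hypothetical callable that would run the
    server in a cgroup with memory.max set to `limit_mb` (and swap
    disabled), poll the health check for 5 consecutive successes, and
    return True iff the process survives.
    """
    while hi_mb - lo_mb > tol_mb:
        mid = (lo_mb + hi_mb) // 2
        if check(mid):
            hi_mb = mid  # it survived; try a tighter limit
        else:
            lo_mb = mid  # OOM-killed or unhealthy; needs more memory
    return hi_mb

# Example with a stand-in check that "needs" 345 MB:
assert find_minimum_memory(lambda mb: mb >= 345, 200, 600) == 345
```

This only requires that server survival is monotonic in the memory limit, which held in practice for these measurements.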

This new minimum of 345MB is probably too low for practical use, as some
additional memory would be required to run an actual flow, receive its
logs and events, and run periodic loop services.

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

@zzstoatzz
Collaborator

@chrisguidry sweet! i think CI is failing bc of the active docker incident


codspeed-hq bot commented Jan 8, 2025

CodSpeed Performance Report

Merging #16644 will not alter performance

Comparing fastapi-memory-optimization (13312c3) with main (d58ecba)

Summary

✅ 3 untouched benchmarks

Member

@cicdw cicdw left a comment


wow surprisingly simple to understand -- thank you @chrisguidry this is great!

Collaborator

@zzstoatzz zzstoatzz left a comment


🌟

@chrisguidry chrisguidry merged commit 0fc38cf into main Jan 8, 2025
45 checks passed
@chrisguidry chrisguidry deleted the fastapi-memory-optimization branch January 8, 2025 18:31
chrisguidry added a commit that referenced this pull request Jan 8, 2025
In #16644, we were able to get the standalone Prefect server we run in
`prefect server start` from 400MB to 343MB.  By running `uvicorn`
directly in the parent process, we further reduce that to *266MB*.  I
measured this using the same process as before, using a `cgroup` with
no swap to constrain the memory until I found the minimum.

There are two main changes here:

1. Making `prefect server start` a synchronous command in both the
   background and the foreground cases.  This helps to simplify the
   handling of signals and removes the need for `anyio` or piping the
   streams of a child process manually.

2. Splitting the paths between background and foreground servers.  In
   the case of the background server, we run the subprocess as usual and
   then exit leaving it running.  In the case of the foreground server,
   we run it with `uvicorn.run` as the final call, letting `uvicorn`
   handle signals.  `uvicorn` already handles `SIGINT` and `SIGTERM` to
   shut down cleanly, so we don't need to do anything additional here.
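The split in (2) can be sketched roughly like this (a minimal sketch, not Prefect's actual code; the `cmd` argument and the `"app:app"` target are hypothetical placeholders for the real server entrypoint):

```python
import subprocess
import sys

def start_server(background: bool, cmd=None):
    """Split the background and foreground paths for serving the app."""
    cmd = cmd or [sys.executable, "-m", "uvicorn", "app:app"]
    if background:
        # Background: spawn the child and return immediately,
        # leaving the server running after this process exits.
        return subprocess.Popen(cmd)
    # Foreground: run uvicorn directly in *this* process as the final
    # call. uvicorn installs its own SIGINT/SIGTERM handlers and shuts
    # down cleanly, so no extra signal plumbing (or anyio, or manual
    # piping of a child's streams) is needed.
    import uvicorn  # imported lazily so the background path works without it
    uvicorn.run("app:app", host="127.0.0.1", port=4200)
```

Running `uvicorn` in the parent process avoids paying for a second Python interpreter, which is where the further drop to 266MB comes from.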

I'm removing the two tests that were testing what happens when the
signals are sent in escalating succession, as they aren't materially
different now.  In both cases, `uvicorn` will attempt to shut down
cleanly.  I also updated the tests to reflect that `SIGINT` will lead to
a successful exit, while `SIGTERM` will lead to a -15 exit (pretty
standard).