-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: out-of-memory crash (#7720) and significantly improve performance + feat: extracting fields to types #9705
Conversation
🦋 Changeset detectedLatest commit: 045b69b The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@@ -1851,18 +1851,18 @@ export * from "./gql.js";`); | |||
export type FooQueryVariables = Exact<{ [key: string]: never; }>; | |||
|
|||
|
|||
export type FooQuery = { __typename?: 'Query', foo?: ( { __typename?: 'Foo' } & ( | |||
export type FooQuery = { __typename?: 'Query', foo?: { __typename?: 'Foo' } & ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: these braces that used to be here were unnecessary because intersection has higher precedence than union, new logic doesn't generate them (which is correct), hence the updated test.
I'll spend more time reviewing this in detail, but just wanted to reflect on the fact that the results are, indeed, jaw-dropping! Big unions and intersections come with a heavy price tag! This is amazing, thank you for contributing and for the well-thought-out description. It was interesting to read. 🙌 I look forward to reviewing this in detail. |
1ac6cf9
to
bcd7719
Compare
Co-authored-by: Cynthia Ma <cma@zendesk.com> Co-authored-by: Bazyli Brzoska <bbrzoska@zendesk.com> fixes dotansimha#7720
bcd7719
to
045b69b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! thank you so much 🙏🏼
Related
stage/1-reproduction
issue: #7720Description
Background
At Zendesk, we've been using
graphql-code-generator
for a few years now. Thank you to the creators and maintainers for the lovely tool! It's been a great help to have codegen work for us. 🥳As time went by, and our schemas and usage of GraphQL at the company has grown, we unfortunately started to experience increased codegen execution times, and TypeScript compilation times.
This has escalated to the point where engineers had to wait 4 minutes for codegen to complete on their high-spec engineering laptops, and the generated .ts files were over 35MB (yes, you read that right ❗). If that's not bad enough, executing TypeScript to typecheck our codebase on CI (‼️
tsc --noEmit
) would take a whopping 6:36 minutes on the CIFinally, we've hit a brick wall that just made us unable to make any changes to the schema. We stumbled upon issue #7720.
We've tried to mitigated it by increasing node's heap size, and running codegen with a 12GB heap size (
--max-old-space-size=12288
).But then we've hit another roadblock. Whenever we'd run TypeScript to typecheck our source code using the generated GraphQL file, we'd bump against a hard limit of V8 - there's a maximum number of entries a
Map
can hold:Call stack indicated that this crash was coming from TypeScript trying to calculate an incredibly complex intersection type.
Since this was essentially blocking all GraphQL-related projects, me and my colleague (@xnanodax) set out to improve the performance of codegen. After some preliminary analysis it seemed that the problem stems from codegen generating the same types, over and over again. The resulting types have very deep nesting, and some extremely complex union types.
The problem
Say you have a GraphQL interface with 50 implementing types. Within those 50 types, you have a field that's a union of 100 types.
Now you have a query that selects a field with those interfaces, and the interface has a field that includes the union.
What ends up happening, is that for every combination of 50 types and 100 implementations, the same type is generated over, and over again.
Not only they're not being cached in any way, these types are repeated inline in each combination, over and over again in the generated file. In the example case, you'd have 50 x 100 permutations = 5,000 repetitions of the same exact type, fully expanded and inlined in the resulting Fragment or operation types.
If you have many queries or uses of such fields, it's game over:
tsc
crashes, since you've reached hard limits of NodeJS.What we've done to fix this
extractAllFieldsToTypes: boolean
that makes codegen extract fields into their own typesWhen the property is
true
, each object field will have its own type generated.This means that the type will be reused by name if it already exists. This helps to take the 5,000 repetitions down to 1.
Since we didn't want to introduce breaking changes, this behavior is opt-in, though we think it would be a good idea to make it the default behavior in the next major version.
Improvements
The results of this change are jaw-dropping:
graph.ts
filesizetsc --noEmit
We've managed to save a significant amount of money on CI/CD, since we run
codegen
before every type of test (jest, eslint, typecheck, performance tests - and we parallelize heavily with 100s of runners spun simultaneously for each PR). The average step in our CI pipeline is faster by 2.5 minutes! Since some steps are run in sequence (first build, then UI tests), the full release pipeline is faster by ~5 minutes. That's a tremendous saving, both financially and in reducing the overall time-to-deploy.After this change we also found out that TS did not flag certain errors, because the union/intersection complexity was very high. This change helped us discover bugs in production code that were previously silently being ignored!
One more thing: for readability's sake, we've added a
printFieldsOnNewLines: boolean
configuration flag, which helps with readability of long, generated types, by placing each field on a separate line, rather than having the entire type generated on a single line, sometimes reaching tens of thousands of columns.Type of change
Please delete options that are not relevant.
Screenshots/Sandbox (if appropriate/relevant):
Aside from all the great performance benefits, and actually unblocking further development on our graph, we've noticed significant usability improvements: Understanding TS errors from types coming from codegen was often incredibly difficult, if not impossible. TS errors would often contain thousands of lines, with some IDEs not even able to display the error correctly. By giving these field interfaces names, it's significantly easier to parse what the error is about:
Before
The error is thousands of lines long, and you can't see it in full without copy & pasting it from VSCode to another file. It's extremely bad DX.
After
The error is short and sweet! All is clear.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
extractAllFieldsToTypes: true
, but does OOM with that property set tofalse
(this hasn't changed).Test Environment:
@graphql-codegen/...
: latestmaster
(33f35a5)Checklist:
Further comments
This PR
As far as this PR goes, if you have any comments or questions regarding how this is implemented, or would like to request some changes, me and @xnanodax can spend some dedicated time to make sure we can upstream our changes to your project.
Further optimizations and suggested printer refactor
The way the final TypeScript code is currently produced in codegen is through manipulation of raw strings. In a way, the resulting code is generated eagerly, without abstractions for unions, intersections, or TS objects. This works okay in most cases, but the current design makes it very hard to add further performance optimizations to the resulting types, like automatic merging intersections of objects into flat objects. Intersections are heavier on the TypeScript compiler and increase memory usage. Object types are also slightly less performant than interfaces, and in the current implementation we cannot easily tell if the type contained in the string could be made into an TypeScript
interface
, or does it have to remain atype
(because it contains unions).Additionally, it can be challenging to reason about the types, as it's not clear what the underlying strings represent when manipulating them, making it easy to accidentally output invalid TypeScript.
To mitigate this, we have started an experimental branch (a version that we've been using internally for the past month), which introduces a set of helper classes to encapsulate unions, intersections, objects and other TS features. It's not as heavy as a full blown AST, but just enough to helps generate more performant TypeScript, and much cleaner output (by getting rid of the unnecessary intersections of objects).
We've started refactoring codegen's typescript-operations plugin to use these new helper classes in a separate branch, and we're seeing huge benefits in the cleaner/more performant generated output, however there's still a bunch of work left to be done to fully refactor all the remaining plugins and features.
Since this would be a much larger refactor than the fix in this PR, I first would like to get your feedback to see if this is a direction that you'd be willing to go in, so we don't commit to doing something you don't want to merge.
Example of the refactored code: Before vs After
Please let me know if we can proceed with the refactor and further optimizations that it unlocks.