Performance issues with large data sets #268
@mwilliamson-healx I've had the same problem. So you should give it a try and see if it solves your problem (and if not, maybe there are some new performance test cases to add to Graphene's performance tests).
@mwilliamson-healx as Eran pointed out, the We also added a benchmark for a case similar to the one you describe (retrieving about 100k elements instead of 10k). Retrieving 10k elements should be about 10-20 times faster in the It would be great if you could test this case in the
Thanks for the suggestion! I gave Graphene 1.0.dev0 a go, and while it's certainly faster, it still takes around a second to run the example above. Admittedly, I didn't try it out on the speediest of machines, but it suggests that it would still be the dominant factor in response time for our real data.
@mwilliamson-healx some of the performance bottleneck was also in the Could you try installing cyordereddict with Thanks!
PS: There are plans to port some code to
Thanks again for the suggestion! Using cyordereddict shaves about 200ms off the time (from 1s to 0.8s), so an improvement, but still not ideal. I had a look around the code, but nothing stuck out to me as an easy way of improving performance. The problem (from my extremely quick and poorly informed glance!) is that you end up resolving every single value, which includes going through any middleware and having to coordinate promises. Keeping that functionality while being competitive with just spitting out dicts directly seems rather tricky. The proof of concept I've got sidesteps the issue somewhat by parsing the GraphQL query, and then relying on the object types being able to generate the requested data directly, without having to further resolve values. It's very much a proof of concept (so it doesn't support fragments, and isn't really GraphQL compliant yet), but feel free to have a look. Assuming the approach is sane, it's hard to see how to reconcile it with the normal GraphQL resolve approach.
Hi @mwilliamson-healx, I've been thinking for a while about how we can improve performance in GraphQL. This repository - The problem we are seeing is that either in In the current
However, we can create a "Query Builder" as an intermediate step before execution. It would know exactly which fields we are requesting, and therefore their associated types and resolvers, so we don't need to "search" for them each time we complete a value.
Your proof of concept is doing the latter, so the performance difference is considerable compared with the current I think it's completely reasonable to introduce this extra And I also think it would be super valuable to have this in the
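Below is a minimal sketch of the "Query Builder" idea in plain Python, with no graphql-core dependency. The selection is walked once to look up each field's resolver and child type, and the resulting plan is then applied to every object without repeating those lookups. The dict-based `schema` format, `FieldPlan`, `build_plan` and `execute_plan` are hypothetical names chosen for illustration. They are not Graphene or graphql-core APIs. Arguments, aliases, fragments and error handling are deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical schema format: type name -> field name -> (resolver, child type or None, is_list)
Schema = Dict[str, Dict[str, Tuple[Callable[[Any], Any], Optional[str], bool]]]
Selection = Dict[str, dict]  # field name -> nested selection ({} for scalar leaves)


@dataclass
class FieldPlan:
    key: str
    resolver: Callable[[Any], Any]
    is_list: bool
    subplan: Optional[List["FieldPlan"]] = None


def build_plan(schema: Schema, type_name: str, selection: Selection) -> List[FieldPlan]:
    """Look up each selected field's resolver and child type once, up front."""
    plan = []
    for name, sub_selection in selection.items():
        resolver, child_type, is_list = schema[type_name][name]
        subplan = build_plan(schema, child_type, sub_selection) if sub_selection else None
        plan.append(FieldPlan(name, resolver, is_list, subplan))
    return plan


def execute_plan(plan: List[FieldPlan], obj: Any) -> Dict[str, Any]:
    """Apply a prebuilt plan to one object: no schema lookups happen here."""
    result: Dict[str, Any] = {}
    for fp in plan:
        value = fp.resolver(obj)
        if fp.subplan is None or value is None:
            result[fp.key] = value
        elif fp.is_list:
            result[fp.key] = [execute_plan(fp.subplan, item) for item in value]
        else:
            result[fp.key] = execute_plan(fp.subplan, value)
    return result


schema: Schema = {
    "Query": {"allContainers": (lambda root: root["containers"], "Container", True)},
    "Container": {"x": (lambda container: container["x"], None, False)},
}
plan = build_plan(schema, "Query", {"allContainers": {"x": {}}})  # built once per query
root = {"containers": [{"x": i} for i in range(3)]}
print(execute_plan(plan, root))  # {'allContainers': [{'x': 0}, {'x': 1}, {'x': 2}]}
```

`build_plan` runs once per query rather than once per value. As a result, the per-object cost in `execute_plan` is just the resolver call plus dict construction.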
Thanks for the kind words. One question I had was how much you'd imagine trusting the query builder? For my implementation, I was planning on putting the responsibility of correctness onto the queries (rather than having the GraphQL implementation check). The result is that, unlike the normal implementations of GraphQL, it's possible to implement something that doesn't conform to the GraphQL spec.
I'm working on the query builder concept. As of right now, the benchmarks show about a 4x improvement when returning large datasets. Related PR in
Some updates!

Benchmarks

Retrieving 10k ObjectTypes

Doing something similar to the following query:

    {
      allContainers {
        x
      }
    }

Retrieving a List with 10k Ints

Doing something similar to the following query:

    {
      allInts
    }
NOTE: Just serializing a plain list using GraphQLInt.serialize takes about 8ms, so the gains are better compared after subtracting this amount from the totals: 4ms vs 22ms.

Conclusion

The work I've done so far demonstrates that there is still room to improve performance while preserving full compatibility with GraphQL syntax. The proof-of-concept speedup is between 5x and 15x while maintaining the syntax and features

Extra

I think by using

Transport

Apart from using Cython, I'm thinking about how we can plug multiple kinds of transports into GraphQL. This way the result could be created directly in the output format, and we could plug in other transports like
Thanks for working on this. I've taken a look at the proof of concept you wrote, but it's not clear to me exactly how it behaves, and how it's saving time versus the existing implementation. It seems like it's still resolving all fields of objects in the response, but I could easily have misread. I adjusted my proof of concept to (optionally) integrate with GraphQL properly. This means that you can do things like generating the schema, introspection, and all the other stuff that GraphQL does, but it means you hit the performance penalty again. It seems to me that the easiest way of fixing this for my use case would be a way to prevent resolution from descending into the object that my proof of concept produces: a way of returning a value from resolve functions that doesn't trigger resolution on any fields (since they're already resolved). Perhaps something like:

    def resolve_users(...):
        ...
        return FullyResolvedValue(users)

This shifts more responsibility onto the calling code to make sure that the returned value is of the correct shape, so that it's still a valid GraphQL implementation, but that's definitely a good trade-off for me.
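To make the proposal concrete, here is a toy completion step in plain Python that short-circuits on a marker wrapper. `FullyResolvedValue` is the hypothetical name from the comment above, not an existing Graphene class. `complete` and `resolve_field` are illustrative stand-ins for a real executor. Nothing here checks that the wrapped payload matches the query, which is exactly the responsibility being shifted to the caller.

```python
from typing import Any, Callable, Dict

Selection = Dict[str, dict]  # field name -> nested selection ({} for scalar leaves)


class FullyResolvedValue:
    """Hypothetical marker: data the resolver has already shaped to match the query."""

    __slots__ = ("value",)

    def __init__(self, value: Any) -> None:
        self.value = value


def complete(value: Any, selection: Selection, resolve_field: Callable[[Any, str], Any]) -> Any:
    """Toy completion: call resolve_field for every selected field unless pre-resolved."""
    if isinstance(value, FullyResolvedValue):
        # Trust the resolver: no per-field resolution (or validation) below this point.
        return value.value
    if value is None or not selection:
        return value  # scalar leaf or null
    if isinstance(value, list):
        return [complete(item, selection, resolve_field) for item in value]
    return {
        name: complete(resolve_field(value, name), sub, resolve_field)
        for name, sub in selection.items()
    }


# Usage: count how many per-field resolver calls each path makes.
calls = {"n": 0}


def resolve_field(obj: Dict[str, Any], name: str) -> Any:
    calls["n"] += 1
    return obj[name]


rows = [{"id": i, "name": f"user{i}"} for i in range(10_000)]
normal = complete(rows, {"id": {}, "name": {}}, resolve_field)
after_normal = calls["n"]  # two fields for each of 10k rows
shortcut = complete(FullyResolvedValue(rows), {"id": {}, "name": {}}, resolve_field)
```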
@syrusakbary any update on this thread? I am using graphene in production and unfortunately it simply doesn't scale for even the moderate data sets being returned by my API. I'm slowly rewriting my API calls as normal HTTP calls and seeing 10x RPS increases (and therefore a 10x reduction in server costs), but it means I'm losing the flexibility of the GraphQL approach. It seems like the solution discussed in this thread would save me from this headache!
In case it's useful, I've been using the project I mentioned above in production, and performance has been good enough. In particular, it avoids having to run a (potentially asynchronous) resolver for every field. I'm still tweaking the API, but it should be reasonably stable (and better documented!) soon.
Hi @qubitron, if you use the experimental branch It should give you a ~3-5x speed improvement for both big and small datasets.

How to use it

    from graphql.execution import executor
    executor.use_experimental_executor = True
If you could try it and post your results here, that would be great!

Extra questions

To help us optimize for your use case:
@syrusakbary it took me a bit of time to get to a place where I had a good test for this. The package you provided seems to make a big improvement, cutting total execution time for my request roughly in half, with the graphene portion reduced by a factor of 3x. Initially it wasn't working because I already had graphql-core installed; doing "pip uninstall graphql-core" before running your command above finally yielded the performance improvements. More about my workload... I'm using a Flask web server with graphene_sqlalchemy and returning objects that inherit from SQLAlchemyObjectType (not sure if that counts as middleware, but I get similar results when I return plain graphene.ObjectType). For this particular example, I have ~300 items being returned, and I'm resolving 5 fields on each. The SQL query takes about 18ms to return results, and the full HTTP response takes 78ms. After installing your package, the SQL query still takes about 18ms and the full HTTP response takes 37ms. This is much more reasonable, but there still might be some opportunities for improvement. I ran the CPython profiler for the duration of the request; here is the breakdown of time spent in the graphql libraries with the experimental executor:
I'm using a CPython runtime in AWS. Do you think your experimental executor is complete/stable enough for me to use it in production (obviously I will test it)?
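Here is a minimal sketch for reproducing this kind of breakdown, using only the standard library's `cProfile` and `pstats` modules. It runs a single callable under the profiler and returns the top entries sorted by cumulative time. `profile_call` and `build_response` are hypothetical names. In a Flask app, you would wrap whatever function executes the GraphQL query.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile_call(func: Callable[..., Any], *args: Any, limit: int = 15, **kwargs: Any) -> Tuple[Any, str]:
    """Run func(*args, **kwargs) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Cumulative time shows which call trees (e.g. field resolution) dominate the request.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


def build_response(n: int) -> list:
    """Stand-in for executing a GraphQL query: builds n small dicts."""
    return [{"id": i, "name": str(i)} for i in range(n)]


result, report = profile_call(build_response, 300)
print(report)
```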
Hi @qubitron, thanks for the info and the profiling data! I've fixed a few issues in the experimental executor, and it is now as stable as the master branch. So yes, as stable as master! :)
Unfortunately, this is still probably too slow for my use case: GraphJoiner is around four times faster. When profiling, it seems like most of the time is spent in (potentially asynchronous) field resolution. Having said that, I'm not sure that the approach I'm using is really compatible with the way Graphene works. I suspect my comments aren't particularly helpful, so I'll be quiet!
@mwilliamson-healx I agree it would be nice if this could be faster; for me these changes make it usable, but further performance improvements would be nice. I took a cursory look at GraphJoiner. I haven't had time to fully internalize how it works, and although it seems like a promising alternative, I'd prefer if the graphene approach could be made faster or if some sort of hybrid approach could be used. One thing that would be interesting for me is if we could somehow select only the SQL columns that were requested by the user's query, to further improve database performance.
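Here is a minimal sketch of that idea using the standard library's `sqlite3`. It assumes each requested GraphQL field maps one-to-one to a column of the same name. The query's field names are filtered through a whitelist, and only those columns are selected. The `users` table and `fetch_users` function are hypothetical. ORMs offer the same idea natively, for example SQLAlchemy's `load_only` option or Django's `QuerySet.only`.

```python
import sqlite3
from typing import Any, Dict, Iterable, List

# Assumption: GraphQL field names match column names; only these may be selected.
USER_COLUMNS = ("id", "name", "email", "bio")


def fetch_users(conn: sqlite3.Connection, requested_fields: Iterable[str]) -> List[Dict[str, Any]]:
    """Select only the columns the query asked for."""
    # dict.fromkeys removes duplicates while keeping the request order.
    columns = [f for f in dict.fromkeys(requested_fields) if f in USER_COLUMNS]
    if not columns:
        raise ValueError("query requested no selectable user columns")
    # Safe to interpolate: every name comes from the fixed USER_COLUMNS whitelist.
    sql = f"SELECT {', '.join(columns)} FROM users ORDER BY id"
    return [dict(zip(columns, row)) for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email, bio) VALUES (?, ?, ?, ?)",
    [(1, "Ada", "ada@example.com", "long text..."), (2, "Alan", "alan@example.com", "more text...")],
)
print(fetch_users(conn, ["id", "name"]))  # [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Alan'}]
```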
I'm still working on improving performance. I'm going to drop some numbers here, so it's easier to see the advantages of just using the faster implementation of promise:

Non-optimized GraphQL resolution

Old promise

New promise implementation syrusakbary/promise#23

Optimized GraphQL resolution graphql-python/graphql-core#74

Old promise

New promise implementation
When used with PyPy the difference is even bigger, and this is just the beginning. After finishing this promise implementation, I will work on separating the serializer, which I expect will give another ~2x gain if using a simple And after that, optimizations with And all this while preserving 100% compatibility with the GraphQL spec and the current GraphQL Graphene implementation, with no changes required from the developer other than updating the package once the new version is published.
PS: Meanwhile I'm also working on a
Amazing work, @syrusakbary! Looking forward to the improvements; let me know if I can help test any changes.
@syrusakbary I am a bit hesitant to use PyPy. I ran into some bugs/compatibility issues with Cython libraries (unrelated to graphene) and was getting mixed performance results using sqlalchemy. That being said, if the wins are there, then it's always good to have that option.
I've been able to improve type resolution a bit more, giving an extra ~35% speed gain: graphql-python/graphql-core@81bcf8c. New benchmarks (new promise and better type resolution with the experimental executor):
(all these benchmarks are without PyPy, just plain Python with the common CPython executor)
The latest Just by running (you will also need to do: @qubitron I'd love to hear about the extra performance improvements!
I also made some tests and compared In both cases I used a view with pagination (
graphene:
Rest-API:
What immediately stands out is the much higher number of function calls. I also ran a test with graphene v3 and Django 2.2, using the graphene-django sources from graphql-python/graphene-django#812. It's ~30% slower than graphene v2. See also: graphql-python/graphene-django#829
@jedie could you share the code you used to benchmark Graphene vs Rest Framework?
Sorry, I can't share the code, but it's really a minimal example. I did make another test, benchmarking only graphql-core: https://gist.github.com/jedie/581444e02e784ff7c2b9fb1e763759fa It fetches only a list of 1000 dummy items and takes ~20ms.
Now I've also made a similar test with tartiflette. To my surprise, tartiflette (~57ms) is significantly slower than graphql-core (~20ms). My benchmark code: tartiflette: https://gist.github.com/jedie/45ddf8ee7e24704c9485eb8cbcf9ba13 EDIT: I re-implemented a "standalone" benchmark test with Django REST Framework that does similar stuff... And yes, it's much, much faster: ~8ms https://gist.github.com/jedie/1d658a184eb4435383820aa0c647d7e9
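Run-to-run noise can easily exceed the differences being measured, so comparisons like these across frameworks benefit from a repeatable harness. The following is a minimal standard-library sketch. It assumes each implementation can be wrapped in a zero-argument callable, warms up a few times, and reports the median wall-clock time in milliseconds. `median_ms` is a hypothetical helper. The JSON example is only a baseline for "just spitting out dicts", not a GraphQL measurement.

```python
import json
import statistics
import time
from typing import Callable


def median_ms(func: Callable[[], object], repeat: int = 50, warmup: int = 3) -> float:
    """Return the median wall-clock time of func() in milliseconds."""
    for _ in range(warmup):
        func()  # warm caches so the first timed call doesn't skew results
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


items = [{"id": i, "name": f"item {i}"} for i in range(1000)]
print(f"plain json.dumps of 1000 dicts: {median_ms(lambda: json.dumps(items)):.2f} ms")
```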
I was fixing a performance issue in graphene-mongo: my pull request brought the response time down from 2s to 0.02s on a dataset of 12000 documents in MongoDB. The solution was to provide list_slice_length in default_resolver to prevent the default resolver from calling len() on the collection. It appears that the default behavior of many ORMs when calling len() on their collections is to load all objects in the collection. Although I resolved this particular issue, there were plenty more like it. I stopped trying because fixing them would require some major changes to Graphene. Will issues like this be fixed for v3?
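The effect of that fix can be shown with a toy stand-in for an ORM collection. Calling `len()` on it materializes every document, while slicing only loads the slice. `LazyCollection` and `first_page` are hypothetical names, not graphene-mongo or graphql-relay APIs. `slice_length` plays the role that `list_slice_length` plays in the comment: a length the caller already knows (for example from a count query). Because of that, the collection never has to be loaded just to measure it.

```python
from typing import List, Optional, Tuple


class LazyCollection:
    """Toy ORM collection: len() loads every document, slicing loads only the slice."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.loaded = 0  # documents materialized so far

    def __len__(self) -> int:
        self.loaded += self.total  # simulate fetching the whole collection to count it
        return self.total

    def __getitem__(self, s: slice) -> List[int]:
        start, stop, _ = s.indices(self.total)
        self.loaded += max(0, stop - start)
        return list(range(start, stop))


def first_page(collection: LazyCollection, first: int,
               slice_length: Optional[int] = None) -> Tuple[List[int], bool]:
    """Return the first `first` items and whether more exist."""
    length = slice_length if slice_length is not None else len(collection)
    return collection[0:first], length > first


slow = LazyCollection(12_000)
first_page(slow, 10)
print(slow.loaded)  # 12010: the whole collection was loaded just to compute len()

fast = LazyCollection(12_000)
first_page(fast, 10, slice_length=12_000)
print(fast.loaded)  # 10: only the requested slice
```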
There doesn't seem to be this discrepancy between 2.1.8 and 3.07b. Graphene 2: 12.702938017901033
That particular problem was fixed in graphql-python/graphql-core#54, and the fix was released in graphql-core 3.1.0.
FWIW, I am using
Perhaps it's still this old issue: graphql-python/graphql-core#54 (comment)? And this is from Sentry profiling (spans are created just for the top-level instance of any recursive calls):
In case this helps anyone, I ran some benchmarks against graphene-django and DRF with django-silk and discovered the To paint a picture here:
Fetching 50 profiles, the following timings were obtained:
0.7 seconds is still pretty bad for a query of 50 things and a single postgres query, but 90% of my problem was not graphene.
Great find @kevindice. Thank you for posting!
Just to offer my experience, it seems like performance is still an issue. Returning a set of 50 items using Graphene (I'd consider them medium-sized maybe? nothing unusual), requests were taking over 2 seconds to return on a Heroku free dyno. Nearly all of this time was spent in GraphQL code according to NewRelic, and I had optimised the queries already. Switching to Strawberry made no difference, and switching to DRF greatly improved performance, so it definitely seems like GraphQL Core is the issue. I would have liked to investigate more, but I am on a deadline, so I ended up switching to Rails with graphql-ruby, which, to my surprise, is faster than either Python solution: the same query on Heroku with Rails returns in 300-400ms, so it's several times faster and makes the difference between a good and bad user experience. Interestingly, with Rails it seems using GraphQL is actually a bit faster than a normal REST endpoint! I prefer the developer ergonomics of Python and Django, but to be honest Django and Rails are similar enough in many ways that it's not a big deal for me to switch. Obviously you can't do this if you're deep into a project, but for anyone considering Python for a GraphQL project, I think it's worth being aware of these potential performance issues, and also being aware that Graphene seems to be stuck in a potentially unmaintained limbo. If I'd realised this sooner, I'd probably have started with Rails.
@tomduncalf were you using Graphene v2 or v3? Also I'm surprised that switching to Strawberry didn't help. Based on these benchmarks: https://twitter.com/jayden_windle/status/1235323199220592644 I would expect Strawberry or Graphene v3 to be significantly better especially for lists of objects. IMO I would expect GraphQL to always have a bit of a performance overhead compared to a REST endpoint since it's doing quite a bit more. It would be good to get Python performance to a point where it's comparable to other similar languages though (like ruby). Can you share any of your code?
@jkimbo This was on Graphene v3. Unfortunately I can't share the code as it's not open source and I don't really have the time to dig into exactly why it was slower right now, but if I do find time to make a simple repro comparing Python vs Ruby I will post it here!
So @tomduncalf you properly nerd-sniped me with this and I ended up building a more "realistic" benchmark. I implemented an API to fetch the top 250 rated IMDB movies in Graphene, Strawberry, DjangoRestFramework and a plain JSON API, all hosted on Django. All the data comes from a SQLite DB. The code is here: https://github.com/jkimbo/django-graphql-benchmarks and you can try out the Graphene API here. Here are the P99 results against the Heroku instance: So you can see that Graphene (v3) and Strawberry (v0.73.1 contains a fix for a performance regression, btw) are pretty much neck and neck, which is what I would expect, considering that they are just different ways to set up a graphql-core server. DRF is definitely faster (~25% faster), and the plain JSON endpoint is faster still. I couldn't replicate the 2-second response times you were seeing with your API @tomduncalf, so I'm not sure what is going on there. Overall, GraphQL in Python is definitely slower than using something like DjangoRestFramework, but not horribly so in my opinion. There are definitely things that can be improved, though, and thanks to this exercise I have some ideas for improvements we can make to Strawberry. I would be interested in how this all compares to graphql-ruby as well, but unfortunately my experience there is lacking.
Hey @jkimbo, thanks for doing this and I hope my initial post didn't come across too negatively – I just wanted to share my experience for anyone else in my situation (i.e not familiar with either Django or Rails and looking to pick one for a GraphQL project), as I didn't really find much online comparing the two for GraphQL specifically and I didn't realise there is a performance overhead. You prompted me to do a little bit more digging as I felt bad for just saying it was slow 🙃 as your demo seems to perform pretty well. One thing I didn't think to mention is that I am using Relay with my API – I did a little bit of testing with my API and it seems like using Relay adds a fairly significant overhead – almost doubling the response times on Heroku for the same query vs. a non-Relay version! I wonder if you could try using Relay with Graphene on yours and see if you see similar results? Or if you tell me how to run yours, I can try it (pretty new to Python so couldn't work out how to run yours from a git clone). My API does still seem quite slow compared to yours and I'm not really sure why, as yours is returning a larger set of data. I'm new to Django so I could be doing something a bit stupid somewhere. To be honest, I am going to stick with Rails at this point as it's probably a slightly better fit for what I am trying to do (build an API with as little code as possible basically, haha – the ecosystem of gems seems a bit more developed for some of the things I want to do), but if you have any suggestions of good ways to profile my Python I could give it a go. Anyway, you piqued my curiosity so I reproduced your demo in Rails! The code is at https://github.com/tomduncalf/rails-graphql-benchmark and I've deployed it to Heroku in the EU region. It seems like it returns a bit faster than yours, but not dramatically so – I'm not sure how you run your benchmark but I'd be happy to try it on mine if it's useful for comparison. 
There are two queries you can run, one Relay and one non-Relay (it doesn't seem like the Ruby version of GraphiQL supports embedding them in the URL!):
Cheers,
Last year I spent a few weekends pulling together a very minimal PoC based on @syrusakbary's idea of generating template code to improve GQL performance in Python. Here's the link: https://github.com/Ashish-Bansal/graphql-jit It's a very rough, untested PoC implementation and needs a lot of work. I won't get time to work on it, so in case anyone is interested, they can build on that PoC.
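The sketch below illustrates the general idea behind generating code per query; it is a toy and not how graphql-jit is implemented. It turns a nested field selection into Python source for a function that builds the response dict with plain attribute access. It then compiles that source once with `compile`/`exec`, so later calls run no generic resolution logic. `compile_selection` is a hypothetical name. The sketch assumes every selected attribute exists and is non-null. It rejects field names that are not safe Python identifiers before they reach the generated source.

```python
import keyword
from types import SimpleNamespace
from typing import Any, Callable, Dict

Selection = Dict[str, dict]  # field name -> nested selection ({} for leaves)


def _expr(selection: Selection, var: str) -> str:
    """Build a dict-literal expression that reads the selected attributes from `var`."""
    parts = []
    for name, sub in selection.items():
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"unsafe field name: {name!r}")
        access = f"{var}.{name}"
        parts.append(f"{name!r}: {_expr(sub, access) if sub else access}")
    return "{" + ", ".join(parts) + "}"


def compile_selection(selection: Selection) -> Callable[[Any], Dict[str, Any]]:
    """Generate and compile a specialized projection function for one selection."""
    source = f"def project(obj):\n    return {_expr(selection, 'obj')}\n"
    namespace: Dict[str, Any] = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["project"]


project = compile_selection({"id": {}, "author": {"name": {}}})
book = SimpleNamespace(id=1, author=SimpleNamespace(name="Ada"))
print(project(book))  # {'id': 1, 'author': {'name': 'Ada'}}
```

The compilation cost is paid once per distinct query, so the sketch only makes sense if the generated function is cached and reused across requests.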
This thread is interesting but kind of hard to get a grasp on. It would be great if someone competent, like a maintainer of I also have a few questions:
In my case, all requests I make take an enormous time to resolve (300ms for queries with < 10 fields, and up to 3s for larger queries). On the other hand, I sometimes get much lower response times (~3x improvement) for the exact same query on the exact same database (same data state), all using the exact same Docker image; sometimes I wonder if a lack of memory/CPU is causing this. Anyway, thanks for this discussion. I hope someone smarter than me will be able to summarize the good ideas that lie here and there in this thread.
I'm currently asking myself the same question. I took the liberty of forking jkimbo's benchmark suite mentioned above, updating the frameworks to their latest versions and including graphene v2 as well: flbraun/django-graphql-benchmarks Here's the result from a fresh bench I did this morning: As you can see, the difference between v2 and v3 is pretty much non-existent; however, the benchmark suite currently only serializes a bunch of ObjectTypes. Since this is the most primitive building block of a GraphQL API and doesn't cover real-world setups utilizing pagination, connections, etc., your experience may vary. Maybe this is of interest to you.
@flbraun nice update! Probably makes sense to add some asyncio into the mix as this adds a lot of overhead as well.
I updated my forked bench suite to be tested against multiple Python versions, see results here. tl;dw: Graphene (both v2 and v3) performs almost identically on the same Python version. However, Graphene seems to have benefited heavily from performance improvements in the Python interpreter. Jumping from 3.10 to 3.11 alone shaves ~20% off mean response times, which is kinda impressive.
If you're running this in prod, you should try Pyston... we've seen perf that is close to PyPy but without warmup/other issues.
Hi. I'm curious what the latest is on this? We're noticing similar performance issues to those OP posted, where a lot of time appears to be spent planning SQL queries. Our SQL takes 60-100ms to process, but time to first byte is close to 1 second. Our profile implies a lot of time is spent planning SQL. Our stack:
All optimized SQL, and we get numbers like: Chrome Dev Tools
Our py-spy profile, with Postgres and Django all running locally, implies much of the time is spent in Graphene / SQL planning?
Did you ever figure this out? @vade
@pfcodes We've done a lot of optimization on our stack in the interim, so I don't know if I can speak specifically to any single change, but here are things we've observed. We use Relay and pagination:
We've seen a 10x improvement in response time with some of the above. I know it's hand-wavy, but a lot of it was just really paying attention to details and ensuring that more complicated queries do in fact get optimized. We do some hand-tuned queryset tweaks in some cases for fields that require model / DB lookups and annotate them, and we found some hot spots where we unintentionally did dumb shit like evaluating a queryset in place rather than using annotated values so the DB could do the work. cc @rsomani95 - anything else to add from the work we did that maybe I'm missing?
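The "let the DB do the work" point can be shown without Django, using the standard library's `sqlite3`. The sketch computes a per-row count two ways. The first issues one query per author (the N+1 pattern). The second uses a single `GROUP BY` query, which is roughly what Django's `annotate(Count(...))` produces. The tables and function names are hypothetical. `set_trace_callback` is used only to record the statements each approach sends.

```python
import sqlite3
from typing import Dict, List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES author(id));
    INSERT INTO author (id, name) VALUES (1, 'Ada'), (2, 'Alan'), (3, 'Grace');
    INSERT INTO book (author_id) VALUES (1), (1), (2);
    """
)

statements: List[str] = []
conn.set_trace_callback(statements.append)  # record every SQL statement actually executed


def counts_per_row(conn: sqlite3.Connection) -> Dict[str, int]:
    """N+1 pattern: one extra COUNT query for every author row."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM author ORDER BY id").fetchall():
        (n,) = conn.execute("SELECT COUNT(*) FROM book WHERE author_id = ?", (author_id,)).fetchone()
        result[name] = n
    return result


def counts_annotated(conn: sqlite3.Connection) -> Dict[str, int]:
    """Single query: the database computes every count (like annotate(Count(...)))."""
    rows = conn.execute(
        """
        SELECT author.name, COUNT(book.id)
        FROM author LEFT JOIN book ON book.author_id = author.id
        GROUP BY author.id ORDER BY author.id
        """
    ).fetchall()
    return dict(rows)


statements.clear()
per_row = counts_per_row(conn)
per_row_queries = len(statements)  # the initial SELECT plus one COUNT per author

statements.clear()
annotated = counts_annotated(conn)
annotated_queries = len(statements)  # a single aggregated SELECT

print(per_row, per_row_queries)
print(annotated, annotated_queries)
```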
Also, there's a link to an issue with observations about the flame graph in question from the optimizer's author @MrThearMan (sorry for the tag). They deserve a ton of credit; this optimizer is best in class right now for Django and no one knows it :)
For the very specific use-case that I'm working with, which is synchronous-only and does not support the full range of GraphQL features, I've made a Gist which shows how I worked around this performance issue to get an apparent speedup of 10x in some cases (not formally measured). If your use-case is similar to mine then it might make a useful starting point. https://gist.github.com/dicknetherlands/2f6e8619409fa155a05b3a863f10269a It works by assuming that the developers will respect the schema. The key speedups come from discovering the type of each field only once, bypassing serializers for builtin scalars, bypassing nullability checks, not doing object type checks for nested fields, not re-validating the schema upon every query, and more efficient resolution of SyncFuture objects returned by graphql_sync_dataloader (if you're using that library).
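One of the listed speedups, bypassing serializers for builtin scalars, can be sketched in isolation. The idea is to wrap a scalar's serialize function so that values whose exact type is already `int`, `float`, `str` or `bool` are returned untouched. Anything else goes through the real serializer. `with_builtin_bypass` is a hypothetical helper, not part of graphql-core or the linked Gist. Like the Gist, it trusts developers to respect the schema, so an `int` returned for a `String` field would pass through unconverted.

```python
from decimal import Decimal
from typing import Any, Callable, List

_PASSTHROUGH_TYPES = frozenset({int, float, str, bool})


def with_builtin_bypass(serialize: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a serializer that skips `serialize` for plain builtin scalar values."""

    def serialize_fast(value: Any) -> Any:
        # Exact type check (not isinstance) so subclasses such as IntEnum still go
        # through the real serializer.
        if type(value) in _PASSTHROUGH_TYPES:
            return value
        return serialize(value)

    return serialize_fast


slow_calls: List[Any] = []


def serialize_string(value: Any) -> str:
    """Stand-in for a full scalar serializer with coercion/validation."""
    slow_calls.append(value)
    return str(value)


fast = with_builtin_bypass(serialize_string)
print(fast("plain"), fast(Decimal("1.50")), len(slow_calls))  # plain 1.50 1
```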
For our use case, we send a few thousand objects to the client. We're currently using a normal JSON API, but are considering using GraphQL instead. However, when returning a few thousand objects, the overhead of resolving values makes it impractical to use. For instance, the example below returns 10000 objects with an ID field, and that takes around ten seconds to run.
Is there a recommended way to improve the performance? The approach I've used successfully so far is to use the existing parser to parse the query, and then generate the response by creating dictionaries directly, which avoids the overhead of resolving/completing on every single value.
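Here is a minimal sketch of the approach described above. It assumes the parser's output has already been reduced to a nested dict of field names (`{}` for leaves), and that each object exposes attributes named after its fields. The response is then built directly, with no resolver or completion step per value. `project` and `Selection` are hypothetical names. Fragments, aliases, arguments and type checks are deliberately left out.

```python
from types import SimpleNamespace
from typing import Any, Dict

Selection = Dict[str, dict]  # field name -> nested selection ({} for leaves)


def project(obj: Any, selection: Selection) -> Any:
    """Build the response directly: read the attributes the query names, nothing else."""
    if obj is None:
        return None
    if isinstance(obj, list):
        return [project(item, selection) for item in obj]
    result = {}
    for name, sub in selection.items():
        value = getattr(obj, name)
        result[name] = project(value, sub) if sub else value
    return result


users = [SimpleNamespace(id=i, profile=SimpleNamespace(name=f"user{i}")) for i in range(10_000)]
response = project(users, {"id": {}, "profile": {"name": {}}})
print(response[:2])  # [{'id': 0, 'profile': {'name': 'user0'}}, {'id': 1, 'profile': {'name': 'user1'}}]
```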