Performance regression #7
The regression was introduced with b1e7691.
It seems the issue is with Nokogiri's collect_namespaces method. Apparently using this method is a bad idea, according to the Nokogiri documentation (http://nokogiri.org/Nokogiri/XML/Document.html#method-i-collect_namespaces).
Any ideas on how to improve the situation? /cc @jkingdon
A simple patch is https://github.com/jacobat/wasabi/commit/90c0339124dfb4a1759294fa748986688f4fca96, though I'm not sure it is acceptable.
I can't say whether every WSDL file out there must declare its namespaces on the root node, and I didn't notice such a restriction in a quick glance at http://www.w3.org/TR/wsdl, but I think this patch is worth a try. My guess is that declaring namespaces on the root node is the most common practice, but perhaps someone knows better than I do on such matters.
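To make the root-only trade-off concrete, here is a minimal sketch in Ruby. It uses the standard library's REXML instead of Nokogiri so it runs without extra gems: `root_namespaces` reads only the declarations on the root element, while `all_namespaces` walks every element, similar in spirit to what `collect_namespaces` has to do. The helper names and the sample document are hypothetical, and this is not the code from the patch.

```ruby
require "rexml/document"

# Cheap: only the prefixes declared on the root element.
def root_namespaces(doc)
  doc.root.namespaces
end

# Expensive: visit every element and merge the prefixes in scope there,
# similar in spirit to collect_namespaces looking at the whole document.
def all_namespaces(doc)
  found = {}
  walk = lambda do |element|
    found.merge!(element.namespaces)
    element.each_element { |child| walk.call(child) }
  end
  walk.call(doc.root)
  found
end

# XML allows namespace declarations on any element, not only the root.
wsdl = <<~XML
  <definitions xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:tns="urn:example">
    <types>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>
    </types>
  </definitions>
XML

doc = REXML::Document.new(wsdl)
puts root_namespaces(doc).keys.sort.inspect
puts all_namespaces(doc).keys.sort.inspect
```

The nested `xs` prefix only shows up in the full walk. That is the risk of assuming every WSDL declares its namespaces on the root: a document that declares a prefix deeper down would not be handled.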
I'm not sure what the best approach is. Perhaps it would be better if Wasabi could somehow cache the results of its parsing. I have no idea how much data Wasabi actually extracts from the WSDL, but it seems wasteful to parse the whole WSDL, as Savon does, every time a new client instance is created.
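A minimal sketch of such a cache, assuming the parse result is an immutable value that can safely be shared between client instances. `ParseCache` and the regex-based stand-in for the parser are hypothetical; neither is a Wasabi or Savon API.

```ruby
require "digest/sha2"

# Hypothetical process-wide cache: each distinct WSDL document is parsed once
# and the result is shared by every client created afterwards.
class ParseCache
  def initialize
    @entries = {}
    @mutex = Mutex.new
  end

  # Keyed by a digest of the raw XML, so clients reading the same document
  # share one parse, and a changed document gets a fresh entry.
  def fetch(wsdl_xml)
    key = Digest::SHA256.hexdigest(wsdl_xml)
    @mutex.synchronize do
      # Hash#fetch (not ||=) so nil/false results are cached too.
      @entries.fetch(key) { @entries[key] = yield(wsdl_xml) }
    end
  end

  def size
    @mutex.synchronize { @entries.size }
  end
end

cache = ParseCache.new
parse_count = 0
wsdl = '<definitions><operation name="Login"/></definitions>'

3.times do
  cache.fetch(wsdl) do |xml|
    parse_count += 1 # stands in for the expensive parse
    xml.scan(/operation name="(\w+)"/).flatten
  end
end
puts parse_count # => 1
```

Keying on a digest of the document rather than its URL means a changed WSDL gets a fresh parse. The mutex is held during the parse, which keeps two threads from parsing the same document twice at the cost of serializing unrelated parses; a real implementation might lock per key instead.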
interesting find. thanks for reporting! wasabi should cache every expensive parse-operation,
this solved issue #7 and should probably work for most services.
published v2.1.1 to fix this problem.
This seems to be an issue still (actually, it seems to have gotten worse). Here are the numbers I get from running the above code in Ruby 1.8.7 on different versions of Wasabi:
Numbers from Ruby 1.9.3 are almost as bad:
It looks like 583cf65 introduced the roughly 30-second parse time, and it doesn't use collect_namespaces, so it seems this is a new performance regression rather than the one originally fixed.
@hoverlover would you mind looking into this?
Sure, I can take a look.
Any update on this one?
Sorry, I haven't had a moment to dig into this yet. Will try to look at it this weekend, but I'm going on vacation next week so it might be a little while.
i looked into it. it's damn slow, but it's doing actual work. so i'm not sure how to optimize it right now.
since this change parses almost the entire document, it might make sense to switch back to a sax parser.
I'm sure a SAX parser would help, since it doesn't hold the entire document in memory. Not sure how it would affect speed. It would be worth a shot to create a test branch.
looking into it ...
Didn't do any benchmarking when I switched us from SAX to Nokogiri, but Nokogiri is pretty fast so my first instinct is that it is usually going to be our code which is the bottleneck rather than parsing. If we do want SAX, I think we want something like saxophone (http://jkingdon2000.blogspot.com/2008/05/parse-xml-with-saxophone.html). Trying to hack up the previous SAX parser to know about namespaces (and various other features of the WSDL which were needed at that time) was seriously painful.
@jkingdon i think you're right. i played with it for a couple hours and can see this getting messy real quick.
I can't say I entirely understand what the change actually does, but as far as I can tell, I haven't had a need for it in my projects; thus I would be perfectly happy not having to incur the parse time penalty (admittedly, I only use Savon for one endpoint/WSDL). If it turns out to be too cumbersome/impossible to optimize this, we could perhaps consider making it an option?
not every change to the parser affects every wsdl. reverting this change doesn't seem to be an option. i started working on a sax parser which is actually pretty readable: fc27d4a. keeping state is kind of ugly, but i'm sure this can be improved somehow. the idea here is to parse the wsdl into a format that can be interpreted by some other class. my next step would be to test against other wsdl files and then build something that
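For anyone unfamiliar with the approach, here is a minimal sketch of stack-based SAX matching using the stream parser in Ruby's standard library (REXML). It is not the parser from fc27d4a and not `Wasabi::Matcher`: the handler keeps a stack of open element names (local names, prefixes stripped) and only records an operation when the stack equals a target path. The class name, path, and sample WSDL are hypothetical.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical SAX-style handler: tracks open elements on a stack and records
# the "name" attribute of elements whose full path matches the target path.
class OperationCollector
  include REXML::StreamListener

  attr_reader :operations

  def initialize(path)
    @path = path # local names from the root, e.g. %w[definitions portType operation]
    @stack = []
    @operations = []
  end

  def tag_start(name, attrs)
    @stack.push(local_name(name))
    # Hash[...] accepts either a Hash or an array of [key, value] pairs.
    @operations << Hash[attrs.to_a]["name"] if @stack == @path
  end

  def tag_end(_name)
    @stack.pop
  end

  private

  # "wsdl:operation" -> "operation", so prefixes don't affect matching.
  def local_name(qualified_name)
    qualified_name.split(":").last
  end
end

wsdl = <<~XML
  <wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
    <wsdl:portType name="ExamplePortType">
      <wsdl:operation name="Connect"/>
      <wsdl:operation name="Disconnect"/>
    </wsdl:portType>
    <wsdl:binding name="ExampleBinding">
      <wsdl:operation name="Connect"/>
    </wsdl:binding>
  </wsdl:definitions>
XML

collector = OperationCollector.new(%w[definitions portType operation])
REXML::Document.parse_stream(wsdl, collector)
puts collector.operations.inspect # => ["Connect", "Disconnect"]
```

The `wsdl:operation` inside `wsdl:binding` is skipped because its path differs. This is also where the state-keeping gets awkward: every extra thing to extract needs its own path check and its own place to store results.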
one other feature i was thinking about to improve performance is to somehow remove conditionals from the
fyi: parsing the original wsdl takes about 5.4 sec on my mba. i expect that number to double since the current impl.
What you've built on SAX (the stack, Wasabi::Matcher, etc.) strike me as … As for how to make all that matching performant, the holy grail I … On 07/19/2012 07:07 AM, Daniel Harrington wrote:
saxophone looks nice and simple, but i would like to avoid pulling in another dependency for something this easy. |
If you need another big wsdl to test against, you can try ExactTarget's at: https://webservice.exacttarget.com/etframework.wsdl
ok, so ... i have lots of things to tell you about this whole parser-thing, but right now, i can only give you a quick update. when i started this, parsing the economic wsdl took about 34 seconds on my machine. just ridiculous. here's the output from method_profiler before and after the refactoring: … i'm quite satisfied with this, but i'm sure it could be even better. sadly, the code just sucks!
if you could maybe point your gemfile to github master or otherwise test these changes, i'd really appreciate it!
Ooh, this sounds awesome. Will give it a go tomorrow at work! On Mon, Apr 29, 2013 at 10:09 PM, Daniel Harrington
@rubiii We're currently tied to Savon 0.9.5 because another library we use depends on it. It seems we can't update just Wasabi on its own, because Savon 0.9.5 wants … Do you know off the top of your head whether going from httpi 0.9 to 2.0 has any major breaking changes?
@henrik not quite sure if the new httpi works with the old savon. could be that it depends on something that changed,
@henrik here's the changelog.
Thanks! Oh yeah, the cookie stuff. We could and should upgrade, but I don't think it'd be fast or painless. I could try this out on another smaller project, though.
@rubiii this sounds awesome, will check it out as soon as possible. Hopefully this means rconomic can finally upgrade to a newer savonrb.
@koppen i would hope so as well. please let me know if there's anything i can do to help.
Alright, tested it with my tiny … On my Air: with the latest Wasabi (3.1.0 / 9d190ed) and Savon (2.2.0 / 6a82e9b988f962f68e22979afbf41c31455867c6) and a test script that logs in and runs two requests, the best time I get is around 7 seconds all in all. With my marshalling patch (which is definitely not production ready), the best times I get are just over 2 seconds. Without either fix, the total run time is almost 40 seconds. So it seems to work fine with the requests I tried, and it's a huge speedup. Looking forward to getting to a point where we can use it.
@henrik thanks, cool. are you doing anything fancy in your test script?
Nothing very fancy: https://gist.github.com/henrik/5488207 (not runnable without an e-conomic account, of course).
@rubiii Oh yeah, the retrieving-from-URL bit might be it. The marshalled version will of course only do that once, whereas with your patch it will probably do it every time, but then parse it fast. Let me check that…
With a local WSDL, the best times I get now are around 3–3.5 seconds. Just a few runs, no proper benchmark. With the marshalling instead, it's just under 2.5 seconds. So the difference is a lot smaller, and considering that the marshalling probably has a ton of issues, that's very promising.
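The marshalling patch itself isn't shown in the thread, so the following is only a guess at the general idea: parse once, dump the result with `Marshal` to a file named after a digest of the WSDL, and load that file on later runs. `cached_parse` and the regex stand-in for the parser are hypothetical, and the sketch assumes the parse result is plain data.

```ruby
require "digest/sha2"
require "fileutils"
require "tmpdir"

# Hypothetical on-disk cache: the parse result is stored with Marshal in a
# file named after a digest of the WSDL, so later runs can skip parsing.
def cached_parse(wsdl_xml, cache_dir)
  path = File.join(cache_dir, "#{Digest::SHA256.hexdigest(wsdl_xml)}.marshal")
  return Marshal.load(File.binread(path)) if File.exist?(path)

  result = yield(wsdl_xml)
  FileUtils.mkdir_p(cache_dir)
  File.binwrite(path, Marshal.dump(result))
  result
end

cache_dir = Dir.mktmpdir
parse_count = 0
wsdl = '<definitions><operation name="Connect"/></definitions>'

# Two calls simulate two runs: the first parses and writes the file,
# the second only loads it.
results = Array.new(2) do
  cached_parse(wsdl, cache_dir) do |xml|
    parse_count += 1 # stands in for the expensive parse
    { operations: xml.scan(/operation name="(\w+)"/).flatten }
  end
end

FileUtils.remove_entry(cache_dir)
puts parse_count # => 1
```

Marshal can't dump procs, IO objects, or singleton objects, and loading a dump fails if a class it references no longer exists, so cached files can break across library upgrades. `Marshal.load` should also only be used on files your own process wrote, never on untrusted input.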
@henrik i'm working on cleaning up the type parsing right now, so that may or may not bring some performance improvements as well. there's a lot of code to improve 😉 i'm not against caching, but it comes with a few new problems and i think we can actually get this thing even faster.
What you have is plenty fast enough for me not to bother with the fragile marshalling stuff (and good riddance). Shaving off even more time would of course be great, but I'm very happy with what you have now :) On 30 apr 2013, at 14:05, Daniel Harrington notifications@github.com wrote:
@henrik cool 😄
This is looking really good, thanks so much @rubiii. A preliminary test with Savon 2.2 and running a spec from R-conomic:
This brings us into usable territory, let the upgrading commence.
started to clean up schema type parsing and pushed the first step to master. the code still merges elements and complex types, and ignores simple types and various other element types like group and choice, which needs to be fixed, but it should be no different than before. it also exposes the same interface through the document class, so it still works with savon. next step: improve the interface (document class), because it's hard to consume and the code that uses it is the worst code inside savon right now. it's also pretty slow for documents with many elements, because we're iterating over all the types at least 4 times until we have the final xml. i think i can cut that down to one iteration total.
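The type-parsing code isn't shown here, so this only illustrates the general "one iteration instead of four" idea, with a hypothetical `TypeNode` struct standing in for parsed schema nodes. `index_multi_pass` walks the list once per bucket (four times), while `index_single_pass` fills every bucket and the name index in one walk; both return the same result for the three kinds shown.

```ruby
# Hypothetical stand-in for a parsed schema node.
TypeNode = Struct.new(:kind, :name)

# Four passes over the same list: three selects plus one map for the index.
def index_multi_pass(nodes)
  {
    elements:      nodes.select { |n| n.kind == :element },
    complex_types: nodes.select { |n| n.kind == :complex_type },
    simple_types:  nodes.select { |n| n.kind == :simple_type },
    by_name:       nodes.map { |n| [n.name, n] }.to_h
  }
end

BUCKETS = { element: :elements, complex_type: :complex_types, simple_type: :simple_types }.freeze

# One pass: each node goes into its bucket and the name index at once.
def index_single_pass(nodes)
  index = { elements: [], complex_types: [], simple_types: [], by_name: {} }
  nodes.each do |node|
    index[BUCKETS.fetch(node.kind)] << node # raises KeyError on unknown kinds
    index[:by_name][node.name] = node
  end
  index
end

nodes = [
  TypeNode.new(:element, "Login"),
  TypeNode.new(:complex_type, "ArrayOfString"),
  TypeNode.new(:simple_type, "Currency"),
  TypeNode.new(:element, "Logout")
]
puts index_single_pass(nodes) == index_multi_pass(nodes) # => true
```

Both are linear in the number of types, so the gain is a constant factor: fewer passes and fewer intermediate arrays, which matters most for schemas with many elements.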
just noticed that the "lazy parsing" probably doesn't speed up the overall time to request just yet,
summed up everything i learned about this and pushed a lot of new code to master. i updated the changelog and the readme to reflect the changes. i had to change the public interface because it did not match the real world, so i need to adjust savon and throw out a lot of bad code. not done here, but the new code already supports wsdl imports, and xml schema imports are next on my todo list. this needs a lot more documentation and proper specs, but i'm very happy about it! ps. codeclimate is awesome :)
Parsing https://www.e-conomic.com/secure/api1/EconomicWebservice.asmx?WSDL with Wasabi 1.0 takes ~0.3 sec:
The same with 2.1.0 takes ~7 sec: