-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmark suite #185
base: master
Are you sure you want to change the base?
Benchmark suite #185
Conversation
abef3fa
to
a9f8f4b
Compare
14d3414
to
fe6752c
Compare
@@ -0,0 +1,31 @@ | |||
import SimpleHTTPServer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be better to use Twisted (or some other async framework) because SimpleHTTPServer is single-threaded, and Splash can download multiple resources in parallel. I wonder if it is important for tests, maybe not.
With Twisted we can also simulate conditions like non-responding servers, delayed responses, etc.
I think to fix tests it could be enough to add benchmarks here: https://github.com/scrapinghub/splash/blob/master/conftest.py |
args = parser.parse_args() | ||
logging.basicConfig(level=logging.DEBUG) | ||
|
||
splash = SplashServer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to allow to set Splash server address from the command line. This way it'll be easier to compare performance of different versions of Splash, or performance of dockerized Splash (maybe several dockerized Splash'es which use different base operating systems and/or qt/... versions).
We should really allow to run the same benchmark against different Splash versions. If the benchmark can only be executed against current Splash version we can't just checkout to older Splash version and run a benchmark again - checking out old Splash could also change the benchmark version, so we'll run a different benchmark.
8b9cfea
to
6e8744e
Compare
@kmike is there anything else that prevents merging this? |
I can't get it work :)
I think we shouldn't distribute splash.benchmark with other Splash code (it shouldn't be installed by installing Splash). What does it take to move benchmarks folder to the root? |
- download_sites: fix encoding unconditionally if it is missing - download_sites: add base/href only if it is missing - download_sites: remove scripts unconditionally - benchmark: specify pre-existing splash instance with --splash-server HOST:PORT
- add fileserver logs, write them to file (--logfile) - put bench results into file (--out-file) - silence requests.packages.urllib3.connectionpool logger - fix cputime metric for preexisting splash instances
- add support for preexisting file server instance (--fileserver) - add HTML endpoint benchmarks (--render-type html) - make --sites-dir required - dump output in proper JSON
if not, I'm not sure about organizing the code. The necessary subset of SplashServer functionality can probably be copy-pasted. But for the rest, there are three separate runnable scripts there: the downloader, the file server & the benchmark runner. they all use code from the file server module, so at least that module must be importable. |
it worked ok for me in |
0939cff
to
f7a43da
Compare
It respects #185 (not merged yet) and looks like a good place for benchmark notebook
It respects scrapinghub#185 (not merged yet) and looks like a good place for benchmark notebook
This should close #184.