Benchmark suite #185

Open
wants to merge 10 commits into
base: master

immerrr
Contributor

@immerrr immerrr commented Feb 27, 2015

This should close #184.

@immerrr immerrr force-pushed the benchmark-utility branch from abef3fa to a9f8f4b Compare March 2, 2015 12:13
@immerrr immerrr mentioned this pull request Mar 2, 2015
@immerrr immerrr force-pushed the benchmark-utility branch from 14d3414 to fe6752c Compare March 2, 2015 17:35
@@ -0,0 +1,31 @@
import SimpleHTTPServer
Member

It might be better to use Twisted (or another async framework): SimpleHTTPServer is single-threaded and handles one request at a time, while Splash can download multiple resources in parallel. I'm not sure whether this matters for tests; maybe not.

With Twisted we can also simulate conditions like non-responding servers, delayed responses, etc.
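To illustrate the concern, here is a minimal sketch of a file-server stand-in that serves requests concurrently and delays each response. It deliberately uses the Python 3 standard library (`ThreadingHTTPServer`) rather than Twisted, so it is a swapped-in technique and not what the PR implements. `DelayedHandler`, `start_server`, `fetch_many` and the 0.2 s delay are hypothetical names and values chosen for the example.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DelayedHandler(BaseHTTPRequestHandler):
    """Answers every GET after a fixed delay (hypothetical test handler)."""

    delay = 0.2  # seconds; an arbitrary illustrative value

    def do_GET(self):
        time.sleep(self.delay)  # simulate a slow server
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the output quiet


def start_server():
    """Start the server on a free port in a background thread."""
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), DelayedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_many(url, count):
    """Fetch `url` `count` times in parallel; return (bodies, elapsed seconds)."""
    # Bypass any proxy configured in the environment for local requests.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    results = [None] * count

    def worker(index):
        with opener.open(url, timeout=5) as response:
            results[index] = response.read()

    started = time.monotonic()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.monotonic() - started
```

With a threaded or async server the parallel fetches overlap their delays; a server that handles one request at a time would serialize them, which could distort timings for a client that downloads resources in parallel.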

@kmike
Member

kmike commented Mar 2, 2015

I think adding benchmarks here could be enough to fix the tests: https://github.com/scrapinghub/splash/blob/master/conftest.py

args = parser.parse_args()
logging.basicConfig(level=logging.DEBUG)

splash = SplashServer(
Member

It would be better to allow setting the Splash server address from the command line. That would make it easier to compare the performance of different Splash versions, or of dockerized Splash (perhaps several dockerized Splash instances that use different base operating systems and/or Qt/... versions).

We should really make it possible to run the same benchmark against different Splash versions. If the benchmark can only be run against the current Splash version, we can't simply check out an older Splash version and run the benchmark again: checking out old Splash may also change the benchmark code, so we would be running a different benchmark.
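A rough sketch of the idea, assuming the server address arrives as a HOST:PORT string: the same benchmark callable is timed against each address, so the benchmark code stays fixed while the Splash instance varies. `parse_host_port`, `compare_servers` and the `run_benchmark(host, port)` callable are hypothetical names for this example, and the default port 8050 is an assumption.

```python
import time


def parse_host_port(value, default_port=8050):
    """Split 'HOST:PORT' (or a bare 'HOST') into a (host, port) tuple.

    The default port is an assumption made for this sketch.
    """
    host, separator, port = value.rpartition(":")
    if not separator:
        return value, default_port
    return host, int(port)


def compare_servers(addresses, run_benchmark):
    """Run one benchmark callable against each address.

    `run_benchmark(host, port)` is a hypothetical callable supplied by the
    caller; the return value maps each address to elapsed wall-clock seconds.
    """
    timings = {}
    for address in addresses:
        host, port = parse_host_port(address)
        started = time.monotonic()
        run_benchmark(host, port)
        timings[address] = time.monotonic() - started
    return timings
```

For example, `compare_servers(["localhost:8050", "localhost:8051"], run_benchmark)` could compare two dockerized instances listening on different ports.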

@immerrr immerrr force-pushed the benchmark-utility branch 2 times, most recently from 8b9cfea to 6e8744e Compare March 2, 2015 20:00
@immerrr immerrr changed the title WIP: benchmark suite Benchmark suite Mar 2, 2015
@immerrr
Contributor Author

immerrr commented Mar 18, 2015

@kmike is there anything else that prevents merging this?

@kmike
Member

kmike commented Mar 31, 2015

I can't get it to work :)

  • splash.benchmark and splash.tests cannot be imported when you run code from the benchmarks folder, because these packages are not distributed with Splash.
  • If splash.benchmark and splash.tests are added to packages in setup.py, download_sites.py raises an error while trying to start the proxy server (Splash itself starts fine with python -m splash.server):
[-] Starting factory <splash.proxy_server.SplashProxyServerFactory instance at 0x114109710>
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Users/kmike/envs/splash/lib/python2.7/site-packages/splash-1.4-py2.7.egg/splash/tests/mockserver.py", line 725, in <module>
    run(opts.http_port, opts.https_port, opts.proxy_port, not opts.quiet)
  File "/Users/kmike/envs/splash/lib/python2.7/site-packages/splash-1.4-py2.7.egg/splash/tests/mockserver.py", line 698, in run
    sslport = reactor.listenSSL(sslport_num, factory, ssl_factory())
  File "/Users/kmike/envs/splash/lib/python2.7/site-packages/splash-1.4-py2.7.egg/splash/tests/mockserver.py", line 673, in ssl_factory
    return ssl.DefaultOpenSSLContextFactory(pem, pem)
  File "/Users/kmike/envs/splash/lib/python2.7/site-packages/twisted/internet/ssl.py", line 104, in __init__
    self.cacheContext()
  File "/Users/kmike/envs/splash/lib/python2.7/site-packages/twisted/internet/ssl.py", line 113, in cacheContext
    ctx.use_certificate_file(self.certificateFileName)
OpenSSL.SSL.Error: [('system library', 'fopen', 'No such file or directory'), ('BIO routines', 'FILE_CTRL', 'system lib'), ('SSL routines', 'SSL_CTX_use_certificate_file', 'system lib')]
  • I'm not sure how I did it, but download_sites.py was working before I installed Splash. However, it put nothing into the splash/benchmark/sites/localhost_8806/ folder, so the benchmark found zero websites.
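The OpenSSL error in the traceback above is an fopen failure on the certificate file passed to `use_certificate_file`. One possible cause, not confirmed in this thread, is that the file was not copied into the installed egg. A small guard like the sketch below would at least report the missing path explicitly. `resolve_data_file` and the file name `server.pem` are hypothetical, and the code is written for Python 3 even though the PR targets Python 2.7.

```python
import os


def resolve_data_file(module_file, relative_name):
    """Return the absolute path of a data file stored next to a module.

    Raises FileNotFoundError naming the full path, instead of letting a
    lower layer report a bare 'No such file or directory'.
    """
    base_dir = os.path.dirname(os.path.abspath(module_file))
    path = os.path.join(base_dir, relative_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "data file not found: %s (was it installed with the package?)" % path
        )
    return path


# Hypothetical usage inside a module that needs a certificate:
#     pem = resolve_data_file(__file__, "server.pem")
```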

I think we shouldn't distribute splash.benchmark with the rest of the Splash code (installing Splash shouldn't install it). What would it take to move the benchmarks folder to the repository root?

@immerrr immerrr self-assigned this Apr 6, 2015
immerrr added 10 commits April 6, 2015 21:51
- download_sites: fix encoding unconditionally if it is missing
- download_sites: add base/href only if it is missing
- download_sites: remove scripts unconditionally
- benchmark: specify pre-existing splash instance with --splash-server HOST:PORT
- add fileserver logs, write them to file (--logfile)
- put bench results into file (--out-file)
- silence requests.packages.urllib3.connectionpool logger
- fix cputime metric for preexisting splash instances
- add support for preexisting file server instance (--fileserver)
- add HTML endpoint benchmarks (--render-type html)
- make --sites-dir required
- dump output in proper JSON
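The flags named in these commits could be wired together roughly as follows. This is only a sketch of the command-line surface as described by the commit messages, not the PR's actual code: the help texts, the `png` choice and default for `--render-type`, and the `build_parser` and `dump_results` names are assumptions.

```python
import argparse
import json


def build_parser():
    """Build a parser with the options named in the commit list."""
    parser = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    parser.add_argument("--splash-server", metavar="HOST:PORT",
                        help="use a pre-existing Splash instance")
    parser.add_argument("--fileserver",
                        help="use a pre-existing file server instance")
    parser.add_argument("--sites-dir", required=True,
                        help="directory with downloaded sites")
    # Only 'html' is mentioned in the commits; 'png' is an assumed default.
    parser.add_argument("--render-type", choices=["html", "png"], default="png")
    parser.add_argument("--logfile", help="write file server logs to this file")
    parser.add_argument("--out-file", help="write benchmark results to this file")
    return parser


def dump_results(results, out_file):
    """Write benchmark results as JSON (hypothetical result structure)."""
    with open(out_file, "w") as stream:
        json.dump(results, stream, indent=2, sort_keys=True)
```

Making `--sites-dir` required means a run without downloaded sites fails at argument parsing rather than silently benchmarking zero websites.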
@immerrr
Contributor Author

immerrr commented Apr 6, 2015

> it shouldn't be installed by installing Splash

If not, I'm not sure how to organize the code. The necessary subset of SplashServer functionality can probably be copy-pasted. As for the rest, there are three separate runnable scripts: the downloader, the file server, and the benchmark runner. They all use code from the file server module, so at least that module must be importable.

@immerrr
Contributor Author

immerrr commented Apr 6, 2015

It worked OK for me in pip install -e . mode.

@immerrr immerrr force-pushed the benchmark-utility branch from 0939cff to f7a43da Compare April 6, 2015 20:08
chekunkov added a commit that referenced this pull request Aug 3, 2015
It respects #185 (not merged yet) and looks like a good place for benchmark notebook
sunu pushed a commit to sunu/splash that referenced this pull request Aug 5, 2015
It respects scrapinghub#185 (not merged yet) and looks like a good place for benchmark notebook
Development

Successfully merging this pull request may close these issues.

Basic benchmark suite
2 participants