This sounds awesome. I'm a zuul user myself and I can see how this could happen. Could you prepare a patch to the protocol docs and maybe a patch for the C/C++ server as well? Thanks!
Zuul 2.5 Ansible launcher registered tens of thousands of functions on
each node which, when done serially, took a while. To alleviate that
issue the Gear protocol had been extended with a custom MASS_DO packet
to register several functions in a single call (see d437159).
The Ansible launcher has been superseded by the executor server,
removing the sole use of MASS_DO. The extended Gear.Server had not been
cleaned up though.
Replace custom zuul.lib.gearserver.GearServer() with gear.Server() and
remove code.
For posterity, the MASS_DO idea is captured in the Gearman upstream
issue tracker:
gearman/gearmand#6
Change-Id: Ifc57f9b7a17d1d9291a535eb0d9f5e1da3713241
OpenStack Zuul relies on Gearman to run jobs on a farm of 800 nodes, each with 20000 functions. The massive number of `CAN_DO` packets puts stress on the server. @jeblair went with an extension of the Gearman protocol that lets one register multiple functions in a single `MASS_DO` command.

Their software is in Python and the protocol extension is handled via a child class: openstack-infra/zuul@d437159 . For the worker:
On the server side:
So that is merely joining the payloads of several `CAN_DO` packets with `\x00` into a single one. That saves all the per-packet handling overhead.

Would that be a feature that could be added to the reference Gearman protocol?
A bulk equivalent for `CAN_DO_TIMEOUT` could be of interest as well.
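
To make the `\x00` joining concrete, here is a minimal wire-level sketch. It is not the code from the Zuul commit. It only uses the standard Gearman request framing (4-byte magic, 4-byte type, 4-byte size, all big-endian) and the `CAN_DO` type number `1`. The value `101` for `MASS_DO` and all helper names are assumptions made for illustration.

```python
import struct

REQ_MAGIC = b"\x00REQ"
CAN_DO = 1      # packet type from the Gearman protocol
MASS_DO = 101   # ASSUMED type number for the custom extension


def encode_request(ptype, payload):
    """Frame a request: magic, type and payload size, then the payload."""
    return REQ_MAGIC + struct.pack(">II", ptype, len(payload)) + payload


def encode_can_do_series(functions):
    """Worker side today: one CAN_DO packet per function name."""
    return b"".join(encode_request(CAN_DO, name) for name in functions)


def encode_mass_do(functions):
    """Worker side with the extension: all names in one packet."""
    return encode_request(MASS_DO, b"\x00".join(functions))


def decode_mass_do(packet):
    """Server side: check the header, then split the payload on NUL."""
    magic, ptype, size = struct.unpack(">4sII", packet[:12])
    if magic != REQ_MAGIC or ptype != MASS_DO:
        raise ValueError("not a MASS_DO request")
    payload = packet[12:12 + size]
    return payload.split(b"\x00") if payload else []


if __name__ == "__main__":
    # Hypothetical function names, for illustration only.
    names = [b"build:job-%d" % i for i in range(3)]
    print(len(encode_can_do_series(names)), len(encode_mass_do(names)))
    print(decode_mass_do(encode_mass_do(names)))
```

In this framing each extra function costs a 12-byte header when sent as its own `CAN_DO` packet, but only a 1-byte separator inside `MASS_DO`. The larger gain is probably the one described above: the server reads and dispatches a single packet instead of thousands. The sketch also assumes function names never contain `\x00`.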