Async batch #50546
@moio JFYI and, of course, ideas are welcome.
What is the value of |
@moio it is the same value, 10s.
At some point the test.ping returns stop arriving. Then, once batching starts, the batch job returns come in, and the initial ping returns only appear after the batch finishes.
Do the pings reach the minions in the first place? What does |
@moio @cachedout I have updated my PR.
NOTE: the async batching does not include the ssh-minions (we have to see how this can be implemented).
NOTE: there is still some cleanup to be done, but I wanted to get some opinions as soon as possible (that's why I removed the WIP).
@dincamihai Can we start with having you please fix these lint errors? https://jenkinsci.saltstack.com/job/pr-lint/job/PR-50546/13/warnings52Result/new/file.-1608698382/
Hey @dincamihai, this looks good!
I noted some questions/doubts inline. Please pardon my Python inexperience; some points might be trivial or downright wrong.
Apart from those, and from the tests which I understand are being written, the main conceptual question is actually about the following note in the PR description:
The response is empty (please ignore that for now) but it basically looks like this: {"return": [{}]}
What the caller is interested in, after the call is placed, is the list of minions expected to return (those that responded in time to the initial ping and that are thus included in the batch queue).
I can see two ways to address this:
- we make the initial ping part blocking, thus the response could contain a minion list (and still proceed asynchronously for the bulk of the batch)
- we return nothing/true/any other dummy value, and then provide a way to get to this list (I can presently only imagine via an event)
Frankly speaking, looking at current Uyuni needs, the first option would fit better. Still, I will not hide the downside: any request could take up to gather_job_timeout seconds to return. That is the worst case, and it happens when at least one targeted minion takes more than gather_job_timeout to respond, or is down.
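To make the trade-off of the blocking-ping option concrete, here is a minimal sketch. It uses plain asyncio as a stand-in for the Tornado coroutines in the PR, and every name in it (`ping`, `gather_responders`, `run_batch`, `handle_request`) is hypothetical, not Salt's API. It only illustrates the idea: wait at most gather_job_timeout seconds for ping replies, answer the caller with the minions that replied, and let the rest of the batch continue in the background.

```python
import asyncio


async def ping(minion, delay):
    """Hypothetical stand-in for one test.ping round trip (not Salt code)."""
    await asyncio.sleep(delay)
    return minion


async def gather_responders(delays, gather_job_timeout):
    """Return the minions that answered within gather_job_timeout seconds."""
    tasks = {asyncio.ensure_future(ping(m, d)): m for m, d in delays.items()}
    done, pending = await asyncio.wait(tasks, timeout=gather_job_timeout)
    for task in pending:
        task.cancel()  # slow or down minions are left out of the batch
    return sorted(tasks[task] for task in done)


async def run_batch(minions, log):
    """Placeholder for the bulk of the batch, which runs after the response."""
    for minion in minions:
        await asyncio.sleep(0)
        log.append(minion)


async def handle_request(delays, gather_job_timeout, log):
    # Block only for the ping phase, then hand the minion list to the caller.
    minions = await gather_responders(delays, gather_job_timeout)
    batch = asyncio.ensure_future(run_batch(minions, log))
    return minions, batch
```

If every minion answers quickly, `asyncio.wait` returns as soon as the last reply is in. With a single unresponsive minion, the caller waits the full timeout, which is the downside described above.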
How difficult would it be to implement one approach or the other, or even to make it configurable?
Do you see aspects I am forgetting about in this discussion?
Do Salt maintainers have any specific remark?
Thanks again for all the efforts here!
- start batching immediately after all minions reply to ping
- return jid, available minions and missing minions in the response
- fire event when batching starts and when it ends
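As a rough illustration of the second point, the response could be assembled by comparing the targeted minions with those that answered the ping. This is only a sketch: the function name and the key names (`jid`, `available_minions`, `missing_minions`) are assumptions made for the example, not necessarily the schema used in the PR.

```python
def build_batch_response(jid, targeted, responded):
    """Split targeted minions into those that answered the ping and the rest.

    Key names are illustrative assumptions, not the PR's actual schema.
    """
    targeted = set(targeted)
    responded = set(responded)
    return {
        "jid": jid,
        "available_minions": sorted(targeted & responded),
        "missing_minions": sorted(targeted - responded),
    }
```

Sorting the two lists keeps the output deterministic, which makes the response easier to compare in tests.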
Just to cover my bases, I am good with this PR. @DmitryKuzmenko won't be back until Monday, and I would still like him to take a look. But tests are passing and I think this is good to merge.
I like this async approach! 👍
What does this PR do?
Implementation of RFC#0002
What issues does this PR fix or reference?
When making a request to the /run endpoint with the following payload:
A response is returned immediately. The response is empty (please ignore that for now) but it basically looks like this: https://github.com/saltstack/salt/blob/7f6e5d89a48bccf19cb45ed52a3a54f5cc53f400/salt/master.py#L2070-L2077
After returning the response, the batch executes a test.ping. Once the test.ping is done, a salt/batch/<batch-jid>/start event is published with the following data: https://github.com/saltstack/salt/blob/7d4cae95f3fd54f10ad79544d4c9a2074ab11ed4/salt/cli/batch_async.py#L190-L194
When the batch finishes, it fires a salt/batch/<batch-jid>/done event: https://github.com/saltstack/salt/blob/7d4cae95f3fd54f10ad79544d4c9a2074ab11ed4/salt/cli/batch_async.py#L199-L205
Previous Behavior
The response is only returned after the batch job was executed.
New Behavior
The response is returned immediately and the batch job continues to run as tornado coroutines.
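Since the response now comes back before the batch has run, a caller would presumably follow progress through the salt/batch/<batch-jid>/start and salt/batch/<batch-jid>/done events. The sketch below shows one way a listener might recognise those two tags. The helper name is hypothetical, and because the jid format is not spelled out here, the pattern accepts any run of non-slash characters as the jid.

```python
import re

# Matches only the two tags named in the PR description. The jid format is an
# assumption: any run of characters that does not contain a slash.
BATCH_TAG = re.compile(r"^salt/batch/(?P<jid>[^/]+)/(?P<phase>start|done)$")


def classify_batch_event(tag):
    """Return (jid, phase) for a batch start/done tag, or None otherwise."""
    match = BATCH_TAG.match(tag)
    if match is None:
        return None
    return match.group("jid"), match.group("phase")
```

A listener could call this on every incoming tag and ignore anything that returns `None`, treating `"done"` for the jid it is tracking as the signal that the batch has finished.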
Tests written?
TODO