Correct abortion of MR input-sending logic #408
Merged
As of basho/riak_pipe@276e90c (released in Riak 1.1), using `riak_pipe:destroy/1` causes the `riak_pipe_builder` process to exit with reason `normal` instead of something abnormal. This means that the asynchronous MapReduce input sender spawned in `riak_kv_mrc_pipe:send_inputs_async/3` is no longer killed when the builder exits, despite being linked to it: a linked process that is not trapping exits ignores an exit signal with reason `normal`. So an input sender might continue running long after its target pipe has disappeared. This will fill Riak logs with messages of the form:

This affects MR input types of explicit bucket-keys, and certain kinds of `modfun` inputs.

Changing `riak_pipe_builder:destroy/1` to make the builder exit abnormally does not completely solve this case, because there are other points at which the builder decides to exit, but does so with reason `normal` to avoid log spam from its supervisor.

In addition, not all MapReduce users use `riak_kv_mrc_pipe:send_inputs_async/3`; some use `send_inputs/2,3` (non-async). These senders also need to notice that the pipe has vanished.

This PR takes a two-pronged approach:
1. `riak_kv_mrc_pipe:send_inputs/3` now checks the return value of its calls to `riak_pipe_vnode:queue_work/2`. When the pipe has closed, `queue_work` returns `{error, [worker_startup_failed,...]}` instead of `ok`. This error is raised for the endpoint to handle.
2. The `riak_kv_wm_mapred` and `riak_kv_pb_mapred` modules, which handle HTTP and PB MapReduce requests, now explicitly kill the async sender process.

Messages about "fitting was gone before startup" may still appear in the log, but they should be limited to one input's N-value worth (since each of the primary vnodes in the preflist must fail before the error is raised). Besides guarding against holes in this strategy, the explicit kill added to the HTTP and PB endpoints should reduce log spam further.
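For readers less familiar with Erlang link semantics, here is a minimal, self-contained sketch of the problem and of both fixes. It is not code from riak_kv or riak_pipe: the module name `mr_sender_sketch`, the `QueueFun` argument standing in for `riak_pipe_vnode:queue_work/2`, and the `{queue_work_failed, Reason}` error term are all hypothetical. `sender_survives/1` shows that a linked process which is not trapping exits survives a builder exiting with `normal` but dies on an abnormal reason; `send_inputs/2` raises on a failed queue result; and `endpoint_run/3` plays the role of an HTTP/PB endpoint that waits for an async sender and then kills it explicitly.

```erlang
%% Hypothetical sketch only; none of these names exist in riak_kv/riak_pipe.
-module(mr_sender_sketch).
-export([sender_survives/1, send_inputs/2, endpoint_run/3]).

%% Root cause: a "sender" linked to a "builder" (and not trapping exits)
%% ignores an exit signal with reason normal, but dies on any other reason.
%% Returns true if the sender is still alive after the builder exits.
sender_survives(Reason) ->
    Parent = self(),
    Sender = spawn(fun() ->
                       %% Link sender <-> builder, as the input sender is
                       %% linked to riak_pipe_builder.
                       B = spawn_link(fun() ->
                                          receive {exit, R} -> exit(R) end
                                      end),
                       Parent ! {builder, B},
                       receive stop -> ok end
                   end),
    Ref = erlang:monitor(process, Sender),
    Builder = receive {builder, Pid} -> Pid end,
    Builder ! {exit, Reason},
    receive
        {'DOWN', Ref, process, Sender, _Why} ->
            false
    after 200 ->
        %% Sender is still running: the normal exit signal was ignored.
        erlang:demonitor(Ref, [flush]),
        Sender ! stop,
        true
    end.

%% Fix 1: check each queue result instead of ignoring it, and raise an
%% error for the caller (the endpoint) to handle.
%% QueueFun is a hypothetical stand-in for riak_pipe_vnode:queue_work/2.
send_inputs(QueueFun, Inputs) ->
    lists:foreach(
      fun(Input) ->
              case QueueFun(Input) of
                  ok -> ok;
                  {error, Reason} -> erlang:error({queue_work_failed, Reason})
              end
      end,
      Inputs).

%% Fix 2: the endpoint runs the sender asynchronously, waits up to
%% Timeout ms, and then kills it explicitly no matter what happened.
%% Returns done | {error, Reason} | timeout.
endpoint_run(QueueFun, Inputs, Timeout) ->
    {Sender, Ref} = spawn_monitor(fun() -> send_inputs(QueueFun, Inputs) end),
    Outcome = receive
                  {'DOWN', Ref, process, Sender, normal} ->
                      done;
                  %% An uncaught error exits with {Reason, Stacktrace}.
                  {'DOWN', Ref, process, Sender, {{queue_work_failed, R}, _Stack}} ->
                      {error, R}
              after Timeout ->
                  timeout
              end,
    %% Explicit kill: harmless if the sender already exited.
    exit(Sender, kill),
    erlang:demonitor(Ref, [flush]),
    Outcome.
```

In an Erlang shell, `mr_sender_sketch:sender_survives(normal)` returns `true` while `sender_survives(shutdown)` returns `false`, mirroring why the `normal` builder exit stopped terminating the input sender. Passing a `QueueFun` that always returns `{error, [worker_startup_failed]}` to `endpoint_run/3` yields `{error, [worker_startup_failed]}`, and a `QueueFun` that blocks forever yields `timeout` with the sender killed.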
@evanmcc and @jonmeredith should both be interested in this PR.