-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4714][CORE]: Check block have removed or not after got info lock in dropFromMemory #3574
Conversation
Can one of the admins verify this patch? |
if (!removedFromMemory && !removedFromDisk && !removedFromTachyon) { | ||
logWarning(s"Block $blockId could not be removed as it was not found in either " + | ||
"the disk, memory, or tachyon store") | ||
if (blockInfo.get(blockId).isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, this seems like the opposite of the condition we want here: if blockInfo doesn't contain an entry for blockId
, then why would we want to remove that entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ThreadA into RemoveBlock(), and got info for blockId1
- ThreadB into DropFromMemory(), and got info for blockId1
now Thread A, B all want got info.sychronized
B got, and drop block(blockId) from memory, and this block is use memory only, so it remove from blockinfo. and then release the lock.
then A got, but info is already not in blockinfo.
Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock can't happen at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this extra check is necessary; check out my comment on the main pull request and see if you agree.
Even if we did want to add a check here, I think we want to check for if(blockInfo.get(blockId).nonEmpty)
, since this branch handles the case where blocks have not been removed already.
Can you elaborate on the problem that this PR addresses? Have you observed an actual error that this fixes? |
@JoshRosen B got, and drop block(blockId) from memory, and this block is use memory only, so it remove from blockinfo. and then release the lock. then A got, but info is already not in blockinfo. Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock can't happen at the same time? |
Ah, I think I see your concern: let's say that we a block and there are two threads that are racing to perform operations on it: to use your example, thread A wants to call If we consider the case where all of thread A's steps execute before any of thread B's, then things work okay: thread A will have removed the entry from In another execution, though, both threads could find the
|
if (level.useOffHeap) { tachyonStore.remove(id) } | ||
iterator.remove() | ||
logInfo(s"Dropped block $id") | ||
if (blockInfo.get(id).isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I don't think that we strictly need a check here since the remove(id)
operations are idempotent. It might be nice to log a warning if the block has already been removed, but that might not be necessary since this is a background cleanup call.
0c1a249
to
55fa4ba
Compare
@JoshRosen |
if (blockInfo.get(blockId).isEmpty) { | ||
logWarning(s"Block $blockId was already dropped.") | ||
return None | ||
} else if(!info.waitForReady()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style nit: this needs a space after the if
and before the open paren: if (!info...
.
Jenkins, this is ok to test. |
Test build #24225 has started for PR 3574 at commit
|
Test build #24225 has finished for PR 3574 at commit
|
Test PASSed. |
@@ -1010,7 +1010,10 @@ private[spark] class BlockManager( | |||
info.synchronized { | |||
// required ? As of now, this will be invoked only for blocks which are ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment actually refers to the !info.waitForReady()
case, so I'd like to either move the comment or swap the order of these checks so that we check for blockInfo.get(blockId).isEmpty
in the else if
clause instead.
Left one minor code organization comment; aside from that, this looks good to me and should be ready to merge after you fix that up (I can do it if you don't have time, though; just let me know). There are a couple of edits that I'd like to make to the commit title / description before merging this, but I can do it myself on merge. Thanks for the careful analysis and for catching this issue! |
@JoshRosen I refine the code according your comments. |
Test build #24234 has started for PR 3574 at commit
|
Test build #24234 has finished for PR 3574 at commit
|
Test PASSed. |
LGTM. I like how this ended up evolving into a very small, focused fix. I'm going to fix up the title / commit message and merge this into Thanks! |
… has been removed after synchronizing on BlockInfo instance. After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already removed. The three methods have the same logic to get the `info` lock: ``` info = blockInfo.get(id) if (info != null) { info.synchronized { // do something } } ``` So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first. The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed. But in `dropFromMemory` it may be problematic since it may drop block data which already removed into the diskstore, and this calls data store operations that are not designed to handle missing blocks. This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread. Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits: edb989d [hushan[胡珊]] Refine code style and comments position 55fa4ba [hushan[胡珊]] refine code e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn (cherry picked from commit 30dca92) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
… has been removed after synchronizing on BlockInfo instance. After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already removed. The three methods have the same logic to get the `info` lock: ``` info = blockInfo.get(id) if (info != null) { info.synchronized { // do something } } ``` So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first. The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed. But in `dropFromMemory` it may be problematic since it may drop block data which already removed into the diskstore, and this calls data store operations that are not designed to handle missing blocks. This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread. Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits: edb989d [hushan[胡珊]] Refine code style and comments position 55fa4ba [hushan[胡珊]] refine code e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn (cherry picked from commit 30dca92) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
I've merged this into |
… has been removed after synchronizing on BlockInfo instance. After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already removed. The three methods have the same logic to get the `info` lock: ``` info = blockInfo.get(id) if (info != null) { info.synchronized { // do something } } ``` So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first. The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed. But in `dropFromMemory` it may be problematic since it may drop block data which already removed into the diskstore, and this calls data store operations that are not designed to handle missing blocks. This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread. Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits: edb989d [hushan[胡珊]] Refine code style and comments position 55fa4ba [hushan[胡珊]] refine code e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn (cherry picked from commit 30dca92) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
[SPARK-4714][CORE]: Check block have removed or not after got info lock in dropFromMemory
After got info lock in one of them:
removeBlock
/dropOldBlocks
/dropFromMemory
method in BlockManager, the block thatinfo
represented may have already removed.The Three method have the same logic to got info lock:
So it is chanced that when one thread got
info.synchronized
but it already removed from other threads who gotinfo.synchronized
before him.In
removeBlock
anddropOldBlocks
is idempotent if they not haveblockinfo.get(blockId) != null
But in
dropFromMemory
it may be problematic for it may drop block data(which already removed) into diskstore, this method calls data store operations that should not handle missing blocks.