[SPARK-4772] Clear local copies of accumulators as soon as we're done with them #3570
Test build #24074 has started for PR 3570 at commit
Test build #24074 has finished for PR 3570 at commit
Test PASSed.
Hi @nkronenfeld, Thanks for this PR. These sorts of resource leakage issues can be tricky to debug, so thanks for spotting this. It would be great to file a dedicated JIRA for the memory-leak reported here. I find the current |
…readLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task.
Test build #24189 has started for PR 3570 at commit
Test build #24189 has finished for PR 3570 at commit
Test FAILed.
…ncompatibility should be irrelevant, as it will only surface if a master is talking to a worker running a different version of Spark.
Test build #24195 has started for PR 3570 at commit
Test build #24195 has finished for PR 3570 at commit
Test PASSed.
The MiMa failure is surprising, since that class was marked as |
Should I back out the correction to the mima failure? |
No, I'd leave it. I just thought I'd mention it so that we eventually On Fri, Dec 5, 2014 at 7:27 PM, Nathan Kronenfeld notifications@github.com
Great... I think outside the MiMa issue, it should be all set, unless I can figure out a way to unit test it. So far, my best methods of testing it involve instrumenting the code in ways I shouldn't check in.
Oh, a note for when you're reviewing: I didn't move the clear call, I just added a second one. I saw no particular harm in leaving the old one there too, just in case, but I can't see it doing much anymore; it should always be a no-op now. I'd be happier removing it if, again, I could figure out a good unit test to make sure everything still functioned properly when I did so. But I would be totally open to removing it in the interests of code cleanliness if you want.
Any word on this?
-  val localAccums = Map[Thread, Map[Long, Accumulable[_, _]]]()
+  val localAccums = new ThreadLocal[Map[Long, Accumulable[_, _]]]() {
+    override protected def initialValue() = Map[Long, Accumulable[_, _]]()
+  }
   var lastId: Long = 0
Not related to your changes and I don't expect you to fix it, but this could be an AtomicInteger instead.
you mean the lastId?
That should only ever get used on the client - it's only called from the constructor of an individual accumulator, and if someone is creating one of those on a worker, they're already in trouble - so it should be ok as is.
Oh, I was just observing that this is only read through the newId() method and that it's effectively being used like an AtomicInteger. Just another example of how this particular part of the code is kind of old / out-of-sync with the style of the rest of the codebase. Don't worry about it; we can do a larger cleanup pass on this later.
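For readers following the lastId discussion, here is a minimal sketch of what the suggested atomic counter could look like. It is not code from this PR: the IdGenerator object is a hypothetical stand-in, and it assumes newId() simply increments lastId and returns the new value. Because lastId is declared as a Long in the snippet above, the sketch uses AtomicLong rather than AtomicInteger to keep the same type.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for the ID-allocation part of Accumulators;
// the object name and the starting value are assumptions for illustration.
object IdGenerator {
  // Replaces `var lastId: Long = 0` with a counter that is safe to bump
  // from several threads without an explicit lock.
  private val lastId = new AtomicLong(0L)

  // Atomically increments the counter and returns the new value.
  def newId(): Long = lastId.incrementAndGet()
}
```

Each call to IdGenerator.newId() returns a value one greater than the previous call, which matches how the thread above describes lastId being used.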
This looks good to me. If you don't mind, could you update the pull request description to more accurately describe the change that we're actually committing? This is important because that description will become the actual commit message. Also, it looks like the MiMa issue could have been caused by |
LGTM, nice catch.
To fix the MiMa problem, can you instead make Accumulators a private[spark] object? No one I've asked seems to understand what "private" even means in this context. private[spark] describes the desired semantics (based on my understanding), and doing that also removes the need for the MiMa exception (or at least it did for #3622).
Comment fixed. I'm trying to test the MiMa-related changes to see if they work, and I'm having problems running MiMa on my machine. I'll probably just push them in the suggested form to see if they pass on Jenkins.
…to get around false positive in mima tests
Test build #24274 has started for PR 3570 at commit
@kayousterhout I'm glad to see that changing it to |
Test build #24274 has finished for PR 3570 at commit
Test PASSed.
LGTM. Just in case you missed my earlier comment, are you still planning to update the PR description to reflect the actual changes vs. the ones you had planned?
I thought I'd done so; it looks like it lost my changes.
Sorry, I must have accidentally hit cancel instead of comment the first time. It should be set now.
Thanks for updating the description. This looks good to me, so I'm going to merge this into |
… with them

Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker.

This PR clears the thread-local copies of accumulators at the end of each task, in the task's finally block, to make sure they are cleaned up between tasks. It also stores them in a ThreadLocal object, so that if, for some reason, the thread dies, any memory they are using at the time should be freed up.

Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>

Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits:

a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests
b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevant, as it will only surface if a master is talking to a worker running a different version of spark.
537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task.
39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them

(cherry picked from commit 94b377f)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Conflicts:
core/src/main/scala/org/apache/spark/Accumulators.scala
core/src/main/scala/org/apache/spark/executor/Executor.scala
I've merged this into |
Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker.
This PR clears the thread-local copies of accumulators at the end of each task, in the task's finally block, to make sure they are cleaned up between tasks. It also stores them in a ThreadLocal object, so that if, for some reason, the thread dies, any memory they are using at the time should be freed up.
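The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the code merged in this PR: LocalAccums and TaskRunner are hypothetical names, and plain Long totals stand in for Accumulable instances. It shows a per-thread map held in a ThreadLocal and a task wrapper that clears the map in a finally block, so the thread holds no accumulator state between tasks whether the task succeeds or throws.

```scala
import scala.collection.mutable

// Hypothetical, simplified registry: Long totals stand in for Accumulable values.
object LocalAccums {
  // Each thread lazily gets its own empty map on first access.
  private val localAccums = new ThreadLocal[mutable.Map[Long, Long]]() {
    override protected def initialValue(): mutable.Map[Long, Long] =
      mutable.Map.empty[Long, Long]
  }

  // Adds `delta` to the current thread's local total for accumulator `id`.
  def add(id: Long, delta: Long): Unit = {
    val m = localAccums.get()
    m(id) = m.getOrElse(id, 0L) + delta
  }

  // Immutable snapshot of the current thread's local totals.
  def values: Map[Long, Long] = localAccums.get().toMap

  // Drops everything the current thread has accumulated.
  def clear(): Unit = localAccums.get().clear()
}

// Hypothetical task wrapper, standing in for the executor's task-running code.
object TaskRunner {
  // Runs the task body and returns its accumulator updates. The finally
  // block clears the thread-local map even if the body throws.
  def runTask(body: => Unit): Map[Long, Long] = {
    try {
      body
      LocalAccums.values
    } finally {
      LocalAccums.clear()
    }
  }
}
```

Because the snapshot is taken before the finally block runs, the caller still receives the task's updates while the thread-local map is left empty for the next task on that thread.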