- Use unshare -fp to ensure all PIDs are killed by timeout
- Wrap xvfb-run under timeout as well
- Move the timeout value to constants.jl
- Fall back to gtimeout or timeout on other platforms
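A minimal Python sketch of how the wrapped command line described in these commits could be assembled. It assumes the chain is timeout, then unshare -fp, then xvfb-run, then julia. The function names, the TIMEOUT_SECS value, the "-s KILL" choice and the julia arguments are hypothetical. This is not PkgEval's actual code, which lives in its Julia and shell scripts.

```python
import shutil
import sys

# Hypothetical timeout in seconds; the PR keeps the real value in constants.jl.
TIMEOUT_SECS = 1800


def timeout_binary(platform=sys.platform, which=shutil.which):
    """Choose the GNU timeout binary: plain 'timeout' on Linux; elsewhere prefer
    'gtimeout' (the name Homebrew's coreutils installs on macOS), then 'timeout'."""
    if platform.startswith("linux"):
        return "timeout"
    for name in ("gtimeout", "timeout"):
        if which(name):
            return name
    raise RuntimeError("no GNU timeout binary found on PATH")


def wrapped_test_command(julia_args, platform=sys.platform,
                         which=shutil.which, secs=TIMEOUT_SECS):
    """Build the argv: timeout -> unshare -fp -> xvfb-run -> julia on Linux,
    and just timeout -> julia elsewhere (no unshare or Xvfb assumed there)."""
    cmd = [timeout_binary(platform, which), "-s", "KILL", str(secs)]
    if platform.startswith("linux"):
        # Run everything inside a new PID namespace so that descendants which
        # double-fork are reparented to the namespace's init, not the system init.
        cmd += ["unshare", "-fp", "xvfb-run"]
    return cmd + ["julia", *julia_args]
```

For example, wrapped_test_command(["runtests.jl"], platform="linux") yields timeout -s KILL 1800 unshare -fp xvfb-run julia runtests.jl. Passing `which` explicitly keeps the platform fallback testable without touching PATH.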
Hmm, now that I think about it, you'd still need an exception for platforms without unshare :/
src/preptest.jl
@@ -7,10 +7,8 @@
# See description in scripts/setup.sh for the purpose of this file
#######################################################################

using Compat
I don't think we can rely on Compat being present here since this runs inside the VM. For now I'm not running PkgEval on anything other than Linux so maybe just leave the platform generalizations commented out.
Ah... I was adding a :FREEZE special exception for !is_linux(). Should I give up?
Don't worry about it for now. I'll stop running this on 0.4 sooner than I start running it on non-Linux platforms, so it'll be a little simpler then.
Though if you want to put some commented-out things in now while you're looking at it, I wouldn't mind and it'll jog my memory a bit later on.
Well, it's much cleaner without the exceptions and it's rather trivial to see why it would fail.
When we enable OSX, I'd go for what I wrote in the original comment: wrap both julia and tee in the timeout, so that even if a process escapes the kill, at least the tests do not hang.
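For the OSX fallback suggested here, a hypothetical Python sketch of putting julia and tee under a single timeout via sh -c. The function name, the "-s KILL" choice and the log-path handling are assumptions. A process that escapes the kill can still survive, but tee is killed at the deadline instead of waiting on it forever.

```python
import shlex


def timeout_over_pipeline(timeout_bin, secs, julia_args, log_path):
    """Build argv running 'julia ARGS 2>&1 | tee LOG' as one shell pipeline
    under timeout. Without --foreground, GNU timeout signals its whole process
    group, which normally includes sh, julia and tee, so tee dies at the
    deadline even if an escaped descendant still holds the pipe open."""
    julia = " ".join(shlex.quote(arg) for arg in ["julia", *julia_args])
    pipeline = f"{julia} 2>&1 | tee {shlex.quote(log_path)}"
    return [timeout_bin, "-s", "KILL", str(secs), "sh", "-c", pipeline]
```

Quoting each argument with shlex.quote keeps paths with spaces or shell metacharacters intact inside the sh -c string.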
If this works: looking at constants.jl, I see a similar comment for SystemImageBuilder ("Freezes PkgEval"). I wonder whether this PR fixes that freeze as well.
Were you able to run any tests with this?
No, I haven't. Things were timing out for other reasons last week.
Ran this locally and it didn't freeze. Will compare the results to what master gets and see if any issues. What this is doing is a bit complicated and I don't entirely follow it, but guess it helps.
On Sun, Apr 23 2017, Tony Kelman wrote:
> Ran this locally and it didn't freeze. Will compare the results to
> what master gets and see if any issues. What this is doing is a bit
> complicated and I don't entirely follow it, but guess it helps.
The basic issue is that as long as something holds a handle to the pipe
julia's stdout is connected to, "tee" never sees EOF and will not
terminate (causing the tests to hang). Any descendant that inherits
julia's stdout holds such a handle.
The idea is to ensure we kill *any* descendant process of "timeout",
*even* when it double-forks/detaches, which is normally not possible.
unshare -fp creates a new PID namespace, so that when processes get
reparented, instead of becoming children of the system init, they
become children of the namespace's init (the process unshare forks,
which is PID 1 inside the namespace). Once that init is killed by
timeout's SIGKILL, the kernel kills every other process left in the
namespace, so no descendant survives.
Does this help?
The rest is just plumbing.
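To see the underlying problem in isolation, here is a small Python sketch (not part of the PR; the helper name is hypothetical). It shows that a pipe reader such as tee only gets EOF once every process holding the write end has closed it. That includes a backgrounded descendant that outlives its parent.

```python
import subprocess
import time


def seconds_until_eof(shell_snippet):
    """Run a shell snippet with its stdout connected to a pipe and return how
    long the reading side waits for EOF, which is what "tee" waits for."""
    start = time.monotonic()
    proc = subprocess.Popen(["sh", "-c", shell_snippet], stdout=subprocess.PIPE)
    with proc.stdout:
        # read() returns only after *every* process holding the pipe's write
        # end has closed it, not merely when the direct child exits.
        proc.stdout.read()
    elapsed = time.monotonic() - start
    proc.wait()
    return elapsed


if __name__ == "__main__":
    # sh exits at once, but the backgrounded sleep inherited stdout, so EOF
    # only arrives when sleep exits, roughly 2 seconds later.
    print(seconds_until_eof("sleep 2 & echo started"))
    # With the background job's stdout pointed elsewhere, EOF is immediate.
    print(seconds_until_eof("sleep 2 >/dev/null & echo started"))
```

Replace sleep with a test-spawned helper that never exits, and the reader blocks indefinitely. That is the tee hang the PID namespace is meant to rule out.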
On Sun, Apr 23 2017, Tony Kelman wrote:
> Ran this locally and it didn't freeze.
As an additional test, you should pin Expect.jl to an old version
(0.2.0), as the new version is "allegedly" fixed ;)
That included the run on Julia 0.4. Sure, that's a nicely detailed technical explanation of what's going on, but it gets into some messy, relatively obscure details that I'm just not all that familiar with.
Looks like all the failures were transient or unrelated to this change.
I'm not sure what was different between my testing of this and when it actually ran after merging it, but I'm getting unshare: invalid option -- 'f'
On Wed, Apr 26 2017, Tony Kelman wrote:
> I'm not sure what was different between my testing of this and when it actually ran after merging it, but
> I'm getting unshare: invalid option -- 'f'
An old version of util-linux?
I did my tests on Ubuntu 14.04.05 and it did support -f.
Maybe it supports the long forms, --fork and --pid?
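Following up on the long-option idea, a small Python sketch (hypothetical helper names) that checks whether the installed unshare advertises --fork and --pid in its --help output before relying on them. Probing the help text avoids hard-coding a util-linux version cutoff.

```python
import subprocess


def unshare_supports_pid_namespaces(help_text):
    """Return True if 'unshare --help' output advertises both --fork and --pid."""
    return "--fork" in help_text and "--pid" in help_text


def probe_unshare():
    """Run 'unshare --help' (util-linux) and check the advertised options.
    Returns False when unshare is not installed or cannot be executed."""
    try:
        out = subprocess.run(["unshare", "--help"], capture_output=True, text=True)
    except OSError:
        return False
    return unshare_supports_pid_namespaces(out.stdout + out.stderr)
```

A setup script could call probe_unshare() once and fall back to the plain timeout wrapper when it returns False, rather than failing with "invalid option".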
Well, rechecking now from scratch, 14.04.05 server uses util-linux 2.20, which is too old.
No specific plan. I'll do it at some point, but wasn't in a hurry since I think 14.04 has a bit wider test coverage than 16.04 across packages, at least for the time being. Maybe later in the year?
On Fri, May 05 2017, Tony Kelman wrote:
> No specific plan. I'll do it at some point, but wasn't in a hurry since I think
> 14.04 has a bit wider test coverage than 16.04 across packages, at least for the
> time being. Maybe later in the year?
I'm fine with it, I just fear I'll forget to bring it up again :)
Just reminded myself of this. I know some images got updated on Travis, but I didn't pay full attention to the churn of changes.
This is another attempt at re-enabling testing of Expect+Polyglot the-right-way™.
See #158 (comment)
In short: the tests freeze because some children of the julia process are not killed by timeout and keep tee alive indefinitely. Instead of skipping the tests or killing tee, we ensure that no process spawned by julia can escape SIGKILL.
I also moved the xvfb-run invocation inside the timeout. Since we now need to know which platform we are running on, I removed TIMEOUTPATH and instead added a global constant for the actual timeout value, which seems more appropriate.
I tested this on a fresh Trusty VM and on Debian sid, but I cannot run the entire Vagrant machinery from scratch here (slow network connection). I apologize if something is missing.