Skip to content
This repository has been archived by the owner on Nov 24, 2020. It is now read-only.

Use timeout+unshare on linux #160

Merged
merged 4 commits into from
Apr 26, 2017
Merged

Conversation

wavexx
Copy link
Contributor

@wavexx wavexx commented Apr 15, 2017

This is another take at re-enabling testing on Expect+Polyglot the-right-way™.
See #158 (comment)

In short: test freeze because some children of the julia process are not killed by timeout, keeping tee alive indefinitely. Instead of skipping the tests or killing tee, we ensure any process spawned by julia cannot escape SIGKILL.

I also moved the xvfb invocation in the timeout as well. Since we need to know the current running platform, I removed the TIMEOUTPATH and instead added a global constant for the actual timeout value, which seems more appropriate.

I tested this on a fresh trusty VM and on debian sid, but I cannot run the entire vagrant machinery from scratch here (slow network connection). I apologize if something is missing.

wavexx added 3 commits April 15, 2017 17:24
- Use unshare -fp to enforce all PIDs are killed by timeout
- Wrap xvfb-run under timeout as well
- Move timeout to constants.jl
- Fallback to gtimeout or timeout on other platforms
@wavexx
Copy link
Contributor Author

wavexx commented Apr 15, 2017

Hmm, now that I think of this, you'd still need an exception for platforms without unshare :/

src/preptest.jl Outdated
@@ -7,10 +7,8 @@
# See description in scripts/setup.sh for the purpose of this file
#######################################################################

using Compat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can rely on Compat being present here since this runs inside the VM. For now I'm not running PkgEval on anything other than Linux so maybe just leave the platform generalizations commented out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. I was adding a :FREEZE special exception for !is_linux().
Should I give up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about it for now. I'll stop running this on 0.4 sooner than I start running it on non-Linux platforms, so it'll be a little simpler then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though if you want to put some commented-out things in now while you're looking at it, I wouldn't mind and it'll jog my memory a bit later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's much cleaner without the exceptions and it's rather trivial to see why it would fail.
When we enable OSX, I'd go for what I wrote in the original comment: wrap both julia + tee in the timeout, so that even though a process might escape the kill, at least tests do not stop.

@wavexx
Copy link
Contributor Author

wavexx commented Apr 15, 2017

If this works, looking at constants.jl I see similar comments for SystemImageBuilder: "Freezes PkgEval". I wonder if this PR fixes that freeze as well.

@wavexx wavexx mentioned this pull request Apr 16, 2017
@wavexx
Copy link
Contributor Author

wavexx commented Apr 20, 2017

Were you able to do any tests with this?

@tkelman
Copy link
Contributor

tkelman commented Apr 20, 2017

No, I haven't. Things were timing out for other reasons last week.

@tkelman
Copy link
Contributor

tkelman commented Apr 22, 2017

Ran this locally and it didn't freeze. Will compare the results to what master gets and see if any issues. What this is doing is a bit complicated and I don't entirely follow it, but guess it helps.

@wavexx
Copy link
Contributor Author

wavexx commented Apr 23, 2017 via email

@wavexx
Copy link
Contributor Author

wavexx commented Apr 23, 2017 via email

@tkelman
Copy link
Contributor

tkelman commented Apr 23, 2017

That included the run on julia 0.4.

Sure, that's a nicely detailed technical explanation of what's going on. But it's getting into some messy, relatively obscure details that I am just not all that familiar with.

Copy link
Contributor

@tkelman tkelman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like all the failures were transient or unrelated to this change.

@tkelman tkelman merged commit 3ca991f into JuliaCI:master Apr 26, 2017
tkelman added a commit that referenced this pull request Apr 26, 2017
@tkelman
Copy link
Contributor

tkelman commented Apr 26, 2017

I'm not sure what was different between my testing of this and when it actually ran after merging it, but I'm getting unshare: invalid option -- 'f'

@wavexx
Copy link
Contributor Author

wavexx commented Apr 26, 2017 via email

@wavexx
Copy link
Contributor Author

wavexx commented May 5, 2017

Well, rechecking now from stratch, 14.04.05 server uses util-linux 2.20, which is too old.
I'm now sure it's worth implementing an alternative here. unshare -f is the way to go for proper test isolation.
Is there a plan to move to the next LTS?

@tkelman
Copy link
Contributor

tkelman commented May 5, 2017

No specific plan. I'll do it at some point, but wasn't in a hurry since I think 14.04 has a bit wider test coverage than 16.04 across packages, at least for the time being. Maybe later in the year?

@wavexx
Copy link
Contributor Author

wavexx commented May 5, 2017 via email

@wavexx
Copy link
Contributor Author

wavexx commented Sep 14, 2017

Just reminded myself of this.
@tkelman would it be now time to reconsider this?

I know some images got updated on travis, but I didn't pay full attention to the churn of changes.
We'd just need to test for the availability of unshare -f.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants