Travis: Rewrite config, build and test also on Alpine Linux (musl libc) #1255

jirutka · 2017-07-28T01:22:08Z

Refactores .travis.yml to use Travis Build Stages. This allows to define jobs with very different configuration in more clean way.
Replaces sudo apt-get with APT addon. Since sudo is not needed anymore, it runs on Travis containers-based infrastructure that is much faster than their VMs infrastructure (used when sudo is needed).
Adds jobs that build and test OpenBLAS on Alpine Linux v3.6 (inside chroot), i.e. verify that it works with musl libc. I hope that this will help to avoid errors like Building v0.2.20 fails on non-glibc systems #1252 in future.

You’ve been still running on Ubuntu Presty builders, but new default is Trusty (read more). Thus I‘ve explicitly set dist: presty to let it stay on Presty, to not change build environment by this commit.

Alpine jobs needs sudo (for chroot), so they run on VMs infrastructure. That’s why they are much slower than other jobs.

jirutka · 2017-07-28T01:26:38Z

~~Travis killed one job due to too long output, because gcc generates tons of warnings on your code…~~ (EDIT: I was wrong, it actually segfaulted)

brada4 · 2017-07-28T08:03:39Z

How much warnings are too much? Will halving their volume help?

martin-frbg · 2017-07-28T08:23:48Z

Jobs being killed because of overly long output has not been a problem with the existing travis setup as far as I know - is this some consequence of your build environment (different compiler version etc) or did you use additional CFLAGS ? BTW the oversight from #1252 would easily have been caught with a judiciuosly placed #undef __GLIBC_PREREQ, in my opinion the main problem was that a misunderstanding between xianyi and me led to it being merged right before the release when I wanted to have it on hold until later.

jirutka · 2017-07-28T11:01:06Z

How much warnings are too much? Will halving their volume help?

Eh, any warnings? gcc doesn’t print them just for fun… But about Travis, it has limit 4 MiB for output on stdout+stderr per job.

The right solution is to fix code, possible workaround is to turn off some of these warnings.

Jobs being killed because of overly long output has not been a problem with the existing travis setup as far as I know - is this some consequence of your build environment (different compiler version etc) or did you use additional CFLAGS ?

When you look at some of your past jobs, for example this one, you’re already on the edge of the limit:

This log is too long to be displayed. Please reduce the verbosity of your build or download the raw log.

Preparing Alpine environment adds some more text into the log ~~and it just flipped over the limit~~ (EDIT: no, it segfaulted) (on Alpine job with USE_OPENMP=1). You will run into this issue sooner or later even with your existing jobs.

BTW the oversight from #1252 would easily have been caught with a judiciuosly placed #undef __GLIBC_PREREQ…

Of course, as all possible errors… It happened and got into the release without notice because you don’t regularly test against musl libc. That’s why I opened this PR to prevent such mistakes in future.

martin-frbg · 2017-07-28T11:27:32Z

From a quick look at the failed job, a big number of them appear to be the recently introduced gcc indentation warnings, complaints about potentially uninitialized variables that would need to be addressed in netlib lapack and a bunch of harmless pointer type issues that I believe may even be platform-specifc. Suggest to turn off at least the indentation warnings for now.

7heo · 2017-07-28T15:14:24Z

"[...] harmless pointer [...]" — @martin-frbg

Best joke ever. 🤣 Thanks for making my Friday mate.

martin-frbg · 2017-07-28T15:20:41Z

@7heo care to prove otherwise ? These superficially mismatching casts have been there forever but there are real problems elsewhere in the code that need fixing.

7heo · 2017-07-28T15:21:33Z

@martin-frbg memory management is tricky and shouldn't be dealth with lightly. the least you can do is cast your stuff explicitely so it reads what it actually does.

jirutka · 2017-07-28T15:48:12Z

I was wrong, the failing job (musl, USE_OPENMP=1) hasn’t been killed for too long output, it segfaulted in tests.*

https://travis-ci.org/xianyi/OpenBLAS/jobs/258570695#L6743:

...
 cblas_ctrmm  PASSED THE TESTS OF ERROR-EXITS
 cblas_ctrmm  PASSED THE COLUMN-MAJOR COMPUTATIONAL TESTS (  2592 CALLS)

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f53526ef3ef in ???
#1  0x7f53526ee9e5 in ???
#2  0x7f5352a00f85 in ???
Segmentation fault (core dumped)
make[1]: *** [Makefile:79: all3] Error 139
make: *** [Makefile:119: tests] Error 2

* Travis marks failed job with a cross when some command in script exited with non-zero status or with exclamation mark when some error occurred. Since the log output exceeded size for being displayed on web page (“This log is too long to be displayed.”), I just quickly assumed that it exceeded 4MiB limit.

jirutka · 2017-07-28T15:57:42Z

We set USE_OPENMP=0 in Alpine package due to gcc bug 60035. I don’t know if it’s still relevant though. I’m gonna mark this particular job as allowed to fail, I’m not really keen to investigate what exactly is wrong.

7heo · 2017-07-28T15:58:19Z

I'm gonna quote Natanael Copa's FOSDEM presentation (first slide):

The good things with musl

[...]

musl tries to be (annoyingly) correct

when patch is needed, code gets better

[...]

jirutka · 2017-07-28T18:36:58Z

Your existing job with USE_OPENMP=1 sometimes fails even on Ubuntu Precise, but with different error, see https://travis-ci.org/xianyi/OpenBLAS/jobs/258662730. It may be related to the change from VM to container environment (read #1255 (comment)); you can switch it back to VM (i.e. exactly the same environment as you currently have) with sudo: true on global or job level. However, it IMHO just confirms that there’s surely something wrong in the code…

I’ve also added job with NO_AFFINITY=1 which reveals bug #1252 and confirms that it’s not fixed in develop branch (#1249).

The job building/testing OpenBLAS with USE_OPENMP=1 in Alpine environment (i.e. musl libc) is marked as “allowed to fail”, i.e. it’s failure does not affect result of whole build (you can see it here).

I consider this PR finished and ready to merge. CI failure on this PR is not a problem of this PR, but existing problems.

martin-frbg · 2017-07-28T21:54:34Z

ctrmm issue may be the same issue as #1089 (worked around in #1254), at least that test triggers the same debug printf (in gbmv_thread.c) in my working copy. Will retest when I am more awake.

brada4 · 2017-07-28T23:28:33Z

@jirutka - your finger-pointing does not help. It is your code and you are a developer by modifying source files.
trmm test happens to be last in row, corruption can happen anywhere before. blank backtrace is of no value, even if coming from semantically picky library

jirutka · 2017-07-29T00:20:25Z

@brada4 What finger-pointing? What are you talking about? I'm not OpenBLAS developer, I've only contributed this PR, in a hope to prevent future breakages of compatibility with non-glibc systems.

brada4 · 2017-07-29T07:47:09Z

musl does not support complex numbers. Plese be so kind and generate backtrace with function labels or work around musl shortage yourself.

martin-frbg · 2017-07-29T07:57:42Z

@brada4 by all means let's look into that, but as far as I know OpenBLAS will work around lack of support for complex numbers by creating structs etc (which is where I believe the pointer warnings are from). However the ctrmm issue appears to be real and not limited to musl - will be interesting to learn if #1254 fixes it.

brada4 · 2017-07-29T08:13:31Z

Slightly worse - http://wiki.musl-libc.org/wiki/Mathematical_Library#Complex - probably needs some ifdef wrapper to force struct complex.

xianyi · 2017-08-01T06:47:36Z

Hi all,

Because I access travis-ci very slow, I cannot see the full log in travis-ci. I see some error case in travis-ci. Is it expected?

martin-frbg · 2017-08-01T08:24:46Z

One is "ld" getting killed with an actual SIGKILL - this is probably what jirutka attributed to travis aborting when the log gets too big (tons of warnings about implicit float/double conversions in LAPACK etc). The next is "my" cgroups patch breakage (musl does not define __GLIBC_PREREQ) so should hopefully go away with #1257. Guess I should take the courage to merge that now, and re-basing this PR afterwards would probably be sufficient to trigger a re-run of the travis checks. (With the xBMV overrun fix already merged, there should be no test failures remaining)

martin-frbg · 2017-08-01T11:00:16Z

Hmm. Still getting segmentation faults in the run you had marked as "allowed to fail" despite #1254, wonder if it would be possible to generate a backtrace for that.
All activity on the gcc/libgomp bug ticket you mentioned seems to have ceased in late 2014 - I wonder if
the pitfalls of the combination of fork and OMP in a gcc-built program should be documented in the wiki somewhere ?

martin-frbg · 2017-08-01T17:51:40Z

@jirutka re-enable the "allow_failures" if you like - I'm not sure if the commit would end up getting credited to me if I was the last developer to meddle with it. It seems I botched at least parts of the xbmv fix anyway, introducing accuracy errors.Perhaps I need to stop coding for a while until things settle down for me. At least the GLIBC_PREREQ problem seems to be gone now.

jirutka · 2017-08-01T17:55:05Z

Because I access travis-ci very slow, I cannot see the full log in travis-ci.

Travis is very slow in rendering very long build log. However, you can view it also in raw (as plain text), that’s fast. Just click on “Raw log” above the rendered log (on job page).

If you can’t get there due to slow JS, you can also access it directly. On build page, look at link of particular job, e.g. https://travis-ci.org/xianyi/OpenBLAS/jobs/259694421. Then download https://api.travis-ci.org/jobs/<ID>/log.txt?deansi=true with curl or web browser, e.g. https://api.travis-ci.org/jobs/259694421/log.txt?deansi=true.

Using APT addon has nice side-effect - you don't need sudo anymore, so it can run on Travis containers-based infrastructure that is much faster than their VMs infrastructure (used when sudo is needed). You've been still running on Ubuntu Presty builders, but new default is Trusty. Thus I've explicitly set `dist: presty` to let it stay on Presty, to not change build environment by this commit.

Alpine jobs needs sudo (for chroot), so they run on VMs infrastructure. That's why they are much slower than other jobs.

See: #1255 (comment)

jirutka · 2017-08-01T21:05:33Z

wonder if it would be possible to generate a backtrace for that.

I’ve prepared you a VM with Alpine and prepared setup for building OpenBLAS, but GitHub doesn’t return any SSH keys for your account https://github.com/martin-frbg.keys. I don’t understand why, this should work for all accounts. o.O

…re-enable the "allow_failures" if you like…

I’ve removed your commit disabling allow_failures and rebased this PR against the current develop branch. All jobs except LINUX64_MUSL USE_OPENMP=1 (the one “allowed to fail”) now pass!

BTW It didn’t make sense to disable it, the point of allow_failures is that Travis runs the listed jobs, correctly report the job status, just doesn’t mark whole build as failed when some of allow_failures job fail.

martin-frbg · 2017-08-01T21:34:45Z

probably because I do not have a local repository linked to my github account, strange as it may sound I still upload changed files in the browser or use the primitive editor on github for changes.
thought as much but just for triggering a rerun of travis this change seemed as good as any (and whitespace changes or deleting and restoring a character did not register for good reason)
(3 seems my panic revert of 1254 was overkill, as it looks now it is a single mistake in gbmv_thread.c
that caused the sudden "loss of precision" errors. My plan for now is to merge your PR tomorrow (if nobody complains), then push the fixed 1254PR and see if it makes the "allowed failure" case pass.

martin-frbg · 2017-08-02T12:17:51Z

Key added now, thanks for setting up the VM.
The "allowed failure" in cblas_ctrmm appears to be unrelated to the gbmv errors, helgrind shows lots of thread races apparently originating from missing locks in blas_server_omp. Unfortunately valgrind stumbles over the inline assembly when building for HASWELL, a NEHALEM build raises no issues (on a
glibc system) that would explain an eventual segfault. Edit: helgrind is known to be confused by libgomp, so that one seems to be a red herring as well.

jirutka · 2017-08-06T09:00:15Z

Hm, I can’t reproduce the error in LINUX64_MUSL USE_OPENMP=1 ctest on the VM or my building machine. It seems that it’s somehow related to Travis environment, maybe kernel version or hardware?

brada4 · 2017-08-06T18:17:08Z

Was it chroot inside Ubuntu LTS or ???

jirutka mentioned this pull request Jul 28, 2017

Honor cgroup/cpuset limits when enumerating cpus #1249

Merged

pbregener mentioned this pull request Jul 30, 2017

WIP: community/openblas: upgrade to 0.2.20 alpinelinux/aports#2007

Closed

jirutka added 4 commits August 1, 2017 21:52

Travis: Build and test also on Alpine Linux (musl libc)

e0bd5b5

Alpine jobs needs sudo (for chroot), so they run on VMs infrastructure. That's why they are much slower than other jobs.

Travis: Disable some gcc warnings to avoid exceeding Travis limit

08c7d1d

See: #1255 (comment)

Travis: Allow job LINUX64_MUSL USE_OPENMP=1 to fail

486a485

See: #1255 (comment)

martin-frbg merged commit 53aee36 into OpenMathLib:develop Aug 2, 2017

martin-frbg mentioned this pull request Aug 4, 2017

Support compiling with clang on windows #1256

Merged

jirutka deleted the travis branch August 6, 2017 09:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Travis: Rewrite config, build and test also on Alpine Linux (musl libc) #1255

Travis: Rewrite config, build and test also on Alpine Linux (musl libc) #1255

jirutka commented Jul 28, 2017

jirutka commented Jul 28, 2017 •

edited

Loading

brada4 commented Jul 28, 2017

martin-frbg commented Jul 28, 2017

jirutka commented Jul 28, 2017 •

edited

Loading

martin-frbg commented Jul 28, 2017 •

edited

Loading

7heo commented Jul 28, 2017

martin-frbg commented Jul 28, 2017

7heo commented Jul 28, 2017

jirutka commented Jul 28, 2017

jirutka commented Jul 28, 2017

7heo commented Jul 28, 2017 •

edited

Loading

The good things with musl

jirutka commented Jul 28, 2017 •

edited

Loading

martin-frbg commented Jul 28, 2017

brada4 commented Jul 28, 2017

jirutka commented Jul 29, 2017

brada4 commented Jul 29, 2017

martin-frbg commented Jul 29, 2017

brada4 commented Jul 29, 2017

xianyi commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

jirutka commented Aug 1, 2017

jirutka commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

martin-frbg commented Aug 2, 2017 •

edited

Loading

jirutka commented Aug 6, 2017

brada4 commented Aug 6, 2017

Travis: Rewrite config, build and test also on Alpine Linux (musl libc) #1255

Travis: Rewrite config, build and test also on Alpine Linux (musl libc) #1255

Conversation

jirutka commented Jul 28, 2017

jirutka commented Jul 28, 2017 • edited Loading

brada4 commented Jul 28, 2017

martin-frbg commented Jul 28, 2017

jirutka commented Jul 28, 2017 • edited Loading

martin-frbg commented Jul 28, 2017 • edited Loading

7heo commented Jul 28, 2017

martin-frbg commented Jul 28, 2017

7heo commented Jul 28, 2017

jirutka commented Jul 28, 2017

jirutka commented Jul 28, 2017

7heo commented Jul 28, 2017 • edited Loading

The good things with musl

jirutka commented Jul 28, 2017 • edited Loading

martin-frbg commented Jul 28, 2017

brada4 commented Jul 28, 2017

jirutka commented Jul 29, 2017

brada4 commented Jul 29, 2017

martin-frbg commented Jul 29, 2017

brada4 commented Jul 29, 2017

xianyi commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

jirutka commented Aug 1, 2017

jirutka commented Aug 1, 2017

martin-frbg commented Aug 1, 2017

martin-frbg commented Aug 2, 2017 • edited Loading

jirutka commented Aug 6, 2017

brada4 commented Aug 6, 2017

jirutka commented Jul 28, 2017 •

edited

Loading

jirutka commented Jul 28, 2017 •

edited

Loading

martin-frbg commented Jul 28, 2017 •

edited

Loading

7heo commented Jul 28, 2017 •

edited

Loading

jirutka commented Jul 28, 2017 •

edited

Loading

martin-frbg commented Aug 2, 2017 •

edited

Loading