TestGRPCConnection unit test fails #969
Comments
This still fails identically with v0.23.0 (and master branch, as of writing). What's more, isolating the tests to just the GRPC tests fails in a completely different way:
vs.
In fact, the latter is inconsistent. Sometimes the gRPC server error message is reported:
Other times, the test panics as it does when running the full suite of tests:
Something smells racy here. |
@averzicco As the author of the GRPC tests, are you able to reproduce the failures that I see here? I'm surprised this made it past CI when being merged. |
Hi @dswarbrick, I quickly checked out the code, and all tests are successful:
|
@averzicco I get inconsistent results, depending on whether I run tests with
That's on Ubuntu 22.10 (amd64) with Go 1.19.2 (and also 1.19.4, installed from Ubuntu lunar repos), Intel Core i7 (16 cores). Running the tests in a Debian sid schroot on the same hardware, or on a Debian testing (bookworm) Intel Core i3, the tests pass. |
Based on the output you posted, the result is not inconsistent: the test failed in both cases. I've re-run the tests with the -race flag (which is used when running) and they are still successful:
I've also tried running the tests on a 32-core Debian 10 machine, and they pass. I'm not sure what the issue can be here. Since you can reproduce it locally, could you try running the test in debug mode to check what is failing? |
@averzicco By "inconsistent" I mean that the tests are failing in different ways - but yes, they consistently fail. A small correction to my earlier comment - the tests pass in the Debian sid schroot, yet fail when run natively on the Ubuntu 22.10 which hosts the schroot envs. That would seem to suggest something weird with Ubuntu's Go toolchain, which I find a bit hard to swallow. Also, despite the tests passing in a Debian sid schroot on the Core i7, and natively on a Debian testing Core i3, these tests have been quite flaky on the Debian CI infrastructure which builds the prometheus-blackbox-exporter package (for a variety of archs), e.g. https://tests.reproducible-builds.org/debian/rbuild/unstable/amd64/prometheus-blackbox-exporter_0.23.0-1.rbuild.log.gz |
From what I see in the log you linked, it seems something is wrong with the grpc build-deps (the
|
@averzicco Disregard the skipped tests; they were previously patched out due to older grpc build-deps in Debian, unable to support I think the tests are racy when executed on slower / busy systems (which Debian CI infrastructure often is). Look closely at the failure:
I think it is due to the inherent raciness caused by tests which spin up http servers, tcp servers etc in goroutines within the same test.

s := grpc.NewServer()
healthServer := health.NewServer()
healthServer.SetServingStatus("service", grpc_health_v1.HealthCheckResponse_SERVING)
grpc_health_v1.RegisterHealthServer(s, healthServer)
go func() {
if err := s.Serve(ln); err != nil {
t.Errorf("failed to serve: %v", err)
return
}
}()
defer s.GracefulStop()
testCTX, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
registry := prometheus.NewRegistry()
result := ProbeGRPC(testCTX, "localhost:"+port,
config.Module{Timeout: time.Second, GRPC: config.GRPCProbe{
IPProtocolFallback: false,
},
}, registry, log.NewNopLogger())
if !result {
t.Fatalf("GRPC probe failed")
}

I suspect that the grpc server in the goroutine is not ready in time, so ProbeGRPC() fails, and the deferred s.GracefulStop() is called as the test exits. However, the goroutine is still in the s.Serve() loop, so... panic. It wouldn't be the first time that I've needed to add a small delay after the goroutine to give tcp / http server time to initialize before a request is fired at it. |
Putting aside the goroutine panic for a moment, I found out why the other type of test failure occurs, e.g.
It seems that the lookup of "localhost" is failing in ProbeGRPC due to the probe module DNS resolution options.
Setting @averzicco Out of curiosity, are you setting I don't yet fully understand why a little test program produces these results on Ubuntu:
... despite this:
Running the same little test program on Debian produces the expected results:
|
I tracked Go's (and Python's) strange inability to resolve "localhost" to "::1" on Ubuntu down to the lack of an entry for it in /etc/hosts. It seems that an out-of-the-box /etc/hosts on Ubuntu contains the following entries for IPv6:
... whereas a default Debian /etc/hosts contains (as per netcfg.h):
I'm not sure yet what generates the default /etc/hosts on Ubuntu, but I don't think it is netcfg anymore. So, on Ubuntu, "localhost" was only resolving to "127.0.0.1" in |
Sorry, more than one year has passed since I wrote that code and I don't remember why I've used |
I was able to reproduce it on but it works fine when I run it the way we run it in CI. We run it like this in CI:

export DOCKER_TEST_IMAGE_NAME=quay.io/prometheus/golang-builder:1.20-base
# setup ipv6, add following to `/etc/docker/daemon.json`
{
"ipv6": true,
"fixed-cidr-v6": "2001:db8:1::/64"
}
# restart docker daemon
sudo service docker restart
# run it
docker run --rm -t -v "$(pwd):/app" "${DOCKER_TEST_IMAGE_NAME}" -i github.com/prometheus/blackbox_exporter -T

This makes me think that we are doing something different in CI, and that we need to do the same locally. I don't have time to dig further right now, so I'm just sharing my findings. |
These tests fail when IPv6 is not enabled. If you are running these inside docker, you need to enable IPv6 (as shown in the comment above), or skip tests that need IPv6. you can set I fixed these in #1342, and I am hoping that fixes this as well. Please reopen if this is not fixed, and you are able to reproduce this on master. |
Running go test ./... on a checkout of v0.22.0, the TestGRPCConnection test consistently fails as follows:
Tested with Go 1.19.1 on linux/amd64.