The init command does not return SSL errors #3663

Closed
palourde opened this issue Apr 2, 2020 · 5 comments · Fixed by #4479

@palourde
Contributor

palourde commented Apr 2, 2020

Expected Behavior

If the SSL certificate for etcd's client traffic is invalid, an explicit error should be returned.

Current Behavior

The init command only returns the generic error "error connecting to cluster: context deadline exceeded".

Some context: https://sensu.slack.com/archives/C60EEQFH8/p1585835720437200

TL;DR version: The client traffic was configured as follows:

etcd-advertise-client-urls: https://192.168.156.200:2379
etcd-listen-client-urls: https://0.0.0.0:2379
etcd-cert-file: /etc/pki/tls/certs/fullchain.pem
etcd-key-file: /etc/pki/tls/private/privkey.pem

However, this certificate's subject was something like *.domain.tld. Because sensu-backend init uses etcd-advertise-client-urls to connect to etcd, the connection was probably rejected because the CN (*.domain.tld) didn't match the URL (192.168.156.200), which is expected. Still, it would be useful to return an explicit error so that operators can debug this more easily.
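The following is a minimal Go sketch of why the handshake fails. It uses only the standard library and is not Sensu or etcd code. It builds a throwaway self-signed certificate whose only name is the DNS SAN *.domain.tld, then checks it against the IP from the advertise URL with crypto/x509's VerifyHostname. Go matches an IP host only against the certificate's IP SANs, so the wildcard can never match 192.168.156.200. The helper newWildcardCert is hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newWildcardCert (hypothetical helper) returns a throwaway self-signed
// certificate whose only name is the wildcard DNS SAN "*.domain.tld".
// It deliberately carries no IP SANs.
func newWildcardCert() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "*.domain.tld"},
		DNSNames:     []string{"*.domain.tld"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newWildcardCert()
	if err != nil {
		panic(err)
	}
	// Non-nil error: an IP host is only compared with IP SANs, and there are none.
	fmt.Println(cert.VerifyHostname("192.168.156.200"))
	// <nil>: a hostname under the wildcard is accepted.
	fmt.Println(cert.VerifyHostname("backend1.domain.tld"))
}
```

Either advertising a hostname covered by the certificate or adding the IP as an IP SAN should make the name check pass.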

Possible Solution

Figure out, if possible, where the actual TLS error is returned and surface it instead.

Steps to Reproduce (for bugs)

Set up a new backend with TLS for etcd client traffic, but make the certificate subject mismatch the advertised client URL.

Context

Your Environment

  • Sensu version used (sensuctl, sensu-backend, and/or sensu-agent):
  • Installation method (packages, binaries, docker etc.):
  • Operating System and version (e.g. Ubuntu 14.04):
@palourde palourde added the bug label Apr 2, 2020
@calebhailey calebhailey added reviewed component:backend Sensu Backend improvements labels Apr 5, 2020
@palourde palourde self-assigned this Apr 13, 2020
@palourde
Contributor Author

After a full day of investigation, I'm starting to better understand the problem.

This is essentially coming from grpc/grpc-go#2031.

It appears a workaround was merged 5 days ago in grpc-go to allow blocking callers to surface the actual error, and not just context deadline exceeded. Once this patch is released, we would then need to wait for etcd to update to this new version, which might take a while, and may not necessarily be merged into the 3.3 branch of etcd.

A short-term fix could be to rely on the log entry produced by this gRPC unary client interceptor: https://github.com/etcd-io/etcd/blob/1c16c242db884999b495e07e86b5b6ca548a010c/clientv3/retry_interceptor.go#L62-L67, but that requires us to stop using a blocking client, i.e. to remove this line:

grpc.WithBlock(),

Doing so produces the following log entries:

$ sensu-backend init --cluster-admin-username admin --cluster-admin-password 'P@ssw0rd!' --etcd-advertise-client-urls https://127.0.0.1:2379
{"level":"warn","ts":"2020-04-14T15:58:46.811-0400","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-ecd80cf5-7314-4193-9008-5c46f03d8c4c/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\""}
{"component":"backend","error":"context deadline exceeded","level":"fatal","msg":"error executing sensu-backend","time":"2020-04-14T15:58:46-04:00"}
exit status 1

Unfortunately, I don't think we can rely on the same workaround for sensu-backend start. I've noticed that it may also fail to start with the error {"component":"backend","error":"context deadline exceeded","level":"fatal","msg":"error executing sensu-backend","time":"2020-04-14T15:59:38-04:00"} if a TLS error occurs while connecting to the etcd client URL.

@palourde
Contributor Author

A workaround for sensu-backend init has been merged; however, I think we should keep this issue open to track the progress of the grpc-go and etcd PR mentioned above, so that we can also display TLS errors in sensu-backend start.

@portertech
Contributor

We're waiting on etcd 3.5 for this.

@portertech
Contributor

We need to verify whether this is still the case (we upgraded to etcd 3.5 a long time ago).

@fguimond fguimond assigned fguimond and unassigned palourde Nov 1, 2021
@fguimond
Contributor

fguimond commented Nov 2, 2021

The problem is still present with the latest version from main, which uses etcd 3.5.

> ./bin/darwin/amd64/sensu-backend init --cluster-admin-username fguimond --cluster-admin-password fguimond -c ~/sensu-ent/backend-secure-etcd/backend.yml
{"component":"cmd","level":"info","msg":"attempting to connect to etcd server: http://localhost:2379","time":"2021-11-02T11:38:30-04:00"}
{"component":"cmd","level":"error","msg":"error connecting to etcd endpoint: context deadline exceeded","time":"2021-11-02T11:38:35-04:00"}
{"component":"sensu-enterprise","error":"no etcd endpoints are available or cluster is unhealthy","level":"fatal","msg":"error executing sensu-backend","time":"2021-11-02T11:38:35-04:00"}

4 participants