[BUG] Kibana throws errors 500/401 one hour after login when using SAML #828

Closed
GuiTeK opened this issue Sep 15, 2021 · 50 comments · Fixed by #1773
Labels: bug, triaged, v2.12.0

Comments

@GuiTeK

GuiTeK commented Sep 15, 2021

Describe the bug
1 hour after login, Kibana will show one of these 2 errors instead of the requested page:

  • On the root domain of Kibana: {"statusCode":500,"error":"Internal Server Error","message":"An internal server error occurred."}
  • On all other pages: {"statusCode":401,"error":"Unauthorized","message":"Response Error"}

The only way to work around this issue is to delete cookies.

To Reproduce
Steps to reproduce the behavior:

  1. Login to Kibana via a SAML Identity Provider (e.g. Okta)
  2. Wait for 1 hour
  3. Try to refresh/browse to a new Kibana page
  4. See that Kibana shows {"statusCode":401,"error":"Unauthorized","message":"Response Error"} instead of showing the requested page
  5. See that Kibana shows {"statusCode":500,"error":"Internal Server Error","message":"An internal server error occurred."} when manually going to the root Kibana domain

Note: to create the Okta app, I followed the instructions here: AWS - Add Single Sign-On (SSO) to Open Distro for Elasticsearch Kibana using SAML and Okta.

Expected behavior
The internal JWT created by OpenDistro (I'm not sure exactly what component creates it) should be automatically renewed and Kibana shouldn't throw an error (either 401 or 500) when visiting a page 1 hour or more after initial login.

Logs
ES Node Logs

[2021-09-15T09:08:24,861][TRACE][c.a.o.s.a.BackendRegistry] [es3.logs.example.com] Rest authentication request from 10.0.3.4:32528 [original: /10.0.3.4:32528]
[2021-09-15T09:08:24,861][DEBUG][c.a.o.s.a.BackendRegistry] [es3.logs.example.com] Check authdomain for rest noop/0 or 2 in total
[2021-09-15T09:08:24,861][TRACE][c.a.o.s.a.BackendRegistry] [es3.logs.example.com] Try to extract auth creds from clientcert http authenticator
[2021-09-15T09:08:24,861][TRACE][c.a.o.s.h.HTTPClientCertAuthenticator] [es3.logs.example.com] No CLIENT CERT, send 401
[2021-09-15T09:08:24,861][TRACE][c.a.o.s.a.BackendRegistry] [es3.logs.example.com] No 'Authorization' header, send 403
[2021-09-15T09:08:24,861][DEBUG][c.a.o.s.a.BackendRegistry] [es3.logs.example.com] Check authdomain for rest noop/1 or 2 in total
[2021-09-15T09:08:24,861][TRACE][c.a.o.s.a.BackendRegistry] [es3.logs.example.com] Try to extract auth creds from saml http authenticator
[2021-09-15T09:08:24,862][INFO ][c.a.d.a.h.j.AbstractHTTPJwtAuthenticator] [es3.logs.example.com] Extracting JWT token from [REDACTED JWT TOKEN HERE] failed
com.amazon.dlic.auth.http.jwt.keybyoidc.BadCredentialsException: The token has expired
	at com.amazon.dlic.auth.http.jwt.keybyoidc.JwtVerifier.getVerifiedJwtToken(JwtVerifier.java:85) ~[opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.dlic.auth.http.jwt.AbstractHTTPJwtAuthenticator.extractCredentials0(AbstractHTTPJwtAuthenticator.java:108) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.dlic.auth.http.jwt.AbstractHTTPJwtAuthenticator.access$000(AbstractHTTPJwtAuthenticator.java:47) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.dlic.auth.http.jwt.AbstractHTTPJwtAuthenticator$1.run(AbstractHTTPJwtAuthenticator.java:90) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.dlic.auth.http.jwt.AbstractHTTPJwtAuthenticator$1.run(AbstractHTTPJwtAuthenticator.java:87) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at java.security.AccessController.doPrivileged(AccessController.java:312) [?:?]
	at com.amazon.dlic.auth.http.jwt.AbstractHTTPJwtAuthenticator.extractCredentials(AbstractHTTPJwtAuthenticator.java:87) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.dlic.auth.http.saml.HTTPSamlAuthenticator.extractCredentials(HTTPSamlAuthenticator.java:148) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.opendistroforelasticsearch.security.auth.BackendRegistry.authenticate(BackendRegistry.java:421) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityRestFilter.checkAndAuthenticateRequest(OpenDistroSecurityRestFilter.java:177) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityRestFilter.access$000(OpenDistroSecurityRestFilter.java:66) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityRestFilter$1.handleRequest(OpenDistroSecurityRestFilter.java:113) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:258) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:340) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:191) [elasticsearch-7.10.2.jar:7.10.2]
	at com.amazon.opendistroforelasticsearch.security.ssl.http.netty.ValidatingDispatcher.dispatchRequest(ValidatingDispatcher.java:63) [opendistro_security-1.13.1.0.jar:1.13.1.0]
	at org.elasticsearch.http.AbstractHttpServerTransport.dispatchRequest(AbstractHttpServerTransport.java:319) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.http.AbstractHttpServerTransport.handleIncomingRequest(AbstractHttpServerTransport.java:384) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.http.AbstractHttpServerTransport.incomingRequest(AbstractHttpServerTransport.java:309) [elasticsearch-7.10.2.jar:7.10.2]
	at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:42) [transport-netty4-client-7.10.2.jar:7.10.2]
	at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:28) [transport-netty4-client-7.10.2.jar:7.10.2]
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:58) [transport-netty4-client-7.10.2.jar:7.10.2]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.apache.cxf.rs.security.jose.jwt.JwtException: The token has expired
	at org.apache.cxf.rs.security.jose.jwt.JwtUtils.validateJwtExpiry(JwtUtils.java:58) ~[cxf-rt-rs-security-jose-3.4.0.jar:3.4.0]
	at com.amazon.dlic.auth.http.jwt.keybyoidc.JwtVerifier.validateClaims(JwtVerifier.java:119) ~[opendistro_security-1.13.1.0.jar:1.13.1.0]
	at com.amazon.dlic.auth.http.jwt.keybyoidc.JwtVerifier.getVerifiedJwtToken(JwtVerifier.java:81) ~[opendistro_security-1.13.1.0.jar:1.13.1.0]
	... 74 more
[2021-09-15T09:08:24,865][DEBUG][c.o.s.a.AuthnRequest     ] [es3.logs.example.com] AuthNRequest --> <samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" ID="ONELOGIN_{SOME_UUID}" Version="2.0" IssueInstant="2021-09-15T09:08:24Z" ForceAuthn="true" Destination="https://subdomain.okta.com/app/xxx/yyy/sso/saml" ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" AssertionConsumerServiceURL="https://kb.logs.example.com/_opendistro/_security/saml/acs"><saml:Issuer>logs-kibana-saml</saml:Issuer><samlp:NameIDPolicy Format="urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified" AllowCreate="true" /></samlp:AuthnRequest>
[2021-09-15T09:08:24,865][TRACE][o.e.i.b.in_flight_requests] [es3.logs.example.com] [in_flight_requests] Adjusted breaker by [0] bytes, now [0]
[2021-09-15T09:08:24,866][TRACE][o.e.h.HttpTracer         ] [es3.logs.example.com] [422][8027f54c-0e8c-4318-b8c8-a5fab657dd63][UNAUTHORIZED][text/plain; charset=UTF-8][0] sent response to [Netty4HttpChannel{localAddress=/10.0.5.8:9200, remoteAddress=/10.0.3.4:32528}] success [true]
[2021-09-15T09:08:24,866][TRACE][c.a.o.s.a.BackendRegistry] [es3.logs.example.com] No 'Authorization' header, send 401 and 'WWW-Authenticate Basic'
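
The "token has expired" rejection in the trace above comes down to comparing the JWT's exp claim with the current time. Below is a minimal Python sketch of that comparison, meant for inspecting a token by hand. It does not verify the signature. The helper names (decode_payload, is_expired, make_unsigned_token) and the timestamps are illustrative assumptions, and the plugin's own check (JwtUtils.validateJwtExpiry in the trace) may handle the exact boundary or clock skew differently.

```python
import base64
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_payload(token: str) -> dict:
    """Decode the payload segment of a compact JWT WITHOUT verifying it."""
    payload_b64 = token.split(".")[1]
    # Restore the padding that the compact form strips.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True once the current time has reached the token's exp claim."""
    now = time.time() if now is None else now
    return now >= decode_payload(token)["exp"]


def make_unsigned_token(claims: dict) -> str:
    """Build an unsigned demo token (hypothetical helper, for illustration only)."""
    header = b64url(json.dumps({"alg": "none"}).encode())
    body = b64url(json.dumps(claims).encode())
    return f"{header}.{body}."


if __name__ == "__main__":
    issued = 1_631_692_104  # illustrative login time (epoch seconds)
    token = make_unsigned_token({"sub": "demo", "iat": issued, "exp": issued + 3600})
    print(is_expired(token, now=issued + 1800))  # 30 minutes after login
    print(is_expired(token, now=issued + 3660))  # 61 minutes after login
```

Comparing exp with iat in the decoded payload also shows how long the IdP-derived token is meant to live, which is useful when matching it against the session settings.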

Kibana Server Logs

Sep 14 15:33:17 kb1.logs.example.com kibana[493]: {"type":"log","@timestamp":"2021-09-14T15:33:17Z","tags":["error","elasticsearch","data"],"pid":493,"message":"[ResponseError]: Response Error"}
Sep 14 15:33:17 kb1.logs.example.com kibana[493]: {"type":"log","@timestamp":"2021-09-14T15:33:17Z","tags":["error","http"],"pid":493,"message":"{ ResponseError: Response Error\n    at IncomingMessage.response.on (/us
r/share/kibana/node_modules/@elastic/elasticsearch/lib/Transport.js:272:25)\n    at IncomingMessage.emit (events.js:203:15)\n    at endReadableNT (_stream_readable.js:1145:12)\n    at process._tickCallback (internal/proc
ess/next_tick.js:63:19)\n  name: 'ResponseError',\n  meta:\n   { body: '',\n     statusCode: 401,\n     headers:\n      { 'x-opaque-id': '{SOME_UUID}',\n        'www-authenticate':\n         'X-S
ecurity-IdP realm=\"Open Distro Security\" location=\"https://subdomain.okta.com/app/xxx/yyy/sso/saml?SAMLRequest=some-base64-data\" requestId=\"ONELOGIN_{SOME_UUID}\"',\n        'content-type': 'text/plain; charset=UTF-8',\n        'content-length': '0' },\n
    meta:\n      { context: null,\n        request: [Object],\n        name: 'elasticsearch-js',\n        connection: [Object],\n        attempts: 0,\n        aborted: false } },\n  isBoom: true,\n  isServer: false,\n  d
ata: null,\n  output:\n   { statusCode: 401,\n     payload:\n      { statusCode: 401,\n        error: 'Unauthorized',\n        message: 'Response Error' },\n     headers: {} },\n  reformat: [Function],\n  [Symbol(SavedOb
jectsClientErrorCode)]: 'SavedObjectsClient/notAuthorized' }"}
Sep 14 15:33:17 kb1.logs.example.com kibana[493]: {"type":"error","@timestamp":"2021-09-14T15:33:17Z","tags":[],"pid":493,"level":"error","error":{"message":"Internal Server Error","name":"Error","stack":"Error: Inter
nal Server Error\n    at HapiResponseAdapter.toInternalError (/usr/share/kibana/src/core/server/http/router/response_adapter.js:69:19)\n    at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:177:34
)\n    at process._tickCallback (internal/process/next_tick.js:68:7)"},"url":{"protocol":null,"slashes":null,"auth":null,"host":null,"port":null,"hostname":null,"hash":null,"search":null,"query":{},"pathname":"/","path":
"/","href":"/"},"message":"Internal Server Error"}
Sep 14 15:33:17 kb1.logs.example.com kibana[493]: {"type":"response","@timestamp":"2021-09-14T15:33:17Z","tags":[],"pid":493,"method":"get","statusCode":500,"req":{"url":"/","method":"get","headers":{"x-forwarded-for"
:"38.103.45.2","x-forwarded-proto":"https","x-forwarded-port":"443","host":"kb.logs.example.com","x-amzn-trace-id":"Root=some-id","sec-ch-ua":"\"Google Chrome\";v=\"93\", \" Not;A Brand\";v
=\"99\", \"Chromium\";v=\"93\"","sec-ch-ua-mobile":"?0","sec-ch-ua-platform":"\"macOS\"","upgrade-insecure-requests":"1","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko
) Chrome/93.0.4577.63 Safari/537.36","accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","sec-fetch-site":"none","sec-fetch-m
ode":"navigate","sec-fetch-user":"?1","sec-fetch-dest":"document","accept-encoding":"gzip, deflate, br","accept-language":"en-GB,en-US;q=0.9,en;q=0.8,fr;q=0.7"},"remoteAddress":"10.0.1.36","userAgent":"Mozilla/5.0 (Macin
tosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36"},"res":{"statusCode":500,"responseTime":56,"contentLength":9},"message":"GET / 500 56ms - 9.0B"}

Host/Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • ElasticSearch: OSS 7.10.2
  • Kibana (opendistroforelasticsearch-kibana): 1.13.2
  • OpenDistro versions:
opendistro-alerting               1.13.1.0-1 
opendistro-anomaly-detection      1.13.0.0-1
opendistro-asynchronous-search    1.13.0.1-1
opendistro-index-management       1.13.2.0-1
opendistro-job-scheduler          1.13.0.0-1
opendistro-knn                    1.13.0.0-1
opendistro-knnlib                 1.13.0.0
opendistro-performance-analyzer   1.13.0.0-1
opendistro-reports-scheduler      1.13.0.0-1
opendistro-security               1.13.1.0-1
opendistro-sql                    1.13.2.0-1
opendistroforelasticsearch        1.13.2-1
GuiTeK added the Beta, bug, and untriaged labels Sep 15, 2021
@GuiTeK
Author

GuiTeK commented Nov 5, 2021

Any news on this issue?

I just tested OpenSearch 1.0.0 and 1.1.0 and the issue is still present.

The only way to work around it is to delete cookies, which is a nightmare from a UX and usability point of view.

The JWT expiry setting could maybe be a workaround for this issue, but unfortunately this feature also has a bug: opensearch-project/security#1448

@GuiTeK
Author

GuiTeK commented Nov 5, 2021

cc @dblock just so you're aware of it as I believe it is a rather high impact bug.

@davidlago

Thanks for the report, @GuiTeK. Although not a security vulnerability, I agree that this is painful from the user experience standpoint. The team has not had time to look into it yet, but I've removed the untriaged label to make sure we have it on the list of issues ready to take on.

@GuiTeK
Author

GuiTeK commented Nov 8, 2021

Thank you for your reply @davidlago!

Related issues (although the root cause is not the parameters described in these two issues):

@rmelilloii

Hello, @GuiTeK all good!? Did you find a solution to this issue? I am currently validating the downgrade of the ES plugin "opendistro_security" from: 1.13.1.0 to: 1.13.0.0. Keeping the Kibana plugin as it is.
It works, at least did not break anything. I will proceed with the same for my clusters and see how it behaves.
Thanks for the info regarding OpenSearch versions.

@rmelilloii

Hello, @GuiTeK all good!? Did you find a solution to this issue? I am currently validating the downgrade of the ES plugin "opendistro_security" from: 1.13.1.0 to: 1.13.0.0. Keeping the Kibana plugin as it is. It works, at least did not break anything. I will proceed with the same for my clusters and see how it behaves. Thanks for the info regarding OpenSearch versions.

Replying to self:
ended up with:
Kibana 1.12 (7.10.0) + opendistroSecurityKibana 1.13.0.0
My case, with multiple tenants, suffers a lot from the lack of real support for them, plus the session/cookie issue.

With this combination, I can finally see the 1-hour timeout (from Azure). Before, it was a matter of 15 minutes or less.

But I feel that it will not be enough. So maybe next I will try to change my infra (load balancer/request flow) or test with the latest OpenSearch version.

I guess that everyone is off for the year, is that right? Cheers!! :)

@mvanderlee

We observed this behaviour when the session keepalive is set to true. Setting it to false fixed it.

In dashboards.yml

opensearch_security.session.keepalive: false
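
One possible reading of this observation, sketched as a toy model in Python: with keepalive on, every request pushes the session cookie's expiry forward, but the JWT carried in the session keeps its original fixed expiry, so the cookie can outlive the token and the backend answers 401. With keepalive off, the cookie expires no later than the token and the user is simply sent back through login. The Session class, the on_request function, and the numbers are hypothetical and are not the plugin's actual code.

```python
from dataclasses import dataclass


@dataclass
class Session:
    cookie_expires_at: int  # seconds since login at which the session cookie lapses
    token_expires_at: int   # fixed exp of the JWT carried in the session


def on_request(session: Session, now: int, ttl: int, keepalive: bool) -> str:
    """Toy model of one request: returns 'ok', '401', or 'reauthenticate'."""
    if now >= session.cookie_expires_at:
        # No valid session cookie: the user goes through the IdP login again.
        return "reauthenticate"
    if keepalive:
        # Sliding expiry: each request extends the cookie, but not the token.
        session.cookie_expires_at = now + ttl
    if now >= session.token_expires_at:
        # The cookie is still valid but the token inside it is not.
        return "401"
    return "ok"


if __name__ == "__main__":
    ttl = 3600  # assumed 1-hour session ttl, in seconds for readability
    for keepalive in (True, False):
        session = Session(cookie_expires_at=ttl, token_expires_at=3600)
        results = [on_request(session, t, ttl, keepalive) for t in (1800, 3900)]
        print(keepalive, results)
```

In this model, requests at 30 and 65 minutes give "ok" then "401" with keepalive on, and "ok" then "reauthenticate" with keepalive off.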

However, we found another issue regarding SAML timeouts. The IDP provides an expiry time, but OpenSearch only honors a specific option in the SAMLResponse. Auth0, for example, sends it via a different option. Not sure who is in the wrong here, but in our case the workaround was to set the jwt.expiry setting manually.

#159 (comment)

davidlago added the help wanted label Feb 18, 2022
@sandervandegeijn

Confirmed with both SAML and OIDC.

@rmelilloii

rmelilloii commented Apr 8, 2022

Hello hello! Forgot to post back. I ended up with normal settings (exactly like the official docs), no fancy stuff, normal versions all around. But I use load balancers: the previous person had several ES node types under the LB, and Kibana was pointed to this LB for auth. After pointing it to a single client node, it never happened again.
I didn't test with multiple endpoints, but it is not really required for me and I am moving to OpenSearch. Will test there ;) Cheers and thanks for all the ideas.

@mhoydis13

This is still an issue for anyone using openid_auth_domain

@sandervandegeijn

Yep. Tried different ttl values and combinations of settings. The problem still persists.

@mhoydis13

This issue is still present in version 2.1.0

@sandervandegeijn

Confirmed and very annoying.

@SakuraAxy

I have the same problem

@sandervandegeijn

Any news on this one? This is one of the reasons I can't migrate an Elastic cluster to OpenSearch: the team doesn't want to deal with this session error.

@FryggFR

FryggFR commented Jan 23, 2023

Hello,

I have the same problem with Azure AD

@stephen-crawford
Contributor

stephen-crawford commented Jan 30, 2023

[Triage 1/30/2023] This issue seems to be related to the cookie storage and potentially the access & refresh tokens expiring. We are passing a token but it does not have a good method of dealing with expiration between front-end and backend systems. @davidlago could you link this to the to-be-created ticket for session management so that this can be a considered use case. Thank you.

Also linking a pair of associated issues

@jperhamcatchteam

Hello,

I have the same problem with Azure AD

Also running into this issue with Azure AD. I have not found a way to resolve this without manually clearing cookies for the issue browser.

@SergioIbIGZ

Hello @jochen-kressin. Thank you for your answers.
I tried adding "offline_access" to the scope, but our IDP rejects it as invalid :(

Apart from this, yesterday I ran an interesting POC. I installed the latest version of OpenSearch (2.11) with the latest version of OpenSearch Dashboards 2, using Helm and applying the exact SAME config as we use in OpenSearch 1.x, including OpenID.

To my surprise, the cookie expiry error still appears in the OpenSearch 2.x logs, but OpenSearch Dashboards 2.x does NOT show the BadCredentialsException that exists in the 1.x version. As a reminder:

[2023-12-14T11:27:07,592][INFO ][c.a.d.a.h.j.AbstractHTTPJwtAuthenticator] [tip-master-1] Extracting JWT token from eyJraWQiOiJyc2ExIiwiYWxnIjoiUlMyNTYifQ.eyJzdWIiOiJYRTk1NDQ0IiwiY291bnRyeSI6IkVTUCIsInJvbGVzIjpbIkJCVkFfSE8tVklFV0VSX0NUSSJdLCJraWQiOiJyc2ExIiwiaXNzIjoiaHR0cHM6XC9cL2lkcC5saXZlLmdsb2JhbC5wbGF0Zm9ybS5iYnZhLmNvbVwvb2lkY1wvIiwicHJlZmVycmVkX3VzZXJuYW1lIjoiWEU5NTQ0NCIsImdpdmVuX25hbWUiOiJTRVJHSU8iLCJhdWQiOiIxMGMzYzhhNi0zZjc4LTQ5M2UtYTc4Ny02MTJkODcwMjFhZGQiLCJuYW1lIjoiW0V4dGVybmFsXSBTRVJHSU8gSUJBw5FFWiBET01JTkdPIiwiZXhwIjoxNzAyNTUyMTU1LCJpYXQiOjE3MDI1NTE1NTUsImp0aSI6IjQxZGM0ZGViLThmMjMtNDUwYi04OTIzLTc1ZjM3ZjM5NWI4NiIsImVtYWlsIjoiU0VSR0lPLklCQU5FWi5DT05UUkFDVE9SQEJCVkEuQ09NIn0.mwMA0tnhvOPK14kY39MjPawYiklH3TlnHMwBM63K8AABfAIFtz-Ra_8uwy3AfHODKdTzDhOigYU5oFNJlVoudnzRAZek7uDk2YUsYRb3of4zIKJt3tBPkaHOrYOGHUGIQsgeX3kNQewKMPIvmiLgGw-r0Ep5kKbm228TRXhlbWOt_Y_TDj1KqF5SCv2rkr60wiJVt19nPSbzK2WlLhkE_227ywC1gwo9N1lvSH6qoO82o4If75O4L0ddO6crvaE97amgeCxi9jaI6_QM0U9lSX9kGoAvK1kb0ik90NJGnaVBqetw-WWVykZMsxUFDL8dlDP8eMNCRCOwayrDM2_EUQ failed
com.amazon.dlic.auth.http.jwt.keybyoidc.BadCredentialsException: The token has expired

So when a user logs in to OpenSearch Dashboards 2.x, the 401/Unauthorized error 10 minutes after login does NOT occur. The session is kept alive, or maybe the token can be refreshed.

My conclusion is that "com.amazon.dlic.auth.http.jwt.keybyoidc" is different, or maybe another version, in 2.x and it works properly with our IDP and its cookies.

Can you tell me about it? Would it be possible to look into a fix for 1.x?

Thanks for your time and dedication :)

@sandervandegeijn

Any updates on this one? Would really love for this to be solved in 2.12. This one is very annoying :)

@stephen-crawford
Contributor

Hi @sandervandegeijn, I don't know of any active efforts to fix this. I will remove the Triaged label so the maintainers can review this issue during the triage meeting later today and add an update below.

@stephen-crawford
Contributor

[Triage] Seems like this is still an issue and something is going wrong with the behavior when using external IdPs and dashboards. Based on the discussion, any data from the active session is lost if a redirect to refresh the token happens while the user is making changes, e.g. setting up filters and then hitting a token refresh causes you to lose all of the filters. We should try to prioritize this given how long this issue has been open.

stephen-crawford added the triaged and v2.12.0 labels and removed the help wanted label Feb 5, 2024
@derek-ho
Collaborator

derek-ho commented Feb 7, 2024

@GuiTeK @sandervandegeijn I am beginning to pick this up, and I am new to this space. Can you share your settings for opensearch_security.session.ttl and opensearch_security.session.keepalive, as well as any settings you have from the SAML provider, such as the expiry/assertion lifespan? I am seeing some behavior on my local setup and am not sure if it is what you are talking about: is the bug that after a certain amount of time OSD is rerouted back to the SAML provider to re-authenticate even though the assertion should still be valid? I do see this locally, but I do not see the 401s/error pages that you folks were mentioning.

@sandervandegeijn

We are at the defaults. I'm not running SAML anymore, but with OpenID it's basically the same thing. Leave the dashboards app for 30 minutes or so, click on the next page and it will kick off the authentication flow (which is better than it was in the past, when it would just throw the 401 and be done with it). This is fast, but another problem is that it loses all the state that you had, like filters. :)

@andrew-landsverk-win

I am doing the same as @sandervandegeijn - using OpenID via Azure AD and the same thing happens to us.

@derek-ho
Collaborator

derek-ho commented Feb 7, 2024

Understood. I do think there is a bug/confusion here regarding the whole management of sessions. Would you folks be able to provide some feedback on this issue: #1711? Additionally, a few questions/comments for you (I am still trying to wrap my head around it, so there will be more to come!):

  • Is there anything preventing you from updating keepalive to false and opensearch_security.session.ttl to match the expiry of the assertion/exp field? This might not solve cases in which the exp is actually short-lived (that may be a separate issue of doing our best to preserve URLs, filters, etc.), but this should solve the redirection. I believe the defaults here: https://github.com/opensearch-project/security-dashboards-plugin/blob/main/server/index.ts#L78 are the root cause of this.
  • Can you folks help me understand in which cases people would want the session within OSD to actually be shorter than the validity of the result received from the IDP? We may want to set some sensible defaults in case it is not set by the IDP in the way we expect, but are there any other reasons to have this mismatch? I need to dive deep into the code, but if there is no use case for having a shorter session in Dashboards only, I would be in favor of removing that so everything is handled in a single place with a single source of truth (the OpenSearch backend), although an intermediate fix may be to set the cookie expiry to the max of the existing cookie expiry, the expiry from the IDP, or current time + ttl. What do you folks think?
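
The intermediate fix suggested in the last bullet can be sketched in a few lines of Python. This only illustrates the "max of the three candidates" idea. The function and parameter names are hypothetical, all times are assumed to be epoch milliseconds, and the IDP expiry is treated as optional because an IDP may not supply one.

```python
from typing import Optional


def next_cookie_expiry(
    existing_expiry_ms: int,
    idp_expiry_ms: Optional[int],
    now_ms: int,
    ttl_ms: int,
) -> int:
    """Pick the latest of: current cookie expiry, IDP expiry, now + ttl."""
    candidates = [existing_expiry_ms, now_ms + ttl_ms]
    if idp_expiry_ms is not None:
        # Only consider the IDP's expiry when it actually provided one.
        candidates.append(idp_expiry_ms)
    return max(candidates)


if __name__ == "__main__":
    # IDP grants a longer lifetime than now + ttl: the IDP value wins.
    print(next_cookie_expiry(1_000_000, 5_000_000, 900_000, 3_600_000))
    # No IDP expiry available: fall back to now + ttl.
    print(next_cookie_expiry(1_000_000, None, 900_000, 3_600_000))
```

Taking the maximum means the Dashboards cookie never expires before the IDP's own deadline, at the cost of possibly outliving it when now + ttl is larger; whether that trade-off is acceptable is exactly the question raised above.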

@sandervandegeijn

Actually, I checked, I set the timeouts to one day. Removed the settings, so now I'm on the defaults. Will test again.

I do not understand why you would override the timeout from the IDP. If you need it shorter, you should fix it at the IDP's side I would suppose?

This also seems related: #159 (comment)

@SergioIbIGZ

Hello again,
Probably this information is not useful here but I would like to share it with you.

As I said in #828 (comment), in our case the IDP and its cookie are configured to set 12 hours as the Dashboards session TTL.
But in the end that session is closed with a 401 error after only 10 minutes (I don't know why it is only that long in our case).

Our partial workaround is to configure these settings:

opensearch_security.openid.refresh_tokens: false
opensearch_security.session.keepalive: true
opensearch_security.session.ttl: 180000
opensearch_security.cookie.ttl: 180000

As you can see, the session TTL is only 3 minutes. What we achieve with that is a browser "auto-reload" to the Dashboards login at each interval. It is annoying, but with this the 401 error does NOT occur. Users have to save their work continuously so as not to lose it, of course.
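
As a rough check of the numbers in this workaround, the sketch below copies the four settings into a plain object and converts the TTLs. It assumes both TTL values are in milliseconds (consistent with 180000 being described as 3 minutes); the constant and function names are made up for illustration.

```typescript
// Values copied from the workaround above; TTLs are assumed to be milliseconds.
const workaround = {
  "opensearch_security.openid.refresh_tokens": false,
  "opensearch_security.session.keepalive": true,
  "opensearch_security.session.ttl": 180000,
  "opensearch_security.cookie.ttl": 180000,
};

const MS_PER_MINUTE = 60 * 1000;

/** Convert a millisecond TTL into minutes. */
function msToMinutes(ms: number): number {
  return ms / MS_PER_MINUTE;
}

// Session TTL expressed in minutes.
const sessionTtlMinutes = msToMinutes(workaround["opensearch_security.session.ttl"]);

// The 12-hour session the IDP is configured for, in milliseconds.
const idpSessionMs = 12 * 60 * MS_PER_MINUTE;

// Number of 3-minute TTL windows that fit into the 12-hour IDP session.
const ttlWindowsPerIdpSession = idpSessionMs / workaround["opensearch_security.session.ttl"];
```

Here sessionTtlMinutes is 3 and ttlWindowsPerIdpSession is 240, which shows how far the workaround's TTL is from the 12 hours the IDP is aiming for.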

Thanks!

@sandervandegeijn

Problem persists.

@atbohmer

atbohmer commented Feb 8, 2024

From a user perspective: mighty irritating! Had a discover window open, did some work, went to fetch a coffee, back and a session reset in front of my eyes. All selections and filtering gone. So back to the ELK setup for daily work.

@derek-ho
Collaborator

@SergioIbIGZ If I am reading your situation correctly, it seems like you have issues on the 1.x line, but the issue is not present on the 2.x line? Unfortunately we do not develop for 1.x anymore, and would recommend you upgrade to 2.x: https://opensearch.org/releases.html. That being said, I am going to shortly post a summary on this issue and close it out with the merging of a recent PR. Feel free to open another issue if something that is affecting 2.x comes up, or if I am not understanding your problem correctly. Thanks!

@derek-ho
Collaborator

derek-ho commented Feb 13, 2024

This issue is getting a little long in the tooth and it's getting hard for me to diagnose/help individual folks with their problems. That being said, it seems to me like there are several issues mentioned, some related and some not, and some based on opendistro, which may or may not be out of date. From what I see the issues are:

I will be closing this issue with the merging of #1773. Anybody please feel free to open a follow-up issue with detailed reproduction steps (IDP, opensearch_dashboards.yml settings, opensearch security backend config, etc.) so I can better address individual concerns. Thanks!

Additionally, we have an RFC, #1711, to discuss confusion around some of the settings. If anyone has any thoughts, please leave them there, thanks!

@GuiTeK @rmelilloii @sandervandegeijn @mhoydis13 @SakuraAxy @FryggFR @jperhamcatchteam @Beeez @K3ndu @tr0k
@mkhpalm @SergioIbIGZ @andrew-landsverk-win @atbohmer @jochen-kressin

@sandervandegeijn

Thanks for the effort Derek, we haven't made it easy for you ;)

@SergioIbIGZ

That's right @derek-ho. I already tested it in 2.x and my issue is not present, so I would suggest updating to that version.
Thanks a lot :)
