Pull Query: Disable Pull Queries when auth validation is needed on /query endpoint
#3863
Comments
If I read this right, because we have a known performance hit from checking that the user has the rights to access the data the pull query is using, you're saying we should either: a) fail the pull query, or b) skip validation. This seems a very strange stance to take. Why would we fail the pull query? What if the user is happy to take the perf hit and just wants the pull query to work? If we skip validation we're introducing a security hole (it would reintroduce #3772). It's a security hole with the current pull query impl, as you're allowing the user access to data in KSQL that they don't have permission to access from Kafka. (I think it helps to think of the materialized state in KSQL as a cache of the data in Kafka. KSQL could build the state from the state in Kafka if it wanted: using the pre-materialized state is, in essence, just an optimisation.) Plus, it's certainly a security hole as we move forward with pull queries that will actually access Kafka topics, i.e. pull queries against non-materialized state. Surely the fix here is to introduce some level of caching for the auth responses so that we don't need to check auth on every request.
If we create an admin client on every request and make a network call, we are also creating a socket, and with even a reasonable volume of requests/sec it will connection-storm Kafka and cause stability pains (see https://kafka.apache.org/documentation/ and search for "connection storms"). Ultimately, the user will end up turning off pull queries anyway, so why give them headaches? This is why we must do this before releasing.
Great. Looks like we hit the last option from your earlier list of things to try. #3663 (comment)
As @vinothchandar said, Kafka is known to be very sensitive to connection storms. Internal experience running Kafka in the cloud shows this to be empirically true. Creating a new connection on each pull query, when we can reasonably expect a high volume, is going to cause instability on Kafka. Further, the admin calls to describe the ACLs only go to the controller, which isn't sharded. This creates another hot spot which won't scale for a high volume of queries. The availability of the controller is also vital to the availability of the cluster, so instability there is even worse. I think we should fix this properly (cached authorization results shared across requests), but in the meantime make sure that we don't destabilize existing Kafka installations: the latter would be worse for adoption IMO.
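The "cached authorization results shared across requests" idea raised in the comments above could be sketched roughly as follows. This is only an illustration, not the KSQL implementation: `CachingAuthChecker`, its constructor arguments, and the time-to-live policy are all hypothetical, and the `delegate` stands in for whatever expensive check actually talks to Kafka.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.function.LongSupplier;

// Hypothetical sketch: these names do not come from the KSQL code base.
final class CachingAuthChecker {

  // A cached decision together with the time at which it stops being trusted.
  private static final class Entry {
    final boolean allowed;
    final long expiresAtMs;

    Entry(boolean allowed, long expiresAtMs) {
      this.allowed = allowed;
      this.expiresAtMs = expiresAtMs;
    }
  }

  private final BiPredicate<String, String> delegate; // expensive check (principal, topic)
  private final long ttlMs;                           // how long a decision is reused
  private final LongSupplier clock;                   // injectable clock, in milliseconds
  private final Map<List<String>, Entry> cache = new ConcurrentHashMap<>();

  CachingAuthChecker(BiPredicate<String, String> delegate, long ttlMs, LongSupplier clock) {
    this.delegate = delegate;
    this.ttlMs = ttlMs;
    this.clock = clock;
  }

  boolean isAllowed(String principal, String topic) {
    long now = clock.getAsLong();
    List<String> key = Arrays.asList(principal, topic);
    Entry entry = cache.get(key);
    if (entry == null || entry.expiresAtMs <= now) {
      // Cache miss or expired entry: ask the delegate once and remember the answer.
      // Two concurrent callers may both reach this point; that only costs an extra check.
      entry = new Entry(delegate.test(principal, topic), now + ttlMs);
      cache.put(key, entry);
    }
    return entry.allowed;
  }
}
```

The trade-off in a design like this is staleness: a revoked permission keeps working until its entry expires, so the time-to-live bounds how long a stale "allowed" decision can survive.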
Personally, I still think it's a shame to disable something that may be of use to people for low-volume use or testing, where it works without issue. We could call this out in the release notes, or have it disabled in the default config with a warning in the comment, etc. This would be better than disabling it outright IMHO. Why not introduce a second config to control whether pull queries are enabled if a custom auth handler is installed?
fixes confluentinc#3863
- Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
- By default, pull queries error out if auth validation is needed
- Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
- Fixed some tests, added test cases
…3879)
fixes #3863
- Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
- By default, pull queries error out if auth validation is needed
- Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
- Fixed some tests, added test cases
- Applied on both `query` and websocket endpoints
…sters (#3980)
* refactor: lazy initialization of clients (admin,sr,ksql,connect) (#3696)
  - Made client creation lazy by memoizing them.
* feat: add config to disable pull queries when validating (#3879)
  fixes #3863
  - Added `ksql.query.pull.skip.access.validator` to control if pull queries work without validation
  - By default, pull queries error out if auth validation is needed
  - Replaced DUMMY_VALIDATOR with Optional<> interface for KsqlAuthorizationValidatorFactory
  - Fixed some tests, added test cases
  - Applied on both `query` and websocket endpoints
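One way to read the behaviour described in these commit messages is as a small gate in front of pull query execution. The sketch below is a guess at the shape of that logic, not the merged code: `PullQueryGate`, `checkPullQueryAllowed`, and the error message are hypothetical, and the `Optional` stands in for the optional validator that the factory is said to return. The flag corresponds to `ksql.query.pull.skip.access.validator`, assumed here to default to `false`.

```java
import java.util.Optional;

// Hypothetical sketch: class, method, and message are illustrative only.
final class PullQueryGate {

  private PullQueryGate() {
  }

  /**
   * Rejects a pull query when an authorization validator is configured,
   * unless the operator has opted in to running pull queries without validation.
   *
   * @param validator           present only when auth validation is needed
   * @param skipAccessValidator value of ksql.query.pull.skip.access.validator
   */
  static void checkPullQueryAllowed(Optional<?> validator, boolean skipAccessValidator) {
    if (validator.isPresent() && !skipAccessValidator) {
      throw new IllegalStateException(
          "Pull queries are disabled because access validation is required; "
              + "set ksql.query.pull.skip.access.validator=true to run them without validation");
    }
  }
}
```

Under this reading, enabling the flag trades the security concern raised earlier in the thread for availability of the feature, which is presumably why erroring out is the default.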
Now that pull queries have moved to the `/query` endpoint, we took a look at what validation happens on this code path.
We are invoking `KsqlAuthorizationValidator#checkAuthorization()` for every request (push or pull) when either `ksql.access.validator.enable=on`, or `ksql.access.validator.enable=auto` and the Kafka cluster has a non-empty value for `authorizer.class.name`.
Even though #3696 avoids eager instantiation of the admin client (and other clients) for every request, it simply memoizes it per `ServiceContext`, and a `ServiceContext` is created for every request. When real auth validation is involved, the call path ultimately leads to `KsqlAuthorizationValidatorImpl#checkAccess`, which describes a topic by talking to Kafka.
We need to fail pull queries or skip validation, to avoid creating clients per request again.
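To make the point about per-request memoization concrete, here is a toy model of the situation. It does not use any KSQL or Kafka classes: `RequestContext` plays the role of a per-request `ServiceContext`, the counter stands in for admin client construction, and `memoize` is a minimal hand-written helper, all of them hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: a toy model of lazy, per-request client creation.
final class MemoizationDemo {

  private MemoizationDemo() {
  }

  // Wraps a supplier so that the underlying value is created at most once.
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private T value;
      private boolean initialized;

      @Override
      public synchronized T get() {
        if (!initialized) {
          value = delegate.get();
          initialized = true;
        }
        return value;
      }
    };
  }

  // Stands in for a context object that is built once per incoming request.
  static final class RequestContext {
    private final Supplier<Object> adminClient;

    RequestContext(Supplier<Object> clientFactory) {
      this.adminClient = memoize(clientFactory);
    }

    Object adminClient() {
      return adminClient.get();
    }
  }

  // Returns how many "clients" are created while serving the given number of requests.
  static int clientsCreatedFor(int requests, boolean validateAccess) {
    AtomicInteger created = new AtomicInteger();
    Supplier<Object> factory = () -> {
      created.incrementAndGet(); // pretend this opens a connection to Kafka
      return new Object();
    };
    for (int i = 0; i < requests; i++) {
      RequestContext context = new RequestContext(factory);
      if (validateAccess) {
        context.adminClient(); // first use creates the client
        context.adminClient(); // second use within the same request reuses it
      }
    }
    return created.get();
  }
}
```

In this model, memoization removes the cost only for requests that never touch the client. As soon as validation runs on every request, the number of clients created grows linearly with the number of requests, which matches the concern described above.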