
WIP: basic support for SASL/PLAIN #923

Merged (18 commits) on Aug 8, 2018

thomaslee
Contributor

Still plenty more to do here (honestly, it's a bit of a mess in places), but thought I might open this up for comment.

We have a pressing need for dumb authentication in our Kafka cluster(s) to work around some infrastructure issues and kafka-node's support for TLS client certs isn't an option in the short term. Our immediate needs would be met with basic support for SASL/PLAIN, but obviously kafka-node doesn't support SASL -- just wanted to prove to myself that it was fairly trivial to hook up something simple. So this is my whirlwind prototype.

Very, very briefly verified against a local Kafka instance configured with a SASL_PLAINTEXT listener, seems to work -- just barely. 😅

Support for SASL/GSSAPI & SASL/SCRAM-SHA-{256,512} is lower on my list of priorities, but would look pretty similar.

Previous discussion by others back in 2016 over in #511 and #482. Hopefully SASL support is still something you're open to @hyperlink.

@hyperlink
Collaborator

Thanks for your contribution @thomaslee ! I am absolutely open to whatever the community puts out there. All these new contributions are very encouraging to see. I will go through these PRs as soon as I can.

Can you share details on how you configured kafka to work with SASL? Ideally create another Kafka container with SASL enabled and add test to verify this works. 👍

@thomaslee
Contributor Author

Sure -- there are a few steps, hopefully I haven't missed anything:

In config/server.properties, add a SASL_PLAINTEXT listener and set sasl.enabled.mechanisms:

listeners=PLAINTEXT://:9092,SASL_PLAINTEXT://:9192
sasl.enabled.mechanisms=PLAIN

then create config/kafka_server_jaas.conf (you can add more user_XXX lines for additional users):

KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    user_myuser="mypassword";
};

lastly, when launching Kafka, set java.security.auth.login.config via KAFKA_OPTS:

KAFKA_OPTS="-Djava.security.auth.login.config=config/kafka_server_jaas.conf" bin/kafka-server-start.sh config/server.properties
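For more than a couple of users, the JAAS file above can be generated rather than hand-edited. A minimal sketch, assuming credentials are kept in a plain object mapping username to password; `buildKafkaServerJaas` is a hypothetical helper, not part of Kafka or kafka-node, and it rejects quotes and backslashes instead of attempting any JAAS escaping:

```javascript
// Hypothetical helper: renders the KafkaServer JAAS section shown above,
// with one user_<name>="<password>" option per user.
function buildKafkaServerJaas(users) {
  const entries = Object.entries(users).map(([name, password]) => {
    // Option values are double-quoted; refuse characters that would break
    // the quoting instead of trying to escape them.
    if (/["\\\s]/.test(name) || /["\\\n]/.test(password)) {
      throw new Error(`unsupported character in credentials for user ${name}`);
    }
    return `    user_${name}="${password}"`;
  });
  if (entries.length === 0) {
    throw new Error('at least one user is required');
  }
  // JAAS options are whitespace-separated; the last one ends with ';'.
  return [
    'KafkaServer {',
    '    org.apache.kafka.common.security.plain.PlainLoginModule required',
    entries.join('\n') + ';',
    '};',
    ''
  ].join('\n');
}

// Usage: write the result to config/kafka_server_jaas.conf.
console.log(buildKafkaServerJaas({ myuser: 'mypassword' }));
```

With a single user this reproduces the hand-written file above exactly; each additional user becomes another `user_XXX` line before the terminating semicolon.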

Might be an easier way, but this seems to work for me.

thomaslee force-pushed the tom_sasl_plain branch 2 times, most recently from 5370c11 to 1996f4a on April 23, 2018 06:51
@thomaslee
Contributor Author

@hyperlink I've added a positive & a negative test for SASL/PLAIN auth (including broker config for an extra SASL_PLAINTEXT listener -- the final config ended up being a bit more involved than what my earlier config might have implied). Both are passing for me locally against Kafka 1.0.1.

That said, I've had a hard time getting the other tests to pass locally even without my changes -- leaning on Travis for broader verification.

@thomaslee
Contributor Author

thomaslee commented Apr 23, 2018

Ugh, after digging around in the Kafka source code, it looks like the test failures around the SASL stuff are legitimate: as I mentioned, I've been testing against Kafka 1.0.1, which supports SaslHandshake v1. Apparently Kafka versions 0.10 & 0.11 use v0 of the SaslHandshake API, which triggers a raw SASL exchange without the use of SaslAuthenticate{Request,Response}. As a result, I'll need to revisit this & port to SaslHandshake v0.

EDIT: modified to clarify 0.10 & 0.11 rather than 0.8-0.11. I'm not sure when SASL support was first introduced, but as far as I can tell SASL/PLAIN was only introduced in 0.10, so these tests will need to be skipped for anything <0.10.
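The version split can be expressed as a decision over the broker's ApiVersions response. This is a sketch under assumptions: the response is taken to be already decoded into a map of API key to `{ min, max }`, and `chooseSaslFlow` is a hypothetical name rather than kafka-node's internals. The API keys themselves (17 for SaslHandshake, 36 for SaslAuthenticate) come from the Kafka protocol.

```javascript
// Kafka protocol API keys.
const SASL_HANDSHAKE = 17;
const SASL_AUTHENTICATE = 36;

// Hypothetical helper: decide how to authenticate against a broker, given
// its ApiVersions response decoded as { [apiKey]: { min, max } }.
function chooseSaslFlow(apiVersions) {
  const handshake = apiVersions[SASL_HANDSHAKE];
  if (!handshake) {
    // No SaslHandshake advertised. ApiVersions itself only exists from 0.10,
    // so for older brokers the caller would pass an empty map.
    return { supported: false };
  }
  if (handshake.max >= 1 && apiVersions[SASL_AUTHENTICATE]) {
    // 1.0.0+: SaslHandshake v1, then tokens wrapped in SaslAuthenticate
    // requests that carry normal headers and correlation ids.
    return { supported: true, handshakeVersion: 1, rawTokens: false };
  }
  // 0.10/0.11: SaslHandshake v0, then raw length-prefixed SASL tokens
  // with no Kafka request/response header.
  return { supported: true, handshakeVersion: 0, rawTokens: true };
}
```

A test suite could use the same check to skip the SASL cases on brokers where `supported` is false.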

thomaslee force-pushed the tom_sasl_plain branch 3 times, most recently from 4eb6b09 to b42cfbe on April 24, 2018 01:43
@thomaslee
Contributor Author

@hyperlink phew -- tests finally went green, then a build failed after a no-op change modifying commit history. 😢 I'm assuming that's an intermittent error.

Okay, so the nasty details:

As I mentioned: SaslHandshake v1 (used in 1.0.0+) is pretty straightforward: no surprises, easy to implement.

SaslHandshake v0 (used in 0.10/0.11) on the other hand gets a bit messy because the step where we send credentials doesn't actually use normal Kafka request/response headers. As a result, we can't rely on correlation IDs for SASL authentication responses -- so we really don't want non-SASL responses coming back while we're in the middle of a SASL handshake. See the override of KafkaClient.handleReceivedData for the nastiest part of this change.
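To make the v0 framing concrete, here is a sketch of the credentials packet for SASL/PLAIN. The token layout (`authzid NUL authcid NUL passwd`) is defined by RFC 4616; the bare 4-byte length prefix reflects the v0 behaviour described above. The helper names are hypothetical and this is not the PR's actual encoder:

```javascript
// SASL/PLAIN initial response (RFC 4616): [authzid] NUL authcid NUL passwd.
// An empty authzid means "authorize as the authenticating user".
function plainToken(username, password, authzid = '') {
  return Buffer.concat([
    Buffer.from(authzid, 'utf8'),
    Buffer.from([0]),
    Buffer.from(username, 'utf8'),
    Buffer.from([0]),
    Buffer.from(password, 'utf8')
  ]);
}

// With SaslHandshake v0 the token goes on the wire as a bare 4-byte
// big-endian length followed by the bytes: no api key, api version,
// correlation id or client id. That missing correlation id is why the
// reply cannot be matched to a pending request in the usual way.
function frameRawSaslToken(token) {
  const header = Buffer.alloc(4);
  header.writeInt32BE(token.length, 0);
  return Buffer.concat([header, token]);
}

const frame = frameRawSaslToken(plainToken('myuser', 'mypassword'));
console.log(frame.toString('hex'));
```

Under SaslHandshake v1 the same token bytes would instead travel inside a SaslAuthenticate request, with the normal header.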

Anyway, because of this we now wait for the API versions response to come back before proceeding with the SASL exchange + metadata load and (ultimately) allowing the client to become "ready". I think this is probably better behavior anyway: pretty sure before this change we'd fire off the API versions request in parallel with the initial metadata load, which works but probably isn't what was originally intended. (This explains some strange log messages I saw a while back too.)
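The ordering change can be pictured as a strictly sequential initialization. The PR itself uses callbacks (`async.waterfall`); this async/await sketch uses hypothetical step functions as stand-ins for the real request/response plumbing and shows only the ordering: ApiVersions, then SASL (if configured), then metadata.

```javascript
// Hypothetical initialization sequence for one broker connection. Each step
// is awaited before the next starts, so no other request is in flight
// while the SASL exchange runs.
async function initializeBroker(broker, steps) {
  broker.apiSupport = await steps.requestApiVersions(broker);
  if (steps.needsAuthentication(broker)) {
    await steps.authenticate(broker, broker.apiSupport);
    broker.authenticated = true;
  }
  await steps.loadMetadata(broker);
  broker.ready = true;
}

// Usage with stubbed steps that record the order in which they ran.
const calls = [];
const stubSteps = {
  requestApiVersions: async () => { calls.push('apiVersions'); return { 17: { min: 0, max: 1 } }; },
  needsAuthentication: () => true,
  authenticate: async () => { calls.push('sasl'); },
  loadMetadata: async () => { calls.push('metadata'); }
};
const broker = {};
initializeBroker(broker, stubSteps).then(() => console.log(calls.join(' -> ')));
```

Running the API versions request and the metadata load in parallel, as before, would correspond to starting `requestApiVersions` and `loadMetadata` without awaiting the first.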

Lastly, the broker seems to just close the connection if authentication fails when using SaslHandshake v0 -- so this change assumes that authentication has failed if you've sent a SASL authentication packet & the connection suddenly closes before you get a reply.
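A sketch of the "close means failure" rule, assuming a socket-level `saslAuthCorrelationId` marker like the one that appears in the review below, plus the `options.sasl` check suggested there. `errorForClose` is hypothetical, and `SaslAuthenticationError` here is a local stand-in for the PR's `errors.SaslAuthenticationError`:

```javascript
// Local stand-in for the PR's errors.SaslAuthenticationError.
class SaslAuthenticationError extends Error {
  constructor(message) {
    super(message);
    this.name = 'SaslAuthenticationError';
  }
}

// Hypothetical close-handler logic: with SaslHandshake v0 the broker signals
// a failed login by closing the connection, so a close that arrives while an
// authentication packet is still awaiting its reply is treated as an auth
// failure rather than an ordinary disconnect.
function errorForClose(options, socket) {
  if (options.sasl && socket.saslAuthCorrelationId !== undefined) {
    return new SaslAuthenticationError(
      `broker ${socket.addr} closed the connection during SASL authentication`
    );
  }
  return null; // ordinary disconnect: normal reconnect logic applies
}

// Usage (the contents of options.sasl are illustrative only).
const options = { sasl: {} };
const closeError = errorForClose(options, { addr: 'localhost:9192', saslAuthCorrelationId: 1 });
console.log(closeError && closeError.name);
```

The trade-off is that a genuine network drop during authentication is also reported as an auth failure.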

If the backward compatibility stuff is too nasty, I wouldn't shed a tear if we only supported SASL for 1.0.0+. Certainly an easier, smaller patch -- but I'm sure there would be folks out there who might want to use SASL with 0.10 or 0.11. Let me know either way.

I think that's everything. Seem reasonable? And should I set up Travis builders for 1.0.0 and 1.1.0? That will exercise the SaslHandshake v1 code path, which is not yet exercised by the PR builds.

hyperlink self-requested a review on April 24, 2018 14:33
@hyperlink
Collaborator

😃 good work @thomaslee !

I will look at the PR in more detail and give some feedback soon.

Regarding the intermittent failures, just ignore them for now. I've been trying to improve the stability of the tests, but the failures are hard to track down because they pass locally. I don't know why the ConsumerGroupStream tests fail; I have a branch where those tests pass every time the suite runs, so it appears the failure is caused by some side effect of other tests.

Please set up Travis builds for Kafka version 1 if you can, and thanks for your contribution!

@thomaslee
Contributor Author

Cheers! Just added 1.0 and 1.1 to the Travis matrix. Something's up with the start-kafka.sh script when using KAFKA_OPTS in the 1.x images (something to do with sed). Had to hack around that for now; holler out if you see a better solution.

@thomaslee
Contributor Author

Testing today revealed issues in ConsumerGroup with this change in place (using the SaslHandshake v1 code path): https://gist.githubusercontent.com/thomaslee/dfd4479a439c583e4dde490736c0d335/raw/5787ade7c1abcbfc8f06423ed935ddb4a429afb3/gistfile1.txt

The error on the broker side seems to be:

org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected Kafka request of type JOIN_GROUP during SASL handshake.

Trying to figure out what's going on, but appreciate any insight.

@thomaslee
Contributor Author

Okay, it seems like Client.sendGroupRequest doesn't have any smarts wrt waiting for the usual broker connection initialization path, so it's possible (even likely) for consumer group requests to sneak in while we're still trying to authenticate a connection if our bootstrap server list differs from the advertised server list. This can occur irrespective of whether we're using SaslHandshake v0 or v1.

Seems like a proper fix might involve modifying brokerForLeader (& related methods like setupBroker etc.) to take a callback which gets invoked with a fully authenticated BrokerWrapper. Then we could ensure that any callers to brokerForLeader get a fully initialized, authenticated connection before they try to send anything.

That's a little more than I'd like to bite off here, so trying to figure out a work around by modifying waitUntilReady to take the authenticated state of a connection into account & working that into sendGroupRequest. Easy change, seems to work fine at a glance -- just ironing out a few quirks.
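A minimal sketch of a waitUntilReady that respects authentication state. `Broker`, `markProgress` and the `'ready'` event are hypothetical; only the `isReady` condition mirrors the change shown in the review below. A caller such as sendGroupRequest would wait on this before writing:

```javascript
const { EventEmitter } = require('events');

// Hypothetical broker wrapper: "ready" also requires authentication when
// SASL is configured.
class Broker extends EventEmitter {
  constructor(needAuthentication) {
    super();
    this.apiSupport = null;
    this.needAuthentication = needAuthentication;
    this.authenticated = false;
  }
  isReady() {
    return this.apiSupport != null && (!this.needAuthentication || this.authenticated);
  }
  // Called after each initialization step; announces readiness once reached.
  markProgress() {
    if (this.isReady()) this.emit('ready');
  }
}

// Hypothetical waitUntilReady: call back at once if the broker is ready,
// otherwise wait for its 'ready' event or give up after timeoutMs.
function waitUntilReady(broker, timeoutMs, callback) {
  if (broker.isReady()) {
    callback(null);
    return;
  }
  const onReady = () => {
    clearTimeout(timer);
    callback(null);
  };
  const timer = setTimeout(() => {
    broker.removeListener('ready', onReady);
    callback(new Error(`broker not ready after ${timeoutMs}ms`));
  }, timeoutMs);
  broker.once('ready', onReady);
}

// Usage: ApiVersions has answered, but SASL is still in progress.
const broker = new Broker(true);
broker.apiSupport = {};
waitUntilReady(broker, 5000, err => console.log(err ? err.message : 'ready'));
broker.authenticated = true;
broker.markProgress(); // prints "ready"
```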

@thomaslee
Contributor Author

Alright, 11a66df introduced the new waitUntilReady logic that works around the issue where group requests were sneaking in before authentication.

@thomaslee
Contributor Author

Ugh, there are still more issues with this when we get disconnected from a broker. I think I need to go back to the drawing board on some of this & lean more heavily on waitUntilReady in general instead of trying to get the chain of callbacks just right.

@hyperlink
Collaborator

hyperlink commented Apr 25, 2018

@thomaslee probably outside the scope of this PR but I am thinking we should start to abstract the nitty gritty of broker initialization/authentication into the BrokerWrapper. The client makes requests normally as soon as it's instantiated and the BrokerWrapper will queue them up and process them once it's really ready. What do you think?

@thomaslee
Contributor Author

@hyperlink yeah, I think I could see that working: so basically a call to write would just queue up until isReady() === true? I'm doing something similar in an as-yet unpushed set of changes: all writes to a broker go via a sendWhenReady method in KafkaClient/Client. This model is looking significantly simpler / more reliable than trying to get those callbacks just right.
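The queue-until-ready model can be sketched in a few lines. `QueueingBroker`, its `sendWhenReady` semantics and `markReady` are illustrative assumptions, not the code that was eventually pushed. Note that the ApiVersions and SASL requests themselves have to bypass such a queue, or initialization could never complete:

```javascript
// Hypothetical sketch: writes issued before the connection is initialized
// (and authenticated, if required) are buffered and flushed in order once
// the wrapper is marked ready. Initialization traffic must call write()
// directly instead of going through sendWhenReady().
class QueueingBroker {
  constructor(write) {
    this.write = write; // function that actually puts bytes on the socket
    this.ready = false;
    this.pending = [];
  }
  sendWhenReady(payload) {
    if (this.ready) {
      this.write(payload);
    } else {
      this.pending.push(payload);
    }
  }
  markReady() {
    this.ready = true;
    const queued = this.pending;
    this.pending = [];
    queued.forEach(payload => this.write(payload));
  }
}

const written = [];
const broker = new QueueingBroker(p => written.push(p));
broker.sendWhenReady('JoinGroup'); // queued: still authenticating
broker.markReady();
broker.sendWhenReady('Heartbeat'); // written immediately
console.log(written); // [ 'JoinGroup', 'Heartbeat' ]
```

Moving this into BrokerWrapper, as proposed above, would let callers write as soon as the client exists without each call site knowing about authentication.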

I think I like your proposal better, but given all the thrash on this PR I'm tempted to leave it to a subsequent PR if you can forgive the warts for now. 😃 Will push the new set of changes shortly.

thomaslee force-pushed the tom_sasl_plain branch 2 times, most recently from e53b393 to 318306c on April 25, 2018 22:18
@thomaslee
Contributor Author

@hyperlink alrighty, looks like tests are green again using the new approach that makes more liberal use of waitUntilReady. As expected, it also seems far more stable wrt existing consumer & producer logic.

Once you're happy with this & we land it, I'll follow up with a PR to rework the sendWhenReady logic for BrokerWrapper, as you hinted earlier -- unless you beat me to it, of course. 😄

@thomaslee
Contributor Author

@hyperlink I'll resolve that conflict this evening, but have you had a chance to take a look at this? Wondering how far away we realistically are from landing this (obviously non-trivial and possibly risky) change. We've got it monkey-patched into 2.6.0 internally, but as you can imagine it ain't pretty. 😄

@hyperlink
Collaborator

@thomaslee reviewing this PR is next on my list.

thomaslee force-pushed the tom_sasl_plain branch 2 times, most recently from e6e1bbc to 82ac886 on May 5, 2018 00:36
@thomaslee
Contributor Author

Rebased & fixed up that conflict 😄

@hyperlink
Collaborator

hyperlink left a comment

Do you mind adding documentation for options.sasl to the README?

lib/client.js Outdated
var handlers = this.unqueueCallback(socket, correlationId);

if (handlers) {
var decoder = handlers[0];
Collaborator

We're dropping Node 4, so feel free to destructure to your heart's content.

Contributor Author

😂
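For reference, the destructuring being suggested looks like this. It assumes `unqueueCallback` returns a `[decoder, callback]` pair, which is inferred from the `handlers[0]` access above; `handleResponse` is hypothetical. Array destructuring is not enabled by default in Node 4, hence the remark about dropping it:

```javascript
// Hypothetical response dispatch: assumes handlers is a [decoder, callback]
// pair (or undefined when no request is waiting on this correlation id).
function handleResponse(handlers, payload) {
  if (!handlers) return;
  // Destructuring replaces index lookups such as handlers[0] / handlers[1].
  const [decoder, callback] = handlers;
  callback(null, decoder(payload));
}

handleResponse([buf => buf.length, (err, n) => console.log(err, n)], 'abc');
```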

-logger.error('error initialize broker after connect', error);
+logger.error('error initializing broker after connect', error);
if (error instanceof errors.SaslAuthenticationError) {
self.emit('auth_error', error);
Collaborator

Why not just emit 'error'?

Contributor Author

Great question. I think I was concerned about raising an error here when we don't do so for other types of error (and presumably raising other types of error here would have knock-on implications for tests, client code, etc.).

To put it another way, I wanted to bubble up this error for test purposes but wasn't convinced this PR warranted changing error semantics & potentially breaking stuff, if that makes any sense at all. Doing so would make a potentially large/risky change even larger and riskier. 😅

Collaborator

I'm leaning towards emitting an error. I would expect consumers/producers would want to be notified of this failure since a failure in authentication means messages will not be received or delivered.
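Part of the trade-off in this thread comes from how Node's `EventEmitter` treats the `'error'` event: emitting it with no listener attached throws, whereas an unlistened custom event such as `'auth_error'` is silently dropped. A small sketch (`reportAuthFailure` is hypothetical):

```javascript
const { EventEmitter } = require('events');

// Hypothetical reporting helper: either a dedicated event name or 'error'.
function reportAuthFailure(client, err, useErrorEvent) {
  client.emit(useErrorEvent ? 'error' : 'auth_error', err);
}

const client = new EventEmitter();
const failure = new Error('SASL authentication failed');

// No 'auth_error' listener: the event is simply dropped.
reportAuthFailure(client, failure, false);

// No 'error' listener: EventEmitter#emit throws the error.
try {
  reportAuthFailure(client, failure, true);
} catch (e) {
  console.log('thrown:', e.message);
}

// With a listener attached, the failure is delivered instead of thrown.
client.on('error', e => console.log('handled:', e.message));
reportAuthFailure(client, failure, true);
```

This is why switching to `'error'` makes authentication failures impossible to miss, but can also crash applications that never registered an error listener.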

@@ -41,7 +43,7 @@ BrokerWrapper.prototype.isConnected = function () {
};

BrokerWrapper.prototype.isReady = function () {
-return this.apiSupport != null;
+return this.apiSupport != null && (!this.needAuthentication || this.authenticated);
Collaborator

could you add authenticated to the toString override?

Contributor Author

Will do!

async.waterfall(
[
callback => {
logger.debug(`Sending SASL/${mechanism} handshake request to ${broker.socket.addr}`);
Collaborator

can use ${broker} since the overridden toString should print the socket.addr along with other useful information.

Contributor Author

Ah of course 👍
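A sketch of the two suggestions together: a `toString` override that includes the authentication state, which a template literal picks up automatically when interpolating `${broker}`. The class and its output format are hypothetical stand-ins, not kafka-node's actual BrokerWrapper:

```javascript
// Hypothetical stand-in for BrokerWrapper, showing only what the logs need.
class BrokerWrapper {
  constructor(addr, needAuthentication) {
    this.socket = { addr };
    this.needAuthentication = needAuthentication;
    this.authenticated = false;
  }
  toString() {
    return `[BrokerWrapper ${this.socket.addr} ` +
      `(needAuthentication: ${this.needAuthentication}) ` +
      `(authenticated: ${this.authenticated})]`;
  }
}

// Template literals convert objects with toString(), so log lines can
// interpolate the wrapper directly instead of reaching into socket.addr.
const broker = new BrokerWrapper('localhost:9192', true);
console.log(`Sending SASL/PLAIN handshake request to ${broker}`);
```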

);
let error = this.error;
if (!error) {
if (socket.saslAuthCorrelationId !== undefined) {
Collaborator

should we also check this.options.sasl?

Contributor Author

Can't hurt!

@thomaslee
Contributor Author

@hyperlink finally got around to applying your review feedback + rebased. Let me know if anything else needs work. FWIW we've been running the previous version of this code in production environments for a month or two without incident.

@prasadkashyap

Thanks for this PR @thomaslee.

@hyperlink when do you plan on merging this please?

@lordgraysith

@hyperlink we played with this at work. It seems functional and we are anxious to use it. We'd love to see it merged as well.
@thomaslee nice work!

@sureshappana

@hyperlink We'd love to use this at work!!

Could you please give some timeline for a release after the merge?

@thomaslee
Contributor Author

Appreciate the votes of confidence folks & nobody's keener to see this land than I am -- but some patience is warranted here. It's not a totally trivial change, the project lead has a day job and I was very slow in following up on his PR feedback (because I too have a day job 😉).

That said, appreciate any feedback in terms of testing/code review/etc. in the interim (looking at you @lordgraysith -- thanks for giving it a shot!). If we can get more confidence that this stuff is solid & ready to go I'm sure it'll make Xiaoxin's job easier when he does get a chance to take a look. Cheers!

hyperlink merged commit ee5e0c6 into SOHU-Co:master on Aug 8, 2018
@lordgraysith

Thank you so much @hyperlink and @thomaslee! We're excited to see this published out on npm.

@hyperlink
Collaborator

Published as 3.0.0

@gshakya30

@thomaslee, is there any plan to provide support for SASL/GSSAPI?

Labels
None yet
Projects
None yet

6 participants