[SPARK-18061][SQL][Security] Spark Thriftserver needs to create SPNego principal #15594
@steveloughran @yhuai Will you be able to help review this PR?
I'm not a Spark committer, so I can't review it well enough to get it in; I was just watching it out of concern for the word "kerberos". How about asking on the Spark developer list for volunteers to review? Looking at the code, I'd drop using reflection for logging. You can just create an SLF4J log with the name "org.apache.hive.hiveserver2" (or whatever the full path is) and end up with the same log name as the parent class. As for setting the Hive option, I do think it's an ugly pain which should really be addressed by making Hive extensible; until then, what choice do you have?
Thanks @steveloughran. I will check on the developer list.
} else {
try {
httpUGI = HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI(hiveConf)
setSuperField(this, "httpUGI", httpUGI)
We have similar code in CLIService.java that does not set the httpUGI field. Do we need to make the behavior the same in both files?
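For readers unfamiliar with the reflection helpers in this diff (`setSuperField` here, `getAncestorField` further down), the sketch below illustrates the general idea, assuming these helpers walk up the superclass chain and read or write a private field by name. It is not Spark's actual source, and `HiveCliService` / `SparkCliService` are hypothetical stand-ins for the Hive parent class and the Spark subclass.

```scala
// Hypothetical stand-in for the Hive parent class.
class HiveCliService {
  // Private field the subclass cannot assign directly.
  // private[this] compiles to a plain JVM field named "httpUGI".
  private[this] var httpUGI: String = null
  def currentHttpUGI: String = httpUGI
}

// Hypothetical stand-in for the Spark subclass.
class SparkCliService extends HiveCliService

object ReflectionSketch {
  // Walk `level` steps up the superclass chain, starting at obj's runtime class.
  private def ancestor(obj: AnyRef, level: Int): Class[_] =
    Iterator.iterate[Class[_]](obj.getClass)(_.getSuperclass).drop(level).next()

  // Write a private field declared `level` classes above obj's runtime class.
  def setAncestorField(obj: AnyRef, level: Int, name: String, value: AnyRef): Unit = {
    val field = ancestor(obj, level).getDeclaredField(name)
    field.setAccessible(true)
    field.set(obj, value)
  }

  // Read a private field declared `level` classes above obj's runtime class.
  def getAncestorField[T](obj: AnyRef, level: Int, name: String): T = {
    val field = ancestor(obj, level).getDeclaredField(name)
    field.setAccessible(true)
    field.get(obj).asInstanceOf[T]
  }

  // The "super field" case is level 1: the field lives in the direct parent.
  def setSuperField(obj: AnyRef, name: String, value: AnyRef): Unit =
    setAncestorField(obj, 1, name, value)
}
```

Because the field is looked up by a string, a rename in the parent class only surfaces at runtime as a `NoSuchFieldException`, which is the fragility the reviewers point out below.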
@vanzin I believe this might be your realm :) Could you please help review this?
Thanks Luciano. I am looking at the changes and will add them soon.
ok to test
I'm not really familiar with the Thrift server (Spark's or Hive's), so I can't really comment much. Maybe someone more familiar with this code can take a look (use git blame to find out?).
@@ -57,7 +59,24 @@ private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLC
case e @ (_: IOException | _: LoginException) =>
throw new ServiceException("Unable to login to kerberos with given principal/keytab", e)
}
}

// Also try creating a UGI object for the SPNego principal
Indentation here is wrong. Also, the following blocks are indented with 4 spaces instead of 2.
val principal = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL)
val keyTabFile = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB)
if (principal.isEmpty() || keyTabFile.isEmpty()) {
getAncestorField[Log](this, 3, "LOG").info(
This log message seems unnecessary.
Thanks @vanzin. The log message was added to replicate the behavior of the corresponding Hive Thriftserver code block. @steveloughran added the Kerberos code, and he left his comments above.
I don't know about the specific log policy here, but I do think using reflection to get at private members is dangerous. I know it happens a lot in this code, but at some point it'd be nice to move away from it. You don't need reflection to get the same log as the parent; just go
LoggerFactory.getLogger("org.apache.hive.thrift...whatever").info(s"something $key")
This is cleaner and won't break if Hive ever changes logging frameworks.
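A minimal sketch of the suggestion above, under stated assumptions: to stay self-contained it swaps in the JDK's `java.util.logging` instead of SLF4J (whose `LoggerFactory.getLogger(String)` plays the same role), and `HiveParent`, the logger name, and the message text are hypothetical. It shows that looking a logger up by name returns the same logger the parent registered, so no reflection on the parent's private `LOG` field is needed.

```scala
import java.util.logging.Logger

// Hypothetical parent that keeps its logger in a private field.
object HiveParent {
  // Hypothetical name standing in for the parent class's real logger name.
  val LoggerName = "org.example.hive.ParentService"
  private val LOG: Logger = Logger.getLogger(LoggerName)
  // Exposed only so this sketch can compare instances.
  def parentLogger: Logger = LOG
}

object SubclassLogging {
  // Look the logger up by name instead of reading HiveParent's private field.
  // Naming HiveParent.LoggerName also initializes HiveParent, so its logger
  // is already registered (and strongly referenced) before this lookup.
  def log: Logger = Logger.getLogger(HiveParent.LoggerName)

  // Hypothetical message, for illustration only.
  def reportSkippedSpnego(): Unit =
    log.info("SPNego principal or keytab not configured; skipping SPNego login")
}
```

The subclass depends only on a string name, not on the parent's field layout or logging framework internals.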
Test build #69034 has finished for PR 15594 at commit
Hi @cmirash, is this still active?
What changes were proposed in this pull request?
When running in HTTP mode with Kerberos enabled, Spark Thriftserver returns a 401 authentication error for beeline HTTP requests (with the end user authenticated as a Kerberos principal). The same command works against Hive Thriftserver.
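For reproducing the failing request, a beeline connection in HTTP transport mode uses a HiveServer2 JDBC URL with `transportMode=http`, an `httpPath`, and the server's Kerberos `principal`. The sketch below only assembles such a URL; the host, port, database, path, and principal are placeholders, not values from this PR, so check the parameters against the Hive JDBC driver version in use.

```scala
object BeelineUrl {
  // Assemble a HiveServer2 JDBC URL for HTTP transport with Kerberos.
  // All argument values passed in are caller-supplied placeholders.
  def httpKerberosUrl(
      host: String,
      port: Int,
      db: String,
      httpPath: String,
      serverPrincipal: String): String =
    s"jdbc:hive2://$host:$port/$db;transportMode=http;httpPath=$httpPath;principal=$serverPrincipal"
}
```

The resulting string would be passed to `beeline -u '<url>'` after obtaining a Kerberos ticket for the end user.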
We found that, when Kerberos is enabled, the Hive Thriftserver CLI service creates UGI objects for both the Hive service principal and the SPNego principal, whereas Spark Thriftserver only creates one for the Hive service principal.
CLIService.java
SparkSQLCLIService.scala
This patch adds the missing SPNego principal login to Spark Thriftserver.
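The SPNego part of the change can be modeled as a small decision: if both the SPNego principal and keytab are configured, log in with them; otherwise skip and log an informational message. The sketch below models that flow without Hadoop. `login`, `logInfo`, the `String` "UGI", and `ServiceException` are hypothetical stand-ins for `HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI`, the parent's logger, `UserGroupInformation`, and Hive's exception class; the two keys are the property names Hive uses for `HIVE_SERVER2_SPNEGO_PRINCIPAL` and `HIVE_SERVER2_SPNEGO_KEYTAB`. Wrapping a failed SPNego login in an exception reuses the diff's pattern for the service principal; that is a choice made for this sketch, not necessarily what the patch does.

```scala
import java.io.IOException
import javax.security.auth.login.LoginException

// Hypothetical stand-in for Hive's ServiceException.
class ServiceException(msg: String, cause: Throwable) extends RuntimeException(msg, cause)

object SpnegoLoginSketch {
  val PrincipalKey = "hive.server2.authentication.spnego.principal"
  val KeytabKey = "hive.server2.authentication.spnego.keytab"

  // Returns the SPNego "UGI" when one was created, or None when not configured.
  def maybeLoginSpnego(
      conf: Map[String, String],
      login: (String, String) => String,
      logInfo: String => Unit): Option[String] = {
    val principal = conf.getOrElse(PrincipalKey, "")
    val keytab = conf.getOrElse(KeytabKey, "")
    if (principal.isEmpty || keytab.isEmpty) {
      // Not configured: skip the SPNego login and just record why.
      logInfo(s"SPNego UGI not created, principal='$principal', keytab='$keytab'")
      None
    } else {
      try Some(login(principal, keytab))
      catch {
        // Same multi-type match as the diff: wrap either failure.
        case e @ (_: IOException | _: LoginException) =>
          throw new ServiceException("Unable to login with given SPNego principal/keytab", e)
      }
    }
  }
}
```

In the actual patch, the returned UGI is stored in the parent's `httpUGI` field via `setSuperField`, as in the diff excerpt earlier in this conversation.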
How was this patch tested?
Manually tested with a beeline command through Spark against a kerberized cluster.
Ran the Spark unit tests for the hive, sql, and catalyst modules.