Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18061][SQL][Security] Spark Thriftserver needs to create SPNego principal #15594

Closed
wants to merge 1 commit into from

Conversation

cmirash
Copy link

@cmirash cmirash commented Oct 22, 2016

What changes were proposed in this pull request?

Spark Thriftserver when running in HTTP mode with Kerberos enabled gives a 401 authentication error when receiving beeline HTTP request (with end user as kerberos principal). The similar command works with Hive Thriftserver.

What we find is Hive thriftserver CLI service creates both hive service and SPNego principal when kerberos is enabled whereas Spark Thriftserver only creates hive service principal.

CLIService.java

if (UserGroupInformation.isSecurityEnabled()) {
      try {
        HiveAuthFactory.loginFromKeytab(hiveConf);
        this.serviceUGI = Utils.getUGI();
      } catch (IOException e) {
        throw new ServiceException("Unable to login to kerberos with given principal/keytab", e);
      } catch (LoginException e) {
        throw new ServiceException("Unable to login to kerberos with given principal/keytab", e);
      }
  // Also try creating a UGI object for the SPNego principal
  String principal = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL);
  String keyTabFile = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB);
  if (principal.isEmpty() || keyTabFile.isEmpty()) {
    LOG.info("SPNego httpUGI not created, spNegoPrincipal: " + principal +
        ", ketabFile: " + keyTabFile);
  } else {
    try {
      this.httpUGI = HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI(hiveConf);
      LOG.info("SPNego httpUGI successfully created.");
    } catch (IOException e) {
      LOG.warn("SPNego httpUGI creation failed: ", e);
    }
  }
}

SparkSQLCLIService.scala

 if (UserGroupInformation.isSecurityEnabled) {
      try {
        HiveAuthFactory.loginFromKeytab(hiveConf)
        sparkServiceUGI = Utils.getUGI()
        setSuperField(this, "serviceUGI", sparkServiceUGI)
      } catch {
        case e @ (_: IOException | _: LoginException) =>
          throw new ServiceException("Unable to login to kerberos with given principal/keytab", e)
      }

The patch will add missing SPNego principal to Spark Thriftserver.

How was this patch tested?

Ran manual testing with beeline command through spark against kerberized cluster.
Ran Spark unit tests for hive, sql and catalyst.

@cmirash
Copy link
Author

cmirash commented Nov 1, 2016

@steveloughran @yhuai Will you be able to help review this PR?

@steveloughran
Copy link
Contributor

I'm not spark committer so can't review it well enough to get in; I was just watching it out of concern for the word "kerberos". How about you ask on the spark developer list for any volunteers to review.

Looking at the code, I'd drop using reflection for logging. You can just create an SLF4J log with the name "org.apache.hive.hiveserver2" (or whatever the full path is) and end up with the same log name as the parent class. As for the setting of the hive option, I do think it's an ugly pain which should really be addressed by making hive extensible —until then, what choice to do you have.

@cmirash
Copy link
Author

cmirash commented Nov 2, 2016

Thanks @steveloughran. I will check on the developer list.

} else {
try {
httpUGI = HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI(hiveConf)
setSuperField(this, "httpUGI", httpUGI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have similar code in CLIService.java which is not setting the httpUGI field, do we need to make the behavior the same in both files ?

@lresende
Copy link
Member

@vanzin I believe this might be your realm :) Could you please help review this.

@cmirash
Copy link
Author

cmirash commented Nov 18, 2016

Thanks Luciano. I am looking at the changes and will add them soon.

@vanzin
Copy link
Contributor

vanzin commented Nov 22, 2016

ok to test

Copy link
Contributor

@vanzin vanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with the thrift server (Spark's or Hive's) so can't really comment much. Maybe someone more familiar with this code can take a look (use git blame to find out?).

@@ -57,7 +59,24 @@ private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLC
case e @ (_: IOException | _: LoginException) =>
throw new ServiceException("Unable to login to kerberos with given principal/keytab", e)
}
}

// Also try creating a UGI object for the SPNego principal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation here is wrong. Also, following blocks have code indented with 4 spaces instead of 2, and wrong indentation.

val principal = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL)
val keyTabFile = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB)
if (principal.isEmpty() || keyTabFile.isEmpty()) {
getAncestorField[Log](this, 3, "LOG").info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log message seems unnecessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vanzin. The log message was added to replicate same behavior as the Hive Thriftserver code block. @steveloughran added the kerberos code and he added his comments above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about the specific log policy here; but I do think using reflection to get a private stuff is dangerous. I know it happens a lot in this code but at some point it'd be nice to move it. you don't need to use reflection to get the same log as a parent, just go

LoggerFactory.getLog("org.apache.hive.thrift...whatever").info(s"something $key")

This is cleaner and not going to break if hive ever change logging frameworks.

@SparkQA
Copy link

SparkQA commented Nov 22, 2016

Test build #69034 has finished for PR 15594 at commit 448f91f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Hi @cmirash, is it still active?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants