
[SPARK-20172][Core] Add file permission check when listing files in FsHistoryProvider #17495

Closed · wanted to merge 7 commits

Conversation

jerryshao (Contributor):

What changes were proposed in this pull request?

In the current Spark HistoryServer we expect an AccessControlException to be thrown while listing all the files, but unfortunately that never happens, because we don't actually check the access permission and no other call throws such an exception. Worse, the check is deferred until the files are read, which is unnecessary and quite verbose, since the exception is printed every 10 seconds when the files are checked.

With this fix we check the read permission while listing the files, which avoids unnecessary file reads later on and suppresses the verbose logging.
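The idea can be sketched as follows (a hedged sketch, not the exact patch; `listReadableLogs` and the directory handling are illustrative):

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.security.AccessControlException

// Filter out unreadable event logs at listing time, so the periodic
// check never tries (and fails, loudly) to read them later.
def listReadableLogs(fs: FileSystem, logDir: Path): Seq[FileStatus] = {
  fs.listStatus(logDir).toSeq.filter { entry =>
    try {
      // Throws AccessControlException if the current user cannot read it.
      fs.access(entry.getPath, FsAction.READ)
      true
    } catch {
      case _: AccessControlException => false // skip quietly at list time
    }
  }
}
```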

How was this patch tested?

Add unit test to verify.

Change-Id: I7c7014ed83dbba786d90ca530380e94c086b497b
@@ -320,14 +321,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
.filter { entry =>
try {
val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
fs.access(entry.getPath, FsAction.READ)
jerryshao (author):

This API is tagged with @LimitedPrivate({"HDFS", "Hive"}), but I think it should be fine to call it here.

Reviewer (Contributor):

So, this API actually calls fs.getFileStatus() which is a remote call and completely unnecessary in this context, since you already have the FileStatus.

You could instead directly do the check against the FsPermission object (same way the fs.access() does after it performs the remote call).
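A hedged sketch of that suggestion (the helper name `checkAccessPermission` is illustrative; the owner/group/other branching mirrors the POSIX-style check Hadoop performs):

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.security.UserGroupInformation

// Check READ against the FsPermission already carried by the FileStatus,
// avoiding the extra getFileStatus() round trip made by fs.access().
def checkAccessPermission(status: FileStatus, action: FsAction): Boolean = {
  val perm = status.getPermission
  val ugi = UserGroupInformation.getCurrentUser
  if (ugi.getShortUserName == status.getOwner) {
    // An owner match is decisive: only the owner bits are consulted.
    perm.getUserAction.implies(action)
  } else if (ugi.getGroupNames.contains(status.getGroup)) {
    perm.getGroupAction.implies(action)
  } else {
    perm.getOtherAction.implies(action)
  }
}
```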

jerryshao (author):

I see, let me change it. Thanks.

Change-Id: Ie2bd075561e481266dc8047e2af604f4d6c83810
@jerryshao jerryshao changed the title [SPARK-20172] Add file permission check when listing files in FsHistoryProvider [SPARK-20172][Core] Add file permission check when listing files in FsHistoryProvider Mar 31, 2017
@SparkQA

SparkQA commented Mar 31, 2017

Test build #75431 has finished for PR 17495 at commit 63142cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao (Contributor, author):

@tgravescs @vanzin, would you please help review this PR? Thanks a lot.

@jerryshao:

Ping @vanzin @tgravescs again. Sorry to bother you and really appreciate your time.

@SparkQA

SparkQA commented Apr 7, 2017

Test build #75594 has finished for PR 17495 at commit 63142cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor):

Sorry @jerryshao, I know you have a few up, but I'm swamped and probably won't get to them until next week.

@@ -571,6 +572,34 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
}
}

test("log without read permission should be filtered out before actual reading") {
class TestFsHistoryProvider extends FsHistoryProvider(createTestConf()) {
Reviewer (Contributor):

I think you'll need this from another test in this same file:

    // setReadable(...) does not work on Windows. Please refer JDK-6728842.
    assume(!Utils.isWindows)

In fact shouldn't this test be merged with the test for SPARK-3697?

jerryshao (author):

The difference is that this UT checks whether the file is filtered out during the listing check, while the SPARK-3697 UT only checks the final result, so the file could also have been filtered out during the read. Let me merge these two UTs.


Change-Id: I95fe3f765c8b66cc14bc888899f99dc8a0466e91
@SparkQA

SparkQA commented Apr 11, 2017

Test build #75684 has finished for PR 17495 at commit 1d1440b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -320,14 +321,35 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
.filter { entry =>
try {
val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)

def canAccess = {
Reviewer (Contributor):

Return type? Also I'd make this a top-level method (with the entry and action as parameters), maybe even in SparkHadoopUtil... just to avoid the deeply-nested method declaration. That allows you to easily write a unit test for it (yay!).

val perm = entry.getPermission
val ugi = UserGroupInformation.getCurrentUser
val user = ugi.getShortUserName
val groups = ugi.getGroupNames
Reviewer (Contributor):

nit: this is only used in one place, just inline the call.

} else if (perm.getOtherAction.implies(FsAction.READ)) {
true
} else {
throw new AccessControlException(s"Permission denied: user=$user, " +
Reviewer (Contributor):

Why throw an exception instead of returning false? That makes the function's interface weird.

You could just log the access issue here if you want, and then you can even remove the try..catch.

val ugi = UserGroupInformation.getCurrentUser
val user = ugi.getShortUserName
val groups = ugi.getGroupNames
if (user == entry.getOwner && perm.getUserAction.implies(FsAction.READ)) {
Reviewer (Contributor):

This code is not correct. The code in Hadoop's FileSystem.checkAccessPermissions is slightly different and actually correct.

For example, if you have a file with permissions 066 and your user matches the owner, you do not have permission to read the file, even if you belong to a group that has permissions to read it. Your code allows that user to read that file.

e.g.

$ sudo ls -la /tmp/test
total 132
d---rwxrwx  2 vanzin vanzin   4096 Abr 12 10:30 .
drwxrwxrwt 78 root   root   126976 Abr 12 10:30 ..
$ ls /tmp/test
ls: cannot open directory '/tmp/test': Permission denied

There's also an issue with superusers (they should always have permissions), but then the Hadoop library also has that problem, so maybe we can ignore that one.
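The mode-066 case above can be shown directly with Hadoop's FsPermission (a small illustration, assuming hadoop-common is on the classpath):

```scala
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

// Mode 066: owner "---", group "rw-", other "rw-".
val perm = new FsPermission(Integer.parseInt("066", 8).toShort)

// Owner branch: READ is denied, and that must short-circuit the check...
assert(!perm.getUserAction.implies(FsAction.READ))
// ...even though the group and other bits would allow it.
assert(perm.getGroupAction.implies(FsAction.READ))
assert(perm.getOtherAction.implies(FsAction.READ))
```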

jerryshao (author):

Sorry about it. Let me fix it.

Change-Id: I38a5491e555496004204bf99634f7147dac6c642
@SparkQA

SparkQA commented Apr 13, 2017

Test build #75755 has finished for PR 17495 at commit 9447011.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao:

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 13, 2017

Test build #75758 has started for PR 17495 at commit 9447011.

@jerryshao:

Jenkins, test this please.

@SparkQA

SparkQA commented Apr 13, 2017

Test build #75774 has finished for PR 17495 at commit 9447011.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val path = new Path(logFile2.toURI)
val fs = path.getFileSystem(SparkHadoopUtil.get.conf)
val status = fs.getFileStatus(path)
SparkHadoopUtil.get.checkAccessPermission(status, FsAction.READ) should be (true)
Reviewer (Contributor):

Why are these tests here? There's no SparkHadoopUtilSuite currently but it makes a lot more sense to create one, since that's where the code lives.

You also don't need to reference real files here; you can create custom FileStatus objects to test the behavior you want (the constructors are public and provide all the parameters you need).
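For instance, a synthetic FileStatus can be built like this (a sketch; the owner, group, and path values are made up, with arguments ordered per Hadoop's public FileStatus constructor):

```scala
import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

// Constructor arguments: (length, isDir, replication, blockSize,
//   mtime, atime, permission, owner, group, path)
val readableByOwner = new FileStatus(
  0L, false, 0, 0L, 0L, 0L,
  new FsPermission(FsAction.READ, FsAction.NONE, FsAction.NONE),
  "testuser", "testgroup", new Path("/fake/log"))

// No real filesystem involved; the permission bits drive the test.
assert(readableByOwner.getPermission.getUserAction.implies(FsAction.READ))
```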

jerryshao (author):

Thanks @vanzin , I will update the code.

Change-Id: Ib9ac4be0a896531a529c260197431df1c3adf77a
@SparkQA

SparkQA commented Apr 18, 2017

Test build #75884 has started for PR 17495 at commit 4d3838a.

@jerryshao:

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75890 has finished for PR 17495 at commit 4d3838a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkHadoopUtilSuite extends SparkFunSuite with Matchers

@jerryshao:

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 18, 2017

Test build #75897 has finished for PR 17495 at commit 4d3838a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkHadoopUtilSuite extends SparkFunSuite with Matchers

class SparkHadoopUtilSuite extends SparkFunSuite with Matchers {
test("check file permission") {
import FsAction._
val user = UserGroupInformation.getCurrentUser.getShortUserName
Reviewer (Contributor):

Instead of using getCurrentUser, I think it would be safer to use createUserForTesting + ugi.doAs. That lets you define the user name and the groups and thus avoids possible clashes with the OS configuration.
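A hedged sketch of that pattern (the "testuser"/"testgroup" names are arbitrary):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Pin down the user name and groups instead of inheriting the OS account.
val ugi = UserGroupInformation.createUserForTesting(
  "testuser", Array("testgroup"))

// Run the code under test as that synthetic user.
val name = ugi.doAs(new PrivilegedExceptionAction[String] {
  override def run(): String =
    UserGroupInformation.getCurrentUser.getShortUserName
})
assert(name == "testuser")
```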

jerryshao (author):

Yeah, that would be better, thanks for the suggestion.

Change-Id: Iafc17472e44a511402a3faa5e1889fa445b3c386
@SparkQA

SparkQA commented Apr 19, 2017

Test build #75924 has finished for PR 17495 at commit ab5117b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val groups = UserGroupInformation.getCurrentUser.getGroupNames
require(!groups.isEmpty)

val testUser = user + "-" + Random.nextInt(100)
Reviewer (Contributor):

You don't need to get the current user just for this... just create a random user name.

Change-Id: Ic9851b569da3458895e7c7ea8a5474941c466585
@SparkQA

SparkQA commented Apr 20, 2017

Test build #75965 has finished for PR 17495 at commit b36fa75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor):

vanzin commented Apr 20, 2017

LGTM, merging to master / 2.2.

asfgit pushed a commit that referenced this pull request Apr 20, 2017
[SPARK-20172][Core] Add file permission check when listing files in FsHistoryProvider

Author: jerryshao <sshao@hortonworks.com>

Closes #17495 from jerryshao/SPARK-20172.

(cherry picked from commit 592f5c8)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@asfgit asfgit closed this in 592f5c8 Apr 20, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
[SPARK-20172][Core] Add file permission check when listing files in FsHistoryProvider

Author: jerryshao <sshao@hortonworks.com>

Closes apache#17495 from jerryshao/SPARK-20172.
4 participants