[SPARK-20172][Core] Add file permission check when listing files in FsHistoryProvider #17495
Conversation
Change-Id: I7c7014ed83dbba786d90ca530380e94c086b497b
@@ -320,14 +321,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
      .filter { entry =>
        try {
          val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
          fs.access(entry.getPath, FsAction.READ)
This API is tagged with `@LimitedPrivate({"HDFS", "Hive"})`, but I think it should be fine to call it here.
So, this API actually calls `fs.getFileStatus()`, which is a remote call and completely unnecessary in this context, since you already have the `FileStatus`. You could instead do the check directly against the `FsPermission` object (the same way `fs.access()` does after it performs the remote call).
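For illustration, a minimal sketch of checking the already-fetched `FileStatus` locally (hand-written here, not the Hadoop source; a complete check must also select the user/group/other bits by caller identity, as discussed further down):

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.fs.permission.FsAction

// The FileStatus returned by fs.listStatus() already carries the
// permission bits, so they can be inspected with no extra RPC,
// unlike fs.access(), which calls getFileStatus() again remotely.
def othersCanRead(entry: FileStatus): Boolean =
  entry.getPermission.getOtherAction.implies(FsAction.READ)
```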
I see, let me change it. Thanks.
Change-Id: Ie2bd075561e481266dc8047e2af604f4d6c83810
Test build #75431 has finished for PR 17495 at commit
@tgravescs @vanzin, would you please help review this PR? Thanks a lot.
Ping @vanzin @tgravescs again. Sorry to bother you, and I really appreciate your time.
Test build #75594 has finished for PR 17495 at commit
Sorry @jerryshao, I know you have a few up, but I'm swamped and probably won't get to them until next week.
@@ -571,6 +572,34 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
    }
  }

  test("log without read permission should be filtered out before actual reading") {
    class TestFsHistoryProvider extends FsHistoryProvider(createTestConf()) {
I think you'll need this from another test in this same file:
// setReadable(...) does not work on Windows. Please refer JDK-6728842.
assume(!Utils.isWindows)
In fact shouldn't this test be merged with the test for SPARK-3697?
The difference is that this UT checks whether the file is filtered out during the permission check, while the SPARK-3697 UT only checks the final result, so the file could instead be filtered out during read. Let me merge these two UTs.
Change-Id: I95fe3f765c8b66cc14bc888899f99dc8a0466e91
Test build #75684 has finished for PR 17495 at commit
@@ -320,14 +321,35 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
      .filter { entry =>
        try {
          val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)

          def canAccess = {
Return type? Also I'd make this a top-level method (with the entry and action as parameters), maybe even in `SparkHadoopUtil`, just to avoid the deeply-nested method declaration. That allows you to easily write a unit test for it (yay!).
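A sketch of the suggested shape (hypothetical placement in `SparkHadoopUtil`; the owner short-circuit follows the semantics discussed below):

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.fs.permission.FsAction
import org.apache.hadoop.security.UserGroupInformation

// Explicit Boolean return type; the FileStatus and FsAction come in as
// parameters, so the method can be unit-tested in isolation.
def checkAccessPermission(status: FileStatus, mode: FsAction): Boolean = {
  val perm = status.getPermission
  val ugi = UserGroupInformation.getCurrentUser
  if (ugi.getShortUserName == status.getOwner) {
    // An owner match is decisive: only the user bits apply.
    perm.getUserAction.implies(mode)
  } else if (ugi.getGroupNames.contains(status.getGroup)) {
    perm.getGroupAction.implies(mode)
  } else {
    perm.getOtherAction.implies(mode)
  }
}
```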
            val perm = entry.getPermission
            val ugi = UserGroupInformation.getCurrentUser
            val user = ugi.getShortUserName
            val groups = ugi.getGroupNames
nit: this is only used in one place, just inline the call.
            } else if (perm.getOtherAction.implies(FsAction.READ)) {
              true
            } else {
              throw new AccessControlException(s"Permission denied: user=$user, " +
Why throw an exception instead of returning false? That makes the function's interface weird. You could just log the access issue here if you want, and then you can even remove the `try..catch`.
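A sketch of the Boolean-returning shape (names hypothetical): the caller logs and filters instead of catching an exception:

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.fs.permission.FsAction

// With a Boolean result, the listing code simply drops unreadable
// entries; no try..catch is needed on the permission path.
def readableEntries(
    entries: Seq[FileStatus],
    canAccess: (FileStatus, FsAction) => Boolean): Seq[FileStatus] =
  entries.filter { entry =>
    val ok = canAccess(entry, FsAction.READ)
    // println stands in for logWarning here.
    if (!ok) println(s"No read permission for ${entry.getPath}, skipping")
    ok
  }
```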
            val ugi = UserGroupInformation.getCurrentUser
            val user = ugi.getShortUserName
            val groups = ugi.getGroupNames
            if (user == entry.getOwner && perm.getUserAction.implies(FsAction.READ)) {
This code is not correct. The code in Hadoop's `FileSystem.checkAccessPermissions` is slightly different and actually correct. For example, if you have a file with permissions 066 and your user matches the owner, you do not have permission to read the file, even if you belong to a group that has permission to read it. Your code allows that user to read that file.
e.g.
$ sudo ls -la /tmp/test
total 132
d---rwxrwx 2 vanzin vanzin 4096 Abr 12 10:30 .
drwxrwxrwt 78 root root 126976 Abr 12 10:30 ..
$ ls /tmp/test
ls: cannot open directory '/tmp/test': Permission denied
There's also an issue with superusers (they should always have permissions), but then the Hadoop library also has that problem, so maybe we can ignore that one.
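The 066 case spelled out against `FsPermission` (a sketch using Hadoop's public constructor):

```scala
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

// Mode 066: the owner has no access; group and other can read and write.
val perm = new FsPermission(FsAction.NONE, FsAction.READ_WRITE, FsAction.READ_WRITE)

perm.getUserAction.implies(FsAction.READ)   // false: the owner is denied
perm.getGroupAction.implies(FsAction.READ)  // true, but it must never be
                                            // consulted once the owner matches
```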
Sorry about it. Let me fix it.
Change-Id: I38a5491e555496004204bf99634f7147dac6c642
Test build #75755 has finished for PR 17495 at commit
Jenkins, retest this please.
Test build #75758 has started for PR 17495 at commit
Jenkins, test this please.
Test build #75774 has finished for PR 17495 at commit
    val path = new Path(logFile2.toURI)
    val fs = path.getFileSystem(SparkHadoopUtil.get.conf)
    val status = fs.getFileStatus(path)
    SparkHadoopUtil.get.checkAccessPermission(status, FsAction.READ) should be (true)
Why are these tests here? There's no `SparkHadoopUtilSuite` currently, but it makes a lot more sense to create one, since that's where the code lives. You also don't need to reference real files here; you can create custom `FileStatus` objects to test the behavior you want (the constructors are public and provide all the parameters you need).
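For instance, a synthetic `FileStatus` built from the public constructor (all values below are made up):

```scala
import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

// Arguments: length, isdir, replication, blocksize, mtime, atime,
// permission, owner, group, path. No real file is involved.
val status = new FileStatus(0L, false, 0, 0L, 0L, 0L,
  new FsPermission(FsAction.READ, FsAction.NONE, FsAction.NONE),
  "someuser", "somegroup", new Path("/tmp/fake-log"))
```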
Thanks @vanzin , I will update the code.
Change-Id: Ib9ac4be0a896531a529c260197431df1c3adf77a
Test build #75884 has started for PR 17495 at commit
Jenkins, retest this please.
Test build #75890 has finished for PR 17495 at commit
Jenkins, retest this please.
Test build #75897 has finished for PR 17495 at commit
class SparkHadoopUtilSuite extends SparkFunSuite with Matchers {
  test("check file permission") {
    import FsAction._
    val user = UserGroupInformation.getCurrentUser.getShortUserName
Instead of using `getCurrentUser`, I think it would be safer to use `createUserForTesting` + `ugi.doAs`. That lets you define the user name and the groups, and thus avoids possible clashes with the OS configuration.
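A sketch of that pattern (user and group names made up):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// A throwaway identity with known groups, independent of the OS accounts.
val testUgi = UserGroupInformation.createUserForTesting(
  "test-user", Array("test-group"))

testUgi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // Inside doAs, getCurrentUser reports "test-user" / "test-group",
    // so checkAccessPermission assertions run against a fixed identity.
  }
})
```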
Yeah, that would be better, thanks for the suggestion.
Change-Id: Iafc17472e44a511402a3faa5e1889fa445b3c386
Test build #75924 has finished for PR 17495 at commit
    val groups = UserGroupInformation.getCurrentUser.getGroupNames
    require(!groups.isEmpty)

    val testUser = user + "-" + Random.nextInt(100)
You don't need to get the current user just for this... just create a random user name.
Change-Id: Ic9851b569da3458895e7c7ea8a5474941c466585
Test build #75965 has finished for PR 17495 at commit
LGTM, merging to master / 2.2.
What changes were proposed in this pull request?
In the current Spark HistoryServer we expected to get an `AccessControlException` while listing all the files, but unfortunately that never worked, because we don't actually check the access permission and no other call throws such an exception. What was worse, this check was deferred until the files were read, which is unnecessary and quite verbose, since the exception was printed out every 10 seconds when checking the files.

So with this fix, we actually check the read permission while listing the files, which avoids unnecessary file reads later on and suppresses the verbose log.
How was this patch tested?
Add unit test to verify.