[SPARK-5347][CORE] Change FileSplit to InputSplit in update inputMetrics #4150
Test build #25937 has started for PR 4150 at commit
Test build #25937 has finished for PR 4150 at commit
Test PASSed. |
My only question was whether |
I think this is a duplicate of #4050, which only adds support for |
Given this reasoning, it does seem like this is a duplicate of SPARK-5199 |
If we use an InputFormat whose splits are not instances of org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}, then we can't get any input metrics information. |
@shenh062326 Sandy is saying that in those other cases, the values you are getting are not even in the same units, and so would be invalid. I believe we should close this PR in favor of #4050 which accomplishes the part of this change that is possible. |
Hi @shenh062326 since this is a duplicate would you mind closing this PR? The associated JIRA is already closed. Thanks. |
When inputFormatClass is set to CombineFileInputFormat, the input metrics show that the input is empty. This did not happen in spark-1.1.0. The cause is that HadoopRDD only sets inputMetrics when the split is an instance of FileSplit, but CombineFileInputFormat produces splits that are InputSplit instances and not FileSplit instances. It is not necessary to check for FileSplit; checking for InputSplit is enough.
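
To make the type check under discussion concrete, here is a minimal, self-contained Java sketch. It does not use the real Hadoop or Spark classes: `InputSplit`, `FileSplit`, and `CombineFileSplit` below are hypothetical stand-ins that only mirror the relevant inheritance (both split types extend `InputSplit`, and `CombineFileSplit` is not a `FileSplit`). The method names `bytesIfFileSplitOnly` and `bytesIfFileBacked` are also made up for illustration and are not Spark's actual code.

```java
import java.util.OptionalLong;

class SplitMetricsSketch {
    // Hypothetical stand-in for the Hadoop InputSplit base type.
    static abstract class InputSplit {
        abstract long getLength();
    }

    // Stand-in for a split covering part of a single file.
    static class FileSplit extends InputSplit {
        private final long length;
        FileSplit(long length) { this.length = length; }
        @Override long getLength() { return length; }
    }

    // Stand-in for a split combining several file chunks.
    // Note that it extends InputSplit directly, not FileSplit.
    static class CombineFileSplit extends InputSplit {
        private final long[] lengths;
        CombineFileSplit(long[] lengths) { this.lengths = lengths.clone(); }
        @Override long getLength() {
            long total = 0;
            for (long len : lengths) total += len;
            return total;
        }
    }

    // The check described in the PR: only FileSplit yields a byte count,
    // so a CombineFileSplit reports nothing.
    static OptionalLong bytesIfFileSplitOnly(InputSplit split) {
        return (split instanceof FileSplit)
                ? OptionalLong.of(split.getLength())
                : OptionalLong.empty();
    }

    // A narrower alternative to accepting every InputSplit: accept only the
    // file-backed split types, whose length is assumed to be in bytes.
    static OptionalLong bytesIfFileBacked(InputSplit split) {
        boolean fileBacked = split instanceof FileSplit || split instanceof CombineFileSplit;
        return fileBacked ? OptionalLong.of(split.getLength()) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        InputSplit combined = new CombineFileSplit(new long[] {10L, 20L});
        System.out.println("FileSplit-only check: " + bytesIfFileSplitOnly(combined));
        System.out.println("File-backed check:    " + bytesIfFileBacked(combined));
    }
}
```

The second method reflects the concern raised in the conversation: for arbitrary `InputSplit` implementations the reported length may not be in the same units, so widening the check to every `InputSplit` could record invalid values.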