[SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file

## What changes were proposed in this pull request?

Until 2.2.1, Spark raises a `NullPointerException` on zero-size ORC files. Usually, these zero-size ORC files are generated by third-party apps like Flume.

```scala
scala> sql("create table empty_orc(a int) stored as orc location '/tmp/empty_orc'")

$ touch /tmp/empty_orc/zero.orc

scala> sql("select * from empty_orc").show
java.lang.RuntimeException: serious problem
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
  ...
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
```

After [SPARK-22279](apache#19499), Apache Spark with the default configuration no longer has this bug. Although the Hive 1.2.1 library code path still has the problem, we should add test coverage for the current behavior to prevent future regressions.
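
For context, the two configurations that select the ORC read path are exactly the ones the new test toggles. Below is a rough spark-shell sketch (not part of this patch); it assumes the `empty_orc` table from the reproduction above:

```scala
// Sketch only, not from this commit. Both keys are the string values behind
// SQLConf.ORC_IMPLEMENTATION and HiveUtils.CONVERT_METASTORE_ORC used in the test.
// "native" selects Spark's ORC reader; "hive" selects the Hive 1.2.1 library code path.
spark.conf.set("spark.sql.orc.impl", "native")
// When true (the default since 2.3.0, per the test's comment), Hive ORC tables are
// read through Spark's data source path instead of Hive's OrcInputFormat.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

// Under this configuration the query is expected to return an empty result, not throw.
sql("SELECT * FROM empty_orc").show()
```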

## How was this patch tested?

Pass the newly added test case.
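
For reference, one way to run the modified suite locally is through Spark's sbt build (an assumed invocation, not part of this commit):

```scala
$ build/sbt "hive/testOnly *SQLQuerySuite"
```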

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#19948 from dongjoon-hyun/SPARK-19809-EMPTY-FILE.
dongjoon-hyun authored and HyukjinKwon committed Dec 12, 2017
1 parent 704af4b commit 17cdabb
Showing 1 changed file with 17 additions and 0 deletions.
@@ -2172,4 +2172,21 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
      }
    }
  }

  test("SPARK-19809 NullPointerException on zero-size ORC file") {
    Seq("native", "hive").foreach { orcImpl =>
      withSQLConf(SQLConf.ORC_IMPLEMENTATION.key -> orcImpl) {
        withTempPath { dir =>
          withTable("spark_19809") {
            sql(s"CREATE TABLE spark_19809(a int) STORED AS ORC LOCATION '$dir'")
            Files.touch(new File(s"${dir.getCanonicalPath}", "zero.orc"))

            withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") { // default since 2.3.0
              checkAnswer(sql("SELECT * FROM spark_19809"), Seq.empty)
            }
          }
        }
      }
    }
  }
}
