[SPARK-20367] Properly unescape column names of partitioning columns parsed from paths.

## What changes were proposed in this pull request?

When inferring the partitioning schema from paths, the column name in parsePartitionColumn should be unescaped with unescapePathName, just as is already done in e.g. parsePathFragmentAsSeq.
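For intuition: partition directory names percent-escape characters that are unsafe in paths, so a column named `Column/#%` ends up on disk as something like `Column%2F%23%25=42`. The sketch below is illustrative only (the object name `PartitionSpecSketch` and its simplified `unescapePathName` are stand-ins, not the Spark sources); it mirrors the shape of `parsePartitionColumn` to show why the name half of the `name=value` fragment needs the same unescaping the value already receives:

```scala
// Illustrative sketch only -- not the Spark implementation. It mimics the
// shape of parsePartitionColumn to show why the column-name half of a
// "name=value" path fragment must be unescaped, not just the value.
object PartitionSpecSketch {
  // Simplified stand-in for Spark's unescapePathName helper: decodes "%XX" escapes.
  def unescapePathName(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c == '%' && i + 2 < s.length) {
        val code =
          try Integer.parseInt(s.substring(i + 1, i + 3), 16)
          catch { case _: NumberFormatException => -1 }
        if (code >= 0) { sb.append(code.toChar); i += 3 }
        else { sb.append(c); i += 1 }
      } else {
        sb.append(c); i += 1
      }
    }
    sb.toString
  }

  // Splits a partition directory name into (columnName, value).
  def parseColumnSpec(columnSpec: String): Option[(String, String)] = {
    val equalSignIndex = columnSpec.indexOf('=')
    if (equalSignIndex == -1) {
      None
    } else {
      // The fix: unescape the column name, as was already done for the value.
      val columnName = unescapePathName(columnSpec.take(equalSignIndex))
      val value = unescapePathName(columnSpec.drop(equalSignIndex + 1))
      Some(columnName -> value)
    }
  }

  def main(args: Array[String]): Unit = {
    // Prints Some((Column/#%,42)) rather than Some((Column%2F%23%25,42)).
    println(parseColumnSpec("Column%2F%23%25=42"))
  }
}
```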

## How was this patch tested?

Added a test to FileIndexSuite.

Author: Juliusz Sompolski <julek@databricks.com>

Closes #17703 from juliuszsompolski/SPARK-20367.
juliuszsompolski authored and cloud-fan committed Apr 21, 2017
1 parent 592f5c8 commit 0368eb9
Showing 2 changed files with 13 additions and 1 deletion.
PartitioningUtils.scala

@@ -243,7 +243,7 @@ object PartitioningUtils {
     if (equalSignIndex == -1) {
       None
     } else {
-      val columnName = columnSpec.take(equalSignIndex)
+      val columnName = unescapePathName(columnSpec.take(equalSignIndex))
       assert(columnName.nonEmpty, s"Empty partition column name in '$columnSpec'")
 
       val rawColumnValue = columnSpec.drop(equalSignIndex + 1)
FileIndexSuite.scala

@@ -27,6 +27,7 @@ import org.apache.hadoop.fs.{FileStatus, Path, RawLocalFileSystem}
 
 import org.apache.spark.metrics.source.HiveCatalogMetrics
 import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.functions.col
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.util.{KnownSizeEstimation, SizeEstimator}

@@ -236,6 +237,17 @@ class FileIndexSuite extends SharedSQLContext {
     val fileStatusCache = FileStatusCache.getOrCreate(spark)
     fileStatusCache.putLeafFiles(new Path("/tmp", "abc"), files.toArray)
   }
+
+  test("SPARK-20367 - properly unescape column names in inferPartitioning") {
+    withTempPath { path =>
+      val colToUnescape = "Column/#%'?"
+      spark
+        .range(1)
+        .select(col("id").as(colToUnescape), col("id"))
+        .write.partitionBy(colToUnescape).parquet(path.getAbsolutePath)
+      assert(spark.read.parquet(path.getAbsolutePath).schema.exists(_.name == colToUnescape))
+    }
+  }
 }
 
 class FakeParentPathFileSystem extends RawLocalFileSystem {
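The added test exercises the full round trip: `partitionBy(colToUnescape)` writes a directory whose name is the escaped form of the column name, and reading the data back forces the partition schema to be inferred from that path. Without the one-line change in PartitioningUtils, the inferred schema would contain the escaped name and the final assert would fail.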
