[SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths

## What changes were proposed in this pull request?
If given a list of paths, `pyspark.sql.readwriter.text` will attempt to use an undefined variable `path`, which is only assigned when `paths` is a single string. This change checks whether the parameter `paths` is a `basestring` and, if so, wraps it in a list, so that the same variable `paths` can be used in both cases.
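The bug and fix described above can be reproduced without Spark in a minimal sketch. `old_text` and `new_text` are hypothetical stand-ins for the method before and after the patch; they return the normalized list instead of calling into the JVM, and they test for Python 3's `str` where the original checks Python 2's `basestring`.

```python
from typing import List, Union


def old_text(paths: Union[str, List[str]]) -> List[str]:
    """Hypothetical stand-in for text() before the patch."""
    if isinstance(paths, str):  # Python 3 `str` in place of Python 2 `basestring`
        path = [paths]  # `path` is bound only in this branch
    return path  # raises UnboundLocalError when `paths` is already a list


def new_text(paths: Union[str, List[str]]) -> List[str]:
    """Hypothetical stand-in for text() after the patch."""
    if isinstance(paths, str):
        paths = [paths]  # rebind the parameter itself, so it is always defined
    return paths


try:
    old_text(["a.txt", "b.txt"])
except UnboundLocalError as exc:  # UnboundLocalError is a subclass of NameError
    print("old:", type(exc).__name__)  # prints "old: UnboundLocalError"
print("new:", new_text(["a.txt", "b.txt"]))  # prints "new: ['a.txt', 'b.txt']"
```

Because `path` is assigned somewhere in the function body, Python treats it as a local variable, so reaching `return path` without taking the string branch raises `UnboundLocalError` instead of falling back to any outer name. Rebinding `paths` avoids the second name entirely.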

## How was this patch tested?
Added a unit test for reading a list of files.
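The new test reads `text-test.txt` twice and expects 4 rows, which assumes the file holds two lines (the docstring output in the readwriter diff shows two rows, `hello` and `this`). Below is a Spark-free sketch of the same check, in which a hypothetical `count_lines` helper over a temporary two-line file stands in for `spark.read.text(...).count()`; the file contents are an assumption.

```python
import os
import tempfile
from typing import List, Union


def count_lines(paths: Union[str, List[str]]) -> int:
    """Hypothetical helper: count lines across one or more files."""
    if isinstance(paths, str):  # accept a single path, as the fixed text() does
        paths = [paths]
    total = 0
    for p in paths:
        with open(p, encoding="utf-8") as f:
            total += sum(1 for _ in f)  # one "row" per line
    return total


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text-test.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\nthis\n")  # assumed two-line contents
    # Listing the same two-line file twice yields 2 + 2 = 4 lines.
    assert count_lines([path, path]) == 4
    assert count_lines(path) == 2
```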

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #15379 from BryanCutler/sql-readtext-paths-SPARK-17805.
BryanCutler authored and rxin committed Oct 7, 2016
1 parent 3713bb1 commit bcaa799
Showing 2 changed files with 8 additions and 2 deletions.
4 changes: 2 additions & 2 deletions python/pyspark/sql/readwriter.py
@@ -289,8 +289,8 @@ def text(self, paths):
         [Row(value=u'hello'), Row(value=u'this')]
         """
         if isinstance(paths, basestring):
-            path = [paths]
-        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+            paths = [paths]
+        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
 
     @since(2.0)
     def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,
6 changes: 6 additions & 0 deletions python/pyspark/sql/tests.py
@@ -1702,6 +1702,12 @@ def test_cache(self):
             "does_not_exist",
             lambda: spark.catalog.uncacheTable("does_not_exist"))
 
+    def test_read_text_file_list(self):
+        df = self.spark.read.text(['python/test_support/sql/text-test.txt',
+                                   'python/test_support/sql/text-test.txt'])
+        count = df.count()
+        self.assertEquals(count, 4)
+
 
 class HiveSparkSubmitTests(SparkSubmitTests):
 