Releases: databricks/spark-xml

v0.18.0

10 Apr 16:06
ddd1ef5

What's Changed

  • Use defined timezone on write for formats that need TZ info by @srowen in #665
  • Add notes about file extensions and _corrupt_record to documentation by @dolfinus in #674
  • Fix for xml expression to not parse arbitrary strings by @xanderbailey in #679
  • Update for 0.18.0, move CICD configs to supported Spark versions by @srowen in #680

Full Changelog: v0.17.0...v0.18.0

Version 0.17.0

07 Sep 05:11
  • Improve handling of XSD complex type, decimal (#631, #638)
  • Restore behavior of ignoreSurroundingSpaces (#637)
  • Improve schema inference performance (#660)
  • Fix corner case of double/float type inference (#644)

See https://github.com/databricks/spark-xml/milestone/14?closed=1

Note that this is intended to be the final stand-alone release of spark-xml, as it is being incorporated into Apache Spark 4.0.

Version 0.16.0

05 Jan 14:39
  • Minor bug fixes
  • Custom timestamp formats now use session timezone when not specified in the format/input (#621)
  • Some "ref" elements now work in XSD schemas (#619)
  • 'arrayElementName' can be used to control the schema name used for array elements when writing (#603)
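The session-timezone fallback for custom timestamp formats (#621) can be illustrated in plain Python. This is only a sketch of the rule as stated in the note above, not spark-xml's implementation; the function name `parse_with_session_tz` and the fixed-offset session timezone are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_with_session_tz(text, fmt, session_tz):
    """Parse `text` with `fmt`; fall back to `session_tz` if no zone was parsed."""
    parsed = datetime.strptime(text, fmt)
    if parsed.tzinfo is None:
        # Neither the format nor the input carried a zone: use the session's.
        parsed = parsed.replace(tzinfo=session_tz)
    return parsed


# Hypothetical session timezone, fixed at UTC-05:00 for the example.
session_tz = timezone(timedelta(hours=-5))

# No zone in the format or the input: the session timezone applies.
no_zone = parse_with_session_tz(
    "2023-01-05 14:39:00", "%Y-%m-%d %H:%M:%S", session_tz
)

# The input carries its own offset: that offset is kept.
with_zone = parse_with_session_tz(
    "2023-01-05 14:39:00+0100", "%Y-%m-%d %H:%M:%S%z", session_tz
)
```

Here `no_zone` picks up the session offset, while `with_zone` keeps the offset found in the input.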

See https://github.com/databricks/spark-xml/milestone/13?closed=1

Version 0.15.0

03 Jun 17:17
f4d592b

This is a minor bug fix release, primarily for:

  • #582 Fix a Hadoop conf bug that interferes with running multiple separate spark-xml read/write jobs concurrently

See also:

https://github.com/databricks/spark-xml/milestone/12

Version 0.14.0

21 Oct 19:23
a17f473

This release is primarily to support Spark 3.2.0 and Scala 2.13. Support for Scala 2.11, previously deprecated, is removed. Spark 2 is not officially supported now, but should continue to work with Scala 2.12 builds.

Otherwise, it includes one new feature:

  • Control XML declaration in XML output (#560)
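A minimal PySpark-flavored sketch of how the XML declaration might be controlled on write. The `declaration`, `rootTag`, and `rowTag` option names are taken from the spark-xml README; the helper names `xml_write_options` and `write_xml`, the `df` DataFrame, and the tag values are hypothetical.

```python
def xml_write_options(root_tag, row_tag, declaration=None):
    """Build writer options; `declaration=None` keeps the library default."""
    options = {"rootTag": root_tag, "rowTag": row_tag}
    if declaration is not None:
        # Per the README, an empty string suppresses the declaration.
        options["declaration"] = declaration
    return options


def write_xml(df, path, options):
    """Write a Spark DataFrame as XML (requires spark-xml on the classpath)."""
    df.write.format("xml").options(**options).save(path)


# Custom declaration content.
custom = xml_write_options(
    "books", "book", declaration='version="1.0" encoding="UTF-8"'
)

# No declaration at all in the output files.
suppressed = xml_write_options("books", "book", declaration="")
```

With a live Spark session, something like `write_xml(df, "/tmp/books_xml", suppressed)` would apply these options; the path is a placeholder.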

See https://github.com/databricks/spark-xml/issues?q=is%3Aclosed+milestone%3A0.14.0

Version 0.13.0

21 Sep 00:40

This is a minor bug fix release; see https://github.com/databricks/spark-xml/pulls?q=is%3Apr+is%3Aclosed+milestone%3A0.13.0

  • Improvement: better handling of certain XSD complexTypes in XSD -> schema parsing (#559)
  • Fix: Return null for primitive types when value matches nullValue string (#542)
  • Deprecated the Dataset[String] implicit and improved XmlReader options (#528)
  • Deprecated Scala 2.11 support
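The `nullValue` fix (#542) can be pictured with a small plain-Python sketch. It only mirrors the rule as worded in the note above and is not spark-xml's code; `cast_primitive` and its parameters are hypothetical names.

```python
def cast_primitive(text, caster, null_value=None):
    """Convert `text` with `caster`, unless it equals the configured null marker."""
    if null_value is not None and text == null_value:
        # The raw string matches nullValue: yield null instead of casting.
        return None
    return caster(text)


age = cast_primitive("42", int, null_value="N/A")
missing = cast_primitive("N/A", int, null_value="N/A")
```

Without the check, the second call would attempt `int("N/A")` and fail rather than produce a null.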

Version 0.12.0

23 Feb 20:28
ceed1b8
  • Fixed schema inference for date types (#521)
  • Fixed some type inference of primitive types (int vs. long) from XSDs (#522)
  • Fixed parsing of partial result when a row fails to parse (#518)
  • Fixed bug in parsing missing optional child tags in certain situations (#513)
  • Fixed parsing of non-UTF-8 XML data (#511)
  • Added support for additional timestampFormat and dateFormat formats when reading and writing timestamps and dates in XML

https://github.com/databricks/spark-xml/milestone/9?closed=1

https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.12.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.12.0/

Version 0.11.0

07 Dec 17:51
74b9802
  • Reading:
    • Support for 'wildcard' columns (wildcardColName) matching anything, corresponding to XSD xs:any types
    • Can optionally ignore namespace prefixes with ignoreNamespace
    • MapType columns now read attributes correctly
  • Writing:
    • Root tag can have attributes
    • Timestamp output format now follows XML standards
  • Minor fixes and improvements to XSD schema support
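The new read options might be combined roughly as follows from PySpark. The option names `ignoreNamespace`, `wildcardColName`, and `rowTag` come from the notes above and the spark-xml README; the helpers `xml_read_options` and `read_xml` and the tag and column values are hypothetical. Per the README, the wildcard column has to exist in a user-supplied schema, so the sketch accepts one.

```python
def xml_read_options(row_tag, ignore_namespace=False, wildcard_col_name=None):
    """Build reader options for the features added in this release."""
    options = {"rowTag": row_tag}
    if ignore_namespace:
        # Match elements and attributes regardless of namespace prefix.
        options["ignoreNamespace"] = "true"
    if wildcard_col_name is not None:
        # Names the schema column that collects unmatched (xs:any) content.
        options["wildcardColName"] = wildcard_col_name
    return options


def read_xml(spark, path, options, schema=None):
    """Read XML into a DataFrame (requires spark-xml on the classpath)."""
    reader = spark.read.format("xml").options(**options)
    if schema is not None:
        reader = reader.schema(schema)
    return reader.load(path)


opts = xml_read_options("book", ignore_namespace=True, wildcard_col_name="xs_any")
```

Option values are passed as strings here because that is how they are usually written in spark-xml examples.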

Changes: https://github.com/databricks/spark-xml/milestone/8?closed=1

https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.11.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.11.0/

Version 0.10.0

25 Aug 22:04

Highlights:

  • Bug fix: in rare cases, parsing an uncompressed XML file could miss a record. (#468)
  • Bug fix: parsing XML subtree as string field would lose attributes (#469)
  • Feature: experimental support for inferring a Spark schema from an XSD (#457)
  • Other minor bug fixes

Changes: https://github.com/databricks/spark-xml/milestone/7?closed=1

https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.10.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.10.0/

Version 0.9.0

02 Mar 02:18
bb4b9ff

Highlights:

  • Support XSD validation in from_xml (#433)
  • Don't ignore unclosed tag content (#437)
  • Helper functions to support manually using from_xml, etc from Python (#438)

Changes: https://github.com/databricks/spark-xml/milestone/6?closed=1