docs/sql-data-sources-xml.md
Spark SQL provides `spark.read().xml("file_1_path", "file_2_path")` to read a file or directory of files in XML format into a Spark DataFrame, and `dataframe.write().xml("path")` to write to an XML file. The `rowTag` option must be specified to indicate the XML element that maps to a DataFrame row. The `option()` function can be used to customize the behavior of reading or writing, such as controlling behavior of the XML attributes, XSD validation, compression, and so on.
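For example, a minimal read/write round trip might look like the following Scala sketch, where the `book` row tag and all paths are placeholder values:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("XmlExample").getOrCreate()

// rowTag names the XML element that maps to one DataFrame row;
// "book" and the paths below are hypothetical.
val df = spark.read
  .option("rowTag", "book")
  .xml("path/to/books.xml")

df.printSchema()

// Write the DataFrame back out as XML files under the output directory.
df.write
  .option("rowTag", "book")
  .xml("path/to/output")
```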
Data source options of XML can be set via:

* the `.option`/`.options` methods of
  * `DataFrameReader`
  * `DataFrameWriter`
  * `DataStreamReader`
  * `DataStreamWriter`
* the built-in functions below
  * `from_xml`
  * `to_xml`
  * `schema_of_xml`
* `OPTIONS` clause at CREATE TABLE USING DATA_SOURCE
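As a sketch of two of these surfaces (reusing the `spark` session from the snippet above; the schema string, sample XML literal, table name, and path are illustrative), an option can be passed to the SQL `from_xml` function as a map, and to `CREATE TABLE ... USING` via its `OPTIONS` clause:

```scala
// Pass options to the built-in from_xml function (SQL form) as a map.
spark.sql(
  """SELECT from_xml('<book><title>T</title></book>',
    |                'title STRING',
    |                map('mode', 'PERMISSIVE')) AS book
    |""".stripMargin
).show(truncate = false)

// Pass options through the OPTIONS clause of CREATE TABLE USING.
spark.sql(
  """CREATE TABLE books
    |USING xml
    |OPTIONS (path 'path/to/books.xml', rowTag 'book')
    |""".stripMargin
)
```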
<table>
  <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
  <tr>
    <td><code>mode</code></td>
    <td><code>PERMISSIVE</code></td>
    <td>Allows a mode for dealing with corrupt records during parsing (see the sketch after this table for usage).<br>
    <ul>
      <li><code>PERMISSIVE</code>: when it meets a corrupted record, puts the malformed string into a field configured by <code>columnNameOfCorruptRecord</code>, and sets malformed fields to <code>null</code>. To keep corrupt records, a user can set a string type field named <code>columnNameOfCorruptRecord</code> in a user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. When inferring a schema, it implicitly adds a <code>columnNameOfCorruptRecord</code> field in an output schema.</li>
<li><code>DROPMALFORMED</code>: ignores corrupted records entirely. This mode is unsupported in the XML built-in functions.</li>
<li><code>FAILFAST</code>: throws an exception when it meets corrupted records.</li>
</ul>
</td>
<td>read</td>
  </tr>
  <tr>
    <td><code>timeZone</code></td>
    <td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
    <td>Sets the string that indicates a time zone ID to be used to format timestamps in the XML datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
<ul>
<li>Region-based zone ID: It should have the form 'area/city', such as 'America/Los_Angeles'.</li>
<li>Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. 'UTC' and 'Z' are also supported as aliases of '+00:00'.</li>
</ul>
Other short names like 'CST' are not recommended because they can be ambiguous.
</td>
<td>read/write</td>
  </tr>
</table>
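Putting the two options above together, a read that keeps corrupt records in a user-defined schema field and pins an explicit time zone might look like the sketch below; the schema, the `_corrupt_record` column name, and the path are assumptions, not fixed names:

```scala
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// A string-typed field matching columnNameOfCorruptRecord gives
// PERMISSIVE mode somewhere to keep malformed rows; "_corrupt_record"
// and the other names/paths here are placeholders.
val schema = StructType(Seq(
  StructField("title", StringType),
  StructField("price", DoubleType),
  StructField("_corrupt_record", StringType)
))

val books = spark.read
  .schema(schema)
  .option("rowTag", "book")
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .option("timeZone", "America/Los_Angeles")
  .xml("path/to/books.xml")
```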