docs/sql-data-sources-json.md
Note that the file offered as a JSON file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.
For a regular multi-line JSON file, set the `multiLine` parameter to `True`.
{% include_example json_dataset python/sql/datasource.py %}
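The difference between the JSON Lines layout and a regular multi-line JSON document can be seen without Spark at all. The following is a minimal sketch that uses only Python's standard `json` module; the sample records and the helper names `parse_json_lines` and `parse_multi_line` are hypothetical and exist only for illustration. It does not show how Spark parses files internally, only why a line-oriented reader cannot handle a document that spans several lines.

```python
import json

# Illustrative sample data (made up for this sketch).
# JSON Lines: every line is a complete JSON object on its own.
json_lines_text = (
    '{"name": "Michael"}\n'
    '{"name": "Andy", "age": 30}\n'
    '{"name": "Justin", "age": 19}\n'
)

# A regular JSON document: a single value that spans several lines.
multi_line_text = """[
  {"name": "Michael"},
  {"name": "Andy", "age": 30}
]"""


def parse_json_lines(text):
    """Parse each non-empty line as an independent JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_multi_line(text):
    """Parse the whole text as one JSON document.

    Returning the elements of a top-level array as separate records
    is an assumption made for this sketch.
    """
    doc = json.loads(text)
    return doc if isinstance(doc, list) else [doc]


records = parse_json_lines(json_lines_text)
whole = parse_multi_line(multi_line_text)

# Reading the multi-line document line by line fails, because a line
# such as "[" is not a valid JSON value by itself.
try:
    parse_json_lines(multi_line_text)
    line_by_line_ok = True
except json.JSONDecodeError:
    line_by_line_ok = False
```

In this sketch the line-by-line parse of the multi-line document raises `json.JSONDecodeError`, which mirrors the reason a regular JSON file needs the `multiLine` parameter when it is read with Spark.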
</div> <div data-lang="scala" markdown="1"> Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`. This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]` or a JSON file. Note that the file offered as a JSON file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.
For a regular multi-line JSON file, set the `multiLine` option to `true`.
{% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
</div> <div data-lang="java" markdown="1"> Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`. This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>` or a JSON file. Note that the file offered as a JSON file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.
For a regular multi-line JSON file, set the `multiLine` option to `true`.
{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div> <div data-lang="r" markdown="1"> Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame using the `read.json()` function, which loads data from a directory of JSON files where each line of the files is a JSON object. Note that the file offered as a JSON file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.
For a regular multi-line JSON file, set the named parameter `multiLine` to `TRUE`.
{% include_example json_dataset r/RSparkSQLExample.R %}
</div> <div data-lang="SQL" markdown="1">{% highlight sql %}
CREATE TEMPORARY VIEW jsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path "examples/src/main/resources/people.json"
);

SELECT * FROM jsonTable;
{% endhighlight %}
</div> </div>

Data source options of JSON can be set via:

* the `.option`/`.options` methods of
  * `DataFrameReader`
  * `DataFrameWriter`
  * `DataStreamReader`
  * `DataStreamWriter`
* the built-in functions
  * `from_json`
  * `to_json`
  * `schema_of_json`
* the `OPTIONS` clause at `CREATE TABLE USING DATA_SOURCE`

<ul>
<li>Region-based zone ID: It should have the form 'area/city', such as 'America/Los_Angeles'.</li>
<li>Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. 'UTC' and 'Z' are also supported as aliases of '+00:00'.</li>
</ul>
Other short names such as 'CST' are not recommended because they can be ambiguous.
</td>
<td>read/write</td>
<ul>
<li><code>PERMISSIVE</code>: when it meets a corrupted record, puts the malformed string into a field configured by <code>columnNameOfCorruptRecord</code>, and sets malformed fields to <code>null</code>. To keep corrupt records, a user can set a string type field named <code>columnNameOfCorruptRecord</code> in a user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. When inferring a schema, it implicitly adds a <code>columnNameOfCorruptRecord</code> field to the output schema.</li>
<li><code>DROPMALFORMED</code>: ignores corrupted records entirely. This mode is unsupported in the JSON built-in functions.</li>
<li><code>FAILFAST</code>: throws an exception when it meets corrupted records.</li>
</ul>
</td>
<td>read</td>
<ul>
<li><code>+INF</code>: for positive infinity, with <code>+Infinity</code> and <code>Infinity</code> as aliases.</li>
<li><code>-INF</code>: for negative infinity, with <code>-Infinity</code> as an alias.</li>
<li><code>NaN</code>: for other not-a-number values, such as the result of dividing zero by zero.</li>
</ul>
</td>
<td>read</td>