Doc changes for new nested JSON reader [skip ci] #7791

Merged (2 commits, Feb 24, 2023)
35 changes: 9 additions & 26 deletions docs/compatibility.md
@@ -296,38 +296,21 @@ The JSON format read is a very experimental feature which is expected to have some issues, so we disable
it by default. If you would like to test it, you need to enable `spark.rapids.sql.format.json.enabled` and
`spark.rapids.sql.format.json.read.enabled`.
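
For an interactive session, the two flags above can be set at runtime. This is a minimal sketch; it assumes a `SparkSession` named `spark` with the RAPIDS plugin already on the classpath:

``` scala
// Enable the experimental GPU-accelerated JSON reader for this session.
// Both configuration keys from the docs above are required.
spark.conf.set("spark.rapids.sql.format.json.enabled", "true")
spark.conf.set("spark.rapids.sql.format.json.read.enabled", "true")
```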

Currently, the GPU-accelerated JSON reader doesn't support column pruning, which will likely make
this difficult to use or even test. The user must either specify the full schema or let Spark infer
the schema from the JSON file, e.g.:

We have a `people.json` file with the following content. Note that reading input containing
invalid JSON (in any row) will throw a runtime exception; this is an example of valid input:
``` console
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
```

Both of the ways below will work:

- Inferring the schema

``` scala
val df = spark.read.json("people.json")
```

- Specifying the full schema

``` scala
val schema = StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))
val df = spark.read.schema(schema).json("people.json")
```

However, the below code, which specifies only a partial schema, will not work in the current version:

``` scala
val schema = StructType(Seq(StructField("name", StringType)))
val df = spark.read.schema(schema).json("people.json")
```

The following inputs are invalid and will cause an error:
```console
{"name":"Andy", "age":30} ,,,,
{"name":"Justin", "age":19}
```

```console
{"name": Justin", "age":19}
```
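
Because invalid rows are rejected at runtime, the failure surfaces only when the read is materialized, not when the plan is built. A minimal sketch of observing this, assuming a running `spark` session and a hypothetical `bad.json` file containing the invalid rows above:

``` scala
import scala.util.Try
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))
// collect() forces execution; with invalid input this yields a Failure
// wrapping the runtime exception instead of returning rows.
val result = Try(spark.read.schema(schema).json("bad.json").collect())
```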

### JSON supporting types