Doc changes for new nested JSON reader [skip ci] #7791

Merged
merged 2 commits into from Feb 24, 2023
34 changes: 8 additions & 26 deletions docs/compatibility.md
@@ -296,38 +296,20 @@

The JSON format read is a very experimental feature which is expected to have some issues, so we disable
it by default. If you would like to test it, you need to enable `spark.rapids.sql.format.json.enabled` and
`spark.rapids.sql.format.json.read.enabled`.
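As a minimal sketch, both settings can be enabled at runtime (this assumes an active `SparkSession` named `spark`, as in `spark-shell`; the config keys are the ones named above):

``` scala
// Enable the experimental GPU JSON reader; both flags are required.
spark.conf.set("spark.rapids.sql.format.json.enabled", "true")
spark.conf.set("spark.rapids.sql.format.json.read.enabled", "true")
```

The same keys can instead be passed with `--conf` on the `spark-submit` or `spark-shell` command line.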

Currently, the GPU accelerated JSON reader doesn't support column pruning, which will likely make
this feature difficult to use or even test. The user must either specify the full schema or let Spark
infer the schema from the JSON file. The reader will also raise an error on invalid input data.

For example, suppose we have a `people.json` file with the following valid content:
``` console
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
```

Either of the following approaches will work:

- Inferring the schema

``` scala
val df = spark.read.json("people.json")
```

- Specifying the full schema

``` scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))
val df = spark.read.schema(schema).json("people.json")
```

However, the following inputs will cause an error. For example, a record followed by trailing
invalid characters:

``` console
{"name":"Andy", "age":30} ,,,,
{"name":"Justin", "age":19}
```

Likewise, specifying only a subset of the columns (which requires column pruning) will not work in
the current version:

``` scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(StructField("name", StringType)))
val df = spark.read.schema(schema).json("people.json")
```

Nor will malformed records, such as a string value with a missing opening quote:

``` console
{"name": Justin", "age":19}
```

### JSON supporting types
Expand Down