From cbe36338ac101c14c2d76545a74bcfa5d5002059 Mon Sep 17 00:00:00 2001
From: Wei Guo
Date: Fri, 2 Aug 2024 10:13:07 +0900
Subject: [PATCH] [SPARK-49072][DOCS] Fix abnormal display of text content which contains two $ in one line but non-formula in docs

### What changes were proposed in this pull request?

There are currently some display exceptions in some documents, for example:

- https://spark.apache.org/docs/3.5.1/running-on-kubernetes.html#secret-management

  ![image](https://github.com/user-attachments/assets/5a4fa4e0-b773-4007-96d0-c036bc7e0c13)

- https://spark.apache.org/docs/latest/sql-migration-guide.html

  ![image](https://github.com/user-attachments/assets/e5f7ea17-9573-4917-b9cd-e36fd83d35fb)

The reason is that the `MathJax` JavaScript package renders the content between two $ on one line as a formula. This PR fixes the abnormal display of text content that contains two $ in one line but is not a formula (a minimal before/after sketch of the escaping is included after the diff below).

### Why are the changes needed?

Fixes doc display exceptions.

### Does this PR introduce _any_ user-facing change?

Yes, it improves the user experience of the docs.

### How was this patch tested?

Local manual tests with the command `SKIP_API=1 bundle exec jekyll build --watch`.

The new results after this PR:

![image](https://github.com/user-attachments/assets/3d0b62fc-44da-45cd-a295-5d098ce3b8ec)
![image](https://github.com/user-attachments/assets/f8884a26-7029-4926-a290-a24b7ae75fa4)
![image](https://github.com/user-attachments/assets/6d7a1289-b459-4ad3-8636-e4713ead3921)
![image](https://github.com/user-attachments/assets/216dbd8d-4bd5-43b6-abc5-082aea543888)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47548 from wayneguow/latex_error.

Authored-by: Wei Guo
Signed-off-by: Hyukjin Kwon
---
 docs/running-on-kubernetes.md | 2 +-
 docs/running-on-yarn.md       | 2 +-
 docs/sql-migration-guide.md   | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 655b30756a298..39b3bfc10da38 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -988,7 +988,7 @@ See the [configuration page](configuration.html) for information on Spark config
     Prefix to use in front of the executor pod names. It must conform the rules defined by the Kubernetes
    DNS Label Names.
-    The prefix will be used to generate executor pod names in the form of $podNamePrefix-exec-$id, where the `id` is
+    The prefix will be used to generate executor pod names in the form of \$podNamePrefix-exec-\$id, where the `id` is
     a positive int value, so the length of the `podNamePrefix` needs to be less than or equal to 47(= 63 - 10 - 6).
     2.3.0
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 700ddefabea47..b8e22b12e3c92 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -489,7 +489,7 @@ To use a custom metrics.properties for the application master and executors, upd
     and send to RM, which uses them when renewing delegation tokens. A typical use case of this feature is to support delegation
     tokens in an environment where a YARN cluster needs to talk to multiple downstream HDFS clusters, where the YARN RM may not have configs
     (e.g., dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*) to connect to these clusters.
-    In this scenario, Spark users can specify the config value to be ^dfs.nameservices$|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$ to parse
+    In this scenario, Spark users can specify the config value to be ^dfs.nameservices\$|^dfs.namenode.rpc-address.*\$|^dfs.ha.namenodes.*\$ to parse
     these HDFS configs from the job's local configuration files. This config is very similar to mapreduce.job.send-token-conf. Please check YARN-5910 for more details.
     3.3.0
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index dad07f197f587..3846f7bb24d12 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -40,7 +40,7 @@ license: |
   - `spark.sql.avro.datetimeRebaseModeInWrite` instead of `spark.sql.legacy.avro.datetimeRebaseModeInWrite`
   - `spark.sql.avro.datetimeRebaseModeInRead` instead of `spark.sql.legacy.avro.datetimeRebaseModeInRead`
 - Since Spark 4.0, the default value of `spark.sql.orc.compression.codec` is changed from `snappy` to `zstd`. To restore the previous behavior, set `spark.sql.orc.compression.codec` to `snappy`.
-- Since Spark 4.0, the SQL config `spark.sql.legacy.allowZeroIndexInFormatString` is deprecated. Consider to change `strfmt` of the `format_string` function to use 1-based indexes. The first argument must be referenced by "1$", the second by "2$", etc.
+- Since Spark 4.0, the SQL config `spark.sql.legacy.allowZeroIndexInFormatString` is deprecated. Consider to change `strfmt` of the `format_string` function to use 1-based indexes. The first argument must be referenced by `1$`, the second by `2$`, etc.
 - Since Spark 4.0, Postgres JDBC datasource will read JDBC read TIMESTAMP WITH TIME ZONE as TimestampType regardless of the JDBC read option `preferTimestampNTZ`, while in 3.5 and previous, TimestampNTZType when `preferTimestampNTZ=true`. To restore the previous behavior, set `spark.sql.legacy.postgres.datetimeMapping.enabled` to `true`.
 - Since Spark 4.0, Postgres JDBC datasource will write TimestampType as TIMESTAMP WITH TIME ZONE, while in 3.5 and previous, it wrote as TIMESTAMP a.k.a. TIMESTAMP WITHOUT TIME ZONE. To restore the previous behavior, set `spark.sql.legacy.postgres.datetimeMapping.enabled` to `true`.
 - Since Spark 4.0, MySQL JDBC datasource will read TIMESTAMP as TimestampType regardless of the JDBC read option `preferTimestampNTZ`, while in 3.5 and previous, TimestampNTZType when `preferTimestampNTZ=true`. To restore the previous behavior, set `spark.sql.legacy.mysql.timestampNTZMapping.enabled` to `true`, MySQL DATETIME is not affected.
@@ -129,7 +129,7 @@ license: |
     * `[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
     * `T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
-  - Since Spark 3.3, the `strfmt` in `format_string(strfmt, obj, ...)` and `printf(strfmt, obj, ...)` will no longer support to use "0$" to specify the first argument, the first argument should always reference by "1$" when use argument index to indicating the position of the argument in the argument list.
+  - Since Spark 3.3, the `strfmt` in `format_string(strfmt, obj, ...)` and `printf(strfmt, obj, ...)` will no longer support to use `0$` to specify the first argument, the first argument should always reference by `1$` when use argument index to indicating the position of the argument in the argument list.
   - Since Spark 3.3, nulls are written as empty strings in CSV data source by default. In Spark 3.2 or earlier, nulls were written as empty strings as quoted empty strings, `""`.
     To restore the previous behavior, set `nullValue` to `""`, or set the configuration `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv` to `true`.
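
For illustration only (not part of the applied diff): a minimal before/after sketch of the escaping pattern, based on the Kubernetes line changed above. MathJax treats the text between an unescaped pair of $ on one line as an inline formula, while escaping each $ as \$ keeps the line rendering as plain text.

```markdown
<!-- Before: MathJax sees two unescaped $ on one line and renders
     "podNamePrefix-exec-" between them as an inline formula. -->
The prefix will be used to generate executor pod names in the form of $podNamePrefix-exec-$id.

<!-- After: each $ is escaped with a backslash, so the line is displayed as literal text. -->
The prefix will be used to generate executor pod names in the form of \$podNamePrefix-exec-\$id.
```

The sql-migration-guide.md changes take a slightly different route: the $-containing tokens such as `1$` are wrapped in backticks, and MathJax leaves code spans alone.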