Commit 03e5fe5

Fix docs: we're not installing stable version of spark anymore (#2165)

mathbunnyru authored Oct 29, 2024
1 parent f74a764 commit 03e5fe5
Showing 3 changed files with 6 additions and 5 deletions.
3 changes: 2 additions & 1 deletion docs/using/specifics.md
@@ -49,7 +49,8 @@ You can build a `pyspark-notebook` image with a different `Spark` version by ove
   - This version needs to match the version supported by the Spark distribution used above.
   - See [Spark Overview](https://spark.apache.org/docs/latest/#downloading) and [Ubuntu packages](https://packages.ubuntu.com/search?keywords=openjdk).
 - `spark_version` (optional): The Spark version to install, for example `3.5.0`.
-  If not specified (this is the default), latest stable Spark will be installed.
+  If not specified (this is the default), latest Spark will be installed.
+  Note: to support Python 3.12, we currently install Spark v4 preview versions: <https://github.com/jupyter/docker-stacks/pull/2072#issuecomment-2414123851>.
 - `hadoop_version`: The Hadoop version (`3` by default).
   Note, that _Spark < 3.3_ require to specify `major.minor` Hadoop version (i.e. `3.2`).
 - `scala_version` (optional): The Scala version, for example `2.13` (not specified by default).
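For reference, the build arguments documented in this hunk are passed at image build time. A minimal sketch (the image tag and argument values are illustrative, and `spark_version` can simply be omitted to get the latest Spark):

```shell
# Sketch: build pyspark-notebook with explicit versions (values are illustrative).
# Omitting --build-arg spark_version installs the latest Spark, per the docs above.
docker build \
  --build-arg spark_version="3.5.0" \
  --build-arg hadoop_version="3" \
  --build-arg scala_version="2.13" \
  --tag my-pyspark-notebook \
  ./images/pyspark-notebook
```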
2 changes: 1 addition & 1 deletion images/pyspark-notebook/Dockerfile
@@ -24,7 +24,7 @@ RUN apt-get update --yes && \
     ca-certificates-java && \
     apt-get clean && rm -rf /var/lib/apt/lists/*
 
-# If spark_version is not set, latest stable Spark will be installed
+# If spark_version is not set, latest Spark will be installed
 ARG spark_version
 ARG hadoop_version="3"
 # If scala_version is not set, Spark without Scala will be installed
6 changes: 3 additions & 3 deletions images/pyspark-notebook/setup_spark.py
@@ -29,11 +29,11 @@ def get_all_refs(url: str) -> list[str]:
 
 def get_latest_spark_version() -> str:
     """
-    Returns the last stable version of Spark using spark archive
+    Returns the last version of Spark using spark archive
     """
     LOGGER.info("Downloading Spark versions information")
     all_refs = get_all_refs("https://archive.apache.org/dist/spark/")
-    stable_versions = [
+    versions = [
         ref.removeprefix("spark-").removesuffix("/")
         for ref in all_refs
         if ref.startswith("spark-") and "incubating" not in ref
@@ -49,7 +49,7 @@ def version_array(ver: str) -> tuple[int, int, int, str]:
         patch, _, preview = arr[2].partition("-")
         return (major, minor, int(patch), preview)
 
-    latest_version = max(stable_versions, key=lambda ver: version_array(ver))
+    latest_version = max(versions, key=lambda ver: version_array(ver))
     LOGGER.info(f"Latest version: {latest_version}")
     return latest_version
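The selection logic this hunk renames can be exercised standalone. A minimal sketch, with a hard-coded version list standing in for the archive scrape (tuple comparison makes the preview suffix sort lexicographically, so `4.0.0-preview2` outranks `4.0.0-preview1`):

```python
# Minimal sketch of setup_spark.py's version selection after this change.
# The version list below is illustrative; the real script scrapes
# https://archive.apache.org/dist/spark/ for "spark-*" directory refs.

def version_array(ver: str) -> tuple[int, int, int, str]:
    # "4.0.0-preview2" -> (4, 0, 0, "preview2"); "3.5.3" -> (3, 5, 3, "")
    arr = ver.split(".")
    major = int(arr[0])
    minor = int(arr[1])
    patch, _, preview = arr[2].partition("-")
    return (major, minor, int(patch), preview)

versions = ["3.4.4", "3.5.3", "4.0.0-preview1", "4.0.0-preview2"]
latest_version = max(versions, key=version_array)
print(latest_version)  # 4.0.0-preview2
```

Note that an empty preview suffix compares less than a non-empty one, so a `X.Y.Z-previewN` ref can outrank every plain release with lower version numbers — consistent with the docs note that v4 preview builds are currently what gets installed.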
