[doc] Add missing document for pyspark ranker. [skip ci] (#8692)
trivialfis authored Jan 17, 2023
1 parent 78396f8 commit 175986b
Showing 3 changed files with 16 additions and 5 deletions.
10 changes: 10 additions & 0 deletions doc/python/python_api.rst
@@ -173,3 +173,13 @@ PySpark API
    :members:
    :inherited-members:
    :show-inheritance:
+
+.. autoclass:: xgboost.spark.SparkXGBRanker
+   :members:
+   :inherited-members:
+   :show-inheritance:
+
+.. autoclass:: xgboost.spark.SparkXGBRankerModel
+   :members:
+   :inherited-members:
+   :show-inheritance:
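
For reference, a minimal usage sketch for the two classes documented above. This is a hedged example, not taken from the commit: it assumes an active ``SparkSession`` named ``spark``, and the column names are illustrative. ``qid_col`` names the query-group column that groups rows to be ranked against each other.

.. code-block:: python

    from pyspark.ml.linalg import Vectors
    from xgboost.spark import SparkXGBRanker

    # Toy training data: a feature vector, a relevance label, and a
    # query id ("qid") identifying which group each row belongs to.
    df_train = spark.createDataFrame(
        [
            (Vectors.dense(1.0, 2.0, 3.0), 0, 0),
            (Vectors.dense(4.0, 5.0, 6.0), 1, 0),
            (Vectors.dense(9.0, 4.0, 8.0), 2, 1),
            (Vectors.dense(6.0, 2.0, 2.0), 1, 1),
        ],
        ["features", "label", "qid"],
    )

    ranker = SparkXGBRanker(qid_col="qid")
    model = ranker.fit(df_train)  # yields a SparkXGBRankerModel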
6 changes: 3 additions & 3 deletions doc/tutorials/spark_estimator.rst
@@ -45,7 +45,7 @@ such as ``weight_col``, ``validation_indicator_col``, ``use_gpu``, for details p

 The following code snippet shows how to train a spark xgboost regressor model,
 first we need to prepare a training dataset as a spark dataframe contains
-"label" column and "features" column(s), the "features" column(s) must be ``pyspark.ml.linalg.Vector`
+"label" column and "features" column(s), the "features" column(s) must be ``pyspark.ml.linalg.Vector``
 type or spark array type or a list of feature column names.
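
A hedged sketch of such a training step, assuming an active ``SparkSession`` named ``spark`` (the actual snippet is collapsed in this diff and may differ):

.. code-block:: python

    from pyspark.ml.linalg import Vectors
    from xgboost.spark import SparkXGBRegressor

    # A "features" vector column plus a numeric "label" column.
    df_train = spark.createDataFrame(
        [
            (Vectors.dense(1.0, 2.0, 3.0), 0.0),
            (Vectors.dense(4.0, 5.0, 6.0), 1.0),
        ],
        ["features", "label"],
    )

    regressor = SparkXGBRegressor()
    model = regressor.fit(df_train)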


@@ -56,7 +56,7 @@ type or spark array type or a list of feature column names.
 The following code snippet shows how to predict test data using a spark xgboost regressor model,
 first we need to prepare a test dataset as a spark dataframe contains
-"features" and "label" column, the "features" column must be ``pyspark.ml.linalg.Vector`
+"features" and "label" column, the "features" column must be ``pyspark.ml.linalg.Vector``
 type or spark array type.
 
 .. code-block:: python
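
Again the snippet itself is collapsed in this diff; a hedged sketch of the prediction step, reusing the fitted ``model`` from the training sketch above:

.. code-block:: python

    from pyspark.ml.linalg import Vectors

    df_test = spark.createDataFrame(
        [(Vectors.dense(7.0, 8.0, 9.0), 2.0)],
        ["features", "label"],
    )

    # transform() appends a "prediction" column (the default
    # prediction_col) to the test dataframe.
    predictions = model.transform(df_test)
    predictions.show()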
@@ -97,7 +97,7 @@ Aside from the PySpark and XGBoost modules, we also need the `cuDF
 <https://docs.rapids.ai/api/cudf/stable/>`_ package for handling Spark dataframe. We
 recommend using either Conda or Virtualenv to manage python dependencies for PySpark
 jobs. Please refer to `How to Manage Python Dependencies in PySpark
-<https://www.databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html>`_
+<https://www.databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html>`_
 for more details on PySpark dependency management.
 
 In short, to create a Python environment that can be sent to a remote cluster using
5 changes: 3 additions & 2 deletions python-package/xgboost/spark/__init__.py
@@ -1,5 +1,4 @@
"""PySpark XGBoost integration interface
"""
"""PySpark XGBoost integration interface"""

 try:
     import pyspark
@@ -10,6 +9,7 @@
     SparkXGBClassifier,
     SparkXGBClassifierModel,
     SparkXGBRanker,
+    SparkXGBRankerModel,
     SparkXGBRegressor,
     SparkXGBRegressorModel,
 )
@@ -20,4 +20,5 @@
"SparkXGBRegressor",
"SparkXGBRegressorModel",
"SparkXGBRanker",
"SparkXGBRankerModel",
]
