FIX-modin-project#3413: align documentation with updated project structure

Signed-off-by: Dmitry Chigarev <[email protected]>
dchigarev committed Oct 28, 2021
1 parent 7a81588 commit c553248
Showing 106 changed files with 1,220 additions and 1,152 deletions.
4 changes: 2 additions & 2 deletions docs/UsingOmnisci/index.rst
@@ -10,7 +10,7 @@ To enable this engine you could set the following environment variables:
.. code-block:: bash
export MODIN_ENGINE=native
-export MODIN_BACKEND=omnisci
+export MODIN_STORAGE_FORMAT=omnisci
export MODIN_EXPERIMENTAL=true
or turn it on in source code:
@@ -19,7 +19,7 @@ or turn it on in source code:
import modin.config as cfg
cfg.Engine.put('native')
-cfg.Backend.put('omnisci')
+cfg.StorageFormat.put('omnisci')
cfg.IsExperimental.put(True)
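
The rename shown in this hunk (`MODIN_BACKEND` to `MODIN_STORAGE_FORMAT`) can be applied mechanically to an existing set of environment variables. The following is a minimal sketch using only the standard library; `migrate_modin_env` and `RENAMED_ENV_VARS` are hypothetical names introduced for illustration and are not part of Modin. Only the one variable rename is taken from this diff.

```python
# Hypothetical helper (not part of Modin): renames the environment variable
# changed in this commit. Only the MODIN_BACKEND -> MODIN_STORAGE_FORMAT
# mapping comes from the diff; the helper itself is illustrative.
RENAMED_ENV_VARS = {"MODIN_BACKEND": "MODIN_STORAGE_FORMAT"}


def migrate_modin_env(env):
    """Return a copy of ``env`` with old Modin variable names replaced."""
    migrated = {}
    for name, value in env.items():
        new_name = RENAMED_ENV_VARS.get(name, name)
        # If both spellings are present, the value given under the new
        # name wins and the old spelling is dropped.
        if new_name in migrated and name != new_name:
            continue
        migrated[new_name] = value
    return migrated
```

The function returns a new dictionary rather than mutating its input, so it can be tried on a copy of `os.environ` before anything is changed for real.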
4 changes: 2 additions & 2 deletions docs/UsingPandasonRay/index.rst
@@ -15,12 +15,12 @@ If you want to be explicit, you could set the following environment variables:
.. code-block:: bash
export MODIN_ENGINE=ray
-export MODIN_BACKEND=pandas
+export MODIN_STORAGE_FORMAT=pandas
or turn it on in source code:

.. code-block:: python
import modin.config as cfg
cfg.Engine.put('ray')
-cfg.Backend.put('pandas')
+cfg.StorageFormat.put('pandas')
105 changes: 54 additions & 51 deletions docs/developer/architecture.rst
@@ -56,23 +56,23 @@ For the simplicity the other backend systems - Dask and MPI are omitted and only
* Query Executor is responsible for getting the Dataframe Algebra DAG, performing further optimizations based
on a selected backend execution subsystem and mapping or compiling the Dataframe Algebra DAG to an actual
execution sequence.
-* Backends module is responsible for mapping the abstract operation to an actual executor call, e.g. Pandas,
-PyArrow, custom backend.
+* Storage formats module is responsible for mapping the abstract operation to an actual executor call, e.g. Pandas,
+PyArrow, custom formats.
* Orchestration subsystem is responsible for spawning and controlling the actual execution environment for the
selected backend. It spawns the actual nodes, fires up the execution environment, e.g. Ray, monitors the state
of executors and provides telemetry

Component View
--------------
.. toctree::
-../flow/modin/engines/base/frame/index
-../flow/modin/engines/ray/generic
-../flow/modin/engines/ray/pandas_on_ray/frame/index
-../flow/modin/engines/ray/cudf_on_ray/frame/index
-../flow/modin/engines/dask/pandas_on_dask/frame/index
+../flow/modin/core/dataframe/pandas/index
+../flow/modin/core/execution/ray/generic
+../flow/modin/core/execution/ray/implementations/pandas_on_ray/index
+../flow/modin/core/execution/ray/implementations/cudf_on_ray/index
+../flow/modin/core/execution/dask/implementations/pandas_on_dask/index
../flow/modin/experimental/index
-../flow/modin/backends/index
-../flow/modin/engines/python/pandas_on_python/frame/index
+../flow/modin/core/storage_formats/index
+../flow/modin/core/execution/python/implementations/pandas_on_python/index


DataFrame Partitioning
@@ -285,65 +285,68 @@ by documentation for now, the rest is coming soon...).
├───docs
├───examples
├───modin
│ ├─── :doc:`backends </flow/modin/backends/index>`
│ │ ├───base
│ │ │ └─── :doc:`query_compiler </flow/modin/backends/base/query_compiler>`
│ │ ├─── :doc:`pandas </flow/modin/backends/pandas/index>`
│ │ | ├─── :doc:`parsers </flow/modin/backends/pandas/parsers>`
│ │ │ └─── :doc:`query_compiler </flow/modin/backends/pandas/query_compiler>`
│ │ └─── :doc:`pyarrow </flow/modin/backends/pyarrow/index>`
│ │ | ├─── :doc:`parsers </flow/modin/backends/pyarrow/parsers>`
│ │ │ └─── :doc:`query_compiler </flow/modin/backends/pyarrow/query_compiler>`
│ ├─── :doc:`config </flow/modin/config>`
│ ├───data_management
│ │ ├─── :doc:`factories </flow/modin/data_management/factories>`
│ │ └─── :doc:`functions </flow/modin/data_management/functions>`
│ ├───core
│ │ ├─── :doc:`dataframe </flow/modin/core/dataframe/index>`
│ │ │ ├─── :doc:`algebra </flow/modin/core/dataframe/algebra>`
│ │ │ ├─── :doc:`base </flow/modin/core/dataframe/base/index>`
│ │ │ │ ├───dataframe
│ │ │ │ └───partitioning
│ │ │ └─── :doc:`pandas </flow/modin/core/dataframe/pandas/index>`
│ │ ├───execution
│ │ │ ├───dask
│ │ │ │ ├───common
│ │ │ │ └───implementations
│ │ │ │ └─── :doc:`pandas_on_dask </flow/modin/core/execution/dask/implementations/pandas_on_dask/index>`
│ │ │ ├─── :doc:`dispatching </flow/modin/core/execution/dispatching>`
│ │ │ ├───python
│ │ │ │ └───implementations
│ │ │ │ └─── :doc:`pandas_on_python </flow/modin/core/execution/python/implementations/pandas_on_python/index>`
│ │ │ └───ray
│ │ │ ├───common
│ │ │ ├─── :doc:`generic </flow/modin/core/execution/ray/generic>`
│ │ │ └───implementations
│ │ │ ├─── :doc:`cudf_on_ray </flow/modin/core/execution/ray/implementations/cudf_on_ray/index>`
│ │ │ └─── :doc:`pandas_on_ray </flow/modin/core/execution/ray/implementations/pandas_on_ray/index>`
│ │ ├─── :doc:`io </flow/modin/core/io/index>`
│ │ └─── :doc:`storage_formats </flow/modin/core/storage_formats/index>`
│ │ ├─── :doc:`base </flow/modin/core/storage_formats/base/query_compiler>`
│ │ ├───cudf
│ │ ├─── :doc:`pandas </flow/modin/core/storage_formats/pandas/index>`
│ │ └─── :doc:`pyarrow </flow/modin/core/storage_formats/pyarrow/index>`
│ ├───distributed
│ │ └───dataframe
│ │ └─── :doc:`pandas </flow/modin/distributed/dataframe/pandas>`
│ ├───engines
│ │ ├───base
│ │ │ ├─── :doc:`frame </flow/modin/engines/base/frame/index>`
│ │ │ └─── :doc:`io </flow/modin/engines/base/io>`
│ │ ├───dask
│ │ │ └───pandas_on_dask
│ │ | └─── :doc:`frame </flow/modin/engines/dask/pandas_on_dask/frame/index>`
│ │ ├───python
│ │ │ └───pandas_on_python
│ │ │ └─── :doc:`frame </flow/modin/engines/python/pandas_on_python/frame/index>`
│ │ └───ray
│ │ ├─── :doc:`generic </flow/modin/engines/ray/generic>`
│ │ ├───cudf_on_ray
│ │ │ ├─── :doc:`frame </flow/modin/engines/ray/cudf_on_ray/frame/index>`
│ │ │ └─── :doc:`io </flow/modin/engines/ray/cudf_on_ray/io>`
│ │ └───pandas_on_ray
│ │ └─── :doc:`frame </flow/modin/engines/ray/pandas_on_ray/frame/index>`
│ ├── :doc:`experimental </flow/modin/experimental/experimental>`
│ │ ├─── :doc:`backends </flow/modin/experimental/backends/index>`
│ │ │ └─── :doc:`omnisci </flow/modin/experimental/backends/omnisci/index>`
│ │ │ └─── :doc:`query_compiler </flow/modin/experimental/backends/omnisci/query_compiler>`
│ │ ├───dataframe
│ │ │ └─── :doc:`pandas </flow/modin/distributed/dataframe/pandas>`
│ ├─── :doc:`experimental </flow/modin/experimental/experimental>`
│ │ ├───cloud
│ │ ├───engines
│ │ │ ├─── :doc:`omnisci_on_native </flow/modin/experimental/engines/omnisci_on_native/frame/index>`
│ │ │ ├─── :doc:`pandas_on_ray </flow/modin/experimental/engines/pandas_on_ray>`
│ │ │ └─── :doc:`pyarrow_on_ray </flow/modin/experimental/engines/pyarrow_on_ray>`
│ │ ├───core
│ │ │ ├───execution
│ │ │ │ ├───native
│ │ │ │ │ └───implementations
│ │ │ │ │ └─── :doc:`omnisci_on_native </flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/index>`
│ │ │ │ └───ray
│ │ │ │ └───implementations
│ │ │ │ ├─── :doc:`pandas_on_ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray>`
│ │ │ │ └─── :doc:`pyarrow_on_ray </flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray>`
│ │ │ └─── :doc:`storage_formats </flow/modin/experimental/core/storage_formats/index>`
│ │ │ └─── :doc:`omnisci </flow/modin/experimental/core/storage_formats/omnisci/index>`
│ │ ├─── :doc:`pandas </flow/modin/experimental/pandas>`
│ │ ├─── :doc:`sklearn </flow/modin/experimental/sklearn>`
│ │ ├───spreadsheet
│ │ ├───sql
│ │ └─── :doc:`xgboost </flow/modin/experimental/xgboost>`
│ ├───pandas
│ │ ├─── :doc:`dataframe </flow/modin/pandas/dataframe>`
│ │ └─── :doc:`series </flow/modin/pandas/series>`
│ ├───spreadsheet
│ └───sql
│ └───spreadsheet
├───requirements
├───scripts
└───stress_tests
.. _pandas Dataframe: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
.. _Arrow tables: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html
.. _Ray: https://github.com/ray-project/ray
-.. _code: https://github.com/modin-project/modin/blob/master/modin/engines/base/frame/data.py
+.. _code: https://github.com/modin-project/modin/blob/master/modin/core/dataframe
.. _Dask Futures: https://docs.dask.org/en/latest/futures.html
.. _issue: https://github.com/modin-project/modin/issues
.. _Discourse: https://discuss.modin.org
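
The documentation path changes in `docs/developer/architecture.rst` follow a small number of patterns. As a rough sketch, the rules below were inferred from the old/new path pairs visible in this diff; `PATH_RULES` and `new_doc_path` are hypothetical names, the rule set is not claimed to be complete, and nothing here is a tool shipped with Modin.

```python
import re

# Rewrite rules inferred from the old/new doc paths in this diff.
# Illustrative only: paths not covered by a rule are returned unchanged.
PATH_RULES = [
    # The base frame docs moved under core/dataframe/pandas.
    (r"modin/engines/base/frame/index", r"modin/core/dataframe/pandas/index"),
    # engines/<engine>/<impl>/frame/index ->
    # core/execution/<engine>/implementations/<impl>/index
    (r"modin/engines/(ray|dask|python)/(\w+)/frame/index",
     r"modin/core/execution/\1/implementations/\2/index"),
    (r"modin/engines/ray/generic", r"modin/core/execution/ray/generic"),
    # backends/... -> core/storage_formats/... (also under experimental/).
    (r"modin/(experimental/|)backends/", r"modin/\1core/storage_formats/"),
]


def new_doc_path(old_path):
    """Map an old docs path to its new location using the first matching rule."""
    for pattern, replacement in PATH_RULES:
        updated, count = re.subn(pattern, replacement, old_path)
        if count:
            return updated
    return old_path
```

Rules are tried in order, so the special case for `engines/base` is listed before the generic engine pattern.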
4 changes: 2 additions & 2 deletions docs/developer/partition_api.rst
@@ -23,15 +23,15 @@ You can find the specific implementation of Modin's Partition Interface in :doc:

Ray engine
----------
-However, it is worth noting that for Modin on ``Ray`` engine with ``pandas`` backend IPs of the remote partitions may not match
+However, it is worth noting that for Modin on ``Ray`` engine with ``pandas`` in-memory format IPs of the remote partitions may not match
actual locations if the partitions are smaller than 100 kB. Ray saves such objects (<= 100 kB, by default) in the in-process store
of the calling process (please refer to the `Ray documentation`_ for more information). We can't get IPs for such objects while maintaining good performance.
So, keep this in mind when unwrapping remote partitions with their IPs. Several options for handling this case are provided in the
``How to handle Ray objects that are lower than 100 kB`` section.

Dask engine
-----------
-There is no mentioned above issue for Modin on ``Dask`` engine with ``pandas`` backend because ``Dask`` saves any objects
+There is no mentioned above issue for Modin on ``Dask`` engine with ``pandas`` in-memory format because ``Dask`` saves any objects
in the worker process that processes a function (please, refer to `Dask documentation`_ for more information).
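
The size caveat for the Ray engine can be turned into a quick pre-check before relying on a partition's reported IP. The sketch below is an illustration under stated assumptions: `may_be_stored_in_process` is a hypothetical helper, the pickled size only approximates what Ray would actually serialize, and the byte value used for "100 kB" is assumed rather than taken from this text.

```python
import pickle

# Assumption: the text says "<= 100 kB, by default"; the exact byte count
# used here (100 * 1024) is a guess made for illustration.
ASSUMED_INLINE_THRESHOLD_BYTES = 100 * 1024


def may_be_stored_in_process(obj, threshold=ASSUMED_INLINE_THRESHOLD_BYTES):
    """Estimate whether ``obj`` is small enough that its reported IP may be wrong.

    Uses the pickled size as a rough proxy for the serialized size.
    """
    return len(pickle.dumps(obj)) <= threshold
```

A `True` result only signals that the reported IP deserves a second look; it does not say where the object is actually stored.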

How to handle Ray objects that are lower than 100 kB
4 changes: 2 additions & 2 deletions docs/experimental_features/modin_xgboost.rst
@@ -9,8 +9,8 @@ Install XGBoost on Modin
------------------------

Modin comes with all the dependencies except ``xgboost`` package by default.
-Currently, distributed XGBoost on Modin is only supported on the Ray backend, therefore, see
-the :doc:`installation page </installation>` for more information on installing Modin with the Ray backend.
+Currently, distributed XGBoost on Modin is only supported on the Ray execution engine, therefore, see
+the :doc:`installation page </installation>` for more information on installing Modin with the Ray engine.
To install ``xgboost`` package you can use ``pip``:

.. code-block:: bash
23 changes: 0 additions & 23 deletions docs/flow/modin/backends/pandas/query_compiler.rst

This file was deleted.

19 changes: 0 additions & 19 deletions docs/flow/modin/backends/pyarrow/query_compiler.rst

This file was deleted.

10 changes: 5 additions & 5 deletions docs/flow/modin/config.rst
@@ -36,17 +36,17 @@ API.
import os
-# Setting `MODIN_BACKEND` environment variable.
+# Setting `MODIN_STORAGE_FORMAT` environment variable.
# Also can be set outside the script.
-os.environ["MODIN_BACKEND"] = "OmniSci"
+os.environ["MODIN_STORAGE_FORMAT"] = "OmniSci"
import modin.config
import modin.pandas as pd
-# Checking initially set `Backend` config,
-# which corresponds to `MODIN_BACKEND` environment
+# Checking initially set `StorageFormat` config,
+# which corresponds to `MODIN_STORAGE_FORMAT` environment
# variable
-print(modin.config.Backend.get()) # prints 'Omnisci'
+print(modin.config.StorageFormat.get()) # prints 'Omnisci'
# Checking default value of `NPartitions`
print(modin.config.NPartitions.get()) # prints '8'
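
In the example above, the variable is set to `OmniSci` but the comment says the config prints `Omnisci`. One plausible reading, assumed here and not confirmed by this diff, is that string config values are stripped and title-cased when they are read. The sketch below reproduces only that assumed normalization with plain Python; `normalize_config_value` is a hypothetical name, not a Modin function.

```python
def normalize_config_value(raw):
    """Strip whitespace and title-case a raw string value.

    Assumption: this mirrors the normalization implied by the example's
    'Omnisci' output; it is not taken from Modin's source.
    """
    return raw.strip().title()
```

Under this assumption, `omnisci`, `OmniSci` and `OMNISCI` would all read back as the same value.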