diff --git a/docs/UsingOmnisci/index.rst b/docs/UsingOmnisci/index.rst index 44aee25fbc3..29ea1f5f198 100644 --- a/docs/UsingOmnisci/index.rst +++ b/docs/UsingOmnisci/index.rst @@ -10,7 +10,7 @@ To enable this engine you could set the following environment variables: .. code-block:: bash export MODIN_ENGINE=native - export MODIN_BACKEND=omnisci + export MODIN_STORAGE_FORMAT=omnisci export MODIN_EXPERIMENTAL=true or turn it on in source code: @@ -19,7 +19,7 @@ or turn it on in source code: import modin.config as cfg cfg.Engine.put('native') - cfg.Backend.put('omnisci') + cfg.StorageFormat.put('omnisci') cfg.IsExperimental.put(True) diff --git a/docs/UsingPandasonRay/index.rst b/docs/UsingPandasonRay/index.rst index c8cd8acbb72..b9ef5bc41d4 100644 --- a/docs/UsingPandasonRay/index.rst +++ b/docs/UsingPandasonRay/index.rst @@ -15,7 +15,7 @@ If you want to be explicit, you could set the following environment variables: .. code-block:: bash export MODIN_ENGINE=ray - export MODIN_BACKEND=pandas + export MODIN_STORAGE_FORMAT=pandas or turn it on in source code: @@ -23,4 +23,4 @@ or turn it on in source code: import modin.config as cfg cfg.Engine.put('ray') - cfg.Backend.put('pandas') + cfg.StorageFormat.put('pandas') diff --git a/docs/developer/architecture.rst b/docs/developer/architecture.rst index dcc323d6ce8..6aee8c5cce9 100644 --- a/docs/developer/architecture.rst +++ b/docs/developer/architecture.rst @@ -56,8 +56,8 @@ For the simplicity the other backend systems - Dask and MPI are omitted and only * Query Executor is responsible for getting the Dataframe Algebra DAG, performing further optimizations based on a selected backend execution subsystem and mapping or compiling the Dataframe Algebra DAG to and actual execution sequence. -* Backends module is responsible for mapping the abstract operation to an actual executor call, e.g. Pandas, - PyArrow, custom backend. 
+* Storage formats module is responsible for mapping the abstract operation to an actual executor call, e.g. Pandas, + PyArrow, custom formats. * Orchestration subsystem is responsible for spawning and controlling the actual execution environment for the selected backend. It spawns the actual nodes, fires up the execution environment, e.g. Ray, monitors the state of executors and provides telemetry @@ -65,14 +65,14 @@ For the simplicity the other backend systems - Dask and MPI are omitted and only Component View -------------- .. toctree:: - ../flow/modin/engines/base/frame/index - ../flow/modin/engines/ray/generic - ../flow/modin/engines/ray/pandas_on_ray/frame/index - ../flow/modin/engines/ray/cudf_on_ray/frame/index - ../flow/modin/engines/dask/pandas_on_dask/frame/index + ../flow/modin/core/dataframe/pandas/index + ../flow/modin/core/execution/ray/generic + ../flow/modin/core/execution/ray/implementations/pandas_on_ray/index + ../flow/modin/core/execution/ray/implementations/cudf_on_ray/index + ../flow/modin/core/execution/dask/implementations/pandas_on_dask/index ../flow/modin/experimental/index - ../flow/modin/backends/index - ../flow/modin/engines/python/pandas_on_python/frame/index + ../flow/modin/core/storage_formats/index + ../flow/modin/core/execution/python/implementations/pandas_on_python/index DataFrame Partitioning @@ -285,57 +285,60 @@ by documentation for now, the rest is coming soon...). 
├───docs ├───examples ├───modin - │ ├─── :doc:`backends ` - │ │ ├───base - │ │ │ └─── :doc:`query_compiler ` - │ │ ├─── :doc:`pandas ` - │ │ | ├─── :doc:`parsers ` - │ │ │ └─── :doc:`query_compiler ` - │ │ └─── :doc:`pyarrow ` - │ │ | ├─── :doc:`parsers ` - │ │ │ └─── :doc:`query_compiler ` │ ├─── :doc:`config ` - │ ├───data_management - │ │ ├─── :doc:`factories ` - │ │ └─── :doc:`functions ` + │ ├───core + │ │ ├─── :doc:`dataframe ` + │ │ │ ├─── :doc:`algebra ` + │ │ │ ├─── :doc:`base ` + │ │ │ │ ├───dataframe + │ │ │ │ └───partitioning + │ │ │ └─── :doc:`pandas ` + │ │ ├───execution + │ │ │ ├───dask + │ │ │ │ ├───common + │ │ │ │ └───implementations + │ │ │ │ └─── :doc:`pandas_on_dask ` + │ │ │ ├─── :doc:`dispatching ` + │ │ │ ├───python + │ │ │ │ └───implementations + │ │ │ │ └─── :doc:`pandas_on_python ` + │ │ │ └───ray + │ │ │ ├───common + │ │ │ ├─── :doc:`generic ` + │ │ │ └───implementations + │ │ │ ├─── :doc:`cudf_on_ray ` + │ │ │ └─── :doc:`pandas_on_ray ` + │ │ ├─── :doc:`io ` + │ │ └─── :doc:`storage_formats ` + │ │ ├─── :doc:`base ` + │ │ ├───cudf + │ │ ├─── :doc:`pandas ` + │ │ └─── :doc:`pyarrow ` │ ├───distributed - │ │ └───dataframe - │ │ └─── :doc:`pandas ` - │ ├───engines - │ │ ├───base - │ │ │ ├─── :doc:`frame ` - │ │ │ └─── :doc:`io ` - │ │ ├───dask - │ │ │ └───pandas_on_dask - │ │ | └─── :doc:`frame ` - │ │ ├───python - │ │ │ └───pandas_on_python - │ │ │ └─── :doc:`frame ` - │ │ └───ray - │ │ ├─── :doc:`generic ` - │ │ ├───cudf_on_ray - │ │ │ ├─── :doc:`frame ` - │ │ │ └─── :doc:`io ` - │ │ └───pandas_on_ray - │ │ └─── :doc:`frame ` - │ ├── :doc:`experimental ` - │ │ ├─── :doc:`backends ` - │ │ │ └─── :doc:`omnisci ` - │ │ │ └─── :doc:`query_compiler ` + │ │ ├───dataframe + │ │ │ └─── :doc:`pandas ` + │ ├─── :doc:`experimental ` │ │ ├───cloud - │ │ ├───engines - │ │ │ ├─── :doc:`omnisci_on_native ` - │ │ │ ├─── :doc:`pandas_on_ray ` - │ │ │ └─── :doc:`pyarrow_on_ray ` + │ │ ├───core + │ │ │ ├───execution + │ │ │ │ ├───native + │ │ │ │ │ 
└───implementations + │ │ │ │ │ └─── :doc:`omnisci_on_native ` + │ │ │ │ └───ray + │ │ │ │ └───implementations + │ │ │ │ ├─── :doc:`pandas_on_ray ` + │ │ │ │ └─── :doc:`pyarrow_on_ray ` + │ │ │ └─── :doc:`storage_formats ` + │ │ │ └─── :doc:`omnisci ` │ │ ├─── :doc:`pandas ` │ │ ├─── :doc:`sklearn ` + │ │ ├───spreadsheet │ │ ├───sql │ │ └─── :doc:`xgboost ` │ ├───pandas │ │ ├─── :doc:`dataframe ` │ │ └─── :doc:`series ` - │ ├───spreadsheet - │ └───sql + │ └───spreadsheet ├───requirements ├───scripts └───stress_tests @@ -343,7 +346,7 @@ by documentation for now, the rest is coming soon...). .. _pandas Dataframe: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html .. _Arrow tables: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html .. _Ray: https://github.com/ray-project/ray -.. _code: https://github.com/modin-project/modin/blob/master/modin/engines/base/frame/data.py +.. _code: https://github.com/modin-project/modin/blob/master/modin/core/dataframe .. _Dask Futures: https://docs.dask.org/en/latest/futures.html .. _issue: https://github.com/modin-project/modin/issues .. _Discourse: https://discuss.modin.org diff --git a/docs/developer/partition_api.rst b/docs/developer/partition_api.rst index 99c296c0ab4..666c54372fe 100644 --- a/docs/developer/partition_api.rst +++ b/docs/developer/partition_api.rst @@ -23,7 +23,7 @@ You can find the specific implementation of Modin's Partition Interface in :doc: Ray engine ---------- -However, it is worth noting that for Modin on ``Ray`` engine with ``pandas`` backend IPs of the remote partitions may not match +However, it is worth noting that for Modin on ``Ray`` engine with ``pandas`` in-memory format IPs of the remote partitions may not match actual locations if the partitions are lower than 100 kB. Ray saves such objects (<= 100 kB, by default) in in-process store of the calling process (please, refer to `Ray documentation`_ for more information). 
We can't get IPs for such objects while maintaining good performance. So, you should keep in mind this for unwrapping of the remote partitions with their IPs. Several options are provided to handle the case in @@ -31,7 +31,7 @@ So, you should keep in mind this for unwrapping of the remote partitions with th Dask engine ----------- -There is no mentioned above issue for Modin on ``Dask`` engine with ``pandas`` backend because ``Dask`` saves any objects +There is no mentioned above issue for Modin on ``Dask`` engine with ``pandas`` in-memory format because ``Dask`` saves any objects in the worker process that processes a function (please, refer to `Dask documentation`_ for more information). How to handle Ray objects that are lower than 100 kB diff --git a/docs/experimental_features/modin_xgboost.rst b/docs/experimental_features/modin_xgboost.rst index 0c5dcfd76d6..a1f10fea409 100644 --- a/docs/experimental_features/modin_xgboost.rst +++ b/docs/experimental_features/modin_xgboost.rst @@ -9,8 +9,8 @@ Install XGBoost on Modin ------------------------ Modin comes with all the dependencies except ``xgboost`` package by default. -Currently, distributed XGBoost on Modin is only supported on the Ray backend, therefore, see -the :doc:`installation page ` for more information on installing Modin with the Ray backend. +Currently, distributed XGBoost on Modin is only supported on the Ray execution engine, therefore, see +the :doc:`installation page ` for more information on installing Modin with the Ray engine. To install ``xgboost`` package you can use ``pip``: .. 
code-block:: bash diff --git a/docs/flow/modin/backends/pandas/query_compiler.rst b/docs/flow/modin/backends/pandas/query_compiler.rst deleted file mode 100644 index faec1678bd0..00000000000 --- a/docs/flow/modin/backends/pandas/query_compiler.rst +++ /dev/null @@ -1,23 +0,0 @@ -Pandas Query Compiler -""""""""""""""""""""" -:py:class:`~modin.backends.pandas.query_compiler.PandasQueryCompiler` is responsible for compiling -a set of known predefined functions and pairing those with dataframe algebra operators in the -:doc:`PandasFrame `, specifically for dataframes backed by -``pandas.DataFrame`` objects. - -Each :py:class:`~modin.backends.pandas.query_compiler.PandasQueryCompiler` contains an instance of -:py:class:`~modin.engines.base.frame.data.PandasFrame` which it queries to get the result. - -:py:class:`~modin.backends.pandas.query_compiler.PandasQueryCompiler` supports methods built by the :doc:`function module `. -If you want to add an implementation for a query compiler method, visit the function module documentation -to see whether the new operation fits one of the existing function templates and can be easily implemented -with them. - -Public API -'''''''''' -:py:class:`~modin.backends.pandas.query_compiler.PandasQueryCompiler` implements common query compilers API -defined by the :py:class:`~modin.backends.base.query_compiler.BaseQueryCompiler`. Some functionalities -are inherited from the base class, in the following section only overridden methods are presented. - -.. 
autoclass:: modin.backends.pandas.query_compiler.PandasQueryCompiler - :members: diff --git a/docs/flow/modin/backends/pyarrow/query_compiler.rst b/docs/flow/modin/backends/pyarrow/query_compiler.rst deleted file mode 100644 index 5a75fa47820..00000000000 --- a/docs/flow/modin/backends/pyarrow/query_compiler.rst +++ /dev/null @@ -1,19 +0,0 @@ -PyArrow Query Compiler -"""""""""""""""""""""" -:py:class:`~modin.backends.pyarrow.query_compiler.PyarrowQueryCompiler` is responsible for compiling efficient -DataFrame algebra queries for the :doc:`PyarrowOnRayFrame `, -the frames which are backed by ``pyarrow.Table`` objects. - -Each :py:class:`~modin.backends.pyarrow.query_compiler.PyarrowQueryCompiler` contains an instance of -:py:class:`~modin.experimental.engines.pyarrow_on_ray.frame.data.PyarrowOnRayFrame` which it queries to get the result. - -Public API -'''''''''' -:py:class:`~modin.backends.pyarrow.query_compiler.PyarrowQueryCompiler` implements common query compilers API -defined by the :py:class:`~modin.backends.base.query_compiler.BaseQueryCompiler`. Most functionalities -are inherited from :py:class:`~modin.backends.pandas.query_compiler.PandasQueryCompiler`, in the following -section only overridden methods are presented. - -.. autoclass:: modin.backends.pyarrow.query_compiler.PyarrowQueryCompiler - :members: - :show-inheritance: diff --git a/docs/flow/modin/config.rst b/docs/flow/modin/config.rst index 07dd241b362..e38a0562f9c 100644 --- a/docs/flow/modin/config.rst +++ b/docs/flow/modin/config.rst @@ -36,17 +36,17 @@ API. import os - # Setting `MODIN_BACKEND` environment variable. + # Setting `MODIN_STORAGE_FORMAT` environment variable. # Also can be set outside the script. 
- os.environ["MODIN_BACKEND"] = "OmniSci" + os.environ["MODIN_STORAGE_FORMAT"] = "OmniSci" import modin.config import modin.pandas as pd - # Checking initially set `Backend` config, - # which corresponds to `MODIN_BACKEND` environment + # Checking initially set `StorageFormat` config, + # which corresponds to `MODIN_STORAGE_FORMAT` environment # variable - print(modin.config.Backend.get()) # prints 'Omnisci' + print(modin.config.StorageFormat.get()) # prints 'Omnisci' # Checking default value of `NPartitions` print(modin.config.NPartitions.get()) # prints '8' diff --git a/docs/flow/modin/data_management/functions.rst b/docs/flow/modin/core/dataframe/algebra.rst similarity index 83% rename from docs/flow/modin/data_management/functions.rst rename to docs/flow/modin/core/dataframe/algebra.rst index b81e71512a7..c2f92cabea7 100644 --- a/docs/flow/modin/data_management/functions.rst +++ b/docs/flow/modin/core/dataframe/algebra.rst @@ -1,12 +1,12 @@ :orphan: -Function Module Description +Operators Module Description """"""""""""""""""""""""""" Brief description ''''''''''''''''' Most of the functions that are evaluated by `QueryCompiler` can be categorized into -one of the patterns: Map, MapReduce, Binary functions, Fold functions, etc. The ``modin.data_management.functions.Function`` +one of the patterns: Map, TreeReduce, Binary, Reduce, etc, called operators. The ``modin.core.dataframe.algebra`` module provides templates to easily build such types of functions. These templates are supposed to be used at the `QueryCompiler` level since each built function accepts and returns `QueryCompiler`. @@ -21,11 +21,11 @@ would take one of the pandas object: ``pandas.DataFrame``, ``pandas.Series`` or .. note:: Currently, functions that are built in that way are supported only in a pandas - backend (i.e. can be used only in `PandasQueryCompiler`). + storage format (i.e. can be used only in `PandasQueryCompiler`). 
-Function module provides templates for this type of function: +Algebra module provides templates for this type of function: -Map functions +Map operator ------------- Uniformly apply a function argument to each partition in parallel. **Note**: map function should not change the shape of the partitions. @@ -33,7 +33,7 @@ Uniformly apply a function argument to each partition in parallel. .. figure:: /img/map_evaluation.svg :align: center -Reduction functions +Reduction operator ------------------- Applies an argument function that reduces each column or row on the specified axis into a scalar, but requires knowledge about the whole axis. Be aware that providing this knowledge may be expensive because the execution engine has to @@ -43,14 +43,14 @@ that the reduction function returns a one dimensional frame. .. figure:: /img/reduce_evaluation.svg :align: center -MapReduce Functions +TreeReduce operator ------------------- Applies an argument function that reduces specified axis into a scalar. First applies map function to each partition in parallel, then concatenates resulted partitions along the specified axis and applies reduction function. In contrast with `Map function` template, here you're allowed to change partition shape in the map phase. Note that the execution engine expects that the reduction function returns a one dimensional frame. -Binary functions +Binary operator ---------------- Applies an argument function, that takes exactly two operands (first is always `QueryCompiler`). If both operands are query compilers then the execution engine broadcasts partitions of @@ -65,46 +65,46 @@ the right operand to the left. it automatically but note that this requires repartitioning, which is a much more expensive operation than the binary function itself. -Fold functions +Fold operator -------------- Applies an argument function that requires knowledge of the whole axis. 
Be aware that providing this knowledge may be expensive because the execution engine has to concatenate partitions along the specified axis. -GroupBy functions +GroupBy operator ----------------- Evaluates GroupBy aggregation for that type of functions that can be executed via MapReduce approach. To be able to form groups engine broadcasts ``by`` partitions to each partition of the source frame. -Default-to-pandas functions +Default-to-pandas operator --------------------------- Do :ref:`fallback to pandas ` for passed function. How to register your own function ''''''''''''''''''''''''''''''''' -Let's examine an example of how to use the function module to create your own +Let's examine an example of how to use the algebra module to create your own new functions. Imagine you have a complex aggregation that can be implemented into a single query but doesn't have any implementation in pandas API. If you know how to implement this aggregation efficiently in a distributed frame, you may want to use one of the above described -patterns (e.g. ``MapReduceFunction``). +patterns (e.g. ``TreeReduce``). Let's implement a function that counts non-NA values for each column or row (``pandas.DataFrame.count``). First, we need to determine the function type. -MapReduce approach would be great: in a map phase, we'll count non-NA cells in each +TreeReduce approach would be great: in a map phase, we'll count non-NA cells in each partition in parallel and then just sum its results in the reduce phase. -To define the MapReduce function that does `count` + `sum` we just need to register the +To define the TreeReduce function that does `count` + `sum` we just need to register the appropriate functions and then assign the result to the picked `QueryCompiler` (`PandasQueryCompiler` in our case): .. 
code-block:: python - from modin.backends import PandasQueryCompiler - from modin.data_management.functions import MapReduceFunction + from modin.core.storage_formats import PandasQueryCompiler + from modin.core.dataframe.algebra import TreeReduce - PandasQueryCompiler.custom_count = MapReduceFunction.register(pandas.DataFrame.count, pandas.DataFrame.sum) + PandasQueryCompiler.custom_count = TreeReduce.register(pandas.DataFrame.count, pandas.DataFrame.sum) Then, we want to handle it from the DataFrame, so we need to create a way to do that: diff --git a/docs/flow/modin/core/dataframe/base/index.rst b/docs/flow/modin/core/dataframe/base/index.rst new file mode 100644 index 00000000000..b1ee93680e9 --- /dev/null +++ b/docs/flow/modin/core/dataframe/base/index.rst @@ -0,0 +1,5 @@ +Base Dataframe Interface +======================== + +Common interfaces for Dataframe objects are not defined yet. Currently, all of the implementations +inherit :doc:`Dataframe implementation for pandas storage format`. diff --git a/docs/flow/modin/core/dataframe/index.rst b/docs/flow/modin/core/dataframe/index.rst new file mode 100644 index 00000000000..5d5bb76301c --- /dev/null +++ b/docs/flow/modin/core/dataframe/index.rst @@ -0,0 +1,26 @@ +Base Dataframe Objects +====================== + +Modin partitions data to scale efficiently. +To keep track of everything a few key classes are introduced: ``Dataframe``, ``Partition``, ``AxisPartition`` and ``PartitionManager``. + +* `Dataframe` is the class conforming to DataFrame Algebra. +* `Partition` is an element of a NxM grid which, when combined, represents the ``Dataframe`` +* `AxisPartition` is a joined group of ``Partition``-s along some axis (either rows or labels) +* `PartitionManager` is the manager that implements the primitives used for DataFrame Algebra operations over ``Partition``-s + +Each :doc:`storage format ` may have its own implementations of these Dataframe's entities. 
+Current stable implementations are the following: + +* :doc:`Base Dataframe ` defines a common interface and algebra operators for `Dataframe` implementations. +* :doc:`Pandas Dataframe ` is an implementation for any frame class of :doc:`pandas storage format `. + +.. note:: + At the current stage of Modin development, the base interfaces of the Dataframe objects are not defined yet. + So for now the origin of all changes in the Dataframe interfaces is the :doc:`Dataframe for pandas storage format`. + +.. toctree:: + :hidden: + + base/index + pandas/index diff --git a/docs/flow/modin/core/dataframe/pandas/dataframe.rst b/docs/flow/modin/core/dataframe/pandas/dataframe.rst new file mode 100644 index 00000000000..5d788aaa0be --- /dev/null +++ b/docs/flow/modin/core/dataframe/pandas/dataframe.rst @@ -0,0 +1,33 @@ +PandasDataframe +""""""""""""""" + +The class is base for any frame class of ``pandas`` storage format and serves as the intermediate level +between ``pandas`` query compiler and conforming partition manager. All queries formed +at the query compiler layer are ingested by this class and then conveyed jointly with the stored partitions +into the partition manager for processing. Direct partitions manipulation by this class is prohibited except +cases if an operation is strictly private or protected and called inside of the class only. The class provides +significantly reduced set of operations that fit plenty of pandas operations. + +Main tasks of ``Modin PandasDataframe`` are storage of partitions, manipulation with labels of axes and +providing set of methods to perform operations on the internal data. + +As mentioned above, ``PandasDataframe`` shouldn't work with stored partitions directly and +the responsibility for modifying partitions array has to lay on :doc:`partitioning/partition_manager`. 
For example, method +:meth:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe.broadcast_apply_full_axis` redirects applying +function to ``PandasDataframePartitionManager.broadcast_axis_partitions`` method. + +``Modin PandasDataframe`` can be created from ``pandas.DataFrame``, ``pyarrow.Table`` +(methods :meth:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe.from_pandas`, +:meth:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe.from_arrow` are used respectively). Also, +``PandasDataframe`` can be converted to ``np.array``, ``pandas.DataFrame`` +(methods :meth:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe.to_numpy`, +:meth:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe.to_pandas` are used respectively). + +Manipulation with labels of axes happens using internal methods for changing labels on the new, +adding prefixes/suffixes etc. + +Public API +---------- + +.. autoclass:: modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe + :members: diff --git a/docs/flow/modin/core/dataframe/pandas/index.rst b/docs/flow/modin/core/dataframe/pandas/index.rst new file mode 100644 index 00000000000..323d7ba8831 --- /dev/null +++ b/docs/flow/modin/core/dataframe/pandas/index.rst @@ -0,0 +1,15 @@ +Pandas Dataframe Objects +======================== + +* :doc:`PandasDataframe ` is the class conforming to DataFrame Algebra. +* :doc:`PandasDataframePartition ` implements ``Partition`` interface holding ``pandas.DataFrame``. +* :doc:`PandasDataframeAxisPartition ` is a joined group of ``PandasDataframePartition``-s along some axis (either rows or labels) +* :doc:`PandasDataframePartitionManager ` is the manager that implements the primitives used for DataFrame Algebra operations over ``PandasDataframePartition``-s + +.. 
toctree:: + :hidden: + + dataframe + partitioning/partition + partitioning/axis_partition + partitioning/partition_manager \ No newline at end of file diff --git a/docs/flow/modin/core/dataframe/pandas/partitioning/axis_partition.rst b/docs/flow/modin/core/dataframe/pandas/partitioning/axis_partition.rst new file mode 100644 index 00000000000..a82d8698335 --- /dev/null +++ b/docs/flow/modin/core/dataframe/pandas/partitioning/axis_partition.rst @@ -0,0 +1,44 @@ +BaseDataframeAxisPartition +"""""""""""""""""""""""""" + +The class is base for any axis partition class and serves as the last level on which +operations that were conveyed from the partition manager are being performed on an entire column or row. + +The class provides an API that has to be overridden by the child classes in order to manipulate +on a list of block partitions (making up column or row partition) they store. + +The procedures that use this class and its methods assume that they have some global knowledge +about the entire axis. This may require the implementation to use concatenation or append on the +list of block partitions. + +The ``PandasDataframeAxisPartition`` object that controls these objects (through the API exposed here) has an invariant +that requires that this object is never returned from a function. It assumes that there will always be +``PandasDataframeAxisPartition`` object stored and structures itself accordingly. + +.. warning:: + The location of the ``BaseDataframeAxisPartition`` class in the `pandas` implementation of Modin Dataframe objects is a legacy. + It's more likely to be made a `base` implementation of the `AxisPartition` and moved to the ``dataframe/base/partitioning`` + directory soon. + +Public API +---------- + +.. autoclass:: modin.core.dataframe.pandas.partitioning.axis_partition.BaseDataframeAxisPartition + :members: + +PandasDataframeAxisPartition +"""""""""""""""""""""""""""" + +The class is base for any axis partition class of ``pandas`` storage format. 
+ +Subclasses must implement ``list_of_blocks`` which represents data wrapped by the ``PandasDataframePartition`` +objects and creates something interpretable as a ``pandas.DataFrame``. + +See ``modin.core.execution.ray.implementations.pandas_on_ray.partitioning.axis_partition.PandasOnRayDataframeAxisPartition`` +for an example on how to override/use this class when the implementation needs to be augmented. + +Public API +---------- + +.. autoclass:: modin.core.dataframe.pandas.partitioning.axis_partition.PandasDataframeAxisPartition + :members: diff --git a/docs/flow/modin/engines/base/frame/partition.rst b/docs/flow/modin/core/dataframe/pandas/partitioning/partition.rst similarity index 58% rename from docs/flow/modin/engines/base/frame/partition.rst rename to docs/flow/modin/core/dataframe/pandas/partitioning/partition.rst index bcc03d70643..e567709037f 100644 --- a/docs/flow/modin/engines/base/frame/partition.rst +++ b/docs/flow/modin/core/dataframe/pandas/partitioning/partition.rst @@ -1,20 +1,20 @@ -PandasFramePartition -"""""""""""""""""""" +PandasDataframePartition +"""""""""""""""""""""""" -The class is base for any partition class of ``pandas`` backend and serves as the last level +The class is base for any partition class of ``pandas`` storage format and serves as the last level on which operations that were conveyed from the partition manager are being performed on an individual block partition. The class provides an API that has to be overridden by child classes in order to manipulate on data and metadata they store. -The public API exposed by the children of this class is used in ``PandasFramePartitionManager``. +The public API exposed by the children of this class is used in ``PandasDataframePartitionManager``. 
+The objects wrapped by the child classes are treated as immutable by ``PandasDataframePartitionManager`` subclasses and no logic for updating inplace. Public API ---------- -.. autoclass:: modin.engines.base.frame.partition.PandasFramePartition +.. autoclass:: modin.core.dataframe.pandas.partitioning.partition.PandasDataframePartition :members: diff --git a/docs/flow/modin/engines/base/frame/partition_manager.rst b/docs/flow/modin/core/dataframe/pandas/partitioning/partition_manager.rst similarity index 81% rename from docs/flow/modin/engines/base/frame/partition_manager.rst rename to docs/flow/modin/core/dataframe/pandas/partitioning/partition_manager.rst index 61754357eb0..177d717b2cf 100644 --- a/docs/flow/modin/engines/base/frame/partition_manager.rst +++ b/docs/flow/modin/core/dataframe/pandas/partitioning/partition_manager.rst @@ -1,14 +1,14 @@ -PandasFramePartitionManager -""""""""""""""""""""""""""" +PandasDataframePartitionManager +""""""""""""""""""""""""""""""" -The class is base for any partition manager class of ``pandas`` backend and serves as -intermediate level between ``pandas`` base frame and conforming :doc:`partition ` class. +The class is base for any partition manager class of ``pandas`` storage format and serves as +intermediate level between :doc:`Modin PandasDataframe <../dataframe>` and conforming :doc:`partition ` class. The class is responsible for partitions manipulation and applying a function to individual partitions: block partitions, row partitions or column partitions, i.e. the class can form axis partitions from block partitions to apply a function if an operation requires access to an entire column or row. 
The class translates frame API into partition API and also can have some preprocessing operations depending on the partition type for improving performance (for example, -:meth:`~modin.engines.base.frame.partition_manager.PandasFramePartitionManager.preprocess_func`). +:meth:`~modin.core.dataframe.pandas.partitioning.partition_manager.PandasDataframePartitionManager.preprocess_func`). Main task of partition manager is to keep knowledge of how partitions are stored and managed internal to itself, so surrounding code could use it via lean enough API without worrying about @@ -41,5 +41,5 @@ as well as manages conversion to numpy and pandas representations. Public API ---------- -.. autoclass:: modin.engines.base.frame.partition_manager.PandasFramePartitionManager +.. autoclass:: modin.core.dataframe.pandas.partitioning.partition_manager.PandasDataframePartitionManager :members: diff --git a/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/dataframe.rst b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/dataframe.rst new file mode 100644 index 00000000000..196a6d993a6 --- /dev/null +++ b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/dataframe.rst @@ -0,0 +1,12 @@ +PandasOnDaskDataframe +""""""""""""""""""""" + +The class is the specific implementation of the dataframe algebra for the `Dask` execution engine. +It serves as an intermediate level between ``pandas`` query compiler and +:py:class:`~modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframePartitionManager`. + +Public API +---------- + +.. 
autoclass:: modin.core.execution.dask.implementations.pandas_on_dask.dataframe.dataframe.PandasOnDaskDataframe + :members: diff --git a/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/index.rst b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/index.rst new file mode 100644 index 00000000000..b3ab0634b04 --- /dev/null +++ b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/index.rst @@ -0,0 +1,18 @@ +PandasOnDask Dataframe implementation +===================================== + +This page describes the implementation of :doc:`base Dataframe Objects ` +specific for `PandasOnDask` backend. + +* :doc:`Dataframe ` +* :doc:`Partition ` +* :doc:`AxisPartition ` +* :doc:`PartitionManager ` + +.. toctree:: + :hidden: + + dataframe + partitioning/partition + partitioning/axis_partition + partitioning/partition_manager diff --git a/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/axis_partition.rst b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/axis_partition.rst new file mode 100644 index 00000000000..62c5489909f --- /dev/null +++ b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/axis_partition.rst @@ -0,0 +1,30 @@ +PandasOnDaskDataframeAxisPartition +"""""""""""""""""""""""""""""""""" + +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.axis_partition.PandasDataframeAxisPartition`, +providing the API to perform operations on an axis (column or row) partition using Dask as the execution engine. +The axis partition is a wrapper over a list of block partitions that are stored in this class. + +Public API +---------- + +.. autoclass:: modin.core.execution.dask.implementations.pandas_on_dask.partitioning.axis_partition.PandasOnDaskDataframeAxisPartition + :members: + +PandasOnDaskDataframeColumnPartition +"""""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. 
autoclass:: modin.core.execution.dask.implementations.pandas_on_dask.partitioning.axis_partition.PandasOnDaskDataframeColumnPartition + :members: + +PandasOnDaskDataframeRowPartition +""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.core.execution.dask.implementations.pandas_on_dask.partitioning.axis_partition.PandasOnDaskDataframeRowPartition + :members: diff --git a/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/partition.rst b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/partition.rst new file mode 100644 index 00000000000..d2d954ed597 --- /dev/null +++ b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/partition.rst @@ -0,0 +1,25 @@ +PandasOnDaskDataframePartition +"""""""""""""""""""""""""""""" + +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition.PandasDataframePartition`, +providing the API to perform operations on a block partition, namely, ``pandas.DataFrame``, using Dask as the execution engine. + +In addition to wrapping a ``pandas.DataFrame``, the class also holds the following metadata: + +* ``length`` - length of ``pandas.DataFrame`` wrapped +* ``width`` - width of ``pandas.DataFrame`` wrapped +* ``ip`` - node IP address that holds ``pandas.DataFrame`` wrapped + +An operation on a block partition can be performed in two modes: + +* asynchronously_ - via :meth:`~modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframePartition.apply` +* lazily_ - via :meth:`~modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframePartition.add_to_apply_calls` + +Public API +---------- + +.. autoclass:: modin.core.execution.dask.implementations.pandas_on_dask.partitioning.PandasOnDaskDataframePartition + :members: + + .. _asynchronously: https://en.wikipedia.org/wiki/Asynchrony_(computer_programming) + .. 
_lazily: https://en.wikipedia.org/wiki/Lazy_evaluation diff --git a/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/partition_manager.rst b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/partition_manager.rst new file mode 100644 index 00000000000..6077b257c14 --- /dev/null +++ b/docs/flow/modin/core/execution/dask/implementations/pandas_on_dask/partitioning/partition_manager.rst @@ -0,0 +1,12 @@ +PandasOnDaskDataframePartitionManager +""""""""""""""""""""""""""""""""""""" + +This class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition_manager.PandasDataframePartitionManager` +using Dask as the execution engine. This class is responsible for partition manipulation and applying a function to +block/row/column partitions. + +Public API +---------- + +.. autoclass:: modin.core.execution.dask.implementations.pandas_on_dask.partitioning.partition_manager.PandasOnDaskDataframePartitionManager + :members: diff --git a/docs/flow/modin/core/execution/dispatching.rst b/docs/flow/modin/core/execution/dispatching.rst new file mode 100644 index 00000000000..2ad440e7205 --- /dev/null +++ b/docs/flow/modin/core/execution/dispatching.rst @@ -0,0 +1,54 @@ +:orphan: + +.. + TODO: add links to documentation for mentioned modules. + +Factories Module Description +"""""""""""""""""""""""""""" + +Brief description +''''''''''''''''' +Modin has several execution engines and storage formats; combining an engine with a storage format forms a backend. +Calling any DataFrame API function will end up in some backend-specific method. The responsibility of dispatching high-level API calls to +backend-specific functions belongs to the :ref:`QueryCompiler `, which is determined at the time of the dataframe's creation by the factory of +the corresponding backend. 
The mission of this module is to route IO function calls from +the API level to their actual backend-specific implementations, which build the +`QueryCompiler` of the appropriate backend. + +Backend representation via Factories +'''''''''''''''''''''''''''''''''''' +A backend is a combination of the :doc:`storage format ` and execution engine. +For example, the ``PandasOnRay`` backend means the combination of the `pandas storage format` and `Ray` execution engine. + +Each storage format has its own :ref:`Query Compiler ` which compiles the most efficient queries +for the corresponding :doc:`Modin Dataframe ` implementation. For the ``PandasOnRay`` +backend, the Query Compiler is :doc:`PandasQueryCompiler ` and the +Dataframe implementation is :doc:`PandasDataframe `, +which is the general implementation for every backend of the pandas storage format. The actual implementation of the ``PandasOnRay`` frame +is defined by the :doc:`PandasOnRayDataframe ` class that +extends ``PandasDataframe``. + +In the scope of this module, each backend is represented with a factory class located in +``modin/core/execution/dispatching/factories/factories.py``. Each factory contains a field that identifies the IO module of the corresponding backend. This IO module is +responsible for dispatching calls of IO functions to their actual implementations in the +underlying IO module. For more information about the IO module, visit the :doc:`related doc `. + +Factory Dispatcher +'''''''''''''''''' +The ``modin.core.execution.dispatching.factories.dispatcher.FactoryDispatcher`` class provides +public methods whose interface corresponds to pandas IO functions; the only difference is that they return the `QueryCompiler` of the +selected backend instead of a DataFrame. ``FactoryDispatcher`` is responsible for routing +these IO calls to the factory which represents the selected backend. + +So when you call the ``read_csv()`` function and your backend is ``PandasOnRay``, the +trace would be the following: + +.. 
figure:: /img/factory_dispatching.svg + :align: center + +``modin.pandas.read_csv`` calls ``FactoryDispatcher.read_csv``, which calls ``.read_csv`` +function of the factory of the selected backend, in our case it's ``PandasOnRayFactory._read_csv``, +which in turn forwards this call to the actual implementation of ``read_csv`` — to the +``PandasOnRayIO.read_csv``. The result of ``modin.pandas.read_csv`` will return a Modin +DataFrame with the appropriate `QueryCompiler` bound to it, which is responsible for +dispatching all of the further function calls. diff --git a/docs/flow/modin/core/execution/python/implementations/pandas_on_python/dataframe.rst b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/dataframe.rst new file mode 100644 index 00000000000..3d10b3966dc --- /dev/null +++ b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/dataframe.rst @@ -0,0 +1,13 @@ +PandasOnPythonDataframe +""""""""""""""""""""""" + +The class is specific implementation of :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` +for `Python` execution engine. It serves as an intermediate level between +:py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` and +:py:class:`~modin.core.execution.python.implementations.pandas_on_python.partitioning.partition_manager.PandasOnPythonDataframePartitionManager`. + +Public API +---------- + +.. 
autoclass:: modin.core.execution.python.implementations.pandas_on_python.dataframe.dataframe.PandasOnPythonDataframe + :members: \ No newline at end of file diff --git a/docs/flow/modin/core/execution/python/implementations/pandas_on_python/index.rst b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/index.rst new file mode 100644 index 00000000000..aa27af03a5d --- /dev/null +++ b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/index.rst @@ -0,0 +1,20 @@ +PandasOnPython Dataframe implementation +======================================= + +This page describes the implementation of :doc:`base Dataframe Objects ` +specific for `PandasOnPython` backend. Since the Python engine doesn't allow computation parallelization, +operations on partitions are performed sequentially. Without parallelization there is no +performance speed-up, so ``PandasOnPython`` is used for testing purposes only. + +* :doc:`Dataframe ` +* :doc:`Partition ` +* :doc:`AxisPartition ` +* :doc:`PartitionManager ` + +.. toctree:: + :hidden: + + dataframe + partitioning/partition + partitioning/axis_partition + partitioning/partition_manager \ No newline at end of file diff --git a/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/axis_partition.rst b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/axis_partition.rst new file mode 100644 index 00000000000..d0d5d7a1c96 --- /dev/null +++ b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/axis_partition.rst @@ -0,0 +1,30 @@ +PandasOnPythonDataframeAxisPartition +"""""""""""""""""""""""""""""""""""" + +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.axis_partition.PandasDataframeAxisPartition`, +providing the API to perform operations on an axis partition, using Python +as the execution engine. 
The axis partition is made up of a list of block +partitions that are stored in this class. + +Public API +---------- + +.. autoclass:: modin.core.execution.python.implementations.pandas_on_python.partitioning.axis_partition.PandasOnPythonDataframeAxisPartition + +PandasOnPythonDataframeColumnPartition +"""""""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.core.execution.python.implementations.pandas_on_python.partitioning.axis_partition.PandasOnPythonDataframeColumnPartition + :members: + +PandasOnPythonDataframeRowPartition +""""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.core.execution.python.implementations.pandas_on_python.partitioning.axis_partition.PandasOnPythonDataframeRowPartition + :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/python/pandas_on_python/frame/partition.rst b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/partition.rst similarity index 50% rename from docs/flow/modin/engines/python/pandas_on_python/frame/partition.rst rename to docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/partition.rst index 73d457682f0..6899ae9f1d7 100644 --- a/docs/flow/modin/engines/python/pandas_on_python/frame/partition.rst +++ b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/partition.rst @@ -1,7 +1,7 @@ -PandasOnPythonFramePartition -"""""""""""""""""""""""""""" +PandasOnPythonDataframePartition +"""""""""""""""""""""""""""""""" -The class is specific implementation of :py:class:`~modin.engines.base.frame.partition.PandasFramePartition`, +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition.PandasDataframePartition`, providing the API to perform operations on a block partition using Python as the execution engine. 
In addition to wrapping a ``pandas.DataFrame``, the class also holds the following metadata: @@ -11,17 +11,17 @@ In addition to wrapping a ``pandas.DataFrame``, the class also holds the followi An operation on a block partition can be performed in two modes: -* immediately via :meth:`~modin.engines.python.pandas_on_python.frame.partition.PandasOnPythonFramePartition.apply` - +* immediately via :meth:`~modin.core.execution.python.implementations.pandas_on_python.partitioning.partition.PandasOnPythonDataframePartition.apply` - in this case accumulated call queue and new function will be executed immediately. -* lazily_ via :meth:`~modin.engines.python.pandas_on_python.frame.partition.PandasOnPythonFramePartition.add_to_apply_calls` - +* lazily_ via :meth:`~modin.core.execution.python.implementations.pandas_on_python.partitioning.partition.PandasOnPythonDataframePartition.add_to_apply_calls` - in this case function will be added to the call queue and no computations will be done at the moment. Public API ---------- -.. autoclass:: modin.engines.python.pandas_on_python.frame.partition.PandasOnPythonFramePartition +.. autoclass:: modin.core.execution.python.implementations.pandas_on_python.partitioning.partition.PandasOnPythonDataframePartition :members: .. 
_lazily: https://en.wikipedia.org/wiki/Lazy_evaluation \ No newline at end of file diff --git a/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/partition_manager.rst b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/partition_manager.rst new file mode 100644 index 00000000000..06058a70beb --- /dev/null +++ b/docs/flow/modin/core/execution/python/implementations/pandas_on_python/partitioning/partition_manager.rst @@ -0,0 +1,12 @@ +PandasOnPythonDataframePartitionManager +""""""""""""""""""""""""""""""""""""""" + +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition_manager.PandasDataframePartitionManager` +using Python as the execution engine. This class is responsible for partition manipulation and applying +a function to block/row/column partitions. + +Public API +---------- + +.. autoclass:: modin.core.execution.python.implementations.pandas_on_python.partitioning.partition_manager.PandasOnPythonDataframePartitionManager + :members: \ No newline at end of file diff --git a/docs/flow/modin/core/execution/ray/generic.rst b/docs/flow/modin/core/execution/ray/generic.rst new file mode 100644 index 00000000000..042edbe015b --- /dev/null +++ b/docs/flow/modin/core/execution/ray/generic.rst @@ -0,0 +1,19 @@ +:orphan: + +Generic Ray-based members +========================= + +Objects which are storage format agnostic but require specific Ray implementation +are placed in ``modin.core.execution.ray.generic``. + +Their purpose is to implement certain parallel I/O operations and to serve +as a foundation for building storage format specific objects: + +* :py:class:`~modin.core.execution.ray.generic.io.io.RayIO` -- implements parallel :meth:`~modin.core.execution.ray.generic.io.io.RayIO.to_csv` and :meth:`~modin.core.execution.ray.generic.io.io.RayIO.to_sql`. 
+* :py:class:`~modin.core.execution.ray.generic.partitioning.partition_manager.GenericRayDataframePartitionManager` -- implements parallel :meth:`~modin.core.execution.ray.generic.partitioning.partition_manager.GenericRayDataframePartitionManager.to_numpy`. + +.. autoclass:: modin.core.execution.ray.generic.io.io.RayIO + :members: + +.. autoclass:: modin.core.execution.ray.generic.partitioning.partition_manager.GenericRayDataframePartitionManager + :members: diff --git a/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/dataframe.rst b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/dataframe.rst new file mode 100644 index 00000000000..99f3423f655 --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/dataframe.rst @@ -0,0 +1,13 @@ +cuDFOnRayDataframe +"""""""""""""""""" + +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` +class using Ray distributed engine. It serves as an intermediate level between +:py:class:`~modin.core.storage_formats.cudf.query_compiler.cuDFQueryCompiler` and +:py:class:`~modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition_manager.cuDFOnRayDataframePartitionManager`. + +Public API +---------- + +.. 
autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.dataframe.dataframe.cuDFOnRayDataframe + :members: diff --git a/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/index.rst b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/index.rst new file mode 100644 index 00000000000..1d4174b346d --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/index.rst @@ -0,0 +1,20 @@ +cuDFOnRay Dataframe Implementation +================================== + +Modin implements ``Dataframe``, ``PartitionManager``, ``AxisPartition``, ``Partition`` and +``GPUManager`` classes specific for ``cuDFOnRay`` backend: + +* :doc:`Dataframe ` +* :doc:`Partition ` +* :doc:`AxisPartition ` +* :doc:`PartitionManager ` +* :doc:`GPUManager ` + +.. toctree:: + :hidden: + + dataframe + partitioning/partition + partitioning/axis_partition + partitioning/partition_manager + partitioning/gpu_manager \ No newline at end of file diff --git a/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/io.rst b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/io.rst new file mode 100644 index 00000000000..81483b1849b --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/io.rst @@ -0,0 +1,29 @@ +:orphan: + +IO details in cuDFOnRay backend +""""""""""""""""""""""""""""""" + +IO on cuDFOnRay backend is implemented using base classes ``BaseIO`` and ``CSVDispatcher``. + +cuDFOnRayIO +""""""""""" + +The class ``cuDFOnRayIO`` implements ``BaseIO`` base class using cuDFOnRay-backend +entities (``cuDFOnRayDataframe``, ``cuDFOnRayDataframePartition`` etc.). + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.io.io.cuDFOnRayIO + :noindex: + :members: + + +cuDFCSVDispatcher +""""""""""""""""" + +The ``cuDFCSVDispatcher`` class implements ``CSVDispatcher`` using cuDFOnRay backend. + +.. 
autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.io.text.csv_dispatcher.cuDFCSVDispatcher + :noindex: + :members: \ No newline at end of file diff --git a/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/axis_partition.rst b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/axis_partition.rst new file mode 100644 index 00000000000..6f9188f2cb0 --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/axis_partition.rst @@ -0,0 +1,30 @@ +cuDFOnRayDataframeAxisPartition +""""""""""""""""""""""""""""""" + +The class is a base class for any axis partition class based on the Ray engine and the cuDF storage format. This class provides +the API to perform operations on an axis partition, using Ray as the execution engine. The axis partition is +made up of a list of block partitions that are stored in this class. + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.partitioning.axis_partition.cuDFOnRayDataframeAxisPartition + :members: + +cuDFOnRayDataframeColumnPartition +""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.partitioning.axis_partition.cuDFOnRayDataframeColumnPartition + :members: + +cuDFOnRayDataframeRowPartition +"""""""""""""""""""""""""""""" + +Public API +---------- + +.. 
autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.partitioning.axis_partition.cuDFOnRayDataframeRowPartition + :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/ray/cudf_on_ray/frame/gpu_manager.rst b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/gpu_manager.rst similarity index 56% rename from docs/flow/modin/engines/ray/cudf_on_ray/frame/gpu_manager.rst rename to docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/gpu_manager.rst index f15a07d7aee..c5f591922a5 100644 --- a/docs/flow/modin/engines/ray/cudf_on_ray/frame/gpu_manager.rst +++ b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/gpu_manager.rst @@ -6,5 +6,5 @@ The Ray actor-class stores ``cuDF.DataFrame``-s and executes operations on it. Public API ---------- -.. autoclass:: modin.engines.ray.cudf_on_ray.frame.gpu_manager.GPUManager +.. autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.partitioning.gpu_manager.GPUManager :members: \ No newline at end of file diff --git a/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/partition.rst b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/partition.rst new file mode 100644 index 00000000000..4f6c7608f1f --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/partition.rst @@ -0,0 +1,21 @@ +cuDFOnRayDataframePartition +""""""""""""""""""""""""""" + +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition.PandasDataframePartition`, +providing the API to perform operations on a block partition, namely, ``cudf.DataFrame``, +using Ray as an execution engine. 
+ +An operation on a block partition can be performed asynchronously_ in two ways: + +* :meth:`~modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition.cuDFOnRayDataframePartition.apply` returns ``ray.ObjectRef`` + with the integer key of the operation result from internal storage. +* :meth:`~modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition.cuDFOnRayDataframePartition.add_to_apply_calls` returns + a new :py:class:`~modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition.cuDFOnRayDataframePartition` object that is based on the result of the operation. + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition.cuDFOnRayDataframePartition + :members: + +.. _asynchronously: https://en.wikipedia.org/wiki/Asynchrony_(computer_programming) diff --git a/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/partition_manager.rst b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/partition_manager.rst new file mode 100644 index 00000000000..bfc7590eda5 --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/cudf_on_ray/partitioning/partition_manager.rst @@ -0,0 +1,14 @@ +cuDFOnRayDataframePartitionManager +"""""""""""""""""""""""""""""""""" + +This class is the specific implementation of :py:class:`~modin.core.execution.ray.generic.partitioning.partition_manager.GenericRayDataframePartitionManager`. +It serves as an intermediate level between :py:class:`~modin.core.execution.ray.implementations.cudf_on_ray.dataframe.dataframe.cuDFOnRayDataframe` +and :py:class:`~modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition.cuDFOnRayDataframePartition` class. +This class is responsible for partition manipulation and applying a function to +block/row/column partitions. + +Public API +---------- + +.. 
autoclass:: modin.core.execution.ray.implementations.cudf_on_ray.partitioning.partition_manager.cuDFOnRayDataframePartitionManager + :members: diff --git a/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/dataframe.rst b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/dataframe.rst new file mode 100644 index 00000000000..26b85c8a8f4 --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/dataframe.rst @@ -0,0 +1,13 @@ +PandasOnRayDataframe +"""""""""""""""""""" + +The class is specific implementation of :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` +class using Ray distributed engine. It serves as an intermediate level between +:py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` and +:py:class:`~modin.core.execution.ray.implementations.pandas_on_ray.partitioning.partition_manager.PandasOnRayDataframePartitionManager`. + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.pandas_on_ray.dataframe.dataframe.PandasOnRayDataframe + :members: \ No newline at end of file diff --git a/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/index.rst b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/index.rst new file mode 100644 index 00000000000..53327036582 --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/index.rst @@ -0,0 +1,18 @@ +PandasOnRay Dataframe implementation +==================================== + +Modin implements ``Dataframe``, ``PartitionManager``, ``AxisPartition`` and ``Partition`` classes +specific for ``PandasOnRay`` backend: + +* :doc:`Dataframe ` +* :doc:`Partition ` +* :doc:`AxisPartition ` +* :doc:`PartitionManager ` + +.. 
toctree:: + :hidden: + + dataframe + partitioning/partition + partitioning/axis_partition + partitioning/partition_manager \ No newline at end of file diff --git a/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/axis_partition.rst b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/axis_partition.rst new file mode 100644 index 00000000000..ee6fcfb0ccd --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/axis_partition.rst @@ -0,0 +1,30 @@ +PandasOnRayDataframeAxisPartition +""""""""""""""""""""""""""""""""" + +This class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.axis_partition.PandasDataframeAxisPartition`, +providing the API to perform operations on an axis partition, using Ray as an execution engine. The axis partition is +a wrapper over a list of block partitions that are stored in this class. + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.pandas_on_ray.partitioning.axis_partition.PandasOnRayDataframeAxisPartition + :members: + +PandasOnRayDataframeColumnPartition +""""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.pandas_on_ray.partitioning.axis_partition.PandasOnRayDataframeColumnPartition + :members: + +PandasOnRayDataframeRowPartition +"""""""""""""""""""""""""""""""" + +Public API +---------- + +.. 
autoclass:: modin.core.execution.ray.implementations.pandas_on_ray.partitioning.axis_partition.PandasOnRayDataframeRowPartition + :members: diff --git a/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.rst b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.rst new file mode 100644 index 00000000000..878dcc90d2d --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition.rst @@ -0,0 +1,25 @@ +PandasOnRayDataframePartition +""""""""""""""""""""""""""""" + +The class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition.PandasDataframePartition`, +providing the API to perform operations on a block partition, namely, ``pandas.DataFrame``, using Ray as an execution engine. + +In addition to wrapping a ``pandas.DataFrame``, the class also holds the following metadata: + +* ``length`` - length of ``pandas.DataFrame`` wrapped +* ``width`` - width of ``pandas.DataFrame`` wrapped +* ``ip`` - node IP address that holds ``pandas.DataFrame`` wrapped + +An operation on a block partition can be performed in two modes: + +* asynchronously_ - via :meth:`~modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframePartition.apply` +* lazily_ - via :meth:`~modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframePartition.add_to_apply_calls` + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.pandas_on_ray.partitioning.PandasOnRayDataframePartition + :members: + +.. _asynchronously: https://en.wikipedia.org/wiki/Asynchrony_(computer_programming) +.. 
_lazily: https://en.wikipedia.org/wiki/Lazy_evaluation diff --git a/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition_manager.rst b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition_manager.rst new file mode 100644 index 00000000000..b43742358ed --- /dev/null +++ b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/partition_manager.rst @@ -0,0 +1,12 @@ +PandasOnRayDataframePartitionManager +"""""""""""""""""""""""""""""""""""" + +This class is the specific implementation of :py:class:`~modin.core.execution.ray.generic.partitioning.partition_manager.GenericRayDataframePartitionManager` +using Ray distributed engine. This class is responsible for partition manipulation and applying a function to +block/row/column partitions. + +Public API +---------- + +.. autoclass:: modin.core.execution.ray.implementations.pandas_on_ray.partitioning.partition_manager.PandasOnRayDataframePartitionManager + :members: diff --git a/docs/flow/modin/engines/base/io.rst b/docs/flow/modin/core/io/index.rst similarity index 93% rename from docs/flow/modin/engines/base/io.rst rename to docs/flow/modin/core/io/index.rst index a61856764f8..11f17e21b63 100644 --- a/docs/flow/modin/engines/base/io.rst +++ b/docs/flow/modin/core/io/index.rst @@ -7,14 +7,14 @@ High-Level Data Import Operation Workflow ''''''''''''''''''''''''''''''''''''''''' .. note:: - ``read_csv`` on pandas backend and Ray engine was taken as an example + ``read_csv`` on PandasOnRay backend was taken as an example in this chapter for reader convenience. For other import functions workflow and classes/functions naming convension will be the same. Data import operation workflow diagram is shown below. 
After user calls high-level ``modin.pandas.read_csv`` function, call is forwarded to the ``FactoryDispatcher``, -which defines which factory from ``modin.data_management.factories.factories`` and -backend/engine specific IO class should be used (for Ray engine and pandas backend +which defines which factory from ``modin.core.execution.dispatching.factories.factories`` and +backend specific IO class should be used (for Ray engine and pandas in-memory format IO class will be named ``PandasOnRayIO``). This class defines Modin frame and query compiler classes and ``read_*`` functions, which could be based on the following classes: ``RayTask`` - class for managing remote tasks by concrete distribution @@ -58,13 +58,13 @@ Modin file splitting mechanism differs depending on the data format type: specifies initial row offset and number of rows in the chunk. After file splitting is complete, chunks data is passed to the parser functions -(``PandasCSVParser.parse`` for ``read_csv`` function with pandas backend) for +(``PandasCSVParser.parse`` for ``read_csv`` function with pandas storage format) for further processing on each worker. Submodules Description '''''''''''''''''''''' -``modin.engines.base.io`` module is used mostly for storing utils and dispatcher +``modin.core.io`` module is used mostly for storing utils and dispatcher classes for reading files of different formats. * ``io.py`` - class containing basic utils and default implementation of IO functions. 
diff --git a/docs/flow/modin/backends/base/query_compiler.rst b/docs/flow/modin/core/storage_formats/base/query_compiler.rst similarity index 60% rename from docs/flow/modin/backends/base/query_compiler.rst rename to docs/flow/modin/core/storage_formats/base/query_compiler.rst index 0c388079938..65a9920b4ce 100644 --- a/docs/flow/modin/backends/base/query_compiler.rst +++ b/docs/flow/modin/core/storage_formats/base/query_compiler.rst @@ -3,43 +3,43 @@ Base Query Compiler Brief description ''''''''''''''''' -:py:class:`~modin.backends.base.query_compiler.BaseQueryCompiler` is an abstract class of query compiler, and sets a common interface +:py:class:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler` is an abstract class of query compiler, and sets a common interface that every other query compiler implementation in Modin must follow. The Base class contains a basic implementations for most of the interface methods, all of which :ref:`default to pandas `. -Subclassing :py:class:`~modin.backends.base.query_compiler.BaseQueryCompiler` +Subclassing :py:class:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler` ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' If you want to add new type of query compiler to Modin the new class needs to inherit -from :py:class:`~modin.backends.base.query_compiler.BaseQueryCompiler` and implement the abstract methods: +from :py:class:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler` and implement the abstract methods: -- :py:meth:`~modin.backends.base.query_compiler.BaseQueryCompiler.from_pandas` build query compiler from pandas DataFrame. -- :py:meth:`~modin.backends.base.query_compiler.BaseQueryCompiler.from_arrow` build query compiler from Arrow Table. -- :py:meth:`~modin.backends.base.query_compiler.BaseQueryCompiler.to_pandas` get query compiler representation as pandas DataFrame. 
-- :py:meth:`~modin.backends.base.query_compiler.BaseQueryCompiler.default_to_pandas` do :ref:`fallback to pandas ` for the passed function. -- :py:meth:`~modin.backends.base.query_compiler.BaseQueryCompiler.finalize` finalize object constructing. -- :py:meth:`~modin.backends.base.query_compiler.BaseQueryCompiler.free` trigger memory cleaning. +- :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.from_pandas` build query compiler from pandas DataFrame. +- :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.from_arrow` build query compiler from Arrow Table. +- :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.to_pandas` get query compiler representation as pandas DataFrame. +- :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.default_to_pandas` do :ref:`fallback to pandas ` for the passed function. +- :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.finalize` finalize object constructing. +- :py:meth:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler.free` trigger memory cleaning. (Please refer to the code documentation to see the full documentation for these functions). This is a minimum set of operations to ensure a new query compiler will function in the Modin architecture, -and the rest of the API can safely default to the pandas implementation via the base class implementation. To add a backend-specific implementation for -some of the query compiler operations, just override the corresponding method in your -query compiler class. +and the rest of the API can safely default to the pandas implementation via the base class implementation. +To add a storage format specific implementation for some of the query compiler operations, just override +the corresponding method in your query compiler class. Example ''''''' As an exercise let's define a new query compiler in `Modin`, just to see how easy it is. 
-Usually, the query compiler routes formed queries to the underlying :doc:`frame ` class, +Usually, the query compiler routes formed queries to the underlying :doc:`frame ` class, which submits operators to an execution engine. For the sake of simplicity and independence of this example, our execution engine will be the pandas itself. -We need to inherit a new class from :py:class:`~modin.backends.base.query_compiler.BaseQueryCompiler` and implement all of the abstract methods. +We need to inherit a new class from :py:class:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler` and implement all of the abstract methods. In this case, with `pandas` as an execution engine, it's trivial: .. code-block:: python - from modin.backends import BaseQueryCompiler + from modin.core.storage_formats import BaseQueryCompiler class DefaultToPandasQueryCompiler(BaseQueryCompiler): def __init__(self, pandas_df): @@ -88,10 +88,10 @@ and already can be used in Modin DataFrame: To be able to select this query compiler as default via ``modin.config`` you also need to define the combination of your query compiler and pandas execution engine as a backend by adding the corresponding factory. To find more information about factories, -visit :doc:`corresponding section ` of the flow documentation. +visit :doc:`corresponding section ` of the flow documentation. Query Compiler API '''''''''''''''''' -.. autoclass:: modin.backends.base.query_compiler.BaseQueryCompiler +.. 
autoclass:: modin.core.storage_formats.base.query_compiler.BaseQueryCompiler :members: diff --git a/docs/flow/modin/backends/index.rst b/docs/flow/modin/core/storage_formats/index.rst similarity index 51% rename from docs/flow/modin/backends/index.rst rename to docs/flow/modin/core/storage_formats/index.rst index ede36ddd13c..87bc56ed050 100644 --- a/docs/flow/modin/backends/index.rst +++ b/docs/flow/modin/core/storage_formats/index.rst @@ -1,3 +1,21 @@ +Storage Formats +=============== +Storage format is one of the components that form Modin's backend, it describes the type(s) +of objects that are stored in the partitions of the selected Modin Dataframe implementation. + +The base storage format in Modin is pandas. In that format, Modin Dataframe operates with +partitions that hold ``pandas.DataFrame`` objects. Pandas is the most natural storage format +since high-level DataFrame objects mirror its API, however, Modin's storage formats are not +limited to the objects that conform to pandas API. There are formats that are able to store +``pyarrow.Table`` (:doc:`pyarrow storage format `) or even instances of +SQL-like databases (:doc:`OmniSci storage format `) +inside Modin Dataframe's partitions. + +An honor of converting high-level pandas API calls to the ones that are understandable +by the corresponding backend's implementation belongs to the Query Compiler (QC) object. + +.. _query_compiler_def: + Query Compiler ============== @@ -8,22 +26,26 @@ Query Compiler pandas/index pyarrow/index -Modin supports several execution backends. Calling any DataFrame API function will end up in -some backend-specific method. The query compiler is a bridge between Modin Dataframe and -the actual execution engine. +Modin supports several execution backends (storage format + execution engine). Calling any +DataFrame API function will end up in some backend-specific method. 
The query compiler is +a bridge between pandas DataFrame API and the actual Modin Dataframe implementation for the +corresponding backend. .. image:: /img/simplified_query_flow.svg :align: right :width: 300px -Query compilers of all backends implement a common API, which is used by the Modin Dataframe +Each storage format has its own Query Compiler class that implements the most optimal +query routing for the selected format. + +Query compilers of all storage formats implement a common API, which is used by the Modin Dataframe to support dataframe queries. The role of the query compiler is to translate its API into a pairing of known user-defined functions and dataframe algebra operators. Each query compiler instance contains a -:doc:`frame ` of the selected execution engine and queries +:doc:`frame ` of the selected execution implementation and queries it with the compiled queries to get the result. The query compiler object is immutable, so the result of every method is a new query compiler. -The query compilers API is defined by the :py:class:`~modin.backends.base.query_compiler.BaseQueryCompiler` class +The query compilers API is defined by the :py:class:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler` class and may resemble the pandas API, however, they're not equal. The query compilers API is significantly reduced in comparison with pandas, since many corner cases or even the whole methods can be handled at the API layer with the existing API. @@ -41,14 +63,14 @@ interprets a one-column query compiler as Series or DataFrame depending on the o High-level module overview '''''''''''''''''''''''''' -This module houses submodules of all of the stable query compilers: +This module houses submodules of all of the stable storage formats: .. TODO: Insert a link to when it is added (issue #3323) - :doc:`Base module ` contains an abstract query compiler class which defines common API. 
-- :doc:`Pandas module ` contains query compiler and text parsers for pandas backend. -- cuDF module contains query compiler and text parsers for cuDF backend. -- :doc:`Pyarrow module ` contains query compiler and text parsers for Pyarrow backend. +- :doc:`Pandas module ` contains query compiler and text parsers for pandas storage format. +- cuDF module contains query compiler and text parsers for cuDF storage format. +- :doc:`Pyarrow module ` contains query compiler and text parsers for Pyarrow storage format. -You can find more in the :doc:`experimental section `. +You can find more in the :doc:`experimental section `. diff --git a/docs/flow/modin/backends/pandas/index.rst b/docs/flow/modin/core/storage_formats/pandas/index.rst similarity index 61% rename from docs/flow/modin/backends/pandas/index.rst rename to docs/flow/modin/core/storage_formats/pandas/index.rst index 7cf3310bbb2..ea85d613149 100644 --- a/docs/flow/modin/backends/pandas/index.rst +++ b/docs/flow/modin/core/storage_formats/pandas/index.rst @@ -1,7 +1,7 @@ :orphan: -Pandas backend -"""""""""""""" +Pandas storage format +""""""""""""""""""""" .. toctree:: :hidden: @@ -12,7 +12,7 @@ Pandas backend High-Level Module Overview '''''''''''''''''''''''''' This module houses submodules which are responsible for communication between -the query compiler level and execution engine level for pandas backend: +the query compiler level and execution implementation level for pandas storage format: -- :doc:`Query compiler ` is responsible for compiling efficient queries for :doc:`PandasFrame `. +- :doc:`Query compiler ` is responsible for compiling efficient queries for :doc:`PandasDataframe `. - :doc:`Parsers ` are responsible for parsing data on workers during IO operations. 
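Reviewer note: two properties stated in the storage format pages above are easy to lose in the rename. A new query compiler implements only a small set of conversion methods and inherits defaults for the rest, and a query compiler is immutable, so every method returns a new object. The sketch below illustrates both with plain Python lists standing in for pandas objects. ``ToyBaseQueryCompiler``, ``ToyTupleQueryCompiler``, ``from_list``, ``to_list`` and ``add`` are hypothetical names chosen for this example; they are not part of Modin's API.

```python
# Toy illustration of "implement the conversions, inherit the defaults".
# All names are hypothetical and only loosely analogous to
# from_pandas / to_pandas / default_to_pandas in the real base class.
from abc import ABC, abstractmethod


class ToyBaseQueryCompiler(ABC):
    """Common interface with a default implementation for ``add``."""

    @classmethod
    @abstractmethod
    def from_list(cls, values):
        """Build a query compiler from a plain list."""

    @abstractmethod
    def to_list(self):
        """Return the stored data as a plain list."""

    def add(self, value):
        # Default path: convert to the plain representation, compute,
        # and wrap the result in a NEW query compiler of the same type.
        return self.from_list([x + value for x in self.to_list()])


class ToyTupleQueryCompiler(ToyBaseQueryCompiler):
    """Concrete compiler whose 'storage format' is an immutable tuple."""

    def __init__(self, frame):
        self._frame = tuple(frame)

    @classmethod
    def from_list(cls, values):
        return cls(values)

    def to_list(self):
        return list(self._frame)


qc = ToyTupleQueryCompiler([1, 2, 3])
result = qc.add(10)
```

A subclass that has a faster native path for ``add`` would simply override it; callers see the same interface either way.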
diff --git a/docs/flow/modin/backends/pandas/parsers.rst b/docs/flow/modin/core/storage_formats/pandas/parsers.rst similarity index 82% rename from docs/flow/modin/backends/pandas/parsers.rst rename to docs/flow/modin/core/storage_formats/pandas/parsers.rst index 5b4492a2271..ea023adb22e 100644 --- a/docs/flow/modin/backends/pandas/parsers.rst +++ b/docs/flow/modin/core/storage_formats/pandas/parsers.rst @@ -5,9 +5,9 @@ High-Level Module Overview This module houses parser classes (classes that are used for data parsing on the workers) and util functions for handling parsing results. ``PandasParser`` is base class for parser -classes with pandas backend, that contains methods common for all child classes. Other +classes with pandas format, that contains methods common for all child classes. Other module classes implement ``parse`` function that performs parsing of specific format data -basing on the chunk information computed in the ``modin.engines.base.io`` module. After +basing on the chunk information computed in the ``modin.core.io`` module. After chunk data parsing is completed, resulting ``DataFrame``-s will be splitted into smaller ``DataFrame``-s according to ``num_splits`` parameter, data type and number or rows/columns in the parsed chunk, and then these frames and some additional metadata will diff --git a/docs/flow/modin/core/storage_formats/pandas/query_compiler.rst b/docs/flow/modin/core/storage_formats/pandas/query_compiler.rst new file mode 100644 index 00000000000..151ac74f00b --- /dev/null +++ b/docs/flow/modin/core/storage_formats/pandas/query_compiler.rst @@ -0,0 +1,24 @@ +Pandas Query Compiler +""""""""""""""""""""" +:py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` is responsible for compiling +a set of known predefined functions and pairing those with dataframe algebra operators in the +:doc:`PandasDataframe `, specifically for dataframes backed by +``pandas.DataFrame`` objects. 
+ +Each :py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` contains an instance of +:py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` which it queries to get the result. + +:py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` supports methods built by +the :doc:`algebra module `. +If you want to add an implementation for a query compiler method, visit the algebra module documentation +to see whether the new operation fits one of the existing function templates and can be easily implemented +with them. + +Public API +'''''''''' +:py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler` implements common query compilers API +defined by the :py:class:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler`. Some functionalities +are inherited from the base class, in the following section only overridden methods are presented. + +.. autoclass:: modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler + :members: diff --git a/docs/flow/modin/backends/pyarrow/index.rst b/docs/flow/modin/core/storage_formats/pyarrow/index.rst similarity index 50% rename from docs/flow/modin/backends/pyarrow/index.rst rename to docs/flow/modin/core/storage_formats/pyarrow/index.rst index b8e945438af..9d499bf620c 100644 --- a/docs/flow/modin/backends/pyarrow/index.rst +++ b/docs/flow/modin/core/storage_formats/pyarrow/index.rst @@ -1,5 +1,5 @@ -PyArrow backend -""""""""""""""" +PyArrow storage format +"""""""""""""""""""""" .. toctree:: :hidden: @@ -7,20 +7,20 @@ PyArrow backend query_compiler parsers -In general, PyArrow backend follows the flow of the pandas backend: query compiler contains an instance of Modin Frame, +In general, PyArrow storage formats follow the flow of the pandas ones: query compiler contains an instance of Modin Frame, which is internally split into partitions. 
The main difference is that partitions contain PyArrow tables, instead of DataFrames like in pandas backend. To learn more about this approach please -visit :doc:`PyArrow execution engine ` section. +visit :doc:`PyArrow execution engine ` section. High-Level Module Overview '''''''''''''''''''''''''' This module houses submodules which are responsible for communication between -the query compiler level and execution engine level for PyArrow backend: +the query compiler level and execution implementation level for PyArrow storage format: -- :doc:`Query compiler ` is responsible for compiling efficient queries for :doc:`PyarrowOnRayFrame `. +- :doc:`Query compiler ` is responsible for compiling efficient queries for :doc:`PyarrowOnRayDataframe `. - :doc:`Parsers ` are responsible for parsing data on workers during IO operations. .. note:: - Currently the only one available PyArrow backend factory is ``PyarrowOnRay`` which works + Currently the only one available PyArrow storage format factory is ``PyarrowOnRay`` which works in :doc:`experimental mode ` only. diff --git a/docs/flow/modin/backends/pyarrow/parsers.rst b/docs/flow/modin/core/storage_formats/pyarrow/parsers.rst similarity index 53% rename from docs/flow/modin/backends/pyarrow/parsers.rst rename to docs/flow/modin/core/storage_formats/pyarrow/parsers.rst index b737b0e9aed..7d2a615d28d 100644 --- a/docs/flow/modin/backends/pyarrow/parsers.rst +++ b/docs/flow/modin/core/storage_formats/pyarrow/parsers.rst @@ -1,14 +1,14 @@ PyArrow Parsers Module Description """""""""""""""""""""""""""""""""" -This module houses parser classes that are responsible for data parsing on the workers for the PyArrow backend. -Parsers for PyArrow backends follow an interface of :doc:`pandas backend parsers `: +This module houses parser classes that are responsible for data parsing on the workers for the PyArrow storage format. 
+Parsers for PyArrow storage formats follow an interface of :doc:`pandas format parsers `: parser class of every file format implements ``parse`` method, which parses the specified part of the file and builds PyArrow tables from the parsed data, based on the specified chunk size and number of splits. -The resulted PyArrow tables will be used as a partitions payload in the :py:class:`~modin.experimental.engines.pyarrow_on_ray.frame.data.PyarrowOnRayFrame`. +The resulted PyArrow tables will be used as a partitions payload in the :py:class:`~modin.experimental.core.execution.ray.implementations.pyarrow_on_ray.dataframe.dataframe.PyarrowOnRayDataframe`. Public API '''''''''' -.. automodule:: modin.backends.pyarrow.parsers +.. automodule:: modin.core.storage_formats.pyarrow.parsers :members: diff --git a/docs/flow/modin/core/storage_formats/pyarrow/query_compiler.rst b/docs/flow/modin/core/storage_formats/pyarrow/query_compiler.rst new file mode 100644 index 00000000000..d621d197352 --- /dev/null +++ b/docs/flow/modin/core/storage_formats/pyarrow/query_compiler.rst @@ -0,0 +1,19 @@ +PyArrow Query Compiler +"""""""""""""""""""""" +:py:class:`~modin.core.storage_formats.pyarrow.query_compiler.PyarrowQueryCompiler` is responsible for compiling efficient +DataFrame algebra queries for the :doc:`PyarrowOnRayDataframe `, +the frames which are backed by ``pyarrow.Table`` objects. + +Each :py:class:`~modin.core.storage_formats.pyarrow.query_compiler.PyarrowQueryCompiler` contains an instance of +:py:class:`~modin.experimental.core.execution.ray.implementations.pyarrow_on_ray.dataframe.dataframe.PyarrowOnRayDataframe` which it queries to get the result. + +Public API +'''''''''' +:py:class:`~modin.core.storage_formats.pyarrow.query_compiler.PyarrowQueryCompiler` implements common query compilers API +defined by the :py:class:`~modin.core.storage_formats.base.query_compiler.BaseQueryCompiler`. 
Most functionalities +are inherited from :py:class:`~modin.core.storage_formats.pandas.query_compiler.PandasQueryCompiler`, in the following +section only overridden methods are presented. + +.. autoclass:: modin.core.storage_formats.pyarrow.query_compiler.PyarrowQueryCompiler + :members: + :show-inheritance: diff --git a/docs/flow/modin/data_management/factories.rst b/docs/flow/modin/data_management/factories.rst deleted file mode 100644 index 8de834ca46c..00000000000 --- a/docs/flow/modin/data_management/factories.rst +++ /dev/null @@ -1,47 +0,0 @@ -:orphan: - -.. - TODO: add links to documentation for mentioned modules. - -Factories Module Description -"""""""""""""""""""""""""""" - -Brief description -''''''''''''''''' -Modin has several execution backends. Calling any DataFrame API function will end up in -some backend-specific method. The responsibility of dispatching high-level API calls to -backend-specific function belongs to the `QueryCompiler`, which is determined at the time of the dataframe's creation by the factory of -the corresponding backend. The mission of this module is to route IO function calls from -the API level to its actual backend-specific implementations, which builds the -`QueryCompiler` of the appropriate backend. - -Backend representation via Factories -'''''''''''''''''''''''''''''''''''' -Backend is a combination of the `QueryCompiler` and `Execution Engine`. For example, -``PandasOnRay`` backend means the combination of the ``PandasQueryCompiler`` and ``Ray`` -engine. - -In the scope of this module, each backend is represented with a factory class located in -``modin/data_management/factories/factories.py``. Each factory contains a field that identifies the IO module of the corresponding backend. This IO module is -responsible for dispatching calls of IO functions to their actual implementations in the -underlying IO module. For more information about IO module visit :doc:`related doc `. 
- -Factory Dispatcher -'''''''''''''''''' -The ``modin.data_management.factories.dispatcher.FactoryDispatcher`` class provides public methods whose interface corresponds to -pandas IO functions, the only difference is that they return `QueryCompiler` of the -selected backend instead of DataFrame. ``FactoryDispatcher`` is responsible for routing -these IO calls to the factory which represents the selected backend. - -So when you call ``read_csv()`` function and your backend is ``PandasOnRay`` then the -trace would be the following: - -.. figure:: /img/factory_dispatching.svg - :align: center - -``modin.pandas.read_csv`` calls ``FactoryDispatcher.read_csv``, which calls ``.read_csv`` -function of the factory of the selected backend, in our case it's ``PandasOnRayFactory._read_csv``, -which in turn forwards this call to the actual implementation of ``read_csv`` — to the -``PandasOnRayIO.read_csv``. The result of ``modin.pandas.read_csv`` will return a Modin -DataFrame with the appropriate `QueryCompiler` bound to it, which is responsible for -dispatching all of the further function calls. diff --git a/docs/flow/modin/engines/base/frame/axis_partition.rst b/docs/flow/modin/engines/base/frame/axis_partition.rst deleted file mode 100644 index 757a969a615..00000000000 --- a/docs/flow/modin/engines/base/frame/axis_partition.rst +++ /dev/null @@ -1,39 +0,0 @@ -BaseFrameAxisPartition -"""""""""""""""""""""" - -The class is base for any axis partition class and serves as the last level on which -operations that were conveyed from the partition manager are being performed on an entire column or row. - -The class provides an API that has to be overridden by the child classes in order to manipulate -on a list of block partitions (making up column or row partition) they store. - -The procedures that use this class and its methods assume that they have some global knowledge -about the entire axis. 
This may require the implementation to use concatenation or append on the -list of block partitions. - -The ``PandasFramePartitionManager`` object that controls these objects (through the API exposed here) has an invariant -that requires that this object is never returned from a function. It assumes that there will always be -``PandasFramePartition`` object stored and structures itself accordingly. - -Public API ----------- - -.. autoclass:: modin.engines.base.frame.axis_partition.BaseFrameAxisPartition - :members: - -PandasFrameAxisPartition -"""""""""""""""""""""""" - -The class is base for any axis partition class of ``pandas`` backend. - -Subclasses must implement ``list_of_blocks`` which represents data wrapped by the ``PandasFramePartition`` -objects and creates something interpretable as a pandas DataFrame. - -See ``modin.engines.ray.pandas_on_ray.axis_partition.PandasOnRayFrameAxisPartition`` -for an example on how to override/use this class when the implementation needs to be augmented. - -Public API ----------- - -.. autoclass:: modin.engines.base.frame.axis_partition.PandasFrameAxisPartition - :members: diff --git a/docs/flow/modin/engines/base/frame/data.rst b/docs/flow/modin/engines/base/frame/data.rst deleted file mode 100644 index 9f744641499..00000000000 --- a/docs/flow/modin/engines/base/frame/data.rst +++ /dev/null @@ -1,33 +0,0 @@ -PandasFrame -""""""""""" - -The class is base for any frame class of ``pandas`` backend and serves as the intermediate level -between ``pandas`` query compiler and conforming partition manager. All queries formed -at the query compiler layer are ingested by this class and then conveyed jointly with the stored partitions -into the partition manager for processing. Direct partitions manipulation by this class is prohibited except -cases if an operation is striclty private or protected and called inside of the class only. The class provides -significantly reduced set of operations that fit plenty of pandas operations. 
- -Main tasks of ``PandasFrame`` are storage of partitions, manipulation with labels of axes and -providing set of methods to perform operations on the internal data. - -As mentioned above, ``PandasFrame`` shouldn't work with stored partitions directly and -the responsibility for modifying partitions array has to lay on :doc:`partition_manager`. For example, method -:meth:`~modin.engines.base.frame.data.PandasFrame.broadcast_apply_full_axis` redirects applying -function to ``PandasFramePartitionManager.broadcast_axis_partitions`` method. - -``PandasFrame`` can be created from ``pandas.DataFrame``, ``pyarrow.Table`` -(methods :meth:`~modin.engines.base.frame.data.PandasFrame.from_pandas`, -:meth:`~modin.engines.base.frame.data.PandasFrame.from_arrow` are used respectively). Also, -``PandasFrame`` can be converted to ``np.array``, ``pandas.DataFrame`` -(methods :meth:`~modin.engines.base.frame.data.PandasFrame.to_numpy`, -:meth:`~modin.engines.base.frame.data.PandasFrame.to_pandas` are used respectively). - -Manipulation with labels of axes happens using internal methods for changing labels on the new, -adding prefixes/suffixes etc. - -Public API ----------- - -.. autoclass:: modin.engines.base.frame.data.PandasFrame - :members: diff --git a/docs/flow/modin/engines/base/frame/index.rst b/docs/flow/modin/engines/base/frame/index.rst deleted file mode 100644 index 3e431474ea8..00000000000 --- a/docs/flow/modin/engines/base/frame/index.rst +++ /dev/null @@ -1,18 +0,0 @@ -Base Frame Objects -================== - -Modin paritions data to scale efficiently. -To keep track of everything a few key classes are introduced: ``Frame``, ``Partition``, ``AxisPartiton`` and ``PartitionManager``. - -* :doc:`Frame ` is the class conforming to DataFrame Algebra. 
-* :doc:`Partition ` is an element of a NxM grid which, when combined, represents the ``Frame`` -* :doc:`AxisPartition ` is a joined group of ``Parition``-s along some axis (either rows or labels) -* :doc:`PartitionManager ` is the manager that implements the primitives used for DataFrame Algebra operations over ``Partition``-s - -.. toctree:: - :hidden: - - data - partition - axis_partition - partition_manager diff --git a/docs/flow/modin/engines/dask/pandas_on_dask/frame/axis_partition.rst b/docs/flow/modin/engines/dask/pandas_on_dask/frame/axis_partition.rst deleted file mode 100644 index 320aa7755ba..00000000000 --- a/docs/flow/modin/engines/dask/pandas_on_dask/frame/axis_partition.rst +++ /dev/null @@ -1,30 +0,0 @@ -PandasOnDaskFrameAxisPartition -"""""""""""""""""""""""""""""" - -The class is the specific implementation of :py:class:`~modin.engines.base.frame.axis_partition.PandasFrameAxisPartition`, -providing the API to perform operations on an axis (column or row) partition using Dask as the execution engine. -The axis partition is a wrapper over a list of block partitions that are stored in this class. - -Public API ----------- - -.. autoclass:: modin.engines.dask.pandas_on_dask.frame.axis_partition.PandasOnDaskFrameAxisPartition - :members: - -PandasOnDaskFrameColumnPartition -"""""""""""""""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.engines.dask.pandas_on_dask.frame.axis_partition.PandasOnDaskFrameColumnPartition - :members: - -PandasOnDaskFrameRowPartition -""""""""""""""""""""""""""""" - -Public API ----------- - -.. 
autoclass:: modin.engines.dask.pandas_on_dask.frame.axis_partition.PandasOnDaskFrameRowPartition - :members: diff --git a/docs/flow/modin/engines/dask/pandas_on_dask/frame/data.rst b/docs/flow/modin/engines/dask/pandas_on_dask/frame/data.rst deleted file mode 100644 index 01f0d2f44b9..00000000000 --- a/docs/flow/modin/engines/dask/pandas_on_dask/frame/data.rst +++ /dev/null @@ -1,12 +0,0 @@ -PandasOnDaskFrame -""""""""""""""""" - -The class is the specific implementation of the dataframe algebra for the ``PandasOnDask`` backend. -It serves as an intermediate level between ``pandas`` query compiler and -:py:class:`~modin.engines.dask.pandas_on_dask.frame.partition_manager.PandasOnDaskFramePartitionManager`. - -Public API ----------- - -.. autoclass:: modin.engines.dask.pandas_on_dask.frame.data.PandasOnDaskFrame - :members: diff --git a/docs/flow/modin/engines/dask/pandas_on_dask/frame/index.rst b/docs/flow/modin/engines/dask/pandas_on_dask/frame/index.rst deleted file mode 100644 index fd7eb90d002..00000000000 --- a/docs/flow/modin/engines/dask/pandas_on_dask/frame/index.rst +++ /dev/null @@ -1,18 +0,0 @@ -PandasOnDask Frame Objects -========================== - -This page describes the implementation of :doc:`Base Frame Objects ` -specific for ``PandasOnDask`` backend. - -* :doc:`Frame ` -* :doc:`Partition ` -* :doc:`AxisPartition ` -* :doc:`PartitionManager ` - -.. 
toctree:: - :hidden: - - data - partition - axis_partition - partition_manager diff --git a/docs/flow/modin/engines/dask/pandas_on_dask/frame/partition.rst b/docs/flow/modin/engines/dask/pandas_on_dask/frame/partition.rst deleted file mode 100644 index b3cd59bf8b1..00000000000 --- a/docs/flow/modin/engines/dask/pandas_on_dask/frame/partition.rst +++ /dev/null @@ -1,25 +0,0 @@ -PandasOnDaskFramePartition -"""""""""""""""""""""""""" - -The class is the specific implementation of :py:class:`~modin.engines.base.frame.partition.PandasFramePartition`, -providing the API to perform operations on a block partition, namely, ``pandas.DataFrame``, using Dask as the execution engine. - -In addition to wrapping a pandas DataFrame, the class also holds the following metadata: - -* ``length`` - length of pandas DataFrame wrapped -* ``width`` - width of pandas DataFrame wrapped -* ``ip`` - node IP address that holds pandas DataFrame wrapped - -An operation on a block partition can be performed in two modes: - -* asynchronously_ - via :meth:`~modin.engines.dask.pandas_on_dask.frame.partition.PandasOnDaskFramePartition.apply` -* lazily_ - via :meth:`~modin.engines.dask.pandas_on_dask.frame.partition.PandasOnDaskFramePartition.add_to_apply_calls` - -Public API ----------- - -.. autoclass:: modin.engines.dask.pandas_on_dask.frame.partition.PandasOnDaskFramePartition - :members: - - .. _asynchronously: https://en.wikipedia.org/wiki/Asynchrony_(computer_programming) - .. 
_lazily: https://en.wikipedia.org/wiki/Lazy_evaluation diff --git a/docs/flow/modin/engines/dask/pandas_on_dask/frame/partition_manager.rst b/docs/flow/modin/engines/dask/pandas_on_dask/frame/partition_manager.rst deleted file mode 100644 index 7af74acdb3a..00000000000 --- a/docs/flow/modin/engines/dask/pandas_on_dask/frame/partition_manager.rst +++ /dev/null @@ -1,12 +0,0 @@ -PandasOnDaskFramePartitionManager -""""""""""""""""""""""""""""""""" - -This class is the specific implementation of :py:class:`~modin.engines.base.frame.partition_manager.PandasFramePartitionManager` -using Dask as the execution engine. This class is responsible for partition manipulation and applying a funcion to -block/row/column partitions. - -Public API ----------- - -.. autoclass:: modin.engines.dask.pandas_on_dask.frame.partition_manager.PandasOnDaskFramePartitionManager - :members: diff --git a/docs/flow/modin/engines/python/pandas_on_python/frame/axis_partition.rst b/docs/flow/modin/engines/python/pandas_on_python/frame/axis_partition.rst deleted file mode 100644 index c72c2132e3a..00000000000 --- a/docs/flow/modin/engines/python/pandas_on_python/frame/axis_partition.rst +++ /dev/null @@ -1,30 +0,0 @@ -PandasOnPythonFrameAxisPartition -"""""""""""""""""""""""""""""""" - -The class is specific implementation of :py:class:`~modin.engines.base.frame.axis_partition.PandasFrameAxisPartition`, -providing the API to perform operations on an axis partition, using Python -as the execution engine. The axis partition is made up of list of block -partitions that are stored in this class. - -Public API ----------- - -.. autoclass:: modin.engines.python.pandas_on_python.frame.axis_partition.PandasOnPythonFrameAxisPartition - -PandasOnPythonFrameColumnPartition -"""""""""""""""""""""""""""""""""" - -Public API ----------- - -.. 
autoclass:: modin.engines.python.pandas_on_python.frame.axis_partition.PandasOnPythonFrameColumnPartition - :members: - -PandasOnPythonFrameRowPartition -""""""""""""""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.engines.python.pandas_on_python.frame.axis_partition.PandasOnPythonFrameRowPartition - :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/python/pandas_on_python/frame/data.rst b/docs/flow/modin/engines/python/pandas_on_python/frame/data.rst deleted file mode 100644 index 25e96b454f7..00000000000 --- a/docs/flow/modin/engines/python/pandas_on_python/frame/data.rst +++ /dev/null @@ -1,13 +0,0 @@ -PandasOnPythonFrame -""""""""""""""""""" - -The class is specific implementation of :py:class:`~modin.engines.base.frame.data.PandasFrame` -for ``PandasOnPython`` backend. It serves as an intermediate level between -:py:class:`~modin.backends.pandas.query_compiler.PandasQueryCompiler` and -:py:class:`~modin.engines.python.pandas_on_python.frame.partition_manager.PandasOnPythonFramePartitionManager`. - -Public API ----------- - -.. autoclass:: modin.engines.python.pandas_on_python.frame.data.PandasOnPythonFrame - :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/python/pandas_on_python/frame/index.rst b/docs/flow/modin/engines/python/pandas_on_python/frame/index.rst deleted file mode 100644 index 81d02d0a259..00000000000 --- a/docs/flow/modin/engines/python/pandas_on_python/frame/index.rst +++ /dev/null @@ -1,20 +0,0 @@ -PandasOnPython Frame Objects -============================ - -This page describes implementation of :doc:`Base Frame Objects ` -specific for ``PandasOnPython`` backend. Since Python engine doesn't allow computation parallelization, -operations on partitions are performed sequentially. The absence of parallelization doesn't give any -perfomance speed-up, so ``PandasOnPython`` is used for testing purposes only. 
- -* :doc:`Frame ` -* :doc:`Partition ` -* :doc:`AxisPartition ` -* :doc:`PartitionManager ` - -.. toctree:: - :hidden: - - data - partition - axis_partition - partition_manager \ No newline at end of file diff --git a/docs/flow/modin/engines/python/pandas_on_python/frame/partition_manager.rst b/docs/flow/modin/engines/python/pandas_on_python/frame/partition_manager.rst deleted file mode 100644 index 339413d7616..00000000000 --- a/docs/flow/modin/engines/python/pandas_on_python/frame/partition_manager.rst +++ /dev/null @@ -1,12 +0,0 @@ -PythonFrameManager -"""""""""""""""""" - -The class is specific implementation of :py:class:`~modin.engines.base.frame.partition_manager.PandasFramePartitionManager` -using Python as the execution engine. This class is responsible for partitions manipulation and applying -a funcion to block/row/column partitions. - -Public API ----------- - -.. autoclass:: modin.engines.python.pandas_on_python.frame.partition_manager.PandasOnPythonFramePartitionManager - :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/ray/cudf_on_ray/frame/axis_partition.rst b/docs/flow/modin/engines/ray/cudf_on_ray/frame/axis_partition.rst deleted file mode 100644 index 44da1af3883..00000000000 --- a/docs/flow/modin/engines/ray/cudf_on_ray/frame/axis_partition.rst +++ /dev/null @@ -1,30 +0,0 @@ -cuDFOnRayFrameAxisPartition -""""""""""""""""""""""""""" - -The class is a base class for any axis partition class based on Ray engine and cuDF backend. This class provides -the API to perform operations on an axis partition, using Ray as the execution engine. The axis partition is -made up of list of block partitions that are stored in this class. - -Public API ----------- - -.. autoclass:: modin.engines.ray.cudf_on_ray.frame.axis_partition.cuDFOnRayFrameAxisPartition - :members: - -cuOnRayFrameColumnPartition -""""""""""""""""""""""""""" - -Public API ----------- - -.. 
autoclass:: modin.engines.ray.cudf_on_ray.frame.axis_partition.cuDFOnRayFrameColumnPartition - :members: - -cuDFOnRayFrameRowPartition -"""""""""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.engines.ray.cudf_on_ray.frame.axis_partition.cuDFOnRayFrameRowPartition - :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/ray/cudf_on_ray/frame/data.rst b/docs/flow/modin/engines/ray/cudf_on_ray/frame/data.rst deleted file mode 100644 index 03f04e96a5b..00000000000 --- a/docs/flow/modin/engines/ray/cudf_on_ray/frame/data.rst +++ /dev/null @@ -1,13 +0,0 @@ -cuDFOnRayFrame -"""""""""""""" - -The class is the specific implementation of :py:class:`~modin.engines.base.frame.data.PandasFrame` -class using Ray distributed engine. It serves as an intermediate level between -:py:class:`~modin.backends.cudf.query_compiler.cuDFQueryCompiler` and -:py:class:`~modin.engines.ray.cudf_on_ray.frame.partition_manager.cuDFOnRayFramePartitionManager`. - -Public API ----------- - -.. autoclass:: modin.engines.ray.cudf_on_ray.frame.data.cuDFOnRayFrame - :members: diff --git a/docs/flow/modin/engines/ray/cudf_on_ray/frame/index.rst b/docs/flow/modin/engines/ray/cudf_on_ray/frame/index.rst deleted file mode 100644 index bb47abce77e..00000000000 --- a/docs/flow/modin/engines/ray/cudf_on_ray/frame/index.rst +++ /dev/null @@ -1,20 +0,0 @@ -cuDFOnRay Frame Implementation -============================== - -Modin implements ``Frame``, ``PartitionManager``, ``AxisPartition``, ``Partition`` and -``GPUManager`` classes specific for ``cuDFOnRay`` backend: - -* :doc:`Frame ` -* :doc:`Partition ` -* :doc:`AxisPartition ` -* :doc:`PartitionManager ` -* :doc:`GPUManager ` - -.. 
toctree:: - :hidden: - - data - partition - axis_partition - partition_manager - gpu_manager \ No newline at end of file diff --git a/docs/flow/modin/engines/ray/cudf_on_ray/frame/partition.rst b/docs/flow/modin/engines/ray/cudf_on_ray/frame/partition.rst deleted file mode 100644 index a77dec3e5c3..00000000000 --- a/docs/flow/modin/engines/ray/cudf_on_ray/frame/partition.rst +++ /dev/null @@ -1,21 +0,0 @@ -cuDFOnRayFramePartition -""""""""""""""""""""""" - -The class is the specific implementation of :py:class:`~modin.engines.base.frame.partition.PandasFramePartition`, -providing the API to perform operations on a block partition, namely, ``cudf.DataFrame``, -using Ray as an execution engine. - -An operation on a block partition can be performed asynchronously_ in two ways: - -* :meth:`~modin.engines.ray.cudf_on_ray.frame.partition.cuDFOnRayFramePartition.apply` returns ``ray.ObjectRef`` - with integer key of operation result from internal storage. -* :meth:`~modin.engines.ray.cudf_on_ray.frame.partition.cuDFOnRayFramePartition.add_to_apply_calls` returns - a new :py:class:`~modin.engines.ray.cudf_on_ray.frame.partition.cuDFOnRayFramePartition` object that is based on result of operation. - -Public API ----------- - -.. autoclass:: modin.engines.ray.cudf_on_ray.frame.partition.cuDFOnRayFramePartition - :members: - -.. _asynchronously: https://en.wikipedia.org/wiki/Asynchrony_(computer_programming) diff --git a/docs/flow/modin/engines/ray/cudf_on_ray/frame/partition_manager.rst b/docs/flow/modin/engines/ray/cudf_on_ray/frame/partition_manager.rst deleted file mode 100644 index f7965716353..00000000000 --- a/docs/flow/modin/engines/ray/cudf_on_ray/frame/partition_manager.rst +++ /dev/null @@ -1,14 +0,0 @@ -cuDFOnRayFramePartitionManager -"""""""""""""""""""""""""""""" - -This class is the specific implementation of :py:class:`~modin.engines.ray.generic.frame.partition_manager.GenericRayFramePartitionManager`. 
-It serves as an intermediate level between :py:class:`~modin.engines.ray.cudf_on_ray.frame.data.cuDFOnRayFrame` -and :py:class:`~modin.engines.ray.cudf_on_ray.frame.partition.cuDFOnRayFramePartition` class. -This class is responsible for partition manipulation and applying a function to -block/row/column partitions. - -Public API ----------- - -.. autoclass:: modin.engines.ray.cudf_on_ray.frame.partition_manager.cuDFOnRayFramePartitionManager - :members: diff --git a/docs/flow/modin/engines/ray/cudf_on_ray/io.rst b/docs/flow/modin/engines/ray/cudf_on_ray/io.rst deleted file mode 100644 index 6b3e3a9233a..00000000000 --- a/docs/flow/modin/engines/ray/cudf_on_ray/io.rst +++ /dev/null @@ -1,29 +0,0 @@ -:orphan: - -IO details in cuDF backend -"""""""""""""""""""""""""" - -IO on cuDF backend is implemented using base classes ``BaseIO`` and ``CSVDispatcher``. - -cuDFOnRayIO -""""""""""" - -The class ``cuDFOnRayIO`` implements ``BaseIO`` base class using cuDF-backend -entities (``cuDFOnRayFrame``, ``cuDFOnRayFramePartition`` etc.). - -Public API ----------- - -.. autoclass:: modin.engines.ray.cudf_on_ray.io.io.cuDFOnRayIO - :noindex: - :members: - - -cuDFCSVDispatcher -""""""""""""""""" - -The ``cuDFCSVDispatcher`` class implements ``CSVDispatcher`` using cuDF backend. - -.. autoclass:: modin.engines.ray.cudf_on_ray.io.text.csv_dispatcher.cuDFCSVDispatcher - :noindex: - :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/ray/generic.rst b/docs/flow/modin/engines/ray/generic.rst deleted file mode 100644 index 4e8cfe47d15..00000000000 --- a/docs/flow/modin/engines/ray/generic.rst +++ /dev/null @@ -1,19 +0,0 @@ -:orphan: - -Generic Ray-based members -========================= - -Objects which are backend-agnostic but require specific Ray implementation -are placed in ``modin.engines.ray.generic``. 
- -Their purpose is to implement certain parallel I/O operations and to serve -as a foundation for building backend-specific objects: - -* :py:class:`~modin.engines.ray.generic.io.RayIO` -- implements parallel :meth:`~modin.engines.ray.generic.io.RayIO.to_csv` and :meth:`~modin.engines.ray.generic.io.RayIO.to_sql`. -* :py:class:`~modin.engines.ray.generic.frame.partition_manager.GenericRayFramePartitionManager` -- implements parallel :meth:`~modin.engines.ray.generic.frame.partition_manager.GenericRayFramePartitionManager.to_numpy`. - -.. autoclass:: modin.engines.ray.generic.io.RayIO - :members: - -.. autoclass:: modin.engines.ray.generic.frame.partition_manager.GenericRayFramePartitionManager - :members: diff --git a/docs/flow/modin/engines/ray/pandas_on_ray/frame/axis_partition.rst b/docs/flow/modin/engines/ray/pandas_on_ray/frame/axis_partition.rst deleted file mode 100644 index b5254d15184..00000000000 --- a/docs/flow/modin/engines/ray/pandas_on_ray/frame/axis_partition.rst +++ /dev/null @@ -1,30 +0,0 @@ -PandasOnRayFrameAxisPartition -""""""""""""""""""""""""""""" - -This class is the specific implementation of :py:class:`~modin.engines.base.frame.axis_partition.PandasFrameAxisPartition`, -providing the API to perform operations on an axis partition, using Ray as an execution engine. The axis partition is -a wrapper over a list of block partitions that are stored in this class. - -Public API ----------- - -.. autoclass:: modin.engines.ray.pandas_on_ray.frame.axis_partition.PandasOnRayFrameAxisPartition - :members: - -PandasOnRayFrameColumnPartition -""""""""""""""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.engines.ray.pandas_on_ray.frame.axis_partition.PandasOnRayFrameColumnPartition - :members: - -PandasOnRayFrameRowPartition -"""""""""""""""""""""""""""" - -Public API ----------- - -.. 
autoclass:: modin.engines.ray.pandas_on_ray.frame.axis_partition.PandasOnRayFrameRowPartition - :members: diff --git a/docs/flow/modin/engines/ray/pandas_on_ray/frame/data.rst b/docs/flow/modin/engines/ray/pandas_on_ray/frame/data.rst deleted file mode 100644 index bea558929e5..00000000000 --- a/docs/flow/modin/engines/ray/pandas_on_ray/frame/data.rst +++ /dev/null @@ -1,13 +0,0 @@ -PandasOnRayFrame -"""""""""""""""" - -The class is specific implementation of :py:class:`~modin.engines.base.frame.data.PandasFrame` -class using Ray distributed engine. It serves as an intermediate level between -:py:class:`~modin.backends.pandas.query_compiler.PandasQueryCompiler` and -:py:class:`~modin.engines.ray.pandas_on_ray.frame.partition_manager.PandasOnRayFramePartitionManager`. - -Public API ----------- - -.. autoclass:: modin.engines.ray.pandas_on_ray.frame.data.PandasOnRayFrame - :members: \ No newline at end of file diff --git a/docs/flow/modin/engines/ray/pandas_on_ray/frame/index.rst b/docs/flow/modin/engines/ray/pandas_on_ray/frame/index.rst deleted file mode 100644 index 174ab30f2b5..00000000000 --- a/docs/flow/modin/engines/ray/pandas_on_ray/frame/index.rst +++ /dev/null @@ -1,18 +0,0 @@ -PandasOnRay Frame Implementation -================================ - -Modin implements ``Frame``, ``PartitionManager``, ``AxisPartition`` and ``Partition`` classes -specific for ``PandasOnRay`` backend: - -* :doc:`Frame ` -* :doc:`Partition ` -* :doc:`AxisPartition ` -* :doc:`PartitionManager ` - -.. 
toctree:: - :hidden: - - data - partition - axis_partition - partition_manager \ No newline at end of file diff --git a/docs/flow/modin/engines/ray/pandas_on_ray/frame/partition.rst b/docs/flow/modin/engines/ray/pandas_on_ray/frame/partition.rst deleted file mode 100644 index 1560e1c38e1..00000000000 --- a/docs/flow/modin/engines/ray/pandas_on_ray/frame/partition.rst +++ /dev/null @@ -1,25 +0,0 @@ -PandasOnRayFramePartition -""""""""""""""""""""""""" - -The class is the specific implementation of :py:class:`~modin.engines.base.frame.partition.PandasFramePartition`, -providing the API to perform operations on a block partition, namely, ``pandas.DataFrame``, using Ray as an execution engine. - -In addition to wrapping a pandas DataFrame, the class also holds the following metadata: - -* ``length`` - length of pandas DataFrame wrapped -* ``width`` - width of pandas DataFrame wrapped -* ``ip`` - node IP address that holds pandas DataFrame wrapped - -An operation on a block partition can be performed in two modes: - -* asynchronously_ - via :meth:`~modin.engines.ray.pandas_on_ray.frame.partition.PandasOnRayFramePartition.apply` -* lazily_ - via :meth:`~modin.engines.ray.pandas_on_ray.frame.partition.PandasOnRayFramePartition.add_to_apply_calls` - -Public API ----------- - -.. autoclass:: modin.engines.ray.pandas_on_ray.frame.partition.PandasOnRayFramePartition - :members: - -.. _asynchronously: https://en.wikipedia.org/wiki/Asynchrony_(computer_programming) -.. 
_lazily: https://en.wikipedia.org/wiki/Lazy_evaluation diff --git a/docs/flow/modin/engines/ray/pandas_on_ray/frame/partition_manager.rst b/docs/flow/modin/engines/ray/pandas_on_ray/frame/partition_manager.rst deleted file mode 100644 index 31a7bb961bf..00000000000 --- a/docs/flow/modin/engines/ray/pandas_on_ray/frame/partition_manager.rst +++ /dev/null @@ -1,12 +0,0 @@ -PandasOnRayFramePartitionManager -"""""""""""""""""""""""""""""""" - -This class is the specific implementation of :py:class:`~modin.engines.base.frame.partition_manager.PandasFramePartitionManager` -using Ray distributed engine. This class is responsible for partition manipulation and applying a funcion to -block/row/column partitions. - -Public API ----------- - -.. autoclass:: modin.engines.ray.pandas_on_ray.frame.partition_manager.PandasOnRayFramePartitionManager - :members: diff --git a/docs/flow/modin/experimental/backends/index.rst b/docs/flow/modin/experimental/backends/index.rst deleted file mode 100644 index 7450425cc1e..00000000000 --- a/docs/flow/modin/experimental/backends/index.rst +++ /dev/null @@ -1,15 +0,0 @@ -:orphan: - -Experimental backends -""""""""""""""""""""" - -``modin.experimental.backends`` holds experimental backends that are under development right now -and provides a limited set of functionality: - -* :doc:`omnisci ` - - -.. toctree:: - :hidden: - - omnisci/index diff --git a/docs/flow/modin/experimental/backends/omnisci/index.rst b/docs/flow/modin/experimental/backends/omnisci/index.rst deleted file mode 100644 index 567a13054fa..00000000000 --- a/docs/flow/modin/experimental/backends/omnisci/index.rst +++ /dev/null @@ -1,13 +0,0 @@ -OmniSci backend -""""""""""""""" - -.. toctree:: - :hidden: - - query_compiler - -High-Level Module Overview -'''''''''''''''''''''''''' - -This module contains :py:class:`~modin.experimental.backends.omnisci.query_compiler.DFAlgQueryCompiler` -class used for lazy DataFrame based engine. 
diff --git a/docs/flow/modin/experimental/backends/omnisci/query_compiler.rst b/docs/flow/modin/experimental/backends/omnisci/query_compiler.rst deleted file mode 100644 index 2dbf89870fc..00000000000 --- a/docs/flow/modin/experimental/backends/omnisci/query_compiler.rst +++ /dev/null @@ -1,13 +0,0 @@ -OmniSci Query Compiler -"""""""""""""""""""""" - -:py:class:`~modin.experimental.backends.omnisci.query_compiler.DFAlgQueryCompiler` implements -a query compiler for lazy frame. Each compiler instance holds an instance of -:py:class:`~modin.experimental.engines.omnisci_on_native.frame.data.OmnisciOnNativeFrame` -which is used to build a lazy execution tree. - -Public API -'''''''''' - -.. autoclass:: modin.experimental.backends.omnisci.query_compiler.DFAlgQueryCompiler - :members: diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_algebra.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_algebra.rst new file mode 100644 index 00000000000..e9f4a7628da --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_algebra.rst @@ -0,0 +1,98 @@ +CalciteBaseNode +""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteBaseNode + :members: + +CalciteScanNode +""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteScanNode + :members: + +CalciteProjectionNode +""""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteProjectionNode + :members: + +CalciteFilterNode +""""""""""""""""" + +Public API +---------- + +.. 
autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteFilterNode + :members: + +CalciteAggregateNode +"""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteAggregateNode + :members: + +CalciteCollation +"""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteCollation + :members: + +CalciteSortNode +""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteSortNode + :members: + +CalciteJoinNode +""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteJoinNode + :members: + +CalciteUnionNode +"""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteUnionNode + :members: + +CalciteInputRefExpr +""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteInputRefExpr + :members: + +CalciteInputIdxExpr +""""""""""""""""""" + +Public API +---------- + +.. 
autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteInputIdxExpr + :members: diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_builder.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_builder.rst new file mode 100644 index 00000000000..518f9c46823 --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_builder.rst @@ -0,0 +1,8 @@ +CalciteBuilder +"""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_builder.CalciteBuilder + :members: diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_serializer.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_serializer.rst new file mode 100644 index 00000000000..bff4fa6cc63 --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/calcite_serializer.rst @@ -0,0 +1,8 @@ +CalciteSerializer +""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_serializer.CalciteSerializer + :members: diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/dataframe.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/dataframe.rst new file mode 100644 index 00000000000..da54c919521 --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/dataframe.rst @@ -0,0 +1,8 @@ +OmnisciOnNativeDataframe +"""""""""""""""""""""""" + +Public API +---------- + +.. 
autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.dataframe.dataframe.OmnisciOnNativeDataframe + :members: diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/df_algebra.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/df_algebra.rst new file mode 100644 index 00000000000..c1eb34d26c4 --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/df_algebra.rst @@ -0,0 +1,116 @@ +TransformMapper +""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.TransformMapper + :members: + +FrameMapper +""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.FrameMapper + :members: + +InputMapper +""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.InputMapper + :members: + +DFAlgNode +""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.DFAlgNode + :members: + +FrameNode +""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.FrameNode + :members: + +MaskNode +"""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.MaskNode + :members: + +GroupbyAggNode +"""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.GroupbyAggNode + :members: + +TransformNode +""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.TransformNode + :members: + +JoinNode +"""""""" + +Public API +---------- + +.. 
autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.JoinNode + :members: + +UnionNode +""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.UnionNode + :members: + +SortNode +"""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.SortNode + :members: + +FilterNode +"""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.FilterNode + :members: + +Utilities +""""""""" + +Public API +---------- + +.. autofunction:: modin.experimental.core.execution.native.implementations.omnisci_on_native.translate_exprs_to_base +.. autofunction:: modin.experimental.core.execution.native.implementations.omnisci_on_native.replace_frame_in_exprs diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/expr.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/expr.rst new file mode 100644 index 00000000000..8d2237ed8a3 --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/expr.rst @@ -0,0 +1,55 @@ +BaseExpr +"""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.BaseExpr + :members: + +InputRefExpr +"""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.InputRefExpr + :members: + +LiteralExpr +""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.LiteralExpr + :members: + +OpExpr +"""""" + +Public API +---------- + +.. 
autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.OpExpr + :members: + +AggregateExpr +""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.AggregateExpr + :members: + +Utilities +""""""""" + +Public API +---------- + +.. autofunction:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.is_cmp_op +.. autofunction:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.build_row_idx_filter_expr +.. autofunction:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.build_if_then_else +.. autofunction:: modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.build_dt_expr diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/index.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/index.rst similarity index 68% rename from docs/flow/modin/experimental/engines/omnisci_on_native/frame/index.rst rename to docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/index.rst index 779d1d0c2eb..1575858ff84 100644 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/index.rst +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/index.rst @@ -1,18 +1,18 @@ :orphan: -OmnisciOnNative Frame Implementation -================================= +OmnisciOnNative Dataframe Implementation +======================================== -Modin implements ``Frame``, ``PartitionManager`` and ``Partition`` classes +Modin implements ``Dataframe``, ``PartitionManager`` and ``Partition`` classes specific for ``OmnisciOnNative`` backend: -* :doc:`Frame ` -* :doc:`Partition ` -* :doc:`PartitionManager ` +* :doc:`Dataframe ` +* :doc:`Partition ` +* :doc:`PartitionManager ` Overview of OmniSci embedded engine usage can be accessed in the 
related section: -* :doc:`OmniSci Engine ` +* :doc:`OmniSci Engine ` To support lazy execution Modin uses two types of trees. Operations on frames are described by ``DFAlgNode`` based trees. Scalar computations are described by ``BaseExpr`` based tree. @@ -33,11 +33,11 @@ class. .. toctree:: :hidden: - data - partition - axis_partition - partition_manager - ../omnisci_engine + dataframe + partitioning/partition + partitioning/axis_partition + partitioning/partition_manager + omnisci_engine df_algebra expr calcite_algebra diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/omnisci_engine.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/omnisci_engine.rst similarity index 74% rename from docs/flow/modin/experimental/engines/omnisci_on_native/omnisci_engine.rst rename to docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/omnisci_engine.rst index 2c1af605959..66c4af9d8a7 100644 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/omnisci_engine.rst +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/omnisci_engine.rst @@ -39,21 +39,21 @@ filter and aggregation can be executed in a single data scan. To utilize this feature and reduce data transformation and transfer overheads, we need to implement lazy operations on a dataframe. The dataframe with lazy computation is implemented in -:py:class:`~modin.experimental.engines.omnisci_on_native.frame.data.OmnisciOnNativeFrame` +:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.dataframe.dataframe.OmnisciOnNativeDataframe` class. Lazy operations on a frame build a tree which is later translated into a query executed by OmniSci. We use two types of trees. The first one describes operations on frames that map to relational operations like projection, union, etc. 
 Nodes in this tree are derived from
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.df_algebra.DFAlgNode`
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.df_algebra.DFAlgNode`
 class. Some of the nodes (e.g.
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.df_algebra.TransformNode` mapped to a projection)
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.df_algebra.TransformNode` mapped to a projection)
 need a description of how individual columns are computed. The second type of tree is used to describe operations on columns, including arithmetic operations, type casts, datetime operations, etc. Nodes of this tree are derived from
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.expr.BaseExpr` class.
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.BaseExpr` class.

 Partitions
 ----------
@@ -61,7 +61,7 @@ Partitions
 Partitioning is used to achieve high parallelism. In the case of OmniSciDB based execution parallelism is provided by OmniSciDB execution engine and we don't need to manage multiple partitions.
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.data.OmnisciOnNativeFrame`
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.dataframe.dataframe.OmnisciOnNativeDataframe`
 always has a single partition.

 A partition holds data in either ``pandas.DataFrame`` or ``pyarrow.Table``
@@ -102,17 +102,17 @@ optimizations.
 Operations used by Calcite in its intermediate representation are implemented in classes derived from
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteBaseNode`.
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.calcite_builder.CalciteBuilder` is used to
-translate :py:class:`~modin.experimental.engines.omnisci_on_native.frame.df_algebra.DFAlgNode`-based
-trees into :py:class:`~modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteBaseNode`-based sequences.
-It also translates :py:class:`~modin.experimental.engines.omnisci_on_native.frame.expr.BaseExpr`-based
-trees by replacing :py:class:`~modin.experimental.engines.omnisci_on_native.frame.expr.InputRefExpr`
-nodes with either :py:class:`~modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteInputRefExpr`
-or :py:class:`~modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteInputIdxExpr`
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteBaseNode`.
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_builder.CalciteBuilder` is used to
+translate :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.df_algebra.DFAlgNode`-based
+trees into :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteBaseNode`-based sequences.
+It also translates :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.BaseExpr`-based
+trees by replacing :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.expr.InputRefExpr`
+nodes with either :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteInputRefExpr`
+or :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_algebra.CalciteInputIdxExpr`
 depending on context.
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.calcite_serializer.CalciteSerializer`
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.calcite_serializer.CalciteSerializer`
 is used to serialize the resulting sequence into JSON format. This JSON becomes a query by simply adding 'execute relalg' or 'execute calcite' prefix (the latter is used if we want to use Calcite
@@ -121,7 +121,7 @@ for additional query optimization).
 An execution result is a new Arrow table which is used to form a new partition. This partition is assigned to the executed frame. The frame's operation tree is replaced with
-:py:class:`~modin.experimental.engines.omnisci_on_native.frame.df_algebra.FrameNode` operation.
+:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.df_algebra.FrameNode` operation.

 Column name mangling
 ''''''''''''''''''''
diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/omnisci_worker.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/omnisci_worker.rst
new file mode 100644
index 00000000000..4ec19f268b3
--- /dev/null
+++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/omnisci_worker.rst
@@ -0,0 +1,8 @@
+OmnisciServer
+"""""""""""""
+
+Public API
+----------
+
+.. 
autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.omnisci_worker.OmnisciServer + :members: diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/partitioning/partition.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/partitioning/partition.rst new file mode 100644 index 00000000000..4858be45da5 --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/partitioning/partition.rst @@ -0,0 +1,8 @@ +OmnisciOnNativeDataframePartition +""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.partitioning.partition.OmnisciOnNativeDataframePartition + :members: diff --git a/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/partitioning/partition_manager.rst b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/partitioning/partition_manager.rst new file mode 100644 index 00000000000..6bb07dfa58b --- /dev/null +++ b/docs/flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/partitioning/partition_manager.rst @@ -0,0 +1,8 @@ +OmnisciOnNativeDataframePartitionManager +"""""""""""""""""""""""""""""""""""""""" + +Public API +---------- + +.. 
autoclass:: modin.experimental.core.execution.native.implementations.omnisci_on_native.partitioning.partition_manager.OmnisciOnNativeDataframePartitionManager + :members: diff --git a/docs/flow/modin/experimental/engines/pandas_on_ray.rst b/docs/flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray.rst similarity index 60% rename from docs/flow/modin/experimental/engines/pandas_on_ray.rst rename to docs/flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray.rst index 13f09c0a001..e4aad3c636d 100644 --- a/docs/flow/modin/experimental/engines/pandas_on_ray.rst +++ b/docs/flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray.rst @@ -6,8 +6,8 @@ Pandas-on-Ray Module Description High-Level Module Overview '''''''''''''''''''''''''' -This module houses experimental functionality with pandas backend and Ray -engine. This functionality is concentrated in the :py:class:`~modin.experimental.engines.pandas_on_ray.io_exp.ExperimentalPandasOnRayIO` +This module houses experimental functionality with pandas storage format and Ray +engine. This functionality is concentrated in the :py:class:`~modin.experimental.core.execution.ray.implementations.pandas_on_ray.io.io.ExperimentalPandasOnRayIO` class, that contains methods, which extend typical pandas API to give user more flexibility with IO operations. @@ -25,9 +25,9 @@ statement as follows: Implemented Operations '''''''''''''''''''''' -For now :py:class:`~modin.experimental.engines.pandas_on_ray.io_exp.ExperimentalPandasOnRayIO` -implements two methods - :meth:`~modin.experimental.engines.pandas_on_ray.io_exp.ExperimentalPandasOnRayIO.read_sql` and -:meth:`~modin.experimental.engines.pandas_on_ray.io_exp.ExperimentalPandasOnRayIO.read_csv_glob`. 
+For now :py:class:`~modin.experimental.core.execution.ray.implementations.pandas_on_ray.io.io.ExperimentalPandasOnRayIO` +implements two methods - :meth:`~modin.experimental.core.execution.ray.implementations.pandas_on_ray.io.io.ExperimentalPandasOnRayIO.read_sql` and +:meth:`~modin.experimental.core.execution.ray.implementations.pandas_on_ray.io.io.ExperimentalPandasOnRayIO.read_csv_glob`. The first method allows the user to use typical ``pandas.read_sql`` function extended with `Spark-like parameters `_ such as ``partition_column``, ``lower_bound`` and ``upper_bound``. With these @@ -39,10 +39,10 @@ provided as a parameter. Submodules Description '''''''''''''''''''''' -``modin.experimental.engines.pandas_on_ray`` module is used mostly for storing utils and +``modin.experimental.core.execution.ray.implementations.pandas_on_ray`` module is used mostly for storing utils and functions for experimanetal IO class: -* ``io_exp.py`` - submodule containing IO class and parse functions, which are responsible +* ``io.py`` - submodule containing IO class and parse functions, which are responsible for data processing on the workers. * ``sql.py`` - submodule with util functions for experimental ``read_sql`` function. @@ -50,5 +50,5 @@ functions for experimanetal IO class: Public API '''''''''' -.. autoclass:: modin.experimental.engines.pandas_on_ray.io_exp.ExperimentalPandasOnRayIO +.. 
autoclass:: modin.experimental.core.execution.ray.implementations.pandas_on_ray.io.io.ExperimentalPandasOnRayIO :members: diff --git a/docs/flow/modin/experimental/engines/pyarrow_on_ray.rst b/docs/flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray.rst similarity index 94% rename from docs/flow/modin/experimental/engines/pyarrow_on_ray.rst rename to docs/flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray.rst index 3af570ce0cc..7afcce80357 100644 --- a/docs/flow/modin/experimental/engines/pyarrow_on_ray.rst +++ b/docs/flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray.rst @@ -6,7 +6,7 @@ PyArrow-on-Ray Module Description High-Level Module Overview '''''''''''''''''''''''''' -This module houses experimental functionality with PyArrow backend and Ray +This module houses experimental functionality with PyArrow storage format and Ray engine. The biggest difference from core engines is that internally each partition is represented as ``pyarrow.Table`` put in the ``Ray`` Plasma store. diff --git a/docs/flow/modin/experimental/core/storage_formats/index.rst b/docs/flow/modin/experimental/core/storage_formats/index.rst new file mode 100644 index 00000000000..f9c06e94b05 --- /dev/null +++ b/docs/flow/modin/experimental/core/storage_formats/index.rst @@ -0,0 +1,15 @@ +:orphan: + +Experimental storage formats +"""""""""""""""""""""""""""" + +``modin.experimental.storage_formats`` holds experimental storage formats that are under development right now +and provides a limited set of functionality: + +* :doc:`omnisci ` + + +.. 
toctree:: + :hidden: + + omnisci/index diff --git a/docs/flow/modin/experimental/core/storage_formats/omnisci/index.rst b/docs/flow/modin/experimental/core/storage_formats/omnisci/index.rst new file mode 100644 index 00000000000..48f189e8bee --- /dev/null +++ b/docs/flow/modin/experimental/core/storage_formats/omnisci/index.rst @@ -0,0 +1,15 @@ +OmniSci storage format +"""""""""""""""""""""" + +.. toctree:: + :hidden: + + query_compiler + +High-Level Module Overview +'''''''''''''''''''''''''' + +This module contains :py:class:`~modin.experimental.core.storage_formats.omnisci.query_compiler.DFAlgQueryCompiler` +class used for lazy DataFrame based execution implementations. + +For more information about the specific of this format please visit the :doc:`implementation page `. diff --git a/docs/flow/modin/experimental/core/storage_formats/omnisci/query_compiler.rst b/docs/flow/modin/experimental/core/storage_formats/omnisci/query_compiler.rst new file mode 100644 index 00000000000..31bf4c18a9b --- /dev/null +++ b/docs/flow/modin/experimental/core/storage_formats/omnisci/query_compiler.rst @@ -0,0 +1,13 @@ +OmniSci Query Compiler +"""""""""""""""""""""" + +:py:class:`~modin.experimental.core.storage_formats.omnisci.query_compiler.DFAlgQueryCompiler` implements +a query compiler for lazy frame. Each compiler instance holds an instance of +:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.dataframe.dataframe.OmnisciOnNativeDataframe` +which is used to build a lazy execution tree. + +Public API +'''''''''' + +.. 
autoclass:: modin.experimental.core.storage_formats.omnisci.query_compiler.DFAlgQueryCompiler + :members: diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_algebra.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_algebra.rst deleted file mode 100644 index a05d6431d00..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_algebra.rst +++ /dev/null @@ -1,98 +0,0 @@ -CalciteBaseNode -""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteBaseNode - :members: - -CalciteScanNode -""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteScanNode - :members: - -CalciteProjectionNode -""""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteProjectionNode - :members: - -CalciteFilterNode -""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteFilterNode - :members: - -CalciteAggregateNode -"""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteAggregateNode - :members: - -CalciteCollation -"""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteCollation - :members: - -CalciteSortNode -""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteSortNode - :members: - -CalciteJoinNode -""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteJoinNode - :members: - -CalciteUnionNode -"""""""""""""""" - -Public API ----------- - -.. 
autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteUnionNode - :members: - -CalciteInputRefExpr -""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteInputRefExpr - :members: - -CalciteInputIdxExpr -""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_algebra.CalciteInputIdxExpr - :members: diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_builder.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_builder.rst deleted file mode 100644 index 09acaa6b4be..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_builder.rst +++ /dev/null @@ -1,8 +0,0 @@ -CalciteBuilder -"""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_builder.CalciteBuilder - :members: diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_serializer.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_serializer.rst deleted file mode 100644 index 9cc380944c2..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/calcite_serializer.rst +++ /dev/null @@ -1,8 +0,0 @@ -CalciteSerializer -""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.calcite_serializer.CalciteSerializer - :members: diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/data.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/data.rst deleted file mode 100644 index a0745d43fc1..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/data.rst +++ /dev/null @@ -1,8 +0,0 @@ -OmnisciOnNativeFrame -""""""""""""""""" - -Public API ----------- - -.. 
autoclass:: modin.experimental.engines.omnisci_on_native.frame.data.OmnisciOnNativeFrame - :members: diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/df_algebra.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/df_algebra.rst deleted file mode 100644 index 2203b0f8ed6..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/df_algebra.rst +++ /dev/null @@ -1,116 +0,0 @@ -TransformMapper -""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.TransformMapper - :members: - -FrameMapper -""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.FrameMapper - :members: - -InputMapper -""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.InputMapper - :members: - -DFAlgNode -""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.DFAlgNode - :members: - -FrameNode -""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.FrameNode - :members: - -MaskNode -"""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.MaskNode - :members: - -GroupbyAggNode -"""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.GroupbyAggNode - :members: - -TransformNode -""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.TransformNode - :members: - -JoinNode -"""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.JoinNode - :members: - -UnionNode -""""""""" - -Public API ----------- - -.. 
autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.UnionNode - :members: - -SortNode -"""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.SortNode - :members: - -FilterNode -"""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.FilterNode - :members: - -Utilities -""""""""" - -Public API ----------- - -.. autofunction:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.translate_exprs_to_base -.. autofunction:: modin.experimental.engines.omnisci_on_native.frame.df_algebra.replace_frame_in_exprs diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/expr.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/expr.rst deleted file mode 100644 index 4c5c33311cb..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/expr.rst +++ /dev/null @@ -1,55 +0,0 @@ -BaseExpr -"""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.expr.BaseExpr - :members: - -InputRefExpr -"""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.expr.InputRefExpr - :members: - -LiteralExpr -""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.expr.LiteralExpr - :members: - -OpExpr -"""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.expr.OpExpr - :members: - -AggregateExpr -""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.expr.AggregateExpr - :members: - -Utilities -""""""""" - -Public API ----------- - -.. autofunction:: modin.experimental.engines.omnisci_on_native.frame.expr.is_cmp_op -.. autofunction:: modin.experimental.engines.omnisci_on_native.frame.expr.build_row_idx_filter_expr -.. 
autofunction:: modin.experimental.engines.omnisci_on_native.frame.expr.build_if_then_else -.. autofunction:: modin.experimental.engines.omnisci_on_native.frame.expr.build_dt_expr diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/omnisci_worker.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/omnisci_worker.rst deleted file mode 100644 index dc616c4acb6..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/omnisci_worker.rst +++ /dev/null @@ -1,8 +0,0 @@ -OmnisciServer -""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.omnisci_worker.OmnisciServer - :members: diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/partition.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/partition.rst deleted file mode 100644 index 74e22adef90..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/partition.rst +++ /dev/null @@ -1,8 +0,0 @@ -OmnisciOnNativeFramePartition -"""""""""""""""""""""""""" - -Public API ----------- - -.. autoclass:: modin.experimental.engines.omnisci_on_native.frame.partition.OmnisciOnNativeFramePartition - :members: diff --git a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/partition_manager.rst b/docs/flow/modin/experimental/engines/omnisci_on_native/frame/partition_manager.rst deleted file mode 100644 index 104d9c61bf9..00000000000 --- a/docs/flow/modin/experimental/engines/omnisci_on_native/frame/partition_manager.rst +++ /dev/null @@ -1,8 +0,0 @@ -OmnisciOnNativeFramePartitionManager -""""""""""""""""""""""""""""""""" - -Public API ----------- - -.. 
autoclass:: modin.experimental.engines.omnisci_on_native.frame.partition_manager.OmnisciOnNativeFramePartitionManager - :members: diff --git a/docs/flow/modin/pandas/dataframe.rst b/docs/flow/modin/pandas/dataframe.rst index 97413564e85..a153e80d06b 100644 --- a/docs/flow/modin/pandas/dataframe.rst +++ b/docs/flow/modin/pandas/dataframe.rst @@ -26,7 +26,7 @@ Usage Guide The most efficient way to create Modin ``DataFrame`` is to import data from external storage using the highly efficient Modin IO methods (for example using ``pd.read_csv``, -see details for Modin IO methods in the :doc:`separate section `), +see details for Modin IO methods in the :doc:`separate section `), but even if the data does not originate from a file, any pandas supported data type or ``pandas.DataFrame`` can be used. Internally, the ``DataFrame`` data is divided into partitions, which number along an axis usually corresponds to the number of the user's hardware CPUs. If needed, @@ -85,10 +85,10 @@ Let's consider simple example of creation and interacting with Modin ``DataFrame # List of DataFrame partitions - [[] - [] - [] - []] + [[] + [] + [] + []] # The first DataFrame partition diff --git a/docs/flow/modin/pandas/series.rst b/docs/flow/modin/pandas/series.rst index 38a325705d5..a5312ef217c 100644 --- a/docs/flow/modin/pandas/series.rst +++ b/docs/flow/modin/pandas/series.rst @@ -26,7 +26,7 @@ Usage Guide The most efficient way to create Modin ``Series`` is to import data from external storage using the highly efficient Modin IO methods (for example using ``pd.read_csv``, -see details for Modin IO methods in the :doc:`separate section `), +see details for Modin IO methods in the :doc:`separate section `), but even if the data does not originate from a file, any pandas supported data type or ``pandas.Series`` can be used. Internally, the ``Series`` data is divided into partitions, which number along an axis usually corresponds to the number of the user's hardware CPUs. 
If needed, @@ -82,10 +82,10 @@ Let's consider simple example of creation and interacting with Modin ``Series``: # List of `Series` partitions - [[] - [] - [] - []] + [[] + [] + [] + []] # The first `Series` partition diff --git a/docs/index.rst b/docs/index.rst index 9d4a2657f0b..2dfab498384 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -167,7 +167,7 @@ and lots of other useful information.** developer/partition_api .. toctree:: - :caption: Engines, Backends, and APIs + :caption: Engines, Storage formats, and APIs UsingPandasonRay/index UsingPandasonDask/index diff --git a/docs/installation.rst b/docs/installation.rst index 681f871765d..f0a0699e3f9 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -53,7 +53,7 @@ or for different functionalities of Modin. Here is a list of dependency sets for .. code-block:: bash - pip install "modin[dask]" # If you want to use the Dask backend + pip install "modin[dask]" # If you want to use the Dask execution engine Installing with conda ---------------------