diff --git a/RELEASE.md b/RELEASE.md index 6c0c894750..ea0fce323a 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -29,6 +29,12 @@ ## Breaking changes to the API ## Upcoming deprecations for Kedro 0.19.0 +* Renamed abstract dataset classes, in accordance with the [Kedro lexicon](https://github.com/kedro-org/kedro/wiki/Kedro-documentation-style-guide#kedro-lexicon). Dataset classes ending with "DataSet" are deprecated and will be removed in 0.19.0. Note that all of the below classes are also importable from `kedro.io`; only the module where they are defined is listed as the location. + +| Type | Deprecated Alias | Location | +| -------------------------- | -------------------------- | --------------- | +| `AbstractDataset` | `AbstractDataSet` | `kedro.io.core` | +| `AbstractVersionedDataset` | `AbstractVersionedDataSet` | `kedro.io.core` | # Release 0.18.12 diff --git a/docs/source/data/data_catalog.md b/docs/source/data/data_catalog.md index d7c73e4fdf..fb1f7ac3dc 100644 --- a/docs/source/data/data_catalog.md +++ b/docs/source/data/data_catalog.md @@ -783,7 +783,7 @@ gear = cars["gear"].values The following steps happened behind the scenes when `load` was called: - The value `cars` was located in the Data Catalog -- The corresponding `AbstractDataSet` object was retrieved +- The corresponding `AbstractDataset` object was retrieved - The `load` method of this dataset was called - This `load` method delegated the loading to the underlying pandas `read_csv` function diff --git a/docs/source/data/kedro_io.md b/docs/source/data/kedro_io.md index 6fdfefdd66..a38ea97fcb 100644 --- a/docs/source/data/kedro_io.md +++ b/docs/source/data/kedro_io.md @@ -1,7 +1,7 @@ # Kedro IO -In this tutorial, we cover advanced uses of [the Kedro IO module](/kedro.io) to understand the underlying implementation. The relevant API documentation is [kedro.io.AbstractDataSet](/kedro.io.AbstractDataSet) and [kedro.io.DataSetError](/kedro.io.DataSetError). +In this tutorial, we cover advanced uses of [the Kedro IO module](/kedro.io) to understand the underlying implementation. The relevant API documentation is [kedro.io.AbstractDataset](/kedro.io.AbstractDataset) and [kedro.io.DataSetError](/kedro.io.DataSetError). ## Error handling @@ -21,9 +21,9 @@ except DataSetError: ``` -## AbstractDataSet +## AbstractDataset -To understand what is going on behind the scenes, you should study the [AbstractDataSet interface](/kedro.io.AbstractDataSet). `AbstractDataSet` is the underlying interface that all datasets extend. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataSet` implementation. +To understand what is going on behind the scenes, you should study the [AbstractDataset interface](/kedro.io.AbstractDataset). `AbstractDataset` is the underlying interface that all datasets extend. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation. 
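For illustration of the interface this paragraph describes, here is a minimal sketch of a subclass that overrides `_load`, `_save` and `_describe`. The class name `MyCSVDataset`, the pandas-based CSV handling and the local-only filepath are assumptions made for the example, and it presumes a Kedro version that already exposes the renamed `AbstractDataset` (which is what this changeset introduces):

```python
from pathlib import PurePosixPath
from typing import Any, Dict

import pandas as pd

from kedro.io import AbstractDataset


class MyCSVDataset(AbstractDataset[pd.DataFrame, pd.DataFrame]):
    """Hypothetical dataset used only to illustrate the interface described above."""

    def __init__(self, filepath: str):
        self._filepath = PurePosixPath(filepath)

    def _load(self) -> pd.DataFrame:
        # the public load() wraps this with uniform error handling
        return pd.read_csv(str(self._filepath))

    def _save(self, data: pd.DataFrame) -> None:
        # the public save() wraps this with uniform error handling
        data.to_csv(str(self._filepath), index=False)

    def _describe(self) -> Dict[str, Any]:
        # used when Kedro logs information about this dataset instance
        return dict(filepath=self._filepath)
```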
If you have a dataset called `parts`, you can make direct calls to it like so: @@ -33,13 +33,13 @@ parts_df = parts.load() We recommend using a `DataCatalog` instead (for more details, see [the `DataCatalog` documentation](../data/data_catalog.md)) as it has been designed to make all datasets available to project members. -For contributors, if you would like to submit a new dataset, you must extend the `AbstractDataSet`. For a complete guide, please read [the section on custom datasets](../extend_kedro/custom_datasets.md). +For contributors, if you would like to submit a new dataset, you must extend the `AbstractDataset`. For a complete guide, please read [the section on custom datasets](../extend_kedro/custom_datasets.md). ## Versioning In order to enable versioning, you need to update the `catalog.yml` config file and set the `versioned` attribute to `true` for the given dataset. If this is a custom dataset, the implementation must also: - 1. extend `kedro.io.core.AbstractVersionedDataSet` AND + 1. extend `kedro.io.core.AbstractVersionedDataset` AND 2. add `version` namedtuple as an argument to its `__init__` method AND 3. call `super().__init__()` with positional arguments `filepath`, `version`, and, optionally, with `glob` and `exists` functions if it uses a non-local filesystem (see [kedro_datasets.pandas.CSVDataSet](/kedro_datasets.pandas.CSVDataSet) as an example) AND 4. modify its `_describe`, `_load` and `_save` methods respectively to support versioning (see [`kedro_datasets.pandas.CSVDataSet`](/kedro_datasets.pandas.CSVDataSet) for an example implementation) @@ -55,10 +55,10 @@ from pathlib import Path, PurePosixPath import pandas as pd -from kedro.io import AbstractVersionedDataSet +from kedro.io import AbstractVersionedDataset -class MyOwnDataSet(AbstractVersionedDataSet): +class MyOwnDataSet(AbstractVersionedDataset): def __init__(self, filepath, version, param1, param2=True): super().__init__(PurePosixPath(filepath), version) self._param1 = param1 @@ -314,7 +314,7 @@ Here is an exhaustive list of the arguments supported by `PartitionedDataSet`: | Argument | Required | Supported types | Description | | ----------------- | ------------------------------ | ------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `path` | Yes | `str` | Path to the folder containing partitioned data. If path starts with the protocol (e.g., `s3://`) then the corresponding `fsspec` concrete filesystem implementation will be used. 
If protocol is not specified, local filesystem will be used | -| `dataset` | Yes | `str`, `Type[AbstractDataSet]`, `Dict[str, Any]` | Underlying dataset definition, for more details see the section below | +| `dataset` | Yes | `str`, `Type[AbstractDataset]`, `Dict[str, Any]` | Underlying dataset definition, for more details see the section below | | `credentials` | No | `Dict[str, Any]` | Protocol-specific options that will be passed to `fsspec.filesystemcall`, for more details see the section below | | `load_args` | No | `Dict[str, Any]` | Keyword arguments to be passed into `find()` method of the corresponding filesystem implementation | | `filepath_arg` | No | `str` (defaults to `filepath`) | Argument name of the underlying dataset initializer that will contain a path to an individual partition | @@ -326,7 +326,7 @@ Dataset definition should be passed into the `dataset` argument of the `Partitio ##### Shorthand notation -Requires you only to specify a class of the underlying dataset either as a string (e.g. `pandas.CSVDataSet` or a fully qualified class path like `kedro_datasets.pandas.CSVDataSet`) or as a class object that is a subclass of the [AbstractDataSet](/kedro.io.AbstractDataSet). +Requires you only to specify a class of the underlying dataset either as a string (e.g. `pandas.CSVDataSet` or a fully qualified class path like `kedro_datasets.pandas.CSVDataSet`) or as a class object that is a subclass of the [AbstractDataset](/kedro.io.AbstractDataset). ##### Full notation diff --git a/docs/source/deployment/dask.md b/docs/source/deployment/dask.md index 9c5734d744..a03b0fd24b 100644 --- a/docs/source/deployment/dask.md +++ b/docs/source/deployment/dask.md @@ -44,14 +44,14 @@ from kedro.framework.hooks.manager import ( _register_hooks_setuptools, ) from kedro.framework.project import settings -from kedro.io import AbstractDataSet, DataCatalog +from kedro.io import AbstractDataset, DataCatalog from kedro.pipeline import Pipeline from kedro.pipeline.node import Node from kedro.runner import AbstractRunner, run_node from pluggy import PluginManager -class _DaskDataSet(AbstractDataSet): +class _DaskDataSet(AbstractDataset): """``_DaskDataSet`` publishes/gets named datasets to/from the Dask scheduler.""" diff --git a/docs/source/extend_kedro/custom_datasets.md b/docs/source/extend_kedro/custom_datasets.md index 9e4b0713eb..c0aad914da 100644 --- a/docs/source/extend_kedro/custom_datasets.md +++ b/docs/source/extend_kedro/custom_datasets.md @@ -24,13 +24,13 @@ Consult the [Pillow documentation](https://pillow.readthedocs.io/en/stable/insta ## The anatomy of a dataset -At the minimum, a valid Kedro dataset needs to subclass the base [AbstractDataSet](/kedro.io.AbstractDataSet) and provide an implementation for the following abstract methods: +At the minimum, a valid Kedro dataset needs to subclass the base [AbstractDataset](/kedro.io.AbstractDataset) and provide an implementation for the following abstract methods: * `_load` * `_save` * `_describe` -`AbstractDataSet` is generically typed with an input data type for saving data, and an output data type for loading data. +`AbstractDataset` is generically typed with an input data type for saving data, and an output data type for loading data. This typing is optional however, and defaults to `Any` type. 
Here is an example skeleton for `ImageDataSet`: @@ -43,10 +43,10 @@ from typing import Any, Dict import numpy as np -from kedro.io import AbstractDataSet +from kedro.io import AbstractDataset -class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]): +class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]): """``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow. Example: @@ -108,11 +108,11 @@ import fsspec import numpy as np from PIL import Image -from kedro.io import AbstractDataSet +from kedro.io import AbstractDataset from kedro.io.core import get_filepath_str, get_protocol_and_path -class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]): +class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]): def __init__(self, filepath: str): """Creates a new instance of ImageDataSet to load / save image data for given filepath. @@ -169,7 +169,7 @@ Similarly, we can implement the `_save` method as follows: ```python -class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]): +class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]): def _save(self, data: np.ndarray) -> None: """Saves image data to the specified filepath.""" # using get_filepath_str ensures that the protocol and path are appended correctly for different filesystems @@ -193,7 +193,7 @@ You can open the file to verify that the data was written back correctly. The `_describe` method is used for printing purposes. The convention in Kedro is for the method to return a dictionary describing the attributes of the dataset. ```python -class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]): +class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]): def _describe(self) -> Dict[str, Any]: """Returns a dict that describes the attributes of the dataset.""" return dict(filepath=self._filepath, protocol=self._protocol) @@ -214,11 +214,11 @@ import fsspec import numpy as np from PIL import Image -from kedro.io import AbstractDataSet +from kedro.io import AbstractDataset from kedro.io.core import get_filepath_str, get_protocol_and_path -class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]): +class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]): """``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow. Example: @@ -301,7 +301,7 @@ $ ls -la data/01_raw/pokemon-images-and-types/images/images/*.png | wc -l Versioning doesn't work with `PartitionedDataSet`. You can't use both of them at the same time. ``` To add [Versioning](../data/kedro_io.md#versioning) support to the new dataset we need to extend the - [AbstractVersionedDataSet](/kedro.io.AbstractVersionedDataSet) to: + [AbstractVersionedDataset](/kedro.io.AbstractVersionedDataset) to: * Accept a `version` keyword argument as part of the constructor * Adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively @@ -320,11 +320,11 @@ import fsspec import numpy as np from PIL import Image -from kedro.io import AbstractVersionedDataSet +from kedro.io import AbstractVersionedDataset from kedro.io.core import get_filepath_str, get_protocol_and_path, Version -class ImageDataSet(AbstractVersionedDataSet[np.ndarray, np.ndarray]): +class ImageDataSet(AbstractVersionedDataset[np.ndarray, np.ndarray]): """``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow. 
Example: @@ -391,14 +391,14 @@ The difference between the original `ImageDataSet` and the versioned `ImageDataS import numpy as np from PIL import Image --from kedro.io import AbstractDataSet +-from kedro.io import AbstractDataset -from kedro.io.core import get_filepath_str, get_protocol_and_path -+from kedro.io import AbstractVersionedDataSet ++from kedro.io import AbstractVersionedDataset +from kedro.io.core import get_filepath_str, get_protocol_and_path, Version --class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]): -+class ImageDataSet(AbstractVersionedDataSet[np.ndarray, np.ndarray]): +-class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]): ++class ImageDataSet(AbstractVersionedDataset[np.ndarray, np.ndarray]): """``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow. Example: @@ -537,7 +537,7 @@ These parameters are then passed to the dataset constructor so you can use them import fsspec -class ImageDataSet(AbstractVersionedDataSet): +class ImageDataSet(AbstractVersionedDataset): def __init__( self, filepath: str, diff --git a/docs/source/extend_kedro/plugins.md b/docs/source/extend_kedro/plugins.md index c7a0b10979..61b82fcfbc 100644 --- a/docs/source/extend_kedro/plugins.md +++ b/docs/source/extend_kedro/plugins.md @@ -196,7 +196,7 @@ When you are ready to submit your code: ## Supported Kedro plugins - [Kedro-Datasets](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets), a collection of all of Kedro's data connectors. These data -connectors are implementations of the `AbstractDataSet` +connectors are implementations of the `AbstractDataset` - [Kedro-Docker](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker), a tool for packaging and shipping Kedro projects within containers - [Kedro-Airflow](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-airflow), a tool for converting your Kedro project into an Airflow project - [Kedro-Viz](https://github.com/kedro-org/kedro-viz), a tool for visualising your Kedro pipelines diff --git a/docs/source/kedro.io.rst b/docs/source/kedro.io.rst index f86bb0558d..56c6a7d6d5 100644 --- a/docs/source/kedro.io.rst +++ b/docs/source/kedro.io.rst @@ -11,8 +11,8 @@ kedro.io :toctree: :template: autosummary/class.rst - kedro.io.AbstractDataSet - kedro.io.AbstractVersionedDataSet + kedro.io.AbstractDataset + kedro.io.AbstractVersionedDataset kedro.io.CachedDataSet kedro.io.CachedDataset kedro.io.DataCatalog diff --git a/docs/source/nodes_and_pipelines/nodes.md b/docs/source/nodes_and_pipelines/nodes.md index 7a22b8765e..a41f147244 100644 --- a/docs/source/nodes_and_pipelines/nodes.md +++ b/docs/source/nodes_and_pipelines/nodes.md @@ -287,7 +287,7 @@ def report_accuracy(y_pred: pd.Series, y_test: pd.Series): -The `ChunkWiseDataset` is a variant of the `pandas.CSVDataset` where the main change is to the `_save` method that appends data instead of overwriting it. You need to create a file `src//chunkwise.py` and put this class inside it. Below is an example of the `ChunkWiseCSVDataset` implementation: +The `ChunkWiseCSVDataset` is a variant of the `pandas.CSVDataSet` where the main change is to the `_save` method that appends data instead of overwriting it. You need to create a file `src//chunkwise.py` and put this class inside it. 
Below is an example of the `ChunkWiseCSVDataset` implementation: ```python import pandas as pd @@ -295,10 +295,10 @@ import pandas as pd from kedro.io.core import ( get_filepath_str, ) -from kedro.extras.datasets.pandas import CSVDataset +from kedro.extras.datasets.pandas import CSVDataSet -class ChunkWiseCSVDataset(CSVDataset): +class ChunkWiseCSVDataset(CSVDataSet): """``ChunkWiseCSVDataset`` loads/saves data from/to a CSV file using an underlying filesystem. It uses pandas to handle the CSV file. """ @@ -319,20 +319,20 @@ After that, you need to update the `catalog.yml` to use this new dataset. ```diff + y_pred: -+ type: .chunkwise.ChunkWiseCSVDataSet ++ type: .chunkwise.ChunkWiseCSVDataset + filepath: data/07_model_output/y_pred.csv ``` -With these changes, when you run `kedro run` in your terminal, you should see `y_pred`` being saved multiple times in the logs as the generator lazily processes and saves the data in smaller chunks. +With these changes, when you run `kedro run` in your terminal, you should see `y_pred` being saved multiple times in the logs as the generator lazily processes and saves the data in smaller chunks. ``` ... INFO Loading data from 'y_train' (MemoryDataset)... data_catalog.py:475 INFO Running node: make_predictions: make_predictions([X_train,X_test,y_train]) -> [y_pred] node.py:331 - INFO Saving data to 'y_pred' (ChunkWiseCSVDataSet)... data_catalog.py:514 - INFO Saving data to 'y_pred' (ChunkWiseCSVDataSet)... data_catalog.py:514 - INFO Saving data to 'y_pred' (ChunkWiseCSVDataSet)... data_catalog.py:514 + INFO Saving data to 'y_pred' (ChunkWiseCSVDataset)... data_catalog.py:514 + INFO Saving data to 'y_pred' (ChunkWiseCSVDataset)... data_catalog.py:514 + INFO Saving data to 'y_pred' (ChunkWiseCSVDataset)... data_catalog.py:514 INFO Completed 2 out of 3 tasks sequential_runner.py:85 - INFO Loading data from 'y_pred' (ChunkWiseCSVDataSet)... data_catalog.py:475 + INFO Loading data from 'y_pred' (ChunkWiseCSVDataset)... data_catalog.py:475 ... runner.py:105 ``` diff --git a/docs/source/nodes_and_pipelines/run_a_pipeline.md b/docs/source/nodes_and_pipelines/run_a_pipeline.md index 417510fe8e..2d7af412ad 100644 --- a/docs/source/nodes_and_pipelines/run_a_pipeline.md +++ b/docs/source/nodes_and_pipelines/run_a_pipeline.md @@ -57,7 +57,7 @@ If the built-in Kedro runners do not meet your requirements, you can also define ```python # in src//runner.py -from kedro.io import AbstractDataSet, DataCatalog, MemoryDataSet +from kedro.io import AbstractDataset, DataCatalog, MemoryDataSet from kedro.pipeline import Pipeline from kedro.runner.runner import AbstractRunner from pluggy import PluginManager @@ -69,13 +69,13 @@ class DryRunner(AbstractRunner): neccessary data exists. """ - def create_default_data_set(self, ds_name: str) -> AbstractDataSet: + def create_default_data_set(self, ds_name: str) -> AbstractDataset: """Factory method for creating the default data set for the runner. Args: ds_name: Name of the missing data set Returns: - An instance of an implementation of AbstractDataSet to be used + An instance of an implementation of AbstractDataset to be used for all unregistered data sets. """ diff --git a/kedro/extras/datasets/README.md b/kedro/extras/datasets/README.md index 3058ac4ab2..bd93acd6be 100644 --- a/kedro/extras/datasets/README.md +++ b/kedro/extras/datasets/README.md @@ -4,9 +4,9 @@ > `kedro.extras.datasets` is deprecated and will be removed in Kedro 0.19, > install `kedro-datasets` instead by running `pip install kedro-datasets`. 
-Welcome to `kedro.extras.datasets`, the home of Kedro's data connectors. Here you will find `AbstractDataSet` implementations created by QuantumBlack and external contributors. +Welcome to `kedro.extras.datasets`, the home of Kedro's data connectors. Here you will find `AbstractDataset` implementations created by QuantumBlack and external contributors. -## What `AbstractDataSet` implementations are supported? +## What `AbstractDataset` implementations are supported? We support a range of data descriptions, including CSV, Excel, Parquet, Feather, HDF5, JSON, Pickle, SQL Tables, SQL Queries, Spark DataFrames and more. We even allow support for working with images. @@ -16,7 +16,7 @@ These data descriptions are supported with the APIs of `pandas`, `spark`, `netwo Here is a full list of [supported data descriptions and APIs](https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.html). -## How can I create my own `AbstractDataSet` implementation? +## How can I create my own `AbstractDataset` implementation? -Take a look at our [instructions on how to create your own `AbstractDataSet` implementation](https://kedro.readthedocs.io/en/stable/extend_kedro/custom_datasets.html). +Take a look at our [instructions on how to create your own `AbstractDataset` implementation](https://kedro.readthedocs.io/en/stable/extend_kedro/custom_datasets.html). diff --git a/kedro/extras/datasets/__init__.py b/kedro/extras/datasets/__init__.py index 5397e3da98..3eec3e3fe1 100644 --- a/kedro/extras/datasets/__init__.py +++ b/kedro/extras/datasets/__init__.py @@ -1,5 +1,5 @@ """``kedro.extras.datasets`` is where you can find all of Kedro's data connectors. -These data connectors are implementations of the ``AbstractDataSet``. +These data connectors are implementations of the ``AbstractDataset``. .. warning:: diff --git a/kedro/extras/datasets/api/api_dataset.py b/kedro/extras/datasets/api/api_dataset.py index cdfaa93c83..f288c96814 100644 --- a/kedro/extras/datasets/api/api_dataset.py +++ b/kedro/extras/datasets/api/api_dataset.py @@ -6,14 +6,14 @@ import requests from requests.auth import AuthBase -from kedro.io.core import AbstractDataSet, DatasetError +from kedro.io.core import AbstractDataset, DatasetError # NOTE: kedro.extras.datasets will be removed in Kedro 0.19.0. # Any contribution to datasets should be made in kedro-datasets # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class APIDataSet(AbstractDataSet[None, requests.Response]): +class APIDataSet(AbstractDataset[None, requests.Response]): """``APIDataSet`` loads the data from HTTP(S) APIs. 
It uses the python requests library: https://requests.readthedocs.io/en/latest/ diff --git a/kedro/extras/datasets/biosequence/__init__.py b/kedro/extras/datasets/biosequence/__init__.py index 9f2f1a2a2e..d806e3ca33 100644 --- a/kedro/extras/datasets/biosequence/__init__.py +++ b/kedro/extras/datasets/biosequence/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to read/write from/to a sequence file.""" +"""``AbstractDataset`` implementation to read/write from/to a sequence file.""" __all__ = ["BioSequenceDataSet"] diff --git a/kedro/extras/datasets/biosequence/biosequence_dataset.py b/kedro/extras/datasets/biosequence/biosequence_dataset.py index 4888158774..ac0770aa68 100644 --- a/kedro/extras/datasets/biosequence/biosequence_dataset.py +++ b/kedro/extras/datasets/biosequence/biosequence_dataset.py @@ -8,14 +8,14 @@ import fsspec from Bio import SeqIO -from kedro.io.core import AbstractDataSet, get_filepath_str, get_protocol_and_path +from kedro.io.core import AbstractDataset, get_filepath_str, get_protocol_and_path # NOTE: kedro.extras.datasets will be removed in Kedro 0.19.0. # Any contribution to datasets should be made in kedro-datasets # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class BioSequenceDataSet(AbstractDataSet[List, List]): +class BioSequenceDataSet(AbstractDataset[List, List]): r"""``BioSequenceDataSet`` loads and saves data to a sequence file. Example: diff --git a/kedro/extras/datasets/dask/parquet_dataset.py b/kedro/extras/datasets/dask/parquet_dataset.py index 08c93b1d49..23dc7a701b 100644 --- a/kedro/extras/datasets/dask/parquet_dataset.py +++ b/kedro/extras/datasets/dask/parquet_dataset.py @@ -8,14 +8,14 @@ import fsspec import triad -from kedro.io.core import AbstractDataSet, get_protocol_and_path +from kedro.io.core import AbstractDataset, get_protocol_and_path # NOTE: kedro.extras.datasets will be removed in Kedro 0.19.0. # Any contribution to datasets should be made in kedro-datasets # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class ParquetDataSet(AbstractDataSet[dd.DataFrame, dd.DataFrame]): +class ParquetDataSet(AbstractDataset[dd.DataFrame, dd.DataFrame]): """``ParquetDataSet`` loads and saves data to parquet file(s). It uses Dask remote data services to handle the corresponding load and save operations: https://docs.dask.org/en/latest/how-to/connect-to-remote-data.html diff --git a/kedro/extras/datasets/email/__init__.py b/kedro/extras/datasets/email/__init__.py index 97aa7a3455..ba7873cbf2 100644 --- a/kedro/extras/datasets/email/__init__.py +++ b/kedro/extras/datasets/email/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementations for managing email messages.""" +"""``AbstractDataset`` implementations for managing email messages.""" __all__ = ["EmailMessageDataSet"] diff --git a/kedro/extras/datasets/email/message_dataset.py b/kedro/extras/datasets/email/message_dataset.py index 8a725540c2..695d93cbbe 100644 --- a/kedro/extras/datasets/email/message_dataset.py +++ b/kedro/extras/datasets/email/message_dataset.py @@ -13,7 +13,7 @@ import fsspec from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -26,7 +26,7 @@ class EmailMessageDataSet( - AbstractVersionedDataSet[Message, Message] + AbstractVersionedDataset[Message, Message] ): # pylint: disable=too-many-instance-attributes """``EmailMessageDataSet`` loads/saves an email message from/to a file using an underlying filesystem (e.g.: local, S3, GCS). 
It uses the diff --git a/kedro/extras/datasets/geopandas/__init__.py b/kedro/extras/datasets/geopandas/__init__.py index 966577fc37..bee7462a83 100644 --- a/kedro/extras/datasets/geopandas/__init__.py +++ b/kedro/extras/datasets/geopandas/__init__.py @@ -1,4 +1,4 @@ -"""``GeoJSONDataSet`` is an ``AbstractVersionedDataSet`` to save and load GeoJSON files. +"""``GeoJSONDataSet`` is an ``AbstractVersionedDataset`` to save and load GeoJSON files. """ __all__ = ["GeoJSONDataSet"] diff --git a/kedro/extras/datasets/geopandas/geojson_dataset.py b/kedro/extras/datasets/geopandas/geojson_dataset.py index 88cce18dee..5beba29d57 100644 --- a/kedro/extras/datasets/geopandas/geojson_dataset.py +++ b/kedro/extras/datasets/geopandas/geojson_dataset.py @@ -10,7 +10,7 @@ import geopandas as gpd from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -23,7 +23,7 @@ class GeoJSONDataSet( - AbstractVersionedDataSet[ + AbstractVersionedDataset[ gpd.GeoDataFrame, Union[gpd.GeoDataFrame, Dict[str, gpd.GeoDataFrame]] ] ): diff --git a/kedro/extras/datasets/holoviews/__init__.py b/kedro/extras/datasets/holoviews/__init__.py index c97bd72a6d..f50db9b823 100644 --- a/kedro/extras/datasets/holoviews/__init__.py +++ b/kedro/extras/datasets/holoviews/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to save Holoviews objects as image files.""" +"""``AbstractDataset`` implementation to save Holoviews objects as image files.""" __all__ = ["HoloviewsWriter"] diff --git a/kedro/extras/datasets/holoviews/holoviews_writer.py b/kedro/extras/datasets/holoviews/holoviews_writer.py index 2ed30f7156..34daeb1769 100644 --- a/kedro/extras/datasets/holoviews/holoviews_writer.py +++ b/kedro/extras/datasets/holoviews/holoviews_writer.py @@ -10,7 +10,7 @@ import holoviews as hv from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -25,7 +25,7 @@ HoloViews = TypeVar("HoloViews") -class HoloviewsWriter(AbstractVersionedDataSet[HoloViews, NoReturn]): +class HoloviewsWriter(AbstractVersionedDataset[HoloViews, NoReturn]): """``HoloviewsWriter`` saves Holoviews objects to image file(s) in an underlying filesystem (e.g. local, S3, GCS). diff --git a/kedro/extras/datasets/json/__init__.py b/kedro/extras/datasets/json/__init__.py index 5f023b35f4..887f7cd72f 100644 --- a/kedro/extras/datasets/json/__init__.py +++ b/kedro/extras/datasets/json/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to load/save data from/to a JSON file.""" +"""``AbstractDataset`` implementation to load/save data from/to a JSON file.""" __all__ = ["JSONDataSet"] diff --git a/kedro/extras/datasets/json/json_dataset.py b/kedro/extras/datasets/json/json_dataset.py index 17cc2cf69e..5e05dd46ed 100644 --- a/kedro/extras/datasets/json/json_dataset.py +++ b/kedro/extras/datasets/json/json_dataset.py @@ -9,7 +9,7 @@ import fsspec from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -21,7 +21,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class JSONDataSet(AbstractVersionedDataSet[Any, Any]): +class JSONDataSet(AbstractVersionedDataset[Any, Any]): """``JSONDataSet`` loads/saves data from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It uses native json to handle the JSON file. 
diff --git a/kedro/extras/datasets/matplotlib/__init__.py b/kedro/extras/datasets/matplotlib/__init__.py index ee2bc06466..eabd8fc517 100644 --- a/kedro/extras/datasets/matplotlib/__init__.py +++ b/kedro/extras/datasets/matplotlib/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to save matplotlib objects as image files.""" +"""``AbstractDataset`` implementation to save matplotlib objects as image files.""" __all__ = ["MatplotlibWriter"] diff --git a/kedro/extras/datasets/matplotlib/matplotlib_writer.py b/kedro/extras/datasets/matplotlib/matplotlib_writer.py index 00a365f2ec..204e4673c5 100644 --- a/kedro/extras/datasets/matplotlib/matplotlib_writer.py +++ b/kedro/extras/datasets/matplotlib/matplotlib_writer.py @@ -11,7 +11,7 @@ import matplotlib.pyplot as plt from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -24,7 +24,7 @@ class MatplotlibWriter( - AbstractVersionedDataSet[ + AbstractVersionedDataset[ Union[plt.figure, List[plt.figure], Dict[str, plt.figure]], NoReturn ] ): diff --git a/kedro/extras/datasets/networkx/__init__.py b/kedro/extras/datasets/networkx/__init__.py index 73674c81fe..ece1b98f9c 100644 --- a/kedro/extras/datasets/networkx/__init__.py +++ b/kedro/extras/datasets/networkx/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to save and load NetworkX graphs in JSON +"""``AbstractDataset`` implementation to save and load NetworkX graphs in JSON , GraphML and GML formats using ``NetworkX``.""" __all__ = ["GMLDataSet", "GraphMLDataSet", "JSONDataSet"] diff --git a/kedro/extras/datasets/networkx/gml_dataset.py b/kedro/extras/datasets/networkx/gml_dataset.py index d48f7d37e2..a56ddbe7ba 100644 --- a/kedro/extras/datasets/networkx/gml_dataset.py +++ b/kedro/extras/datasets/networkx/gml_dataset.py @@ -11,7 +11,7 @@ import networkx from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, Version, get_filepath_str, get_protocol_and_path, @@ -22,7 +22,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class GMLDataSet(AbstractVersionedDataSet[networkx.Graph, networkx.Graph]): +class GMLDataSet(AbstractVersionedDataset[networkx.Graph, networkx.Graph]): """``GMLDataSet`` loads and saves graphs to a GML file using an underlying filesystem (e.g.: local, S3, GCS). ``NetworkX`` is used to create GML data. diff --git a/kedro/extras/datasets/networkx/graphml_dataset.py b/kedro/extras/datasets/networkx/graphml_dataset.py index 54f5d496f7..368459958f 100644 --- a/kedro/extras/datasets/networkx/graphml_dataset.py +++ b/kedro/extras/datasets/networkx/graphml_dataset.py @@ -10,7 +10,7 @@ import networkx from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, Version, get_filepath_str, get_protocol_and_path, @@ -21,7 +21,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class GraphMLDataSet(AbstractVersionedDataSet[networkx.Graph, networkx.Graph]): +class GraphMLDataSet(AbstractVersionedDataset[networkx.Graph, networkx.Graph]): """``GraphMLDataSet`` loads and saves graphs to a GraphML file using an underlying filesystem (e.g.: local, S3, GCS). ``NetworkX`` is used to create GraphML data. 
diff --git a/kedro/extras/datasets/networkx/json_dataset.py b/kedro/extras/datasets/networkx/json_dataset.py index 4ae9940601..60db837a91 100644 --- a/kedro/extras/datasets/networkx/json_dataset.py +++ b/kedro/extras/datasets/networkx/json_dataset.py @@ -11,7 +11,7 @@ import networkx from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, Version, get_filepath_str, get_protocol_and_path, @@ -22,7 +22,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class JSONDataSet(AbstractVersionedDataSet[networkx.Graph, networkx.Graph]): +class JSONDataSet(AbstractVersionedDataset[networkx.Graph, networkx.Graph]): """NetworkX ``JSONDataSet`` loads and saves graphs to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). ``NetworkX`` is used to create JSON data. diff --git a/kedro/extras/datasets/pandas/__init__.py b/kedro/extras/datasets/pandas/__init__.py index b84015d1d9..2a8ba76371 100644 --- a/kedro/extras/datasets/pandas/__init__.py +++ b/kedro/extras/datasets/pandas/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementations that produce pandas DataFrames.""" +"""``AbstractDataset`` implementations that produce pandas DataFrames.""" __all__ = [ "CSVDataSet", diff --git a/kedro/extras/datasets/pandas/csv_dataset.py b/kedro/extras/datasets/pandas/csv_dataset.py index 597d03ecf9..01b044969c 100644 --- a/kedro/extras/datasets/pandas/csv_dataset.py +++ b/kedro/extras/datasets/pandas/csv_dataset.py @@ -12,7 +12,7 @@ from kedro.io.core import ( PROTOCOL_DELIMITER, - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -26,7 +26,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class CSVDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]): +class CSVDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]): """``CSVDataSet`` loads/saves data from/to a CSV file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the CSV file. diff --git a/kedro/extras/datasets/pandas/excel_dataset.py b/kedro/extras/datasets/pandas/excel_dataset.py index 05c1144721..21139c7ca9 100644 --- a/kedro/extras/datasets/pandas/excel_dataset.py +++ b/kedro/extras/datasets/pandas/excel_dataset.py @@ -12,7 +12,7 @@ from kedro.io.core import ( PROTOCOL_DELIMITER, - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -27,7 +27,7 @@ class ExcelDataSet( - AbstractVersionedDataSet[ + AbstractVersionedDataset[ Union[pd.DataFrame, Dict[str, pd.DataFrame]], Union[pd.DataFrame, Dict[str, pd.DataFrame]], ] diff --git a/kedro/extras/datasets/pandas/feather_dataset.py b/kedro/extras/datasets/pandas/feather_dataset.py index 534d84d9bf..b43ecc1814 100644 --- a/kedro/extras/datasets/pandas/feather_dataset.py +++ b/kedro/extras/datasets/pandas/feather_dataset.py @@ -13,7 +13,7 @@ from kedro.io.core import ( PROTOCOL_DELIMITER, - AbstractVersionedDataSet, + AbstractVersionedDataset, Version, get_filepath_str, get_protocol_and_path, @@ -26,7 +26,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class FeatherDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]): +class FeatherDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]): """``FeatherDataSet`` loads and saves data to a feather file using an underlying filesystem (e.g.: local, S3, GCS). 
The underlying functionality is supported by pandas, so it supports all allowed pandas options diff --git a/kedro/extras/datasets/pandas/gbq_dataset.py b/kedro/extras/datasets/pandas/gbq_dataset.py index dda5cf9d35..16cea01213 100644 --- a/kedro/extras/datasets/pandas/gbq_dataset.py +++ b/kedro/extras/datasets/pandas/gbq_dataset.py @@ -13,7 +13,7 @@ from google.oauth2.credentials import Credentials from kedro.io.core import ( - AbstractDataSet, + AbstractDataset, DatasetError, get_filepath_str, get_protocol_and_path, @@ -25,7 +25,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class GBQTableDataSet(AbstractDataSet[None, pd.DataFrame]): +class GBQTableDataSet(AbstractDataset[None, pd.DataFrame]): """``GBQTableDataSet`` loads and saves data from/to Google BigQuery. It uses pandas-gbq to read and write from/to BigQuery table. @@ -175,7 +175,7 @@ def _validate_location(self): ) -class GBQQueryDataSet(AbstractDataSet[None, pd.DataFrame]): +class GBQQueryDataSet(AbstractDataset[None, pd.DataFrame]): """``GBQQueryDataSet`` loads data from a provided SQL query from Google BigQuery. It uses ``pandas.read_gbq`` which itself uses ``pandas-gbq`` internally to read from BigQuery table. Therefore it supports all allowed diff --git a/kedro/extras/datasets/pandas/generic_dataset.py b/kedro/extras/datasets/pandas/generic_dataset.py index bf44694a26..7212310e8f 100644 --- a/kedro/extras/datasets/pandas/generic_dataset.py +++ b/kedro/extras/datasets/pandas/generic_dataset.py @@ -10,7 +10,7 @@ import pandas as pd from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -34,7 +34,7 @@ ] -class GenericDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]): +class GenericDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]): """`pandas.GenericDataSet` loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to dynamically select the appropriate type of read/write target on a best effort basis. diff --git a/kedro/extras/datasets/pandas/hdf_dataset.py b/kedro/extras/datasets/pandas/hdf_dataset.py index d60161d095..0d337af42d 100644 --- a/kedro/extras/datasets/pandas/hdf_dataset.py +++ b/kedro/extras/datasets/pandas/hdf_dataset.py @@ -10,7 +10,7 @@ import pandas as pd from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -24,7 +24,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class HDFDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]): +class HDFDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]): """``HDFDataSet`` loads/saves data from/to a hdf file using an underlying filesystem (e.g. local, S3, GCS). It uses pandas.HDFStore to handle the hdf file. 
diff --git a/kedro/extras/datasets/pandas/json_dataset.py b/kedro/extras/datasets/pandas/json_dataset.py index 1d5e3cb2d1..8148d325c5 100644 --- a/kedro/extras/datasets/pandas/json_dataset.py +++ b/kedro/extras/datasets/pandas/json_dataset.py @@ -12,7 +12,7 @@ from kedro.io.core import ( PROTOCOL_DELIMITER, - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -26,7 +26,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class JSONDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]): +class JSONDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]): """``JSONDataSet`` loads/saves data from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the json file. diff --git a/kedro/extras/datasets/pandas/parquet_dataset.py b/kedro/extras/datasets/pandas/parquet_dataset.py index bf03f97ccd..4bdba28772 100644 --- a/kedro/extras/datasets/pandas/parquet_dataset.py +++ b/kedro/extras/datasets/pandas/parquet_dataset.py @@ -13,7 +13,7 @@ from kedro.io.core import ( PROTOCOL_DELIMITER, - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -27,7 +27,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class ParquetDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]): +class ParquetDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]): """``ParquetDataSet`` loads/saves data from/to a Parquet file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the Parquet file. diff --git a/kedro/extras/datasets/pandas/sql_dataset.py b/kedro/extras/datasets/pandas/sql_dataset.py index 7c084cb82e..373663ce84 100644 --- a/kedro/extras/datasets/pandas/sql_dataset.py +++ b/kedro/extras/datasets/pandas/sql_dataset.py @@ -11,7 +11,7 @@ from sqlalchemy.exc import NoSuchModuleError from kedro.io.core import ( - AbstractDataSet, + AbstractDataset, DatasetError, get_filepath_str, get_protocol_and_path, @@ -92,7 +92,7 @@ def _get_sql_alchemy_missing_error() -> DatasetError: ) -class SQLTableDataSet(AbstractDataSet[pd.DataFrame, pd.DataFrame]): +class SQLTableDataSet(AbstractDataset[pd.DataFrame, pd.DataFrame]): """``SQLTableDataSet`` loads data from a SQL table and saves a pandas dataframe to a table. It uses ``pandas.DataFrame`` internally, so it supports all allowed pandas options on ``read_sql_table`` and @@ -264,7 +264,7 @@ def _exists(self) -> bool: return exists -class SQLQueryDataSet(AbstractDataSet[None, pd.DataFrame]): +class SQLQueryDataSet(AbstractDataset[None, pd.DataFrame]): """``SQLQueryDataSet`` loads data from a provided SQL query. It uses ``pandas.DataFrame`` internally, so it supports all allowed pandas options on ``read_sql_query``. 
Since Pandas uses SQLAlchemy behind diff --git a/kedro/extras/datasets/pandas/xml_dataset.py b/kedro/extras/datasets/pandas/xml_dataset.py index 9433ae238d..ad91b4ad4b 100644 --- a/kedro/extras/datasets/pandas/xml_dataset.py +++ b/kedro/extras/datasets/pandas/xml_dataset.py @@ -12,7 +12,7 @@ from kedro.io.core import ( PROTOCOL_DELIMITER, - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -26,7 +26,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class XMLDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]): +class XMLDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]): """``XMLDataSet`` loads/saves data from/to a XML file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the XML file. diff --git a/kedro/extras/datasets/pickle/__init__.py b/kedro/extras/datasets/pickle/__init__.py index 8e6707d450..40b898eb07 100644 --- a/kedro/extras/datasets/pickle/__init__.py +++ b/kedro/extras/datasets/pickle/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to load/save data from/to a Pickle file.""" +"""``AbstractDataset`` implementation to load/save data from/to a Pickle file.""" __all__ = ["PickleDataSet"] diff --git a/kedro/extras/datasets/pickle/pickle_dataset.py b/kedro/extras/datasets/pickle/pickle_dataset.py index eb9fb55594..19bda78f96 100644 --- a/kedro/extras/datasets/pickle/pickle_dataset.py +++ b/kedro/extras/datasets/pickle/pickle_dataset.py @@ -11,7 +11,7 @@ import fsspec from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -23,7 +23,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class PickleDataSet(AbstractVersionedDataSet[Any, Any]): +class PickleDataSet(AbstractVersionedDataset[Any, Any]): """``PickleDataSet`` loads/saves data from/to a Pickle file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by the specified backend library passed in (defaults to the ``pickle`` library), so it diff --git a/kedro/extras/datasets/pillow/__init__.py b/kedro/extras/datasets/pillow/__init__.py index bd68c032c3..03df85f3ee 100644 --- a/kedro/extras/datasets/pillow/__init__.py +++ b/kedro/extras/datasets/pillow/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to load/save image data.""" +"""``AbstractDataset`` implementation to load/save image data.""" __all__ = ["ImageDataSet"] diff --git a/kedro/extras/datasets/pillow/image_dataset.py b/kedro/extras/datasets/pillow/image_dataset.py index 35c84995f4..1244035df1 100644 --- a/kedro/extras/datasets/pillow/image_dataset.py +++ b/kedro/extras/datasets/pillow/image_dataset.py @@ -9,7 +9,7 @@ from PIL import Image from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -21,7 +21,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class ImageDataSet(AbstractVersionedDataSet[Image.Image, Image.Image]): +class ImageDataSet(AbstractVersionedDataset[Image.Image, Image.Image]): """``ImageDataSet`` loads/saves image data as `numpy` from an underlying filesystem (e.g.: local, S3, GCS). It uses Pillow to handle image file. 
diff --git a/kedro/extras/datasets/plotly/__init__.py b/kedro/extras/datasets/plotly/__init__.py index f864ea6dbe..c2851bb000 100644 --- a/kedro/extras/datasets/plotly/__init__.py +++ b/kedro/extras/datasets/plotly/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementations to load/save a plotly figure from/to a JSON +"""``AbstractDataset`` implementations to load/save a plotly figure from/to a JSON file.""" __all__ = ["PlotlyDataSet", "JSONDataSet"] diff --git a/kedro/extras/datasets/plotly/json_dataset.py b/kedro/extras/datasets/plotly/json_dataset.py index a03ee5b812..5fa555d665 100644 --- a/kedro/extras/datasets/plotly/json_dataset.py +++ b/kedro/extras/datasets/plotly/json_dataset.py @@ -10,7 +10,7 @@ from plotly import graph_objects as go from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, Version, get_filepath_str, get_protocol_and_path, @@ -22,7 +22,7 @@ class JSONDataSet( - AbstractVersionedDataSet[go.Figure, Union[go.Figure, go.FigureWidget]] + AbstractVersionedDataset[go.Figure, Union[go.Figure, go.FigureWidget]] ): """``JSONDataSet`` loads/saves a plotly figure from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). diff --git a/kedro/extras/datasets/redis/__init__.py b/kedro/extras/datasets/redis/__init__.py index ba56e1fb85..f3c553ec3b 100644 --- a/kedro/extras/datasets/redis/__init__.py +++ b/kedro/extras/datasets/redis/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to load/save data from/to a redis db.""" +"""``AbstractDataset`` implementation to load/save data from/to a redis db.""" __all__ = ["PickleDataSet"] diff --git a/kedro/extras/datasets/redis/redis_dataset.py b/kedro/extras/datasets/redis/redis_dataset.py index c2bb2ca660..bac3a15b65 100644 --- a/kedro/extras/datasets/redis/redis_dataset.py +++ b/kedro/extras/datasets/redis/redis_dataset.py @@ -9,14 +9,14 @@ import redis -from kedro.io.core import AbstractDataSet, DatasetError +from kedro.io.core import AbstractDataset, DatasetError # NOTE: kedro.extras.datasets will be removed in Kedro 0.19.0. # Any contribution to datasets should be made in kedro-datasets # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class PickleDataSet(AbstractDataSet[Any, Any]): +class PickleDataSet(AbstractDataset[Any, Any]): """``PickleDataSet`` loads/saves data from/to a Redis database. The underlying functionality is supported by the redis library, so it supports all allowed options for instantiating the redis app ``from_url`` and setting diff --git a/kedro/extras/datasets/spark/deltatable_dataset.py b/kedro/extras/datasets/spark/deltatable_dataset.py index fc6c1d5d97..0f6655ac8c 100644 --- a/kedro/extras/datasets/spark/deltatable_dataset.py +++ b/kedro/extras/datasets/spark/deltatable_dataset.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to access DeltaTables using +"""``AbstractDataset`` implementation to access DeltaTables using ``delta-spark`` """ from pathlib import PurePosixPath @@ -12,14 +12,14 @@ _split_filepath, _strip_dbfs_prefix, ) -from kedro.io.core import AbstractDataSet, DatasetError +from kedro.io.core import AbstractDataset, DatasetError # NOTE: kedro.extras.datasets will be removed in Kedro 0.19.0. 
# Any contribution to datasets should be made in kedro-datasets # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class DeltaTableDataSet(AbstractDataSet[None, DeltaTable]): +class DeltaTableDataSet(AbstractDataset[None, DeltaTable]): """``DeltaTableDataSet`` loads data into DeltaTable objects. Example usage for the diff --git a/kedro/extras/datasets/spark/spark_dataset.py b/kedro/extras/datasets/spark/spark_dataset.py index 0d60d943ac..317e173d24 100644 --- a/kedro/extras/datasets/spark/spark_dataset.py +++ b/kedro/extras/datasets/spark/spark_dataset.py @@ -1,4 +1,4 @@ -"""``AbstractVersionedDataSet`` implementation to access Spark dataframes using +"""``AbstractVersionedDataset`` implementation to access Spark dataframes using ``pyspark`` """ import json @@ -17,7 +17,7 @@ from s3fs import S3FileSystem from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -162,7 +162,7 @@ def hdfs_glob(self, pattern: str) -> List[str]: return sorted(matched) -class SparkDataSet(AbstractVersionedDataSet[DataFrame, DataFrame]): +class SparkDataSet(AbstractVersionedDataset[DataFrame, DataFrame]): """``SparkDataSet`` loads and saves Spark dataframes. Example usage for the diff --git a/kedro/extras/datasets/spark/spark_hive_dataset.py b/kedro/extras/datasets/spark/spark_hive_dataset.py index 81f09b9daa..2abbd1f166 100644 --- a/kedro/extras/datasets/spark/spark_hive_dataset.py +++ b/kedro/extras/datasets/spark/spark_hive_dataset.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to access Spark dataframes using +"""``AbstractDataset`` implementation to access Spark dataframes using ``pyspark`` on Apache Hive. """ import pickle @@ -8,15 +8,15 @@ from pyspark.sql import DataFrame, SparkSession, Window from pyspark.sql.functions import col, lit, row_number -from kedro.io.core import AbstractDataSet, DatasetError +from kedro.io.core import AbstractDataset, DatasetError # NOTE: kedro.extras.datasets will be removed in Kedro 0.19.0. # Any contribution to datasets should be made in kedro-datasets # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -# noqa: too-many-instance-attributes -class SparkHiveDataSet(AbstractDataSet[DataFrame, DataFrame]): +# pylint:disable=too-many-instance-attributes +class SparkHiveDataSet(AbstractDataset[DataFrame, DataFrame]): """``SparkHiveDataSet`` loads and saves Spark dataframes stored on Hive. This data set also handles some incompatible file types such as using partitioned parquet on hive which will not normally allow upserts to existing data without a complete replacement diff --git a/kedro/extras/datasets/spark/spark_jdbc_dataset.py b/kedro/extras/datasets/spark/spark_jdbc_dataset.py index 15e01c4468..3abeeb312a 100644 --- a/kedro/extras/datasets/spark/spark_jdbc_dataset.py +++ b/kedro/extras/datasets/spark/spark_jdbc_dataset.py @@ -5,7 +5,7 @@ from pyspark.sql import DataFrame, SparkSession -from kedro.io.core import AbstractDataSet, DatasetError +from kedro.io.core import AbstractDataset, DatasetError __all__ = ["SparkJDBCDataSet"] @@ -14,7 +14,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class SparkJDBCDataSet(AbstractDataSet[DataFrame, DataFrame]): +class SparkJDBCDataSet(AbstractDataset[DataFrame, DataFrame]): """``SparkJDBCDataSet`` loads data from a database table accessible via JDBC URL url and connection properties and saves the content of a PySpark DataFrame to an external database table via JDBC. 
It uses diff --git a/kedro/extras/datasets/svmlight/__init__.py b/kedro/extras/datasets/svmlight/__init__.py index 4ea2429612..4b77f3dfde 100644 --- a/kedro/extras/datasets/svmlight/__init__.py +++ b/kedro/extras/datasets/svmlight/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to load/save data from/to a svmlight/ +"""``AbstractDataset`` implementation to load/save data from/to a svmlight/ libsvm sparse data file.""" __all__ = ["SVMLightDataSet"] diff --git a/kedro/extras/datasets/svmlight/svmlight_dataset.py b/kedro/extras/datasets/svmlight/svmlight_dataset.py index f8820b036f..697253ef2a 100644 --- a/kedro/extras/datasets/svmlight/svmlight_dataset.py +++ b/kedro/extras/datasets/svmlight/svmlight_dataset.py @@ -12,7 +12,7 @@ from sklearn.datasets import dump_svmlight_file, load_svmlight_file from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -29,7 +29,7 @@ _DO = Tuple[csr_matrix, ndarray] -class SVMLightDataSet(AbstractVersionedDataSet[_DI, _DO]): +class SVMLightDataSet(AbstractVersionedDataset[_DI, _DO]): """``SVMLightDataSet`` loads/saves data from/to a svmlight/libsvm file using an underlying filesystem (e.g.: local, S3, GCS). It uses sklearn functions ``dump_svmlight_file`` to save and ``load_svmlight_file`` to load a file. diff --git a/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py b/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py index e1b35e6620..c0e916d01f 100644 --- a/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py +++ b/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py @@ -10,7 +10,7 @@ import tensorflow as tf from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -24,7 +24,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class TensorFlowModelDataset(AbstractVersionedDataSet[tf.keras.Model, tf.keras.Model]): +class TensorFlowModelDataset(AbstractVersionedDataset[tf.keras.Model, tf.keras.Model]): """``TensorflowModelDataset`` loads and saves TensorFlow models. The underlying functionality is supported by, and passes input arguments through to, TensorFlow 2.X load_model and save_model methods. 
diff --git a/kedro/extras/datasets/text/__init__.py b/kedro/extras/datasets/text/__init__.py index fab08acea4..9ed2c37c0e 100644 --- a/kedro/extras/datasets/text/__init__.py +++ b/kedro/extras/datasets/text/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to load/save data from/to a text file.""" +"""``AbstractDataset`` implementation to load/save data from/to a text file.""" __all__ = ["TextDataSet"] diff --git a/kedro/extras/datasets/text/text_dataset.py b/kedro/extras/datasets/text/text_dataset.py index 2b02bfba3d..3c8a859445 100644 --- a/kedro/extras/datasets/text/text_dataset.py +++ b/kedro/extras/datasets/text/text_dataset.py @@ -8,7 +8,7 @@ import fsspec from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -20,7 +20,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class TextDataSet(AbstractVersionedDataSet[str, str]): +class TextDataSet(AbstractVersionedDataset[str, str]): """``TextDataSet`` loads/saves data from/to a text file using an underlying filesystem (e.g.: local, S3, GCS) diff --git a/kedro/extras/datasets/video/video_dataset.py b/kedro/extras/datasets/video/video_dataset.py index 4aba723afa..4f5e793f29 100644 --- a/kedro/extras/datasets/video/video_dataset.py +++ b/kedro/extras/datasets/video/video_dataset.py @@ -14,7 +14,7 @@ import numpy as np import PIL.Image -from kedro.io.core import AbstractDataSet, get_protocol_and_path +from kedro.io.core import AbstractDataset, get_protocol_and_path class SlicedVideo: @@ -192,7 +192,7 @@ def __iter__(self): return self -class VideoDataSet(AbstractDataSet[AbstractVideo, AbstractVideo]): +class VideoDataSet(AbstractDataset[AbstractVideo, AbstractVideo]): """``VideoDataSet`` loads / save video data from a given filepath as sequence of PIL.Image.Image using OpenCV. diff --git a/kedro/extras/datasets/yaml/__init__.py b/kedro/extras/datasets/yaml/__init__.py index b3780de3a6..07abbaf4a5 100644 --- a/kedro/extras/datasets/yaml/__init__.py +++ b/kedro/extras/datasets/yaml/__init__.py @@ -1,4 +1,4 @@ -"""``AbstractDataSet`` implementation to load/save data from/to a YAML file.""" +"""``AbstractDataset`` implementation to load/save data from/to a YAML file.""" __all__ = ["YAMLDataSet"] diff --git a/kedro/extras/datasets/yaml/yaml_dataset.py b/kedro/extras/datasets/yaml/yaml_dataset.py index 91c6b474cf..7ea2760cdf 100644 --- a/kedro/extras/datasets/yaml/yaml_dataset.py +++ b/kedro/extras/datasets/yaml/yaml_dataset.py @@ -9,7 +9,7 @@ import yaml from kedro.io.core import ( - AbstractVersionedDataSet, + AbstractVersionedDataset, DatasetError, Version, get_filepath_str, @@ -21,7 +21,7 @@ # in kedro-plugins (https://github.com/kedro-org/kedro-plugins) -class YAMLDataSet(AbstractVersionedDataSet[Dict, Dict]): +class YAMLDataSet(AbstractVersionedDataset[Dict, Dict]): """``YAMLDataSet`` loads/saves data from/to a YAML file using an underlying filesystem (e.g.: local, S3, GCS). It uses PyYAML to handle the YAML file. diff --git a/kedro/io/__init__.py b/kedro/io/__init__.py index 0755af906c..26d4c3619c 100644 --- a/kedro/io/__init__.py +++ b/kedro/io/__init__.py @@ -1,12 +1,12 @@ """``kedro.io`` provides functionality to read and write to a -number of data sets. At the core of the library is the ``AbstractDataSet`` class. +number of data sets. At the core of the library is the ``AbstractDataset`` class. 
""" from __future__ import annotations from .cached_dataset import CachedDataSet, CachedDataset from .core import ( - AbstractDataSet, - AbstractVersionedDataSet, + AbstractDataset, + AbstractVersionedDataset, DatasetAlreadyExistsError, DatasetError, DatasetNotFoundError, @@ -26,19 +26,23 @@ DataSetError: type[DatasetError] DataSetNotFoundError: type[DatasetNotFoundError] DataSetAlreadyExistsError: type[DatasetAlreadyExistsError] +AbstractDataSet: type[AbstractDataset] +AbstractVersionedDataSet: type[AbstractVersionedDataset] def __getattr__(name): import kedro.io.core # noqa: import-outside-toplevel - if name in (kedro.io.core._DEPRECATED_ERROR_CLASSES): # noqa: protected-access + if name in (kedro.io.core._DEPRECATED_CLASSES): # noqa: protected-access return getattr(kedro.io.core, name) raise AttributeError(f"module {repr(__name__)} has no attribute {repr(name)}") __all__ = [ "AbstractDataSet", + "AbstractDataset", "AbstractVersionedDataSet", + "AbstractVersionedDataset", "CachedDataSet", "CachedDataset", "DataCatalog", diff --git a/kedro/io/cached_dataset.py b/kedro/io/cached_dataset.py index d3aee1a39e..6ec2a59fb7 100644 --- a/kedro/io/cached_dataset.py +++ b/kedro/io/cached_dataset.py @@ -8,14 +8,14 @@ import warnings from typing import Any -from kedro.io.core import VERSIONED_FLAG_KEY, AbstractDataSet, Version +from kedro.io.core import VERSIONED_FLAG_KEY, AbstractDataset, Version from kedro.io.memory_dataset import MemoryDataset # https://github.com/pylint-dev/pylint/issues/4300#issuecomment-1043601901 CachedDataSet: type[CachedDataset] -class CachedDataset(AbstractDataSet): +class CachedDataset(AbstractDataset): """``CachedDataset`` is a dataset wrapper which caches in memory the data saved, so that the user avoids io operations with slow storage media. @@ -40,7 +40,7 @@ class as shown above. def __init__( self, - dataset: AbstractDataSet | dict, + dataset: AbstractDataset | dict, version: Version = None, copy_mode: str = None, metadata: dict[str, Any] = None, @@ -66,7 +66,7 @@ def __init__( """ if isinstance(dataset, dict): self._dataset = self._from_config(dataset, version) - elif isinstance(dataset, AbstractDataSet): + elif isinstance(dataset, AbstractDataset): self._dataset = dataset else: raise ValueError( @@ -89,10 +89,10 @@ def _from_config(config, version): ) if version: config[VERSIONED_FLAG_KEY] = True - return AbstractDataSet.from_config( + return AbstractDataset.from_config( "_cached", config, version.load, version.save ) - return AbstractDataSet.from_config("_cached", config) + return AbstractDataset.from_config("_cached", config) def _describe(self) -> dict[str, Any]: return { diff --git a/kedro/io/core.py b/kedro/io/core.py index f608f10840..6a097d7058 100644 --- a/kedro/io/core.py +++ b/kedro/io/core.py @@ -33,13 +33,15 @@ DataSetError: type[DatasetError] DataSetNotFoundError: type[DatasetNotFoundError] DataSetAlreadyExistsError: type[DatasetAlreadyExistsError] +AbstractDataSet: type[AbstractDataset] +AbstractVersionedDataSet: type[AbstractVersionedDataset] class DatasetError(Exception): - """``DatasetError`` raised by ``AbstractDataSet`` implementations + """``DatasetError`` raised by ``AbstractDataset`` implementations in case of failure of input/output methods. - ``AbstractDataSet`` implementations should provide instructive + ``AbstractDataset`` implementations should provide instructive information in case of failure. 
""" @@ -62,28 +64,8 @@ class DatasetAlreadyExistsError(DatasetError): pass -_DEPRECATED_ERROR_CLASSES = { - "DataSetError": DatasetError, - "DataSetNotFoundError": DatasetNotFoundError, - "DataSetAlreadyExistsError": DatasetAlreadyExistsError, -} - - -def __getattr__(name): - if name in _DEPRECATED_ERROR_CLASSES: - alias = _DEPRECATED_ERROR_CLASSES[name] - warnings.warn( - f"{repr(name)} has been renamed to {repr(alias.__name__)}, " - f"and the alias will be removed in Kedro 0.19.0", - DeprecationWarning, - stacklevel=2, - ) - return alias - raise AttributeError(f"module {repr(__name__)} has no attribute {repr(name)}") - - class VersionNotFoundError(DatasetError): - """``VersionNotFoundError`` raised by ``AbstractVersionedDataSet`` implementations + """``VersionNotFoundError`` raised by ``AbstractVersionedDataset`` implementations in case of no load versions available for the data set. """ @@ -94,8 +76,8 @@ class VersionNotFoundError(DatasetError): _DO = TypeVar("_DO") -class AbstractDataSet(abc.ABC, Generic[_DI, _DO]): - """``AbstractDataSet`` is the base class for all data set implementations. +class AbstractDataset(abc.ABC, Generic[_DI, _DO]): + """``AbstractDataset`` is the base class for all data set implementations. All data set implementations should extend this abstract class and implement the methods marked as abstract. If a specific dataset implementation cannot be used in conjunction with @@ -106,10 +88,10 @@ class AbstractDataSet(abc.ABC, Generic[_DI, _DO]): >>> from pathlib import Path, PurePosixPath >>> import pandas as pd - >>> from kedro.io import AbstractDataSet + >>> from kedro.io import AbstractDataset >>> >>> - >>> class MyOwnDataset(AbstractDataSet[pd.DataFrame, pd.DataFrame]): + >>> class MyOwnDataset(AbstractDataset[pd.DataFrame, pd.DataFrame]): >>> def __init__(self, filepath, param1, param2=True): >>> self._filepath = PurePosixPath(filepath) >>> self._param1 = param1 @@ -144,7 +126,7 @@ def from_config( config: dict[str, Any], load_version: str = None, save_version: str = None, - ) -> AbstractDataSet: + ) -> AbstractDataset: """Create a data set instance using the configuration provided. Args: @@ -158,7 +140,7 @@ def from_config( if versioning was not enabled. Returns: - An instance of an ``AbstractDataSet`` subclass. + An instance of an ``AbstractDataset`` subclass. 
Raises: DatasetError: When the function fails to create the data set @@ -274,21 +256,21 @@ def _to_str(obj, is_root=False): @abc.abstractmethod def _load(self) -> _DO: raise NotImplementedError( - f"'{self.__class__.__name__}' is a subclass of AbstractDataSet and " + f"'{self.__class__.__name__}' is a subclass of AbstractDataset and " f"it must implement the '_load' method" ) @abc.abstractmethod def _save(self, data: _DI) -> None: raise NotImplementedError( - f"'{self.__class__.__name__}' is a subclass of AbstractDataSet and " + f"'{self.__class__.__name__}' is a subclass of AbstractDataset and " f"it must implement the '_save' method" ) @abc.abstractmethod def _describe(self) -> dict[str, Any]: raise NotImplementedError( - f"'{self.__class__.__name__}' is a subclass of AbstractDataSet and " + f"'{self.__class__.__name__}' is a subclass of AbstractDataset and " f"it must implement the '_describe' method" ) @@ -336,7 +318,7 @@ def release(self) -> None: def _release(self) -> None: pass - def _copy(self, **overwrite_params) -> AbstractDataSet: + def _copy(self, **overwrite_params) -> AbstractDataset: dataset_copy = copy.deepcopy(self) for name, value in overwrite_params.items(): setattr(dataset_copy, name, value) @@ -379,7 +361,7 @@ class Version(namedtuple("Version", ["load", "save"])): def parse_dataset_definition( config: dict[str, Any], load_version: str = None, save_version: str = None -) -> tuple[type[AbstractDataSet], dict[str, Any]]: +) -> tuple[type[AbstractDataset], dict[str, Any]]: """Parse and instantiate a dataset class using the configuration provided. Args: @@ -422,10 +404,10 @@ def parse_dataset_definition( f"has not been installed." ) from exc - if not issubclass(class_obj, AbstractDataSet): + if not issubclass(class_obj, AbstractDataset): raise DatasetError( f"Dataset type '{class_obj.__module__}.{class_obj.__qualname__}' " - f"is invalid: all data set types must extend 'AbstractDataSet'." + f"is invalid: all data set types must extend 'AbstractDataset'." ) if VERSION_KEY in config: @@ -481,9 +463,9 @@ def _local_exists(filepath: str) -> bool: # SKIP_IF_NO_SPARK return filepath.exists() or any(par.is_file() for par in filepath.parents) -class AbstractVersionedDataSet(AbstractDataSet[_DI, _DO], abc.ABC): +class AbstractVersionedDataset(AbstractDataset[_DI, _DO], abc.ABC): """ - ``AbstractVersionedDataSet`` is the base class for all versioned data set + ``AbstractVersionedDataset`` is the base class for all versioned data set implementations. All data sets that implement versioning should extend this abstract class and implement the methods marked as abstract. @@ -492,10 +474,10 @@ class AbstractVersionedDataSet(AbstractDataSet[_DI, _DO], abc.ABC): >>> from pathlib import Path, PurePosixPath >>> import pandas as pd - >>> from kedro.io import AbstractVersionedDataSet + >>> from kedro.io import AbstractVersionedDataset >>> >>> - >>> class MyOwnDataset(AbstractVersionedDataSet): + >>> class MyOwnDataset(AbstractVersionedDataset): >>> def __init__(self, filepath, version, param1, param2=True): >>> super().__init__(PurePosixPath(filepath), version) >>> self._param1 = param1 @@ -534,7 +516,7 @@ def __init__( exists_function: Callable[[str], bool] = None, glob_function: Callable[[str], list[str]] = None, ): - """Creates a new instance of ``AbstractVersionedDataSet``. + """Creates a new instance of ``AbstractVersionedDataset``. Args: filepath: Filepath in POSIX format to a file. 
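The hunk below removes the error-class-only deprecation shim from the top of `kedro/io/core.py` and reintroduces it at the end of the module, extended to cover the abstract dataset classes. For context, a self-contained sketch of how that module-level `__getattr__` pattern (PEP 562) behaves; the class here is a stand-in, but the dictionary and warning text mirror what the patch adds:

```python
# Standalone sketch of the PEP 562 deprecation shim; the class body is a
# stand-in, the alias/warning shape follows the patch.
import warnings


class AbstractDataset:
    """Stand-in for the real class defined earlier in the module."""


_DEPRECATED_CLASSES = {
    "AbstractDataSet": AbstractDataset,
}


def __getattr__(name):
    # PEP 562: only consulted when normal module attribute lookup fails, so
    # the new names are served directly and never go through this function.
    if name in _DEPRECATED_CLASSES:
        alias = _DEPRECATED_CLASSES[name]
        warnings.warn(
            f"{name!r} has been renamed to {alias.__name__!r}, "
            f"and the alias will be removed in Kedro 0.19.0",
            DeprecationWarning,
            stacklevel=2,
        )
        return alias
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

The bare annotations added near the top of `kedro/io/core.py` and `kedro/io/__init__.py` (for example `AbstractDataSet: type[AbstractDataset]`) exist for static type checkers only: an annotation without a value creates no module attribute, so runtime access to an old name still falls through to `__getattr__`, emits the warning, and returns the very same class object as the new name.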
@@ -778,3 +760,25 @@ def validate_on_forbidden_chars(**kwargs): raise DatasetError( f"Neither white-space nor semicolon are allowed in '{key}'." ) + + +_DEPRECATED_CLASSES = { + "DataSetError": DatasetError, + "DataSetNotFoundError": DatasetNotFoundError, + "DataSetAlreadyExistsError": DatasetAlreadyExistsError, + "AbstractDataSet": AbstractDataset, + "AbstractVersionedDataSet": AbstractVersionedDataset, +} + + +def __getattr__(name): + if name in _DEPRECATED_CLASSES: + alias = _DEPRECATED_CLASSES[name] + warnings.warn( + f"{repr(name)} has been renamed to {repr(alias.__name__)}, " + f"and the alias will be removed in Kedro 0.19.0", + DeprecationWarning, + stacklevel=2, + ) + return alias + raise AttributeError(f"module {repr(__name__)} has no attribute {repr(name)}") diff --git a/kedro/io/data_catalog.py b/kedro/io/data_catalog.py index 98e8d8f289..156be2130f 100644 --- a/kedro/io/data_catalog.py +++ b/kedro/io/data_catalog.py @@ -1,4 +1,4 @@ -"""``DataCatalog`` stores instances of ``AbstractDataSet`` implementations to +"""``DataCatalog`` stores instances of ``AbstractDataset`` implementations to provide ``load`` and ``save`` capabilities from anywhere in the program. To use a ``DataCatalog``, you need to instantiate it with a dictionary of data sets. Then it will act as a single point of reference for your calls, @@ -16,8 +16,8 @@ from parse import parse from kedro.io.core import ( - AbstractDataSet, - AbstractVersionedDataSet, + AbstractDataset, + AbstractVersionedDataset, DatasetAlreadyExistsError, DatasetError, DatasetNotFoundError, @@ -103,7 +103,7 @@ class _FrozenDatasets: def __init__( self, - *datasets_collections: _FrozenDatasets | dict[str, AbstractDataSet], + *datasets_collections: _FrozenDatasets | dict[str, AbstractDataset], ): """Return a _FrozenDatasets instance from some datasets collections. Each collection could either be another _FrozenDatasets or a dictionary. @@ -132,7 +132,7 @@ def __setattr__(self, key, value): class DataCatalog: - """``DataCatalog`` stores instances of ``AbstractDataSet`` implementations + """``DataCatalog`` stores instances of ``AbstractDataset`` implementations to provide ``load`` and ``save`` capabilities from anywhere in the program. To use a ``DataCatalog``, you need to instantiate it with a dictionary of data sets. Then it will act as a single point of reference @@ -142,14 +142,14 @@ class DataCatalog: def __init__( # noqa: too-many-arguments self, - data_sets: dict[str, AbstractDataSet] = None, + data_sets: dict[str, AbstractDataset] = None, feed_dict: dict[str, Any] = None, layers: dict[str, set[str]] = None, dataset_patterns: Patterns = None, load_versions: dict[str, str] = None, save_version: str = None, ) -> None: - """``DataCatalog`` stores instances of ``AbstractDataSet`` + """``DataCatalog`` stores instances of ``AbstractDataset`` implementations to provide ``load`` and ``save`` capabilities from anywhere in the program. To use a ``DataCatalog``, you need to instantiate it with a dictionary of data sets. Then it will act as a @@ -214,13 +214,13 @@ def from_config( Args: catalog: A dictionary whose keys are the data set names and the values are dictionaries with the constructor arguments - for classes implementing ``AbstractDataSet``. The data set + for classes implementing ``AbstractDataset``. The data set class to be loaded is specified with the key ``type`` and their fully qualified class name. All ``kedro.io`` data set can be specified by their class name only, i.e. their module name can be omitted. 
credentials: A dictionary containing credentials for different - data sets. Use the ``credentials`` key in a ``AbstractDataSet`` + data sets. Use the ``credentials`` key in a ``AbstractDataset`` to refer to the appropriate credentials as shown in the example below. load_versions: A mapping between dataset names and versions @@ -296,7 +296,7 @@ class to be loaded is specified with the key ``type`` and their ds_layer = ds_config.pop("layer", None) if ds_layer is not None: layers[ds_layer].add(ds_name) - data_sets[ds_name] = AbstractDataSet.from_config( + data_sets[ds_name] = AbstractDataset.from_config( ds_name, ds_config, load_versions.get(ds_name), save_version ) dataset_layers = layers or None @@ -327,7 +327,7 @@ def _is_pattern(pattern: str): @staticmethod def _match_pattern(data_set_patterns: Patterns, data_set_name: str) -> str | None: - """Match a dataset name against patterns in a dictionary containing patterns""" + """Match a dataset name against patterns in a dictionary.""" matches = ( pattern for pattern in data_set_patterns.keys() @@ -337,7 +337,10 @@ def _match_pattern(data_set_patterns: Patterns, data_set_name: str) -> str | Non @classmethod def _sort_patterns(cls, data_set_patterns: Patterns) -> dict[str, dict[str, Any]]: - """Sort a dictionary of dataset patterns according to parsing rules - + """Sort a dictionary of dataset patterns according to parsing rules. + + In order: + 1. Decreasing specificity (number of characters outside the curly brackets) 2. Decreasing number of placeholders (number of curly bracket pairs) 3. Alphabetically @@ -354,11 +357,14 @@ def _sort_patterns(cls, data_set_patterns: Patterns) -> dict[str, dict[str, Any] @staticmethod def _specificity(pattern: str) -> int: - """Helper function to check the length of exactly matched characters not inside brackets - Example - - specificity("{namespace}.companies") = 10 - specificity("{namespace}.{dataset}") = 1 - specificity("france.companies") = 16 + """Helper function to check the length of exactly matched characters not inside brackets. 
+ + Example: + :: + + >>> specificity("{namespace}.companies") = 10 + >>> specificity("{namespace}.{dataset}") = 1 + >>> specificity("france.companies") = 16 """ # Remove all the placeholders from the pattern and count the number of remaining chars result = re.sub(r"\{.*?\}", "", pattern) @@ -366,7 +372,7 @@ def _specificity(pattern: str) -> int: def _get_dataset( self, data_set_name: str, version: Version = None, suggest: bool = True - ) -> AbstractDataSet: + ) -> AbstractDataset: matched_pattern = self._match_pattern(self._dataset_patterns, data_set_name) if data_set_name not in self._data_sets and matched_pattern: # If the dataset is a patterned dataset, materialise it and add it to @@ -376,7 +382,7 @@ def _get_dataset( if ds_layer: self.layers = self.layers or {} self.layers.setdefault(ds_layer, set()).add(data_set_name) - data_set = AbstractDataSet.from_config( + data_set = AbstractDataset.from_config( data_set_name, data_set_config, self._load_versions.get(data_set_name), @@ -405,7 +411,7 @@ def _get_dataset( error_msg += f" - did you mean one of these instead: {suggestions}" raise DatasetNotFoundError(error_msg) data_set = self._data_sets[data_set_name] - if version and isinstance(data_set, AbstractVersionedDataSet): + if version and isinstance(data_set, AbstractVersionedDataset): # we only want to return a similar-looking dataset, # not modify the one stored in the current catalog data_set = data_set._copy(_version=version) # noqa: protected-access @@ -424,7 +430,7 @@ def _resolve_config( data_set_name: str, matched_pattern: str, ) -> dict[str, Any]: - """Get resolved AbstractDataSet from a factory config""" + """Get resolved AbstractDataset from a factory config""" result = parse(matched_pattern, data_set_name) config_copy = copy.deepcopy(self._dataset_patterns[matched_pattern]) # Resolve the factory config for the dataset @@ -547,9 +553,9 @@ def release(self, name: str): dataset.release() def add( - self, data_set_name: str, data_set: AbstractDataSet, replace: bool = False + self, data_set_name: str, data_set: AbstractDataset, replace: bool = False ) -> None: - """Adds a new ``AbstractDataSet`` object to the ``DataCatalog``. + """Adds a new ``AbstractDataset`` object to the ``DataCatalog``. Args: data_set_name: A unique data set name which has not been @@ -585,7 +591,7 @@ def add( self.datasets = _FrozenDatasets(self.datasets, {data_set_name: data_set}) def add_all( - self, data_sets: dict[str, AbstractDataSet], replace: bool = False + self, data_sets: dict[str, AbstractDataset], replace: bool = False ) -> None: """Adds a group of new data sets to the ``DataCatalog``. @@ -645,7 +651,7 @@ def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> Non >>> assert io.load("data").equals(df) """ for data_set_name in feed_dict: - if isinstance(feed_dict[data_set_name], AbstractDataSet): + if isinstance(feed_dict[data_set_name], AbstractDataset): data_set = feed_dict[data_set_name] else: data_set = MemoryDataset(data=feed_dict[data_set_name]) diff --git a/kedro/io/lambda_dataset.py b/kedro/io/lambda_dataset.py index b2cca48921..68d7161b11 100644 --- a/kedro/io/lambda_dataset.py +++ b/kedro/io/lambda_dataset.py @@ -1,19 +1,19 @@ -"""``LambdaDataset`` is an implementation of ``AbstractDataSet`` which allows for +"""``LambdaDataset`` is an implementation of ``AbstractDataset`` which allows for providing custom load, save, and exists methods without extending -``AbstractDataSet``. +``AbstractDataset``. 
""" from __future__ import annotations import warnings from typing import Any, Callable -from kedro.io.core import AbstractDataSet, DatasetError +from kedro.io.core import AbstractDataset, DatasetError # https://github.com/pylint-dev/pylint/issues/4300#issuecomment-1043601901 LambdaDataSet: type[LambdaDataset] -class LambdaDataset(AbstractDataSet): +class LambdaDataset(AbstractDataset): """``LambdaDataset`` loads and saves data to a data set. It relies on delegating to specific implementation such as csv, sql, etc. diff --git a/kedro/io/memory_dataset.py b/kedro/io/memory_dataset.py index 1dc5ded1b0..7cab3f4d3d 100644 --- a/kedro/io/memory_dataset.py +++ b/kedro/io/memory_dataset.py @@ -6,7 +6,7 @@ import warnings from typing import Any -from kedro.io.core import AbstractDataSet, DatasetError +from kedro.io.core import AbstractDataset, DatasetError _EMPTY = object() @@ -14,7 +14,7 @@ MemoryDataSet: type[MemoryDataset] -class MemoryDataset(AbstractDataSet): +class MemoryDataset(AbstractDataset): """``MemoryDataset`` loads and saves data from/to an in-memory Python object. diff --git a/kedro/io/partitioned_dataset.py b/kedro/io/partitioned_dataset.py index 1501fc4a04..66df5294a8 100644 --- a/kedro/io/partitioned_dataset.py +++ b/kedro/io/partitioned_dataset.py @@ -14,7 +14,7 @@ from kedro.io.core import ( VERSION_KEY, VERSIONED_FLAG_KEY, - AbstractDataSet, + AbstractDataset, DatasetError, parse_dataset_definition, ) @@ -36,7 +36,7 @@ IncrementalDataSet: type[IncrementalDataset] -class PartitionedDataset(AbstractDataSet): +class PartitionedDataset(AbstractDataset): # noqa: too-many-instance-attributes,protected-access """``PartitionedDataset`` loads and saves partitioned file-like data using the underlying dataset definition. For filesystem level operations it uses `fsspec`: @@ -138,7 +138,7 @@ class PartitionedDataset(AbstractDataSet): def __init__( # noqa: too-many-arguments self, path: str, - dataset: str | type[AbstractDataSet] | dict[str, Any], + dataset: str | type[AbstractDataset] | dict[str, Any], filepath_arg: str = "filepath", filename_suffix: str = "", credentials: dict[str, Any] = None, @@ -161,7 +161,7 @@ def __init__( # noqa: too-many-arguments dataset: Underlying dataset definition. This is used to instantiate the dataset for each file located inside the ``path``. Accepted formats are: - a) object of a class that inherits from ``AbstractDataSet`` + a) object of a class that inherits from ``AbstractDataset`` b) a string representing a fully qualified class name to such class c) a dictionary with ``type`` key pointing to a string from b), other keys are passed to the Dataset initializer. @@ -384,7 +384,7 @@ class IncrementalDataset(PartitionedDataset): def __init__( # noqa: too-many-arguments self, path: str, - dataset: str | type[AbstractDataSet] | dict[str, Any], + dataset: str | type[AbstractDataset] | dict[str, Any], checkpoint: str | dict[str, Any] | None = None, filepath_arg: str = "filepath", filename_suffix: str = "", @@ -408,7 +408,7 @@ def __init__( # noqa: too-many-arguments dataset: Underlying dataset definition. This is used to instantiate the dataset for each file located inside the ``path``. Accepted formats are: - a) object of a class that inherits from ``AbstractDataSet`` + a) object of a class that inherits from ``AbstractDataset`` b) a string representing a fully qualified class name to such class c) a dictionary with ``type`` key pointing to a string from b), other keys are passed to the Dataset initializer. 
@@ -521,7 +521,7 @@ def _is_valid_partition(partition) -> bool: ) @property - def _checkpoint(self) -> AbstractDataSet: + def _checkpoint(self) -> AbstractDataset: type_, kwargs = parse_dataset_definition(self._checkpoint_config) return type_(**kwargs) # type: ignore @@ -555,15 +555,15 @@ def confirm(self) -> None: self._checkpoint.save(partition_ids[-1]) # checkpoint to last partition -_DEPRECATED_ERROR_CLASSES = { +_DEPRECATED_CLASSES = { "PartitionedDataSet": PartitionedDataset, "IncrementalDataSet": IncrementalDataset, } def __getattr__(name): - if name in _DEPRECATED_ERROR_CLASSES: - alias = _DEPRECATED_ERROR_CLASSES[name] + if name in _DEPRECATED_CLASSES: + alias = _DEPRECATED_CLASSES[name] warnings.warn( f"{repr(name)} has been renamed to {repr(alias.__name__)}, " f"and the alias will be removed in Kedro 0.19.0", diff --git a/kedro/runner/parallel_runner.py b/kedro/runner/parallel_runner.py index b9a45792da..860cefed6a 100644 --- a/kedro/runner/parallel_runner.py +++ b/kedro/runner/parallel_runner.py @@ -38,7 +38,7 @@ class _SharedMemoryDataset: """``_SharedMemoryDataset`` is a wrapper class for a shared MemoryDataset in SyncManager. - It is not inherited from AbstractDataSet class. + It is not inherited from AbstractDataset class. """ def __init__(self, manager: SyncManager): diff --git a/kedro/runner/runner.py b/kedro/runner/runner.py index be379ace71..084843124e 100644 --- a/kedro/runner/runner.py +++ b/kedro/runner/runner.py @@ -21,7 +21,7 @@ from pluggy import PluginManager from kedro.framework.hooks.manager import _NullPluginManager -from kedro.io import AbstractDataSet, DataCatalog, MemoryDataset +from kedro.io import AbstractDataset, DataCatalog, MemoryDataset from kedro.pipeline import Pipeline from kedro.pipeline.node import Node @@ -164,14 +164,14 @@ def _run( pass @abstractmethod # pragma: no cover - def create_default_data_set(self, ds_name: str) -> AbstractDataSet: + def create_default_data_set(self, ds_name: str) -> AbstractDataset: """Factory method for creating the default dataset for the runner. Args: ds_name: Name of the missing dataset. Returns: - An instance of an implementation of ``AbstractDataSet`` to be + An instance of an implementation of ``AbstractDataset`` to be used for all unregistered datasets. """ pass diff --git a/kedro/runner/sequential_runner.py b/kedro/runner/sequential_runner.py index 59f53e7b7a..e944f8af09 100644 --- a/kedro/runner/sequential_runner.py +++ b/kedro/runner/sequential_runner.py @@ -8,7 +8,7 @@ from pluggy import PluginManager -from kedro.io import AbstractDataSet, DataCatalog, MemoryDataset +from kedro.io import AbstractDataset, DataCatalog, MemoryDataset from kedro.pipeline import Pipeline from kedro.runner.runner import AbstractRunner, run_node @@ -29,14 +29,14 @@ def __init__(self, is_async: bool = False): """ super().__init__(is_async=is_async) - def create_default_data_set(self, ds_name: str) -> AbstractDataSet: + def create_default_data_set(self, ds_name: str) -> AbstractDataset: """Factory method for creating the default data set for the runner. Args: ds_name: Name of the missing data set Returns: - An instance of an implementation of AbstractDataSet to be used + An instance of an implementation of AbstractDataset to be used for all unregistered data sets. 
""" diff --git a/tests/io/test_core.py b/tests/io/test_core.py index 05a3204639..7274a0cd32 100644 --- a/tests/io/test_core.py +++ b/tests/io/test_core.py @@ -9,8 +9,8 @@ import pytest from kedro.io.core import ( - _DEPRECATED_ERROR_CLASSES, - AbstractDataSet, + _DEPRECATED_CLASSES, + AbstractDataset, _parse_filepath, get_filepath_str, ) @@ -34,13 +34,13 @@ @pytest.mark.parametrize("module_name", ["kedro.io", "kedro.io.core"]) -@pytest.mark.parametrize("class_name", _DEPRECATED_ERROR_CLASSES) +@pytest.mark.parametrize("class_name", _DEPRECATED_CLASSES) def test_deprecation(module_name, class_name): with pytest.warns(DeprecationWarning, match=f"{repr(class_name)} has been renamed"): getattr(importlib.import_module(module_name), class_name) -class MyDataSet(AbstractDataSet): +class MyDataSet(AbstractDataset): def __init__(self, var=None): self.var = var diff --git a/tests/io/test_data_catalog.py b/tests/io/test_data_catalog.py index 9c61a9d3ec..9273fa5200 100644 --- a/tests/io/test_data_catalog.py +++ b/tests/io/test_data_catalog.py @@ -11,7 +11,7 @@ from kedro.extras.datasets.pandas import CSVDataSet, ParquetDataSet from kedro.io import ( - AbstractDataSet, + AbstractDataset, DataCatalog, DatasetAlreadyExistsError, DatasetError, @@ -175,7 +175,7 @@ def conflicting_feed_dict(): return {"ds1": ds1, "ds3": 1} -class BadDataset(AbstractDataSet): # pragma: no cover +class BadDataset(AbstractDataset): # pragma: no cover def __init__(self, filepath): self.filepath = filepath raise Exception("Naughty!") # pylint: disable=broad-exception-raised @@ -477,7 +477,7 @@ def test_config_invalid_data_set(self, sane_config): pattern = ( "An exception occurred when parsing config for dataset 'boats':\n" "Dataset type 'kedro.io.data_catalog.DataCatalog' is invalid: " - "all data set types must extend 'AbstractDataSet'" + "all data set types must extend 'AbstractDataset'" ) with pytest.raises(DatasetError, match=re.escape(pattern)): DataCatalog.from_config(**sane_config) diff --git a/tests/io/test_incremental_dataset.py b/tests/io/test_incremental_dataset.py index b1dd974f28..76218b6324 100644 --- a/tests/io/test_incremental_dataset.py +++ b/tests/io/test_incremental_dataset.py @@ -13,7 +13,7 @@ from kedro.extras.datasets.pickle import PickleDataSet from kedro.extras.datasets.text import TextDataSet -from kedro.io import AbstractDataSet, DatasetError, IncrementalDataset +from kedro.io import AbstractDataset, DatasetError, IncrementalDataset from kedro.io.data_catalog import CREDENTIALS_KEY DATASET = "kedro.extras.datasets.pandas.CSVDataSet" @@ -41,7 +41,7 @@ def local_csvs(tmp_path, partitioned_data_pandas): return local_dir -class DummyDataset(AbstractDataSet): # pragma: no cover +class DummyDataset(AbstractDataset): # pragma: no cover def __init__(self, filepath): pass diff --git a/tests/io/test_partitioned_dataset.py b/tests/io/test_partitioned_dataset.py index 97735a7380..453ff1781e 100644 --- a/tests/io/test_partitioned_dataset.py +++ b/tests/io/test_partitioned_dataset.py @@ -266,7 +266,7 @@ def test_invalid_dataset(self, dataset, local_csvs): ( FakeDataset, r"Dataset type 'tests\.io\.test_partitioned_dataset\.FakeDataset' " - r"is invalid\: all data set types must extend 'AbstractDataSet'", + r"is invalid\: all data set types must extend 'AbstractDataset'", ), ({}, "'type' is missing from dataset catalog configuration"), ], diff --git a/tests/runner/test_parallel_runner.py b/tests/runner/test_parallel_runner.py index a74cff8d53..8c301b4216 100644 --- a/tests/runner/test_parallel_runner.py +++ 
b/tests/runner/test_parallel_runner.py @@ -9,7 +9,7 @@ from kedro.framework.hooks import _create_hook_manager from kedro.io import ( - AbstractDataSet, + AbstractDataset, DataCatalog, DatasetError, LambdaDataset, @@ -228,7 +228,7 @@ def test_unable_to_schedule_all_nodes( runner.run(fan_out_fan_in, catalog) -class LoggingDataset(AbstractDataSet): +class LoggingDataset(AbstractDataset): def __init__(self, log, name, value=None): self.log = log self.name = name diff --git a/tests/runner/test_sequential_runner.py b/tests/runner/test_sequential_runner.py index cf91b76c49..36d2a7f6ac 100644 --- a/tests/runner/test_sequential_runner.py +++ b/tests/runner/test_sequential_runner.py @@ -7,7 +7,7 @@ import pytest from kedro.framework.hooks import _create_hook_manager -from kedro.io import AbstractDataSet, DataCatalog, DatasetError, LambdaDataset +from kedro.io import AbstractDataset, DataCatalog, DatasetError, LambdaDataset from kedro.pipeline import node from kedro.pipeline.modular_pipeline import pipeline as modular_pipeline from kedro.runner import SequentialRunner @@ -125,7 +125,7 @@ def test_unsatisfied_inputs(self, is_async, unfinished_outputs_pipeline, catalog ) -class LoggingDataset(AbstractDataSet): +class LoggingDataset(AbstractDataset): def __init__(self, log, name, value=None): self.log = log self.name = name diff --git a/tests/runner/test_thread_runner.py b/tests/runner/test_thread_runner.py index a9348548a7..a95b9294c8 100644 --- a/tests/runner/test_thread_runner.py +++ b/tests/runner/test_thread_runner.py @@ -6,7 +6,7 @@ import pytest from kedro.framework.hooks import _create_hook_manager -from kedro.io import AbstractDataSet, DataCatalog, DatasetError, MemoryDataset +from kedro.io import AbstractDataset, DataCatalog, DatasetError, MemoryDataset from kedro.pipeline import node from kedro.pipeline.modular_pipeline import pipeline as modular_pipeline from kedro.runner import ThreadRunner @@ -111,7 +111,7 @@ def test_node_returning_none(self): ThreadRunner().run(pipeline, catalog) -class LoggingDataset(AbstractDataSet): +class LoggingDataset(AbstractDataset): def __init__(self, log, name, value=None): self.log = log self.name = name
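Downstream projects can verify their own migration the same way the updated `tests/io/test_core.py` above does: the deprecated spellings must still resolve, but only with a `DeprecationWarning`. A minimal pytest sketch along those lines; the set of old names checked is illustrative, not exhaustive:

```python
import importlib

import pytest


@pytest.mark.parametrize("module_name", ["kedro.io", "kedro.io.core"])
@pytest.mark.parametrize("class_name", ["AbstractDataSet", "AbstractVersionedDataSet"])
def test_deprecated_aliases_still_resolve(module_name, class_name):
    # The alias is served lazily via module __getattr__, so looking it up
    # must succeed while emitting the rename warning.
    with pytest.warns(DeprecationWarning, match=f"{class_name!r} has been renamed"):
        getattr(importlib.import_module(module_name), class_name)
```

The `getattr(importlib.import_module(...), ...)` indirection simply lets one test cover both `kedro.io` and `kedro.io.core` and several names; a plain `from kedro.io import AbstractDataSet` triggers the same warning.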