Deprecate abstract "DataSet" in favor of "Dataset" #2746

Merged · 15 commits · Aug 14, 2023
Changes from 11 commits
6 changes: 6 additions & 0 deletions RELEASE.md
@@ -20,6 +20,12 @@
## Breaking changes to the API

## Upcoming deprecations for Kedro 0.19.0
* Renamed abstract dataset classes, in accordance with the [Kedro lexicon](https://github.com/kedro-org/kedro/wiki/Kedro-documentation-style-guide#kedro-lexicon). Dataset classes ending with "DataSet" are deprecated and will be removed in 0.19.0. Note that all of the below classes are also importable from `kedro.io`; only the module where they are defined is listed as the location.

| Type | Deprecated Alias | Location |
| -------------------------- | -------------------------- | --------------- |
| `AbstractDataset` | `AbstractDataSet` | `kedro.io.core` |
| `AbstractVersionedDataset` | `AbstractVersionedDataSet` | `kedro.io.core` |
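
As an editor's illustration of the deprecation announced in the entry above (a sketch only, not necessarily the mechanism this PR uses), one way to keep the old names importable while steering users to the new ones is a module-level `__getattr__` (PEP 562) that emits a `DeprecationWarning`:

```python
# Hypothetical deprecation shim; names and placement are assumptions.
import warnings

from kedro.io.core import AbstractDataset, AbstractVersionedDataset

_DEPRECATED_ALIASES = {
    "AbstractDataSet": AbstractDataset,
    "AbstractVersionedDataSet": AbstractVersionedDataset,
}


def __getattr__(name: str):
    # Resolve a deprecated alias lazily and warn the caller.
    if name in _DEPRECATED_ALIASES:
        new_cls = _DEPRECATED_ALIASES[name]
        warnings.warn(
            f"{name} has been renamed to {new_cls.__name__}, "
            "and the old alias will be removed in Kedro 0.19.0",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_cls
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```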

# Release 0.18.11

2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md
@@ -578,7 +578,7 @@ gear = cars["gear"].values
The following steps happened behind the scenes when `load` was called:

- The value `cars` was located in the Data Catalog
- The corresponding `AbstractDataSet` object was retrieved
- The corresponding `AbstractDataset` object was retrieved
- The `load` method of this dataset was called
- This `load` method delegated the loading to the underlying pandas `read_csv` function
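
For context, an editor's sketch of the catalog usage these steps describe (the filepath and the `kedro_datasets` import are assumptions):

```python
from kedro.io import DataCatalog
from kedro_datasets.pandas import CSVDataSet  # assumes kedro-datasets is installed

catalog = DataCatalog({"cars": CSVDataSet(filepath="data/01_raw/cars.csv")})
cars = catalog.load("cars")  # locates "cars", then calls the dataset's load()
gear = cars["gear"].values
```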

18 changes: 9 additions & 9 deletions docs/source/data/kedro_io.md
@@ -1,7 +1,7 @@
# Kedro IO


In this tutorial, we cover advanced uses of [the Kedro IO module](/kedro.io) to understand the underlying implementation. The relevant API documentation is [kedro.io.AbstractDataSet](/kedro.io.AbstractDataSet) and [kedro.io.DataSetError](/kedro.io.DataSetError).
In this tutorial, we cover advanced uses of [the Kedro IO module](/kedro.io) to understand the underlying implementation. The relevant API documentation is [kedro.io.AbstractDataset](/kedro.io.AbstractDataset) and [kedro.io.DataSetError](/kedro.io.DataSetError).

## Error handling

@@ -21,9 +21,9 @@ except DataSetError:
```


## AbstractDataSet
## AbstractDataset

To understand what is going on behind the scenes, you should study the [AbstractDataSet interface](/kedro.io.AbstractDataSet). `AbstractDataSet` is the underlying interface that all datasets extend. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataSet` implementation.
To understand what is going on behind the scenes, you should study the [AbstractDataset interface](/kedro.io.AbstractDataset). `AbstractDataset` is the underlying interface that all datasets extend. It requires subclasses to override the `_load` and `_save` methods, and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.
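
To make the contract concrete, here is a minimal editor's sketch of a subclass (the `TextDataset` name and local-file behaviour are hypothetical, illustrative only):

```python
from typing import Any, Dict

from kedro.io import AbstractDataset


class TextDataset(AbstractDataset[str, str]):
    """Hypothetical dataset that reads and writes a local text file."""

    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> str:
        with open(self._filepath, encoding="utf-8") as f:
            return f.read()

    def _save(self, data: str) -> None:
        with open(self._filepath, "w", encoding="utf-8") as f:
            f.write(data)

    def _describe(self) -> Dict[str, Any]:
        # Used when logging information about this dataset instance
        return {"filepath": self._filepath}
```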

If you have a dataset called `parts`, you can make direct calls to it like so:

@@ -33,13 +33,13 @@ parts_df = parts.load()

We recommend using a `DataCatalog` instead (for more details, see [the `DataCatalog` documentation](../data/data_catalog.md)) as it has been designed to make all datasets available to project members.

For contributors, if you would like to submit a new dataset, you must extend the `AbstractDataSet`. For a complete guide, please read [the section on custom datasets](../extend_kedro/custom_datasets.md).
For contributors, if you would like to submit a new dataset, you must extend the `AbstractDataset`. For a complete guide, please read [the section on custom datasets](../extend_kedro/custom_datasets.md).


## Versioning

In order to enable versioning, you need to update the `catalog.yml` config file and set the `versioned` attribute to `true` for the given dataset. If this is a custom dataset, the implementation must also:
1. extend `kedro.io.core.AbstractVersionedDataSet` AND
1. extend `kedro.io.core.AbstractVersionedDataset` AND
2. add `version` namedtuple as an argument to its `__init__` method AND
3. call `super().__init__()` with positional arguments `filepath`, `version`, and, optionally, with `glob` and `exists` functions if it uses a non-local filesystem (see [kedro_datasets.pandas.CSVDataSet](/kedro_datasets.pandas.CSVDataSet) as an example) AND
4. modify its `_describe`, `_load` and `_save` methods respectively to support versioning (see [`kedro_datasets.pandas.CSVDataSet`](/kedro_datasets.pandas.CSVDataSet) for an example implementation)
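
On the `catalog.yml` side mentioned above, enabling versioning is a single flag; an editor's sketch (the entry name and filepath are assumptions):

```yaml
cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/cars.csv
  versioned: true
```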
@@ -55,10 +55,10 @@ from pathlib import Path, PurePosixPath

import pandas as pd

from kedro.io import AbstractVersionedDataSet
from kedro.io import AbstractVersionedDataset


class MyOwnDataSet(AbstractVersionedDataSet):
class MyOwnDataSet(AbstractVersionedDataset):
def __init__(self, filepath, version, param1, param2=True):
super().__init__(PurePosixPath(filepath), version)
self._param1 = param1
@@ -314,7 +314,7 @@ Here is an exhaustive list of the arguments supported by `PartitionedDataSet`:
| Argument | Required | Supported types | Description |
| ----------------- | ------------------------------ | ------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | Yes | `str` | Path to the folder containing partitioned data. If path starts with the protocol (e.g., `s3://`) then the corresponding `fsspec` concrete filesystem implementation will be used. If protocol is not specified, local filesystem will be used |
| `dataset` | Yes | `str`, `Type[AbstractDataSet]`, `Dict[str, Any]` | Underlying dataset definition, for more details see the section below |
| `dataset` | Yes | `str`, `Type[AbstractDataset]`, `Dict[str, Any]` | Underlying dataset definition, for more details see the section below |
| `credentials` | No | `Dict[str, Any]` | Protocol-specific options that will be passed to the `fsspec.filesystem` call, for more details see the section below |
| `load_args` | No | `Dict[str, Any]` | Keyword arguments to be passed into `find()` method of the corresponding filesystem implementation |
| `filepath_arg` | No | `str` (defaults to `filepath`) | Argument name of the underlying dataset initializer that will contain a path to an individual partition |
@@ -326,7 +326,7 @@ Dataset definition should be passed into the `dataset` argument of the `PartitionedDataSet`.

##### Shorthand notation

Requires you only to specify a class of the underlying dataset either as a string (e.g. `pandas.CSVDataSet` or a fully qualified class path like `kedro_datasets.pandas.CSVDataSet`) or as a class object that is a subclass of the [AbstractDataSet](/kedro.io.AbstractDataSet).
Requires you only to specify a class of the underlying dataset either as a string (e.g. `pandas.CSVDataSet` or a fully qualified class path like `kedro_datasets.pandas.CSVDataSet`) or as a class object that is a subclass of the [AbstractDataset](/kedro.io.AbstractDataset).
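
An editor's sketch of shorthand notation in `catalog.yml` (the entry name and bucket path are assumptions):

```yaml
my_partitioned_dataset:
  type: PartitionedDataSet
  path: s3://my-bucket/partitioned-data  # hypothetical bucket
  dataset: pandas.CSVDataSet  # the underlying dataset class, as a string
```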

##### Full notation
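
The body of this subsection is collapsed in the diff. As a rough editor's sketch of what full notation generally looks like (all argument values are assumptions), the `dataset` key takes a dictionary whose `type` names the dataset class and whose remaining keys are passed to that dataset's constructor:

```yaml
my_partitioned_dataset:
  type: PartitionedDataSet
  path: s3://my-bucket/partitioned-data
  dataset:
    type: pandas.CSVDataSet
    load_args:
      sep: ";"
  filename_suffix: ".csv"
```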

4 changes: 2 additions & 2 deletions docs/source/deployment/dask.md
@@ -44,14 +44,14 @@ from kedro.framework.hooks.manager import (
_register_hooks_setuptools,
)
from kedro.framework.project import settings
from kedro.io import AbstractDataSet, DataCatalog
from kedro.io import AbstractDataset, DataCatalog
from kedro.pipeline import Pipeline
from kedro.pipeline.node import Node
from kedro.runner import AbstractRunner, run_node
from pluggy import PluginManager


class _DaskDataSet(AbstractDataSet):
class _DaskDataSet(AbstractDataset):
"""``_DaskDataSet`` publishes/gets named datasets to/from the Dask
scheduler."""

36 changes: 18 additions & 18 deletions docs/source/extend_kedro/custom_datasets.md
@@ -24,13 +24,13 @@ Consult the [Pillow documentation](https://pillow.readthedocs.io/en/stable/insta

## The anatomy of a dataset

At the minimum, a valid Kedro dataset needs to subclass the base [AbstractDataSet](/kedro.io.AbstractDataSet) and provide an implementation for the following abstract methods:
At the minimum, a valid Kedro dataset needs to subclass the base [AbstractDataset](/kedro.io.AbstractDataset) and provide an implementation for the following abstract methods:

* `_load`
* `_save`
* `_describe`

`AbstractDataSet` is generically typed with an input data type for saving data, and an output data type for loading data.
`AbstractDataset` is generically typed with an input data type for saving data, and an output data type for loading data.
This typing is optional, however, and defaults to the `Any` type.

Here is an example skeleton for `ImageDataSet`:
@@ -43,10 +43,10 @@ from typing import Any, Dict

import numpy as np

from kedro.io import AbstractDataSet
from kedro.io import AbstractDataset


class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]):
class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
"""``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow.
Example:
@@ -108,11 +108,11 @@ import fsspec
import numpy as np
from PIL import Image

from kedro.io import AbstractDataSet
from kedro.io import AbstractDataset
from kedro.io.core import get_filepath_str, get_protocol_and_path


class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]):
class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
def __init__(self, filepath: str):
"""Creates a new instance of ImageDataSet to load / save image data for given filepath.
@@ -169,7 +169,7 @@ Similarly, we can implement the `_save` method as follows:


```python
class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]):
class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
def _save(self, data: np.ndarray) -> None:
"""Saves image data to the specified filepath."""
# using get_filepath_str ensures that the protocol and path are appended correctly for different filesystems
@@ -193,7 +193,7 @@ You can open the file to verify that the data was written back correctly.
The `_describe` method is used for printing purposes. The convention in Kedro is for the method to return a dictionary describing the attributes of the dataset.

```python
class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]):
class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
def _describe(self) -> Dict[str, Any]:
"""Returns a dict that describes the attributes of the dataset."""
return dict(filepath=self._filepath, protocol=self._protocol)
@@ -214,11 +214,11 @@ import fsspec
import numpy as np
from PIL import Image

from kedro.io import AbstractDataSet
from kedro.io import AbstractDataset
from kedro.io.core import get_filepath_str, get_protocol_and_path


class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]):
class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
"""``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow.
Example:
@@ -301,7 +301,7 @@ $ ls -la data/01_raw/pokemon-images-and-types/images/images/*.png | wc -l
Versioning doesn't work with `PartitionedDataSet`. You can't use both of them at the same time.
```
To add [Versioning](../data/kedro_io.md#versioning) support to the new dataset we need to extend the
[AbstractVersionedDataSet](/kedro.io.AbstractVersionedDataSet) to:
[AbstractVersionedDataset](/kedro.io.AbstractVersionedDataset) to:

* Accept a `version` keyword argument as part of the constructor
* Adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively
@@ -320,11 +320,11 @@ import fsspec
import numpy as np
from PIL import Image

from kedro.io import AbstractVersionedDataSet
from kedro.io import AbstractVersionedDataset
from kedro.io.core import get_filepath_str, get_protocol_and_path, Version


class ImageDataSet(AbstractVersionedDataSet[np.ndarray, np.ndarray]):
class ImageDataSet(AbstractVersionedDataset[np.ndarray, np.ndarray]):
"""``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow.
Example:
@@ -391,14 +391,14 @@ The difference between the original `ImageDataSet` and the versioned `ImageDataSet`
import numpy as np
from PIL import Image

-from kedro.io import AbstractDataSet
-from kedro.io import AbstractDataset
-from kedro.io.core import get_filepath_str, get_protocol_and_path
+from kedro.io import AbstractVersionedDataSet
+from kedro.io import AbstractVersionedDataset
+from kedro.io.core import get_filepath_str, get_protocol_and_path, Version


-class ImageDataSet(AbstractDataSet[np.ndarray, np.ndarray]):
+class ImageDataSet(AbstractVersionedDataSet[np.ndarray, np.ndarray]):
-class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
+class ImageDataSet(AbstractVersionedDataset[np.ndarray, np.ndarray]):
"""``ImageDataSet`` loads / save image data from a given filepath as `numpy` array using Pillow.

Example:
@@ -537,7 +537,7 @@ These parameters are then passed to the dataset constructor so you can use them
import fsspec
class ImageDataSet(AbstractVersionedDataSet):
class ImageDataSet(AbstractVersionedDataset):
def __init__(
self,
filepath: str,
2 changes: 1 addition & 1 deletion docs/source/extend_kedro/plugins.md
@@ -196,7 +196,7 @@ When you are ready to submit your code:
## Supported Kedro plugins

- [Kedro-Datasets](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets), a collection of all of Kedro's data connectors. These data
connectors are implementations of the `AbstractDataSet`
connectors are implementations of the `AbstractDataset`
- [Kedro-Docker](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker), a tool for packaging and shipping Kedro projects within containers
- [Kedro-Airflow](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-airflow), a tool for converting your Kedro project into an Airflow project
- [Kedro-Viz](https://github.com/kedro-org/kedro-viz), a tool for visualising your Kedro pipelines
4 changes: 2 additions & 2 deletions docs/source/kedro.io.rst
@@ -11,8 +11,8 @@ kedro.io
:toctree:
:template: autosummary/class.rst

kedro.io.AbstractDataSet
kedro.io.AbstractVersionedDataSet
kedro.io.AbstractDataset
kedro.io.AbstractVersionedDataset
kedro.io.CachedDataSet
kedro.io.CachedDataset
kedro.io.DataCatalog
4 changes: 2 additions & 2 deletions docs/source/nodes_and_pipelines/nodes.md
@@ -203,14 +203,14 @@ import fsspec
import pandas as pd

from kedro.io.core import (
AbstractVersionedDataSet,
AbstractVersionedDataset,
Version,
get_filepath_str,
get_protocol_and_path,
)


class ChunkWiseCSVDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
class ChunkWiseCSVDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame]):
"""``ChunkWiseCSVDataSet`` loads/saves data from/to a CSV file using an underlying
filesystem. It uses pandas to handle the CSV file.
"""
8 changes: 4 additions & 4 deletions docs/source/nodes_and_pipelines/run_a_pipeline.md
@@ -57,13 +57,13 @@ If the built-in Kedro runners do not meet your requirements, you can also define

```python
# in src/<package_name>/runner.py
from kedro.io import AbstractDataSet, DataCatalog, MemoryDataSet
from kedro.io import AbstractDataset, DataCatalog, MemoryDataSet
from kedro.pipeline import Pipeline
from kedro.runner.runner import AbstractRunner
from pluggy import PluginManager


from kedro.io import AbstractDataSet, DataCatalog, MemoryDataSet
from kedro.io import AbstractDataset, DataCatalog, MemoryDataSet
from kedro.pipeline import Pipeline
from kedro.runner.runner import AbstractRunner

@@ -74,13 +74,13 @@ class DryRunner(AbstractRunner):
necessary data exists.
"""

def create_default_data_set(self, ds_name: str) -> AbstractDataSet:
def create_default_data_set(self, ds_name: str) -> AbstractDataset:
"""Factory method for creating the default data set for the runner.

Args:
ds_name: Name of the missing data set
Returns:
An instance of an implementation of AbstractDataSet to be used
An instance of an implementation of AbstractDataset to be used
for all unregistered data sets.

"""
8 changes: 4 additions & 4 deletions kedro/extras/datasets/README.md
**Member Author commented:** The changes across `kedro/extras/datasets` could be reverted--no harm, no foul.
@@ -4,9 +4,9 @@
> `kedro.extras.datasets` is deprecated and will be removed in Kedro 0.19,
> install `kedro-datasets` instead by running `pip install kedro-datasets`.

Welcome to `kedro.extras.datasets`, the home of Kedro's data connectors. Here you will find `AbstractDataSet` implementations created by QuantumBlack and external contributors.
Welcome to `kedro.extras.datasets`, the home of Kedro's data connectors. Here you will find `AbstractDataset` implementations created by QuantumBlack and external contributors.

## What `AbstractDataSet` implementations are supported?
## What `AbstractDataset` implementations are supported?

We support a range of data descriptions, including CSV, Excel, Parquet, Feather, HDF5, JSON, Pickle, SQL Tables, SQL Queries, Spark DataFrames and more. We even allow support for working with images.

@@ -16,7 +16,7 @@ These data descriptions are supported with the APIs of `pandas`, `spark`, `networkx`

Here is a full list of [supported data descriptions and APIs](https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.html).

## How can I create my own `AbstractDataSet` implementation?
## How can I create my own `AbstractDataset` implementation?


Take a look at our [instructions on how to create your own `AbstractDataSet` implementation](https://kedro.readthedocs.io/en/stable/extend_kedro/custom_datasets.html).
Take a look at our [instructions on how to create your own `AbstractDataset` implementation](https://kedro.readthedocs.io/en/stable/extend_kedro/custom_datasets.html).
2 changes: 1 addition & 1 deletion kedro/extras/datasets/__init__.py
@@ -1,5 +1,5 @@
"""``kedro.extras.datasets`` is where you can find all of Kedro's data connectors.
These data connectors are implementations of the ``AbstractDataSet``.
These data connectors are implementations of the ``AbstractDataset``.

.. warning::

4 changes: 2 additions & 2 deletions kedro/extras/datasets/api/api_dataset.py
@@ -6,14 +6,14 @@
import requests
from requests.auth import AuthBase

from kedro.io.core import AbstractDataSet, DatasetError
from kedro.io.core import AbstractDataset, DatasetError

# NOTE: kedro.extras.datasets will be removed in Kedro 0.19.0.
# Any contribution to datasets should be made in kedro-datasets
# in kedro-plugins (https://github.com/kedro-org/kedro-plugins)


class APIDataSet(AbstractDataSet[None, requests.Response]):
class APIDataSet(AbstractDataset[None, requests.Response]):
"""``APIDataSet`` loads the data from HTTP(S) APIs.
It uses the python requests library: https://requests.readthedocs.io/en/latest/

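
As a usage illustration for the class above, an editor's sketch (the endpoint URL is hypothetical):

```python
from kedro.extras.datasets.api import APIDataSet

data_set = APIDataSet(url="https://example.com/data.json")  # hypothetical endpoint
response = data_set.load()  # returns a requests.Response
payload = response.json()
```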
2 changes: 1 addition & 1 deletion kedro/extras/datasets/biosequence/__init__.py
@@ -1,4 +1,4 @@
"""``AbstractDataSet`` implementation to read/write from/to a sequence file."""
"""``AbstractDataset`` implementation to read/write from/to a sequence file."""

__all__ = ["BioSequenceDataSet"]
