Skip to content

Commit

Permalink
Add support for pydocstyle and test on abc.py (#7985)
Browse files Browse the repository at this point in the history
This PR is related to #202 and adds support for using pydocstyle to lint docstrings. It adds flake8 configuration, a pre-commit hook, and sets up the flake8 config to enable incremental application of docstring styling. As an added bonus, because of the way that pydocstyle chooses not to lint docstrings on internal objects (those prefixed with an underscore), as more files are linted we can evaluate what constitutes public API for `cudf`. Once a file has been linted and is added to the regex filter in the config, ongoing CI checks will prevent regressions in the documentation.

As an initial test, this PR lints the abc.py file containing the Serializable class (and updates the docstrings accordingly). We may want to mark that entire class as internal in any case, but it's important enough that it needs to be documented in any case (and perhaps we could settle on a standard docstring style for internal-facing code as well while we're at it).

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Christopher Harris (https://github.com/cwharris)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7985
  • Loading branch information
vyasr authored Apr 29, 2021
1 parent 5a012e5 commit 7f0ad1d
Show file tree
Hide file tree
Showing 4 changed files with 112 additions and 26 deletions.
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,12 @@ repos:
entry: mypy --config-file=python/cudf/setup.cfg python/cudf/cudf
language: system
types: [python]
- repo: https://github.com/pycqa/pydocstyle
rev: 6.0.0
hooks:
- id: pydocstyle
args: ["--config=python/.flake8"]


default_language_version:
python: python3
16 changes: 14 additions & 2 deletions ci/checks/style.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/bin/bash
# Copyright (c) 2018, NVIDIA CORPORATION.
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
#####################
# cuDF Style Tester #
#####################
Expand Down Expand Up @@ -33,6 +33,10 @@ FLAKE_CYTHON_RETVAL=$?
MYPY_CUDF=`mypy --config=python/cudf/setup.cfg python/cudf/cudf`
MYPY_CUDF_RETVAL=$?

# Run pydocstyle and get results/return code
PYDOCSTYLE=`pydocstyle --config=python/.flake8 python`
PYDOCSTYLE_RETVAL=$?

# Run clang-format and check for a consistent code format
CLANG_FORMAT=`python cpp/scripts/run-clang-format.py 2>&1`
CLANG_FORMAT_RETVAL=$?
Expand Down Expand Up @@ -78,6 +82,14 @@ else
echo -e "\n\n>>>> PASSED: mypy style check\n\n"
fi

if [ "$PYDOCSTYLE_RETVAL" != "0" ]; then
echo -e "\n\n>>>> FAILED: pydocstyle style check; begin output\n\n"
echo -e "$PYDOCSTYLE"
echo -e "\n\n>>>> FAILED: pydocstyle style check; end output\n\n"
else
echo -e "\n\n>>>> PASSED: pydocstyle style check\n\n"
fi

if [ "$CLANG_FORMAT_RETVAL" != "0" ]; then
echo -e "\n\n>>>> FAILED: clang format check; begin output\n\n"
echo -e "$CLANG_FORMAT"
Expand All @@ -91,7 +103,7 @@ HEADER_META=`ci/checks/headers_test.sh`
HEADER_META_RETVAL=$?
echo -e "$HEADER_META"

RETVALS=($ISORT_RETVAL $BLACK_RETVAL $FLAKE_RETVAL $FLAKE_CYTHON_RETVAL $CLANG_FORMAT_RETVAL $HEADER_META_RETVAL $MYPY_CUDF_RETVAL)
RETVALS=($ISORT_RETVAL $BLACK_RETVAL $FLAKE_RETVAL $FLAKE_CYTHON_RETVAL $PYDOCSTYLE_RETVAL $CLANG_FORMAT_RETVAL $HEADER_META_RETVAL $MYPY_CUDF_RETVAL)
IFS=$'\n'
RETVAL=`echo "${RETVALS[*]}" | sort -nr | head -n1`

Expand Down
8 changes: 7 additions & 1 deletion python/.flake8
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (c) 2020, NVIDIA CORPORATION.
# Copyright (c) 2020-2021, NVIDIA CORPORATION.

[flake8]
exclude = __init__.py
Expand All @@ -8,3 +8,9 @@ ignore =
# whitespace before :
E203

[pydocstyle]
match = ^.*abc\.py$
# Due to https://github.com/PyCQA/pydocstyle/issues/363, we must exclude rather than include using match-dir.
match-dir = ^(?!ci|cpp|python/dask_cudf|python/cudf_kafka|python/custreamz).*$
# In addition to numpy style, we additionally ignore magic methods (D105) and newlines before docstrings (D204).
add-ignore = D105, D204
108 changes: 85 additions & 23 deletions python/cudf/cudf/core/abc.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# Copyright (c) 2020, NVIDIA CORPORATION.
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
"""Common abstract base classes for cudf."""

import abc
import sys
Expand All @@ -18,23 +19,78 @@


class Serializable(abc.ABC):
"""A serializable object composed of device memory buffers.
This base class defines a standard serialization protocol for objects
encapsulating device memory buffers. Serialization proceeds by copying
device data onto the host, then returning it along with suitable metadata
for reconstruction of the object. Deserialization performs the reverse
process, copying the serialized data from the host to new device buffers.
Subclasses must define the abstract methods :meth:`~.serialize` and
:meth:`~.deserialize`. The former defines the conversion of the object
into a representative collection of metadata and data buffers, while the
latter converts back from that representation into an equivalent object.
"""

@abstractmethod
def serialize(self):
"""Generate an equivalent serializable representation of an object.
Subclasses must implement this method to define how the attributes of
the object are converted into a serializable representation. A common
solution is to construct a list containing device buffer attributes in
a well-defined order that can be reinterpreted upon deserialization,
then place all other lightweight attributes into the metadata
dictionary.
Returns
-------
Tuple[Dict, List]
The first element of the returned tuple is a dict containing any
serializable metadata required to reconstruct the object. The
second element is a list containing the device data buffers
or memoryviews of the object.
:meta private:
"""
pass

@classmethod
@abstractmethod
def deserialize(cls, header, frames):
"""Generate an object from a serialized representation.
Subclasses must implement this method to define how objects of that
class can be constructed from a serialized representation generalized
by :meth:`serialize`.
Parameters
----------
header : dict
The metadata required to reconstruct the object.
frames : list
The Buffers or memoryviews that the object should contain.
Returns
-------
Serializable
A new instance of `cls` (a subclass of `Serializable`) equivalent
to the instance that was serialized to produce the header and
frames.
:meta private:
"""
pass

def device_serialize(self):
"""Converts the object into a header and list of Buffer/memoryview
objects for file storage or network transmission.
"""Serialize data and metadata associated with device memory.
Returns
-------
header : dictionary containing any serializable metadata
frames : list of Buffer or memoryviews, commonly of length one
header : dict
The metadata required to reconstruct the object.
frames : list
The Buffers or memoryviews that the object should contain.
:meta private:
"""
Expand All @@ -51,20 +107,24 @@ def device_serialize(self):

@classmethod
def device_deserialize(cls, header, frames):
"""Convert serialized header and frames back
into respective Object Type
"""Perform device-side deserialization tasks.
The primary purpose of this method is the creation of device memory
buffers from host buffers where necessary.
Parameters
----------
cls : class of object
header : dict
dictionary containing any serializable metadata
frames : list of Buffer or memoryview objects
The metadata required to reconstruct the object.
frames : list
The Buffers or memoryviews that the object should contain.
Returns
-------
Deserialized Object of type cls extracted
from frames and header
Serializable
A new instance of `cls` (a subclass of `Serializable`) equivalent
to the instance that was serialized to produce the header and
frames.
:meta private:
"""
Expand All @@ -84,13 +144,14 @@ def device_deserialize(cls, header, frames):
return obj

def host_serialize(self):
"""Converts the object into a header and list of memoryview
objects for file storage or network transmission.
"""Serialize data and metadata associated with host memory.
Returns
-------
header : dictionary containing any serializable metadata
frames : list of memoryviews, commonly of length one
header : dict
The metadata required to reconstruct the object.
frames : list
The Buffers or memoryviews that the object should contain.
:meta private:
"""
Expand All @@ -104,20 +165,21 @@ def host_serialize(self):

@classmethod
def host_deserialize(cls, header, frames):
"""Convert serialized header and frames back
into respective Object Type
"""Perform device-side deserialization tasks.
Parameters
----------
cls : class of object
header : dict
dictionary containing any serializable metadata
frames : list of memoryview objects
The metadata required to reconstruct the object.
frames : list
The Buffers or memoryviews that the object should contain.
Returns
-------
Deserialized Object of type cls extracted
from frames and header
Serializable
A new instance of `cls` (a subclass of `Serializable`) equivalent
to the instance that was serialized to produce the header and
frames.
:meta private:
"""
Expand Down

0 comments on commit 7f0ad1d

Please sign in to comment.