Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docstring checker. #9848

Merged
merged 19 commits into from
May 28, 2018
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,14 @@ repos:
entry: bash ./tools/codestyle/cpplint_pre_commit.hook
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx)$
- repo: local
hooks:
- id: pylint-doc-string
name: pylint
description: Check python docstring style using docstring_checker.
entry: bash ./tools/codestyle/pylint_pre_commit.hook
language: system
files: \.(py)$
- repo: https://github.com/PaddlePaddle/pre-commit-golang
sha: 8337620115c25ff8333f1b1a493bd031049bd7c0
hooks:
Expand Down
2 changes: 2 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,8 @@ before_install:
# protobuf version.
- sudo pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt
- sudo pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker
# For pylint dockstring checker
- sudo pip install pylint pytest astroid isort
- |
function timeout() { perl -e 'alarm shift; exec @ARGV' "$@"; }
script:
Expand Down
3 changes: 3 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,9 @@ RUN pip install pre-commit 'ipython==5.3.0' && \
pip install 'ipykernel==4.6.0' 'jupyter==1.0.0' && \
pip install opencv-python

#For docstring checker
pip install pylint pytest astroid isort

COPY ./python/requirements.txt /root/
RUN pip install -r /root/requirements.txt

Expand Down
333 changes: 333 additions & 0 deletions tools/codestyle/docstring_checker.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,333 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""DocstringChecker is used to check python doc string's style."""

import six
import astroid

from pylint.checkers import BaseChecker, utils
from pylint.interfaces import IAstroidChecker


def register(linter):
"""Register checkers."""
linter.register_checker(DocstringChecker(linter))


class Docstring(object):
"""Docstring class holds the parsed doc string elements.
"""

def __init__(self):
from collections import defaultdict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move import to the top?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.d = defaultdict(list) #name->[]
self.clear()

def clear(self):
self.d['Args'] = []
self.d['Examples'] = []
self.d['Returns'] = []
self.d['Raises'] = []
self.args = {} #arg_name->arg_type

def get_level(self, string, indent=' '):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we agreed that the indent is 4, instead of 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come from google python style guide.
And now almost all of the comments use 4 spaces indention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

level = 0
unit_size = len(indent)
while string[:unit_size] == indent:
string = string[unit_size:]
level += 1

return level

def parse(self, doc):
"""parse gets sections from doc
Such as Args, Returns, Raises, Examples s
Args:
doc (string): is the astroid node doc string.
Returns:
True if doc is parsed successfully.
"""
self.clear()

lines = doc.splitlines()
state = ("others", -1)
for l in lines:
c = l.strip()
if len(c) <= 0:
continue

level = self.get_level(l)
if c.startswith("Args:"):
state = ("Args", level)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check with our auto-doc generator and see it can parse the new style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在我们的文档里边其实规定了参数的风格:https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/dev/api_doc_std_cn.md。
而且现在绝大多数都是按照这种风格来的。

elif c.startswith("Returns:"):
state = ("Returns", level)
elif c.startswith("Raises:"):
state = ("Raises", level)
elif c.startswith("Examples:"):
state = ("Examples", level)
else:
if level > state[1]:
self.d[state[0]].append(c)
continue

state = ("others", -1)
self.d[state[0]].append(c)

self._arg_with_type()
return True

def get_returns(self):
return self.d['Returns']

def get_raises(self):
return self.d['Raises']

def get_examples(self):
return self.d['Examples']

def _arg_with_type(self):
import re

for t in self.d['Args']:
m = re.search('([A-Za-z0-9_-]+)\s{0,4}(\(.+\))\s{0,4}:', t)
if m:
self.args[m.group(1)] = m.group(2)

return self.args


class DocstringChecker(BaseChecker):
"""DosstringChecker is pylint checker to
check docstring style.
"""
__implements__ = (IAstroidChecker, )

POSITIONAL_MESSAGE_ID = 'str-used-on-positional-format-argument'
KEYWORD_MESSAGE_ID = 'str-used-on-keyword-format-argument'

name = 'doc-string-checker'
symbol = "doc-string"
priority = -1
msgs = {
'W9001': ('One line doc string on > 1 lines', symbol + "-one-line",
'Used when a short doc string is on multiple lines'),
'W9002':
('Doc string does not end with "." period', symbol + "-end-with",
'Used when a doc string does not end with a period'),
'W9003': ('All args with their types must be mentioned in doc string',
symbol + "-with-all-args",
'Used when not all arguments are in the doc string '),
'W9005': ('Missing docstring or docstring is too shorter',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shorter->short

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

symbol + "-missing", 'Add docstring longer >=10'),
'W9006': ('Docstring indent error, use 4 space for indent',
symbol + "-indent-error", 'Use 4 space for indent'),
'W9007': ('You should add `Returns` in comments',
symbol + "-with-returns",
'There should be a `Returns` section in comments'),
'W9008': ('You should add `Raises` section in comments',
symbol + "-with-raises",
'There should be a `Raises` section in comments'),
}
options = ()

def visit_functiondef(self, node):
"""visit_functiondef checks Function node docstring style.
Args:
node (astroid.node): The visiting node.
Returns:
True if successful other wise False.
"""

self.check_doc_string(node)

if node.tolineno - node.fromlineno <= 10:
return True

if not node.doc:
return True

doc = Docstring()
doc.parse(node.doc)

self.all_args_in_doc(node, doc)
self.with_returns(node, doc)
self.with_raises(node, doc)

def visit_module(self, node):
self.check_doc_string(node)

def visit_classdef(self, node):
self.check_doc_string(node)

def check_doc_string(self, node):
self.missing_doc_string(node)
self.one_line(node)
self.has_period(node)
self.indent_style(node)

def missing_doc_string(self, node):
if node.tolineno - node.fromlineno <= 10:
return True

if node.doc is None or len(node.doc) < 10:
self.add_message('W9005', node=node, line=node.fromlineno)
return False

# FIXME(gongwb): give the docstring line-no
def indent_style(self, node, indent=4):
"""indent_style checks docstring's indent style
Args:
node (astroid.node): The visiting node.
indent (int): The default indent of style
Returns:
True if successful other wise False.
"""
if node.doc is None:
return True

doc = node.doc
lines = doc.splitlines()

for l in lines:
cur_indent = len(l) - len(l.lstrip())
if cur_indent % indent != 0:
self.add_message('W9006', node=node, line=node.fromlineno)
return False

return True

def one_line(self, node):
"""one_line checks if docstring (len < 40) is on one line.
Args:
node (astroid.node): The node visiting.
Returns:
True if successful otherwise False.
"""

doc = node.doc
if doc is None:
return True

if len(doc) > 40:
return True
elif sum(doc.find(nl) for nl in ('\n', '\r', '\n\r')) == -3:
return True
else:
self.add_message('W9001', node=node, line=node.fromlineno)
return False

return True

def has_period(self, node):
"""has_period checks if one line doc end-with '.' .
Args:
node (astroid.node): the node is visiting.
Returns:
True if successful otherwise False.
"""
if node.doc is None:
return True

if len(node.doc.splitlines()) > 1:
return True

if not node.doc.strip().endswith('.'):
self.add_message('W9002', node=node, line=node.fromlineno)
return False

return True

def with_raises(self, node, doc):
"""with_raises checks if one line doc end-with '.' .
Args:
node (astroid.node): the node is visiting.
doc (Docstring): Docstring object.
Returns:
True if successful otherwise False.
"""

find = False
for t in node.body:
if not isinstance(t, astroid.Raise):
continue

find = True
break

if not find:
return True

if len(doc.get_raises()) == 0:
self.add_message('W9008', node=node, line=node.fromlineno)
return False

return True

def with_returns(self, node, doc):
"""with_returns checks if docstring comments what are returned .
Args:
node (astroid.node): the node is visiting.
doc (Docstring): Docstring object.
Returns:
True if successful otherwise False.
"""

find = False
for t in node.body:
if not isinstance(t, astroid.Return):
continue

find = True
break

if not find:
return True

if len(doc.get_returns()) == 0:
self.add_message('W9007', node=node, line=node.fromlineno)
return False

return True

def all_args_in_doc(self, node, doc):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk offline. I'm excited that you can get args from a python function. We can use this solution to detect API change and enforce API-compatibility.

"""all_args_in_doc checks if arguments are mentioned in doc
Args:
node (astroid.node): the node is visiting.
doc (Docstring): Docstring object
Returns:
True if successful otherwise False.
"""
args = []
for arg in node.args.get_children():
if (not isinstance(arg, astroid.AssignName)) \
or arg.name == "self":
continue
args.append(arg.name)

if len(args) <= 0:
return True

parsed_args = doc.args
if len(args) > 0 and len(parsed_args) <= 0:
print "debug:parsed args: ", parsed_args
self.add_message('W9003', node=node, line=node.fromlineno)
return False

for t in args:
if t not in parsed_args:
print t, " with (type) not in ", parsed_args
self.add_message('W9003', node=node, line=node.fromlineno)
return False

return True
19 changes: 19 additions & 0 deletions tools/codestyle/pylint_pre_commit.hook
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#!/bin/bash

TOTAL_ERRORS=0


DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
export PYTHONPATH=$DIR:$PYTHONPATH

# The trick to remove deleted files: https://stackoverflow.com/a/2413151
for file in $(git diff --cached --name-status | awk '$1 != "D" {print $2}'); do
pylint --disable=all --load-plugins=docstring_checker \
--enable=doc-string-one-line,doc-string-end-with,doc-string-with-all-args,doc-string-triple-quotes,doc-string-missing,doc-string-indent-error,doc-string-with-returns,doc-string-with-raises $file;
TOTAL_ERRORS=$(expr $TOTAL_ERRORS + $?);
done

#exit $TOTAL_ERRORS
#For now, just warning:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't turn on it in CI, but fails locally? (user can still use git commit --no-verify to skip the check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我想了一下,是不是给使用者一个一致的认知比较好?

Copy link
Contributor

@helinwang helinwang Apr 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我担心如果本地也不报失败,没有人会去花时间修复问题,理论上CI需要强制验证,不过可能我们可以有一个试运行期,本地报错,CI不强制认证。

exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be great that pylint only requires the author to polish the code style for the codes he/she has touched, instead of covering the whole file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a great idea, but maybe we need consistency in the whole Python file? E.g., the old file is 2-space indentation, the rule could be 4-space indentation, we probably want the whole file to have the same 4-space indentation, rather than the modified lines. Some edge case is hard to handle as well, for example the developer changed the second line of a doc string, does the linter have to modify the first line as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.


Loading