Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pylint docstring checker 不工作原因及可能的解决方案 #47821

Closed
SigureMo opened this issue Nov 9, 2022 · 5 comments
Closed

pylint docstring checker 不工作原因及可能的解决方案 #47821

SigureMo opened this issue Nov 9, 2022 · 5 comments
Assignees
Labels
good first issue PFCC Paddle Framework Contributor Club,https://github.com/PaddlePaddle/community/tree/master/pfcc status/discussing 需求调研中 type/others 其他问题

Comments

@SigureMo
Copy link
Member

SigureMo commented Nov 9, 2022

问题描述 Please describe your issue

目前 Paddle 的 pre-commit hooks 里是有 pylint 的,见

- repo: local
hooks:
- id: pylint-doc-string
name: pylint
description: Check python docstring style using docstring_checker.
entry: bash ./tools/codestyle/pylint_pre_commit.hook
language: system
files: \.(py)$

这是一个 local hook,hook 源码见

#!/bin/bash
TOTAL_ERRORS=0
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
export PYTHONPATH=$DIR:$PYTHONPATH
readonly VERSION="2.12.0"
version=$(pylint --version | grep 'pylint')
if ! [[ $version == *"$VERSION"* ]]; then
pip install pylint==2.12.0
fi
# The trick to remove deleted files: https://stackoverflow.com/a/2413151
for file in $(git diff --name-status | awk '$1 != "D" {print $2}'); do
pylint --disable=all --load-plugins=docstring_checker \
--enable=doc-string-one-line,doc-string-end-with,doc-string-with-all-args,doc-string-triple-quotes,doc-string-missing,doc-string-indent-error,doc-string-with-returns,doc-string-with-raises $file;
TOTAL_ERRORS=$(expr $TOTAL_ERRORS + $?);
done
exit $TOTAL_ERRORS
#For now, just warning:
#exit 0

其实就是关闭所有的 pylint 内置规则,仅仅启用了本地自己写的 docstring_checker 规则

https://github.com/PaddlePaddle/Paddle/blob/develop/tools/codestyle/docstring_checker.py

具体规则可见:

msgs = {
'W9001': (
'One line doc string on > 1 lines',
symbol + "-one-line",
'Used when a short doc string is on multiple lines',
),
'W9002': (
'Doc string does not end with "." period',
symbol + "-end-with",
'Used when a doc string does not end with a period',
),
'W9003': (
'All args with their types must be mentioned in doc string %s',
symbol + "-with-all-args",
'Used when not all arguments are in the doc string ',
),
'W9005': (
'Missing docstring or docstring is too short',
symbol + "-missing",
'Add docstring longer >=10',
),
'W9006': (
'Docstring indent error, use 4 space for indent',
symbol + "-indent-error",
'Use 4 space for indent',
),
'W9007': (
'You should add `Returns` in comments',
symbol + "-with-returns",
'There should be a `Returns` section in comments',
),
'W9008': (
'You should add `Raises` section in comments',
symbol + "-with-raises",
'There should be a `Raises` section in comments',
),
}
options = ()

简单来说就是对函数有无 docstring、docstring 长度、包含字段等等做出了一些限制(比如含 return 语句的函数、类其 docstring 必须包含 Returns 字段,包含 raise 语句的则必须包含 Raises 等等,其实这个已经与现有的文档规范不一致了)

该 hook 于 #9848 最开始引入并在本地仅起到提示的作用,而在 #11351 正式启用,当时应该是可以在 CI 正确拦截有问题的 docstring 的

之后不清楚什么时候 CI 逻辑修改了导致该 hook 在 CI 上完全失效,cpplint hook 也有相似问题(见 #46102 (comment) ),这个问题在 #46136 修复了,其实 cpplint 问题还好,至少本地是 work 的,只是 CI 不 work 而已,而 pylint hook 因为直接使用 git diff 获得改动的文件,在 git 工作区干净的情况下是不会有任何输出的,而我们 commit 时绝大多数情况工作区都是干净的,简单来说,就是本地、CI 基本都不 work

因此 pylint docstring 的存量问题……很吓人,我们可以通过以下方式大概看一下存量:

pip install pylint==2.12.0
cd tools/codestyle
pylint --disable=all --load-plugins=docstring_checker \
    --enable=doc-string-one-line,doc-string-end-with,doc-string-with-all-args,doc-string-triple-quotes,doc-string-missing,doc-string-indent-error,doc-string-with-returns,doc-string-with-raises ../../python

唔,也就 17000+ 行输出吧……

这样的话直接启用肯定不可取,但如果修改一下脚本也许可以考虑一下,比如:

  • 单测不需要强制有 docstring
  • 强制有 Raises 可以删除或者改成强制没有 Raises
  • fluid 目录也可以暂时不管,像 flake8 做的那样
  • 等等等等

或者我们可以考虑直接按照新的需求重写一下

不过就算启用,修存量也是一件痛苦的事情,对于 docstring,我还没有找到一个比较合适的自动化修复工具,这意味着可能很大程度依赖于手动修复 / 自己写脚本

此外说一个算是有点相关的问题,Call-for-contribution - IDEA:iScan 流水线退场 中提到了可能会考虑 PR-CI-iScan-Python 的替代方式,如果考虑在 pre-commit 中的 pylint 来替代的话,就需要考虑开启其他规则,因此想要用 pylint 替代该流水线可能需要先修复下 pylint hook 的问题

@paddle-bot

This comment was marked as off-topic.

@paddle-bot paddle-bot bot added the PFCC Paddle Framework Contributor Club,https://github.com/PaddlePaddle/community/tree/master/pfcc label Nov 9, 2022
@SigureMo SigureMo assigned Ligoml and unassigned pangyoki Nov 9, 2022
@SigureMo
Copy link
Member Author

SigureMo commented Nov 9, 2022

@Ligoml 这里我整理了下 pylint 存在的问题及可能的修复方式,可能对规范 docstring 格式 / 内容有所帮助

PS: 这个原本是想和 cpplint 一起修的,flake8 tracking issue 里也一直放着这个的 TODO,但也因为上面所说的,存量很难修,所以暂时没有管

@luotao1 luotao1 self-assigned this Nov 10, 2022
@Ligoml
Copy link
Contributor

Ligoml commented Nov 10, 2022

收到,很有用的信息~

@Ligoml Ligoml added status/discussing 需求调研中 and removed status/new-issue 新建 labels Nov 10, 2022
@jzhang533
Copy link
Contributor

jzhang533 commented Nov 10, 2022

So sad to know this. :(

除了在pre-commit阶段做检查之外,我们另外一个值得做的事情是吧doctest 或者 xdoctest,这类工具引入到paddle。相比于现在我们的流水线中,简单的把示例代码抽取出来,直接运行,看是否运行成功的检查来说,这类工具能提供更加灵活(如:运行前设置不同的运行环境),更精准的测试(直接检查运行出来的结果是否符合设定的预期)。

Copy link

paddle-bot bot commented Jan 9, 2024

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.
由于您超过一年未回复,我们将关闭这个issue/pr。
若问题未解决或有后续问题,请随时重新打开,我们会继续跟进。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue PFCC Paddle Framework Contributor Club,https://github.com/PaddlePaddle/community/tree/master/pfcc status/discussing 需求调研中 type/others 其他问题
Projects
None yet
Development

No branches or pull requests

5 participants