[Add] Introduce xdoctest into the PaddlePaddle framework workflow #540

megemini · 2023-05-21T06:22:00Z

PR types

New features

PR changes

Docs

Describe

[used AI Studio]

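China Software Open Source Innovation Competition: PaddlePaddle Framework Task Challenge

Task 5: Introduce xdoctest into the PaddlePaddle framework workflow
Adds the design document "Introduce xdoctest into the PaddlePaddle framework workflow" (《将 xdoctest 引入到飞桨框架工作流中》).

@SigureMo @Ligoml

Please review. Thanks!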

paddle-bot · 2023-05-21T06:22:04Z

Your PR has been submitted. Thanks for your contribution to the open source project!
Please check that its format and content are complete. For details, refer to the Template and Demo.

megemini · 2023-05-21T06:23:33Z

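One more thing:

I am not yet familiar with Paddle's CI pipeline, in particular how Baidu's iPipe (效率云) is configured.
From my analysis so far, converting the existing sample code to the google style seems to need manual work for the most part. Is there a better way?

Any guidance would be appreciated. Thanks!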

SigureMo

Great RFC! A few details need some adjustment, though.

SigureMo · 2023-05-21T06:36:57Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+|Author | megemini (柳顺)             |
+|Submitted | 2023-05-21                     |
+|Version | V1.0                           |
+|Paddle version dependency | paddlepaddle>2.4               |


This should be developed on the develop branch.

SigureMo · 2023-05-21T06:42:24Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+
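+### 2.1 Documentation
+
+Update the document in the Paddle contribution guide: [Developing the API on the Python side (开发 API Python 端)](https://www.paddlepaddle.org.cn/documentation/docs/zh/dev_guides/api_contributing_guides/new_python_api_cn.html#api-python). This sets the standard for subsequent code development.


The API docstring writing guidelines should be updated in step as well.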

SigureMo · 2023-05-21T06:44:19Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+
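+Update the document in the Paddle contribution guide: [Developing the API on the Python side (开发 API Python 端)](https://www.paddlepaddle.org.cn/documentation/docs/zh/dev_guides/api_contributing_guides/new_python_api_cn.html#api-python). This sets the standard for subsequent code development.
+
+Add writing requirements for `Example` sample code: it must follow the `google` style in `xdoctest`, that is, code inside an `Example` section must start with `>>>`. The current `code-block` directive is kept so that generation of the Chinese documentation is not affected.


> Code inside an `Example` section must start with `>>>`. The current `code-block` directive is kept

A nice approach. One thing needs confirming, though: is the `code-block` form compatible with xdoctest?

I have only skimmed xdoctest's parser.py, but as far as I can tell our current `.. code-block:: python` line is treated as TEXT by xdoctest, so it has no effect.

A simple example can verify this: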

```python
def test(a):
    """this is docstring...
    Examples:
        .. code-block:: python
            this is a test...
            >>> a = 3
            >>> print(a)
            3
    """
    pass
```

The result shows that it works:

```
$ xdoctest --style=google test_simple.py
=====================================
_  _ ___  ____ ____ ___ ____ ____ ___
 \/  |  \ |  | |     |  |___ [__   |
_/\_ |__/ |__| |___  |  |___ ___]  |

=====================================
Start doctest_module('test_simple.py')
Listing tests
gathering tests
running 1 test(s)
====== <exec> ======
* DOCTEST : test_simple.py::test:0, line 5 <- wrt source file
DOCTEST SOURCE
6 >>> a = 3
7 >>> print(a)
  3
DOCTEST STDOUT/STDERR
3
DOCTEST RESULT
* SUCCESS: test_simple.py::test:0
====== </exec> ======
============
=== 1 passed in 0.09 seconds ===
```
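As a rough cross-check that does not need xdoctest installed, the standard library `doctest` parser can be pointed at the same docstring. This is a different tool from xdoctest and is used here only as an illustration: it likewise skips the directive line and the prose line, and picks up only the `>>>` statements. `DOCSTRING` below is an assumed copy of the docstring from the example.

```python
import doctest

# Docstring body copied from the example; the directive and prose lines carry no prompt.
DOCSTRING = """this is docstring...
Examples:
    .. code-block:: python
        this is a test...
        >>> a = 3
        >>> print(a)
        3
"""

# get_examples() returns only the prompt-prefixed statements and their expected output.
examples = doctest.DocTestParser().get_examples(DOCSTRING)
parsed = [(ex.source, ex.want) for ex in examples]
print(parsed)
```

This says nothing about xdoctest's own parser; it only shows that prompt-based extraction in general does not depend on the `code-block` directive.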

SigureMo · 2023-05-21T06:57:54Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+
+The CI pipeline tooling for the Paddle code lives under [Paddle/tools/](https://github.com/PaddlePaddle/Paddle/tree/develop/tools).
+
+At present, Python sample code is checked mainly by [Paddle/tools/codestyle/docstring_checker.py](https://github.com/PaddlePaddle/Paddle/blob/develop/tools/codestyle/docstring_checker.py).


Shouldn't this be paddle/tools/sampcd_processor.py? docstring_checker is a tool that has no effect; see PaddlePaddle/Paddle#47821.

Got it. I will look into this in detail and revise.

SigureMo · 2023-05-21T06:59:30Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+    print("Sample code check is successful!")
+```
+
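+This approach has a number of problems. For example, it cannot verify that the code's output matches the result shown in the example, and it cannot handle sample code that is supposed to raise an error.


> it cannot handle sample code that is supposed to raise an error

What does this refer to? Sample code that raises an error should currently fail in CI.

xdoctest can capture the output of an Error and check it: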

```python
def test(a):
    """this is docstring...
    Examples:
        .. code-block:: python
            this is a test...
            >>> raise ValueError
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            ValueError
    """
    pass
```

Run xdoctest:

```
$ xdoctest --style=google test_error.py
=====================================
_  _ ___  ____ ____ ___ ____ ____ ___
 \/  |  \ |  | |     |  |___ [__   |
_/\_ |__/ |__| |___  |  |___ ___]  |

=====================================
Start doctest_module('test_error.py')
Listing tests
gathering tests
running 1 test(s)
====== <exec> ======
* DOCTEST : test_error.py::test:0, line 5 <- wrt source file
DOCTEST SOURCE
6 >>> raise ValueError
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ValueError
DOCTEST STDOUT/STDERR
DOCTEST RESULT
* SUCCESS: test_error.py::test:0
====== </exec> ======
============
=== 1 passed in 0.09 seconds ===
```
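The same idea, an example whose expected output is a traceback, can be tried with the standard library `doctest` runner. This is a sketch using stdlib `doctest` rather than xdoctest, so it only illustrates the concept of matching an expected exception; the test name `"test_error"` is an arbitrary label.

```python
import doctest

# Docstring from the example: the "expected output" is a traceback.
DOCSTRING = """this is docstring...
Examples:
    .. code-block:: python
        this is a test...
        >>> raise ValueError
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ValueError
"""

# Build a DocTest from the raw string and run it in an empty namespace.
test = doctest.DocTestParser().get_doctest(DOCSTRING, {}, "test_error", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test, out=lambda text: None)  # discard any failure report
print(results.failed, results.attempted)
```

The example counts as passing because the raised exception type and message match the last line of the expected traceback; the stack lines in between are ignored by stdlib `doctest`.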

SigureMo · 2023-05-21T07:00:18Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+
+Python code in Paddle currently lives mainly under [Paddle/python/paddle/](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle).
+
+It contains `2334` Python files and `341` sample code blocks. (commit `8acbf10bd51026c0a41423c2826b7cc886ad1e74`)


> `341` sample code blocks

Where does this count come from? Are there really only 341 sample code blocks?

I made a quick modification to docs/ci_scripts/chinese_samplecode_processor.py to do the counting:

```python
import math
import os
import pickle
import shutil
import subprocess
import multiprocessing
import sys
import glob


def remove_desc_code(srcls, filename):
    if filename == 'fluid_cn/one_hot_cn.rst':
        srcls.pop(13)
        srcls.pop(28)
        srcls.pop(44)
    if filename == 'layers_cn/one_hot_cn.rst':
        srcls.pop(15)
        srcls.pop(30)
        srcls.pop(46)
    if filename == 'profiler_cn/profiler_cn.rst':
        srcls.pop(41)
    if filename == 'layers_cn/natural_exp_decay_cn.rst':
        srcls.pop(13)
    if filename == 'layers_cn/transpose_cn.rst':
        srcls.pop(20)
    if filename == 'layers_cn/array_length_cn.rst':
        srcls.pop(36)
    if filename == 'layers_cn/inverse_time_decay_cn.rst':
        srcls.pop(13)
    if filename == 'layers_cn/stack_cn.rst':
        srcls.pop(12)
        srcls.pop(33)
    if filename == 'layers_cn/sums_cn.rst':
        srcls.pop(11)
    if filename == 'layers_cn/sum_cn.rst':
        for i in range(len(srcls) - 1, 61, -1):
            srcls.pop(i)
    if filename == 'layers_cn/softmax_cn.rst':
        srcls.pop(30)
        srcls.pop(57)
    if filename == 'layers_cn/array_write_cn.rst':
        srcls.pop(37)
    if filename == 'layers_cn/lod_append_cn.rst':
        srcls.pop(11)
    if filename == 'layers_cn/reorder_lod_tensor_by_rank_cn.rst':
        srcls.pop(25)
    if filename == 'layers_cn/round_cn.rst':
        srcls.pop(10)
    if filename == 'layers_cn/squeeze_cn.rst':
        srcls.pop(11)
        srcls.pop(19)
        srcls.pop(27)
    if filename == 'layers_cn/unsqueeze_cn.rst':
        srcls.pop(11)
    if filename == 'layers_cn/array_read_cn.rst':
        srcls.pop(51)
    if filename == 'layers_cn/scatter_cn.rst':
        srcls.pop(9)
    if filename == 'layers_cn/topk_cn.rst':
        srcls.pop(11)
    if filename == 'optimizer_cn/ModelAverage_cn.rst':
        srcls.pop(15)
    return srcls


def check_indent(code_line):
    indent = ""
    for c in code_line:
        if c == '\t':
            indent += '    '
        elif c == ' ':
            indent += ' '
        if c != ' ' and c != '\t':
            break
    return indent


def find_all(src_str, substr):
    indices = []
    get_one = src_str.find(substr)
    while get_one != -1:
        indices.append(get_one)
        get_one = src_str.find(substr, get_one + 1)
    return indices


def extract_sample_code(srcfile, status_all):
    content = ""
    filename = srcfile.name
    srcc = srcfile.read()
    srcfile.seek(0, 0)
    srcls = srcfile.readlines()
    srcls = remove_desc_code(
        srcls, filename
    )  # remove description info for samplecode
    status = []
    sample_code_begins = find_all(srcc, " code-block:: python")
    if len(sample_code_begins) == 0:
        status.append(-1)
    else:
        for i in range(0, len(srcls)):
            if srcls[i].find(".. code-block:: python") != -1:
                content = ""
                start = i

                blank_line = 1
                while srcls[start + blank_line].strip() == '':
                    blank_line += 1

                startindent = ""
                # remove indent error
                if srcls[start + blank_line].find("from") != -1:
                    startindent += srcls[start + blank_line][
                        : srcls[start + blank_line].find("from")
                    ]
                elif srcls[start + blank_line].find("import") != -1:
                    startindent += srcls[start + blank_line][
                        : srcls[start + blank_line].find("import")
                    ]
                else:
                    startindent += check_indent(srcls[start + blank_line])
                content += srcls[start + blank_line][len(startindent) :]
                for j in range(start + blank_line + 1, len(srcls)):
                    # planish a blank line
                    if (
                        not srcls[j].startswith(startindent)
                        and srcls[j] != '\n'
                    ):
                        break
                    if srcls[j].find(" code-block:: python") != -1:
                        break
                    content += srcls[j].replace(startindent, "", 1)
                status.append(run_sample_code(content, filename))

    status_all[filename] = status
    return status_all, content


def run_sample_code(content, filename):
    return 0


def test(file):
    temp = []
    src = open(file, 'r')
    status_all = {}
    _, content = extract_sample_code(src, status_all)
    temp.append(status_all)
    src.close()
    return temp, content


if __name__ == '__main__':
    with open('codes.txt', 'w') as f_codes:
        codes = []
        count = 0
        count_codes = 0
        for root, dirs, files in os.walk('/home/shun/Documents/Projects/paddle_xdoctest/Paddle-develop/python/paddle'):
            # print("current directory:", root)
            # print("subdirectories:", dirs)
            # print("files:", files)
            for f in files:
                if f.endswith('.py'):
                    count += 1
                    filename = os.path.join(root, f)
                    _, _codes = test(filename)
                    if _codes:
                        count_codes += 1
                        f_codes.write('-'*30 + str(count_codes))
                        f_codes.write('\n')
                        f_codes.write(filename + '\t' + '-'*30)
                        f_codes.write('\n')
                        f_codes.write(_codes)
                        f_codes.write('\n')
        print('total...', count)
        print('total code...', count_codes)
```

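That is all it extracted. It seemed low to me too, but since the Python file count looked right I did not dig any further.

You could try the script paddle/tools/sampcd_processor.py in the Paddle repo.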
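One possible reason the total looks low: in the script quoted in this thread, `count_codes` is incremented at most once per file, so it counts files that contain an example rather than individual example blocks. A minimal alternative sketch is shown here, assuming it is acceptable to count every `.. code-block:: python` occurrence inside Python docstrings via static parsing. The helper name `count_docstring_examples` is hypothetical, this is not what `sampcd_processor.py` does, and the demo runs on a throwaway directory rather than the Paddle tree.

```python
import ast
import os
import tempfile


def count_docstring_examples(root):
    """Return (python_file_count, example_block_count) for the tree under root."""
    file_count = 0
    block_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            file_count += 1
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                try:
                    tree = ast.parse(f.read())
                except SyntaxError:
                    continue  # skip files that do not parse
            for node in ast.walk(tree):
                # Only modules, classes and functions can carry a docstring.
                if isinstance(node, (ast.Module, ast.ClassDef,
                                     ast.FunctionDef, ast.AsyncFunctionDef)):
                    doc = ast.get_docstring(node) or ""
                    block_count += doc.count(".. code-block:: python")
    return file_count, block_count


# Demo on a temporary directory holding one file with two example blocks.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.py"), "w", encoding="utf-8") as f:
        f.write(
            'def f():\n'
            '    """Doc.\n\n'
            '    Examples:\n'
            '        .. code-block:: python\n\n'
            '            print(1)\n\n'
            '        .. code-block:: python\n\n'
            '            print(2)\n'
            '    """\n'
        )
    demo_counts = count_docstring_examples(tmp)
print(demo_counts)
```

Because this is static analysis, docstrings attached at runtime or defined in C++ extensions are not counted.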

SigureMo · 2023-05-21T07:08:10Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+
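+3. Final wrap-up stage: switch the pipeline over to the Paddle code; the code check in Paddle docs can then be removed.
+    - Feature update for the Chinese and English [API documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html#api): code examples with the `>>>` prompt can be copied, including code and comments but not output.
+    - Hand over code checking (optional): move all code checking from Paddle docs to the CI pipeline of the Paddle code.


As mentioned earlier, both Paddle and docs currently include code checks, so some of the wording here needs to be revised.

> Hand over code checking (optional)

I think "optional" can be dropped: checking with two tools at once only adds maintenance cost, and the existing code check can be removed at this stage.

> code examples with the `>>>` prompt can be copied, including code and comments but not output.

How is copying handled in the early and middle stages? Will the code that users see and copy during those stages include `>>>` and comments?

This part was not written in detail.

docs is currently built with sphinx, right? Are the templates under templates_path = ["/templates"]?

I have never actually built documentation with sphinx, so I am not sure what the code looks like when viewed and copied in the early and middle stages. Listing this as a separate feature is also meant to keep track of it.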
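Wherever the copy feature ends up being implemented, the transformation it needs is small. The sketch below shows one possible version of "copy code and comments, not output"; the function name `strip_prompts` is hypothetical and nothing here reflects how the Paddle docs site actually works.

```python
def strip_prompts(example):
    """Keep code and comments from a `>>>`-style example; drop expected output."""
    kept = []
    for line in example.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">>> ") or stripped.startswith("... "):
            kept.append(stripped[4:])  # remove the 4-character prompt
        elif stripped in (">>>", "..."):
            kept.append("")  # a bare prompt stands for an empty code line
        # anything else is treated as expected output and is not copied
    return "\n".join(kept)


sample = """\
>>> # compute a sum
>>> total = 0
>>> for i in range(3):
...     total += i
>>> print(total)
3
"""
copied = strip_prompts(sample)
print(copied)
```

A known limitation of this sketch: an output line that itself begins with `... ` would be mistaken for a continuation line.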

SigureMo · 2023-05-21T07:21:18Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

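+- A following line that does not start with `>>>` is treated as output, and the line before it must start with `>>>`.
+- A blank line marks the start of a new code segment.
+
+However, since `xdoctest` itself has no such strict format check, this design item is optional.


Could `.. code-block::` and its indentation be removed at this stage?

Sure. If we confirm that `.. code-block::` is no longer needed, the sample code extraction in both the Paddle code and Paddle docs has to be changed accordingly.

In that case, I suggest listing it as a separate feature.

One thing still needs confirming, though: since xdoctest is "compatible" with the current sample code, meaning it skips it automatically, do we need to enforce a check on this format later? That is why I marked section 2.3, "No longer compatible with the old format (optional)", as optional.

> One thing still needs confirming, though: since xdoctest is "compatible" with the current sample code, meaning it skips it automatically, do we need to enforce a check on this format later? That is why I marked section 2.3, "No longer compatible with the old format (optional)", as optional.

Without a check, developers who use the old format would have their examples skipped, so errors in that code would go undetected. That is hard to accept, so I do recommend having such a check.

Agreed! :)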
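A check of this kind could be fairly small. The following is a rough sketch under the assumption that an "old format" example is a `.. code-block:: python` block with no `>>>` line in it; the helper name `find_legacy_examples` is hypothetical and is not part of xdoctest or of Paddle's tooling.

```python
import ast

DIRECTIVE = ".. code-block:: python"


def find_legacy_examples(source):
    """Return names of functions/classes that have an example block without `>>>`."""
    legacy = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node) or ""
        # Split on the directive; every chunk after the first is one example block.
        for block in doc.split(DIRECTIVE)[1:]:
            if not any(l.lstrip().startswith(">>>") for l in block.splitlines()):
                legacy.append(node.name)
                break
    return legacy


SOURCE = '''
def old_style():
    """Doc.

    Examples:
        .. code-block:: python

            print(1)
    """

def new_style():
    """Doc.

    Examples:
        .. code-block:: python

            >>> print(1)
            1
    """
'''
flagged = find_legacy_examples(SOURCE)
print(flagged)
```

The block boundary is coarse here (each chunk runs to the next directive or the end of the docstring), so a real check would need a stricter definition of where an example ends.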

SigureMo · 2023-05-21T07:28:49Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

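+- Affects the CI pipelines of the Paddle code and Paddle docs
+- Affects how sample code for Python APIs is currently written
+- Affects how the page `开发 API Python 端` (Developing the API on the Python side) is displayed
+- Affects how sample code is displayed and copied in the Chinese and English API documentation


Could this be split into categories following https://github.com/PaddlePaddle/community/blob/master/rfcs/design_template.md#%E4%B8%83%E5%BD%B1%E5%93%8D%E9%9D%A2 ? It would help to expand a little on how large each impact is and whether it is controllable.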

SigureMo · 2023-05-21T09:34:50Z

rfcs/Docs/将 xdoctest 引入到飞桨框架工作流中.md

+
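+In addition, there is no special handling for examples whose output cannot be checked for consistency (random distributions) or that need a special environment (such as a GPU or file storage).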
+
+


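It would also be good to state whether Paddle's existing code check tool extracts docstrings at runtime or by static code analysis, and how xdoctest extracts them.

Note that one advantage of runtime extraction is that even docstrings defined in C++ code are extracted correctly, which is hard to do with static code analysis. This point should be confirmed.

I do not quite follow. Do you mean using xdoctest to extract examples from C++?

For example,

https://github.com/PaddlePaddle/Paddle/blob/83a12b1110677d98b92c1734cdcc3a31e480ac67/paddle/fluid/pybind/cuda_streams_py.cc#L130-L137

is an API exposed through pybind11, and its generated documentation is at

https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/device/cuda/Stream_cn.html#stream

Can the existing sample code check tool check this API's sample code? Can xdoctest? This needs a comparison.

Yes, xdoctest supports dynamic analysis: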

analysis (str, default='auto'): if 'static', only static analysis is used to parse call definitions. If 'auto', uses dynamic analysis for compiled python extensions, but static analysis elsewhere, if 'dynamic', then dynamic analysis is used to parse all calldefs.

```python
def parse_dynamic_calldefs(modpath_or_module):
    ...
    if getattr(module, '__doc__'):
        calldefs['__doc__'] = static.CallDefNode(
            callname='__doc__',
            docstr=module.__doc__,
            lineno=0,
            doclineno=1,
            doclineno_end=1,
            args=None
        )
    ...
```

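As far as I can see, `paddle.device.cuda.Stream.__doc__` can be extracted normally, but exactly how xdoctest handles it needs attention during implementation. I will split this out as a separate feature to track it. :)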
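The difference between the two extraction strategies can be illustrated with the standard library alone. In this sketch the built-in `len`, which is implemented in C in CPython, stands in for a pybind11-exposed API such as `paddle.device.cuda.Stream`; that substitution is an assumption made so the example runs without Paddle, and it does not show how xdoctest itself resolves such objects.

```python
import ast
import inspect

SOURCE = '''
def py_func():
    """Docstring written in Python source."""
'''

# Static analysis: only docstrings present in .py source text are visible.
tree = ast.parse(SOURCE)
static_docs = {
    node.name: ast.get_docstring(node)
    for node in ast.walk(tree)
    if isinstance(node, ast.FunctionDef)
}

# Runtime extraction: take the live object and read its __doc__, which also
# works for callables implemented in C (no Python source to parse).
runtime_doc = inspect.getdoc(len)

print(static_docs)
print(runtime_doc is not None)
```

A purely static tool has no source text to parse for `len`, which mirrors the concern raised about docstrings defined in C++.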

[Add]将 xdoctest 引入到飞桨框架工作流中.md

93d10eb

paddle-bot bot added contributor status: proposed labels May 21, 2023

megemini mentioned this pull request May 21, 2023

China Software Open Source Innovation Competition: PaddlePaddle Framework Task Challenge (Part 1) PaddlePaddle/Paddle#53172

Closed

SigureMo reviewed May 21, 2023

luotao1 assigned luotao1 and SigureMo May 26, 2023

megemini mentioned this pull request May 28, 2023

[Add] Introduce xdoctest into the PaddlePaddle framework workflow v1 #547

Merged

luotao1 closed this Jun 15, 2023
