[Hackathon No.40] Add the ASGD API to Paddle #58834
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Sorry to inform you that afcc16f's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Sorry to inform you that 1f3b0bd's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
@@ -191,6 +191,19 @@
    backward : as_strided_grad
    no_need_buffer : input

- op : asgd_
  args : (Tensor param, Tensor learning_rate, Tensor grad, Tensor d, Tensor y, Tensor n, Tensor master_param, bool multi_precision=false)
  output : Tensor(param_out), Tensor(d_out), Tensor(y_out), Tensor(master_param_out)
Keep the argument order consistent with the other optimizers; put grad in the second position.
Done
  func : asgd
  data_type : param
  data_transform :
    support_trans_dtype : learning_rate, n
Why does this need to be specified explicitly here?
This follows the Paddle SGD definition, but in ops.yaml only SGD has this entry. In my tests, removing it does not affect the results, so I suspect it is a historical leftover. Removed.
My earlier reply was wrong: after deleting this entry you need to re-run cmake, which I had not done before, sorry.
In practice a dtype-conversion error is raised; I am still investigating the exact cause.
This entry should not be removed.
The reason is explained below.
                   phi::dtype::bfloat16,
                   float,
                   double) {
  if (kernel_key.dtype() == phi::DataType::FLOAT16 ||
Isn't this already done in infermeta? Is it still necessary here?
This follows the Paddle SGD implementation, but judging from both the logic and the actual results it is indeed redundant. Removed.
python/paddle/optimizer/asgd.py
Outdated
        name=None,
    ):
        if learning_rate is None:
            raise ValueError("learning_rate is not set")
This error message is a bit ambiguous. If the user does not set learning_rate, it falls back to a default value; if the user explicitly passes None, the message should say "learning_rate should not be None".
Done
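For reference, a minimal standalone sketch of the adjusted check (the helper name `check_learning_rate` is hypothetical; the real check lives in `__init__`, where learning_rate defaults to 0.001, so the error only fires when None is passed explicitly):

```python
def check_learning_rate(learning_rate=0.001):
    # hypothetical stand-in for the __init__ check: the default value means
    # this branch is only reached when the caller explicitly passes None
    if learning_rate is None:
        raise ValueError("learning_rate should not be None")
    return learning_rate

check_learning_rate()        # fine, uses the default
# check_learning_rate(None)  # would raise: learning_rate should not be None
```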
python/paddle/optimizer/asgd.py
Outdated
        m = self._get_accumulator_master(self._m_acc_str, param_and_grad[0])

        ys = self._get_accumulator_master(self._y_acc_str, param_and_grad[0])
        y = paddle.assign(ys[paddle.mod(m, self._n_tensor).item()])
Doesn't using mod mean the selection is not actually random?
This follows the logic of PyTorch's SGD.
While reading the SGD source code I noticed that it has no explicit sampling step; I later found the answer here.
Roughly speaking, the optimizer itself is not responsible for sampling; the "randomness" comes from the data the user feeds in.
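To make the point concrete, here is a minimal standalone sketch of the slot selection (names such as ys, m and batch_num are illustrative stand-ins for the accumulators above, not the PR's exact code):

```python
import paddle

batch_num = 4                                    # number of gradient-history slots
ys = paddle.zeros([batch_num, 3])                # per-slot gradient history (toy shape)
n_tensor = paddle.to_tensor([batch_num], dtype='int64')
m = paddle.to_tensor([0], dtype='int64')         # step counter

for step in range(6):
    index = paddle.mod(m, n_tensor).item()
    # the slot cycles 0, 1, 2, 3, 0, 1, ... -- the optimizer itself does not
    # sample randomly; the "randomness" comes from the shuffled input batches
    y = paddle.assign(ys[index])
    m = m + 1
```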
Is assign used here? The y belonging to ys is supposed to be modified, but since it is assigned out here, will the corresponding y inside ys actually be changed?
The corresponding y inside ys should be, and is, modified.
In dynamic graph mode you can break here with pdb and inspect it; I have tried this and it is modified correctly.
In static graph mode, there is a unit test checking that the static and dynamic graph results agree, see here.
I spent a long time debugging this part; correctness is guaranteed.
    def __init__(
        self,
        learning_rate=0.001,
        batch_num=1,
What is the difference between batch_num and batch_size?
A batch is a slice cut from the full sample set, batch_size is the number of samples in each batch, and batch_num is the number of batches processed to complete one epoch.
For example, with a dataset of 200 samples (the full sample set) and a batch_size of 5, batch_num is 40, as the snippet below shows.
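In other words (using the hypothetical numbers from the example above):

```python
total_samples = 200                        # size of the full dataset
batch_size = 5                             # samples per batch
batch_num = total_samples // batch_size    # batches processed in one epoch
assert batch_num == 40
```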
If that is what it means, then within one epoch this ASGD optimizer has to store data for all samples? Then why hasn't torch implemented this version of ASGD?
Yes, within one epoch ASGD needs to keep the gradient information of all samples, because it needs it to:
- compute the average used to update the weight parameters;
- replace an old sample's gradient with the gradient of the newly "randomly sampled" sample at every iteration.
This is described in detail in the RFC.
As for why torch does not implement it this way: this is analyzed in the RFC. Judging from both the code logic and the existing comments, the current torch version has serious problems. As for why torch implemented a flawed optimizer, I dug through Baidu and this site bit by bit, and it really can no longer be traced.
To add to that: the SGD here is effectively BGD (the same holds for torch, which can be verified from that site), so only one gradient copy is kept per batch. E.g. in the example above we do not need to store gradients for 200 samples, only 40 gradient copies, as sketched below.
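A rough illustration of what that history costs (the shapes are assumptions for illustration, not the actual accumulator layout in the PR):

```python
import paddle

batch_num = 40
param = paddle.zeros([128, 64])
# one gradient snapshot is kept per batch, not per sample, so the history
# buffer holds batch_num copies of the parameter-shaped gradient
grad_history = paddle.zeros([batch_num] + list(param.shape))
print(grad_history.shape)   # [40, 128, 64]
```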
An optimization idea just occurred to me: batch_num does not have to equal total_samples / batch_size. When GPU memory is tight, batch_num can be set smaller, i.e. we still keep historical gradients, just fewer of them. E.g. if total_samples / batch_size is 1000 and batch_num = 1000 runs out of memory, the user can set batch_num to 100 or 10 instead.
Then I feel there is little difference from batch_size. We might as well use batch_size directly and keep one extra batch_size worth of gradient data as the historical gradients each time.
I don't think that works well.
Whatever the input argument is, the computation still has to know what batch_num is.
Passing only batch_size is not enough to derive batch_num; an extra argument for the total sample count would also be needed.
Let's keep it this way for now; the meaning of the batch_num parameter needs to be explained in detail in the documentation.
Done
        lr = self._create_param_lr(param_and_grad)

        if in_dynamic_or_pir_mode():
            _C_ops.asgd_(
To answer the question about the "explicit specification" in ops.yaml:
support_trans_dtype enables automatic dtype conversion. E.g. here lr has dtype float32, while the C++ learning_rate argument has dtype float16 (at the kernel level learning_rate shares param's dtype). If support_trans_dtype is not enabled for learning_rate, a dtype-mismatch error is raised.
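A small sketch of the dtype situation being described (illustrative only; the cast itself happens on the C++ side, driven by the ops.yaml entry):

```python
import paddle

# with multi_precision, param (and therefore the selected kernel, via
# `data_type : param`) is float16, while the Python-side lr tensor is float32
param = paddle.zeros([4], dtype='float16')
lr = paddle.to_tensor(0.001, dtype='float32')

# without `support_trans_dtype : learning_rate, n`, handing this fp32 lr to a
# kernel that expects learning_rate in fp16 raises a dtype-mismatch error;
# with it, the framework casts lr automatically before the kernel runs
```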
@GGBond8488 Please review again when you have time.
Could you review it first? The deadline is at 12 o'clock. @GGBond8488
@GGBond8488 Please take a look when you are free.
python/paddle/optimizer/asgd.py
Outdated
            assign_value
        )

    def update_sign(self, value):
What is the purpose of this?
sign marks whether intermediate variables are recorded. Recording them requires extra storage, so it defaults to false. If the intermediate variables need to be tested, this function can be used to change sign to true.
Which intermediate variables are saved, and what are they used for? If this is only for testing, add it as a patch in the test script instead of here; if it is not actually needed, do not add it.
OK, I will fix this later. Please look at the other question first when you have time, since it affects this part.
For these intermediate-variable checks, it is enough to confirm they are correct during development. What should actually go into the tests is whether the param update matches expectations, using the full model optimization flow rather than calling the single op directly.
I have revised the code; please check whether it meets the requirements.
            find_master,
        )

        self._assign_accumulator_master(
Does this approach assign the value out of the accumulator, modify it, and then assign it back? The asgd op above is already marked inplace, so it should be enough to just take the value out directly, right?
The static graph does not support inplace operations.
During testing I found that sometimes the dynamic graph updates correctly while the static graph fails to update.
With assign, both the dynamic and the static graph update correctly.
Is there a better way to write this?
The static graph does support inplace operations; you just need to pass the same variable as both the input and the output.
I don't understand what this dynamic/static inconsistency error means; the way you call _C_ops.asgd_ is dynamic graph anyway.
You said earlier that inplace can be achieved by "passing the same variable as both the input and the output". Here ys[index] is passed: it works fine in dynamic graph mode, but in static graph mode nothing is updated, and I can't figure out why yet.
Besides that, m + 1 needs an add call, but add_ carries the decorator @inplace_apis_in_dygraph_only. I tried calling _C_ops.add(x, y) directly and that did not work either.
Does this really have to be inplace?
Just distinguish between the dynamic graph and the static graph here.
In the dynamic graph, add inplace is simply m.add_(paddle.to_tensor([1])).
In the static graph, inplace is:
block.append_op(
    type="add",
    inputs={"x": m},
    outputs={"out": m},
)
The other inplace case below is the same; these are just the two different usages.
Understood. I have revised this part; the rough outline is as follows (see the sketch after this list):
- It is split into three cases: dynamic graph, static graph under PIR, and static graph without PIR.
- Dynamic graph: m and ys are updated inplace via add_ and _C_ops.asgd_.
- Static graph under PIR: handled via _assign_accumulator_master (the earlier approach); inplace updates are not supported. Calling add and _C_ops.asgd_ here does not update the data, and calling block.append_op raises an error outright.
- Static graph without PIR: m is updated inplace via block.append_op, while an inplace update of ys is not supported, so it is handled via paddle.static.setitem together with _assign_accumulator_master. Using paddle.static.setitem alone produces a strange bug: ys is updated correctly in the current step but reverts to the original values (all zeros) in the next step.
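A condensed sketch of that branching, using a hypothetical helper (the helper name, the flags, and the op type/keys follow the discussion above rather than the PR's exact code):

```python
import paddle

def update_counter(m, block=None, dygraph=True, use_pir=False):
    """Hypothetical illustration of the three code paths described above."""
    if dygraph:
        # dynamic graph: a genuine inplace update of the accumulator tensor
        m.add_(paddle.to_tensor([1], dtype=m.dtype))
        return m
    if use_pir:
        # static graph under PIR: inplace is not available, so a new value is
        # computed here and later written back via the accumulator helper
        return paddle.add(m, paddle.to_tensor([1], dtype=m.dtype))
    # legacy static graph: appending an op whose input and output are the same
    # variable behaves as an inplace update (op type/keys as in the review sketch)
    block.append_op(
        type="add",
        inputs={"x": m},
        outputs={"out": m},
    )
    return m
```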
python/paddle/optimizer/asgd.py
Outdated
        index = paddle.mod(m, self._n_tensor).item()
        y = paddle.assign(ys[index])

        m = paddle.assign(paddle.add(m, to_tensor([1], dtype=m.dtype)))
The add op already produces a new tensor by itself; is assigning it back meant to achieve an inplace-like effect? In that case, just use add's inplace API paddle.add_.
@GGBond8488 I have replied to everything; please review again.
            find_master,
        )

        self._assign_accumulator_master(
The static graph does support inplace operations; you just need to pass the same variable as both the input and the output.

        outputs = {
            "param_out": param_and_grad[0],
            "d_out": d,
What you have here is already inplace.
If ys itself were passed here, there would be no problem. But ys[index] is passed, which always leads to dynamic/static inconsistency; assign works around this. Is there a better way?
python/paddle/optimizer/asgd.py
Outdated
            assign_value
        )

    def update_sign(self, value):
Which intermediate variables are saved, and what are they used for? If this is only for testing, add it as a patch in the test script instead of here; if it is not actually needed, do not add it.
@GGBond8488 I have replied; please take a look when you are free. The patch change is not hard, but I don't have a good solution for the other issue yet.
@GGBond8488 The ys[index] issue is the old one; there is no good solution for now. Apart from that, are there any other problems? Please point them all out.
There should be no other problems.
@GGBond8488 Please take a look when you are not busy.
@GGBond8488 Please review again when you have time.
LGTM
        }

        add_outputs = {
            "Out": m,
This isn't really inplace either, is it? Just use add directly. And does m really need an inplace update?
- Is this not inplace because the input was to_tensor(m) rather than m? I have changed it to m; is it OK now?
- Here m needs to be updated so that it replaces the previously stored value. Since this is the static graph, add_ cannot be used; compared with assigning the value out and then assigning it back, using append_op seemed a bit better, which is why this implementation was chosen.
The CI cannot be rerun; this has already been reported.
@GGBond8488 The CI has passed and the m part has been revised; is it OK now?
LGTM
LGTM for docs
Please add links to the RFC and the Chinese documentation in the description above, and ensure that the content of the RFC is consistent with the code.
@jeff41404 Done
There is a difference between the API signature in the RFC and the code, which has been commented on in the RFC. According to the usual process, the RFC should be merged first and then the code.
The RFC has been corrected. The comments in the code also need to be modified. Can I still push here? Will it disrupt the "approve" state, or should I open a new PR?
Please still push here.
Sorry, the comments in the code already include "multi_precision".
Done
LGTM
PR types
Others
PR changes
APIs
Description
Add the ASGD API to Paddle
Related links
Current progress