
[Hackathon No.40] Add ASGD API to Paddle #58834

Merged
merged 30 commits on Jan 25, 2024

Changes from all commits
30 commits
9cb8ec3
add basic asgd
WintersMontagne10335 Nov 8, 2023
afcc16f
Merge remote-tracking branch 'upstream/develop' into winters018
WintersMontagne10335 Nov 8, 2023
80b5266
Merge remote-tracking branch 'upstream/develop' into winters018
WintersMontagne10335 Nov 22, 2023
c3a3a3c
add basic implementation
WintersMontagne10335 Nov 23, 2023
1f3b0bd
fix bugs
WintersMontagne10335 Nov 23, 2023
1cb5a92
Merge remote-tracking branch 'upstream/develop' into winters018
WintersMontagne10335 Dec 14, 2023
c9e2961
Merge remote-tracking branch 'upstream/develop' into winters018
WintersMontagne10335 Dec 15, 2023
56925e4
add unit test cases
WintersMontagne10335 Dec 19, 2023
58ed273
Merge remote-tracking branch 'upstream/develop' into winters018
WintersMontagne10335 Dec 19, 2023
c70caab
fix bug
WintersMontagne10335 Dec 20, 2023
20e41ec
add English document
WintersMontagne10335 Dec 20, 2023
49d8ead
fix bug
WintersMontagne10335 Dec 21, 2023
3fd08f6
update
WintersMontagne10335 Dec 21, 2023
bbc0779
rollback
WintersMontagne10335 Dec 21, 2023
6b70f57
test
WintersMontagne10335 Dec 21, 2023
6fec604
update grad
WintersMontagne10335 Dec 21, 2023
4800282
fix bug
WintersMontagne10335 Dec 21, 2023
def823f
update
WintersMontagne10335 Dec 23, 2023
541a885
Merge remote-tracking branch 'upstream/develop' into winters018
WintersMontagne10335 Dec 29, 2023
08fb34f
update
WintersMontagne10335 Jan 3, 2024
476832f
Merge remote-tracking branch 'upstream/develop' into winters018
WintersMontagne10335 Jan 4, 2024
e170c3d
revert
WintersMontagne10335 Jan 4, 2024
7529d13
update
WintersMontagne10335 Jan 4, 2024
34d75b0
update
WintersMontagne10335 Jan 4, 2024
ca3d4ff
fix bugs
WintersMontagne10335 Jan 11, 2024
517d371
update
WintersMontagne10335 Jan 11, 2024
176c067
update
WintersMontagne10335 Jan 17, 2024
97c9eec
update
WintersMontagne10335 Jan 19, 2024
ce01712
update
WintersMontagne10335 Jan 22, 2024
e8d30eb
update
WintersMontagne10335 Jan 22, 2024
13 changes: 13 additions & 0 deletions paddle/phi/api/yaml/ops.yaml
@@ -191,6 +191,19 @@
backward : as_strided_grad
no_need_buffer : input

- op : asgd_
args : (Tensor param, Tensor grad, Tensor learning_rate, Tensor d, Tensor y, Tensor n, Tensor master_param, bool multi_precision=false)
output : Tensor(param_out), Tensor(d_out), Tensor(y_out), Tensor(master_param_out)
Contributor

Please keep the input argument order consistent with the other optimizers; put grad in the second position.

Contributor Author

Done

infer_meta :
func : ASGDInferMeta
kernel :
func : asgd
data_type : param
data_transform :
support_trans_dtype : learning_rate, n
Contributor

Why does this need to be specified explicitly here?

Contributor Author

This follows the pattern of Paddle's SGD, but SGD is the only op in ops.yaml that has it. In testing, removing it did not change the results, so it is presumably a historical leftover. Removed.

Contributor Author

My reply above was wrong: after removing it, cmake needs to be rerun, which I had not done earlier; sorry about that.
With a clean build, a dtype conversion error is reported; I am still investigating the exact cause.
So this should not be removed.

Contributor Author

The reason is explained below.

optional : master_param, master_param_out
inplace : (param -> param_out), (d -> d_out), (y -> y_out), (master_param -> master_param_out)
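
For readers unfamiliar with the yaml-driven codegen: this entry is what generates the C++ API and the eager Python binding for the inplace op. Below is a rough sketch of calling the generated binding directly; the `paddle._C_ops.asgd_` name and the positional argument order are assumptions inferred from the yaml args above and from how other optimizer ops are bound, not something spelled out in this diff. In normal use the op is reached through `paddle.optimizer.ASGD` instead.

```python
import paddle

# Hypothetical direct call to the generated inplace binding (assumed name:
# paddle._C_ops.asgd_). Arguments follow the yaml order:
# (param, grad, learning_rate, d, y, n, master_param, multi_precision).
param = paddle.randn([8])
grad = paddle.randn([8])
lr = paddle.to_tensor([0.01])
d = paddle.zeros([8])        # ASGD state tensor maintained by the optimizer
y = paddle.zeros([8])        # ASGD state tensor maintained by the optimizer
n = paddle.to_tensor([1.0])  # scalar used as lr / n in the kernel

# param, d and y are updated in place; master_param is optional (None here).
paddle._C_ops.asgd_(param, grad, lr, d, y, n, None, False)
```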

- op : asin
args : (Tensor x)
output : Tensor(out)
42 changes: 42 additions & 0 deletions paddle/phi/infermeta/multiary.cc
@@ -481,6 +481,48 @@ void AddNTensorArrayInferMeta(const std::vector<const MetaTensor*>& x,
}
}

void ASGDInferMeta(const MetaTensor& param,
const MetaTensor& grad,
const MetaTensor& learning_rate,
const MetaTensor& d,
const MetaTensor& y,
const MetaTensor& n,
const MetaTensor& master_param,
bool multi_precision,
MetaTensor* param_out,
MetaTensor* d_out,
MetaTensor* y_out,
MetaTensor* master_param_out) {
PADDLE_ENFORCE_NOT_NULL(
param_out,
phi::errors::InvalidArgument(
"Output(ParamOut) of ASGDOp should not be null."));

PADDLE_ENFORCE_NOT_NULL(d_out,
phi::errors::InvalidArgument(
"Output(DOut) of ASGDOp should not be null."));

PADDLE_ENFORCE_NOT_NULL(y_out,
phi::errors::InvalidArgument(
"Output(YOut) of ASGDOp should not be null."));

param_out->set_dims(param.dims());
param_out->set_dtype(param.dtype());
d_out->set_dims(d.dims());
d_out->set_dtype(d.dtype());
y_out->set_dims(y.dims());
y_out->set_dtype(y.dtype());
if (multi_precision) {
master_param_out->set_dims(master_param.dims());
if (DataType::FLOAT16 == master_param.dtype() ||
DataType::BFLOAT16 == master_param.dtype()) {
master_param_out->set_dtype(DataType::FLOAT32);
} else {
master_param_out->set_dtype(master_param.dtype());
}
}
}

void AucInferMeta(const MetaTensor& input,
const MetaTensor& label,
const MetaTensor& stat_pos,
13 changes: 13 additions & 0 deletions paddle/phi/infermeta/multiary.h
@@ -138,6 +138,19 @@ void AddNTensorArrayInferMeta(const std::vector<const MetaTensor*>& x,
MetaTensor* out,
MetaConfig config);

void ASGDInferMeta(const MetaTensor& param,
const MetaTensor& grad,
const MetaTensor& learning_rate,
const MetaTensor& d,
const MetaTensor& y,
const MetaTensor& n,
const MetaTensor& master_param,
bool multi_precision,
MetaTensor* param_out,
MetaTensor* d_out,
MetaTensor* y_out,
MetaTensor* master_param_out);

void AucInferMeta(const MetaTensor& input,
const MetaTensor& label,
const MetaTensor& stat_pos,
37 changes: 37 additions & 0 deletions paddle/phi/kernels/asgd_kernel.h
@@ -0,0 +1,37 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once

#include "paddle/phi/core/dense_tensor.h"
#include "paddle/phi/core/selected_rows.h"

namespace phi {

template <typename T, typename Context>
void ASGDKernel(const Context& dev_ctx,
const DenseTensor& param,
const DenseTensor& grad,
const DenseTensor& learning_rate,
const DenseTensor& d,
const DenseTensor& y,
const DenseTensor& n,
const paddle::optional<DenseTensor>& master_param,
bool multi_precision,
DenseTensor* param_out,
DenseTensor* d_out,
DenseTensor* y_out,
DenseTensor* master_param_out);

} // namespace phi
73 changes: 73 additions & 0 deletions paddle/phi/kernels/cpu/asgd_kernel.cc
@@ -0,0 +1,73 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "paddle/phi/kernels/asgd_kernel.h"

#include "paddle/phi/backends/cpu/cpu_context.h"
#include "paddle/phi/core/kernel_registry.h"
#include "paddle/phi/kernels/funcs/eigen/common.h"
#include "paddle/phi/kernels/funcs/jit/kernels.h"

namespace phi {

template <typename T, typename Context>
void ASGDKernelCPUImpl(const Context& dev_ctx,
const DenseTensor& param,
const DenseTensor& grad,
const DenseTensor& learning_rate,
const DenseTensor& d,
const DenseTensor& y,
const DenseTensor& n,
DenseTensor* param_out,
DenseTensor* d_out,
DenseTensor* y_out) {
auto param_eigen = EigenVector<T>::Flatten(param);
auto grad_eigen = EigenVector<T>::Flatten(grad);
auto d_eigen = EigenVector<T>::Flatten(d);
auto y_eigen = EigenVector<T>::Flatten(y);
auto param_out_eigen = EigenVector<T>::Flatten(*param_out);
auto d_out_eigen = EigenVector<T>::Flatten(*d_out);
auto y_out_eigen = EigenVector<T>::Flatten(*y_out);
T learning_rate_T = learning_rate.data<T>()[0];
T n_T = n.data<T>()[0];

d_out_eigen = d_eigen - y_eigen + grad_eigen;
y_out_eigen = grad_eigen;
param_out_eigen = param_eigen - (learning_rate_T / n_T) * d_out_eigen;
}

template <typename T, typename Context>
void ASGDKernel(const Context& dev_ctx,
const DenseTensor& param,
const DenseTensor& grad,
const DenseTensor& learning_rate,
const DenseTensor& d,
const DenseTensor& y,
const DenseTensor& n,
const paddle::optional<DenseTensor>& master_param UNUSED,
bool multi_precision UNUSED,
DenseTensor* param_out,
DenseTensor* d_out,
DenseTensor* y_out,
DenseTensor* master_param_out UNUSED) {
dev_ctx.template Alloc<T>(param_out);
dev_ctx.template Alloc<T>(d_out);
dev_ctx.template Alloc<T>(y_out);
ASGDKernelCPUImpl<T, Context>(
dev_ctx, param, grad, learning_rate, d, y, n, param_out, d_out, y_out);
}

} // namespace phi

PD_REGISTER_KERNEL(asgd, CPU, ALL_LAYOUT, phi::ASGDKernel, float, double) {}
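
The Eigen expressions above reduce to a three-line elementwise update. The following NumPy restatement is not part of the PR; it is only a reference for checking the kernel's arithmetic:

```python
import numpy as np

def asgd_reference(param, grad, lr, d, y, n):
    """NumPy restatement of the CPU kernel above (not part of the PR)."""
    d_out = d - y + grad                  # d_out_eigen = d - y + grad
    y_out = grad.copy()                   # y_out_eigen = grad
    param_out = param - (lr / n) * d_out  # param_out = param - (lr / n) * d_out
    return param_out, d_out, y_out

# Example: one step on a 3-element parameter.
param = np.array([1.0, 2.0, 3.0])
grad = np.array([0.1, 0.1, 0.1])
d = np.zeros(3)
y = np.zeros(3)
param_out, d_out, y_out = asgd_reference(param, grad, lr=0.1, d=d, y=y, n=1.0)
print(param_out)  # [0.99 1.99 2.99]
```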
106 changes: 106 additions & 0 deletions paddle/phi/kernels/gpu/asgd_kernel.cu
@@ -0,0 +1,106 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "paddle/phi/kernels/asgd_kernel.h"

#include "paddle/phi/backends/gpu/gpu_context.h"
#include "paddle/phi/backends/gpu/gpu_helper.h"
#include "paddle/phi/backends/gpu/gpu_primitives.h"
#include "paddle/phi/common/amp_type_traits.h"
#include "paddle/phi/core/kernel_registry.h"
#include "paddle/phi/core/mixed_vector.h"

namespace phi {

template <typename T, typename MT>
__global__ void ASGDKernelGPUImpl(const T* param,
const T* grad,
const T* learning_rate,
const T* d,
const T* y,
const T* n,
const MT* master_param,
int num,
T* param_out,
T* d_out,
T* y_out,
MT* master_param_out) {
MT learning_rate_MT = static_cast<MT>(learning_rate[0]);
MT n_MT = static_cast<MT>(n[0]);
CUDA_KERNEL_LOOP(i, num) {
MT param_data = master_param ? master_param[i] : static_cast<MT>(param[i]);
MT grad_data = static_cast<MT>(grad[i]);
MT d_data = static_cast<MT>(d[i]);
MT y_data = static_cast<MT>(y[i]);
d_data = d_data - y_data + grad_data;
y_data = grad_data;
param_data = param_data - (learning_rate_MT / n_MT) * d_data;
param_out[i] = static_cast<T>(param_data);
d_out[i] = static_cast<T>(d_data);
y_out[i] = static_cast<T>(y_data);
if (master_param_out) {
master_param_out[i] = param_data;
}
}
}

template <typename T, typename Context>
void ASGDKernel(const Context& dev_ctx,
const DenseTensor& param,
const DenseTensor& grad,
const DenseTensor& learning_rate,
const DenseTensor& d,
const DenseTensor& y,
const DenseTensor& n,
const paddle::optional<DenseTensor>& master_param,
bool multi_precision,
DenseTensor* param_out,
DenseTensor* d_out,
DenseTensor* y_out,
DenseTensor* master_param_out) {
using MPDType = typename phi::dtype::MPTypeTrait<T>::Type;
const MPDType* master_in_data =
multi_precision ? master_param->data<MPDType>() : nullptr;
MPDType* master_out_data =
multi_precision ? dev_ctx.template Alloc<MPDType>(master_param_out)
: nullptr;

int block = 512;
int grid = (param.numel() + block - 1) / block;

ASGDKernelGPUImpl<T, MPDType><<<grid, block, 0, dev_ctx.stream()>>>(
param.data<T>(),
grad.data<T>(),
learning_rate.data<T>(),
d.data<T>(),
y.data<T>(),
n.data<T>(),
master_in_data,
param.numel(),
dev_ctx.template Alloc<T>(param_out),
dev_ctx.template Alloc<T>(d_out),
dev_ctx.template Alloc<T>(y_out),
master_out_data);
}

} // namespace phi

PD_REGISTER_KERNEL(asgd,
GPU,
ALL_LAYOUT,
phi::ASGDKernel,
phi::dtype::float16,
phi::dtype::bfloat16,
float,
double) {}
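
The float16 and bfloat16 registrations depend on the multi-precision path: with `multi_precision=true` the arithmetic runs in float32 on the master copy, and the results are cast back to the parameter dtype. A NumPy sketch of one step of that flow (a simplification of the CUDA loop above, not code from the PR; float16 stands in for both low-precision dtypes):

```python
import numpy as np

def asgd_step_multi_precision(param16, grad16, lr, d16, y16, n, master32):
    """Mirrors one CUDA_KERNEL_LOOP iteration above with multi_precision=True."""
    grad = grad16.astype(np.float32)   # MT grad_data = static_cast<MT>(grad[i])
    d = d16.astype(np.float32) - y16.astype(np.float32) + grad
    y = grad
    master = master32 - (lr / n) * d   # update on the float32 master copy
    # Low-precision outputs are cast back to T; the master copy stays float32.
    return master.astype(np.float16), d.astype(np.float16), y.astype(np.float16), master
```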
2 changes: 2 additions & 0 deletions python/paddle/optimizer/__init__.py
@@ -18,6 +18,7 @@
from .adam import Adam
from .adamax import Adamax
from .adamw import AdamW
from .asgd import ASGD
from .lamb import Lamb
from .lbfgs import LBFGS
from .momentum import Momentum
@@ -32,6 +33,7 @@
'Adam',
'AdamW',
'Adamax',
'ASGD',
'RMSProp',
'Adadelta',
'SGD',
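
With the import and the `__all__` entry in place, the new optimizer is importable as `paddle.optimizer.ASGD`. A minimal usage sketch, assuming the constructor follows the same `learning_rate` / `parameters` pattern as the other Paddle optimizers (ASGD-specific arguments are not shown in this diff and are omitted here):

```python
import paddle

# Minimal sketch; only the common optimizer arguments are assumed.
linear = paddle.nn.Linear(10, 1)
opt = paddle.optimizer.ASGD(learning_rate=0.001, parameters=linear.parameters())

x = paddle.randn([4, 10])
loss = paddle.mean(linear(x))
loss.backward()
opt.step()
opt.clear_grad()
```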