diff --git a/task/recognition/face/README.md b/task/recognition/face/README.md
new file mode 100644
index 0000000000000..89f489941ba8e
--- /dev/null
+++ b/task/recognition/face/README.md
@@ -0,0 +1,134 @@
+# Face Recognition
+
+Face recognition is a large-scale classification task built on PLSC, and the
+goal is to implement and reproduce SOTA algorithms. PLSC can train tens of
+millions of identities with high throughput on a single server.
+
+Supported functions:
+* ArcFace
+* CosFace
+* PartialFC
+* SparseMomentum
+* FP16 training
+* DataParallel (backbone layer) + ModelParallel (FC layer) distributed training
+
+Supported backbones:
+* IResNet
+* FaceViT
+
+## Requirements
+PaddlePaddle 2.4 is required for some of the newer features. For installation
+instructions, refer to [installation.md](../../../tutorials/get_started/installation.md).
+
+## Data Preparation
+
+### Download Dataset
+
+Download the datasets from the insightface dataset pages:
+
+- [MS1MV2](https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_#ms1m-arcface-85k-ids58m-images-57) (87k IDs, 5.8M images)
+- [MS1MV3](https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_#ms1m-retinaface) (93k IDs, 5.2M images)
+- [Glint360K](https://github.com/deepinsight/insightface/tree/master/recognition/partial_fc#4-download) (360k IDs, 17.1M images)
+- [WebFace42M](https://github.com/deepinsight/insightface/blob/master/recognition/arcface_torch/docs/prepare_webface42m.md) (2M IDs, 42.5M images)
+
+Note:
+* MS1MV2: MS1M-ArcFace
+* MS1MV3: MS1M-RetinaFace
+* WebFace42M: cleaned WebFace260M
+
+### [Optional] Extract MXNet Dataset to Images
+```shell
+# for example, extract the MS1MV3 dataset
+python -m plsc.data.dataset.tools.mx_recordio_2_images --root_dir /path/to/ms1m-retinaface-t1/ --output_dir ./dataset/MS1M_v3/
+```
+
+### Extract LFW Style bin Dataset to Images
+```shell
+# for example, extract the agedb_30 bin to images
+python -m plsc.data.dataset.tools.lfw_style_bin_dataset_converter --bin_path ./dataset/MS1M_v3/agedb_30.bin --out_dir ./dataset/MS1M_v3/agedb_30 --flip_test
+```
+
+### Dataset Directory
+Put all data under the `./dataset/` directory; soft links are recommended, for example:
+```shell
+mkdir -p ./dataset/
+ln -s /path/to/MS1M_v3 ./dataset/MS1M_v3
+```
+
+## How to Train
+
+```bash
+# Note: when running on multiple nodes, set the
+# following environment variables and then run
+# the script on each node.
+export PADDLE_NNODES=1
+export PADDLE_MASTER="127.0.0.1:12538"
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+
+python -m paddle.distributed.launch \
+    --nnodes=$PADDLE_NNODES \
+    --master=$PADDLE_MASTER \
+    --devices=$CUDA_VISIBLE_DEVICES \
+    plsc-train \
+    -c ./configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml
+```
+
+## How to Export
+
+```bash
+# In general, only the backbone needs to be exported,
+# so the export command can run on a single device.
+# FP16.level=O0 exports an FP32 model even when training used FP16;
+# Model.data_format=NCHW is required for IResNet when trained with NHWC.
+export PADDLE_NNODES=1
+export PADDLE_MASTER="127.0.0.1:12538"
+export CUDA_VISIBLE_DEVICES=0
+
+python -m paddle.distributed.launch \
+    --nnodes=$PADDLE_NNODES \
+    --master=$PADDLE_MASTER \
+    --devices=$CUDA_VISIBLE_DEVICES \
+    plsc-export \
+    -c ./configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml \
+    -o Global.pretrained_model=./output/IResNet50/latest \
+    -o FP16.level=O0 \
+    -o Model.data_format=NCHW
+```
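+
+After exporting, you can sanity-check the ONNX model with `onnxruntime`. A
+minimal sketch (the model path follows the evaluation example below; adjust
+it to your own export output):
+
+```python
+import numpy as np
+import onnxruntime
+
+# Load the exported backbone (the CPU provider is enough for a quick check).
+session = onnxruntime.InferenceSession(
+    "./output/IResNet50.onnx", providers=["CPUExecutionProvider"])
+input_name = session.get_inputs()[0].name
+
+# A dummy aligned face: NCHW, 112x112, normalized to [-1, 1]
+# (matching the NormalizeImage settings in the training config).
+img = np.random.rand(1, 3, 112, 112).astype(np.float32)
+img = (img - 0.5) / 0.5
+
+feat = session.run(None, {input_name: img})[0]
+print(feat.shape)  # expected (1, 512) for IResNet50
+```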
+
+## Evaluate on IJB-C
+```bash
+python onnx_ijbc.py \
+    --model-root ./output/IResNet50.onnx \
+    --image-path ./ijb/IJBC/ \
+    --target IJBC
+```
+
+## Model Zoo
+
+| Datasets | Backbone | Config | Devices | PFC | agedb30 | IJB-C(1E-4) | IJB-C(1E-5) | checkpoint | log |
+| :------: | :------- | ------ | ------- | --- | ------- | ----------- | :---------- | :--------- | --- |
+| MS1MV3 | Res50 | [config](./configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml) | N1C8*A100 | 1.0 | 0.9825 | 96.52 | 94.60 | [download](https://plsc.bj.bcebos.com/models/face/v2.4/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.pdparams) | [download](https://plsc.bj.bcebos.com/models/face/v2.4/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.log) |
+
+## Citations
+
+```
+@misc{plsc,
+  title={PLSC: An Easy-to-use and High-Performance Large Scale Classification Tool},
+  author={PLSC Contributors},
+  howpublished={\url{https://github.com/PaddlePaddle/PLSC}},
+  year={2022}
+}
+@inproceedings{deng2019arcface,
+  title={Arcface: Additive angular margin loss for deep face recognition},
+  author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={4690--4699},
+  year={2019}
+}
+@inproceedings{An_2022_CVPR,
+  author={An, Xiang and Deng, Jiankang and Guo, Jia and Feng, Ziyong and Zhu, XuHan and Yang, Jing and Liu, Tongliang},
+  title={Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  month={June},
+  year={2022},
+  pages={4042--4051}
+}
+```
diff --git a/task/recognition/face/configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml b/task/recognition/face/configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml
new file mode 100644
index 0000000000000..cd3ef84ca7890
--- /dev/null
+++ b/task/recognition/face/configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml
@@ -0,0 +1,126 @@
+# global configs
+Global:
+  task_type: recognition
+  train_epoch_func: defualt_train_one_epoch
+  eval_func: face_verification_eval
+  checkpoint: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  max_num_latest_checkpoint: 0
+  eval_during_train: True
+  eval_interval: 2000
+  eval_unit: "step"
+  accum_steps: 1
+  epochs: 20
+  print_batch_step: 100
+  use_visualdl: True
+  seed: 2022
+
+# FP16 setting
+FP16:
+  level: O1
+  GradScaler:
+    init_loss_scaling: 27648.0
+
+DistributedStrategy:
+  data_parallel: True
+
+# model architecture
+Model:
+  name: IResNet50
+  num_features: 512
+  data_format: "NHWC"
+  class_num: 93431
+  pfc_config:
+    sample_ratio: 1.0
+    model_parallel: True
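+
+# Notes:
+# * pfc_config.sample_ratio is the fraction of class centers PartialFC uses
+#   in each iteration: 1.0 keeps the full softmax head, while e.g. 0.1
+#   samples 10% of the centers to save FC memory and compute.
+# * MarginLoss below uses the combined-margin convention
+#   logit = s * (cos(m1 * theta + m2) - m3), so (m1, m2, m3) = (1.0, 0.5, 0.0)
+#   is ArcFace and (1.0, 0.0, 0.35~0.4) recovers CosFace.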
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - MarginLoss:
+        m1: 1.0
+        m2: 0.5
+        m3: 0.0
+        s: 64.0
+        model_parallel: True
+        weight: 1.0
+
+LRScheduler:
+  name: Poly
+  learning_rate: 0.1
+  decay_unit: step
+  warmup_steps: 0
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  weight_decay: 5e-4
+  grad_clip:
+    name: ClipGradByGlobalNorm
+    clip_norm: 5.0
+    always_clip: True
+    no_clip_list: ['dist']
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: FaceIdentificationDataset
+      image_root: ./dataset/MS1M_v3/
+      cls_label_path: ./dataset/MS1M_v3/label.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: ''
+        - ToCHWImage:
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 8
+      use_shared_memory: True
+
+  Eval:
+    dataset:
+      name: FaceVerificationDataset
+      image_root: ./dataset/MS1M_v3/agedb_30
+      cls_label_path: ./dataset/MS1M_v3/agedb_30/label.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: ''
+        - ToCHWImage:
+    sampler:
+      name: BatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 0
+      use_shared_memory: True
+
+Metric:
+  Eval:
+    - LFWAcc:
+        flip_test: True
+
+Export:
+  export_type: onnx
+  input_shape: [None, 3, 112, 112]
diff --git a/task/recognition/face/eval_ijbc.sh b/task/recognition/face/eval_ijbc.sh
new file mode 100644
index 0000000000000..137f994ed2d17
--- /dev/null
+++ b/task/recognition/face/eval_ijbc.sh
@@ -0,0 +1,18 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+python onnx_ijbc.py \
+    --model-root ./output/IResNet50.onnx \
+    --image-path ./ijb/IJBC/ \
+    --target IJBC
diff --git a/task/recognition/face/export.sh b/task/recognition/face/export.sh
new file mode 100644
index 0000000000000..0b95254e0ecef
--- /dev/null
+++ b/task/recognition/face/export.sh
@@ -0,0 +1,26 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
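+
+# Export only the backbone for inference. FP16.level=O0 exports an FP32 model
+# even when training used FP16; Model.data_format=NCHW is required for
+# IResNet when it was trained with NHWC.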
+
+export PADDLE_NNODES=1
+export PADDLE_MASTER="127.0.0.1:12538"
+export CUDA_VISIBLE_DEVICES=0
+python -m paddle.distributed.launch \
+    --nnodes=$PADDLE_NNODES \
+    --master=$PADDLE_MASTER \
+    --devices=$CUDA_VISIBLE_DEVICES \
+    plsc-export \
+    -c ./configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml \
+    -o Global.pretrained_model=output/IResNet50/latest \
+    -o FP16.level=O0 \
+    -o Model.data_format=NCHW
diff --git a/task/recognition/face/onnx_helper.py b/task/recognition/face/onnx_helper.py
new file mode 100644
index 0000000000000..a418a247042e8
--- /dev/null
+++ b/task/recognition/face/onnx_helper.py
@@ -0,0 +1,281 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# code modified from: https://github.com/deepinsight/insightface/blob/master/recognition/arcface_torch/onnx_helper.py
+
+from __future__ import division
+import datetime
+import os
+import os.path as osp
+import glob
+import numpy as np
+import cv2
+import sys
+import onnxruntime
+import onnx
+import argparse
+from onnx import numpy_helper
+
+
+class ArcFaceORT:
+    def __init__(self, model_path, cpu=False):
+        self.model_path = model_path
+        self.model_dir = os.path.dirname(model_path)
+        # providers=None would use any available provider; with
+        # onnxruntime-gpu that is "CUDAExecutionProvider"
+        self.providers = ['CPUExecutionProvider'] if cpu else ['CUDAExecutionProvider']
+
+    # input_size is (w, h); returns an error message, or None on success
+    def check(self, track='cfat', test_img=None):
+        # default track is cfat
+        max_model_size_mb = 1024
+        max_feat_dim = 512
+        max_time_cost = 15
+        if track.startswith('ms1m'):
+            max_model_size_mb = 1024
+            max_feat_dim = 512
+            max_time_cost = 10
+        elif track.startswith('glint'):
+            max_model_size_mb = 1024
+            max_feat_dim = 1024
+            max_time_cost = 20
+        elif track.startswith('cfat'):
+            max_model_size_mb = 1024
+            max_feat_dim = 512
+            max_time_cost = 15
+        elif track.startswith('unconstrained'):
+            max_model_size_mb = 1024
+            max_feat_dim = 1024
+            max_time_cost = 30
+        else:
+            return "track not found"
+
+        if not os.path.exists(self.model_path):
+            return f"{self.model_path} does not exist"
+        if not os.path.isdir(self.model_dir):
+            return f"{self.model_dir} should be a directory"
+
+        print('use onnx-model:', self.model_path)
+        try:
+            session = onnxruntime.InferenceSession(
+                self.model_path, providers=self.providers)
+        except Exception:
+            return "load onnx failed"
+        input_cfg = session.get_inputs()[0]
+        input_shape = input_cfg.shape
+        print('input-shape:', input_shape)
+        if len(input_shape) != 4:
+            return "length of input_shape should be 4"
+        if not isinstance(input_shape[0], str):
+            # input_shape[0] should be symbolic (str) to support batch inference
+            print('reset input-shape[0] to None')
+            model = onnx.load(self.model_path)
+            model.graph.input[0].type.tensor_type.shape.dim[
+                0].dim_param = 'None'
+            new_model_path = osp.join(self.model_dir, 'zzzzrefined.onnx')
+            onnx.save(model, new_model_path)
+            self.model_path = new_model_path
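+            # Re-create the session from the rewritten model so that
+            # inference below uses the dynamic batch dimension.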
+            print('use new onnx-model:', self.model_path)
+            try:
+                session = onnxruntime.InferenceSession(
+                    self.model_path, providers=self.providers)
+            except Exception:
+                return "load onnx failed"
+            input_cfg = session.get_inputs()[0]
+            input_shape = input_cfg.shape
+            print('new-input-shape:', input_shape)
+
+        self.image_size = tuple(input_shape[2:4][::-1])
+        #print('image_size:', self.image_size)
+        input_name = input_cfg.name
+        outputs = session.get_outputs()
+        output_names = []
+        for o in outputs:
+            output_names.append(o.name)
+            #print(o.name, o.shape)
+        if len(output_names) != 1:
+            return "number of output nodes should be 1"
+        self.session = session
+        self.input_name = input_name
+        self.output_names = output_names
+        #print(self.output_names)
+        model = onnx.load(self.model_path)
+        graph = model.graph
+        if len(graph.node) < 8:
+            return "too small onnx graph"
+
+        input_size = (112, 112)
+        self.crop = None
+        if track == 'cfat':
+            crop_file = osp.join(self.model_dir, 'crop.txt')
+            if osp.exists(crop_file):
+                lines = open(crop_file, 'r').readlines()
+                if len(lines) != 6:
+                    return "crop.txt should contain 6 lines"
+                lines = [int(x) for x in lines]
+                self.crop = lines[:4]
+                input_size = tuple(lines[4:6])
+        if input_size != self.image_size:
+            return "input-size is inconsistent with onnx model input, %s vs %s" % (
+                input_size, self.image_size)
+
+        self.model_size_mb = os.path.getsize(self.model_path) / float(1024 * 1024)
+        if self.model_size_mb > max_model_size_mb:
+            return "max model size exceeded, given %.3f MB" % self.model_size_mb
+
+        input_mean = None
+        input_std = None
+        if track == 'cfat':
+            pn_file = osp.join(self.model_dir, 'pixel_norm.txt')
+            if osp.exists(pn_file):
+                lines = open(pn_file, 'r').readlines()
+                if len(lines) != 2:
+                    return "pixel_norm.txt should contain 2 lines"
+                input_mean = float(lines[0])
+                input_std = float(lines[1])
+        if input_mean is not None or input_std is not None:
+            if input_mean is None or input_std is None:
+                return "please set input_mean and input_std simultaneously"
+        else:
+            find_sub = False
+            find_mul = False
+            for nid, node in enumerate(graph.node[:8]):
+                print(nid, node.name)
+                if node.name.startswith('Sub') or node.name.startswith('_minus'):
+                    find_sub = True
+                if node.name.startswith('Mul') or node.name.startswith(
+                        '_mul') or node.name.startswith('Div'):
+                    find_mul = True
+            if find_sub and find_mul:
+                print("find sub and mul")
+                # mxnet arcface model
+                input_mean = 0.0
+                input_std = 1.0
+            else:
+                input_mean = 127.5
+                input_std = 127.5
+        self.input_mean = input_mean
+        self.input_std = input_std
+        for initn in graph.initializer:
+            weight_array = numpy_helper.to_array(initn)
+            dt = weight_array.dtype
+            if dt.itemsize < 4:
+                return 'invalid weight type - (%s:%s)' % (initn.name, dt.name)
+        assert test_img is not None
+        test_img = cv2.resize(test_img, self.image_size)
+        feat, cost = self.benchmark(test_img)
+        batch_result = self.check_batch(test_img)
+        batch_result_sum = float(np.sum(batch_result))
+        if not np.isfinite(batch_result_sum):
+            print(batch_result)
+            print(batch_result_sum)
+            return "batch result output contains NaN or Inf!"
+
+        if len(feat.shape) < 2:
+            return "the feature must be 2-D, but got shape {}".format(
+                str(feat.shape))
+
+        if feat.shape[1] > max_feat_dim:
+            return "max feat dim exceeded, given %d" % feat.shape[1]
+        self.feat_dim = feat.shape[1]
+        cost_ms = cost * 1000
+        if cost_ms > max_time_cost:
+            return "max time cost exceeded, given %.4f" % cost_ms
+        self.cost_ms = cost_ms
+        print(
+            'check stat: model-size-mb: %.4f, feat-dim: %d, time-cost-ms: %.4f, input-mean: %.3f, input-std: %.3f'
+            % (self.model_size_mb, self.feat_dim, self.cost_ms,
+               self.input_mean, self.input_std))
+        return None
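+
+    # Run 32 copies of `img` as a single batch to verify that batched
+    # inference produces finite outputs (used by check() above).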
+    def check_batch(self, img):
+        if not isinstance(img, list):
+            imgs = [img, ] * 32
+        else:
+            imgs = img
+        if self.crop is not None:
+            nimgs = []
+            for img in imgs:
+                nimg = img[self.crop[1]:self.crop[3],
+                           self.crop[0]:self.crop[2], :]
+                if nimg.shape[0] != self.image_size[1] or nimg.shape[
+                        1] != self.image_size[0]:
+                    nimg = cv2.resize(nimg, self.image_size)
+                nimgs.append(nimg)
+            imgs = nimgs
+        blob = cv2.dnn.blobFromImages(
+            images=imgs,
+            scalefactor=1.0 / self.input_std,
+            size=self.image_size,
+            mean=(self.input_mean, self.input_mean, self.input_mean),
+            swapRB=True)
+        net_out = self.session.run(self.output_names,
+                                   {self.input_name: blob})[0]
+        return net_out
+
+    def meta_info(self):
+        return {
+            'model-size-mb': self.model_size_mb,
+            'feature-dim': self.feat_dim,
+            'infer': self.cost_ms
+        }
+
+    def forward(self, imgs):
+        if not isinstance(imgs, list):
+            imgs = [imgs]
+        input_size = self.image_size
+        if self.crop is not None:
+            nimgs = []
+            for img in imgs:
+                nimg = img[self.crop[1]:self.crop[3],
+                           self.crop[0]:self.crop[2], :]
+                if nimg.shape[0] != input_size[1] or nimg.shape[
+                        1] != input_size[0]:
+                    nimg = cv2.resize(nimg, input_size)
+                nimgs.append(nimg)
+            imgs = nimgs
+        blob = cv2.dnn.blobFromImages(
+            imgs,
+            1.0 / self.input_std,
+            input_size, (self.input_mean, self.input_mean, self.input_mean),
+            swapRB=True)
+        net_out = self.session.run(self.output_names,
+                                   {self.input_name: blob})[0]
+        return net_out
+
+    def benchmark(self, img):
+        input_size = self.image_size
+        if self.crop is not None:
+            nimg = img[self.crop[1]:self.crop[3], self.crop[0]:self.crop[2], :]
+            if nimg.shape[0] != input_size[1] or nimg.shape[
+                    1] != input_size[0]:
+                nimg = cv2.resize(nimg, input_size)
+            img = nimg
+        blob = cv2.dnn.blobFromImage(
+            img,
+            1.0 / self.input_std,
+            input_size, (self.input_mean, self.input_mean, self.input_mean),
+            swapRB=True)
+        costs = []
+        for _ in range(50):
+            ta = datetime.datetime.now()
+            net_out = self.session.run(self.output_names,
+                                       {self.input_name: blob})[0]
+            tb = datetime.datetime.now()
+            cost = (tb - ta).total_seconds()
+            costs.append(cost)
+        costs = sorted(costs)
+        cost = costs[5]
+        return net_out, cost
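+
+
+if __name__ == '__main__':
+    # Minimal usage sketch: validate an exported model (path assumed to be
+    # ./output/IResNet50.onnx, as in eval_ijbc.sh) and print its meta info.
+    # Note: check() enforces a per-image latency budget, so a GPU provider
+    # (onnxruntime-gpu) is normally needed for it to pass.
+    m = ArcFaceORT(model_path='./output/IResNet50.onnx')
+    err = m.check(test_img=np.zeros((112, 112, 3), dtype=np.uint8))
+    if err is not None:
+        sys.exit(err)
+    print(m.meta_info())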
diff --git a/task/recognition/face/onnx_ijbc.py b/task/recognition/face/onnx_ijbc.py
new file mode 100644
index 0000000000000..163b10c983493
--- /dev/null
+++ b/task/recognition/face/onnx_ijbc.py
@@ -0,0 +1,313 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# code modified from: https://github.com/deepinsight/insightface/blob/master/recognition/arcface_torch/onnx_ijbc.py
+
+import argparse
+import os
+import pickle
+import timeit
+
+import cv2
+import numpy as np
+import pandas as pd
+import prettytable
+import skimage.transform
+import paddle
+from sklearn.metrics import roc_curve
+from sklearn.preprocessing import normalize
+from onnx_helper import ArcFaceORT
+
+SRC = np.array(
+    [[30.2946, 51.6963], [65.5318, 51.5014], [48.0252, 71.7366],
+     [33.5493, 92.3655], [62.7299, 92.2041]],
+    dtype=np.float32)
+SRC[:, 0] += 8.0
+
+
+class AlignedDataSet(paddle.io.Dataset):
+    def __init__(self, root, lines, align=True):
+        self.lines = lines
+        self.root = root
+        self.align = align
+
+    def __len__(self):
+        return len(self.lines)
+
+    def __getitem__(self, idx):
+        each_line = self.lines[idx]
+        name_lmk_score = each_line.strip().split(' ')
+        name = os.path.join(self.root, name_lmk_score[0])
+        img = cv2.cvtColor(cv2.imread(name), cv2.COLOR_BGR2RGB)
+        landmark5 = np.array(
+            [float(x) for x in name_lmk_score[1:-1]],
+            dtype=np.float32).reshape((5, 2))
+        st = skimage.transform.SimilarityTransform()
+        st.estimate(landmark5, SRC)
+        img = cv2.warpAffine(
+            img, st.params[0:2, :], (112, 112), borderValue=0.0)
+        img_1 = np.expand_dims(img, 0)
+        img_2 = np.expand_dims(np.fliplr(img), 0)
+        output = np.concatenate((img_1, img_2), axis=0).astype(np.float32)
+        output = np.transpose(output, (0, 3, 1, 2))
+        return paddle.to_tensor(output)
+
+
+@paddle.no_grad()
+def extract(model_root, dataset):
+    model = ArcFaceORT(model_path=model_root)
+    test_img = np.zeros((112, 112, 3), dtype=np.uint8)
+    status = model.check(test_img=test_img)
+    if status is not None:
+        print(status)
+        exit(-1)
+    feat_mat = np.zeros(shape=(len(dataset), 2 * model.feat_dim))
+
+    def collate_fn(data):
+        return paddle.concat(data, axis=0)
+
+    data_loader = paddle.io.DataLoader(
+        dataset,
+        batch_size=128,
+        drop_last=False,
+        num_workers=4,
+        collate_fn=collate_fn)
+    num_iter = 0
+    for batch in data_loader:
+        batch = batch.numpy()
+        batch = (batch - model.input_mean) / model.input_std
+        feat = model.session.run(model.output_names,
+                                 {model.input_name: batch})[0]
+        feat = np.reshape(feat, (-1, model.feat_dim * 2))
+        feat_mat[128 * num_iter:128 * num_iter + feat.shape[0], :] = feat
+        num_iter += 1
+        if num_iter % 50 == 0:
+            print(num_iter)
+    return feat_mat
+
+
+def read_template_media_list(path):
+    ijb_meta = pd.read_csv(path, sep=' ', header=None).values
+    templates = ijb_meta[:, 1].astype(int)
+    medias = ijb_meta[:, 2].astype(int)
+    return templates, medias
+
+
+def read_template_pair_list(path):
+    pairs = pd.read_csv(path, sep=' ', header=None).values
+    t1 = pairs[:, 0].astype(int)
+    t2 = pairs[:, 1].astype(int)
+    label = pairs[:, 2].astype(int)
+    return t1, t2, label
+
+
+def read_image_feature(path):
+    with open(path, 'rb') as fid:
+        img_feats = pickle.load(fid)
+    return img_feats
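+
+
+# Aggregate per-image features into per-template features: features from the
+# same media (video) are averaged first, then media-level features are summed
+# within each template and L2-normalized.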
+def image2template_feature(img_feats=None, templates=None, medias=None):
+    unique_templates = np.unique(templates)
+    template_feats = np.zeros((len(unique_templates), img_feats.shape[1]))
+    for count_template, uqt in enumerate(unique_templates):
+        (ind_t, ) = np.where(templates == uqt)
+        face_norm_feats = img_feats[ind_t]
+        face_medias = medias[ind_t]
+        unique_medias, unique_media_counts = np.unique(
+            face_medias, return_counts=True)
+        media_norm_feats = []
+        for u, ct in zip(unique_medias, unique_media_counts):
+            (ind_m, ) = np.where(face_medias == u)
+            if ct == 1:
+                media_norm_feats += [face_norm_feats[ind_m]]
+            else:  # image features from the same video are aggregated into one feature
+                media_norm_feats += [
+                    np.mean(
+                        face_norm_feats[ind_m], axis=0, keepdims=True),
+                ]
+        media_norm_feats = np.array(media_norm_feats)
+        template_feats[count_template] = np.sum(media_norm_feats, axis=0)
+        if count_template % 2000 == 0:
+            print('Finish Calculating {} template features.'.format(
+                count_template))
+    template_norm_feats = normalize(template_feats)
+    return template_norm_feats, unique_templates
+
+
+def verification(template_norm_feats=None,
+                 unique_templates=None,
+                 p1=None,
+                 p2=None):
+    template2id = np.zeros((max(unique_templates) + 1, 1), dtype=int)
+    for count_template, uqt in enumerate(unique_templates):
+        template2id[uqt] = count_template
+    score = np.zeros((len(p1), ))
+    total_pairs = np.array(range(len(p1)))
+    batchsize = 100000
+    sublists = [
+        total_pairs[i:i + batchsize] for i in range(0, len(p1), batchsize)
+    ]
+    total_sublists = len(sublists)
+    for c, s in enumerate(sublists):
+        feat1 = template_norm_feats[template2id[p1[s]]]
+        feat2 = template_norm_feats[template2id[p2[s]]]
+        similarity_score = np.sum(feat1 * feat2, -1)
+        score[s] = similarity_score.flatten()
+        if c % 10 == 0:
+            print('Finish {}/{} pairs.'.format(c, total_sublists))
+    return score
+
+
+def verification2(template_norm_feats=None,
+                  unique_templates=None,
+                  p1=None,
+                  p2=None):
+    template2id = np.zeros((max(unique_templates) + 1, 1), dtype=int)
+    for count_template, uqt in enumerate(unique_templates):
+        template2id[uqt] = count_template
+    score = np.zeros((len(p1), ))  # save cosine distance between pairs
+    total_pairs = np.array(range(len(p1)))
+    batchsize = 100000  # small batchsize instead of all pairs in one batch due to the memory limitation
+    sublists = [
+        total_pairs[i:i + batchsize] for i in range(0, len(p1), batchsize)
+    ]
+    total_sublists = len(sublists)
+    for c, s in enumerate(sublists):
+        feat1 = template_norm_feats[template2id[p1[s]]]
+        feat2 = template_norm_feats[template2id[p2[s]]]
+        similarity_score = np.sum(feat1 * feat2, -1)
+        score[s] = similarity_score.flatten()
+        if c % 10 == 0:
+            print('Finish {}/{} pairs.'.format(c, total_sublists))
+    return score
+
+
+def main(args):
+    use_norm_score = True  # if True, TestMode(N1)
+    use_detector_score = True  # if True, TestMode(D1)
+    use_flip_test = True  # if True, TestMode(F1)
+    assert args.target == 'IJBC' or args.target == 'IJBB'
+
+    start = timeit.default_timer()
+    templates, medias = read_template_media_list(
+        os.path.join('%s/meta' % args.image_path, '%s_face_tid_mid.txt' %
+                     args.target.lower()))
+    stop = timeit.default_timer()
+    print('Time: %.2f s. ' % (stop - start))
+
+    start = timeit.default_timer()
+    p1, p2, label = read_template_pair_list(
+        os.path.join('%s/meta' % args.image_path, '%s_template_pair_label.txt'
+                     % args.target.lower()))
+    stop = timeit.default_timer()
+    print('Time: %.2f s. 
' % (stop - start)) + + start = timeit.default_timer() + img_path = '%s/loose_crop' % args.image_path + img_list_path = '%s/meta/%s_name_5pts_score.txt' % (args.image_path, + args.target.lower()) + img_list = open(img_list_path) + files = img_list.readlines() + dataset = AlignedDataSet(root=img_path, lines=files, align=True) + img_feats = extract(args.model_root, dataset) + + faceness_scores = [] + for each_line in files: + name_lmk_score = each_line.split() + faceness_scores.append(name_lmk_score[-1]) + faceness_scores = np.array(faceness_scores).astype(np.float32) + stop = timeit.default_timer() + print('Time: %.2f s. ' % (stop - start)) + print('Feature Shape: ({} , {}) .'.format(img_feats.shape[0], + img_feats.shape[1])) + start = timeit.default_timer() + + if use_flip_test: + img_input_feats = img_feats[:, 0:img_feats.shape[1] // + 2] + img_feats[:, img_feats.shape[1] // 2:] + else: + img_input_feats = img_feats[:, 0:img_feats.shape[1] // 2] + + if use_norm_score: + img_input_feats = img_input_feats + else: + img_input_feats = img_input_feats / np.sqrt( + np.sum(img_input_feats**2, -1, keepdims=True)) + + if use_detector_score: + print(img_input_feats.shape, faceness_scores.shape) + img_input_feats = img_input_feats * faceness_scores[:, np.newaxis] + else: + img_input_feats = img_input_feats + + template_norm_feats, unique_templates = image2template_feature( + img_input_feats, templates, medias) + stop = timeit.default_timer() + print('Time: %.2f s. ' % (stop - start)) + + start = timeit.default_timer() + score = verification(template_norm_feats, unique_templates, p1, p2) + stop = timeit.default_timer() + print('Time: %.2f s. ' % (stop - start)) + result_dir = args.result_dir + + save_path = os.path.join(result_dir, "{}_result".format(args.target)) + if not os.path.exists(save_path): + os.makedirs(save_path) + score_save_file = os.path.join(save_path, "{}.npy".format(args.target)) + np.save(score_save_file, score) + print(f'Save the result to {score_save_file}') + files = [score_save_file] + methods = [] + scores = [] + for file in files: + methods.append(os.path.basename(file)) + scores.append(np.load(file)) + methods = np.array(methods) + scores = dict(zip(methods, scores)) + x_labels = [10**-6, 10**-5, 10**-4, 10**-3, 10**-2, 10**-1] + tpr_fpr_table = prettytable.PrettyTable(['Methods'] + + [str(x) for x in x_labels]) + for method in methods: + fpr, tpr, _ = roc_curve(label, scores[method]) + fpr = np.flipud(fpr) + tpr = np.flipud(tpr) + tpr_fpr_row = [] + tpr_fpr_row.append("%s-%s" % (method, args.target)) + for fpr_iter in np.arange(len(x_labels)): + _, min_index = min( + list(zip(abs(fpr - x_labels[fpr_iter]), range(len(fpr))))) + tpr_fpr_row.append('%.2f' % (tpr[min_index] * 100)) + tpr_fpr_table.add_row(tpr_fpr_row) + print(tpr_fpr_table) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='do ijb test') + # general + parser.add_argument('--model-root', default='', help='path to load model.') + parser.add_argument( + '--image-path', + default='/train_tmp/IJB_release/IJBC', + type=str, + help='') + parser.add_argument( + '--result-dir', default='./output', help='path to save the results.') + parser.add_argument( + '--target', + default='IJBC', + type=str, + help='target, set to IJBC or IJBB') + main(parser.parse_args()) diff --git a/task/recognition/face/train.sh b/task/recognition/face/train.sh new file mode 100644 index 0000000000000..8527be9a8fb72 --- /dev/null +++ b/task/recognition/face/train.sh @@ -0,0 +1,33 @@ +# Copyright (c) 2022 PaddlePaddle 
Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# for single card training +# CUDA_VISIBLE_DEVICES=0 +# plsc-train -c ./configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml + +# for multi-node and multi-cards training +# export PADDLE_NNODES=2 +# export PADDLE_MASTER="192.168.210.1:12538" +# export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 + +# for single-node and multi-cards training +export PADDLE_NNODES=1 +export PADDLE_MASTER="127.0.0.1:12538" +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python -m paddle.distributed.launch \ + --nnodes=$PADDLE_NNODES \ + --master=$PADDLE_MASTER \ + --devices=$CUDA_VISIBLE_DEVICES \ + plsc-train \ + -c ./configs/IResNet50_MS1MV3_ArcFace_pfc10_1n8c_dp_mp_fp16o1.yaml