
Commit

fix eva02 pretrain and coco_clip md5
nemonameless authored Oct 13, 2023
2 parents 0be6376 + 084bb90 commit 3dff0ec
Showing 9 changed files with 38 additions and 39 deletions.
17 changes: 8 additions & 9 deletions paddlemix/datasets/coco_clip.py
@@ -14,7 +14,7 @@
import collections
import json
import os
-from paddle.dataset.common import md5file

from paddle.utils.download import get_path_from_url

from paddlemix.utils.env import DATA_HOME
@@ -28,27 +28,26 @@
class CaptionCLIP(DatasetBuilder):

URL = "https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar"
-META_INFO = collections.namedtuple(
-    "META_INFO", ("images", "annotations", "images_md5", "annotations_md5"))
-MD5 = "e670ce82b14b3f45d08c9370808ee1e7"
+META_INFO = collections.namedtuple("META_INFO", ("images", "annotations", "images_md5", "annotations_md5"))
+MD5 = ""
SPLITS = {
"train": META_INFO(
os.path.join("coco", "images"),
os.path.join("coco", "annotations/coco_karpathy_train.json"),
"",
"aa31ac474cf6250ebb81d18348a07ed8",
"",
),
"val": META_INFO(
os.path.join("coco", "images"),
os.path.join("coco", "annotations/coco_karpathy_val.json"),
"",
"b273847456ef5580e33713b1f7de52a0",
"",
),
"test": META_INFO(
os.path.join("coco", "images"),
os.path.join("coco", "annotations/coco_karpathy_test.json"),
"",
"3ff34b0ef2db02d01c37399f6a2a6cd1",
"",
),
}

@@ -57,8 +56,8 @@ def _get_data(self, mode, **kwargs):
images, annotations, image_hash, anno_hash = self.SPLITS[mode]
image_fullname = os.path.join(DATA_HOME, images)
anno_fullname = os.path.join(DATA_HOME, annotations)
-if (not os.path.exists(image_fullname) or not os.path.exists(anno_fullname) or not md5file(anno_fullname) == anno_hash):
-    get_path_from_url(self.URL, DATA_HOME, self.MD5)
+if not os.path.exists(image_fullname) or not os.path.exists(anno_fullname):
+    get_path_from_url(self.URL, DATA_HOME)

return image_fullname, anno_fullname, mode
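The net effect of the two hunks above: the annotations MD5 check is dropped, and the archive is re-downloaded only when the expected paths are missing. A minimal sketch of the resulting behavior, assuming `get_path_from_url`'s usual Paddle signature `(url, root_dir, md5sum=None)` (the sketch itself is not repo code):

```python
# Sketch only: how the dataset fetch behaves after this commit.
import os
from paddle.utils.download import get_path_from_url  # md5sum argument is optional

DATA_HOME = "/root/.paddlemix/datasets"  # default PaddleMIX data root
URL = "https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar"

image_dir = os.path.join(DATA_HOME, "coco", "images")
anno_file = os.path.join(DATA_HOME, "coco", "annotations/coco_karpathy_train.json")

# Old behavior: also hashed anno_file and re-downloaded on any MD5 mismatch.
# New behavior: existence check only; no checksum is verified.
if not os.path.exists(image_dir) or not os.path.exists(anno_file):
    get_path_from_url(URL, DATA_HOME)  # downloads and extracts coco.tar
```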

8 changes: 4 additions & 4 deletions paddlemix/examples/clip/README.md
@@ -45,9 +45,9 @@ export $PATH=$PATH:$INSTALL_DIR

1) COCO data

-For data, the `coco_karpathy` dataset is used by default; it needs no extra configuration and is downloaded automatically. See the `coco_clip.py` file for the parsing logic.
+For data, the `coco_karpathy` dataset is used by default; it needs no extra configuration and is downloaded automatically. See the `paddlemix/datasets/coco_clip.py` file for the parsing logic.

-To download manually, click [DownLoadCoCo](https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar)
+To download manually, click [DownLoadCOCO 20G](https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar) to fetch the data, then extract it into the `/root/.paddlemix/datasets/` directory, which is also where automatic download and extraction place it.

2) Custom data

@@ -63,13 +63,13 @@ export $PATH=$PATH:$INSTALL_DIR

### 4.1 Training

-Training uses the `paddlemix/examples/clip/run_pretrain_dist.py` program.
+Training uses the `paddlemix/examples/clip/run_pretrain_dist.py` program. **Check the dataset path before training**; the COCO dataset is by default extracted into the `/root/.paddlemix/datasets/coco` directory.

Example training command and parameter configuration:

This example uses a single machine with 8 GPUs and sharding_degree=8.

Note: if a distributed strategy is used, the parallelism degrees satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`, where `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`.
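A quick worked check of that identity for this example's single-node, 8-GPU setup (illustrative arithmetic only; the variable names mirror the trainer flags but this is not repo code):

```python
# Verify nnodes * nproc_per_node == tp * sharding * dp for this example.
nnodes, nproc_per_node = 1, 8        # one machine, 8 GPUs
tensor_parallel_degree = 1
sharding_parallel_degree = 8         # sharding_degree=8 as stated above

world_size = nnodes * nproc_per_node
assert world_size % (tensor_parallel_degree * sharding_parallel_degree) == 0
dp_parallel_degree = world_size // (tensor_parallel_degree * sharding_parallel_degree)
print(dp_parallel_degree)  # -> 1, so 1 * 8 == 1 * 8 * 1 holds
```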

```
MODEL_NAME="paddlemix/CLIP/Vit-L-14/"
```
8 changes: 4 additions & 4 deletions paddlemix/examples/coca/README.md
@@ -46,9 +46,9 @@ export $PATH=$PATH:$INSTALL_DIR

1) COCO data

-For data, the `coco_karpathy` dataset is used by default; it needs no extra configuration and is downloaded automatically. See the `coco_clip.py` file for the parsing logic.
+For data, the `coco_karpathy` dataset is used by default; it needs no extra configuration and is downloaded automatically. See the `paddlemix/datasets/coco_clip.py` file for the parsing logic.

-To download manually, click [DownLoadCoCo](https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar)
+To download manually, click [DownLoadCOCO 20G](https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar) to fetch the data, then extract it into the `/root/.paddlemix/datasets/` directory, which is also where automatic download and extraction place it.

2) Custom data

@@ -64,13 +64,13 @@ export $PATH=$PATH:$INSTALL_DIR

### 4.1 Training

-Training uses the `paddlemix/examples/coca/run_pretrain_dist.py` program.
+Training uses the `paddlemix/examples/coca/run_pretrain_dist.py` program. **Check the dataset path before training**; the COCO dataset is by default extracted into the `/root/.paddlemix/datasets/coco` directory.

Example training command and parameter configuration:

This example uses a single machine with 8 GPUs and sharding_degree=8.

Note: if a distributed strategy is used, the parallelism degrees satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`, where `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`.

```
MODEL_NAME="paddlemix/CoCa/coca_Vit-L-14/"
```
22 changes: 11 additions & 11 deletions paddlemix/examples/eva02/README.md
@@ -116,9 +116,9 @@ export $PATH=$PATH:$INSTALL_DIR
Note:

1. If a distributed strategy is used, the parallelism degrees satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`, where `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`
-2. `model_name` can be used on its own to create the model; to change the teacher, edit the `teacher_config` field in the config.json and model_config.json under `paddlemix/EVA/EVA02/eva02_Ti_for_pretrain`, e.g. change the default `paddlemix/EVA/EVA01-CLIP-g-14` to "paddlemix/EVA/EVA02-CLIP-bigE-14". `student_config` is a dict, and the student model itself is trained from scratch;
-3. If `model_name=None`, the model can also be created from `teacher_name` and `student_name`, but each of them must have its own config.json and model_state.pdparams; the `model_name=None` form is generally used for eval or for debugging with full weights loaded;
-4. `TEA_PRETRAIN_CKPT` is normally set to None, since the corresponding teacher pretrained weights from `teacher_name` are already loaded before training. **However, if MP_DEGREE > 1**, you must set the `TEA_PRETRAIN_CKPT` path again to load them, usually as an absolute path; you can also download the corresponding `model_state.pdparams` separately from the matching download link and place it;
+2. Setting a specific `model_name` creates the model directly, but to change the teacher you must edit the `teacher_config` field in the config.json and model_config.json under `paddlemix/EVA/EVA02/eva02_Ti_for_pretrain`, e.g. change the default `paddlemix/EVA/EVA01-CLIP-g-14` to `paddlemix/EVA/EVA02-CLIP-bigE-14` (see the sketch after this list). The `student_config` field is a dict, and the student model itself is trained from scratch;
+3. If `model_name=None` is set, the model can instead be created from specific `teacher_name` and `student_name` values, **but each of them must have its own config.json and model_state.pdparams**; the `model_name=None` form is usually used **for eval or for debugging with full weights loaded**;
+4. `TEA_PRETRAIN_CKPT` is normally set to None; the corresponding teacher pretrained weights from `teacher_name` are loaded automatically before training. **However, if MP_DEGREE > 1**, you must set the `TEA_PRETRAIN_CKPT` path again to load them, usually as an absolute path; you can also download the corresponding `model_state.pdparams` separately from the matching download link and place it;
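As referenced in note 2, a minimal sketch of swapping the teacher, assuming the two config files are plain JSON with a top-level `teacher_config` key (the script itself is hypothetical, not part of the repo):

```python
# Sketch only: rewrite "teacher_config" in both config files, per note 2.
import json

MODEL_DIR = "paddlemix/EVA/EVA02/eva02_Ti_for_pretrain"
NEW_TEACHER = "paddlemix/EVA/EVA02-CLIP-bigE-14"  # default is EVA01-CLIP-g-14

for fname in ("config.json", "model_config.json"):
    path = f"{MODEL_DIR}/{fname}"
    with open(path, "r", encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["teacher_config"] = NEW_TEACHER
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2, ensure_ascii=False)
```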


Example training command and parameter configuration; this example uses a single machine with 8 GPUs:
@@ -175,7 +175,7 @@ FP16_OPT_LEVEL="O1"
enable_tensorboard=True

TRAINING_PYTHON="python -m paddle.distributed.launch --master ${MASTER} --nnodes ${TRAINERS_NUM} --nproc_per_node ${TRAINING_GPUS_PER_NODE} --ips ${TRAINER_INSTANCES}"
-${TRAINING_PYTHON} paddlemix/examples/eva02/run_eva02_pretrain_dist.py \
+${TRAINING_PYTHON} run_eva02_pretrain_dist.py \
--do_train \
--data_path ${DATA_PATH}/train \
--model ${model_name} \
@@ -244,12 +244,12 @@ STU_PRETRAIN_CKPT=None

1. If a distributed strategy is used, the parallelism degrees satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`, where `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`

-2. If you train `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14`, you must load **its corresponding pretrained weights** `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14`; set the absolute path of the pretrained `model_state.pdparams`, or download it separately from [this link](https://bj.bcebos.com/v1/paddlenlp/models/community/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14/model_state.pdparams) and place it
+2. tiny/s models train at 336 resolution and B/L at 448, while their pretrained weights were all obtained at 224 resolution

-3. tiny/s models train at 336 resolution and B/L at 448, while their pretrained weights were all obtained at 224 resolution
+3. If you train `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14`, you must load **its corresponding pretrained weights** `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14`: set `PRETRAIN_CKPT`, i.e. the absolute path of the pretrained `model_state.pdparams`; or download it separately from [the model's download link](https://bj.bcebos.com/v1/paddlenlp/models/community/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14/model_state.pdparams) and place it. The same applies to the other models


-Example training command and parameter configuration; this example uses a single machine with 8 GPUs:
+Example training command and parameter configuration; this example uses a single machine with 8 GPUs. Before running, **make sure the pretrained-weight path, i.e. `PRETRAIN_CKPT`, exists**.
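A tiny pre-flight check for that requirement (a sketch, not part of the repo scripts; the path is the example value used below):

```python
# Sketch only: fail fast if the required pretrained weights (note 3 above)
# are missing before launching distributed training.
import os

PRETRAIN_CKPT = "/root/.paddlenlp/models/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14/model_state.pdparams"
if not os.path.isfile(PRETRAIN_CKPT):
    raise FileNotFoundError(
        f"PRETRAIN_CKPT not found: {PRETRAIN_CKPT}; "
        "download model_state.pdparams from the link in note 3 first"
    )
```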
```shell
export FLAGS_embedding_deterministic=1
export FLAGS_cudnn_deterministic=1
@@ -282,8 +282,8 @@ MP_DEGREE=1 # tensor_parallel_degree
SHARDING_DEGREE=1 # sharding_parallel_degree

MODEL_NAME="paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14"
-PRETRAIN_CKPT=/root/.paddlenlp/models/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14/model_state.pdparams # must be added, pretrained model, input_size is 224
# wget https://bj.bcebos.com/v1/paddlenlp/models/community/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14/model_state.pdparams
+PRETRAIN_CKPT=/root/.paddlenlp/models/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14/model_state.pdparams # must be added, pretrained model, input_size is 224

OUTPUT_DIR=./output/eva02_Ti_pt_in21k_ft_in1k_p14

@@ -301,7 +301,7 @@ FP16_OPT_LEVEL="O1"
enable_tensorboard=True

TRAINING_PYTHON="python -m paddle.distributed.launch --master ${MASTER} --nnodes ${TRAINERS_NUM} --nproc_per_node ${TRAINING_GPUS_PER_NODE} --ips ${TRAINER_INSTANCES}"
-${TRAINING_PYTHON} paddlemix/examples/eva02/run_eva02_finetune_dist.py \
+${TRAINING_PYTHON} run_eva02_finetune_dist.py \
--do_train \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
@@ -349,7 +349,7 @@ ${TRAINING_PYTHON} run_eva02_finetune_dist.py \

Note:

-1. By default, the already-trained weights downloaded into `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14` are loaded, so PRETRAIN_CKPT=None. **For weights newly trained locally**, set PRETRAIN_CKPT to their specific path to load and evaluate them
+1. By default, the already-trained weights downloaded into `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14` are loaded, so `PRETRAIN_CKPT=None`. **For weights newly trained locally**, set `PRETRAIN_CKPT` to their specific path to load and evaluate them


```shell
@@ -363,7 +363,7 @@ num_workers=10

PRETRAIN_CKPT=None # output/eva02_Ti_pt_in21k_ft_in1k_p14/checkpoint-xxx/model_state.pdparams

-CUDA_VISIBLE_DEVICES=0 python paddlemix/examples/eva02/run_eva02_finetune_eval.py \
+CUDA_VISIBLE_DEVICES=0 python run_eva02_finetune_eval.py \
--do_eval \
--model ${MODEL_NAME} \
--pretrained_model_path ${PRETRAIN_CKPT} \
```
8 changes: 4 additions & 4 deletions paddlemix/examples/evaclip/README.md
@@ -71,9 +71,9 @@ export $PATH=$PATH:$INSTALL_DIR

1) COCO data

-For data, the `coco_karpathy` dataset is used by default; it needs no extra configuration and is downloaded automatically. See the `coco_clip.py` file for the parsing logic.
+For data, the `coco_karpathy` dataset is used by default; it needs no extra configuration and is downloaded automatically. See the `paddlemix/datasets/coco_clip.py` file for the parsing logic.

-To download manually, click [DownLoadCoCo](https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar)
+To download manually, click [DownLoadCOCO 20G](https://bj.bcebos.com/v1/paddlenlp/datasets/paddlemix/coco.tar) to fetch the data, then extract it into the `/root/.paddlemix/datasets/` directory, which is also where automatic download and extraction place it.

2) Custom data

@@ -89,13 +89,13 @@ export $PATH=$PATH:$INSTALL_DIR

### 4.1 Training

-Training uses the `paddlemix/examples/evaclip/run_pretrain_dist.py` program.
+Training uses the `paddlemix/examples/evaclip/run_pretrain_dist.py` program. **Check the dataset path before training**; the COCO dataset is by default extracted into the `/root/.paddlemix/datasets/coco` directory.

Example training command and parameter configuration:

This example uses a single machine with 8 GPUs and sharding_degree=8.

Note: if a distributed strategy is used, the parallelism degrees satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`, where `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`.

```
MODEL_NAME="paddlemix/EVA/EVA02-CLIP-L-14"
```
6 changes: 3 additions & 3 deletions paddlemix/models/clip/clip_model.py
@@ -24,11 +24,11 @@
from typing import Union

import numpy as np

-from paddlemix.models.model_utils import MixPretrainedModel
from paddlenlp.transformers.configuration_utils import PretrainedConfig
from paddlenlp.utils.log import logger

+from paddlemix.models.model_utils import MixPretrainedModel

from .loss import ClipLoss
from .text_model import TextTransformer, TextTransformerConfig
from .vit_model import VisionTransformer, VisionTransformerConfig
@@ -146,7 +146,7 @@ def from_pretrained(
pretrained_vismodel_name_or_path=None,
pretrained_textmodel_name_or_path=None,
from_hf_hub: bool = False,
-subfolder: str = None,
+subfolder: str = "",
*args,
**kwargs,
):
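A plausible motivation for the `None` → `""` default here and in the two files below (an assumption; the call sites are not shown in this diff): path joins accept an empty subfolder as a no-op but raise on `None`:

```python
# Sketch only: "" is safe in path joins, while None raises a TypeError.
import os

print(os.path.join("ckpts", "", "model_state.pdparams"))   # ckpts/model_state.pdparams
# os.path.join("ckpts", None, "model_state.pdparams")      # TypeError: expected str, not NoneType
```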
4 changes: 2 additions & 2 deletions paddlemix/models/clip/eva_clip_model.py
@@ -24,10 +24,10 @@
from typing import Union

import numpy as np
+from paddlenlp.transformers.configuration_utils import PretrainedConfig

from paddlemix.models.model_utils import MixPretrainedModel
from paddlemix.utils.log import logger
-from paddlenlp.transformers.configuration_utils import PretrainedConfig

from .loss import ClipLoss
from .text_model import TextTransformer, TextTransformerConfig
@@ -145,7 +145,7 @@ def from_pretrained(
pretrained_vismodel_name_or_path=None,
pretrained_textmodel_name_or_path=None,
from_hf_hub: bool = False,
-subfolder: str = None,
+subfolder: str = "",
*args,
**kwargs,
):
2 changes: 1 addition & 1 deletion paddlemix/models/eva02/modeling_pretrain.py
@@ -459,7 +459,7 @@ def from_pretrained(
pretrained_teacher_name_or_path=None,
pretrained_student_name_or_path=None,
from_hf_hub: bool = False,
-subfolder: str = None,
+subfolder: str = "",
*args,
**kwargs,
):
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,5 +1,5 @@
numpy
-paddlenlp>=2.6.0rc0
+paddlenlp>=2.6.1
tensorboardX
opencv-python
Pillow

