
How to train objects365 without auto download the dataset. #4658

Closed
nocolour opened this issue Sep 3, 2021 · 14 comments
Labels
question (Further information is requested), Stale (scheduled for closing soon)

Comments

@nocolour

nocolour commented Sep 3, 2021

Dear all,

Have a nice day.

I have succeeded in training on VisDrone, and now I would like to try Objects365, but I am having trouble downloading it.

My problem:
The Objects365 dataset is very large, so the download times out and never completes when running train.py --data Objects365.yaml.

To work around this, I am using a download manager to download the files manually, one by one.

My question:

  1. If I download the Objects365 dataset manually, where do I need to unzip it, and what folder structure is expected?
  2. If I run python train.py --data Objects365.yaml after downloading manually, do I need to disable the download in Objects365.yaml?
  3. How much free disk space is needed for Objects365?
  4. How many epochs are needed? Is 300 enough?
  5. Do I need to use --hyp hyp.finetune_objects365.yaml, or the default 'data/hyps/hyp.scratch.yaml'?

Thanks for your help.

@nocolour nocolour added the question label on Sep 3, 2021
@glenn-jocher
Member

@nocolour I haven't had enough opportunity to train on Objects365 to answer your questions well, but yes, if you download it yourself you should comment out the download field in the yaml so it doesn't try to download it again. You should place your data in the structure indicated in the yaml here. You can use either the default hyps or the Objects365 hyps you mentioned to train, though you should not need 300 epochs: that number comes from COCO, and Objects365 is much larger.

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/Objects365 # dataset root dir
train: images/train # train images (relative to 'path') 1742289 images
val: images/val # val images (relative to 'path') 5570 images
test: # test images (optional)
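
For reference, a directory layout matching that yaml would look roughly like the sketch below. The labels/ paths are an assumption based on the usual YOLOv5 convention of a labels/ tree mirroring images/:

../datasets/Objects365/
    images/
        train/   # training images (*.jpg)
        val/     # validation images (*.jpg)
    labels/
        train/   # YOLO-format *.txt labels, one file per image
        val/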

@nocolour
Author

nocolour commented Sep 6, 2021

Thank you for the reply.

@nocolour nocolour changed the title from "How to train object365 without auto download the dataset." to "How to train objects365 without auto download the dataset." on Sep 6, 2021
@nocolour
Author

nocolour commented Sep 7, 2021

@glenn-jocher
In the downloaded zip files I don't see any label text files; there is only the JSON file from zhiyuan_objv2_train.tar.gz.
Is zhiyuan_objv2_train.json the label file? Do I need to convert it, and if so, what do I need to do?

@glenn-jocher
Member

@nocolour autodownload handles all of the conversion; I would recommend you simply use that:

python train.py --data Objects365.yaml
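
For reference, the conversion performed during autodownload is roughly equivalent to the sketch below: it reads the COCO-format JSON with pycocotools and writes one normalized xywh *.txt label file per image. The output path and the category-id offset here are assumptions; the download script inside Objects365.yaml is the authoritative version.

# Minimal sketch, assuming pycocotools is installed and
# zhiyuan_objv2_train.json is in the working directory.
from pathlib import Path
from pycocotools.coco import COCO

coco = COCO('zhiyuan_objv2_train.json')                 # Objects365 train annotations (COCO format)
out_dir = Path('../datasets/Objects365/labels/train')   # assumed YOLO labels directory
out_dir.mkdir(parents=True, exist_ok=True)

for img_id in coco.getImgIds():
    img = coco.loadImgs([img_id])[0]
    w, h = img['width'], img['height']
    lines = []
    for ann in coco.loadAnns(coco.getAnnIds(imgIds=[img_id])):
        x, y, bw, bh = ann['bbox']                      # COCO bbox: top-left x, y, width, height
        cx, cy = x + bw / 2, y + bh / 2                 # YOLO expects the box center
        cls = ann['category_id'] - 1                    # assumed: Objects365 category ids start at 1
        lines.append(f'{cls} {cx / w:.6f} {cy / h:.6f} {bw / w:.6f} {bh / h:.6f}')
    name = Path(img['file_name']).stem + '.txt'
    (out_dir / name).write_text('\n'.join(lines))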

@nocolour
Author

nocolour commented Sep 7, 2021

Yes, I have already found a way to solve it, but I still need to test it first: I will set up a localhost web server to host the dataset I downloaded, and then change the download URLs in Objects365.yaml.
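
For example, a minimal way to do this (assuming Python 3.7+ and that the downloaded archives sit in ./objects365_downloads, a placeholder path; the port is arbitrary):

python -m http.server 8000 --directory ./objects365_downloads

Then the archive URLs in the download section of Objects365.yaml can be pointed at http://localhost:8000/ instead of the original hosts.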

However still thank you.

@glenn-jocher
Member

@nocolour oh good idea!

@github-actions
Contributor

github-actions bot commented Oct 8, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions github-actions bot added the Stale label on Oct 8, 2021
@wangsun1996

Can you provide a pretrained model on Objects365?

@wangsun1996

Thank you very much! Furthermore, can you provide a pretrained YOLOv5s6 or YOLOv5s model on Objects365?

@glenn-jocher
Member

glenn-jocher commented Nov 30, 2022 via email

@wangsun1996

OK, thank you very much!

@wangsun1996

I want to train a YOLOv5s6 on the Objects365 dataset to get a pretrained model, but it takes 12 hours to train one epoch. Is there any way to speed up training?

@glenn-jocher
Member

glenn-jocher commented Dec 3, 2022

👋 Hello! Thanks for asking about training speed issues. YOLOv5 🚀 can be trained on CPU (slowest), single-GPU, or multi-GPU (fastest). If you would like to increase your training speed some options are:

  • Increase --batch-size
  • Reduce --img-size
  • Reduce model size, i.e. from YOLOv5x -> YOLOv5l -> YOLOv5m -> YOLOv5s
  • Train with multi-GPU DDP at larger --batch-size
  • Train on cached data: python train.py --cache (RAM caching) or --cache disk (disk caching)
  • Train on faster GPUs, i.e.: P100 -> V100 -> A100
  • Train on free GPU backends with up to 16GB of CUDA memory (Google Colab, Kaggle)
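
As a concrete example, a multi-GPU DDP run with a larger batch size might look like the command below. The device ids, batch size, image size, and weights are placeholders for your setup; note that --cache (RAM caching) for 1.7M Objects365 images needs a very large amount of memory, so --cache disk is usually more realistic:

python -m torch.distributed.run --nproc_per_node 4 train.py --data Objects365.yaml --weights yolov5s6.pt --img 640 --batch-size 128 --device 0,1,2,3 --cache disk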

Good luck 🍀 and let us know if you have any other questions!
