Dataset autodownload feature addition #685

Merged: 7 commits, Aug 10, 2020

@glenn-jocher (Member) commented Aug 9, 2020

Dataset autodownload branch initial commit.

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Refinement of dataset handling and download procedures for YOLOv5.

πŸ“Š Key Changes

  • Updated coco.yaml, coco128.yaml, and voc.yaml dataset configurations to include optional download commands.
  • Removed the get_coco2017.sh script and replaced it with get_coco.sh in a new data/scripts directory.
  • Moved the PASCAL VOC download script get_voc.sh to the same data/scripts directory and updated it.
  • Deleted specific download commands and added a general check to confirm dataset existence or trigger an autodownload if necessary.
  • Minor refactoring in test.py and train.py to call the new dataset check function and handle download if needed.
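The existence check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `check_dataset_sketch`, the injected `download` callable, and the exact keys read from the config (`train`, `val`, `download`) are assumptions made for the example.

```python
import os
from typing import Callable, Mapping


def check_dataset_sketch(cfg: Mapping, download: Callable[[str], bool]) -> bool:
    """Return True if every dataset path exists, attempting a download first if not."""
    # Collect the train/val entries; each may be a single path or a list of paths.
    paths = []
    for key in ("train", "val"):
        entry = cfg.get(key)
        if entry:
            paths += entry if isinstance(entry, list) else [entry]

    missing = [p for p in paths if not os.path.exists(p)]
    if not missing:
        return True  # dataset already present, nothing to do

    source = cfg.get("download")  # optional entry: a zip URL or a shell command
    if not source:
        raise FileNotFoundError(f"Dataset not found, missing paths: {missing}")

    print(f"Dataset not found, missing paths: {missing}")
    print(f"Attempting autodownload from: {source}")
    ok = download(source)  # hypothetical downloader supplied by the caller
    print("Dataset autodownload", "success" if ok else "failure")
    return ok
```

Passing the downloader in as an argument keeps the check itself free of network or shell access, which makes it easy to exercise in a unit test.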

🎯 Purpose & Impact

  • Improves user experience by streamlining dataset acquisition: a missing dataset can now be downloaded automatically from the download entry in its yaml when training or testing starts.
  • Reduces ambiguity by centralizing dataset download scripts in a single scripts directory.
  • Enhances code maintainability and reduces code duplication by using a single function to check for dataset presence and initiate download.
  • Potential impact includes fewer setup errors and automated handling of dataset availability, so users can start training models sooner. 🚀

@glenn-jocher self-assigned this Aug 9, 2020
@glenn-jocher added the enhancement (New feature or request) and TODO (High priority items) labels Aug 9, 2020
@glenn-jocher changed the title from “initial commit” to “Dataset autodownload feature addition” Aug 9, 2020
@glenn-jocher
Member Author

Hi Glenn, one thing I saw was that this branch didn’t incorporate my fix on “MissingAttribute” stride.

Ah! Ok it needs a rebase then. Maybe this will work: /rebase

@NanoCode012
Contributor

My default unit test passed! I didn’t use any auto download functionality though.

How should I use it to download coco128?

Would calling train auto download it if I don’t have the folder available?

@glenn-jocher
Member Author

@NanoCode012 yes, now you can start training without downloading a dataset first! In this screenshot you can see we clone the repo, install requirements.txt, and run train.py right away. Each of the 3 datasets now has download directions in its yaml (either a zipfile URL, or a bash command).

If it all works correctly you should see something like this:
[Screenshot: Screen Shot 2020-08-09 at 8 10 33 PM]
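To illustrate the two kinds of download directions mentioned in this comment (a zipfile URL or a bash command), here is a rough sketch of how such an entry could be dispatched. It is an assumption-laden example, not the repository's implementation: the helper names `is_zip_url` and `fetch_dataset`, the `dest` parameter, and the use of `urllib` and `zipfile` are all choices made for illustration.

```python
import subprocess
import urllib.request
import zipfile
from pathlib import Path


def is_zip_url(source: str) -> bool:
    """True when the download entry looks like a zip-file URL rather than a command."""
    return source.startswith(("http://", "https://")) and source.endswith(".zip")


def fetch_dataset(source: str, dest: str = ".") -> bool:
    """Fetch a dataset from a zip URL, or by running a shell command. Returns success."""
    if is_zip_url(source):
        archive = Path(source).name  # last URL component, e.g. coco128.zip
        urllib.request.urlretrieve(source, archive)  # download to the working directory
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)  # dest is a hypothetical extraction directory
        Path(archive).unlink()  # remove the archive once extracted
        return True
    # Anything else is treated as a shell command, e.g. "bash data/scripts/get_voc.sh".
    return subprocess.run(source, shell=True).returncode == 0
```

A yaml entry holding the coco128 zip URL would take the first branch, while an entry that invokes one of the download scripts would take the second.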

@NanoCode012
Contributor

WARNING: Dataset not found, nonexistant paths: ['/coco128/images/train2017', '/coco128/images/train2017']
Attempting autodownload from: https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip
100%|███████████████████████████████████████| 21.1M/21.1M [01:56<00:00, 189kB/s]
Dataset autodownload success

It works! Very simple and nice! However, I see that you output “train2017” directory twice. Is there a reason? Is it meant to be “val2017”?

@glenn-jocher
Member Author

Ah, yes this is because coco128 trains and tests on the same 128 images. It's really meant as a sanity check to ensure your setup converges before trying larger datasets.
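A small sketch of why the same directory shows up twice in the warning: if the train and val entries of a config point to the same folder, any list built from both entries repeats it. The dictionary below is a hypothetical stand-in for the coco128 config, and `dataset_paths` is an illustrative helper, not a function from the repository.

```python
# Hypothetical stand-in for the coco128 config: train and val share one directory.
cfg = {
    "train": "coco128/images/train2017",
    "val": "coco128/images/train2017",
}


def dataset_paths(cfg, unique=False):
    """List the train/val paths from a config, optionally dropping repeats."""
    paths = [cfg[k] for k in ("train", "val") if k in cfg]
    # dict.fromkeys keeps the first occurrence of each path and preserves order.
    return list(dict.fromkeys(paths)) if unique else paths


print(dataset_paths(cfg))               # the shared directory is listed twice
print(dataset_paths(cfg, unique=True))  # the shared directory is listed once
```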

@glenn-jocher merged commit 41523e2 into master Aug 10, 2020
@glenn-jocher deleted the data-autodownload branch August 10, 2020 03:52
@glenn-jocher removed the TODO (High priority items) label Aug 10, 2020
burglarhobbit pushed a commit to burglarhobbit/yolov5 that referenced this pull request Jan 1, 2021
* initial commit

* move download scripts into data/scripts

* new check_dataset() function in general.py

* move check_dataset() out of with context

* Update general.py

* DDP update

* Update general.py
KMint1819 pushed a commit to KMint1819/yolov5 that referenced this pull request May 12, 2021
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022