[Feature] Add a tool to find broken files. #482

Ezra-Yu · 2021-10-09T12:11:57Z

Thanks for your contribution; we appreciate it a lot. Following the instructions below will keep your pull request healthy and make it easier to get feedback. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

After a dataset has been prepared, some of its files may be broken. This PR adds a tool to find all the broken files.

Modification

Add a verify_dataset.py tool under tools/misc/.

BC-breaking (Optional)

No.

Use cases (Optional)

python tools/misc/verify_dataset.py ${CONFIG_PATH} --num-process ${CPU_TO_USE} --phase ${PHASE} --out-path ${OUT}
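
To illustrate what the command above does conceptually, here is a minimal, stdlib-only sketch of checking many files in parallel and writing the failing paths to an output file. It is not the code of verify_dataset.py: the function names `is_readable`, `find_broken_files`, and `write_report` are hypothetical, the check only tests that a file exists and is non-empty, and a thread pool stands in for the worker processes that the tool drives through `mmcv.track_parallel_progress`.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def is_readable(path):
    """Hypothetical check: the file exists and is not empty.

    A real verifier would try to load the file with the dataset pipeline.
    """
    try:
        return Path(path).stat().st_size > 0
    except OSError:
        return False


def find_broken_files(paths, check=is_readable, num_workers=4):
    """Run `check` on every path in parallel and return the paths that fail."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        results = list(pool.map(check, paths))
    return [str(p) for p, ok in zip(paths, results) if not ok]


def write_report(broken, out_path):
    """Write one broken file path per line (the role --out-path plays)."""
    Path(out_path).write_text("".join(f"{p}\n" for p in broken))
```

Passing the check as a parameter keeps the parallel loop independent of how "broken" is defined, so a stricter check (for example, fully decoding each image) could be swapped in without touching the rest.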

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

If the modification may affect downstream or other related projects, this PR should be tested with those projects, such as MMDet or MMSeg.
CLA has been signed and all committers have signed the CLA in this PR.

codecov · 2021-10-09T12:17:05Z

Codecov Report

Merging #482 (60dc641) into master (6fba107) will increase coverage by 0.76%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #482      +/-   ##
==========================================
+ Coverage   78.01%   78.77%   +0.76%     
==========================================
  Files         101      103       +2     
  Lines        5617     5702      +85     
  Branches      923      927       +4     
==========================================
+ Hits         4382     4492     +110     
+ Misses       1108     1088      -20     
+ Partials      127      122       -5

Flag	Coverage Δ
unittests	`78.77% <ø> (+0.76%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
mmcls/datasets/pipelines/formating.py	`0.00% <0.00%> (-43.48%)`	⬇️
mmcls/models/utils/attention.py	`98.72% <0.00%> (-1.28%)`	⬇️
mmcls/datasets/builder.py	`42.55% <0.00%> (-1.20%)`	⬇️
mmcls/apis/train.py	`22.72% <0.00%> (ø)`
mmcls/apis/inference.py	`19.64% <0.00%> (ø)`
mmcls/models/backbones/vgg.py	`86.58% <0.00%> (ø)`
mmcls/models/backbones/resnet.py	`100.00% <0.00%> (ø)`
mmcls/models/backbones/__init__.py	`100.00% <0.00%> (ø)`
mmcls/models/backbones/res2net.py	`95.50% <0.00%> (ø)`
mmcls/datasets/pipelines/formatting.py	`43.47% <0.00%> (ø)`
... and 3 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6fba107...60dc641.

mzr1996

I pushed a commit with some changes; please check it. @Ezra-Yu

Wu-570 · 2022-06-08T12:56:35Z

I have used verify_dataset.py to check my datasets, and it reported 1885 broken files. My datasets are organized in the ImageNet layout, and all the files are jpg images. I wonder why they are reported as broken, and I want to know how to handle these 1885 files so that they pass verify_dataset.py.
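
One way to start investigating a question like this is to look at the raw bytes of the flagged files. The sketch below is only a rough diagnostic under stated assumptions, not the check that verify_dataset.py performs: the function name `diagnose_jpeg` is hypothetical, it inspects only the leading file signature and the trailing JPEG end-of-image marker, and a file that passes may still fail to decode for other reasons.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def diagnose_jpeg(path):
    """Return a short, heuristic description of a file named like a JPEG."""
    data = Path(path).read_bytes()
    if not data:
        return "empty file"
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            if fmt != "jpeg":
                # The extension says jpg but the content is another format.
                return f"not a JPEG (looks like {fmt})"
            break
    else:
        # No known signature matched.
        return "unknown format"
    # A complete JPEG stream ends with the end-of-image marker FF D9.
    # Some valid files carry extra trailing bytes, so this is only a hint.
    if not data.endswith(b"\xff\xd9"):
        return "possibly truncated (no end-of-image marker at end of file)"
    return "looks structurally complete"
```

Files reported as empty or truncated usually need to be copied or downloaded again, and files whose content is a different format can be re-encoded as JPEG or renamed; whether either applies to the files above cannot be told from this thread.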

* add verify dataset
* add phase
* rm attr of single_process
* Use `mmcv.track_parallel_progress` to track the validation.

Co-authored-by: mzr1996 <[email protected]>

Ezra-Yu added 2 commits September 30, 2021 14:35

add verify dataset

21f8ae0

add phase

814a019

Ezra-Yu and others added 2 commits October 12, 2021 16:33

rm attr of single_process

d40b23b

Use mmcv.track_parallel_progress to track the validation.

60dc641

mzr1996 approved these changes Oct 26, 2021

mzr1996 changed the title ~~Add tool to find broken files~~ [Feature] Add a tool to find broken files. Oct 27, 2021

mzr1996 merged commit 52e6256 into open-mmlab:master Oct 27, 2021

Ezra-Yu deleted the broken-files branch October 27, 2021 03:25
