This is the first version of the image sorter/pruner. It lets the user identify duplicate photos on a storage medium and prune them.
- [ ] Support all image formats.
- [ ] Support sorting based on date/alphabetically.
- [ ] Identify duplicate photos based on SHA1 hash.
- [ ] Either delete or move the duplicates to a separate directory.
- [ ] Don't traverse links, soft or hard.
- [ ] Minimum speed: 10 MB/s. With an average photo size of 500 kB, that is about 20 photos processed per second.
- [ ] Should also be able to identify near-duplicate (almost identical) photos. Investigate this.
- [ ] Should be able to index photos based on folder names.
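The link-handling and sorting requirements above could be approached roughly as follows. This is a minimal Python sketch under stated assumptions, not the tool's actual implementation: the names `collect_images`, `sort_images`, and `IMAGE_EXTENSIONS` are hypothetical, and the extension list is deliberately incomplete. Symlinks are skipped outright. A hard link cannot be told apart from the original file, so the sketch assumes that "don't traverse hard links" means visiting each (device, inode) pair only once.

```python
import os

# Hypothetical, incomplete extension list; "all image formats" would need more.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}


def collect_images(root):
    """Return image paths under root, skipping symlinks and repeated hard links."""
    seen_inodes = set()
    images = []
    # followlinks=False keeps os.walk from descending into symlinked directories.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # soft link to a file
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                continue
            info = os.stat(path)
            inode_key = (info.st_dev, info.st_ino)
            if inode_key in seen_inodes:
                continue  # another hard link to a file that is already listed
            seen_inodes.add(inode_key)
            images.append(path)
    return images


def sort_images(paths, by="name"):
    """Sort paths alphabetically by file name, or by modification time."""
    if by == "date":
        return sorted(paths, key=os.path.getmtime)
    return sorted(paths, key=lambda p: os.path.basename(p).lower())
```

Which of several hard-linked names is kept depends on traversal order; a real tool would probably want a documented rule for that.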
With easy access to mobile phones and cameras (and WhatsApp too :P), we end up with lots of photos on storage media such as hard disks, pen drives, etc. Often we copy these photos from multiple friends or from sources like WhatsApp/Facebook. While doing that, we generally just copy everything, and so end up with multiple copies of the same photo. It's impractical to scan the folders by hand for those duplicates. This utility tries to solve that problem.
- GUI.
- Image Processor.
- [X] Given a location, fetch the list of images with directory structure. [3/3]
- [X] Write a program to read the directory.
- [X] Modify it to read the images from that directory.
- [X] Create the hierarchy of the images.
- [-] Create a hashmap with duplicate images going to the same bucket. [1/2]
- [X] Traverse the hierarchy of images and create a hashmap such that [key, value] = [md5, list of image locations].
- [ ] Write this info to a file in json format.
- [ ] Make the image map generation multi-threaded.
- [ ] Present the list of duplicate images to the user one by one.
- [ ] Delete/move the duplicates as specified by the user.
- [ ] Add performance metrics to the tool.
- [ ] Add logging levels.
- [ ] Create directories and update the CMake build.
- [ ] Test with big dataset.
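The hash map, JSON output, and multi-threading tasks above can be sketched together. This is an illustrative Python sketch under stated assumptions, not the project's own code: the function names (`file_digest`, `build_image_map`, `duplicates_only`, `write_image_map`) and the JSON layout are hypothetical. The digest defaults to MD5, as in the task list; passing `algorithm="sha1"` would match the SHA1 requirement instead. Threads are used on the assumption that hashing is dominated by file I/O and by `hashlib`, which releases CPython's global interpreter lock while hashing larger buffers.

```python
import hashlib
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large images are never loaded whole."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_image_map(paths, algorithm="md5", workers=4):
    """Map each digest to the list of paths whose contents produce it."""
    paths = list(paths)
    image_map = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so bucket contents are stable.
        digests = pool.map(lambda p: file_digest(p, algorithm), paths)
        for path, digest in zip(paths, digests):
            image_map[digest].append(path)
    return dict(image_map)


def duplicates_only(image_map):
    """Keep only the buckets that hold more than one path."""
    return {digest: ps for digest, ps in image_map.items() if len(ps) > 1}


def write_image_map(image_map, out_path):
    """Write the digest -> paths mapping as JSON (hypothetical layout)."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(image_map, handle, indent=2, sort_keys=True)
```

Each bucket in `duplicates_only(...)` is one group of byte-identical files, which is what the "present duplicates one by one" and "delete/move" steps would iterate over.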
Test it with different image workloads, such as:
- Folder with only images.
- Folder with images and folders of images.
- Folder with no images.
- Folder with links to images (soft/hard).
- For each of the above tests, record the timings.
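One possible way to set up these workloads and record timings is sketched below in Python. Everything here is an assumption made for illustration: the fixture folder names, the helpers `make_workloads` and `time_workload`, and the stand-in `count_files` function, which only counts directory entries and would be replaced by the tool's real scan. The fixture files contain placeholder bytes rather than real images, and creating symlinks may require extra privileges on some platforms.

```python
import os
import tempfile
import time


def count_files(root):
    """Stand-in workload: count non-directory entries without following links."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        total += len(filenames)
    return total


def time_workload(func, root, repeats=3):
    """Return (result, best wall-clock seconds) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(root)
        best = min(best, time.perf_counter() - start)
    return result, best


def _write(path, data=b"placeholder bytes, not a real image"):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)


def make_workloads(base):
    """Create one small fixture folder per workload and return {name: path}."""
    workloads = {name: os.path.join(base, name)
                 for name in ("only_images", "nested", "no_images", "with_links")}
    for i in range(3):
        _write(os.path.join(workloads["only_images"], f"img{i}.jpg"))
    _write(os.path.join(workloads["nested"], "top.jpg"))
    for i in range(2):
        _write(os.path.join(workloads["nested"], "sub", f"img{i}.jpg"))
    _write(os.path.join(workloads["no_images"], "notes.txt"))
    real = os.path.join(workloads["with_links"], "real.jpg")
    _write(real)
    os.symlink(real, os.path.join(workloads["with_links"], "soft.jpg"))
    os.link(real, os.path.join(workloads["with_links"], "hard.jpg"))
    return workloads


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        for name, root in make_workloads(base).items():
            count, seconds = time_workload(count_files, root)
            print(f"{name}: {count} entries, best of 3 = {seconds:.6f} s")
```

Taking the best of several runs reduces noise from disk caching; for the big-dataset test, the first (cold-cache) run may be the more relevant number.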