Enhancement: Remember processed files #5

Closed
Torkiliuz opened this issue Mar 17, 2016 · 38 comments

@Torkiliuz

Could you make it so that imgult remembers processed files, similarly to how rsync does? Mainly this is useful so that if the computer crashes during processing imgult can skip already processed files.

@ryanpcmcquen
Owner

In some ways it already does. Most of the tools used by imgult blaze right through files they have already processed/optimized.

Try running it on the same file twice. The second run should be considerably faster.

@Torkiliuz
Author

At least in my testing, with a library of thumbnails and posters for 26 TB worth of movies/TV shows, my server has a hard time surviving all the spawned processes, even when I change the nice level to 19 😉 That's why it would be nice if imgult created an imgult-processed.txt file, so that it could diff imgult-files.txt against it and pick up where it left off after the server crashes. Just a thought though 😉
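
The resume idea described above could look roughly like the following. This is a minimal sketch, not imgult's actual code: the optimize_one function and the sample paths are hypothetical, and only the two list file names come from the comment above.

```shell
#!/bin/sh
# Sketch only: optimize_one and the sample paths are hypothetical stand-ins.
workdir=$(mktemp -d)
files_list="$workdir/imgult-files.txt"
processed_list="$workdir/imgult-processed.txt"

# Three files to handle; pretend a crashed run had already finished ./a.png.
printf '%s\n' ./a.png ./b.png ./c.png > "$files_list"
printf '%s\n' ./a.png > "$processed_list"

optimize_one() {
  # Placeholder for the real chain of optimizers run on "$1".
  :
}

handled=""
while IFS= read -r file; do
  # -F fixed string, -x whole line, -q quiet: true if already recorded.
  if grep -Fxq -- "$file" "$processed_list"; then
    continue
  fi
  # Record the file only after every tool has finished with it.
  optimize_one "$file" && printf '%s\n' "$file" >> "$processed_list"
  handled="$handled $file"
done < "$files_list"

processed_count=$(grep -c '' "$processed_list")
echo "Handled this run:$handled"
echo "Entries in processed list: $processed_count"
rm -rf "$workdir"
```

Appending to the processed list after each file, rather than once at the end, is what lets a crashed run pick up where it stopped. Checking the list once per file is slow for very large libraries, so filtering the whole list up front is probably the better fit at this scale.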

@ryanpcmcquen
Owner

I like what you're saying. It will be a bit tricky because several tools process the file (not just one), but I may have an idea.

I am curious though, what kind of server and what version of everything are you running?

jpegoptim
mozjpeg
optipng
pngquant
gifsicle
exiv2
svgo

@Torkiliuz
Author

Linux 3.19.0-56-generic #62~14.04.1-Ubuntu x86_64 GNU/Linux

jpegoptim v1.3.0 x86_64-pc-linux-gnu
mozjpeg version 3.1 (build 20150904)
OptiPNG 0.6.4: Advanced PNG optimizer.
pngquant 2.0.1 (September 2013)
LCDF Gifsicle 1.78
exiv2 0.23 001700 (64 bit build)
svgo 0.6.2

@ryanpcmcquen
Owner

Will you give this a run?

WARNING: THIS VERSION IS UNTESTED, IT MAY EXPLODE.

https://github.com/ryanpcmcquen/image-ultimator/blob/diffProcessedFiles/imgult

@Torkiliuz
Author

It runs and completes, but the second run still seems to go through all of the files again.

@Torkiliuz
Author

You're doing the grep, but not sending its output anywhere; maybe that is the problem? I think you need a third file to write the grep output to, something like imgult-notprocessedfiles.txt.
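
A minimal sketch of that suggestion, assuming one path per line in each list. The actual grep line in imgult is not shown in this thread, so the flags here are an assumption; the variable names are the ones that appear later in the thread, and the sample paths and the temp file name are made up.

```shell
#!/bin/sh
# Sketch only: sample data and file locations are hypothetical.
workdir=$(mktemp -d)
IMGULT_FILES_LIST="$workdir/imgult-files.txt"
IMGULT_PROCESSED_FILES_LIST="$workdir/imgult-processed.txt"
IMGULT_TEMP_FILES_LIST="$workdir/imgult-temp.txt"

printf '%s\n' ./kodim01.png ./kodim02.png ./kodim03.png > "$IMGULT_FILES_LIST"
printf '%s\n' ./kodim02.png > "$IMGULT_PROCESSED_FILES_LIST"

# -f: read patterns from the processed list
# -F: treat them as fixed strings, -x: match whole lines only
# -v: keep the lines that do NOT match, i.e. files still to process
grep -Fvxf "$IMGULT_PROCESSED_FILES_LIST" "$IMGULT_FILES_LIST" > "$IMGULT_TEMP_FILES_LIST"

remaining=$(cat "$IMGULT_TEMP_FILES_LIST")
printf '%s\n' "$remaining"
rm -rf "$workdir"
```

Note that grep exits with status 1 when nothing is left to process, which matters if the calling script runs under set -e.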

@ryanpcmcquen
Owner

Try this one:

https://github.com/ryanpcmcquen/image-ultimator/blob/diffProcessedFiles/imgult

@Torkiliuz
Author

The grep takes a while, but it works 👍
I think an echo with "matching already processed files" or something could be a nice addition, but I'm just really happy you took the time to make this work!

@ryanpcmcquen
Owner

Great idea! Thanks so much for testing and the suggestion. :^)

@ryanpcmcquen
Owner

Would you mind if I mentioned your use case in the README for the 4.0.00 release? 26TB is quite the testing ground.

@ryanpcmcquen ryanpcmcquen added this to the 4.0.00 milestone Mar 17, 2016
@ryanpcmcquen
Owner

Also, will you give it one final run?

https://github.com/ryanpcmcquen/image-ultimator/blob/diffProcessedFiles/imgult

I did a little cleanup, with your go-ahead I will release this as the new version. 😺

@Torkiliuz
Author

I'd be honored 😄 To clarify, the image-files are far smaller than 26 TB in size, but it's posters, albumart etc. for 26 TB of media. The size of the posters and albumart is roughly 140 GB 😉

@ryanpcmcquen
Owner

Still, that's quite the testing ground. I am surprised imgult actually parses through all that. And 140GB is a huge amount of images. How many times do you have to run imgult to successfully complete all that?

@Torkiliuz
Author

Tested this in the same folder as earlier, but noticed this now:
OpenSans-BoldItalic-webfont.svg: The file contains data of an unknown image type
I have svgo installed, and it should work in theory, but I'm not certain, as it is a "font SVG". PNGs go through normally, as before.
I also get this:
spinner.gif: Writing to GIF images is not supported
This is probably just because it's an animated GIF?

It seems good to go right now.
I have run imgult about 10 times now with the old version, but it was kind of pointless, since it would start at the beginning each time. With this version I should at least be able to get all images optimized after 10 runs or so 😄

@ryanpcmcquen
Owner

That's amazing! The GIF message is actually from exiv2. It doesn't currently support GIFs, but I bet it will in the future, so I just process them for now (the warning is innocuous).

You may want to bring up that specific file with the svgo people. It would probably be helpful for them, and they may have an idea what is going on. I would like to know as well. 😃

@ryanpcmcquen
Owner

Let me know if there is anything else I can do here, and thanks again for the report!

The release is live!

https://github.com/ryanpcmcquen/image-ultimator/releases

@Torkiliuz
Author

Figured out a faster way to do the comparison, by using comm.
From my testing it seemed to work, although it's getting hard to tell by now 😜
nice -n15 comm -13 --nocheck-order ${IMGULT_PROCESSED_FILES_LIST} ${IMGULT_FILES_LIST} > ${IMGULT_TEMP_FILES_LIST}

@ryanpcmcquen
Owner

How much faster is it?

@Torkiliuz
Author

If it still works as it should, which is now hard to determine, it runs through the list so fast it doesn't even show up in htop. I might have made a typo, so I'll try to get a better test setup, so that I can be sure that it works. I get this at the end, but it still seems to run through the files:
/usr/local/bin/imgultnew: 0: /usr/local/bin/imgultnew: Cannot fork
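
A "Cannot fork" message means the shell could not create a new process, which fits the earlier report of the server struggling with all the spawned processes. One possible mitigation, which imgult is not confirmed to use, is to cap how many workers run at once. The sketch below assumes an xargs that supports the -0 and -P extensions (GNU and BSD versions do); the worker command, the limit of 2, and the file names are all hypothetical.

```shell
#!/bin/sh
# Sketch only: the worker just logs its argument; the limit of 2 is arbitrary.
workdir=$(mktemp -d)
todo_list="$workdir/todo.txt"
done_log="$workdir/done.txt"
printf '%s\n' ./a.png ./b.png ./c.png ./d.png > "$todo_list"
: > "$done_log"

# tr turns newlines into NULs so paths containing spaces survive.
# -n 1: one path per worker; -P 2: at most two workers at a time.
# Each worker runs under nice; "$1" is the log, "$2" is the path from xargs.
tr '\n' '\0' < "$todo_list" |
  xargs -0 -n 1 -P 2 nice -n 19 sh -c 'printf "%s\n" "$2" >> "$1"' worker "$done_log"

done_count=$(grep -c '' "$done_log")
done_sorted=$(sort "$done_log" | tr '\n' ' ')
echo "Workers finished: $done_count"
rm -rf "$workdir"
```

The order in which workers finish is not deterministic, so the log is sorted before being inspected.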

@Torkiliuz
Author

Tested it now, it does not work, it even stopped working on the first run 😞
I didn't see it work correctly earlier either; it just looked like it worked. Really sorry about that. Tested now with the Kodak PNG images, but they are still the same size...

@ryanpcmcquen
Owner

The new version works though, correct?

@Torkiliuz
Author

No, it seems to not work, now that I finally got a stable test-folder 😢

The only output I get now:

File 1/1: ./kodim20.png
File 1/1: ./kodim23.png
File 1/1: ./kodim15.png
File 1/1: ./kodim17.png
File 1/1: ./kodim16.png
File 1/1: ./kodim01.png
File 1/1: ./kodim14.png
File 1/1: ./kodim02.png
File 1/1: ./kodim05.png
File 1/1: ./kodim22.png
File 1/1: ./kodim07.png
File 1/1: ./kodim21.png
File 1/1: ./kodim13.png
File 1/1: ./kodim24.png
File 1/1: ./kodim19.png
File 1/1: ./kodim03.png
File 1/1: ./kodim12.png
File 1/1: ./kodim06.png
File 1/1: ./kodim11.png
File 1/1: ./kodim08.png
File 1/1: ./kodim09.png
File 1/1: ./kodim18.png
File 1/1: ./kodim10.png
File 1/1: ./kodim04.png

****************************************************************************** 
                 ___           ___           ___           ___       ___      
     ___        /\__\         /\  \         /\__\         /\__\     /\  \     
    /\  \      /::|  |       /::\  \       /:/  /        /:/  /     \:\  \    
    \:\  \    /:|:|  |      /:/\:\  \     /:/  /        /:/  /       \:\  \   
    /::\__\  /:/|:|__|__   /:/  \:\  \   /:/  /  ___   /:/  /        /::\  \  
 __/:/\/__/ /:/ |::::\__\ /:/__/_\:\__\ /:/__/  /\__\ /:/__/        /:/\:\__\ 
/\/:/  /    \/__/--/:/  / \:\  /\ \/__/ \:\  \ /:/  / \:\  \       /:/  \/__/ 
\::/__/           /:/  /   \:\ \:\__\    \:\  /:/  /   \:\  \     /:/  /      
 \:\__\          /:/  /     \:\/:/  /     \:\/:/  /     \:\  \    \/__/       
  \/__/         /:/  /       \::/  /       \::/  /       \:\__\               
                \/__/         \/__/         \/__/         \/__/               

****************************************************************************** 

* Execute parametric cleaning sequence: * 
removed ‘imgult-files.txt’

* The imgult has completed. Take care. * 
******************************************************************************

@ryanpcmcquen ryanpcmcquen removed this from the 4.0.00 milestone Mar 17, 2016
@ryanpcmcquen
Owner

Would you try the master? I borked something, should work now. Also, if this does work we can test comm again. 👍

@Torkiliuz
Author

The hotfix worked. I tried changing grep to comm, but that did not work 😄

@ryanpcmcquen
Owner

How about this version? (with comm):

http://sprunge.us/JIZU

@Torkiliuz
Author

I have tested comm by creating two text files listing some shared and some different entries, and comm shows the result correctly.
But for some reason it doesn't work with this version 😞
I'll do a manual test and write paths into the text files; I'm pretty sure that's where the problem occurs.

@Torkiliuz
Author

When I do the manual steps it works, so for some reason it stops working when run from a script. Weird 😕

@ryanpcmcquen
Owner

@Torkiliuz
Author

Weird, it runs, but on the second run it doesn't skip any of the files 😕

@Torkiliuz
Author

comm: file 1 is not in sorted order
comm: file 2 is not in sorted order

I think there might be a sort step that needs to run first for it to work.
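
comm expects both inputs to be sorted in the same collation order, which is what the two warnings above are about. A minimal sketch of sorting first, using the variable names from this thread with made-up sample data and file locations; it is not the code from any imgult branch. Sorting both lists also removes the need for --nocheck-order, which is a GNU extension.

```shell
#!/bin/sh
# Sketch only: sample data and file locations are hypothetical.
workdir=$(mktemp -d)
IMGULT_FILES_LIST="$workdir/imgult-files.txt"
IMGULT_PROCESSED_FILES_LIST="$workdir/imgult-processed.txt"
IMGULT_TEMP_FILES_LIST="$workdir/imgult-temp.txt"

# Deliberately unsorted, like the output of a directory walk.
printf '%s\n' ./kodim20.png ./kodim03.png ./kodim15.png > "$IMGULT_FILES_LIST"
printf '%s\n' ./kodim20.png ./kodim15.png > "$IMGULT_PROCESSED_FILES_LIST"

# sort and comm must agree on collation, so pin the locale for both.
LC_ALL=C
export LC_ALL

# sort -o may safely name one of its own input files as the output.
sort -o "$IMGULT_FILES_LIST" "$IMGULT_FILES_LIST"
sort -o "$IMGULT_PROCESSED_FILES_LIST" "$IMGULT_PROCESSED_FILES_LIST"

# -1 drops lines unique to the processed list, -3 drops lines in both,
# leaving only lines unique to the full files list.
comm -13 "$IMGULT_PROCESSED_FILES_LIST" "$IMGULT_FILES_LIST" > "$IMGULT_TEMP_FILES_LIST"

remaining=$(cat "$IMGULT_TEMP_FILES_LIST")
printf 'Still to process: %s\n' "$remaining"
rm -rf "$workdir"
```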

@ryanpcmcquen
Owner

Strange. It skips all the files here. What arguments are you sending to comm? Keep in mind that on non-Linux systems comm does not have the --nocheck-order option, which may be a showstopper for using comm.

@ryanpcmcquen ryanpcmcquen added this to the 4.0.00 milestone Mar 17, 2016
@Torkiliuz
Author

It's just running comm -13 ${IMGULT_PROCESSED_FILES_LIST} ${IMGULT_FILES_LIST} > ${IMGULT_TEMP_FILES_LIST}

But it ends up with this:

comm: file 1 is not in sorted order
comm: file 2 is not in sorted order
File 1/1: ./kodim15.png
File 1/1: ./kodim11.png
File 1/1: ./kodim10.png
File 1/1: ./kodim13.png
File 1/1: ./kodim05.png
File 1/1: ./kodim19.png
File 1/1: ./kodim14.png
File 1/1: ./kodim07.png
File 1/1: ./kodim08.png
File 1/1: ./kodim04.png
File 1/1: ./kodim12.png
File 1/1: ./kodim20.png
File 1/1: ./kodim06.png
File 1/1: ./kodim09.png
./kodim14.png:
./kodim08.png:
./kodim13.png:
./kodim05.png:
./kodim12.png:
./kodim07.png:
./kodim06.png:
./kodim20.png:
./kodim15.png:
./kodim11.png:
./kodim19.png:
./kodim09.png:
./kodim04.png:
./kodim10.png:

There might be a way around it by sorting the lists first, which would also solve --nocheck-order not being available on other systems. Not sure how much more I can test today, it's getting late. Thanks so much for what you've done so far 😃

@ryanpcmcquen
Owner

By the time we write a sort function, we probably will not save any time over just using the grep one-liner we have now. If you do find it is still faster, I would be happy to incorporate the change.
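
One way to check this trade-off is to run both approaches on the same lists and confirm they agree before timing them. The sketch below uses 2000 generated paths as a stand-in for a real library; the grep flags and all file names are assumptions, not taken from imgult.

```shell
#!/bin/sh
# Sketch only: generated paths and file names are hypothetical.
workdir=$(mktemp -d)
all="$workdir/all.txt"
processed="$workdir/processed.txt"

# 2000 made-up paths; the first 1500 count as processed, stored in reverse order.
awk 'BEGIN { for (i = 1; i <= 2000; i++) printf "./img%05d.png\n", i }' > "$all"
sed -n '1,1500p' "$all" | sort -r > "$processed"

# Pin the locale so sort and comm use the same collation.
LC_ALL=C
export LC_ALL

# Approach 1: grep with the processed list as fixed, whole-line patterns.
grep -Fvxf "$processed" "$all" | sort > "$workdir/via-grep.txt"

# Approach 2: sort both lists, then comm.
sort "$all" > "$workdir/all.sorted"
sort "$processed" > "$workdir/processed.sorted"
comm -13 "$workdir/processed.sorted" "$workdir/all.sorted" > "$workdir/via-comm.txt"

if cmp -s "$workdir/via-grep.txt" "$workdir/via-comm.txt"; then same=yes; else same=no; fi
remaining_count=$(grep -c '' "$workdir/via-comm.txt")
echo "Same result: $same, files remaining: $remaining_count"
rm -rf "$workdir"
```

Prefixing each approach with time on a realistic list would show whether the cost of sorting cancels out whatever comm saves.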

I will keep the commTest branch open for now.

@Torkiliuz
Author

An Ubuntu 14.04 server set up as follows runs 4.0.01 on the whole drive without crashing:

jpegoptim v1.3.0 x86_64-pc-linux-gnu (from normal repositories)
mozjpeg version 3.1 (build 20150904) (from here)
OptiPNG 0.6.4: Advanced PNG optimizer. (from normal repositories)
pngquant 2.3.0 (July 2014) (from here)
LCDF Gifsicle 1.78 (from normal repositories)
exiv2 0.23 001700 (64 bit build) (from normal repositories)
svgo 0.6.2 (sudo npm install -g svgo)

@ryanpcmcquen
Owner

That's amazing!!! Thank you so much for your help. Does that mean we can close this issue?

@Torkiliuz
Author

Yes, the issue can be closed. It might be a good idea to write the versions you need as requirements though 😄 As it is now it's kind of confusing because the default versions in at least Ubuntu are not the best ones 😜
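
As a starting point for documenting requirements, a script could at least report which tools are missing. This is a minimal sketch: the command names are assumed to match the tool names listed in this thread, which may not hold everywhere (mozjpeg in particular may be installed under a different command name), and it checks presence only, not versions.

```shell
#!/bin/sh
# Sketch only: the command names are assumptions based on the tool list above.
missing_tools() {
  # Print each named command that cannot be found on PATH.
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools jpegoptim mozjpeg optipng pngquant gifsicle exiv2 svgo)
if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  printf '%s\n' "$missing"
else
  echo "All listed tools were found."
fi
```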

@ryanpcmcquen
Owner

That is a good idea! I use Slackware so I have more current versions of all the tools ... luckily when Ubuntu 16.04 gets released people will have much newer versions of everything by default.

@ryanpcmcquen ryanpcmcquen modified the milestones: 4.0.01, 4.0.00 Mar 18, 2016