Rewrite in Python 3.7 #7
Hm, I uploaded my key to pool.sks-keyservers.net so not too sure why it isn't being picked up. |
On Fri, Apr 03, 2020 at 11:22:27AM -0700, Jason Phan wrote:
So during me replacing `input()`, I found that even if the client exits due to an error, the server still runs.
For example, after the server receives a PDF file, it only sends the # of pages and RGB bitmaps to the client from then on. However, say an invalid # of pages is sent and the client exits with an error. The server will still continue to process pages into bitmaps and send them over to the client, even if the client closed its `sys.stdout` and `sys.stdin` (which I thought would raise an IOError on the server the next time it called `print()` or `sys.stdout.buffer.write()` to let it know the client died).
Is there some way to indicate to the server that the client died? Or maybe some way to end the qrexec-client-vm process if the client died?
If the server doesn't read any more data, there is no "nice" way to
tell it to terminate.
But killing qrexec-client-vm process should do the trick.
…--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
|
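A minimal sketch of the "kill the process" suggestion above, assuming the client starts the remote side with `subprocess.Popen`. The names `run_client` and `handle_output` are hypothetical and not taken from this PR; the command passed in would be the real `qrexec-client-vm` invocation.

```python
import subprocess
import sys


def run_client(cmd, handle_output):
    """Run `cmd`, hand its stdout to `handle_output`, kill it if the client fails."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        # handle_output is a hypothetical callback that parses the server's replies
        handle_output(proc.stdout)
    except Exception:
        # The client hit an error: do not leave the server side running
        proc.kill()
        raise
    finally:
        # Close our ends of the pipes and reap the child in every case
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
    return proc.returncode
```

With this shape, an invalid page count raised inside `handle_output` ends the child process instead of leaving it converting pages nobody will read.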
So what do you think of replacing |
It may make more sense to reverse calling order, like done here. In short: instead of calling |
Okay, I've set up a small test to see how that would work. It's basically a synchronous version of the link you sent using The problem I have now is that I could only get consistent communication between the server & client if the subprocess is unbuffered. I think it's fine performance-wise since we're just doing IPC and not file writes, but this means that the server needs to send each rgb file's size to stdout along with its contents, since the server may be faster and send the contents of 2 separate rgb files before the client calls Any objections to this? If a bad/wrong filesize is sent to the client, we'll end up with either an invalid rgb file or a partial one. Either way, we can verify that either visually or when we pass it to Oh also, if we go this way, can I just have |
I don't think that's necessary, but you may need to add
You're doing something wrong. You should know exactly how many bytes a page has and you should read exactly that many bytes (see argument to
Basic verification (data size in this case) should be done before data hit
Yes, one file less. |
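A sketch of the "read exactly that many bytes" point, assuming 8-bit RGB (3 bytes per pixel) and a binary stream whose `read()` may return fewer bytes than requested. The helper names `recv_exact` and `recv_rgb_page` are made up for illustration.

```python
import io


def recv_exact(stream, size):
    """Read exactly `size` bytes from a binary stream, or raise EOFError."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # Stream ended early: the page is truncated
            raise EOFError(f"expected {size} bytes, got {size - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_rgb_page(stream, width, height):
    """Read one page bitmap; the size follows from the dimensions already received."""
    # Assumption: 8 bits per channel, so 3 bytes per pixel
    return recv_exact(stream, width * height * 3)
```

Because the size is derived from the dimensions, no separate file size needs to be sent, and the next page's bytes stay in the stream untouched.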
Oh my god, how did I completely forget about the image dimensions the client gets lol.
Gotcha. |
I've done a more thorough review of the current code. Some of these issues would be fixed by the change discussed above, but some are independent.
I think it would make more sense to focus on one thing at a time - first the python rewrite keeping the old protocol and one-file limit. And only then add multi-file support, using one way or another.
Besides the inline comments: do not rename the directory to pdf-converter. It is the name of a Python module (see setup.py), and it cannot contain dashes.
That's my bad, not as familiar with Python packaging as I'd like to be. I can keep the files in a sub-directory though right? edit: Maybe I'll just call it |
You can simply add them into existing |
Done in 326e867. |
Almost Done!

Barring any changes/fixes you might suggest after going over the new changes, I really only have a few major things I want to get in:
I'll probably be done with both of these some time today. Other changes/features can wait until after the merge.

Points of Interest

Error handling saw a major improvement in 0449636 and fb7f609 (thank you to whoever thought of |
Drafting this PR for now. After working on the UX for a bit, I found some bugs that need fixing (mainly around error handling; some around the conversion process too). Besides, nice output and error messages for the user should really be a part of this PR anyway. |
Oops, didn't mean to re-request a review @marmarek, sorry about that. |
I don't count myself as experienced enough in python to review the code, however for the failed checks:
and
This syntax doesn't exist in Python 3.5, but in any case there is no reason for fedora-25 (dom0) to try to copy/install/check those files. This is related to 'setup.py', but I didn't look into its exact role. |
Thanks so much! I'm really awful at CI/testing stuff so bear with me if these are stupid questions.
I think I need to also get rid of the egg-info and qubespdfconverter lines? |
Yes. But consider adding pylint instead.
Rebase is the only way. And it is ok for pull request related branch.
Dom0 indeed doesn't require the actual pdf converter scripts. But the integration tests file (tests.py) should stay. There should be a way to do that with setup() arguments. You can leave it as is, I'll take care of this part. |
This commit also adds more robust argument parsing in anticipation of future options and filepath existence checks to avoid potentially wasteful qrexec-client-vm runs.
PNG tasks were being enqueued too quickly, leaving no time for RGB conversions or PNG deletions. This meant that the server would create PNGs for every single page of a PDF before any conversions started, which is clearly not ideal. After experimenting with limits on the number of PNGs created before forcing the PNG creation task to join on the queue, I found that a limit of 1 gave the best performance. Technically, it's a limit of 2 since we start a new task before we await the previous one. In any case, the server is quite a bit faster now and won't run out of space easily.
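The "start a new task before we await the previous one" pattern can be sketched roughly as follows. This is an illustration only, not the code from the commit: `convert_all`, `render`, and `convert` are hypothetical names, with `render` standing in for PNG creation and `convert` for the RGB step.

```python
import asyncio


async def convert_all(pages, render, convert):
    """Render page N+1 while page N is converted; at most two renders in flight."""
    results = []
    pending = None  # task rendering the previous page
    for page in pages:
        # Start rendering the current page first ...
        current = asyncio.create_task(render(page))
        # ... then wait for the previous render and convert its output
        if pending is not None:
            results.append(await convert(await pending))
        pending = current
    # Drain the last page
    if pending is not None:
        results.append(await convert(await pending))
    return results
```

Since a page is only rendered once the page before it has been picked up, the number of intermediate files on disk stays bounded regardless of the document's length.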
Sorry, did you mean
Does it? To update
Oops, I guess I deleted the removal code and forgot to put it back in. It's back in there now.
Did you run out of space on the server or client? The client should raise an IOError (on merges/saves) or a CalledProcessError (on rep conversions) if you run out of space. If it's not then that's a problem. As for the server, c00e7a1 should prevent the server from running out of space since it only has 1 or 2 images in /tmp at a time. Interestingly, when I tried to reproduce your error (with the new changes in place) by using a batch size of 500, I didn't even get to run out of space, the process just ended up getting killed by OOM lol.
c00e7a1 makes it so that there should only be 1-2 pdftocairo processes running at any time on the server. The server essentially starts up a conversion and then waits until the last one finishes before queueing the current one. idk why, but this gives waaay better performance than other solutions I've tried.
Changes:
Performance (excludes VM startup time):
Comments:
|
The former. That function simply doesn't exist in that version, the dict is built inline in format_meter.
Yes, that's a separate issue that is easy to solve like this:

    try:
        self.bar.reset(total=pagenums)
    except AttributeError:
        # tqdm older than 4.32 does not have reset(), open-code it here
        self.bar.last_print_n = self.bar.n = 0
        self.bar.last_print_t = self.bar.start_t = self.bar._time()
        self.bar.total = pagenums
        self.bar.refresh()

That's indeed the case with this solution.
I don't know tqdm enough, but perhaps the more naive method would work: changing bar_format to
In fact both. One because of missing /tmp cleanup on the client side, the other one because of too many parallel pdftocairo processes producing all the output at once. |
Ah, I see. I'll try it out.
I'll play with the bar some more and see what works.
If you have time, try out the new commits and see if it's any better. They should help.
Hmmmmm... You didn't see error logs at the end of the program like this?

    Sending files...
    foobar.txt...fail
    foobbar2.txt...done
    ERROR: foobar.txt: a very nice log message
    Total Sanitized Files: 1/2

If an exception's raised and caught, there should be an error like that (note: they all show up at once at the very end of the program). If that's not showing then something's up. |
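For illustration, the deferred error report described above could be produced along these lines. This is a sketch under assumptions: `sanitize_all` and the `sanitize` callback are hypothetical, and the caught exception types are simply the ones mentioned elsewhere in this thread.

```python
import subprocess


def sanitize_all(paths, sanitize):
    """Process each file, collect failures, and print all errors at the very end."""
    errors = []
    print("Sending files...")
    for path in paths:
        try:
            sanitize(path)  # hypothetical per-file conversion callback
            print(f"{path}...done")
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"{path}...fail")
            errors.append((path, exc))
    # Error logs are shown together once every file has been attempted
    for path, exc in errors:
        print(f"ERROR: {path}: {exc}")
    print(f"Total Sanitized Files: {len(paths) - len(errors)}/{len(paths)}")
    return not errors
```

An exception type missing from the `except` clause would escape the loop and skip the summary entirely, which is one way a missing error log can happen.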
No, it wasn't there. |
With recent commits, the missing error message is still an issue (but now I do get non-zero exit code correctly). How to test:
|
It looks like it was just an unhandled OSError from when we save initial representations. The error logs now show up nicely for me. I think all that's left is the bar stuff. |
Better :) So, now the only remaining issue is working with older tqdm (Debian buster). |
So, I installed the Debian 10 template (didn't have it before) and made an appvm off of it. Then I copied over the client program, installed It seems to run fine with |
This is the place where you cheated ;) |
Huh, is |
No, |
Uhhh not too sure what to do about the |
Don't worry about conflict, I'll handle it on merge. |
Woohoo! Python and multiple file support!
Note that this doesn't touch any of the GUI parts of the converter nor any of the testing infrastructure, just the CLI. I'm still a bit newer to the former two so I'll still be working on those.
Also there's still quite a bit of polishing left to be done on the CLI but I'll save that for the mailing list.