-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for PdfReader with in-memory pdf #43
Conversation
Well, actually it can, I just misunderstood the interface. As far as I understand now you want the users to be able to "load" data by 2 means:
It works if you give it unicode strings (or a filelike object open in text mode, with 'latin-1' encoding). |
Actually, I am not sure you did misunderstand the interface. I haven't looked at your tests yet, but see if the code snippets I put in issue #38 make your tests work. If so, perhaps you can add that snippet to your PR and we'll add the test and the fixed code at the same time :-) Thanks, |
Here's my completed PR. We now have many ways to load data into a I added |
Bastien: I am very impressed with the way you have come up to speed on the codebase, not only getting it to do what you want, but also providing a well-coded and useful test. You have apparently developed an excellent understanding of how the system fits together in the face of very sparse documentation, and I would love for you to become a regular pdfrw contributor. But... I am not convinced that six is right for pdfrw. I suppose my thoughts echo those of Armin Ronacher where he advises to "skip six." Most of my reluctance to embrace six could be fairly characterized as aesthetic. For example, I have a visceral negative reaction to the thought of increasing the size of the codebase by over 25%, but I could probably overcome that if someone convinced me that the benefits were worthwhile. As you point out, there are benefits to using six that can be substantial for some codebases and coding styles. It is a de facto community standard, well-understood by people who maintain libraries that support 2 and 3, and the new hybrid Python variant it creates allows a far more flexible coding style than the one I have chosen for the bulk of pdfrw. But while six abstracts away some differences, it does not abstract away all of them. For example, you propose adding these lines to
This uses tools in six that help to manage the differences between two and three, but it does not actually hide those differences. In fact, these lines expose a lot of knowledge about both the differences between Python 2 and 3 and knowledge about six itself (e.g., what is The current version of If I stick with that decision to abstract away the 2 vs. 3 file handling differences in a separate module, then it seems to me that the value proposition of six is greatly reduced, and on the cost side, if we include six in the package as you have proposed, then we risk running into issues like this and if we add a dependency for six as an external module, well, that would be the first external dependency, and that's a pretty high hurdle to clear. So it seems unlikely you could convince me that six is the right answer for pdfrw as of now. But did I mention I love your test? :-) (as well as some of the other changes such as updating for 3.5) Seriously, I think your test along with the code snippet in issue #38 is what it takes to put that issue to bed, and I really appreciate it. If you update the PR to insure your test works with that code, that would be awesome, but if you're too busy, I'll understand -- I'm swamped at the moment as well, but I can probably get around to fixing this sometime next week. Thanks, |
No problem, I understand. To be totally honest I introduced it because I was kinda afraid with your In any case I'll make the changes you suggest. |
I'm certainly willing to consider refactoring, but can you describe what problems you see? Thanks, |
The try/except with |
Ah. I'm a big fan of duck typing :-) |
I guess the style there isn't really duck typing, but is similar to the EAFP used in duck typing. FWIW, I wasn't interested in porting to Python 3 until the features in 3.3 made it less painful. (I guess ignoring 3.0-3.2 also removes much of the impetus for six.) Also, AFAIK, most of my serious "customers" still run Python 2.7, because of the performance hit with Python 3 strings being bigger. I could fix most of the performance hit (by using bytes instead of strings), but that would make the code a lot uglier (with |
I'll rebase this one in a minute (following the merge of #44) |
Make read of in-memory PDF work with 3.x - This closes issue #38 with code discussed there plus a regression test. - Also add Python version 3.5 to regression tests - Also add OSX filetypes to .gitignore
PdfReader can't read in-memory pdf with Python 3 (at least Python 3.5). The added test should demonstrate it (Travis please!).