MAINT: Small refactoring after #788 #830

MartinThoma · 2022-04-27T11:52:00Z

This refactoring aims at making maintenance easier:

1. Overly long functions make it hard to grasp the overall behavior. Hence the `_get_xref_issues` function was split out.
2. `_get_xref_issues` is made a static method of the PdfFileReader to show that it belongs to the reader but doesn't require any of its attributes.
3. `_get_xref_issues` returns an integer instead of raising and catching exceptions. That seems easier to grasp to me. Also, catching exceptions is a tiny bit more expensive than just returning an int.
4. `_rebuild_xref_table` was moved to a method for the same reason.
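
The pattern described above (a static check that returns an integer code, with the caller deciding what to do) can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the class `ReaderSketch`, the specific checks, and the meaning of the codes 1 and 2 are assumptions made for the example.

```python
import io


class ReaderSketch:
    """Hypothetical stand-in for a PDF reader; not the PyPDF2 class."""

    @staticmethod
    def _get_xref_issues(stream, startxref):
        """Return 0 if the offset looks fine, otherwise a non-zero issue code.

        The codes are illustrative assumptions:
        1 = the byte before the offset is not whitespace
        2 = the "xref" keyword is not found at the offset
        """
        if startxref < 1:
            return 1
        stream.seek(startxref - 1)
        if stream.read(1) not in (b"\r", b"\n", b" ", b"\t"):
            return 1
        if stream.read(4) != b"xref":
            return 2
        return 0

    def read(self, stream, startxref):
        # The caller branches on the code; no exception is raised or caught.
        issue = self._get_xref_issues(stream, startxref)
        if issue != 0:
            return "rebuild (issue %d)" % issue
        return "ok"


data = b"%PDF-1.4\nxref\n0 1\n"
offset = data.index(b"xref")
print(ReaderSketch().read(io.BytesIO(data), offset))
print(ReaderSketch().read(io.BytesIO(data), offset + 1))
```

Because the check is a static method, it can be called and tested without constructing a reader, which is the point of item 2.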

codecov · 2022-04-27T11:54:30Z

Codecov Report

Merging #830 (10cca01) into main (904b0df) will increase coverage by 0.02%.
The diff coverage is 84.07%.

@@            Coverage Diff             @@
##             main     #830      +/-   ##
==========================================
+ Coverage   77.18%   77.20%   +0.02%     
==========================================
  Files          12       12              
  Lines        3532     3540       +8     
  Branches      830      830              
==========================================
+ Hits         2726     2733       +7     
- Misses        589      590       +1     
  Partials      217      217

Impacted Files	Coverage Δ
PyPDF2/pdf.py	`82.21% <84.07%> (+0.02%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

pubpub-zz · 2022-04-27T16:16:43Z

@MartinThoma
For your information, in my fork https://github.com/pubpub-zz/PyPPDF4.sav I split pdf.py into several files for readability and to share common code:

- pdfcommon.py for a base type that holds all the common parts between the reader and the writer
- pdfreader.py for the reader-specific parts (about 1000 lines)
- pdfwriter.py for all writing functions (adding bookmarks, ...)
- merger.py, which was quite small (most of the functions are in the writer)

This architecture requires a little rewriting (trailer/xref had to be rewritten) but should ease maintenance. What is your opinion?
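
To make the proposed layout concrete, here is a rough sketch of a shared base type with a reader and a writer built on top of it. All names (`PdfCommon`, `PdfReaderSketch`, `PdfWriterSketch`, `objects`) are hypothetical and chosen for illustration; this is not the code from the fork.

```python
class PdfCommon:
    """Hypothetical base type holding state shared by reader and writer."""

    def __init__(self):
        self.objects = {}  # object number -> parsed object

    def get_object(self, num):
        return self.objects[num]


class PdfReaderSketch(PdfCommon):
    """Reader-specific part: fills the shared object table from parsed input."""

    def load(self, parsed):
        self.objects.update(parsed)


class PdfWriterSketch(PdfCommon):
    """Writer-specific part: adds new objects to the shared object table."""

    def add_object(self, obj):
        num = len(self.objects) + 1
        self.objects[num] = obj
        return num
```

In such a layout each class could live in its own module, with the reader and writer modules importing the base type.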

Robustness (ROB):
- Handle missing destinations in reader (#840)
- warn-only in readStringFromStream (#837)
- Fix corruption in startxref or xref table (#788 and #830)

Documentation (DOC):
- Project Governance (#799)
- History of PyPDF2
- PDF feature/version support (#816)
- More details on text parsing issues (#815)

Developer Experience (DEV):
- Add benchmark command to Makefile
- Ignore IronPython parts for code coverage (#826)

Maintenance (MAINT):
- Split pdf module (#836)
- Separated CCITTFax param parsing/decoding (#841)
- Update requirements files

Testing (TST):
- Use external repository for larger/more PDFs for testing (#820)
- Swap incorrect test names (#838)
- Add test for PdfFileReader and page properties (#835)
- Add tests for PyPDF2.generic (#831)
- Add tests for utils, form fields, PageRange (#827)
- Add test for ASCII85Decode (#825)
- Add test for FlateDecode (#823)
- Add test for filters.ASCIIHexDecode (#822)

Code Style (STY):
- Apply pre-commit (black, isort) + use snake_case variables (#832)
- Remove debug code (#828)
- Documentation, Variable names (#839)

Full Changelog: 1.27.9...1.27.10

MAINT: Small refactoring after #788

46fd5dd

MartinThoma force-pushed the 788-refactoring branch from e4e67a9 to 46fd5dd Compare April 27, 2022 11:53

MartinThoma mentioned this pull request Apr 27, 2022

Fix #297 : fix corruption in startxref or xref table #788

Merged

MartinThoma added 2 commits April 27, 2022 14:07

Remove duplication

3c078ce

Split PdfFileReader.read up a bit more

10cca01

MartinThoma merged commit fd775d3 into main Apr 27, 2022

MartinThoma deleted the 788-refactoring branch April 27, 2022 15:44
