From 21b52947c5de048af9111fe7a0484eb50e12027c Mon Sep 17 00:00:00 2001
From: Martin Thoma
Date: Tue, 19 Apr 2022 19:59:42 +0200
Subject: [PATCH] DOC: Robustness (#785)

---
 README.md               |  2 +-
 docs/index.rst          |  3 ++-
 docs/user/robustness.md | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100644 docs/user/robustness.md

diff --git a/README.md b/README.md
index 884e9a4cf4..25645dfd88 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 [![PyPI version](https://badge.fury.io/py/PyPDF2.svg)](https://badge.fury.io/py/PyPDF2)
 [![Python Support](https://img.shields.io/pypi/pyversions/PyPDF2.svg)](https://pypi.org/project/PyPDF2/)
 [![](https://img.shields.io/badge/-documentation-green)](https://pypdf2.readthedocs.io/en/latest/)
-![GitHub last commit](https://img.shields.io/github/last-commit/py-pdf/PyPDF2)
+[![GitHub last commit](https://img.shields.io/github/last-commit/py-pdf/PyPDF2)](https://github.com/py-pdf/PyPDF2)
 [![codecov](https://codecov.io/gh/py-pdf/PyPDF2/branch/main/graph/badge.svg?token=id42cGNZ5Z)](https://codecov.io/gh/py-pdf/PyPDF2)
 
 # PyPDF2
diff --git a/docs/index.rst b/docs/index.rst
index 1b587f8614..f92aa5ef43 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -19,6 +19,7 @@ You can contribute to `PyPDF2 on Github `_.
    :maxdepth: 1
 
    user/installation
+   user/robustness
    user/metadata
    user/extract-text
    user/encryption-decryption
@@ -36,9 +37,9 @@ You can contribute to `PyPDF2 on Github `_.
    :maxdepth: 1
 
    modules/PdfFileReader
+   modules/PdfFileWriter
    modules/PdfFileMerger
    modules/PageObject
-   modules/PdfFileWriter
    modules/DocumentInformation
    modules/XmpInformation
    modules/Destination
diff --git a/docs/user/robustness.md b/docs/user/robustness.md
new file mode 100644
index 0000000000..a516d70af7
--- /dev/null
+++ b/docs/user/robustness.md
@@ -0,0 +1,40 @@
+# Robustness and strict=False
+
+PDF is [specified in various versions](https://www.pdfa.org/resource/pdf-specification-index/).
+The specification of PDF 1.7 has 978 pages. This length makes it hard to get
+everything right. As a consequence, a lot of PDFs do not strictly follow the
+specification.
+
+If a PDF file does not follow the specification, it is not always possible to
+be certain what the intended effect would be. Think of the following broken
+Python code as an example:
+
+```python
+# Broken
+function (foo, bar):
+
+# Potentially intended:
+def function(foo, bar):
+    ...
+
+# Also possible:
+function = (foo, bar)
+```
+
+When writing a parser, you can take one of two paths: either you are forgiving
+and try to figure out what the user intended, or you are strict and just tell
+the user that they should fix their input.
+
+PyPDF2 gives you the option to be strict or not.
+
+PyPDF2 has three core objects and all of them have a `strict` parameter:
+
+* [`PdfFileReader`](https://pypdf2.readthedocs.io/en/latest/modules/PdfFileReader.html)
+* [`PdfFileWriter`](https://pypdf2.readthedocs.io/en/latest/modules/PdfFileWriter.html)
+* [`PdfFileMerger`](https://pypdf2.readthedocs.io/en/latest/modules/PdfFileMerger.html)
+
+Choosing `strict=True` means that PyPDF2 will raise an exception if a PDF does
+not follow the specification.
+
+Choosing `strict=False` means that PyPDF2 will try to be forgiving and do
+something reasonable, but it will log a warning message.
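
Note on the strict/forgiving trade-off described in `docs/user/robustness.md`: the two paths can be sketched with a toy parser that uses only the standard library. This is not PyPDF2 code. The names `parse_pdf_version` and `PdfFormatError` are hypothetical, and the sketch assumes that the only thing being parsed is a `%PDF-x.y` header line. It also uses `warnings.warn` as a stand-in for whatever warning mechanism a real library uses.

```python
import warnings


class PdfFormatError(ValueError):
    """Hypothetical error for input that violates the expected format."""


def parse_pdf_version(header: bytes, strict: bool = True) -> str:
    """Return the version from a header such as b"%PDF-1.7".

    Toy parser for illustration only; it is not part of PyPDF2.
    """
    prefix = b"%PDF-"
    text = header.strip()
    if text.startswith(prefix):
        # Well-formed input: both modes behave the same.
        return text[len(prefix):].decode("ascii", errors="replace")
    if strict:
        # Strict path: refuse to guess and ask the caller to fix the input.
        raise PdfFormatError(f"invalid header: {header!r}")
    # Forgiving path: warn, then search for the marker anywhere in the header.
    warnings.warn(f"invalid header {header!r}, trying to recover")
    position = text.find(prefix)
    if position == -1:
        return "unknown"
    return text[position + len(prefix):].decode("ascii", errors="replace")
```

With leading junk before the marker, `parse_pdf_version(b"junk%PDF-1.4", strict=True)` raises `PdfFormatError`, while `strict=False` emits a warning and returns the recovered version. The forgiving mode trades certainty for usability: it returns a best guess, so callers should treat the warning as a signal that the result may not match what the file's producer intended.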