DOC: How to suppress exceptions/warnings/log messages #1037

Merged: 2 commits, Jun 29, 2022
1 change: 1 addition & 0 deletions docs/index.rst
@@ -20,6 +20,7 @@ You can contribute to `PyPDF2 on Github <https://github.com/py-pdf/PyPDF2>`_.

user/installation
user/robustness
user/suppress-warnings
user/metadata
user/extract-text
user/encryption-decryption
75 changes: 75 additions & 0 deletions docs/user/suppress-warnings.md
@@ -0,0 +1,75 @@
# Suppress Warnings and Log messages

PyPDF2 uses three mechanisms to signal that something went wrong:

* **Exceptions**: Error cases the client should explicitly handle. In
  `strict=True` mode, most log messages become exceptions. This can be
  useful in applications where you can force the user to fix the broken PDF.
* **Warnings**: Avoidable issues, such as using deprecated classes, functions, or parameters.
* **Log messages**: Nothing the client can do, but they should know it happened.
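The three mechanisms can be sketched with the standard library alone. In this minimal illustration, `parse`, `old_flag`, `ExampleLibError`, and the `example_lib` logger are hypothetical names. The sketch only mirrors the roles described in the list and is not how PyPDF2 is implemented.

```python
import logging
import warnings

# Hypothetical library pieces, named for illustration only (not PyPDF2's API).
logger = logging.getLogger("example_lib")


class ExampleLibError(Exception):
    """Hypothetical base exception; plays the role PyPdfError plays in PyPDF2."""


def parse(data: bytes, strict: bool = False, old_flag=None) -> bytes:
    if old_flag is not None:
        # Warning: avoidable by the caller (stop passing the deprecated parameter).
        warnings.warn("old_flag is deprecated", DeprecationWarning, stacklevel=2)
    if not data.startswith(b"%PDF-"):
        message = "missing %PDF- header"
        if strict:
            # Exception: strict mode forces the caller to deal with the broken file.
            raise ExampleLibError(message)
        # Log message: lenient mode recovers and only reports what happened.
        logger.warning("%s; continuing anyway", message)
    return data
```

With `strict=False` the broken input is returned after a log message; with `strict=True` the same problem surfaces as an exception the caller must handle.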


## Exceptions

Exceptions need to be caught if you want to handle them. For example, you might
want to read the text from a PDF as part of a search function.

Most PDF files don't follow the specification. In those cases, PyPDF2 has to
guess which mistakes were made when the PDF file was created.
See [the robustness page](robustness.md) for related issues.

As a user, you likely don't care about those details: if the file is readable
in any way, you want the text. You might use pdfminer.six as a fallback and do this:

```python
from PyPDF2 import PdfReader
from pdfminer.high_level import extract_text as fallback_text_extraction

text = ""
try:
    reader = PdfReader("example.pdf")
    for page in reader.pages:
        text += page.extract_text()
except Exception:
    # PyPDF2 could not handle the file: let pdfminer.six try instead.
    text = fallback_text_extraction("example.pdf")
```

You could also catch [`PyPDF2.errors.PyPdfError`](https://github.com/py-pdf/PyPDF2/blob/main/PyPDF2/errors.py)
instead of `Exception` if you prefer something more specific.
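The fallback pattern can be wrapped in a small helper that makes the caught exception types explicit. This is a sketch: `extract_with_fallback` is a hypothetical helper, not part of PyPDF2 or pdfminer.six.

```python
from typing import Callable, Tuple, Type


def extract_with_fallback(
    path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> str:
    """Return primary(path); use fallback(path) only if primary raises one of `errors`."""
    try:
        return primary(path)
    except errors:
        # Only the listed exception types trigger the fallback; anything else propagates.
        return fallback(path)
```

With PyPDF2, you would pass a function that joins `page.extract_text()` over `PdfReader(path).pages` as `primary`, pdfminer.six's `extract_text` as `fallback`, and `errors=(PyPdfError,)`. Exceptions outside `errors` still propagate, which keeps unrelated bugs visible.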

## Warnings

The [`warnings` module](https://docs.python.org/3/library/warnings.html) allows
you to ignore warnings:

```python
import warnings

warnings.filterwarnings("ignore")
```
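`warnings.filterwarnings("ignore")` silences every warning for the rest of the process. If you only want to hide warnings around one call, `warnings.catch_warnings()` restores the previous filters when the block exits. A minimal sketch, where `deprecated_call` is a hypothetical stand-in for a deprecated PyPDF2 function:

```python
import warnings


def deprecated_call() -> str:
    # Hypothetical stand-in for a deprecated library function or parameter.
    warnings.warn("deprecated_call() is deprecated", DeprecationWarning, stacklevel=2)
    return "result"


# Ignore DeprecationWarning only inside this block; the previous filters return on exit.
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    value = deprecated_call()
```

`catch_warnings` modifies global interpreter state, so it is not thread-safe.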

In many cases, you actually want the opposite: start Python with the `-W` flag
(for example, `python -W always your_script.py`) so that you see all warnings.
This is especially true for Continuous Integration (CI).
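The command-line flag has a programmatic counterpart in `warnings.simplefilter`, which can be handy inside a test suite. The sketch below shows `"always"` (report every occurrence, like `-W always`) and `"error"` (turn warnings into exceptions, like `-W error`, useful for failing a CI run). `deprecated_call` is a hypothetical stand-in for a deprecated library function.

```python
import warnings


def deprecated_call() -> None:
    # Hypothetical stand-in for calling a deprecated library function.
    warnings.warn("deprecated_call() is deprecated", DeprecationWarning, stacklevel=2)


# Like `python -W always`: record every warning, including repeats.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    deprecated_call()
    deprecated_call()
print(len(caught))  # 2

# Like `python -W error`: warnings become exceptions, e.g. to fail a CI run.
raised_message = None
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        deprecated_call()
    except DeprecationWarning as exc:
        raised_message = str(exc)
print(raised_message)  # deprecated_call() is deprecated
```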

## Log messages

Log messages can be noisy in some cases. PyPDF2 aims to keep its log output at
a reasonable level, but you can reduce which messages you see by raising the
level of the `PyPDF2` logger:

```python
import logging

logger = logging.getLogger("PyPDF2")
logger.setLevel(logging.ERROR)
```
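Setting the level on the `"PyPDF2"` logger is enough because loggers form a dotted hierarchy: a child logger such as `"PyPDF2._reader"` with no level of its own inherits its parent's effective level. The sketch below assumes PyPDF2's modules create their loggers with `logging.getLogger(__name__)`; `"PyPDF2._reader"` is used as a hypothetical child name, and `ListHandler` is a throwaway helper for inspecting output.

```python
import logging


class ListHandler(logging.Handler):
    """Collects emitted records so we can inspect what got through."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
logging.getLogger().addHandler(handler)  # attach to the root logger

# Raise the threshold on the package-level logger only.
logging.getLogger("PyPDF2").setLevel(logging.ERROR)

# Hypothetical stand-in for a module logger inside the package. Its own level is
# NOTSET, so it inherits the effective level ERROR from its "PyPDF2" parent.
child = logging.getLogger("PyPDF2._reader")
child.warning("example warning message")  # below ERROR: dropped
child.error("example error message")  # ERROR: propagates to the root handler

print([r.getMessage() for r in handler.records])  # ['example error message']
```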

The [`logging` module](https://docs.python.org/3/library/logging.html#logging-levels)
defines six log levels, from most to least severe:

* CRITICAL
* ERROR
* WARNING
* INFO
* DEBUG
* NOTSET
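Each level is an integer, and a logger only handles records whose level is at least its own threshold. A short sketch using only the standard library:

```python
import logging

# The six levels from the list above, most to least severe, with their values.
for name in ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]:
    print(f"{name:<8} {getattr(logging, name)}")

# With the threshold at ERROR (40), WARNING (30) records are skipped.
logger = logging.getLogger("PyPDF2")
logger.setLevel(logging.ERROR)
print(logger.isEnabledFor(logging.WARNING))  # False
print(logger.isEnabledFor(logging.CRITICAL))  # True
```

`NOTSET` is special: a logger left at `NOTSET` defers to its parent's level rather than passing everything through.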