Documentation: Adds macro for PyMuPDF and PDF titles in .rst
jamie-lemon committed Mar 28, 2024
1 parent 179886f commit ef9bca2
Showing 33 changed files with 204 additions and 193 deletions.
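Reviewer note: the bulk of this change is a mechanical rewrite of the ``:title:`` role into the ``|PyMuPDF|`` and ``|PDF|`` substitutions defined in docs/header.rst. A minimal sketch of how such a rewrite could be scripted is shown below; the helper name and the regular expression are assumptions for illustration, not the tooling actually used for this commit. It deliberately leaves other titles such as ``:title:`PDFs``` and ``:title:`MuPDF``` untouched, which matches what the hunks below show.

```python
import re

# Hypothetical pattern: only the two titles that have a substitution
# defined in docs/header.rst are rewritten; the closing backtick in the
# pattern keeps :title:`PDFs` and :title:`MuPDF` from matching.
TITLE_ROLE = re.compile(r":title:`(PyMuPDF|PDF)`")


def use_substitutions(text):
    """Replace :title:`PyMuPDF` / :title:`PDF` with |PyMuPDF| / |PDF|."""
    return TITLE_ROLE.sub(lambda match: f"|{match.group(1)}|", text)


sample = "In :title:`PyMuPDF` a :title:`PDF` and :title:`PDFs` differ from :title:`MuPDF`."
converted = use_substitutions(sample)
print(converted)
```

Anchoring the match on the closing backtick is the design choice that matters here: a looser pattern would also rewrite titles for which no substitution exists, and the build would then fail on undefined substitution references.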
6 changes: 3 additions & 3 deletions docs/about.rst
@@ -15,7 +15,7 @@ Features Comparison
Feature Matrix
~~~~~~~~~~~~~~~~~~~

The following table illustrates how :title:`PyMuPDF` compares with other typical solutions.
The following table illustrates how |PyMuPDF| compares with other typical solutions.


.. include:: about-feature-matrix.rst
@@ -28,7 +28,7 @@ Performance



To benchmark :title:`PyMuPDF` performance against a range of tasks a test suite with a fixed set of :ref:`8 PDFs with a total of 7,031 pages<Appendix4_Files_Used>` containing text & images is used to obtain performance timings.
To benchmark |PyMuPDF| performance against a range of tasks a test suite with a fixed set of :ref:`8 PDFs with a total of 7,031 pages<Appendix4_Files_Used>` containing text & images is used to obtain performance timings.


Here are current results, grouped by task:
@@ -49,7 +49,7 @@ License and Copyright



:title:`PyMuPDF` and :title:`MuPDF` are now available under both, open-source :title:`AGPL` and commercial license agreements. Please read the full text of the :title:`AGPL` license agreement, available in the distribution material (file COPYING) and `here <https://www.gnu.org/licenses/agpl-3.0.html>`_, to ensure that your use case complies with the guidelines of the license. If you determine you cannot meet the requirements of the :title:`AGPL`, please contact `Artifex <https://artifex.com/contact/pymupdf-inquiry.php?utm_source=rtd-pymupdf&utm_medium=rtd&utm_content=inline-link>`_ for more information regarding a commercial license.
|PyMuPDF| and :title:`MuPDF` are now available under both, open-source :title:`AGPL` and commercial license agreements. Please read the full text of the :title:`AGPL` license agreement, available in the distribution material (file COPYING) and `here <https://www.gnu.org/licenses/agpl-3.0.html>`_, to ensure that your use case complies with the guidelines of the license. If you determine you cannot meet the requirements of the :title:`AGPL`, please contact `Artifex <https://artifex.com/contact/pymupdf-inquiry.php?utm_source=rtd-pymupdf&utm_medium=rtd&utm_content=inline-link>`_ for more information regarding a commercial license.

.. raw:: html

28 changes: 14 additions & 14 deletions docs/app4.rst
@@ -19,9 +19,9 @@ The following three sections deal with different performance aspects:

* :ref:`Document Copying<app4_copying>` - This includes opening and parsing :title:`PDFs`, then writing them to an output file. Because the same basic activities are also used for joining (merging) :title:`PDFs`, the results also apply to these use cases.
* :ref:`Text Extraction<app4_text_extraction>` - This extracts plain text from :title:`PDFs` and writes it to an output text file.
* :ref:`Page Rendering<app4_page_rendering>` - This converts :title:`PDF` pages to image files looking identical to the pages. This ability is the basic prerequisite for using a tool in :title:`Python GUI` scripts to scroll through documents. We have chosen a medium-quality (resolution 150 DPI) version.
* :ref:`Page Rendering<app4_page_rendering>` - This converts |PDF| pages to image files looking identical to the pages. This ability is the basic prerequisite for using a tool in :title:`Python GUI` scripts to scroll through documents. We have chosen a medium-quality (resolution 150 DPI) version.

Please note that in all cases the actual speed in dealing with :title:`PDF` structures is not directly measured: instead, the timings also include the durations of writing files to the operating system's file system. This cannot be avoided because tools other than :title:`PyMuPDF` do not offer the option to e.g., separate the image **creation** step from the following step, which **writes** the image into a file.
Please note that in all cases the actual speed in dealing with |PDF| structures is not directly measured: instead, the timings also include the durations of writing files to the operating system's file system. This cannot be avoided because tools other than |PyMuPDF| do not offer the option to e.g., separate the image **creation** step from the following step, which **writes** the image into a file.

So all timings documented include a common, OS-oriented base effort. Therefore, performance **differences per tool are actually larger** than the numbers suggest.
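Reviewer note: the claim that real differences are larger than the measured ones follows from simple arithmetic. The sketch below uses made-up timings (they are assumptions, not values from the benchmark tables) to show how a shared file-writing overhead compresses the ratio between two tools.

```python
def observed_ratio(tool_time, reference_time, base_overhead):
    """Ratio of measured timings when both include the same fixed overhead."""
    return (tool_time + base_overhead) / (reference_time + base_overhead)


# Illustrative values in seconds: 4.0 vs 1.0 of pure processing time.
pure_ratio = observed_ratio(4.0, 1.0, 0.0)      # no shared overhead
measured_ratio = observed_ratio(4.0, 1.0, 1.0)  # 1.0 s of shared file I/O

print(pure_ratio, measured_ratio)
```

With these numbers the processing-only ratio is 4.0, while the measured ratio drops to 2.5 once both timings carry the same one-second overhead.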

@@ -130,7 +130,7 @@ A set of eight files is used for the performance testing. With each file we have
Tools used
-------------

In each section, the same fixed set of :title:`PDF` files is being processed by a set of tools. The set of tools used per performance aspect however varies, depending on the supported tool features.
In each section, the same fixed set of |PDF| files is being processed by a set of tools. The set of tools used per performance aspect however varies, depending on the supported tool features.

All tools are either platform independent, or at least can run on both, :title:`Windows` and :title:`Unix` / :title:`Linux`.

@@ -140,20 +140,20 @@ All tools are either platform independent, or at least can run on both, :title:`

* - **Tool**
- **Description**
* - :title:`PyMuPDF`
* - |PyMuPDF|
- The tool of this manual.
* - PDFrw_
- A pure :title:`Python` tool, being used by :title:`rst2pdf`, has interface to :title:`ReportLab`.
* - PyPDF2_
- A pure :title:`Python` tool with a large function set.
* - PDFMiner_
- A pure :title:`Python` to extract text and other data from :title:`PDF`.
- A pure :title:`Python` to extract text and other data from |PDF|.
* - XPDF_
- A command line utility with multiple functions.
* - PikePDF_
- A :title:`Python` package similar to :title:`PDFrw`, but based on :title:`C++` library :title:`QPDF`.
* - PDF2JPG_
- A :title:`Python` package specialized on rendering :title:`PDF` pages to :title:`JPG` images.
- A :title:`Python` package specialized on rendering |PDF| pages to :title:`JPG` images.



@@ -163,13 +163,13 @@ All tools are either platform independent, or at least can run on both, :title:`
Copying / Joining / Merging
----------------------------------

How fast is a :title:`PDF` file read and its content parsed for further processing? The sheer parsing performance cannot directly be compared, because batch utilities always execute a requested task completely, in one go, front to end. :title:`PDFrw` too, has a *lazy* strategy for parsing, meaning it only parses those parts of a document that are required in any moment.
How fast is a |PDF| file read and its content parsed for further processing? The sheer parsing performance cannot directly be compared, because batch utilities always execute a requested task completely, in one go, front to end. :title:`PDFrw` too, has a *lazy* strategy for parsing, meaning it only parses those parts of a document that are required in any moment.

To find an answer to the question, we therefore measure the time to copy a :title:`PDF` file to an output file with each tool, and do nothing else.
To find an answer to the question, we therefore measure the time to copy a |PDF| file to an output file with each tool, and do nothing else.

These are the :title:`Python` commands for how each tool is used:

:title:`PyMuPDF`
|PyMuPDF|

.. code-block:: python
@@ -209,7 +209,7 @@ These are the :title:`Python` commands for how each tool is used:
**Observations**

These are our run time findings in **seconds** along with a base rate summary compared to :title:`PyMuPDF`:
These are our run time findings in **seconds** along with a base rate summary compared to |PyMuPDF|:

.. list-table::
:header-rows: 1
@@ -287,7 +287,7 @@ The following table shows plain text extraction durations. All tools have been u

**Observations**

These are our run time findings in **seconds** along with a base rate summary compared to :title:`PyMuPDF`:
These are our run time findings in **seconds** along with a base rate summary compared to |PyMuPDF|:

.. list-table::
:header-rows: 1
@@ -359,13 +359,13 @@ These are our run time findings in **seconds** along with a base rate summary co
Page Rendering
--------------------------

We have tested rendering speed of :title:`PyMuPDF` against :title:`pdf2jpg` and :title:`XPDF` at a resolution of 150 DPI,
We have tested rendering speed of |PyMuPDF| against :title:`pdf2jpg` and :title:`XPDF` at a resolution of 150 DPI,


These are the :title:`Python` commands for how each tool is used:


:title:`PyMuPDF`
|PyMuPDF|

.. code-block:: python
@@ -398,7 +398,7 @@ These are the :title:`Python` commands for how each tool is used:
**Observations**

These are our run time findings in **seconds** along with a base rate summary compared to :title:`PyMuPDF`:
These are our run time findings in **seconds** along with a base rate summary compared to |PyMuPDF|:


.. list-table::
2 changes: 1 addition & 1 deletion docs/document-writer-class.rst
@@ -13,7 +13,7 @@ DocumentWriter

This class represents a utility which can output various :ref:`document types supported by PyMuPDF<Supported_File_Types>`.

In :title:`PyMuPDF` only used for outputting PDF documents whose pages are populated by :ref:`Story` DOMs.
In |PyMuPDF| only used for outputting PDF documents whose pages are populated by :ref:`Story` DOMs.

Using DocumentWriter_ also for other document types might happen in the future.

7 changes: 7 additions & 0 deletions docs/header.rst
@@ -17,6 +17,13 @@

<div style="width:100%; text-align:right"><b>This class is for PDF only.</b></div>

.. |PyMuPDF| raw:: html

   <cite>PyMuPDF</cite>

.. |PDF| raw:: html

   <cite>PDF</cite>

.. raw:: html

14 changes: 7 additions & 7 deletions docs/how-to-open-a-file.rst
@@ -14,7 +14,7 @@ Opening Files
Supported File Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:title:`PyMuPDF` can open files other that just :title:`PDF`.
|PyMuPDF| can open files other than just |PDF|.

The following file types are supported:

@@ -48,11 +48,11 @@ Assume that *"some.file"* is actually an **XPS**. Open it like so:
.. note::

:title:`PyMuPDF` itself does not try to determine the file type from the file contents. **You** are responsible for supplying the file type information in some way -- either implicitly, via the file extension, or explicitly as shown with the `filetype` parameter. There are pure :title:`Python` packages like `filetype <https://pypi.org/project/filetype/>`_ that help you doing this. Also consult the :ref:`Document` chapter for a full description.
|PyMuPDF| itself does not try to determine the file type from the file contents. **You** are responsible for supplying the file type information in some way -- either implicitly, via the file extension, or explicitly as shown with the `filetype` parameter. There are pure :title:`Python` packages like `filetype <https://pypi.org/project/filetype/>`_ that help you doing this. Also consult the :ref:`Document` chapter for a full description.

If :title:`PyMuPDF` encounters a file with an unknown / missing extension, it will try to open it as a :title:`PDF`. So in these cases there is no need for additional precautions. Similarly, for memory documents, you can just specify `doc=fitz.open(stream=mem_area)` to open it as a :title:`PDF` document.
If |PyMuPDF| encounters a file with an unknown / missing extension, it will try to open it as a |PDF|. So in these cases there is no need for additional precautions. Similarly, for memory documents, you can just specify `doc=fitz.open(stream=mem_area)` to open it as a |PDF| document.

If you attempt to open an unsupported file then :title:`PyMuPDF` will throw a file data error.
If you attempt to open an unsupported file then |PyMuPDF| will throw a file data error.


----------
@@ -62,14 +62,14 @@ Opening Files as Text
~~~~~~~~~~~~~~~~~~~~~~~~~~~~


:title:`PyMuPDF` has the capability to open any plain text file as a document. In order to do this you should provide the `filetype` parameter for the `fitz.open` function as `"txt"`.
|PyMuPDF| has the capability to open any plain text file as a document. In order to do this you should provide the `filetype` parameter for the `fitz.open` function as `"txt"`.

.. code-block:: python

    doc = fitz.open("my_program.py", filetype="txt")

In this way you are able to open a variety of file types and perform the typical **non-PDF** specific features like text searching, text extracting and page rendering. Obviously, once you have rendered your `txt` content, then saving as :title:`PDF` or merging with other :title:`PDF` files is no problem.
In this way you are able to open a variety of file types and perform the typical **non-PDF** specific features like text searching, text extracting and page rendering. Obviously, once you have rendered your `txt` content, then saving as |PDF| or merging with other |PDF| files is no problem.
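Reviewer note: since the caller is responsible for supplying the file type, a small wrapper can decide when to force ``filetype="txt"``. The following is a minimal sketch under stated assumptions: the helper ``open_kwargs`` and the extension set are hypothetical and not part of PyMuPDF; only ``fitz.open`` and its ``filetype`` parameter come from the documentation above.

```python
from pathlib import Path

# Illustrative set (an assumption): extensions we want opened as plain text.
TEXT_EXTENSIONS = {".py", ".json", ".log", ".csv"}


def open_kwargs(path):
    """Return keyword arguments for the open call.

    Known text extensions force filetype="txt"; anything else returns no
    override so the library decides from the file extension itself.
    """
    if Path(path).suffix.lower() in TEXT_EXTENSIONS:
        return {"filetype": "txt"}
    return {}


print(open_kwargs("my_program.py"))
print(open_kwargs("report.pdf"))
```

Assuming PyMuPDF is installed, the result would be passed on as ``doc = fitz.open(path, **open_kwargs(path))``.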


Examples
@@ -103,7 +103,7 @@ Opening a `JSON` file
And so on!

As you can imagine many text based file formats can be *very simply opened* and *interpreted* by :title:`PyMuPDF`. This can make data analysis and extraction for a wide range of previously unavailable files suddenly possible.
As you can imagine many text based file formats can be *very simply opened* and *interpreted* by |PyMuPDF|. This can make data analysis and extraction for a wide range of previously unavailable files suddenly possible.



6 changes: 3 additions & 3 deletions docs/index.rst
@@ -16,12 +16,12 @@
</style>

Welcome to :title:`PyMuPDF`
Welcome to |PyMuPDF|
================================

:title:`PyMuPDF` is a high-performance **Python** library for data extraction, analysis, conversion & manipulation of **PDF** (and other) documents.
|PyMuPDF| is a high-performance **Python** library for data extraction, analysis, conversion & manipulation of **PDF** (and other) documents.

:title:`PyMuPDF` is hosted on `GitHub <https://github.com/pymupdf/PyMuPDF>`_ and registered on `PyPI <https://pypi.org/project/PyMuPDF/>`_.
|PyMuPDF| is hosted on `GitHub <https://github.com/pymupdf/PyMuPDF>`_ and registered on `PyPI <https://pypi.org/project/PyMuPDF/>`_.


----
Binary file modified docs/locales/ja/LC_MESSAGES/about.mo
Binary file not shown.
10 changes: 5 additions & 5 deletions docs/locales/ja/LC_MESSAGES/about.po
@@ -45,23 +45,23 @@ msgstr "機能比較表"

#: ../../about.rst:18 261ef72bf32c43819f3f88f5141acfdd
msgid ""
"The following table illustrates how :title:`PyMuPDF` compares with other "
"The following table illustrates how |PyMuPDF| compares with other "
"typical solutions."
msgstr "以下の表は、:title:`PyMuPDF` が他の典型的な解決策と比較した場合の違いを示しています。"
msgstr "以下の表は、|PyMuPDF| が他の典型的な解決策と比較した場合の違いを示しています。"

#: ../../about.rst:27 78cf9c6560a94b9ba6c1f62a7ff5a8e7
msgid "Performance"
msgstr "パフォーマンス"

#: ../../about.rst:31 c9fff5c5b0094db4923780a94bdc1e6a
msgid ""
"To benchmark :title:`PyMuPDF` performance against a range of tasks a test"
"To benchmark |PyMuPDF| performance against a range of tasks a test"
" suite with a fixed set of :ref:`8 PDFs with a total of 7,031 "
"pages<Appendix4_Files_Used>` containing text & images is used to obtain "
"performance timings."
msgstr ""
":ref:`8つのPDFファイル(合計7,031ページ)<Appendix4_Files_Used>` "
"にテキストと画像が含まれている固定されたセットのテストスイートを使用して、:title:`PyMuPDF` "
"にテキストと画像が含まれている固定されたセットのテストスイートを使用して、|PyMuPDF| "
"のパフォーマンスをさまざまなタスクに対してベンチマークします。"

#: ../../about.rst:34 c8b8933c46404fdb9f64f1ab30f0348b
@@ -80,7 +80,7 @@ msgstr "ライセンスと著作権"

#: ../../about.rst:52 b3b1f9567d8e45b99c0a43821f16d943
msgid ""
":title:`PyMuPDF` and :title:`MuPDF` are now available under both, open-"
"|PyMuPDF| and :title:`MuPDF` are now available under both, open-"
"source :title:`AGPL` and commercial license agreements. Please read the "
"full text of the :title:`AGPL` license agreement, available in the "
"distribution material (file COPYING) and `here "
Binary file modified docs/locales/ja/LC_MESSAGES/app4.mo
Binary file not shown.