MAINT: Cleanup of annotations #1745

Merged
merged 29 commits into from
Jul 29, 2023

Conversation

MartinThoma
Member

@MartinThoma MartinThoma commented Mar 25, 2023

The goal of this PR is to create a more intuitive interface for creating annotations. AnnotationBuilder is deprecated in favor of several annotation classes, e.g.:

# old
from pypdf.generic import AnnotationBuilder
annotation = AnnotationBuilder.free_text(
    "Hello World\nThis is the second line!",
    rect=(50, 550, 200, 650),
    font="Arial",
    bold=True,
    italic=True,
    font_size="20pt",
    font_color="00ff00",
    border_color="0000ff",
    background_color="cdcdcd",
)

# new
from pypdf.annotations import FreeText
annotation = FreeText(
    text="Hello World\nThis is the second line!",
    rect=(50, 550, 200, 650),
    font="Arial",
    bold=True,
    italic=True,
    font_size="20pt",
    font_color="00ff00",
    border_color="0000ff",
    background_color="cdcdcd",
)
  • pypdf/generic/_annotations.py ➔ pypdf/annotations/
  • Create abstract base class AnnotationDictionary
  • Create abstract base class MarkupAnnotation which inherits from AnnotationDictionary. Most annotations are MarkupAnnotations.
  • Deprecated AnnotationBuilder
  • Ensure the AnnotationBuilder is not used in the docs
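
The layering described in the list above can be pictured with a small standard-library sketch. It is a simplified stand-in, not the pypdf implementation: plain `dict` replaces pypdf's dictionary object, the `Rect` alias is introduced here for illustration, and only a handful of entries are modelled.

```python
from abc import ABC
from typing import Optional, Tuple

# Illustrative alias, not a pypdf type.
Rect = Tuple[float, float, float, float]


class AnnotationDictionary(dict, ABC):
    """Simplified base: every annotation is a dictionary with /Type /Annot."""

    def __init__(self) -> None:
        super().__init__()
        self["/Type"] = "/Annot"


class MarkupAnnotation(AnnotationDictionary, ABC):
    """Entries shared by markup annotations, here only the title bar (/T)."""

    def __init__(self, *, title_bar: Optional[str] = None) -> None:
        super().__init__()
        if title_bar is not None:
            self["/T"] = title_bar


class FreeText(MarkupAnnotation):
    """Concrete class: adds only the entries specific to its subtype."""

    def __init__(
        self, *, text: str, rect: Rect, title_bar: Optional[str] = None
    ) -> None:
        super().__init__(title_bar=title_bar)
        self["/Subtype"] = "/FreeText"
        self["/Rect"] = list(rect)
        self["/Contents"] = text


annotation = FreeText(
    text="Hello World", rect=(50, 550, 200, 650), title_bar="Reviewer"
)
```

Each level contributes only the keys it owns, so a new annotation type would mostly need to set its own `/Subtype` and type-specific entries.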

Closes #107

Hints for reviewers

See AnnotationBuilder design discussion

@MartinThoma MartinThoma marked this pull request as draft March 25, 2023 22:51
@codecov

codecov bot commented Mar 25, 2023

Codecov Report

Patch coverage: 96.71% and project coverage change: -0.01% ⚠️

Comparison is base (8abd34a) 94.15% compared to head (e7ccd15) 94.15%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1745      +/-   ##
==========================================
- Coverage   94.15%   94.15%   -0.01%     
==========================================
  Files          38       41       +3     
  Lines        7189     7284      +95     
  Branches     1427     1428       +1     
==========================================
+ Hits         6769     6858      +89     
- Misses        262      266       +4     
- Partials      158      160       +2     
Files Changed Coverage Δ
pypdf/constants.py 100.00% <ø> (ø)
pypdf/_writer.py 87.80% <50.00%> (-0.33%) ⬇️
pypdf/annotations/_base.py 93.33% <93.33%> (ø)
pypdf/annotations/_non_markup_annotations.py 94.11% <94.11%> (ø)
pypdf/annotations/_markup_annotations.py 96.63% <96.63%> (ø)
pypdf/annotations/__init__.py 100.00% <100.00%> (ø)
pypdf/generic/__init__.py 100.00% <100.00%> (ø)


@MartinThoma MartinThoma force-pushed the annotations branch 5 times, most recently from ac7d8bd to b9c0bf2 Compare March 26, 2023 09:00
* Annotation module: pypdf/generic/_annotations.py ➔  pypdf/annotation.py
* Create abstract base class AnnotationDictionary
* Create annotation classes: Text, FreeText
* DOC: Remove AnnotationBuilder from the docs, use AnnotationDictionary
  classes directly

See #107

See #1741
  The annotationBuilder design pattern discussion
Collaborator

@pubpub-zz pubpub-zz left a comment

first comments

pypdf/annotations.py
p1: Vertex,
p2: Vertex,
rect: Union[RectangleObject, Tuple[float, float, float, float]],
text: str = "",
Collaborator

Shouldn't we expose this parameter only through a property? The same goes for title_bar. You have not added it to polyline, even though it exists there.

Member Author

I'm very uncertain what we should do with this parameter. Depending on the context, it has different semantics:

Text that shall be displayed for the annotation or, if this type of
annotation does not display text, an alternate description of the
annotation’s contents in human-readable form. In either case, this text is
useful when extracting the document’s contents in support of
accessibility to users with disabilities or for other purposes (see 14.9.3,
“Alternate Descriptions”). See 12.5.6, “Annotation Types” for more
details on the meaning of this entry for each annotation type.

I don't want users to have to know this.

Instead, I would prefer one parameter name that represents the semantics. That means although it might be called "text" for two different annotation types, we might have two different parameters for that.

Member Author

@MartinThoma MartinThoma Mar 30, 2023

I think there will be two:

  • text for free text annotations and markup annotations, although the semantics are slightly different 🤔
  • alternative_descriptions for sound annotations (which we don't support so far)

Hm. I'm really not sure. Maybe just text everywhere? Or contents?
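
One way to picture the "property only" option raised in this thread is a single accessor that maps a friendly name onto the `/Contents` entry. The sketch below is purely illustrative: `ContentsMixin` and this `Line` class are hypothetical names, and the annotation is modelled as a plain `dict` rather than a pypdf object.

```python
class ContentsMixin:
    """Hypothetical mixin for a dict-based annotation: `text` maps to /Contents."""

    @property
    def text(self) -> str:
        # An absent /Contents entry is reported as an empty string.
        return self.get("/Contents", "")

    @text.setter
    def text(self, value: str) -> None:
        self["/Contents"] = value


class Line(ContentsMixin, dict):
    """Hypothetical line annotation that takes no text in its constructor."""

    def __init__(self) -> None:
        super().__init__({"/Type": "/Annot", "/Subtype": "/Line"})


line = Line()
line.text = "Measured distance"
```

With this shape the constructor stays small, and the user never needs to know that the underlying key is `/Contents`, whatever public name is finally chosen.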

@MartinThoma MartinThoma marked this pull request as ready for review March 30, 2023 20:47
@MartinThoma
Member Author

@pubpub-zz I've added a couple of updates. Open questions I see are:

  1. Positional parameters: Should we leave any or make all keyword-only?
  2. Which properties should go in the constructor and which should (only) have properties?
  3. Setting of a default_author

Am I missing anything?

@MartinThoma
Member Author

For (1) I tend to make everything keyword-only. There is no particular order that makes more sense than others + we have lots of parameters. It's just too easy to get something wrong.

For (2) I'm really undecided.

For (3): I tend rather to not do that. I vaguely remember matplotlib having similar behavior and that it felt rather weird to me as a user to have such global state when I call the module in different parts of the code.
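
As a minimal illustration of point (1), a bare `*` in a Python signature makes every following parameter keyword-only. The `make_line` helper below is hypothetical and returns a plain `dict`; it only demonstrates the calling convention, not how pypdf builds a line annotation.

```python
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]


def make_line(*, p1: Point, p2: Point, rect: Rect, text: str = "") -> dict:
    """Hypothetical helper: the bare * forces every argument to be named."""
    return {
        "/Subtype": "/Line",
        "/L": [p1[0], p1[1], p2[0], p2[1]],
        "/Rect": list(rect),
        "/Contents": text,
    }


line = make_line(p1=(50, 550), p2=(200, 650), rect=(50, 550, 200, 650), text="ruler")

# A positional call fails immediately instead of silently swapping arguments.
try:
    make_line((50, 550), (200, 650), (50, 550, 200, 650))
except TypeError as exc:
    positional_error = str(exc)
```

The trade-off is verbosity at the call site in exchange for calls that cannot be mis-ordered.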

@MartinThoma
Member Author

MartinThoma commented Apr 1, 2023

(3) would have been clearer with the annotation builder class. Then we could have set the default author on an instantiated AnnotationBuilder and made the current static methods non-static.
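
The instance-based alternative mentioned here could look roughly like the sketch below. `AuthoredAnnotationBuilder` and its `text` method are hypothetical and are not part of pypdf; the point is only that the default author lives on an instance rather than in module-level state.

```python
from typing import Optional


class AuthoredAnnotationBuilder:
    """Hypothetical builder: the default author is per instance, not global."""

    def __init__(self, default_author: Optional[str] = None) -> None:
        self.default_author = default_author

    def text(self, *, text: str, author: Optional[str] = None) -> dict:
        annotation = {"/Type": "/Annot", "/Subtype": "/Text", "/Contents": text}
        # An explicit author wins; otherwise fall back to the instance default.
        effective_author = author if author is not None else self.default_author
        if effective_author is not None:
            annotation["/T"] = effective_author
        return annotation


alice = AuthoredAnnotationBuilder(default_author="Alice")
anonymous = AuthoredAnnotationBuilder()

note_a = alice.text(text="Check this paragraph")
note_b = anonymous.text(text="No author set")
note_c = alice.text(text="Signed by someone else", author="Bob")
```

Two builders configured differently can coexist in one program, which avoids the global-state surprise described in the previous comment.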

@pubpub-zz
Collaborator

  1. Positional parameters: Should we leave any or make all keyword-only?

Personally, I would leave the "mandatory" parameters as positional (e.g. rect+text for TextAnnotation, rect+destination for LinkAnnotation, ...).

2. Which properties should go in the constructor and which should (only) have properties?

I would propose all properties to be accepted within the constructor as keyed parameters.

3. Setting of a `default_author`

Am I missing anything?

Will review your PR.

@AndrewADev AndrewADev left a comment

just a couple minor typos I noticed (documentation)

@MartinThoma MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Jul 17, 2023
@MartinThoma MartinThoma changed the title [WIP] MAINT: Cleanup of annotations MAINT: Cleanup of annotations Jul 17, 2023
@MartinThoma MartinThoma merged commit abd2673 into main Jul 29, 2023
@MartinThoma MartinThoma deleted the annotations branch July 29, 2023 09:20
MartinThoma added a commit that referenced this pull request Jul 29, 2023
## What's new

### New Features (ENH)
-  Accelerate image list keys generation (#2014)
-  Use `cryptography` for encryption/decryption as a fallback for PyCryptodome (#2000)
-  Extract LaTeX characters (#2016)
-  ASCIIHexDecode.decode now returns bytes instead of str (#1994)

### Bug Fixes (BUG)
-  Add RunLengthDecode filter (#2012)
-  Process /Separation ColorSpace (#2007)
-  Handle single element ColorSpace list (#2026)
-  Process lookup decoded as TextStringObjects (#2008)

### Robustness (ROB)
-  Cope with garbage collector during cloning (#1841)

### Maintenance (MAINT)
-  Cleanup of annotations (#1745)

[Full Changelog](3.13.0...3.14.0)
Development

Successfully merging this pull request may close these issues.

Tutorials or better demos to manipulate annotations
3 participants