Syntactic rules for valid attribute objects across clauses 14.7 and 14.8 is very difficult to understand and has mistakes #226

Closed
petervwyatt opened this issue Oct 25, 2022 · 12 comments
Assignees
Labels
ISO approved Resolved issue approved by ISO

Comments

@petervwyatt
Member

Syntactic (not semantic!) rules for valid attribute objects, spread across clauses 14.7.6 and 14.8.5, are very difficult to discover and understand - and there are some errors:

  • there is no definitive list of all formally defined values for the required key O (owner).
    "Table 360 - Entries common to all attribute object dictionaries" defines NSO and UserProperties, and "one of the values from 14.8.5" which is a 23-page subclause!! So is this really meaning only the values listed in Table 376? Or are there also other valid values of O not in Table 376, but buried somewhere else in the text of 14.8.5?
    If it is just Table 376 it would be far far better to reference it precisely from Table 360 rather than some mega-clause.
    And it would be good if Table 376 included (or noted) NSO and UserProperties for completeness, with a cross-reference back to Table 360.

  • throughout all subclauses of 14.8.5, Layout, Table, PrintField and Artifact are formatted as bold which indicates a key name, when, in fact, they are O (owner) key values and should really be italic. This is confusing.

  • 14.8.5.3 attribute inheritance vs explicit exclusion requirements based on the value of O in various 14.8.5.x subclauses is unclear.
    For example, 14.8.5.4.1 (layout attributes) states "Attributes in this category shall be defined in attribute objects whose O (owner) entry has the value Layout or whose owner is any other owner excluding List, Table, PrintField and Artifact.". But 14.8.5.3 on attribute inheritance lacks explanation in terms of normative requirements: "an attribute that is specified for an element shall apply to all the descendants of the element in the structure tree unless a descendent element specifies an explicit value for the attribute".
    So does this mean that if you want to inherit an attribute that is prohibited by the value of O (owner) then you need to push it further up the structure tree to another node that has some other O (owner) that permits it?
    In which case, 14.8.5.3 should probably note that not all elements in the structure tree may allow for the inclusion of certain inheritable attributes based on requirements in other subclauses.
    Or can you have the "wrong attributes" for the purposes of inheritance?
    In which case the various file format requirements like ".. shall be defined in attribute objects whose O (owner) entry..." should change to something like a processor requirement to "apply"...

  • Table 377, second column is titled "Attributes" and the entries are formatted as normal text. I believe these are precise dictionary key names that can occur in attribute objects, rather than a description of the attribute so it would be better if:

    1. the entries were bold - as that is the convention for key names throughout 32K
    2. the column title was "Attribute Key" so it is crystal clear these are normative key names and not descriptive
    3. each row should have a corresponding cross-reference to one of Table 378 to Table 381 (since Table 377 duplicates the inheritance requirement it is important that the synchronisation is acknowledged).
  • Table 379 and Table 385 conflict on which values of O (owner) BBox can be validly defined:
    Table 379 defines BBox and is constrained by the 2nd sentence in 14.8.5.4.1: "Attributes in this category shall be defined in attribute objects whose O (owner) entry has the value Layout or whose owner is any other owner excluding List, Table, PrintField and Artifact.".
    Table 385 also defines BBox and is constrained by "... attribute objects whose O (owner) entry has the value Artifact or whose owner is any other owner excluding Layout, List, PrintField and Table."
    So it appears that when O (owner) is either Layout or Artifact, or whose owner is any other owner excluding List, PrintField and Table (e.g. NSO or the external format values in Table 376), BBox is allowed!
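
The combined effect of the two quoted sentences can be checked with a small sketch. It assumes that an attribute is acceptable when either rule permits its owner; the helper name `bbox_permitted` and the candidate list are illustrative, not taken from the specification.

```python
# Owners excluded by the sentence in 14.8.5.4.1 (layout attributes, Table 379).
LAYOUT_RULE_EXCLUDED = {"List", "Table", "PrintField", "Artifact"}
# Owners excluded by the corresponding sentence for artifact attributes (Table 385).
ARTIFACT_RULE_EXCLUDED = {"Layout", "List", "PrintField", "Table"}


def bbox_permitted(owner: str) -> bool:
    """Return True if either quoted rule permits BBox for this owner (assumed reading)."""
    return owner not in LAYOUT_RULE_EXCLUDED or owner not in ARTIFACT_RULE_EXCLUDED


# Illustrative owner values: the standard owners plus NSO and one external owner.
candidates = ["Layout", "Artifact", "List", "Table", "PrintField", "NSO", "ARIA-1.1"]
permitted = [owner for owner in candidates if bbox_permitted(owner)]
print(permitted)
```

Under that reading, only List, Table and PrintField are excluded by both rules, which matches the conclusion in the last bullet.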

petervwyatt added the bug, documentation, and help wanted labels Oct 25, 2022
@MatthiasValvekens
Member

MatthiasValvekens commented Oct 25, 2022

there is no definitive list of all formally defined values for the required key O (owner).

I think it would be good to create such a list, but I would caution against formulating it as a limitation. Given that the presumptive default way to process an unknown attribute owner is "do nothing", using externally defined attribute owners shouldn't be a problem for any kind of validation. For better or worse, the current PDF/UA and WTPDF drafts rely on this neutral-by-default extensibility, so we need to be careful not to contradict ourselves.

As far as I'm concerned, the primary use case for the list would be to easily find the provisions to validate the syntax of attribute dictionaries with attribute owners that are defined in the core specification.

throughout all subclauses of 14.8.5, Layout, Table, PrintField and Artifact are formatted as bold which indicates a key name, when, in fact, they are O (owner) key values and should really be italic. This is confusing.

Granted, we do the same thing for structure element types, so there's something to be said for bolding attribute owners as well... But perhaps that should be clarified somewhere.

So does this mean that if you want to inherit an attribute that is prohibited by the value of O (owner) then you need to push it further up the structure tree to another node that has some other O (owner) that permits it?

I might have misunderstood your point, but the A entry on a structure element can be an array with multiple entries, each of which is an attribute dictionary with its own O and its own set of attributes. Here's a typical example I grabbed from one of my own files: a TH element with an explicitly defined Scope (=> a Table attribute), in addition to BBox, Color and Padding (those are Layout attributes).

[Image: structelem]
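
For readers without the screenshot, the shape of such an element can be sketched as plain Python data. This is only an illustration: the attribute values are made up, and `attributes_by_owner` is a hypothetical helper, not part of any PDF library.

```python
# Hypothetical in-memory model of a TH structure element whose A entry is an
# array of two attribute objects, each with its own O (owner). Values are made up.
th_element = {
    "S": "TH",
    "A": [
        {"O": "Table", "Scope": "Column"},
        {"O": "Layout", "BBox": [72, 700, 200, 720], "Color": [0, 0, 0], "Padding": 2},
    ],
}


def attributes_by_owner(element: dict) -> dict:
    """Group an element's attributes by the owner of the attribute object holding them."""
    a_entry = element.get("A", [])
    if isinstance(a_entry, dict):  # A may also be a single attribute object
        a_entry = [a_entry]
    grouped = {}
    for entry in a_entry:
        if not isinstance(entry, dict):  # skip revision numbers that may follow an object
            continue
        owner_attrs = grouped.setdefault(entry["O"], {})
        for name, value in entry.items():
            if name not in ("O", "NS"):
                owner_attrs[name] = value
    return grouped


grouped = attributes_by_owner(th_element)
print(sorted(grouped))  # owners present on this element
```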

@petervwyatt
Member Author

I think it would be good to create such a list, but I would caution against formulating it as a limitation.

I certainly didn't/don't mean that - O can be anything(*). But I just want to understand what the spec defines since that enables basic interoperability, as 14.7.6 says "To facilitate the interchange of content among conforming products, PDF defines a set of standard structure attributes identified by specific standard owners".

(*) Note that Annex E reference does limit the values of O to being 2nd class names so it's not quite anything...

... the primary use case for the list would be to easily find the provisions to validate the syntax of attribute dictionaries with attribute owners that are defined in the core specification.

Did someone say Arlington? 😁


My other question wasn't well phrased but is also more vague as I was trying to grok all the requirements in the case of "not because you should, but because you can" for inheritance. Let me try to rephrase my thoughts:

  • is inheritance only limited through common O owner ancestry?
    I cannot see anything that says that, so I think the answer is "no". However, I do note the last para under 14.7.6 "When an array of attribute objects is provided, the value of the O and NS keys may be repeated across attribute objects. If a given attribute is specified more than once, the later (in array order) entry shall take precedence.". This is not "vertical" inheritance through ancestry but resolution across array elements in a single array at a single node.
    It is also unclear if this 2nd sentence only applies to the first sentence (for same O and NS) or in all cases (any O)...
  • so that must mean inheritance is solely a property of the attribute name (i.e. the key name) - which can be anything, but let's keep to the standard structure attributes identified by specific standard owners
  • so, by way of an example, does that mean a BBox of an O=Artifact can alter the BBox of an O=Layout (or vice-versa) via inheritance?

Hopefully that is clearer.

@mrbhardy

There are a lot of questions bundled together here. It is a shame GitHub doesn't allow richer threads to enable conversations on the different points without them having to be completely independent issues.

In fact, we should treat O as being permitted to be anything, regardless of whether that is what the spec officially says. Other ISO standards and extensions are permitted to extend this with first class names and I would argue that if that is the only extension, it would be a valid ISO 32000-2 without that extension being applied. What ISO 32000-2 does do is define a set of known values for O and keys that are associated with each of those known values.

If it is just Table 376 it would be far far better to reference it precisely from Table 360 rather than some mega-clause.
And it would be good if Table 376 included (or noted) NSO and UserProperties for completeness, with a cross-reference back to Table 360.

I don't necessarily agree with this. When the value is other than NSO or UserProperties, then the owner keys are the ones that apply and, as I said above, it should be treated as an open-ended set. In fact, I think we're confusing things if we merge these, since one is part of Logical Structure (O with values NSO or UserProperties) and the other is part of Tagged PDF. It is a common mistake to treat these as interchangeable, but they are not. O is open-ended in Logical Structure with no known values and when used in a Tagged PDF, a set of known values within that open-ended set is defined and reserved.

14.8.5.3 attribute inheritance vs explicit exclusion requirements based on the value of O in various 14.8.5.x subclauses is unclear.

I think you are misreading this section. What we are saying is that the defined attribute owners in Tagged PDF have reserved attribute keys and values. The Layout set of attributes is reserved to that owner (but we are trying to be clear that there is no intent to reserve these values outside of Layout). So, if you have an owner of List and you have the attribute BorderColor, then you are effectively saying nothing (or a strict validator might reject this). It isn't saying that the layout attributes have the same meanings and definitions.

Therefore, inheritance is only meaningful within the context of an attribute owner. So, you have to define an O value of Layout to meaningfully have inheritance for any attribute in that set. If the same attribute key occurs within the same structure tree node or beneath but within a different attribute object with a different owner, then they have no interaction.

I can define the MRBH attribute owner with a key BorderColor, it has no relationship to the BorderColor defined in the Layout attribute owner.

is inheritance only limited through common O owner ancestry?

Yes

so that must mean inheritance is solely a property of the attribute name (i.e. the key name) - which can be anything, but let's keep to the standard structure attributes identified by specific standard owners

No

so, by way of an example, does that mean a BBox of an O=Artifact can alter the BBox of an O=Layout (or vice-versa) via inheritance?

Zero interaction
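
The "zero interaction" answer can be sketched by keying every attribute on its owner as well as its name. This is a minimal sketch, not a definitive implementation: `effective_attributes` is a hypothetical helper, the values are made up, and which (owner, name) pairs are inheritable is passed in as an assumption rather than read from the specification.

```python
def effective_attributes(path: list, inheritable: set) -> dict:
    """Compute the attributes in effect for the last element of `path`.

    path: per-element dicts mapping (owner, name) -> value, root first.
    inheritable: (owner, name) pairs assumed to be inherited by descendants.
    """
    inherited = {}
    for ancestor in path[:-1]:
        for key, value in ancestor.items():
            if key in inheritable:
                inherited[key] = value  # nearer ancestors override farther ones
    # Explicit values on the element itself take priority over inherited ones.
    return {**inherited, **path[-1]}


root = {("Layout", "BorderColor"): [1, 0, 0]}
middle = {("MRBH", "BorderColor"): "blue"}  # custom owner, same key name
leaf = {}
inheritable = {("Layout", "BorderColor"), ("MRBH", "BorderColor")}

result = effective_attributes([root, middle, leaf], inheritable)
print(result)
```

Both BorderColor values reach the leaf unchanged because they are stored under different (owner, name) keys, so neither overrides the other.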

@MatthiasValvekens
Member

I agree with what @mrbhardy wrote, but there are two points I would like to clarify.

The first one is perhaps a little obvious, but when conceptualising attributes as key-value pairs, I like to think of them as (Owner, Attribute name) <> Value pairs. In other words, the owner is part of the key in the pairing. I'd call it a namespacing scheme, but that term is already quite overloaded as-is ;). With this perspective, the separation between different owners is also immediately obvious.

This "model" (if you can even call it that) is also useful to explain attribute inheritance, IMO:

so that must mean inheritance is solely a property of the attribute name (i.e. the key name) - which can be anything, but let's keep to the standard structure attributes identified by specific standard owners

No

I get the feeling that the misunderstanding is about conflating two processor requirements that don't really interact.

Let's single out this piece of text quoted by @petervwyatt above:

14.7.6 "When an array of attribute objects is provided, the value of the O and NS keys may be repeated across attribute objects. If a given attribute is specified more than once, the later (in array order) entry shall take precedence."

This is not about inheritance, but rather about how to resolve conflicts when two or more attribute dictionaries with the same owner appear inside (the A entry of) a single structure element. The inheritance rules are orthogonal to that: once all the (Owner, Attribute name) <> Value pairs for a given structure element are known (after applying conflict resolution rules as necessary), these tuples are inherited down the structure tree according to the rules set forth in the spec for those particular attributes.

I hope that made sense.

@petervwyatt
Member Author

Thanks @mrbhardy and @MatthiasValvekens - that really helped!

I have no issue with what is described, just that the words don't capture some of this, so let me think about some simple point-solution errata fixes that we might be able to do to make this far clearer (without rewriting large slabs of text). Bulk formatting changes to make things consistent with the rest of ISO 32K are off the table AFAIC (except to note that it should be done sometime in the future).

One remaining question regarding O (owner): it is now implied that this can literally be anything and not just a 2nd class name. How do we avoid name collisions from different implementers? Or is this not considered a problem because these are "merely attributes" and since they're considered as an <Owner, Attribute> pair, a collision needs to be on both O owner and key name, thus further reducing probability of a collision? Would we recommend or note that 2nd class names are useful?

PS. The list of official exceptions to the 2nd class name rules is captured in Annex E.2, 2nd bullet under "Second class names", where it states "... except keys added to a document information dictionary (see 14.3.3, "Document information dictionary") or a thread information dictionary (in the I entry of a thread dictionary; see 12.4.3, "Articles")". If the attribute O owner is another exception we should extend this list.

petervwyatt added this to the Tagged PDF related milestone Oct 25, 2022
@petervwyatt
Member Author

@mrbhardy another Q: can anyone add additional new custom attributes to the defined sets of O owners?
I assume so - but are the key names of such attributes limited to 2nd class names?

@mrbhardy

mrbhardy commented Jun 21, 2023

In principle, for a given O owner that is fully defined in ISO 32000-2 (i.e. Layout, List, Table, ListNumbering and Artifact), I would say no, they cannot be extended and that should be considered (softly) invalid. However, I could live with second class names, because at least there's no conflict for the future. However, ideally they would have a different owner associated with the entity adding them, rather than stuffing them into existing Attribute Objects.

For the list of defined owners that are defined externally (e.g. ARIA-1.1), I think they are more open to interpretation, so I would not want to try to validate them. For custom owners or namespaces, I would say it is legitimately open-ended.

@mrbhardy

@petervwyatt what are next steps on this issue?

@petervwyatt
Member Author

Let me draft up some proposed new wordings...

@petervwyatt
Member Author

Trying to conclude some concrete improvements:

  • ISSUE: there is no definitive list of all formally defined values for the required key O (owner).

    1. Change Table 360 from "... one of the values from 14.8.5" to a reference to Table 376 instead
    2. Add new NOTE below Table 376 with a cross-reference stating that other values of the O entry defined in ISO 32000-2 are in Table 360.
  • ISSUE: throughout all subclauses of 14.8.5, Layout, Table, PrintField and Artifact are formatted as bold which indicates a key name, when, in fact, they are O (owner) key values and should really be italic.

    1. too big for an errata - add a generic EDITORS NOTE at the start of 14.8.5 reminding me to do this later.
  • ISSUE: Table 377, column 2 is simply titled "Attributes" and the entries are formatted as normal text but are precise dictionary key names:

    1. change the 2nd column header to be "Attribute key"
    2. format the 2nd column cells as bold (indicative of a key name). Un-bold column 1 (I think this was caused by the style sheet - boldness has no meaning for this column in this table).
    3. add a 4th column with Table numbers (potentially multiple!) to where each attribute is formally defined.
  • ISSUE: Tables 377, 379 and 385 confuse on which values of O (owner) BBox can be validly defined:

    1. In Table 377, separate out the BBox row and change column 1 to be "Figure, Form, Formula, Table and Artifact"
    2. Table 379 (BLSE), BBox description: add NOTE 4: Artifact attributes also define a BBox attribute (see Table 385).
    3. Table 385 (Artifact), BBox description: add NOTE: BLSE attributes also define a BBox attribute (see Table 379).

@petervwyatt
Member Author

14.7.6: "When an array of attribute objects is provided, the value of the O and NS keys may be repeated across attribute objects. If a given attribute is specified more than once, the later (in array order) entry shall take precedence."

This second sentence needs to clarify that the processor requirement for precedence only applies for the same O (owner, and thus possibly also NS) whereas it is currently ambiguous.

If a given attribute for a specific owner (as defined by the O and NS entries) is specified more than once, the later (in array order) entry shall take precedence.
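
One way to read the proposed wording is that precedence is resolved per (O, NS, attribute name) rather than per attribute name alone. The sketch below assumes that reading; `resolve_array` is a hypothetical helper, and the NS value is modelled as a plain string label, whereas in a real file it would reference a namespace dictionary.

```python
def resolve_array(attribute_objects: list) -> dict:
    """Resolve repeated attributes within one A array.

    Later entries (in array order) win, but only against earlier entries
    that have the same owner, i.e. the same O and NS values.
    """
    resolved = {}
    for obj in attribute_objects:
        if not isinstance(obj, dict):  # skip revision numbers
            continue
        owner = (obj["O"], obj.get("NS"))
        for name, value in obj.items():
            if name not in ("O", "NS"):
                resolved[(owner, name)] = value
    return resolved


# Made-up array: Layout appears twice, plus a namespaced NSO object using the same key.
a_array = [
    {"O": "Layout", "Padding": 2},
    {"O": "NSO", "NS": "ns-a", "Padding": 5},
    {"O": "Layout", "Padding": 4},
]

resolved = resolve_array(a_array)
print(resolved)
```

The second Layout entry replaces the first, while the namespaced Padding is left alone because its owner differs.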

petervwyatt added the proposed solution label and removed the help wanted label Nov 21, 2023
@mrbhardy

PDF/UA TWG committee agrees with proposed solutions.

petervwyatt added a commit that referenced this issue May 24, 2024
petervwyatt added a commit that referenced this issue Jun 25, 2024
petervwyatt removed the bug, documentation, and proposed solution labels Nov 12, 2024
petervwyatt added the ISO approved label Nov 12, 2024