Syntactic rules for valid attribute objects across clauses 14.7 and 14.8 is very difficult to understand and has mistakes #226
Comments
I think it would be good to create such a list, but I would caution against formulating it as a limitation. Given that the presumptive default way to process an unknown attribute owner is "do nothing", using externally defined attribute owners shouldn't be a problem for any kind of validation. For better or worse, the current PDF/UA and WTPDF drafts rely on this neutral-by-default extensibility, so we need to be careful not to contradict ourselves. As far as I'm concerned, the primary use case for the list would be to easily find the provisions to validate the syntax of attribute dictionaries with attribute owners that are defined in the core specification.
Granted, we do the same thing for structure element types, so there's something to be said for bolding attribute owners as well... But perhaps that should be clarified somewhere.
I might have misunderstood your point, but the A entry on a structure element can be an array with multiple entries, each of which is an attribute dictionary with its own O and its own set of attributes. Here's a typical example I grabbed from one of my own files: a TH element with an explicitly defined Scope (=> a Table attribute), in addition to BBox, Color and Padding (those are Layout attributes).
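The shape described in that comment can be sketched as plain Python data. Every value below is invented for illustration and is not taken from the commenter's file; only the owner and attribute names (Table with Scope, Layout with BBox, Color and Padding) come from the comment. The helper `attributes_by_owner` is hypothetical, and the sketch ignores revision numbers and stream attribute objects.

```python
# Hypothetical model of a TH structure element whose A entry is an array of
# attribute dictionaries. All values are invented for illustration.
th_element = {
    "S": "TH",
    "A": [
        {"O": "Table", "Scope": "Column"},
        {
            "O": "Layout",
            "BBox": [72.0, 700.0, 200.0, 720.0],
            "Color": [0.0, 0.0, 0.0],
            "Padding": 2.0,
        },
    ],
}


def attributes_by_owner(element):
    """Group attribute keys by the O (owner) of the dictionary holding them."""
    a_entry = element.get("A", [])
    # A may be a single attribute dictionary or an array of them.
    dicts = a_entry if isinstance(a_entry, list) else [a_entry]
    grouped = {}
    for attr_dict in dicts:
        owner = attr_dict["O"]
        keys = [k for k in attr_dict if k != "O"]
        grouped.setdefault(owner, []).extend(keys)
    return grouped


print(attributes_by_owner(th_element))
```

Each dictionary carries its own O, so the Table and Layout attributes sit side by side on the same element without being merged into one set.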
I certainly didn't/don't mean that - O can be anything(*). But I just want to understand what the spec defines since that enables basic interoperability, as 14.7.6 says "To facilitate the interchange of content among conforming products, PDF defines a set
(*) Note that Annex E reference does limit the values of O to being 2nd class names so it's not quite anything...
Did someone say Arlington? 😁 My other question wasn't well phrased but is also more vague as I was trying to grok all the requirements in the case of "not because you should, but because you can" for inheritance. Let me try to rephrase my thoughts:
Hopefully that is clearer.
There are a lot of questions bundled together here. It is a shame GitHub doesn't allow richer threads to enable conversations on the different points without them having to be completely independent issues. In fact, we should treat O as being permitted to be anything, regardless of whether that is what the spec officially says. Other ISO standards and extensions are permitted to extend this with first class names and I would argue that if that is the only extension, it would be a valid ISO 32000-2 file without that extension being applied. What ISO 32000-2 does do is define a set of known values for O and keys that are associated with each of those known values.
I don't necessarily agree with this. When the value is other than NSO or UserProperties, then the owner keys are the ones that apply and, as I said above, it should be treated as an open-ended set. In fact, I think we're confusing things if we merge these, since one is part of Logical Structure (O with values NSO or UserProperties) and the other is part of Tagged PDF. It is a common mistake to treat these as interchangeable, but they are not. O is open-ended in Logical Structure with no known values and when used in a Tagged PDF, a set of known values within that open-ended set is defined and reserved.
I think you are misreading this section. What we are saying is that the defined attribute owners in Tagged PDF have reserved attribute keys and values. The Layout set of attributes is reserved to that owner (but we are trying to be clear that there is no intent to reserve these values outside of Layout). So, if you have an owner of List and you have the attribute BorderColor, then you are effectively saying nothing (or a strict validator might reject this). It isn't saying that the layout attributes have the same meanings and definitions. Therefore, inheritance is only meaningful within the context of an attribute owner. So, you have to define an O value of Layout to meaningfully have inheritance for any attribute in that set. If the same attribute key occurs within the same structure tree node or beneath but within a different attribute object with a different owner, then they have no interaction. I can define the MRBH attribute owner with a key BorderColor; it has no relationship to the BorderColor defined in the Layout attribute owner.
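The "no interaction between owners" point can be sketched by always looking attributes up by owner and key together. This is a toy model, assuming a structure tree of hypothetical `Node` objects and treating BorderColor as inheritable for the sake of the example; it is not an implementation of 14.8.5.3.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical structure element: attribute dicts plus a parent link."""
    attrs: list = field(default_factory=list)  # list of {"O": ..., key: value}
    parent: Optional["Node"] = None


def lookup(node, owner, key):
    """Walk up the tree; only dictionaries with a matching O are consulted."""
    current = node
    while current is not None:
        # Later dictionaries win within one element, so scan in reverse.
        for attr_dict in reversed(current.attrs):
            if attr_dict.get("O") == owner and key in attr_dict:
                return attr_dict[key]
        current = current.parent
    return None


root = Node(attrs=[{"O": "Layout", "BorderColor": [1.0, 0.0, 0.0]}])
child = Node(attrs=[{"O": "MRBH", "BorderColor": "custom-value"}], parent=root)

# The child's MRBH BorderColor does not hide the inherited Layout one.
print(lookup(child, "Layout", "BorderColor"))
print(lookup(child, "MRBH", "BorderColor"))
```

In this model the child still inherits the Layout value from its parent, and the MRBH value is found only when MRBH is the owner being asked about.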
Yes
No
Zero interaction
I agree with what @mrbhardy wrote, but there are two points I would like to clarify. The first one is perhaps a little obvious, but when conceptualising attributes as key-value pairs, I like to think of them as This "model" (if you can even call it that) is also useful to explain attribute inheritance, IMO:
I get the feeling that the misunderstanding is about conflating two processor requirements that don't really interact. Let's single out this piece of text quoted by @petervwyatt above:
This is not about inheritance, but rather about how to resolve conflicts when two or more attribute dictionaries with the same owner appear inside (the A entry of) a single structure element. The inheritance rules are orthogonal to that: once all the
I hope that made sense.
Thanks @mrbhardy and @MatthiasValvekens - that really helped! I have no issue with what is described, just that the words don't capture some of this, so let me think about some simple point-solution errata fixes that we might be able to do to make this far clearer (without rewriting large slabs of text). Bulk formatting changes to make things consistent with the rest of ISO 32K are off the table AFAIC (except to note that it should be done sometime in the future).
One remaining question regarding O (owner): it is now implied that this can literally be anything and not just a 2nd class name. How do we avoid name collisions from different implementers? Or is this not considered a problem because these are "merely attributes" and since they're considered as an
PS. The list of official exceptions to 2nd class name rules is captured in Annex E.2, 2nd bullet under "Second class names" where it states "... except keys added to a document information dictionary (see 14.3.3, "Document information dictionary") or a thread information dictionary (in the I entry of a thread dictionary; see 12.4.3, "Articles")." If attribute O owner is another exception we should extend this list.
@mrbhardy another Q: can anyone add additional new custom attributes to the defined sets of O owners?
In principle, for a given O owner that is fully defined in ISO 32000-2 (i.e. Layout, List, Table, ListNumbering and Artifact), I would say no, they cannot be extended and that should be considered (softly) invalid. However, I could live with second class names, because at least there's no conflict for the future. However, ideally they would have a different owner associated with the entity adding them, rather than stuffing them into existing Attribute Objects. For the list of defined owners that are defined externally (e.g. ARIA-1.1), I think they are more open to interpretation, so I would not want to try to validate them. For custom owners or namespaces, I would say it is legitimately open-ended.
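One way a validator might encode that three-way split is sketched below. The owner lists are copied from the comment above rather than from the specification, and the function name and the policy labels ("closed", "skip", "open") are hypothetical.

```python
# Owner lists are taken from the comment above, not from the specification.
CORE_OWNERS = {"Layout", "List", "Table", "ListNumbering", "Artifact"}
EXTERNAL_OWNERS = {"ARIA-1.1"}  # only the example named in the comment


def validation_policy(owner):
    """Return a hypothetical policy label for attribute keys under an owner."""
    if owner in CORE_OWNERS:
        return "closed"  # undefined keys are (softly) invalid
    if owner in EXTERNAL_OWNERS:
        return "skip"    # defined elsewhere; do not try to validate
    return "open"        # custom owners: legitimately open-ended


for owner in ("Layout", "ARIA-1.1", "MRBH"):
    print(owner, validation_policy(owner))
```

The suggested tolerance for second class names under a core owner is not modelled here. It would need an extra check on the key name itself.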
@petervwyatt what are next steps on this issue?
Let me draft up some proposed new wordings...
Trying to conclude some concrete improvements:
14.7.6: "When an array of attribute objects is provided, the value of the O and NS keys may be repeated across attribute objects. If a given attribute is specified more than once, the later (in array order) entry shall take precedence." This second sentence needs to clarify that the processor requirement for precedence only applies for the same O (owner, and thus possibly also NS) whereas it is currently ambiguous.
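The clarified reading, where precedence applies only among attribute objects with the same O (and NS), can be sketched as follows. This is a minimal illustration assuming attribute objects are plain dictionaries and that NS values are hashable stand-ins for namespace references; `merge_attribute_array` and the MRBH owner are hypothetical.

```python
def merge_attribute_array(attr_dicts):
    """Resolve repeated attributes within one element's A array.

    Attributes are keyed by (O, NS, key), so a later dictionary only
    overrides an earlier one when owner and namespace both match.
    """
    effective = {}
    for attr_dict in attr_dicts:  # array order: later entries win
        owner = attr_dict["O"]
        namespace = attr_dict.get("NS")  # absent for most owners
        for key, value in attr_dict.items():
            if key in ("O", "NS"):
                continue
            effective[(owner, namespace, key)] = value
    return effective


a_array = [
    {"O": "Layout", "Padding": 2.0},
    {"O": "MRBH", "Padding": "wide"},  # hypothetical custom owner
    {"O": "Layout", "Padding": 4.0},   # overrides the first Layout entry
]
merged = merge_attribute_array(a_array)
print(merged)
```

Under this reading, the second Layout dictionary replaces the first Layout Padding, while the MRBH Padding is untouched because it belongs to a different owner.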
PDF/UA TWG committee agrees with proposed solutions.
Syntactic (not semantic!) rules for valid attribute objects spread across clauses 14.7.6 and 14.8.5 are very difficult to discover and understand - and there are some errors:
there is no definitive list of all formally defined values for the required key O (owner).
"Table 360 - Entries common to all attribute object dictionaries" defines NSO and UserProperties, and "one of the values from 14.8.5" which is a 23-page subclause!! So does this really mean only the values listed in Table 376? Or are there also other valid values of O not in Table 376, but buried somewhere else in the text of 14.8.5?
If it is just Table 376 it would be far far better to reference it precisely from Table 360 rather than some mega-clause.
And it would be good if Table 376 included (or noted) NSO and UserProperties for completeness, with a cross-reference back to Table 360.
throughout all subclauses of 14.8.5, Layout, Table, PrintField and Artifact are formatted as bold which indicates a key name, when, in fact, they are O (owner) key values and should really be italic. This is confusing.
14.8.5.3 attribute inheritance vs explicit exclusion requirements based on the value of O in various 14.8.5.x subclauses is unclear.
For example, 14.8.5.4.1 (layout attributes) states "Attributes in this category shall be defined in attribute objects whose O (owner) entry has the value Layout or whose owner is any other owner excluding List, Table, PrintField and Artifact.". But 14.8.5.3 on attribute inheritance lacks explanation in terms of normative requirements: "an attribute that is specified for an element shall apply to all the descendants of the element in the structure tree unless a descendent element specifies an explicit value for the attribute".
So does this mean that if you want to inherit an attribute that is prohibited by the value of O (owner) then you need to push it further up the structure tree to another node that has some other O (owner) that permits it?
In which case, 14.8.5.3 should probably note that not all elements in the structure tree may allow for the inclusion of certain inheritable attributes based on requirements in other subclauses.
Or can you have the "wrong attributes" for the purposes of inheritance?
In which case the various file format requirements like ".. shall be defined in attribute objects whose O (owner) entry..." should change to something like a processor requirement to "apply"...
Table 377, second column is titled "Attributes" and the entries are formatted as normal text. I believe these are precise dictionary key names that can occur in attribute objects, rather than a description of the attribute so it would be better if:
Table 379 and Table 385 conflict on which values of O (owner) BBox can be validly defined:
Table 379 defines BBox and is constrained by the 2nd sentence in 14.8.5.4.1: "Attributes in this category shall be defined in attribute objects whose O (owner) entry has the value Layout or whose owner is any other owner excluding List, Table, PrintField and Artifact.".
Table 385 also defines BBox and is constrained by "... attribute objects whose O (owner) entry has the value Artifact or whose owner is any other owner excluding Layout, List, PrintField and Table."
So it appears that BBox is allowed when O (owner) is either Layout or Artifact, or is any other owner excluding List, PrintField and Table (e.g. NSO or the external format values in Table 376)!
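The overlap can be checked mechanically by encoding the two quoted sentences as predicates, read literally. This is only a sketch of the wording quoted above; the function names are hypothetical and MRBH stands in for an arbitrary custom owner.

```python
EXCLUDED_BY_LAYOUT_RULE = {"List", "Table", "PrintField", "Artifact"}
EXCLUDED_BY_ARTIFACT_RULE = {"Layout", "List", "PrintField", "Table"}


def layout_rule(owner):
    """14.8.5.4.1 wording, as quoted for Table 379."""
    return owner == "Layout" or owner not in EXCLUDED_BY_LAYOUT_RULE


def artifact_rule(owner):
    """Wording quoted for Table 385."""
    return owner == "Artifact" or owner not in EXCLUDED_BY_ARTIFACT_RULE


owners = ["Layout", "Artifact", "List", "Table", "PrintField", "NSO", "MRBH"]
for owner in owners:
    print(f"{owner:12} layout={layout_rule(owner)!s:5} artifact={artifact_rule(owner)!s:5}")
```

Read this way, each rule rejects the one owner that the other rule names explicitly, while both accept NSO and custom owners. That leaves only List, Table and PrintField excluded by both.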