Multiple Schemas #451
Comments
Let's try a more detailed sketch of the proposal.

The static context is enhanced so it contains one unnamed schema and any number of named schemas. The schema name is an NCName.

The import schema declaration in XQuery and XSLT is enhanced so that you can import a schema and name it at the same time. Also you can simply import a schema by name without specifying a namespace or location hint; this works on the basis that you must have previously made the schema known to the XSLT/XQuery processor using some external API (e.g. by loading it into an XML database). If not preloaded in this way, schema names are local to a query or stylesheet; two different queries or stylesheets can use the same name to refer to different schemas, or different names to refer to the same schema.

In the SequenceType/ItemType syntax, any reference to a schema component (schema element or attribute name, schema type name) may be qualified by the schema name; I'm inclined to use the syntax SS/TT where SS is the schema name and TT is the type name. The built-in types such as

Where XSLT or XQuery syntax is used to invoke strict or lax validation of an instance document, the syntax is enhanced to allow a schema to be named. An option such as schema=#local is provided to indicate that the document should be validated against a schema identified using xsi:schemaLocation, which will be built as a free-standing schema and not interfere with any other schemas in use. This will result in the document having type annotations referring to types that are not in the static context and therefore cannot be referenced by name.

Functions like doc() and collection() are augmented with options to request validation against a specific schema (or a local schema).
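As a rough illustration of the static-context idea above, here is a toy Python model of one unnamed schema plus any number of named schemas, with lookup of `SS/TT` references. Everything in it (the `StaticContext` class, `import_schema`, `resolve`, and the example schema and type names) is a hypothetical stand-in for illustration, not a proposed API or anything an existing processor provides.

```python
from dataclasses import dataclass, field


@dataclass
class StaticContext:
    """Toy static context: one unnamed schema plus any number of named schemas."""
    unnamed: set = field(default_factory=set)  # type names in the unnamed schema
    named: dict = field(default_factory=dict)  # schema name -> set of type names

    def import_schema(self, type_names, name=None):
        """Import components, optionally under a schema name (hypothetical)."""
        if name is None:
            self.unnamed |= set(type_names)
        else:
            self.named.setdefault(name, set()).update(type_names)

    def resolve(self, ref):
        """Resolve 'SS/TT' (schema-qualified) or plain 'TT' (unnamed schema)."""
        schema_name, sep, type_name = ref.rpartition("/")
        if sep:
            if schema_name not in self.named:
                raise KeyError(f"unknown schema name: {schema_name}")
            pool = self.named[schema_name]
        else:
            pool = self.unnamed
        if type_name not in pool:
            raise KeyError(f"type {type_name} not found for reference {ref}")
        return (schema_name if sep else None, type_name)


ctx = StaticContext()
ctx.import_schema({"part-number"}, name="v1")
ctx.import_schema({"part-number", "order"}, name="v2")
ctx.import_schema({"invoice"})  # goes into the unnamed schema

v1_ref = ctx.resolve("v1/part-number")  # same type name, schema v1
v2_ref = ctx.resolve("v2/part-number")  # same type name, schema v2
plain_ref = ctx.resolve("invoice")      # unqualified: unnamed schema
```

The point of the sketch is only that the same type name can exist independently in two named schemas, and that an unqualified reference never reaches into a named schema.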
Note that XSLT currently says:
This definition is inadequate. Suppose package P imports namespace N, while package Q imports N and M. And suppose that M contains an element declaration F to be within the substitution group of an element E defined in N. Then in package Q, F is substitutable for E, while in package P it is not, which means that an element validated in package Q against a type T may be invalid against type T in package P. If a function in P is declared to expect an argument of type (Saxon currently deals with this by using the union of all these schemas at run-time. But this isn't right either, because an element validated against this union schema may be invalid against a subset of the schema.) One solution is to impose stronger constraints on the consistency of the schemas imported by the packages making up a stylesheet. As far as I'm aware the cases where the validity of an element against types in schema S is affected by adding components from another schema T include:
A rather heavy-handed way forward might be to define schemas as incompatible if they are affected by these issues. A less draconian solution might be to say that a function expecting an instance of element(*, T) has to satisfy itself that the supplied element is valid against type T as defined in the schema of the containing package; the fact that the element was validated against type T in some other schema is not by itself proof of this. This may involve revalidation. But this raises questions about the type annotation of the revalidated node. Currently validating a node involves copying it, to create a different node with different identity. Perhaps the proposal for issue #596 (pinned values) allows us to contemplate the idea of having two "annotated nodes" that share the same "node identity" but have different (or multiple) type annotations?
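The package P / package Q scenario can be made concrete with a small Python sketch. It is a deliberately simplified model, assuming a schema is just a mapping from element name to the head of its substitution group; the function name `substitutable` and the dictionaries are illustrative assumptions, and this is not how Saxon or any other processor represents schemas.

```python
def substitutable(schema, candidate, head):
    """True if `candidate` may appear where `head` is expected under `schema`.

    `schema` maps an element name to the head of its substitution group,
    or to None if the element is not in any substitution group.
    """
    seen = set()
    current = candidate
    # Walk up the chain of substitution-group heads, guarding against cycles.
    while current is not None and current not in seen:
        if current == head:
            return True
        seen.add(current)
        current = schema.get(current)
    return False


# Namespace N declares E; namespace M declares F in E's substitution group.
N = {"E": None}
M = {"F": "E"}

package_P = dict(N)        # P imports N only
package_Q = {**N, **M}     # Q imports N and M

in_Q = substitutable(package_Q, "F", "E")  # F known to Q, with head E
in_P = substitutable(package_P, "F", "E")  # F is not declared in P's schema
```

Under this model the same content (an F where an E is expected) is acceptable in Q's schema and not in P's, which is why validation in one package is not by itself proof of validity in the other.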
There might be a better approach to this: when a stylesheet declares a function parameter of type Note that this doesn't just affect stylesheets with multiple packages, it affects any situation where the schema used to validate a source document differs in any way (including, for example, the use of xsi:schemaLocation) from the schema imported into a query or stylesheet.
So sketching this out:

Two schemas [sets of schema components] A and B are compatible if for every QName that identifies global element or attribute declarations or global schema types existing in both A and B, the definitions of that component in the two schemas are compatible. For two schema components to be compatible, the properties of the schema components must be the same.

Note that it is NOT required that every valid instance of a type T when assessed using schema A is also a valid instance of type T when assessed using schema B. For example, the effects of validating against type T may vary depending on substitution group membership, types derived by extension, or element declarations that satisfy lax or strict wildcards.

When a stylesheet or query uses an item type reference such as element(*, T) or schema-element(E), it cannot be assumed that the instances of that type have been validated using the schema defined by the static context of that item type reference; only that they have been validated using a schema that is compatible with that one.
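A minimal Python sketch of this compatibility rule, assuming each schema is modelled as a dictionary keyed by (component kind, QName) whose values are dictionaries of component properties. The representation, the function name `compatible`, and the example components are assumptions made for illustration only.

```python
def compatible(schema_a, schema_b):
    """Two schemas are compatible if every component present in both
    (same kind and same QName) has identical properties in each."""
    shared = schema_a.keys() & schema_b.keys()
    return all(schema_a[key] == schema_b[key] for key in shared)


type_T = {"base": "xs:string", "maxLength": 10}

A = {
    ("type", "T"): type_T,
    ("element", "E"): {"type": "T"},
}
# B shares T unchanged and adds F in E's substitution group.
B = {
    ("type", "T"): dict(type_T),
    ("element", "F"): {"type": "T", "substitutionGroup": "E"},
}
# C redefines T with a different facet value.
C = {
    ("type", "T"): {"base": "xs:string", "maxLength": 20},
}

a_b = compatible(A, B)  # shared component T is identical
a_c = compatible(A, C)  # shared component T differs
```

Note that A and B count as compatible here even though B adds a substitution group member that A does not know about, which matches the statement that compatibility does not guarantee identical validation outcomes.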
Note that XQuery (in §2.3.5) defines stronger constraints for cross-module schema consistency:
In practice, it is very hard to satisfy these constraints unless all modules use exactly the same schema (and unless validation of instance documents also uses that schema). (In Saxon, all modules do instance validation against the union of all the imported schemas, though the scope of names used in each module is confined to the schema components imported by the specific module.) I propose to loosen the constraints as described in previous comments. Something like:
There's still a bit of a loose end here. To check whether an element F is a valid instance of
I wrote: "As far as I'm aware the cases where the validity of an element against types in schema S is affected by adding components from another schema T include:..." For the record I found another case: the outcome of validating an element that uses type alternatives (conditional type assignment) may depend on whether attributes of an ancestor element are declared to be inheritable, which may vary from one schema to another.
Partially resolved (#635); “PR Pending” removed.
There are many situations in which a single transformation wants to deal with multiple schemas: for example when transforming from v1 of some industry standard to v2 of the same standard, or when processing a collection of input documents each of which references its own schema using `xsi:schemaLocation`.

This is currently possible only if the schemas are compatible (that is, if the union of the schemas is itself a valid schema). And even where it is possible, validation against the union of S1 and S2 may produce a different outcome from validation against S2, for example because a strict wildcard allows content that S2 would not allow. Substitution groups are a particular problem: if v1 and v2 have elements with different substitution group membership, then validating against the union of v1 and v2 allows the union of the substitution groups, which means that you haven't actually verified that the result document is valid against v2.
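The strict-wildcard effect can be illustrated with a toy Python model. It assumes, as a simplification, that a strict wildcard accepts an element only when the schema contains a global declaration for it, and it ignores the further check that the element is valid against that declaration. The schema contents and the function name `strict_wildcard_accepts` are made up for the example.

```python
def strict_wildcard_accepts(global_elements, element_name):
    """Simplified strict wildcard: the element must have a global
    declaration in the schema used for validation."""
    return element_name in global_elements


# Global element declarations in each schema (illustrative names).
S1 = {"v1:note"}
S2 = {"v2:order"}
union = S1 | S2

against_S2 = strict_wildcard_accepts(S2, "v1:note")        # no declaration in S2
against_union = strict_wildcard_accepts(union, "v1:note")  # declaration comes from S1
```

A document containing `v1:note` in a wildcard position therefore passes against the union while failing against S2 alone, so success against the union says nothing definite about validity against S2.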
The problem is confounded by considerations that are outside the scope of the spec. What happens when you run two different stylesheets against the same source document? If the source document has been validated against S1, this means that both stylesheets must use schemas that are supersets of S1. The way this requirement is managed in Saxon is to introduce the concept of a Configuration in which transformations run; a Configuration has a single schema, and all source documents and stylesheets within the Configuration must use compatible subsets of this schema. A source document validated using one Configuration cannot be used in a different Configuration, because the type annotations would be meaningless against a different schema.
My proposal is to introduce the idea of a named schema (that is, a named collection of schema components). When we do `xsl:import-schema`, we can give the imported schema a name, and there is no requirement that the components in this schema should be compatible with the components in any other schema. When we refer to a schema type (for example in `$s cast as QName`) we should be able to qualify the type name with a schema name (we can postpone discussions of syntax, let's say `cast as my:part-number§v1` for now). When we request validation, we should be able to nominate the schema to be used for validation, for example `<xsl:element name="e" validation="strict" schema="v2">`.

The trickiest part is handling source documents, mainly because validation of source documents (especially those read using doc() or collection()) is at present almost entirely implementation-defined. I believe that we need explicit options to request validation of source documents against a specific schema. There should also be an option to validate a document against the schema identified in its own `xsi:schemaLocation`, in which case there should be no requirement that that schema is compatible with any schema known statically to the stylesheet.