Support different (older) PAGE namespaces #67

Closed
kba opened this issue Apr 23, 2018 · 9 comments

Comments

@kba
Member

kba commented Apr 23, 2018

No description provided.

@bertsky
Collaborator

bertsky commented Jul 17, 2019

I would like to share some ideas on this. This is an issue both of how good our implementation can be and of what can be done in the schema itself to make life easier for applications.

problem

Currently, we start from a very bad situation: the pagecontent schema changes its targetNamespace on a fixed release schedule (once per year), usually without even introducing breaking changes. And our implementation can only parse (or produce) instances of the one version it was "built with" (i.e. on which generateDS was run). This forces us both to release a new version of core each year and (worse) to recreate (or map) all documents to the new version, too. If we forget either of these in even a single case, processing breaks down.
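To illustrate the failure mode, here is a minimal sketch (not how core actually does it) of the check an application would need just to tell a parseable document from one written against another year's namespace. The constant `BUILT_WITH_NS` and the helper names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace the bindings were generated for.
BUILT_WITH_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15"


def page_namespace(xml_text):
    """Return the namespace URI of the root element ('' if it has none)."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as '{namespace}localname'.
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def can_parse(xml_text):
    """True only if the document uses exactly the built-with namespace."""
    return page_namespace(xml_text) == BUILT_WITH_NS


current_doc = (
    '<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15">'
    "<Page/></PcGts>"
)
older_doc = (
    '<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15">'
    "<Page/></PcGts>"
)

print(can_parse(current_doc))
print(can_parse(older_doc))
```

Both documents have the same structure, yet only the first one passes, purely because of the year in the namespace URI.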

We can do better. In the following, I will frequently refer to concepts and recommendations in this excellent guide on XML schema versioning.

First and foremost, I do not think it is good practice to use targetNamespace versioning when the changes are not breaking.

For example, right now, a new release http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 is under way, but all it changes w.r.t. the previous version http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15 is the addition of new elements and attributes, all of which are optional. Even in previous years, most of the changes were compatible extensions. The rest were bugfixes (so no instance could possibly become invalid or wrong after the change if it had not been before) and minor semantic changes that would only invalidate a tiny fraction of documents that used the respective features heavily.

solution

targetNamespace for major, version for minor releases

As long as we have purely backwards compatible changes (as we do now), we could just stay in the namespace (even if it happens to be named after some particular year), and add an (internal) version attribute to its declaration (i.e. /xsd:schema/@version), which we increase with each release (e.g. 2.0 etc). That way, old documents can stay as they were, and only applications have to be updated.

(They could be updated manually on a fixed release schedule, or even be built in such a way that they update automatically: they merely have to check at the schema location whether new versions have been published, then download and incorporate them accordingly. Or they could check the schema location and just show a warning that they are outdated.)
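The "just show a warning" variant could look roughly like this. It is only a sketch: it assumes the published version string has already been fetched from the schema location (no download is shown), and `parse_version` and `is_outdated` are hypothetical helper names.

```python
import warnings


def parse_version(text):
    """Turn a dotted version string like '2.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_outdated(built_with, published):
    """True if the published schema version is newer than the one built in."""
    return parse_version(built_with) < parse_version(published)


def warn_if_outdated(built_with, published):
    if is_outdated(built_with, published):
        warnings.warn(
            "schema version %s is available, this application was built with %s"
            % (published, built_with)
        )


warn_if_outdated("2.0", "2.1")
```

Comparing tuples of integers rather than raw strings matters once a component reaches two digits: as strings, "2.10" would sort before "2.9".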

And if, in the future, needs do arise that require breaking changes, then we can still start a new namespace (but again with version="1.0"). So we would have:

  • /schema/@targetNamespace changes for major releases, and
  • /schema/@version changes for minor releases.
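As a sketch of how an application might read both levels from a schema file, assuming the schema carries the proposed version attribute (the current one does not yet), and with `schema_release` as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def schema_release(xsd_text):
    """Return (targetNamespace, version), i.e. (major, minor), of an XSD."""
    root = ET.fromstring(xsd_text)
    if root.tag != "{%s}schema" % XSD_NS:
        raise ValueError("not an XML Schema document")
    # Both are plain (unqualified) attributes of the schema element;
    # version is None if the schema does not declare one.
    return root.get("targetNamespace"), root.get("version")


# Hypothetical schema header using the proposed scheme.
xsd = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15"
    version="2.0"/>"""

major, minor = schema_release(xsd)
print(major, minor)
```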

How does that apply to the situation we face now, and how do we introduce this? After all, the current schema does not yet have a version, and existing applications do not yet have an updating mechanism in place. So if tomorrow new documents appear with the extended features (new elements and attributes), applications will show them as invalid as long as they themselves have not been updated.

But remember: this is not any worse than what we already faced year by year! On the contrary: we used to have the dilemma of either updating our application and breaking all the old documents, or not updating and breaking some of the new documents. Now at least we can safely promote updating applications!

schema/@version for releases vs PcGts/@compatibleVersion for documents

However, it gets better. We could also introduce a new (external) attribute in the schema definition that informs the application which version(s) the document is compatible with, say /pc:PcGts/@compatibleVersion, declared with default="1.0". Now an application (updated with the new schema versioning mechanism) can first pre-parse the document and look at compatibleVersion: this will implicitly yield 1.0 for all old documents, whereas new documents (at least those which need new features) will have to explicitly specify 2.0. The application can then pick the matching release among those it knows and re-parse the document fully, even enforcing the minor version by validation. Also, applications can show the correct error message if they meet a document that is newer than themselves.
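A rough sketch of that pre-parse step, under stated assumptions: the attribute name compatibleVersion is the one proposed here (it is not in any released schema), and `KNOWN_VERSIONS`, `required_version` and `select_release` are made-up names for the example. Only the root start tag is read before deciding.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical: the minor releases this application ships with.
KNOWN_VERSIONS = ["1.0", "2.0"]
DEFAULT_VERSION = "1.0"  # what a missing compatibleVersion implies


def parse_version(text):
    return tuple(int(part) for part in text.split("."))


def required_version(xml_bytes):
    """Read compatibleVersion from the root start tag only."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        # The first start event is the root element; stop right there.
        return elem.get("compatibleVersion", DEFAULT_VERSION)
    raise ValueError("empty document")


def select_release(xml_bytes):
    """Pick the oldest known release that satisfies the document."""
    wanted = parse_version(required_version(xml_bytes))
    usable = [v for v in KNOWN_VERSIONS if parse_version(v) >= wanted]
    if not usable:
        raise ValueError("document is newer than this application")
    return min(usable, key=parse_version)


NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15"
old_doc = ('<PcGts xmlns="%s"><Page/></PcGts>' % NS).encode()
new_doc = ('<PcGts xmlns="%s" compatibleVersion="2.0"><Page/></PcGts>' % NS).encode()

print(select_release(old_doc))
print(select_release(new_doc))
```

The full parse and validation against the selected release would follow; that part depends on the bindings and is left out.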

conclusion

So to sum up, I recommend changing the current versioning system to allow differentiating between major and minor releases, so the burden on both data providers and application programmers gets reduced. This can be done by just staying within the namespace and introducing version="2.0" with the next minor release. Optionally, we could even introduce compatibleVersion in the root element to allow finer control and more flexibility.

(Note: We could even revert the latest namespace, 2019-07-15: so far, it is unlikely that any applications or documents have adopted it.)

@kba @wrznr @tboenig @chris1010010 @cneud What do you think?

@wrznr
Contributor

wrznr commented Jul 18, 2019

The proposal sounds very reasonable to me and would help us in many ways. However, this would in fact be better placed and discussed in the PAGE XML repository, right?

In addition, it would be interesting to know how versioning is handled in ALTO... @cneud?

@bertsky
Collaborator

bertsky commented Jul 18, 2019

I could transfer the issue to https://github.com/PRImA-Research-Lab/PAGE-XML ...

@chris1010010

Yes, that sounds reasonable. I think we'd need a required "schemaRevision" or "schemaVersion" attribute in the root.
Please transfer to PAGE-XML, if it's no trouble

@chris1010010

Oh, and versioning in ALTO was a mess :-)
Better now I think

@bertsky
Collaborator

bertsky commented Jul 18, 2019

> Please transfer to PAGE-XML, if it's no trouble

Sorry, cannot do it myself. GitHub states you must have write permissions on both the sending and the receiving end. Anyone?

@bertsky
Collaborator

bertsky commented Jul 18, 2019

> Yes, that sounds reasonable. I think we'd need a required "schemaRevision" or "schemaVersion" attribute in the root.

I will make a PR then to get this running. So it's @schemaVersion rather than @compatibleVersion then.

@chris1010010

@wrznr
Contributor

wrznr commented Jul 18, 2019

Closed as off-topic. Pls. refer to PRImA-Research-Lab/PAGE-XML#14

@wrznr wrznr closed this as completed Jul 18, 2019
bertsky pushed a commit to bertsky/core that referenced this issue Apr 26, 2023