Use Common Serialization Style #80
Thanx, Jan Martin, for this comprehensive overview. It is very difficult to weigh the different pros and cons, I immediately admit. Such issues have always been the reason why I have worked with a manual text file so far. But apart from that, it is always good to look ahead. I really appreciate that.
No, I don't know one. That is a very special requirement, and I doubt one exists or that it would be worth the effort to develop one. The switch from RDF/XML to Turtle was intended as just a minor addition, given that a switch to a standard serialization will be done. The main issue is the use of a standard serialization, as it would enable the use of all kinds of tools for automated updates of the ontology and would therefore ease and speed up the work on many other issues.
OWLAPI is the most widely used serializer.
@HajoRijgersberg: Any progress on this? To better understand your requirement: What do you use the comments and order for?
Thanx to both of you; I have to dive into OWLAPI.
I have been thinking a lot in the meantime about the entire issue. I would like to know your opinion on the following (I think not ideal) approach: what if I created a script that writes the entire OM, and different versions of OM (DL, EL, etc.)? The script would read from a database containing all units, quantities and so on, and the relations between them (and other concepts). Both the database and the script could be in this Git repository. Every time the database gets altered/extended and/or the script changes, a new version of OM could be generated using the script.
I think this kind of automation is the direction to go in to improve the development process. That is what I meant with
GitHub Actions / pipelines can automatically get triggered by pushes, releases, pull requests, …, and they are configured in the repository itself. But I recommend sticking with RDF-based technologies: TTL instead of CSVs or databases, and SPARQL CONSTRUCT or SPARQL UPDATE where feasible. The first steps, in my view, are:
After that, one could start to add automated generation scripts.
Thanx, Jan Martin, the idea of pipelines sounds really good. Two questions, if you allow me: 1. Earlier I argued why we (still) have to work with a manual RDF file. In short: because of transparency through human readability, in turn through structure, order and comments. I write 'still' in brackets because I still keep up the hope that there will one day be an ontology editor that maintains such things. But I assume that in principle it would be no problem for the pipelines to be based on that manual source file? I thought I'd check with you for optimal clarity.
I don't think the new serialization will worsen transparency. The TTL serialization of the OWL API has a (different) clear structure too:
But human readability is also a matter of taste, on the one hand. On the other hand, you as the author are used to the structure of the file you wrote. I completely understand and respect that you don't want to give that up lightly.
I don't expect that such an editor will ever exist. The reason is that (with some exceptions, like databases) tools typically parse/de-serialize a file at the beginning into an internal representation to work on, and at the end they generate a new serialization from that internal representation. That way, serialization and de-serialization are completely separate modules of the software, which makes them easier to maintain. Preserving an existing serialization would require an (additional) module that is concerned with both serialization and de-serialization, which is probably error-prone and much harder to maintain (especially if the format is not designed for in-place updates).
Yes, a pipeline could also work on the current serialization to perform regular tasks. However, it would not be possible to use tools for one-time changes that get pushed back into the repository. For example, it would not be possible to use SPARQL UPDATE to perform the updates for issues like #79 and #84. Given that neither you nor anyone else has the time to make all these changes manually any time soon, the further evolution of OM would benefit significantly from the option to use automation for one-time changes.
I see the main benefit of Turtle in the ontology maintenance, as it eases (in my view) editing, comparing and merging. The serialization format of the file one imports into another ontology or uploads into a triplestore does not really matter. But my major point in this issue is the use of a standard serialization.
Thanx again for your response, Jan Martin.
It's not so much the ttl; I would certainly like to move to ttl, for two reasons: it is even more human-readable, and it is more popular (I think).
The structure of om-2.0.rdf is different: it is organized, among other things, by application area. (It should even be improved, simplified as a matter of fact. The second level, namely, is per quantity and unit (to put it simply). These will be integrated in the future. It is good that I tried out the current structure, as I have learnt how to improve it in a next step.)
True, but it is more than that: the quality of OM is supported in two ways: by your ABECTO and by me as the author being able to oversee the complete file and its contents (which also enables other people to do so). This is of course a strong guarantee (although never an absolute one) of the quality of OM, which we can never give up. (There's also the problem that we could never go back once we had given it up, but I'm not sure how important that problem is compared to the point about the guarantee of the quality of OM.)
I appreciate that, really a lot. One of my primary concerns, namely, is that I may disappoint you in the above matters, when you do so much for OM. It would really make me sad if I disappointed you, I'm telling you honestly. On the other hand, of course, I have to carry the responsibility for OM to the extent that I am able to carry it. I'm sure you'll understand. And I'm also sure that we will make steps with your pipelines, maybe not everything exactly as you envisioned it, but definitely important steps, like automatically deriving ttl, DL, EL, etc. versions from the original om-2.0.rdf every time it gets updated. That would really be great! :) Of course I'll describe all that in the readme of the OM Git.
As described above, unfortunately we cannot do that.
That editor should "only" be able to remember the order of statements and comments as much as possible and put them back. I don't think that should be part of the serialization. RDF itself, namely, does not support such functionality (I think?).
That is great to read. I hope we (you) could start with that! :)
I understand. We have to postpone that to OM 3.0. I would be a fool if I wouldn't develop that one in ttl.
Not soon, but I have manually developed OM 1 and 2. Number 3 will also be developed manually, by me. I am making preparations at present.
True, but as argued above, in summary: the price in terms of the transparency, and therefore the quality, of OM is unfortunately too high. :/
As to the standard serialization, purely for my optimal understanding: the XML format is also a standard serialization, isn't it?
Yes, RDF/XML is a standardized serialization, and OM is compliant with it. My wording on this point was not ideal: the point is to use a common serialization style (compliant with the standardized serialization format), produced by a widely used serialization implementation that yields a stable order of statements, so that the style and order can be restored after using arbitrary tools to update the ontology.
Have you already considered splitting up the ontology into several files? That could enable a specific order and the use of a common serialization style at the same time. With a release pipeline, they could be merged into one file later on.
Will you share earlier states for feedback / discussion, e.g. in another branch? I am closing this issue now, as there is a clear decision not to change the serialization style.
Thanx for your answer, Jan Martin. Please allow me to ask some further questions, although this issue has been closed:
But, again for my understanding, RDF/XML is also a common serialization style, right?
That would be good, but only if it were ordered by application area, among other things. Note that we have this fixed order already, but I assume you mean using an automated tool (not working manually, as I'm doing with OM).
Yes, OM is organized in such a way that a preparatory step has been made towards splitting it up into these several application-area-specific ontologies in the future.
Yes, that's a very good idea. A number of the present issues, by the way, relate to this future version, in the sense that I would like to deal with them / incorporate them in OM 3.0.
RDF/XML is a serialization language / RDF file format specified in a W3C recommendation. However, this standard allows some flexibility, e.g. in terms of statement order, indentation, …. With the term serialization style (a self-made term, borrowed from code style; maybe there is a better term for it) I refer to additional rules that narrow the flexibility of the serialization language specification to gain readability and comparability of the serialization. In that way, RDF/XML is a standardized serialization language, which can be used in several serialization styles.
Clear (I think), thanx!
Following up #42 and #49 (@dr-shorthair), I would also like to propose using a standard serialization for OM.
This has the advantage that one can make automated changes beyond regex replacements, also automatically triggered after each push. A disadvantage would be that comments in the documents get lost (but they could be moved into annotations).
I think the best choice would be the OWL API serialization because:
An alternative would for example be the Apache Jena serialization:
but
One could also use this occasion to switch to Turtle as the main development serialization. In my view, it has the following advantages compared to RDF/XML:
Automated generation of other serializations would be possible with a release pipeline. (But that is another issue.)
If you want this, the following steps must be done: