How to handle 'KnoraRules.pie' changes #1506
`KnoraRules.pie` contains our consistency-checking rules and a few custom inference rules. From time to time we add things to it to support new features in Knora, so I don't think it's possible to freeze it forever. Getting rid of it would mean getting rid of consistency checking in GraphDB, which is a useful feature that users have been very glad to have. It's unlikely that changes to `KnoraRules.pie` would require changes to data; but if that happened, wouldn't the situation be basically the same as any change in Knora requiring data to be updated? I don't understand what you mean by "upgrade graphdb image and start a new container". I haven't changed `KnoraRules.pie` in a long time, but I think you have to restart GraphDB, and perhaps also delete the repository and create it again, to get GraphDB to reload `KnoraRules.pie`. But we could do this automatically in the upgrade script, so it would be done with every upgrade. |
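For context, the ruleset is wired in when the repository is created: the GraphDB repository configuration references the `.pie` file, which is why reloading it means recreating the repository. A sketch of the relevant fragment follows; the repository id, file path, and the exact repository/sail type strings (which vary by GraphDB edition and version) are assumptions, not taken from this thread:

```turtle
# Sketch of a GraphDB repository configuration (Turtle).
# Repository id, path, and type strings are illustrative assumptions.
@prefix rep:   <http://www.openrdf.org/config/repository#> .
@prefix sr:    <http://www.openrdf.org/config/repository/sail#> .
@prefix sail:  <http://www.openrdf.org/config/sail#> .
@prefix owlim: <http://www.ontotext.com/trree/owlim#> .

[] a rep:Repository ;
   rep:repositoryID "knora-test" ;
   rep:repositoryImpl [
       rep:repositoryType "graphdb:FreeSailRepository" ;
       sr:sailImpl [
           sail:sailType "graphdb:FreeSail" ;
           # The custom ruleset is read once, at repository creation time:
           owlim:ruleset "/graphdb/KnoraRules.pie" ;
           # Enables the consistency checks defined in the ruleset:
           owlim:check-for-inconsistencies "true"
       ]
   ] .
```

Because `owlim:ruleset` is a creation-time parameter, an upgrade script that must pick up a changed `.pie` file would have to export the data, recreate the repository from a config like this, and re-import.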
To upgrade the GraphDB container, one would shut it down and delete it, download the new image, and spin up a new container using the data from the previous version. When the new container starts, it also starts GraphDB, which then has the new version of `KnoraRules.pie`. What could work, though, if we must have `KnoraRules.pie` […]. Sooner or later we will need to part ways with […]. |
Why not automatically update KnoraRules.pie as part of the data update? In other words:
That wouldn't work, because consistency rules have to be defined before any data is loaded into the repository. Otherwise, there could be data that is not checked.
But I suppose the situation will be the same: the SHACL rules will have to be defined when the repository is created. |
Yes, this is what I was describing in my first post and what I would like to avoid :-) There are too many manual steps involved. Ideally, I would like the upgrade process to be completely automatic. As the size of the data grows, this will become more and more painful. If the upgrade is painful and time-consuming, then nobody will want to do it, including me. My goal is actually to move towards continuous deployment, i.e., deploying each commit rather than making yearly releases ;-) I will have to think of something to be able to automate it. Just a thought: as our user interface gets developed, I hope that nobody will need to manually edit their data. If they still want to do it for some reason, then maybe an external tool that they could run over their data would be more helpful than loading it into GraphDB to see if it is formally correct. Besides the usefulness for users who edit the data outside of Knora, are there any features of `KnoraRules.pie` […]? |
Consistency checking in the triplestore isn't only useful for people who edit data outside of Knora. It also protects us from database corruption caused by bugs in Knora. It would be a nightmare if a bug in Knora caused data corruption that was not noticed until after a lot more data was added, making it impossible to fix the problem by reverting to an earlier backup. If we were using a relational database, I would implement consistency checks in the database, using mechanisms like these:
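To make the relational analogy concrete, here is a sketch of the kind of declarative mechanisms meant; the original comment's list did not survive the copy, and the schema below (table and column names included) is invented for illustration:

```sql
-- Illustrative sketch: declarative integrity constraints in a relational DB.
-- Table and column names are invented for the example.
CREATE TABLE project (
    id        INTEGER PRIMARY KEY,
    shortcode CHAR(4) NOT NULL UNIQUE          -- uniqueness constraint
);

CREATE TABLE resource (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL
               REFERENCES project (id),        -- referential integrity
    label      VARCHAR(255) NOT NULL
               CHECK (label <> '')             -- CHECK constraint
);
```

Inserting a `resource` row whose `project_id` matches no `project` fails at the database level regardless of which application wrote it, which is the same protective role `KnoraRules.pie` plays in GraphDB.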
Ensuring data integrity is a feature. :) Most of our SPARQL relies heavily on GraphDB inference to optimise queries. `KnoraRules.pie` provides a custom combination of RDFS inference and the inference rule for `owl:TransitiveProperty`. If we didn't use `KnoraRules.pie`, we wouldn't be able to have this custom mixture of RDFS and OWL inference rules. We would have to use one of GraphDB's standard rulesets. |
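For readers unfamiliar with the format: a `.pie` file declares entailment rules directly, in GraphDB's rule syntax. A minimal sketch of what a transitivity rule looks like in that syntax (simplified, and not an excerpt from `KnoraRules.pie`):

```
Prefices
{
  rdf : http://www.w3.org/1999/02/22-rdf-syntax-ns#
  owl : http://www.w3.org/2002/07/owl#
}

Axioms {}

Rules
{
  Id: transitive_property
    p <rdf:type> <owl:TransitiveProperty>
    x p y
    y p z
    -------------------------------
    x p z
}
```

The premises above the line are matched against the data; the conclusion below the line is materialised. Mixing hand-picked rules like this one with the RDFS rules is exactly what a standard ruleset does not allow.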
user comment: as previously stated elsewhere, here at Lausanne we did a couple of Knora API version jumps, and we have seen existing data not passing the consistency checks of a newer pie file. Our upgrade process is:
For now, when data editing is required by a knora-api update, it needs to be done offline, i.e. through TriG files and re-import, so updating the pie file is not a problem. |
Ok, so we need `KnoraRules.pie`, but maybe only as necessary? It maybe doesn't need to be 'hardcoded' in the repository? |
I don’t understand what you mean by “as necessary”. The rules are always necessary. They are necessary when the repository is created, before any data is loaded, to ensure that all data is checked. Similarly, in relational databases, integrity constraints are specified as part of the CREATE TABLE command. |
I'm only talking about administration tasks, for example when dropping a graph. This, of course, requires correct knowledge of what depends on what. If I want to drop the only data graph, or the ontology and data graph of a project, then this should be fairly safe to do with the rules turned off. In contrast, currently dropping a graph involves exporting everything except the graphs that we want to delete, emptying the repository, and then reloading the export.

We simply need a way to perform certain administrative tasks in a reasonable timeframe, without having to sit at the computer and perform a number of manual steps where each step can be botched and prolong the whole process. I'm speaking from experience: everything that could go wrong, I managed to get wrong. It is simply too boring to sit for 20 minutes and wait for GraphDB. I then simply forget what I was doing, or skip a step, and have to start from the beginning. This is really not fun and we need a much better solution. If GraphDB does not allow us to do this, then we should think of alternatives.

Currently, we have a handful of projects and already somewhat big problems. I'm afraid that when we really begin to pump projects into Knora, the problems are only going to get worse. Also, just so we are clear on this: I don't have a problem with consistency checking; I have a problem with the implementation in GraphDB being very slow, at least for the mentioned cases. |
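The administrative operation in question is a one-liner in SPARQL Update; the pain described above comes from the inference and consistency machinery around it, not from the statement itself. A sketch, with a made-up graph IRI:

```sparql
# Drop a single project's data graph (the IRI is a hypothetical example):
DROP GRAPH <http://www.knora.org/data/0001/example-project>
```

With consistency rules active, the triplestore must re-check what the removed triples entailed, which is what makes this slow on large repositories.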
As far as I know, GraphDB is the only triplestore that has a production-ready consistency checking implementation. The current implementations of SHACL were all still experimental the last time I checked. And who knows how fast they are. Dropping a graph could introduce an inconsistency, because there can be links between resources in different graphs. It sounds to me like you're doing tasks manually that should instead be automated. If all the steps were automated by a single script, there would be no need for you to sit for 20 minutes waiting. You could start the task at the end of the day, and the next morning it would be done. |
Only if you can guarantee that there are no links between resources in different projects. And if, as you say, anything that can go wrong will go wrong, doesn't it seem safer to let the triplestore check this for you? |
To put it another way, this is the same contradiction I pointed out here:
I think best practice is not to trust any human or computer program not to corrupt the database. The database should always protect itself from inconsistencies. That's why DBMSs generally have declarative integrity constraints.
There is no contradiction; it's a matter of setting up procedures.
|
In that case, why would |
The idea that "you know it is safe" assumes that you will never make a mistake, e.g. typing a command in the wrong terminal, using the wrong file, etc. |
In my humble opinion:
that would come in handy for the sysadmin and reduce the error-prone handling of different commands and terminals
even with the consistency checker enabled on prod, I run the upgrade script on staging first (as long as I can afford the space) and do a back-up before running on prod. We can add more safety layers, but I think they each have their reason for being: KnoraRules checks the live operations, and upgrades so far are not run on the live system, so they can be checked separately. |
OK, then, let's try adding a Knora route that turns consistency checking on and off. @subotic do you know how to do this? Would you like to implement it? |
yes, it is easy, though I'm not sure if we need a route for this. Let me think a bit about this. I wanted to implement it in a different way. |
If you disable |
I don't think so. But reinferring can be started with the following statement:
|
Turn off:
Turn on:
|
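The statements themselves did not survive the copy. Per GraphDB's documentation, runtime ruleset management and re-inference use system predicates along these lines (a sketch; verify the exact predicates against the documentation for your GraphDB version):

```sparql
# Switch the repository's default ruleset, e.g. to the built-in "empty"
# ruleset to effectively turn inference off (sketch; see GraphDB docs):
INSERT DATA {
  _:b <http://www.ontotext.com/owlim/system#defaultRuleset> "empty"
}

# Re-compute inferred statements, e.g. after switching the ruleset back on:
INSERT DATA {
  [] <http://www.ontotext.com/owlim/system#reinfer> []
}
```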
Currently, when using the published Docker images, `KnoraRules.pie` is baked into the `dhlabbasel/knora-graphdb-se` Docker image.

If an existing deployment is upgraded to newer Docker images, and the new version of `dhlabbasel/knora-graphdb-se` contains a changed `KnoraRules.pie` which requires changes to the data, will GraphDB start up? I seem to remember that it would complain and that we had to re-run the data loading script.

So, what can happen is that we are left with a deployment which will require a very manual process to get working again. Basically:
This highly manual and error-prone workflow is not something I'm willing to do.
My suggestions:

- freeze `KnoraRules.pie` forever
- get rid of `KnoraRules.pie`
Or are there any other possibilities that I'm missing?