How to handle 'KnoraRules.pie' changes #1506

Closed
subotic opened this issue Nov 7, 2019 · 21 comments

@subotic
Collaborator

subotic commented Nov 7, 2019

Currently, when using the published Docker images, KnoraRules.pie is baked into the dhlabbasel/knora-graphdb-se Docker image.

If an existing deployment is upgraded to newer Docker images and the new version of dhlabbasel/knora-graphdb-se contains a changed KnoraRules.pie which requires changes to the data, will GraphDB start up? I seem to remember that it would complain and that we had to re-run the data loading script.

So, what can happen is that we are left with a deployment which will require a very manual process to get working again. Basically:

  1. export data before upgrade
  2. run upgrade script on data
  3. upgrade graphdb image and start a new container
  4. import data

This highly manual and error-prone workflow is not something I'm willing to do.

My suggestions:

  • freeze KnoraRules.pie forever
  • get rid of KnoraRules.pie

Or are there any other possibilities that I'm missing?

@subotic subotic added the enhancement improve existing code or new feature label Nov 7, 2019
@benjamingeer

KnoraRules.pie contains our consistency-checking rules and a few custom inference rules. From time to time we add things to it to support new features in Knora, so I don't think it's possible to freeze it forever. Getting rid of it would mean getting rid of consistency checking in GraphDB, which is a useful feature that users have been very glad to have.

It's unlikely that changes to KnoraRules.pie would require changes to data. But if it happened, wouldn't the situation be basically the same as any change in Knora requiring data to be updated? I don't understand what you mean by "upgrade graphdb image and start a new container".

I haven't changed KnoraRules.pie in a long time, but I think you have to restart GraphDB, and perhaps also delete the repository and create it again, to get GraphDB to reload KnoraRules.pie. But we could do this automatically in the upgrade script, so it would be done with every upgrade.

@subotic
Collaborator Author

subotic commented Nov 7, 2019

I don't understand what you mean by "upgrade graphdb image and start a new container".

To upgrade the GraphDB container, one would shut it down and delete it, download the new image and spin up a new container using the data from the previous version. When the new container starts it also starts GraphDB, which then has the new version of KnoraRules.pie. So basically, the old data is started with GraphDB using a new version of KnoraRules.pie. I don't think that GraphDB will start. Thus the upgrade program will not work, because there is no running GraphDB. Maybe I'm wrong. That was my question.

What could work though, if we must have KnoraRules.pie, is that the rules are loaded dynamically by knora-api. Also, at some point in time (sooner rather than later) there should not be any direct access to GraphDB for data loading operations, but only through knora-api. See #1485.

Sooner or later we will need to part ways with KnoraRules.pie because we are going to move to a different triplestore (https://rya.apache.org was mentioned). Also, if we provide support for Fuseki again, then we will also need a solution for consistency checking there. It would probably be better to start looking at SHACL.

@benjamingeer

To upgrade the GraphDB container, one would shut it down and delete it, download the new image and spin up a new container using the data from the previous version. When the new container starts it also starts GraphDB, which then has the new version of KnoraRules.pie.

Why not automatically update KnoraRules.pie as part of the data update? In other words:

  1. Export the data into a TriG file for the update.
  2. Shut down the container and delete it.
  3. Run the update script, generating a new TriG file.
  4. Start a new container with the new KnoraRules.pie.
  5. Import the new TriG file into GraphDB.

What could work though, if we must have KnoraRules.pie, is that the rules are loaded dynamically by knora-api.

That wouldn't work, because consistency rules have to be defined before any data is loaded into the repository. Otherwise, there could be data that is not checked.

Also, if we provide support for Fuseki again, then we will also need a solution for consistency checking there. It would probably be better to start looking at SHACL.

But I suppose the situation will be the same: the SHACL rules will have to be defined when the repository is created.

@subotic
Collaborator Author

subotic commented Nov 7, 2019

  1. Export the data into a TriG file for the update.
  2. Shut down the container and delete it.
  3. Run the update script, generating a new TriG file.
  4. Start a new container with the new KnoraRules.pie.
  5. Import the new TriG file into GraphDB.

Yes, this is what I was describing in my first post and what I would like to avoid :-) There are too many manual steps involved. Ideally, I would like the upgrade process to be completely automatic. As the size of the data grows, this will become more and more painful. If the upgrade is painful and time-consuming, then nobody will want to do it, including me. My goal is actually to go towards continuous deployment, i.e., deploying each commit rather than doing yearly releases ;-)

I will have to think of something to be able to automate it.

Just a thought: as our user interface gets developed, I hope that nobody will have the need to manually edit their data. If they still want to do it for some reason, then maybe an external tool that they could run over their data would be more helpful than loading it into GraphDB to see if it is formally correct.

Besides the usefulness for the users who edit the data outside of Knora, are there any features of knora-api that depend on KnoraRules.pie and wouldn't work without it?

@benjamingeer

benjamingeer commented Nov 8, 2019

as our user interface gets developed, I hope that nobody will have the need to manually edit their data

Consistency checking in the triplestore isn't only useful for people who edit data outside of Knora. It also protects us from database corruption caused by bugs in Knora. It would be a nightmare if a bug in Knora caused data corruption that was not noticed until after a lot more data was added, making it impossible to fix the problem by reverting to an earlier backup.

If we were using a relational database, I would implement consistency checks in the database, using mechanisms like these:

are there any features of knora-api that depend on KnoraRules.pie and wouldn't work without it?

Ensuring data integrity is a feature. :)

Most of our SPARQL relies heavily on GraphDB inference to optimise queries. KnoraRules.pie provides a custom combination of RDFS inference and the inference rule for owl:TransitiveProperty.

If we didn't use KnoraRules.pie, we wouldn't be able to have this custom mixture of RDFS and OWL inference rules. We would have to use one of GraphDB's standard .pie files for RDFS or OWL inference. These standard rules files could also need to be updated with new versions of GraphDB. So we would still have the same problem.

@loicjaouen
Contributor

user comment: as previously stated elsewhere, here at Lausanne, we did a couple of knora-api version jumps and we have seen existing data not passing the consistency checks of a newer pie file.

Our upgrade process is:

  • run Ben's script, get the resulting trig file
  • re-init the triplestore with the script (with the line loading the data commented out) that reloads the current pie file
  • upload the trig file

For now, when data editing is required by a knora-api update, it needs to be done offline, that is, through trig files and re-import, so updating the pie file is not a problem.

@subotic
Collaborator Author

subotic commented Nov 10, 2019

Ok, so we need KnoraRules.pie. Good, that’s settled.

It maybe doesn’t need to be 'hardcoded' in the repository? knora-api could load and unload it (the therein specified rules) as necessary, to allow a more automated upgrade procedure.

@benjamingeer

benjamingeer commented Nov 10, 2019

It maybe doesn’t need to be 'hardcoded' in the repository? knora-api could load and unload it (the therein specified rules) as necessary, to allow a more automated upgrade procedure.

I don’t understand what you mean by “as necessary”. The rules are always necessary. They are necessary when the repository is created, before any data is loaded, to ensure that all data is checked.

Similarly, in relational databases, integrity constraints are specified as part of the CREATE TABLE command.
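
To make the analogy concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`project`, `resource`, `shortcode`, `label`) are invented for illustration and are not Knora's data model. The point is only that the constraints are declared when the tables are created, so every later insert is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Constraints are declared as part of CREATE TABLE, before any data exists.
# Table and column names here are hypothetical.
conn.execute(
    "CREATE TABLE project (id INTEGER PRIMARY KEY, shortcode TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """
    CREATE TABLE resource (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL CHECK (length(label) > 0),
        project_id INTEGER NOT NULL REFERENCES project(id)
    )
    """
)

conn.execute("INSERT INTO project (id, shortcode) VALUES (1, '0001')")
conn.execute("INSERT INTO resource (label, project_id) VALUES ('a book', 1)")
conn.commit()


def try_insert(label, project_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO resource (label, project_id) VALUES (?, ?)",
                (label, project_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, `try_insert("", 1)` is rejected by the CHECK constraint and `try_insert("orphan", 99)` by the foreign key, regardless of which program issued the insert.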

@subotic
Collaborator Author

subotic commented Nov 10, 2019

I don’t understand what you mean by “as necessary”.

I'm only talking about administration tasks, for example dropping a graph. This, of course, requires correct knowledge of what depends on what. If I want to drop the only data graph or the ontology and data graph of a project, then this should be fairly safe to do with the rules turned off.

As a contrast, currently dropping a graph involves exporting everything but the graphs that we want to delete, emptying the repository, and then reloading the export.

We simply need a way to perform certain administrative tasks in a reasonable timeframe without the need to sit at the computer and perform a number of manual steps where each step can be botched and prolong the whole process. I'm speaking from experience. Everything that could go wrong, I managed to get wrong. It is simply too boring to sit for 20 minutes and wait for GraphDB. I then simply forget what I was doing or skip a step and can then start from the beginning. This is really not fun and we need a much better solution.

If GraphDB does not allow us to do this, then we should think of alternatives. Currently, we have a handful of projects and already somewhat big problems. I'm afraid that when we really begin to pump projects into Knora, the problems are only going to get worse.

Also, just so that we are clear on this: I don't have a problem with consistency checking. My problem is that the implementation in GraphDB is very slow, at least for the cases mentioned.

@benjamingeer

As far as I know, GraphDB is the only triplestore that has a production-ready consistency checking implementation. The current implementations of SHACL were all still experimental the last time I checked. And who knows how fast they are.

Dropping a graph could introduce an inconsistency, because there can be links between resources in different graphs.

It sounds to me like you're doing tasks manually that should instead be automated. If all the steps were automated by a single script, there would be no need for you to sit for 20 minutes waiting. You could start the task at the end of the day, and the next morning it would be done.
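
As a rough illustration of what "a single script" could look like, here is a minimal Python sketch of a runner that executes the five upgrade steps listed earlier in this thread, in order, and stops at the first failure. The function `run_upgrade` and the `STEP_NAMES` list are hypothetical; the actual export, container, update and import commands are not specified in this thread and would have to be supplied as callables.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]

# Step names taken from the upgrade procedure described in this thread.
STEP_NAMES = [
    "export data to TriG",
    "shut down and delete container",
    "run update script",
    "start new container with new KnoraRules.pie",
    "import updated TriG",
]


def run_upgrade(steps: List[Step]) -> List[str]:
    """Run each step in order and return the names of the completed steps.

    Stops at the first step that raises, so that, for example, a failed
    export never leads to the container being deleted.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(
                f"upgrade stopped at step '{name}'; completed so far: {completed}"
            ) from exc
        completed.append(name)
    return completed
```

Each callable would wrap whatever command a deployment actually uses for that step. Because the runner aborts on the first error and reports which steps finished, an unattended overnight run leaves a clear record of where it stopped.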

@benjamingeer

If I want to drop the only data graph or the ontology and data graph of a project, then this should be fairly safe to do with the rules turned off.

Only if you can guarantee that there are no links between resources in different projects. And if, as you say, anything that can go wrong will go wrong, doesn't it seem safer to let the triplestore check this for you?
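
To illustrate the kind of check involved, here is a toy Python sketch that works on an in-memory list of quads rather than on GraphDB. The function name `links_broken_by_drop` and the sample identifiers are invented for illustration. It treats every term as a plain string, which ignores the distinction between IRIs and literals.

```python
from typing import Iterable, List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


def links_broken_by_drop(quads: Iterable[Quad], graph_to_drop: str) -> List[Quad]:
    """Return quads in other graphs that point at a resource which is
    described only in graph_to_drop, i.e. links that would dangle after the drop.
    """
    quads = list(quads)
    dropped_subjects = {s for s, _, _, g in quads if g == graph_to_drop}
    surviving_subjects = {s for s, _, _, g in quads if g != graph_to_drop}
    only_in_dropped = dropped_subjects - surviving_subjects
    return [q for q in quads if q[3] != graph_to_drop and q[2] in only_in_dropped]


# Hypothetical sample data: r2 in graph g2 links to r1 in graph g1.
sample = [
    ("r1", "label", "Book", "g1"),
    ("r2", "label", "Letter", "g2"),
    ("r2", "hasLinkTo", "r1", "g2"),
]
```

Here dropping `g1` would leave the `hasLinkTo` quad in `g2` pointing at nothing, while dropping `g2` breaks no links. A guarantee that a drop is safe amounts to this function returning an empty list for the real data.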

@benjamingeer

benjamingeer commented Nov 10, 2019

To put it another way, this is the same contradiction I pointed out here:

  1. You don't trust yourself to do a manual process without making mistakes.
  2. You trust yourself to turn off the database's consistency checks, because you know what you're doing.

I think best practice is not to trust any human or computer program not to corrupt the database. The database should always protect itself from inconsistencies. That's why database management systems generally have declarative integrity constraints.

@loicjaouen
Contributor

there is no contradiction but a matter of setting-up procedures.

  • test it first on a staging server, with consistency on, and it takes ages but you know that it works
  • then because you know it is safe, you do it again on prod, without consistency checking, so the downtime is acceptable

@benjamingeer

In that case, why would knora-api need to deal with turning consistency checking on and off? It doesn't even know whether it's running on production. Only the sysadmin knows this.

@benjamingeer

The idea that "you know it is safe" assumes that you will never make a mistake, e.g. typing a command in the wrong terminal, using the wrong file, etc.

@loicjaouen
Contributor

In my humble opinion:

why would knora-api need to deal with turning consistency checking on and off?

that would come in handy for the sysadmin and reduce the error-prone handling of different commands and terminals

The idea that "you know it is safe" assumes that you will never make a mistake

even with the consistency checker enabled on prod, I run the upgrade script on staging first (as long as I can afford the space) and do a back-up before running on prod. We can add more safety layers, but I think each has its own reason for being: KnoraRules checks the live operations, and upgrades so far are not running on the live system, so they can be checked separately.

@benjamingeer

OK, then, let's try adding a Knora route that turns consistency checking on and off. @subotic do you know how to do this? Would you like to implement it?

@subotic
Collaborator Author

subotic commented Nov 12, 2019

yes, it is easy, though I'm not sure if we need a route for this. Let me think a bit about this. I wanted to implement it in a different way.

@benjamingeer

If you disable KnoraRules.pie completely, what happens to the triples that were inferred previously? Are they all deleted?

@subotic
Collaborator Author

subotic commented Nov 12, 2019

I don't think so. But reinferring can be started with the following statement:

INSERT DATA { [] <http://www.ontotext.com/owlim/system#reinfer> [] }
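
For reference, a statement like this could be sent from a script. The following is a minimal Python sketch that only builds the HTTP request, assuming GraphDB exposes the RDF4J-style `/repositories/{id}/statements` endpoint and accepts a form-encoded `update` parameter. The base URL `http://localhost:7200` and the repository name `knora-test` are assumptions, and authentication is not handled.

```python
import urllib.parse
import urllib.request

# The re-inference statement quoted above.
REINFER = "INSERT DATA { [] <http://www.ontotext.com/owlim/system#reinfer> [] }"


def sparql_update_request(base_url: str, repository: str, update: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded SPARQL update request."""
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/repositories/{repository}/statements",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Base URL and repository name are assumptions, not values from this thread.
request = sparql_update_request("http://localhost:7200", "knora-test", REINFER)
# To run it against a live server: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it possible to inspect or log exactly what would be sent before touching a production repository.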

@subotic
Collaborator Author

subotic commented Nov 12, 2019

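Turn off:

PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
    _:b sys:defaultRuleset "none"
}

Turn on:

PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
    _:b sys:defaultRuleset "KnoraRules"
}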
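
One way to use these two statements from a script without forgetting to switch the rules back on is a context manager. This is a minimal Python sketch, not an implemented Knora feature: `rules_disabled` and `set_default_ruleset` are hypothetical names, `run_update` stands for whatever function sends a SPARQL update to GraphDB, and the ruleset names `"none"` and `"KnoraRules"` are taken as-is from the statements above. Whether re-inference also re-validates data that was changed while the rules were off is not established in this thread and would need to be checked against the GraphDB documentation.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

SYS_PREFIX = "PREFIX sys: <http://www.ontotext.com/owlim/system#>"

# Same re-inference statement as quoted earlier in the thread, written with the prefix.
REINFER = f"{SYS_PREFIX}\nINSERT DATA {{ [] sys:reinfer [] }}"


def set_default_ruleset(name: str) -> str:
    """Build the SPARQL update that selects the named default ruleset."""
    if '"' in name or "\\" in name:
        raise ValueError("ruleset name must not contain quotes or backslashes")
    return f'{SYS_PREFIX}\nINSERT DATA {{ _:b sys:defaultRuleset "{name}" }}'


@contextmanager
def rules_disabled(
    run_update: Callable[[str], None], ruleset: str = "KnoraRules"
) -> Iterator[None]:
    """Switch the rules off for the duration of a block, then restore them.

    run_update is a caller-supplied function that sends one SPARQL update.
    The ruleset is restored and re-inference requested even if the block raises.
    """
    run_update(set_default_ruleset("none"))
    try:
        yield
    finally:
        run_update(set_default_ruleset(ruleset))
        run_update(REINFER)
```

An administrative task would then run as `with rules_disabled(send): drop_graph(...)`, where `send` and `drop_graph` are placeholders for deployment-specific functions. The `finally` clause addresses the concern raised above about skipped manual steps: the rules are switched back on even when the task fails.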

@subotic subotic added this to the Backlog milestone Feb 7, 2020