The unnamed/default graph should have a standard name #43

Open
dbooth-boston opened this issue Dec 12, 2018 · 40 comments
Labels
enhancement New feature or request

Comments

@dbooth-boston
Collaborator

At present the unnamed/default graph has no standard name. This means that, when writing code that manipulates graphs, one must special-case the unnamed/default graph. It also violates one of the Axioms of Web Architecture: "Any resource of significance should be given a URI."

I think the unnamed/default graph should have a standard name, such as http://www.w3.org/1999/02/22-rdf-syntax-ns#defaultGraph ( rdf:defaultGraph ). Implied references to the unnamed/default graph in SPARQL, TriG, etc., should be understood as short-hand for this graph name.
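To make the special-casing concrete, here is a minimal sketch in Python. It assumes a toy in-memory quad list in which the default graph is represented by `None`. The names `Quad`, `triples_in`, `triples_in_named` and `DEFAULT_GRAPH_IRI` are hypothetical, and the IRI is only the name proposed above, not a standardized term.

```python
from typing import NamedTuple, Optional

# Proposed (not standardized) name for the default graph, taken from the text above.
DEFAULT_GRAPH_IRI = "http://www.w3.org/1999/02/22-rdf-syntax-ns#defaultGraph"


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]  # None stands for the unnamed/default graph


def triples_in(quads, graph):
    """Today: callers must pass None for the default graph (a special case)."""
    return [(q.s, q.p, q.o) for q in quads if q.g == graph]


def triples_in_named(quads, graph_iri):
    """With a standard name, every graph (default included) is addressed by IRI."""
    target = None if graph_iri == DEFAULT_GRAPH_IRI else graph_iri
    return triples_in(quads, target)


quads = [
    Quad("urn:s1", "urn:p", "urn:o1", None),       # default graph
    Quad("urn:s2", "urn:p", "urn:o2", "urn:foo"),  # named graph
]
```

With this mapping in one place, calling code can treat `DEFAULT_GRAPH_IRI` and `urn:foo` identically instead of branching on `None`.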

@kasei
Collaborator

kasei commented Dec 12, 2018

Does this imply that you think disparate endpoints would be explicitly using the same default graph? That sounds rather strange to me.

One of the underlying setups we attempted to support in the SPARQL 1.1 WG was systems which had an underlying quadstore (where every graph has a name) in which the query engine would use one specific named graph as the default. I've designed systems like this in the past, and been very happy with it. I think an alternative approach to this issue (defined by a future version of SPARQL) might be to reconsider the default graph as a pre-defining of the active graph (as if the query were wrapped in a GRAPH <g> { ... } block). The specific graph being used could be indicated in the service description, allowing it to be referenced explicitly.

@dbooth-boston
Collaborator Author

Does this imply that you think disparate endpoints would be explicitly using the same default graph?

It would be the same as when different SPARQL endpoints use urn:foo as a named graph name. A query would not magically cause all SPARQL endpoints to return results from all SPARQL endpoints that use that graph name.

@kasei
Collaborator

kasei commented Dec 12, 2018

No, but it might cause some surprising results if you start dealing with metadata or provenance data about graphs (where statements made about another endpoint's default graph now apply to all default graphs everywhere).

@kasei
Collaborator

kasei commented Dec 12, 2018

Alternatively, for endpoints that are using the SPARQL Protocol, a default graph IRI could be constructed based on the service endpoint URL.

@rnavarropiris

As the Dedicated Unnamed Default Graph is not referenceable, it is not possible to join it with other graphs. In other words, if a query specifies any other graph (whether using FROM or FROM NAMED) the default dataset of the service will be overwritten (13.2 Specifying RDF Datasets) and therefore the Dedicated Unnamed Default Graph is not accessible anymore.

by @depressiveRobot, article here

I completely agree with @dbooth-boston: the default graph should be referenceable, so that it could also be used in a dataset definition.

Furthermore, the query dataset could then be defined as the union default graph, since otherwise there would be no way to retrieve the list of existing graphs in the quad store with a query e.g. in the form

SELECT DISTINCT ?g { GRAPH ?g {?s ?p ?o}}

@dbooth-boston dbooth-boston transferred this issue from w3c/EasierRDF Apr 3, 2019
@cygri

cygri commented Apr 3, 2019

One option here is to follow SPARQL Update and support:

FROM DEFAULT

and

GRAPH DEFAULT { ... }

It solves some of the problems/inconveniences (inability to access original default graph if dataset is specified with FROM / FROM NAMED; inability to switch back to default graph inside a GRAPH clause), but does not solve others (listing all graphs in the dataset).

@dydra

dydra commented Apr 4, 2019

No, but it might cause some surprising results if you start dealing with metadata or provenance data about graphs (where statements made about another endpoint's default graph now apply to all default graphs everywhere).

that expectation could be framed by choice of iri form.
alternative to a keyword, such as "DEFAULT", one could use a urn or a default indirect graph identifier.

@JervenBolleman JervenBolleman added the enhancement New feature or request label Apr 4, 2019
@afs afs changed the title SPARQL: The unnamed/default graph should have a standard name The unnamed/default graph should have a standard name Apr 5, 2019
@afs
Collaborator

afs commented Apr 5, 2019

Removing "SPARQL: " on transferred issue.

@jaw111
Contributor

jaw111 commented Apr 8, 2019

As well as supporting DEFAULT as an 'alias', it'd be great to see NAMED and ALL supported in the FROM clause as a way to address union of all named graphs and the default plus all named graphs. Currently those typically have a 'special' name (URI) that is implementation specific.

Examples:

SELECT *
FROM ALL
WHERE { ?s ?p ?o }

SELECT *
FROM ALL NAMED # to avoid confusion with FROM NAMED <uri> syntax
WHERE { ?s ?p ?o }

Edit: opened #59 for this topic, as it is a separate (but related) issue.

@jindrichmynarz

RDF4J has a special constant sesame:NIL that refers to null named graph. Would be nice to have a more standard identifier.

@cygri

cygri commented Apr 9, 2019

A slightly cheeky option would be to use <about:default-graph> as the IRI of the default graph. Note that's not a prefixed name; it's an IRI using the about: scheme. The RFC defining the scheme states:

This document describes the "about" URI scheme, which is widely used by Web browsers and some other applications to designate access to their internal resources, such as settings, application information, hidden built-in functionality, and so on.

That seems close enough to cover the case of the default graph. Currently, the scheme is used in web browsers as the URL for special pages like about:blank and about:config. There is an IANA registry for the blank/config part, but browser vendors generally don't seem to bother with registration.

@lisp
Contributor

lisp commented Apr 9, 2019

a concern about using a registered scheme is that there would be a temptation to use it as the name for a concrete graph.

@cygri

cygri commented Apr 10, 2019

@lisp I don't understand what point you are trying to make.

@lisp
Contributor

lisp commented Apr 10, 2019

how would it work out if a quad import were to include the following,

<http://example.org/s>  <http://example.org/p> "o" <about:all-graphs> .

?

@cygri

cygri commented Apr 10, 2019

@lisp I proposed the IRI <about:default-graph> as a name for the default graph. I don't see how your question is related to that proposal.

@kasei
Collaborator

kasei commented Apr 10, 2019

@lisp I proposed the IRI <about:default-graph> as a name for the default graph. I don't see how your question is related to that proposal.

I think the concern here is what an endpoint should do if about:default-graph was found as a graph name in real-world data. Should it just be hidden by the endpoint's own use of that graph name as special? If an existing quad store had such a graph name, could a SPARQL Update processor do anything with the actual named graph as opposed to the default graph it (also) referenced?

This might not be a problem for systems that have a single (named) graph that is identified internally as the default graph for SPARQL purposes, but SPARQL also supports systems where the default graph isn't just a normal graph internally. For example, the default graph can also act as a union of some or all named graphs. It is hard to see what the correct behavior would be for these systems if there's a collision between a graph name and a special name used to identify the default graph.

@cygri

cygri commented Apr 10, 2019

I think the concern here is what an endpoint should do if about:default-graph was found as a graph name in real-world data.

You mean loading an N-Quads file that contains triples in a graph named <about:default-graph>? That IRI names the default graph. So the triples should go into the default graph.

If an existing quad store had such a graph name,

You mean someone used <about:default-graph> as a graph name in their SPARQL 1.1 graph store? Well, in that case, the vendor will receive a support request from a very confused customer who just upgraded their graph store software and now the data from one of their graphs is gone. The response from the vendor will be that the customer should have known better than to use that graph name. And also that they should have read the upgrade instructions where it was clearly mentioned that any existing graph named <about:default-graph> must be removed before upgrading.

SPARQL also supports systems where the default graph isn't just a normal graph internally. For example, the default graph can also act as a union of some or all named graphs.

Such systems should treat the named graph <about:default-graph> exactly like they currently treat the default graph.

@lisp
Contributor

lisp commented Apr 10, 2019

Such systems should treat the named graph about:default-graph exactly like they currently treat the default graph.

the broader proposal is that there be some standard syntactical elements which designate all three of the distinguished cases. the example with a term analogous to the proposed "default" term demonstrates the problem(s) which would ensue from using an otherwise legitimate iri.

we do now something which is analogous to the proposed iri.
it is not a good idea.
that due to the situation described, above.
it is done now, exactly because it requires no change to sparql syntax.
given the latitude to consider alternatives, one which is less likely to confuse is much to be recommended.

@cygri

cygri commented Apr 11, 2019

@lisp You have an interesting way of expressing yourself. Would you please humour me and say that again in simple English?

@jindrichmynarz

jindrichmynarz commented Apr 11, 2019

Since any IRI can be used to identify a named graph, SPARQL 1.2 would have to decide what to do when IRIs reserved for default graph or union graph are found in user data, such as when loading quads containing the reserved IRIs. The IRI reserved for default graph can be effectively ignored, but the union graph IRI (e.g., <about:all-graphs>) doesn't have a straightforward interpretation.

These decisions can be avoided if default graph and union graph are not identified via IRIs but via dedicated keywords, such as in SPARQL 1.1 Update. The cost of this approach is breaking changes to SPARQL syntax.
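The loading decision described here can be sketched as follows. This is a minimal illustration in Python, assuming one possible policy: quads naming the reserved default-graph IRI are folded into the default graph, and quads naming a reserved union-graph IRI are rejected. Both IRIs are only the examples mentioned in this thread, `load_quads` is a hypothetical helper, and rejecting is an assumed choice, not something any specification prescribes.

```python
from collections import defaultdict

DEFAULT_GRAPH_IRI = "about:default-graph"  # IRI proposed in this thread, not a standard
UNION_GRAPH_IRI = "about:all-graphs"       # example reserved IRI from this thread


def load_quads(quads):
    """Return {graph name or None: set of triples}; None is the default graph."""
    dataset = defaultdict(set)
    for s, p, o, g in quads:
        if g == UNION_GRAPH_IRI:
            # Writing into a union graph has no obvious meaning;
            # rejecting the quad is one assumed policy among several.
            raise ValueError(f"cannot load into reserved union graph <{g}>")
        if g == DEFAULT_GRAPH_IRI:
            g = None  # the reserved IRI is treated as naming the default graph
        dataset[g].add((s, p, o))
    return dict(dataset)
```

A keyword-based design avoids this branch in the loader entirely, because user data can never contain the keyword as a graph name.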

@cygri

cygri commented Apr 11, 2019

Thank you, @jindrichmynarz. Note that there was no suggestion to introduce IRIs for anything but the default graph. Pointing out problems with introducing an IRI for something else doesn't demonstrate problems with introducing an IRI for the default graph.

Allowing the DEFAULT keyword in more places is a partial solution. For example, it would allow “switching back” to the default graph deep in a nested query, which is currently impossible:

GRAPH ?g { ... GRAPH DEFAULT { ... } ... }

However, it doesn't solve other aspects. For example, take a parameterised query (see #57) where the target graph is supposed to be a parameter. This currently requires elaborate special casing in the query to support the default graph as target, and GRAPH DEFAULT doesn't help.
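As a rough illustration of that special-casing, here is a sketch that uses client-side string templating in Python as a stand-in for query parameterisation. The function names are hypothetical, IRIs are interpolated without escaping for brevity, and the second function only behaves as described if the default graph had an IRI, which SPARQL 1.1 does not provide.

```python
from typing import Optional


def count_query(graph_iri: Optional[str]) -> str:
    """Build a triple-count query for one graph; None means the default graph."""
    pattern = "?s ?p ?o"
    if graph_iri is None:
        # Special case: there is no name to put in a GRAPH clause.
        body = pattern
    else:
        body = f"GRAPH <{graph_iri}> {{ {pattern} }}"
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ {body} }}"


def count_query_uniform(graph_iri: str) -> str:
    """If the default graph had an IRI, one template would cover every graph."""
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
```

The first function needs a branch because the default graph cannot be named; the second has none, at the cost of relying on a reserved IRI.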

@afs
Collaborator

afs commented Apr 11, 2019

For me, the baseline choice is keywords; other proposals have to offer some advantage overall.

I prefer using keywords because of the issues around use of URIs, not just in quads but also they aren't naming the same graph across datasets. Taking a special prefix name is also a possibility (and it isn't defined by PREFIX; if it is, it isn't special) but that looks more like an unusual way to write a keyword.

I can see wanting to say "the default graph is <uri>" (the URI it actually is, not a placeholder) but that does not make all default graphs that <uri>.

Nearby: default-graph-uri in the protocol.

@cygri

cygri commented Apr 12, 2019

Using IRIs to refer to local resources is fine, and is done all the time—<file://...>, <http://localhost...>, <about:config> in web browsers.

The purpose of <about:default-graph> is to allow query writers to refer to the local default graph. Why is it a problem that it refers to default graphs with different contents in different datasets?

@afs
Collaborator

afs commented Apr 12, 2019

Not a problem in the sense of breakage, but rather of confusion when the URI is used in data, as mentioned up-thread.

I don't see a strong connection to <about:config> because the URL bar has various capabilities. That some systems use poor URIs is not a reason that we ought to; rather, I see it as a factor against when looking at the balance of options, because of "cool URIs".

A similar oddity is created with GRAPH ?g { } evaluating to the named graphs of the dataset.

To combine with templating, an indirection through a keyword DEFAULT could actually be helpful. The query text says DEFAULT and the execution setup says "DEFAULT is <uri>" and it also applies to the query outside GRAPH, but this isn't needed at all: the protocol does this with default-graph-uri, for example, as it is about the formation of the dataset.

Making GRAPH ?g focus on the default graph can be done when the default graph also has a regular URI name in the collection of all graphs available to be queried. default-graph-uri and FROM both take the URI of an actual graph.

This then fits with the UNION feature #59.

@cygri

cygri commented Apr 12, 2019

I do acknowledge the problem that <about:default-graph> would have to appear in the result of GRAPH ?g {}, and we probably wouldn't want that.

You mention a DEFAULT keyword. Where would that appear in the query? Do you mean as GRAPH DEFAULT {...} in a graph pattern? I said above that this wouldn't help with parameterised queries where the target graph is a parameter.

Specifying the dataset via FROM/FROM NAMED or their protocol counterparts is not really an option when working with a system that relies heavily on named graphs. In our product, when working with a named graph ?userGraph containing some user data, queries often do things like:

BIND (tq:graphWithImports(?userGraph) AS ?dataGraph)
BIND (tq:metadataGraph(?userGraph) AS ?metadataGraph)

which produces IRIs of virtual or system-managed graphs, and the query then casually jumps back and forth between those graphs using GRAPH ?dataGraph {} and GRAPH ?metadataGraph {}. If we used FROM/FROM NAMED or *-graph-uri, it would “wipe out” all these other graphs from the dataset. So we don't use these forms.

Maybe you are getting at something like this?

PREFIX my: <...>
DEFAULT GRAPH IS ALSO my:DefaultGraph
SELECT ... {
    ...
    GRAPH my:DefaultGraph { ... }
    ...
}

where DEFAULT GRAPH IS ALSO makes the existing default graph available as an additional named graph in the dataset, with an IRI chosen by the query author? That seems like it could be a solution, although it is rather byzantine.

@dbooth-boston
Collaborator Author

For me, the baseline choice is keywords; other proposals have to offer some advantage overall.

A clear advantage of a URI over a keyword is that a URI allows all graphs to be identified uniformly, using the same syntax, rather than having to special-case the default graph.

[<about:default-graph> is not] naming the same graph across datasets.

Yes, but that is also true of the DEFAULT keyword (or in quads, the lack of a graph URI): it is not naming the same graph across datasets. That is intentional, just as it is for certain URI schemes, such as the about: and file: schemes.

To quote from RFC 3986:

URIs have a global scope and are interpreted consistently regardless
of context, though the result of that interpretation may be in
relation to the end-user's context. For example, "http://localhost/"
has the same interpretation for every user of that reference, even
though the network interface corresponding to "localhost" may be
different for each end-user: interpretation is independent of access.
However, an action made on the basis of that reference will take
place in relation to the end-user's context, which implies that an
action intended to refer to a globally unique thing must use a URI
that distinguishes that resource from all other things. URIs that
identify in relation to the end-user's local context should only be
used when the context itself is a defining aspect of the resource,
such as when an on-line help manual refers to a file on the end-
user's file system (e.g., "file:///etc/hosts").

Identifying the default graph "in relation to the end-user's local context" is exactly the desired behavior in this case, and that is what a URI like <about:default-graph> offers.

In summary, I do not see any semantic benefit in using a keyword instead of a URI, but I do see a downside, because of the special casing that it requires.

On the other hand, maybe "DEFAULT" would be convenient as syntactic sugar for the URI, just as Turtle allows "a" as syntactic sugar for rdf:type.

@JervenBolleman
Collaborator

We can also think of a standard location derived IRI for these graph names.

Assuming a public endpoint e.g. https://sparql.rhea-db.org/sparql
Then we could have as UNION graph
[https://sparql.rhea-db.org/sparql/?graph=union]
And for the default.
[https://sparql.rhea-db.org/sparql/?graph=default]

The idea here is that, in an IRI space formed from the SPARQL protocol endpoint URL, it is very unlikely that these IRIs will already have been minted and be in use.
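A minimal sketch of deriving such IRIs from an endpoint URL, using Python's standard library. The function name `derived_graph_iri` and the `graph` query parameter are assumptions modelled on the two example IRIs above, not part of any specification.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def derived_graph_iri(endpoint_url: str, which: str) -> str:
    """Derive a graph IRI such as .../sparql/?graph=default from an endpoint URL."""
    parts = urlsplit(endpoint_url)
    # Match the examples above, which place a trailing slash before the query string.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    query = urlencode({"graph": which})
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


endpoint = "https://sparql.rhea-db.org/sparql"
default_iri = derived_graph_iri(endpoint, "default")
union_iri = derived_graph_iri(endpoint, "union")
```

Because the IRI depends on the endpoint URL, each service gets its own default-graph name rather than sharing one global IRI.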

@namedgraph

Why not use ?default from GSP Indirect Graph Identification?

@kasei
Collaborator

kasei commented Jun 6, 2019

Why not use ?default from GSP Indirect Graph Identification?

I think using an IRI with ?default in the query string might cause problems for deployments where there are multiple endpoints for a single service. At that point, you'd have multiple IRIs all being used to identify the single default graph, and the underlying SPARQL engine might not even be aware of which (if any) of those IRIs represented valid endpoints.

@afs
Collaborator

afs commented Jun 7, 2019

A keyword makes more sense to me for the GRAPH DEFAULT use case, used inside a query to switch back to the default graph. Using a specific URI, of whatever form, creates a need to explain the local meaning and why it is not in the list of named graphs.

For the protocol parameterization use case in #57, a URI is more convenient.

In summary - both. DEFAULT as surface syntax, and a URI for parameterization from outside the query syntax.

@abrokenjester
Collaborator

abrokenjester commented Jun 23, 2020

RDF4J has a special constant sesame:NIL that refers to null named graph. Would be nice to have a more standard identifier.

I'm a bit late to the party, but thanks for mentioning this (we've actually renamed it to rdf4j:nil, though both work; an update is on the to-do list, see eclipse-rdf4j/rdf4j#2401). I should also point out that RDF4J also accepts the DEFAULT keyword. I'm kinda with @afs here that there's room for both a keyword and an IRI.

As an aside: we named our IRI constant sesame:nil rather than sesame:default because we wanted to be very explicit about the fact that it references the unnamed ("nil") graph, that is, all statements for which the backing database has no named graph information available. The term "default graph" is a more flexible concept that, depending on database implementation defaults, can contain only those 'ungraphed' statements, or can contain some union of everything available in the store (including statements from all named graphs) (edit: I should be more clear here that what is implementation-dependent is how the default/implicit dataset is defined: its default graph can be configured in multiple ways).

That being said I'm not against using the term 'default' for what we're trying to do here. I just wanted to call out that we'll need to be clear that it means default graph in a non-ambiguous way.

@tiffoknee

@jeenbroekstra - what's the prefix for rdf4j:nil? I used sesame but it points to a dead site that sells coupons, and that seems wrong..

@abrokenjester
Collaborator

@tiffoknee ah, apologies, I was mistaken when I said we'd renamed it. It's still sesame:nil (full URI is http://www.openrdf.org/schema/sesame#nil). There's an open ticket for us to change it to a more up-to-date name and URI (see eclipse-rdf4j/rdf4j#2401).

@tiffoknee

Ah ok - thanks. I found that (as a noob I expect I do inexplicable things) if I do "select * from default" it flags up a syntax error but does actually work.. Probably this is bad? I don't know.

@abrokenjester
Collaborator

Ah ok - thanks. I found that (as a noob I expect I do inexplicable things) if I do "select * from default" it flags up a syntax error but does actually work.. Probably this is bad? I don't know.

That sounds like a minor bug in the SPARQL editor in the RDF4J workbench. Thanks for pointing this out; issue logged as eclipse-rdf4j/rdf4j#2421. I suggest we take further discussion of this and other RDF4J-specific problems to the RDF4J mailing list and/or issue tracker.

abrokenjester added a commit to metaphacts/sparql-12 that referenced this issue Sep 17, 2020
@afs
Collaborator

afs commented Sep 17, 2020

Apache Jena URIs:

urn:x-arq:DefaultGraph
urn:x-arq:UnionGraph

These are accessed through functions in the SPARQL engine, so changing or adding them should not be too disruptive.

Mild advantage of "urn" is that it is not HTTP-dereferenceable.

@lisp
Contributor

lisp commented Sep 17, 2020

dydra recognizes the following uris :

urn:dydra:default
urn:dydra:named
urn:dydra:all
urn:dydra:none

with the intended behaviour :

  • in graph content, they are constant terms.
  • as a SourceSelector they act as designators
  • as a VarOrIRI in a GraphGraphPattern, when constants, they act as designators
  • in the last position, when bound to a variable, they act as terms.

although the last two cases are not obvious, on one hand, i do not recall any occasion to explain the distinction, but, on the other, i have not checked if the situation arises in any actual repository and/or inline in any query.

abrokenjester added a commit to metaphacts/sparql-12 that referenced this issue Sep 18, 2020
@abrokenjester
Collaborator

abrokenjester commented Sep 18, 2020

Mild advantage of "urn" is that it is not HTTP-dereferenceable.

I had a look at that recently when considering an updated IRI for RDF4J for this purpose, but I believe the idea of using urn:x-something for experimental / unregistered namespaces has been deprecated now, so I am not sure we are in a position to sanction an "official" URN for this purpose, unless we can use a registered IANA namespace for this. Does W3C perhaps have one that we can use?

@kasei
Collaborator

kasei commented Sep 18, 2020

I believe the idea of using urn:x-something for experimental / unregistered namespaces has been deprecated now, so I am not sure we are in a position to sanction an "official" URN for this purpose

A tag URI would serve the same purpose and could be placed under a w3 authority.
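For illustration, a tag URI (RFC 4151) has the shape `tag:<authority>,<date>:<specific>`. The sketch below builds one in Python; the helper `make_tag_uri` and the specific part `sparql/default-graph` are hypothetical, and no such tag URI has actually been minted by W3C.

```python
import re


def make_tag_uri(authority: str, date: str, specific: str) -> str:
    """Assemble a tag URI of the form tag:<authority>,<date>:<specific>."""
    # RFC 4151 allows the date as YYYY, YYYY-MM or YYYY-MM-DD.
    if not re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", date):
        raise ValueError("date must be YYYY, YYYY-MM or YYYY-MM-DD")
    return f"tag:{authority},{date}:{specific}"


# Hypothetical example only: an IRI under a w3.org authority.
example = make_tag_uri("w3.org", "2020", "sparql/default-graph")
```

Like a URN, such an identifier is not HTTP-dereferenceable, and it needs no separate namespace registration beyond control of the authority name on the given date.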

@afs
Collaborator

afs commented Sep 24, 2020

My understanding of why x-*, for URNs and HTTP headers, is deprecated is that the transition from "experimental" to "agreed" is painful. Instead, the style is "just do it" and register when agreed.
