-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query result ordering for ungrouped models #128
Comments
Note: Using (a variation of) the query constructor for grouped models is NOT an option! The idea of the constructor for grouped models is to retrieve however many result rows there are for a limited number of entities (see #114) - this relies on So using the query constructor for grouped models on an ungrouped model would certainly lead to incorrect pagination. E.g. if one used the grouped model query constructor for a query select ?s ?p ?o
where {
?s ?p ?o .
{
select distinct ?s
where {
?s ?p ?o .
}
order by ?s
limit 10
offset 10
}
} for page 2 of size 10 (if somehow However, since the I think there is no way around ordering over the entire graph for ungrouped model definitions. Backend implementers should be able to apply custom result ordering though. Whether an ORDER BY solution modifier should be permissible for incoming queries or custom ordering should be specified in the Also, I think ungrouped queries over large graphs are kind of an edge case for |
This needs partial revision. Another - computationally less costly and probably better - default would be to order by the first SPARQL variable of the projection. Given a query select *
where {
?s ?p ?o .
}
order by ?s
limit 2
offset 0 |
Possible implementation for extracting SPARQL projections: def get_query_projection(query: str) -> list[str]:
query = parseQuery(query)[1]
try:
projection = query["projection"] # type: ignore
return [attrgetter("var")(var) for var in projection]
except KeyError:
triples = query["where"]["part"][0]["triples"] # type: ignore
projection = dict.fromkeys(
i for i in chain.from_iterable(triples) if isinstance(i, Variable)
)
return list(projection) Interaction with raw |
rdfproxy
injects an ORDER BY solution modifier into SPARQL queries for grouped models, see #114.The query constructor for SPARQL queries associated with ungrouped models currently does not inject an ORDER BY modifier - and I am not entirely decided if it even has to.
For a query associated with an ungrouped model
the constructor for page 1 of size 2 currently constructs:
I think that a purely SPARQL-based pagination mechanism (i.e. the pagination mechanism of
rdfproxy
) might not be able to guarantee static and reproducible page results without ORDER BY - but this also depends on the Triplestore and possible Triplestore indexing which might apply ordering across LIMIT/OFFSET queries.Anyway, I think that queries associated with ungrouped models should have ORDER BY modifiers injected.
This raises two problems however:
One solution to this is to simply ORDER BY all variables of the projection, utilizing the fact that the ORDER BY solution modifier takes a sequence of order comparators (SPARQL 1.1 Spec).
I.e. for the above query example the constructor should generate:
The implications of this approach need discussion and careful evaluation though.
E.g. I am currently not sure if the
rdflib
implementation of SPARQL algebra allows to retrieve ordered variables for "SELECT *" clauses.returns
"?s ?o ?p"
.How meaningful is the order of ORDER BY comparators??
Unlike the ordering of grouped model queries, the SPARQL result set is not reduced by a subquery first.
This means that ordering ungrouped model queries means ordering over the entire graph.
Even for medium-sized graphs this can mean a notable performance penalty, especially when ordering by a sequence of comparators.
The text was updated successfully, but these errors were encountered: