Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse complete SPARQL queries using ANTLR #790

Merged
merged 28 commits into from
Sep 16, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
b5823f5
Parse `ConstructQuery` and `Query` rules in ANTLR
Qup42 Sep 15, 2022
f6d0ca5
Add actual location of EXPECT failures for `ParseExceptionTest.cpp`
Qup42 Sep 15, 2022
0692f13
Generalize code
Qup42 Sep 15, 2022
780b6de
Code Review
Qup42 Sep 15, 2022
bf2ae75
Make Parsing of Sparql Query a static function of `SparqlParser`
Qup42 Sep 15, 2022
7a2e2b8
Inline `parseWithAntlr` in `SparqlParser`
Qup42 Sep 15, 2022
1f3c17f
Purge obsolete Code
Qup42 Sep 15, 2022
7940e12
Hopefully fix build for gcc12
Qup42 Sep 15, 2022
e62dbfa
Fix tests
Qup42 Sep 16, 2022
5520f34
Extract visible Variables into common base Class
Qup42 Sep 16, 2022
2b3b92c
Shorten test
Qup42 Sep 16, 2022
5d3166e
Generalize ParsedQuery for VisibleVariables being in all Clauses
Qup42 Sep 16, 2022
950b258
Enable more Checks for `ParsedQuery`
Qup42 Sep 16, 2022
ebd7ebc
Extract code
Qup42 Sep 16, 2022
6604b17
Code Review
Qup42 Sep 16, 2022
8ceca1e
Code Review
Qup42 Sep 16, 2022
7e31e9b
Add tests for `Query`
Qup42 Sep 16, 2022
ccde5e4
Code Review
Qup42 Sep 16, 2022
cacb100
Fix AD_CHECK
Qup42 Sep 16, 2022
886c1ef
Fix e2e tests
Qup42 Sep 16, 2022
7d52db2
Finally fix e2e tests
Qup42 Sep 16, 2022
d559fae
Apply suggestions from code review
Qup42 Sep 16, 2022
4d455b0
Code Review
Qup42 Sep 16, 2022
d77e5b7
Really fix e2e tests
Qup42 Sep 16, 2022
5ac04ef
Add Tests for `ConstructQuery` and complete Tests for `Query`
Qup42 Sep 16, 2022
d79707f
Re-add expression descriptor to Exception message in ParsedQuery checks
Qup42 Sep 16, 2022
a5629d8
Resolve introduced TODO
Qup42 Sep 16, 2022
c348f1e
Add extremely basic tests for visible Variables
Qup42 Sep 16, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,8 @@ if ("${CMAKE_GENERATOR}" STREQUAL "Ninja")
endif ()

if (("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU") AND
(CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL "12") AND
(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "12.1"))
(CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL "12") AND
(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "12.2"))
message(STATUS "Adding -Wno-restrict for g++12.0 because of false positives")
add_compile_options(-Wno-restrict)
else()
Expand Down
6 changes: 3 additions & 3 deletions e2e/scientists_queries.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1118,10 +1118,10 @@ queries:
?t ql:contains-word "algo*"
}
checks:
- num_cols: 2
- num_cols: 3
- num_rows: 11
- selected: ["?x", "?t"]
- contains_row: ["<Grete_Hermann>","Hermann's algorithm for primary decomposition is still in use now."]
- selected: [ "?x", "?ql_textscore_t", "?t" ]
Qup42 marked this conversation as resolved.
Show resolved Hide resolved
- contains_row: [ "<Grete_Hermann>",null,"Hermann's algorithm for primary decomposition is still in use now." ]


- query : select_asterisk_regex-lastname-stein
Expand Down
20 changes: 10 additions & 10 deletions src/engine/Server.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -437,8 +437,8 @@ Awaitable<json> Server::composeResponseQleverJson(
query.hasSelectClause()
? qet.writeResultAsQLeverJson(query.selectClause(), limit, offset,
std::move(resultTable))
: qet.writeRdfGraphJson(query.constructClause(), limit, offset,
std::move(resultTable));
: qet.writeRdfGraphJson(query.constructClause().triples_, limit,
offset, std::move(resultTable));
requestTimer.stop();
}
j["resultsize"] = query.hasSelectClause() ? resultSize : j["res"].size();
Expand Down Expand Up @@ -510,11 +510,11 @@ Server::composeResponseSepValues(const ParsedQuery& query,
auto compute = [&] {
size_t limit = query._limitOffset._limit;
size_t offset = query._limitOffset._offset;
return query.hasSelectClause()
? qet.generateResults<format>(query.selectClause(), limit,
offset)
: qet.writeRdfGraphSeparatedValues<format>(
query.constructClause(), limit, offset, qet.getResult());
return query.hasSelectClause() ? qet.generateResults<format>(
query.selectClause(), limit, offset)
: qet.writeRdfGraphSeparatedValues<format>(
query.constructClause().triples_,
limit, offset, qet.getResult());
};
return computeInNewThread(compute);
}
Expand All @@ -530,8 +530,8 @@ ad_utility::streams::stream_generator Server::composeTurtleResponse(
}
size_t limit = query._limitOffset._limit;
size_t offset = query._limitOffset._offset;
return qet.writeRdfGraphTurtle(query.constructClause(), limit, offset,
qet.getResult());
return qet.writeRdfGraphTurtle(query.constructClause().triples_, limit,
offset, qet.getResult());
}

// _____________________________________________________________________________
Expand Down Expand Up @@ -646,7 +646,7 @@ boost::asio::awaitable<void> Server::processQuery(
<< (pinResult ? " [pin result]" : "")
<< (pinSubtrees ? " [pin subresults]" : "") << "\n"
<< query << std::endl;
ParsedQuery pq = SparqlParser(query).parse();
ParsedQuery pq = SparqlParser::parseQuery(query);

// The following code block determines the media type to be used for the
// result. The media type is either determined by the "Accept:" header of
Expand Down
6 changes: 5 additions & 1 deletion src/parser/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,10 @@ add_library(parser
SparqlParserHelpers.h SparqlParserHelpers.cpp
TripleComponent.h
GraphPatternOperation.cpp
PropertyPath.h PropertyPath.cpp Alias.h data/SolutionModifiers.h data/LimitOffsetClause.h data/SparqlFilter.h data/SparqlFilter.cpp data/OrderKey.h data/GroupKey.h ParseException.cpp SelectClause.cpp SelectClause.h GraphPatternOperation.cpp GraphPatternOperation.h GraphPattern.cpp GraphPattern.h)
PropertyPath.h PropertyPath.cpp Alias.h data/SolutionModifiers.h
data/LimitOffsetClause.h data/SparqlFilter.h data/SparqlFilter.cpp
data/OrderKey.h data/GroupKey.h ParseException.cpp SelectClause.cpp
SelectClause.h GraphPatternOperation.cpp GraphPatternOperation.h
GraphPattern.cpp GraphPattern.h ConstructClause.h)
target_link_libraries(parser sparqlParser parserData sparqlExpressions rdfEscaping re2 absl::flat_hash_map util)

30 changes: 30 additions & 0 deletions src/parser/ConstructClause.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
// Copyright 2022, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Julian Mundhahs ([email protected])

#pragma once

#include "parser/SelectClause.h"
#include "parser/data/Types.h"

namespace parsedQuery {
struct ConstructClause : ClauseBase {
ad_utility::sparql_types::Triples triples_;

ConstructClause() = default;
explicit ConstructClause(ad_utility::sparql_types::Triples triples)
: triples_(std::move(triples)) {}

// Yields all variables that appear in this `ConstructClause`. Variables that
// appear multiple times are also yielded multiple times.
cppcoro::generator<const Variable> containedVariables() const {
for (const auto& triple : triples_) {
for (const auto& varOrTerm : triple) {
if (auto variable = std::get_if<Variable>(&varOrTerm)) {
co_yield *variable;
}
}
}
}
};
} // namespace parsedQuery
125 changes: 84 additions & 41 deletions src/parser/ParsedQuery.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,16 +22,6 @@ using std::vector;
string ParsedQuery::asString() const {
std::ostringstream os;

// PREFIX
os << "PREFIX: {";
for (size_t i = 0; i < _prefixes.size(); ++i) {
os << "\n\t" << _prefixes[i].asString();
if (i + 1 < _prefixes.size()) {
os << ',';
}
}
os << "\n}";

bool usesSelect = hasSelectClause();
bool usesAsterisk = usesSelect && selectClause().isAsterisk();

Expand All @@ -56,7 +46,7 @@ string ParsedQuery::asString() const {
os << "{";
}
} else if (hasConstructClause()) {
const auto& constructClause = this->constructClause();
const auto& constructClause = this->constructClause().triples_;
os << "\n CONSTRUCT {\n\t";
for (const auto& triple : constructClause) {
os << triple[0].toSparql();
Expand Down Expand Up @@ -130,24 +120,35 @@ Variable ParsedQuery::addInternalBind(

// ________________________________________________________________________
void ParsedQuery::addSolutionModifiers(SolutionModifiers modifiers) {
// Process groupClause
// TODO<qup42, joka921> Check that all variables that are part of an
// expression that is grouped on are visible in the Query Body.
auto processVariable = [this](const Variable& groupKey) {
// TODO: implement for `ConstructClause`
if (hasSelectClause()) {
if (!ad_utility::contains(selectClause().getVisibleVariables(),
groupKey)) {
throw ParseException(
"Variable " + groupKey.name() +
" was used in an GROUP BY but is not visible in the query body.");
}
auto checkVariableIsVisible = [this](const Variable& var,
const std::string& locationDescription) {
if (!ad_utility::contains(getVisibleVariables(), var)) {
throw ParseException("Variable " + var.name() + " was used in " +
locationDescription +
", but is not visible in the Query Body.");
}
};
auto checkUsedVariablesAreVisible =
[&checkVariableIsVisible](
const sparqlExpression::SparqlExpressionPimpl& expression,
const std::string& locationDescription) {
for (const auto* var : expression.containedVariables()) {
checkVariableIsVisible(*var, locationDescription + " in Expression " +
expression.getDescriptor());
}
};

// Process groupClause
auto processVariable = [this,
&checkVariableIsVisible](const Variable& groupKey) {
checkVariableIsVisible(groupKey, "GROUP BY");

_groupByVariables.emplace_back(groupKey.name());
};
auto processExpression =
[this](sparqlExpression::SparqlExpressionPimpl groupKey) {
[this, &checkUsedVariablesAreVisible](
sparqlExpression::SparqlExpressionPimpl groupKey) {
checkUsedVariablesAreVisible(groupKey, "Group Key");
auto helperTarget = addInternalBind(std::move(groupKey));
_groupByVariables.emplace_back(helperTarget.name());
};
Expand All @@ -167,21 +168,27 @@ void ParsedQuery::addSolutionModifiers(SolutionModifiers modifiers) {
}

// Process havingClause
// TODO<joka921, qup42> as soon as FILTER and HAVING support proper
// expressions, also add similar sanity checks for the HAVING clause here.
_havingClauses = std::move(modifiers.havingClauses_);
Qup42 marked this conversation as resolved.
Show resolved Hide resolved

// Process orderClause
// TODO<qup42, joka921> Check that all variables that are part of an
// expression that is ordered on are visible in the Query Body.
auto processVariableOrderKey = [this](VariableOrderKey orderKey) {
auto processVariableOrderKey = [this, &checkVariableIsVisible](
VariableOrderKey orderKey) {
// Check whether grouping is done. The variable being ordered by
// must then be either grouped or the result of an alias in the select.
const vector<Variable>& groupByVariables = _groupByVariables;
if (!groupByVariables.empty() &&
!ad_utility::contains(groupByVariables, orderKey.variable_) &&
!ad_utility::contains_if(selectClause().getAliases(),
[&orderKey](const Alias& alias) {
return alias._target == orderKey.variable_;
})) {
if (groupByVariables.empty()) {
checkVariableIsVisible(orderKey.variable_, "ORDERY BY");
} else if (!ad_utility::contains(groupByVariables, orderKey.variable_) &&
// `ConstructClause` has no Aliases. So the variable can never be
// the result of an Alias.
(hasConstructClause() ||
!ad_utility::contains_if(selectClause().getAliases(),
[&orderKey](const Alias& alias) {
return alias._target ==
orderKey.variable_;
}))) {
throw ParseException(
"Variable " + orderKey.variable_.name() +
" was used in an ORDER BY "
Expand All @@ -195,8 +202,10 @@ void ParsedQuery::addSolutionModifiers(SolutionModifiers modifiers) {
// QLever currently only supports ordering by variables. To allow
// all `orderConditions`, the corresponding expression is bound to a new
// internal variable. Ordering is then done by this variable.
auto processExpressionOrderKey = [this](ExpressionOrderKey orderKey) {
if (!_groupByVariables.empty())
auto processExpressionOrderKey = [this, &checkUsedVariablesAreVisible](
ExpressionOrderKey orderKey) {
checkUsedVariablesAreVisible(orderKey.expression_, "Order Key");
if (!_groupByVariables.empty()) {
// TODO<qup42> Implement this by adding a hidden alias in the
// SELECT clause.
throw ParseException(
Expand All @@ -206,6 +215,7 @@ void ParsedQuery::addSolutionModifiers(SolutionModifiers modifiers) {
"\"). Please assign this expression to a "
"new variable in the SELECT clause and then order by this "
"variable.");
}
auto additionalVariable = addInternalBind(std::move(orderKey.expression_));
_orderBy.emplace_back(additionalVariable, orderKey.isDescending_);
};
Expand All @@ -219,6 +229,8 @@ void ParsedQuery::addSolutionModifiers(SolutionModifiers modifiers) {
// Process limitOffsetClause
_limitOffset = modifiers.limitOffset_;

// Check that the query is valid

auto checkAliasOutNamesHaveNoOverlapWith =
[this](const auto& container, const std::string& message) {
for (const auto& alias : selectClause().getAliases()) {
Expand All @@ -228,8 +240,6 @@ void ParsedQuery::addSolutionModifiers(SolutionModifiers modifiers) {
}
};

// Check that the query is valid

if (hasSelectClause()) {
if (!_groupByVariables.empty()) {
ad_utility::HashSet<string> groupVariables{};
Expand Down Expand Up @@ -287,15 +297,27 @@ void ParsedQuery::addSolutionModifiers(SolutionModifiers modifiers) {
throw ParseException("The variable name " + a._target.name() +
" used in an alias was already selected on.");
}
// TODO<qup42, joka921> Check that all variables used in the expression of
// Aliases are visible in the QueryBody.

checkUsedVariablesAreVisible(a._expression, "Alias");
}
} else if (hasConstructClause()) {
Qup42 marked this conversation as resolved.
Show resolved Hide resolved
if (_groupByVariables.empty()) {
return;
}

for (const auto& variable : constructClause().containedVariables()) {
if (!ad_utility::contains(_groupByVariables, variable)) {
throw ParseException("Variable " + variable.name() +
" is used but not "
"aggregated despite the query not being "
"grouped by " +
variable.name() + ".");
}
}
}
}

void ParsedQuery::merge(const ParsedQuery& p) {
_prefixes.insert(_prefixes.begin(), p._prefixes.begin(), p._prefixes.end());

auto& children = _rootGraphPattern._graphPatterns;
auto& otherChildren = p._rootGraphPattern._graphPatterns;
children.insert(children.end(), otherChildren.begin(), otherChildren.end());
Expand All @@ -305,6 +327,27 @@ void ParsedQuery::merge(const ParsedQuery& p) {
_rootGraphPattern.recomputeIds(&_numGraphPatterns);
}

// _____________________________________________________________________________
const std::vector<Variable>& ParsedQuery::getVisibleVariables() const {
return std::visit(&parsedQuery::ClauseBase::getVisibleVariables, _clause);
}

// _____________________________________________________________________________
void ParsedQuery::registerVariablesVisibleInQueryBody(
const vector<Variable>& variables) {
for (const auto& var : variables) {
registerVariableVisibleInQueryBody(var);
}
}

// _____________________________________________________________________________
void ParsedQuery::registerVariableVisibleInQueryBody(const Variable& variable) {
auto addVariable = [&variable](auto& clause) {
clause.addVisibleVariable(variable);
};
std::visit(addVariable, _clause);
}
Comment on lines +345 to +349
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto addVariable = [&variable](auto& clause) {
clause.addVisibleVariable(variable);
};
std::visit(addVariable, _clause);
}
std::visit(&parsedQuery::ClauseBase::addvisibleVariable, _clause);
}

I thought I suggested this before, but it seems like I didn't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did but i had to revert it because it does not work. The lambda also captures the variable that is passed to addVisibleVariable.


// _____________________________________________________________________________
void ParsedQuery::GraphPattern::toString(std::ostringstream& os,
int indentation) const {
Expand Down
Loading