Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement functions UCASE and LCASE #1060

Merged
merged 8 commits into from
Aug 21, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/engine/sparqlExpressions/NaryExpression.h
Original file line number Diff line number Diff line change
Expand Up @@ -49,4 +49,6 @@ SparqlExpression::Ptr makeYearExpression(SparqlExpression::Ptr child);
SparqlExpression::Ptr makeStrExpression(SparqlExpression::Ptr child);
SparqlExpression::Ptr makeStrlenExpression(SparqlExpression::Ptr child);

SparqlExpression::Ptr makeUppercaseExpression(SparqlExpression::Ptr child);
SparqlExpression::Ptr makeLowercaseExpression(SparqlExpression::Ptr child);
} // namespace sparqlExpression
28 changes: 28 additions & 0 deletions src/engine/sparqlExpressions/StringExpressions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,26 @@ inline auto strlen = [](std::optional<std::string> s) {
};
using StrlenExpression = StringExpressionImpl<1, decltype(strlen)>;

// LCASE
static auto lowercaseImpl = [](std::optional<std::string> input) -> IdOrString {
if (!input.has_value()) {
return Id::makeUndefined();
} else {
Comment on lines +78 to +80
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cascaded question: When exactly is the argument std.:nullopt, when the argument given in the SPARQL query was not actually a string? If yes, is the result of a built-in SPARQL function taking one or several string as argument always UNDEF when one of these arguments is not a string? If yes, should this be handled in the functions or outside? I am asking because for the numeric functions the return Id::makeUndefined happens elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a good idea.
For the Numeric functions we have a wrapper that handles the "invalid input" -> expression error -> UNDEF result case automagically,
But we could extend this to all kinds of expressions, s.t. for example the string function implementations only need to handle a string instead of an std::optional<std::string>.
I will think of something and integrate it into one of the next PRs.

return ad_utility::getLowercaseUtf8(input.value());
}
};
using LowercaseExpression = StringExpressionImpl<1, decltype(lowercaseImpl)>;

// UCASE
static auto uppercaseImpl = [](std::optional<std::string> input) -> IdOrString {
if (!input.has_value()) {
return Id::makeUndefined();
} else {
return ad_utility::getUppercaseUtf8(input.value());
}
};
using UppercaseExpression = StringExpressionImpl<1, decltype(uppercaseImpl)>;

} // namespace detail
using namespace detail;
SparqlExpression::Ptr makeStrExpression(SparqlExpression::Ptr child) {
Expand All @@ -72,4 +92,12 @@ SparqlExpression::Ptr makeStrExpression(SparqlExpression::Ptr child) {
SparqlExpression::Ptr makeStrlenExpression(SparqlExpression::Ptr child) {
return std::make_unique<StrlenExpression>(std::move(child));
}

SparqlExpression::Ptr makeLowercaseExpression(SparqlExpression::Ptr child) {
return std::make_unique<LowercaseExpression>(std::move(child));
}

SparqlExpression::Ptr makeUppercaseExpression(SparqlExpression::Ptr child) {
return std::make_unique<UppercaseExpression>(std::move(child));
}
} // namespace sparqlExpression
4 changes: 4 additions & 0 deletions src/parser/sparqlParser/SparqlQleverVisitor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1581,6 +1581,10 @@ ExpressionPtr Visitor::visit([[maybe_unused]] Parser::BuiltInCallContext* ctx) {
return createUnary(&makeStrExpression);
} else if (functionName == "strlen") {
return createUnary(&makeStrlenExpression);
} else if (functionName == "ucase") {
return createUnary(&makeUppercaseExpression);
} else if (functionName == "lcase") {
return createUnary(&makeLowercaseExpression);
} else if (functionName == "year") {
return createUnary(&makeYearExpression);
} else if (functionName == "month") {
Expand Down
14 changes: 14 additions & 0 deletions src/util/StringUtils.h
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,20 @@ std::string getLowercaseUtf8(std::string_view s) {
return result;
}

// Get the uppercase value. For details see `getLowercaseUtf8` above
inline std::string getUppercaseUtf8(std::string_view s) {
std::string result;
icu::StringByteSink<std::string> sink(&result);
UErrorCode err = U_ZERO_ERROR;
icu::CaseMap::utf8ToUpper(
"", 0, icu::StringPiece{s.data(), static_cast<int32_t>(s.size())}, sink,
nullptr, err);
if (U_FAILURE(err)) {
throw std::runtime_error(u_errorName(err));
}
return result;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the almost identical code, how about a helper function utf8StringTransform, which takes a std::string_view and the transformation function as arguments?

Also, it looks like it would be more consistent to name the two functions based on this helper function utf8StringToLower (instead of getLowercaseUtf8) and utf8StringToUpper (instead of getUppercaseUtf8).

/**
* @brief get a prefix of a utf-8 string of a specified length
*
Expand Down
2 changes: 2 additions & 0 deletions test/SparqlAntlrParserTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1266,6 +1266,8 @@ TEST(SparqlParser, builtInCall) {
auto expectBuiltInCall = ExpectCompleteParse<&Parser::builtInCall>{};
auto expectFails = ExpectParseFails<&Parser::builtInCall>{};
expectBuiltInCall("StrLEN(?x)", matchUnary(&makeStrlenExpression));
expectBuiltInCall("ucaSe(?x)", matchUnary(&makeUppercaseExpression));
expectBuiltInCall("lCase(?x)", matchUnary(&makeLowercaseExpression));
expectBuiltInCall("StR(?x)", matchUnary(&makeStrExpression));
expectBuiltInCall("year(?x)", matchUnary(&makeYearExpression));
expectBuiltInCall("month(?x)", matchUnary(&makeMonthExpression));
Expand Down
13 changes: 9 additions & 4 deletions test/StringUtilsTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,18 @@

using ad_utility::constantTimeEquals;
using ad_utility::getLowercaseUtf8;
using ad_utility::getUppercaseUtf8;
using ad_utility::getUTF8Substring;

TEST(StringUtilsTest, getLowercaseUtf8) {
setlocale(LC_CTYPE, "");
ASSERT_EQ("schindler's list", getLowercaseUtf8("Schindler's List"));
ASSERT_EQ("#+-_foo__bar++", getLowercaseUtf8("#+-_foo__Bar++"));
ASSERT_EQ("fôéßaéé", getLowercaseUtf8("FÔÉßaéÉ"));
EXPECT_EQ("schindler's list", getLowercaseUtf8("Schindler's List"));
EXPECT_EQ("#+-_foo__bar++", getLowercaseUtf8("#+-_foo__Bar++"));
EXPECT_EQ("fôéßaéé", getLowercaseUtf8("FÔÉßaéÉ"));
}
TEST(StringUtilsTest, getUppercaseUtf8) {
EXPECT_EQ("SCHINDLER'S LIST", getUppercaseUtf8("Schindler's List"));
EXPECT_EQ("#+-_BIMM__BAMM++", getUppercaseUtf8("#+-_bImM__baMm++"));
EXPECT_EQ("FÔÉSSAÉÉ", getUppercaseUtf8("FôéßaÉé"));
}

TEST(StringUtilsTest, getUTF8Substring) {
Expand Down