-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for XPath scalar functions - xpath,xpath_string,xpath_boolean,xpath_double #9219
Conversation
f67aeea
to
0236dd6
Compare
…ath_string,xpath_boolean,xpath_double
8fa3f44
to
f539da3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left initial comments. I'm looking into other database behavior.
Please add the documentation.
The below comment about the argument order isn't yet resolved, so let me remind here.
#4024 (comment)
* Utility class for all XPath UDFs. Each UDF instance should keep an instance | ||
* of this class. | ||
*/ | ||
public class UDFXPathUtil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use "UDF" to represent functions.
I guess you copied this class from Hive repository. Please add the reference to javadoc.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/xml/UDFXPathUtil.java
@ScalarFunction("xpath_boolean") | ||
@LiteralParameters({"x", "y"}) | ||
@SqlType(StandardTypes.BOOLEAN) | ||
public static Boolean xpath_boolean(@SqlType("varchar(y)") Slice xml, @SqlType("varchar(x)") Slice path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to xpathBoolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to change the value in @ScalarFunction
annotation, but this method should be renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@ScalarFunction("xpath_double") | ||
@LiteralParameters({"x", "y"}) | ||
@SqlType(StandardTypes.DOUBLE) | ||
public static Double xpath_double(@SqlType("varchar(y)") Slice xml, @SqlType("varchar(x)") Slice path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to xpathDouble
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
@Test | ||
public void testXpathString() | ||
{ | ||
assertFunction("xpath_string('<a><b>bb</b><c>cc</c></a>','a/b')", VarcharType.createVarcharType(25), "bb"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static import createVarcharType()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
|
||
@SqlNullable | ||
@Description("Returns the text contents of the first xml node that matches the xpath expression") | ||
@ScalarFunction("xpath_string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of the annotation is redundant.
.map(nodeList::item) | ||
.filter(field -> field.getNodeValue() != null) | ||
.map(field -> field.getNodeValue()) | ||
.collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use toImmutableList()
.
UDFXPathUtil xpath = new UDFXPathUtil(); | ||
NodeList nodeList = xpath.evalNodeList(xml, path); | ||
if (nodeList == null) { | ||
return Collections.<String>emptyList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return ImmutableList.of()
.
public static Slice xpathString(@SqlType("varchar(x)") Slice xml, @SqlType("varchar(y)") Slice path) | ||
{ | ||
UDFXPathUtil xpath = new UDFXPathUtil(); | ||
String s = xpath.evalString(xml.toStringUtf8(), path.toStringUtf8()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use abbreviation for variable name.
{ | ||
assertFunction("xpath('<a><b id=\"foo\">b1</b><b id=\"bar\">b2</b></a>','//@id')", | ||
new ArrayType(VARCHAR), | ||
Arrays.asList("foo", "bar")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ImmutableList.of()
.
@ScalarFunction("xpath") | ||
@LiteralParameters({"x", "y"}) | ||
@SqlType("array(varchar)") | ||
public static Block xpath(@SqlType("varchar(y)") Slice xml, @SqlType("varchar(x)") Slice path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why x
and y
are flipped? Subsequent functions also.
@sajjoseph Are you still continuing the development of this feature? I need this feature in my work and I migrated the related functions in Hive using the same approach, can you continue to develop this PR and have it merged into the master branch |
👋 @sajjoseph - this PR has become inactive. If you're still interested in working on it, please let us know. @ebyhr might have some insight how/if to proceed. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
this lit 🔥 |
Closing this one out due to inactivity, but please reopen if you would like to pick this back up. It seems like there's a decent amount of interest here, so if anyone else would like to take this on and get it updated, go for it. |
Supersedes #4024 Fixes #3888
Added support for the following scalar functions.
xpath - returns an array of strings that matched the criteria
xpath_string - returns a string that matched the criteria
xpath_boolean - returns the boolean value based on the xpath expression
xpath_double - returns a double-precision value after evaluating the xpath expression
Example:
Few more sample queries can be found here - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF