Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for XPath scalar functions - xpath,xpath_string,xpath_boolean,xpath_double #9219

Closed
wants to merge 2 commits into from

Conversation

sajjoseph
Copy link
Contributor

@sajjoseph sajjoseph commented Sep 12, 2021

Supersedes #4024 Fixes #3888

Added support for the following scalar functions.

xpath - returns an array of strings that matched the criteria
xpath_string - returns a string that matched the criteria
xpath_boolean - returns the boolean value based on the xpath expression
xpath_double - returns a double-precision value after evaluating the xpath expression

Example:

trino> select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id');
 _col0
--------
 [1, 2]
 
 trino> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a/b');
 _col0
-------
 bb


trino> SELECT xpath_double('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/*)') ;
 _col0
-------
  15.0

Few more sample queries can be found here - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF

@cla-bot cla-bot bot added the cla-signed label Sep 12, 2021
@ebyhr ebyhr added needs-docs This pull request requires changes to the documentation syntax-needs-review labels Sep 12, 2021
@findepi findepi added the enhancement New feature or request label Sep 13, 2021
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left initial comments. I'm looking into other database behavior.

Please add the documentation.

The below comment about the argument order isn't yet resolved, so let me remind here.
#4024 (comment)

* Utility class for all XPath UDFs. Each UDF instance should keep an instance
* of this class.
*/
public class UDFXPathUtil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use "UDF" to represent functions.

I guess you copied this class from Hive repository. Please add the reference to javadoc.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/xml/UDFXPathUtil.java

@ScalarFunction("xpath_boolean")
@LiteralParameters({"x", "y"})
@SqlType(StandardTypes.BOOLEAN)
public static Boolean xpath_boolean(@SqlType("varchar(y)") Slice xml, @SqlType("varchar(x)") Slice path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to xpathBoolean

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to change the value in @ScalarFunction annotation, but this method should be renamed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@ScalarFunction("xpath_double")
@LiteralParameters({"x", "y"})
@SqlType(StandardTypes.DOUBLE)
public static Double xpath_double(@SqlType("varchar(y)") Slice xml, @SqlType("varchar(x)") Slice path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to xpathDouble

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

@Test
public void testXpathString()
{
assertFunction("xpath_string('<a><b>bb</b><c>cc</c></a>','a/b')", VarcharType.createVarcharType(25), "bb");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static import createVarcharType().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.


@SqlNullable
@Description("Returns the text contents of the first xml node that matches the xpath expression")
@ScalarFunction("xpath_string")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of the annotation is redundant.

.map(nodeList::item)
.filter(field -> field.getNodeValue() != null)
.map(field -> field.getNodeValue())
.collect(Collectors.toList());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use toImmutableList().

UDFXPathUtil xpath = new UDFXPathUtil();
NodeList nodeList = xpath.evalNodeList(xml, path);
if (nodeList == null) {
return Collections.<String>emptyList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return ImmutableList.of().

public static Slice xpathString(@SqlType("varchar(x)") Slice xml, @SqlType("varchar(y)") Slice path)
{
UDFXPathUtil xpath = new UDFXPathUtil();
String s = xpath.evalString(xml.toStringUtf8(), path.toStringUtf8());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use abbreviation for variable name.

{
assertFunction("xpath('<a><b id=\"foo\">b1</b><b id=\"bar\">b2</b></a>','//@id')",
new ArrayType(VARCHAR),
Arrays.asList("foo", "bar"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ImmutableList.of().

@ScalarFunction("xpath")
@LiteralParameters({"x", "y"})
@SqlType("array(varchar)")
public static Block xpath(@SqlType("varchar(y)") Slice xml, @SqlType("varchar(x)") Slice path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why x and y are flipped? Subsequent functions also.

@wgzhao
Copy link
Member

wgzhao commented Nov 24, 2021

@sajjoseph Are you still continuing the development of this feature? I need this feature in my work and I migrated the related functions in Hive using the same approach, can you continue to develop this PR and have it merged into the master branch

@mosabua
Copy link
Member

mosabua commented Oct 28, 2022

👋 @sajjoseph - this PR has become inactive. If you're still interested in working on it, please let us know. @ebyhr might have some insight how/if to proceed.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@tooptoop4
Copy link
Contributor

this lit 🔥

@colebow
Copy link
Member

colebow commented Nov 30, 2022

Closing this one out due to inactivity, but please reopen if you would like to pick this back up. It seems like there's a decent amount of interest here, so if anyone else would like to take this on and get it updated, go for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed enhancement New feature or request needs-docs This pull request requires changes to the documentation syntax-needs-review
Development

Successfully merging this pull request may close these issues.

Support for XPath functions
7 participants