Switch the library used in regex function by calling #768
Comments
Just FYI, the performance characteristics are derived from the libraries' algorithms, as illustrated for example at http://lh3lh3.users.sourceforge.net/reb.shtml
Both libraries define the same set of functions, so they cannot be loaded together. It could be solved with #8. However maybe we could have something like |
Regarding the use cases we are considering, it's enough to define the different functions. But as you said, it's not good in terms of the namespace convention. @martint Do you think it's acceptable to create aliases |
To be exact, joni defines something like |
This is problematic as Presto functions do not have access to system session properties.
That would work.
What if we can cast the regex to |
In the short term, yes, #8 would be a way around it, either by defining the functions in different namespaces and adding the desired one to the SQL PATH, or by allowing functions to take session properties. In the long term, we actually want to get rid of the JONI vs RE2J distinction if we can. We just haven't spent the time to make RE2J faster in every scenario. Adding the two implementation-specific aliases is problematic because it's harder to remove them in the future without breaking compatibility. If we were to expose different regex functions, it'd be based on the type of regex (POSIX, PCRE, etc.), not on the implementation library.
I agree that in the long term it should be specified by type. Just FYI, the reason we want to use RE2J is to speed up patterns such as "Presto|Hive|Spark|Big ?Query", matched against web page titles to find visitors who are interested in query engines. (real example has a hundred of |
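To make the use case concrete, here is a minimal sketch of that kind of title match. It uses `java.util.regex` purely as a stand-in so it runs without extra dependencies; it is not JONI or RE2J, and the class and method names (`TitleMatch`, `mentionsEngine`) and the sample titles are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TitleMatch {
    // The alternation quoted in the comment above, compiled once and reused.
    // java.util.regex is a stand-in here, not one of the libraries under discussion.
    static final Pattern ENGINES = Pattern.compile("Presto|Hive|Spark|Big ?Query");

    // Returns true if any of the alternatives occurs anywhere in the title.
    static boolean mentionsEngine(String title) {
        return ENGINES.matcher(title).find();
    }

    public static void main(String[] args) {
        // Hypothetical page titles, for illustration only.
        for (String title : List.of("Getting started with BigQuery", "Intro to Big Query", "Cooking tips")) {
            System.out.println(title + " -> " + mentionsEngine(title));
        }
    }
}
```

How the cost grows as more alternatives are added depends on the engine's matching algorithm, which is the reason for wanting to pick the library per use case.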
We recently saw a query pattern that runs extremely slowly with JONI. A query with multiple |
Currently, we support two libraries, JONI and RE2J, for regex functions. The choice is made statically at launch time. But we sometimes want to change the library dynamically because the performance characteristics of each library are different. Selecting the library that fits each use case is desirable.

My suggestion is to extend the functions with another field that specifies the library, as follows, so that we can switch the library per call.
Or specifying the library as a session parameter might be another option.
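As a rough illustration of the per-call idea, the sketch below dispatches on an extra library argument. Everything in it is an assumption made for illustration: the `RegexDispatch` class, the `Library` enum, the `Backend` interface, and the three-argument `regexpLike` are hypothetical and are not Presto's actual function API. Both backends delegate to `java.util.regex` so the example is self-contained; a real implementation would wrap JONI and RE2J instead.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class RegexDispatch {
    // Hypothetical selector mirroring the two libraries named in this issue.
    enum Library { JONI, RE2J }

    // Minimal backend abstraction; real backends would wrap JONI or RE2J.
    interface Backend {
        boolean find(String pattern, String input);
    }

    // Stand-ins: both delegate to java.util.regex so the sketch runs as-is.
    static final Backend JONI_STAND_IN = (p, s) -> Pattern.compile(p).matcher(s).find();
    static final Backend RE2J_STAND_IN = (p, s) -> Pattern.compile(p).matcher(s).find();

    // Hypothetical three-argument variant: the last argument picks the library per call.
    static boolean regexpLike(String input, String pattern, Library library) {
        switch (library) {
            case JONI:
                return JONI_STAND_IN.find(pattern, input);
            case RE2J:
                return RE2J_STAND_IN.find(pattern, input);
            default:
                throw new IllegalArgumentException("Unknown library: " + library);
        }
    }

    // Convenience overload taking the library name as a string, as a SQL caller might pass it.
    static boolean regexpLike(String input, String pattern, String libraryName) {
        return regexpLike(input, pattern, Library.valueOf(libraryName.toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(regexpLike("Intro to Big Query", "Big ?Query", "re2j"));
    }
}
```

An unrecognized library name fails fast with an `IllegalArgumentException` from `Library.valueOf`, rather than silently falling back to a default.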