[KQL] Add regex support #46855
Comments
Pinging @elastic/kibana-app |
I am trying to perform a Kibana KQL search on a text field for any value that doesn't end in $. For instance, when parsing Windows Event Logs for successful/unsuccessful logins, I am trying to exclude computer accounts (which end with $). I have looked at several other questions around this same concept (Regex search where a string field ends with $), but that solution isn't working for me because it uses Lucene, not KQL. I know that KQL supports wildcards, so I was assuming it was going to be a query along the lines of: Full regex support would be helpful in finding these documents. |
Pinging @elastic/kibana-app-arch (Team:AppArch) |
To provide additional background, @randomuserid was just explaining to me that lack of regexp support in KQL means that they need to fall back to the Lucene search syntax whenever they need regexps, here is an example for instance: https://github.com/elastic/detection-rules/blob/main/rules/linux/privilege_escalation_setgid_bit_set_via_chmod.toml. So there's no urgency to support regexps since we can fall back to Lucene, but it would be better if KQL supported regexps so that we would no longer need to fall back to Lucene in such cases. |
Wondering if we should use the new wildcard field in regexps? In that case #60933 is related |
@rayafratkina the regex query should work on every field that supports it in KQL. A user would do well to use a wildcard field, though, if they know they need to use regexp queries a lot. I'm not sure if we can do anything reasonable to advertise that from KQL, since at that point indexing is already done and it's kind of "too late". |
@jpountz do I think the wildcard functionality of KQL is a little flaky from what I've seen and those currently convert to |
@rw-access sorry I'm not sure I get the question. Did you mean |
I wonder if we should first discuss here what we want the syntax of regex queries in KQL to look like? Due to backwards compatibility reasons we cannot use the Lucene way Given that we only treat the following characters as special characters, which would need to be escaped in a value: |
@jpountz ah, I was using @timroes wouldn't that syntax be subject to the same problem? Since We've had to make many similar decisions for EQL. One of the guiding principles for changes was that we won't reinterpret syntax that's already valid with new semantics, unless it was truly a bug. For breaking changes or limiting the syntax, we decided that we should still accept the syntax in the grammar, so that we can recognize it and raise an error message. That seems to be a good path forward for us. I think that means Or we introduce a new predicate instead of Thoughts? It's not great, but our options are limited. And I think the feature is desired enough, both by internal Elastic teams and our users, that we might have to pick a syntax that's less than ideal. |
There are additional concerns about how to expose the important regex options of case insensitivity. This is done in other engines using Symptoms of a broader issue - KQL is becoming a bottleneck to putting functionality in users hands. As long as KQL is the top-level means for users to assemble clauses with Boolean logic we will have issues :
With the Sculptor object model as a top-level organiser for Boolean logic :
|
@rw-access As far as I understand the grammar atm, the fieldname can not have (unquoted) spaces, thus we know that the operator is part of the field name. Maybe Lukas will be the better candidate for talking about that. I know we also experimented some time with having everything some kind of functions in KQL, which would be more along the lines with your Regarding flags, even with a custom operator we could still put the regex in |
True, but the downside is that
query = '''
event.category:(network or network_traffic) and network.transport:tcp and destination.port:8000 and
source.ip:(10.0.0.0/8 or 172.16.0.0/12 or 192.168.0.0/16) and
not destination.ip:(10.0.0.0/8 or 127.0.0.0/8 or 172.16.0.0/12 or 192.168.0.0/16 or "::1")
'''
Agreed for flags. I was thinking about adding |
We have a PR flip-flopping on what to do - whether to make API concessions that make KQL easier with extended pattern syntax or stick to more formal APIs with named JSON flags. |
I have a preference for formal JSON APIs in elasticsearch with dedicated editors as counterparts in the GUI to simplify. We could create a formal query JSON syntax for automatons (char_sequence, ORs, nots, repeats etc). |
I think that's a fine point to show how you don't think KQL fits your needs or doesn't solve its problem well. But that discussion might be a little easier to have in a separate issue that's better scoped, and we keep the scope of this issue constrained to adding regex support to KQL. I don't mean that at all to shut you down, but just that we keep those discussions separate, since it's already a little hard to keep track of the two. |
++ @rw-access There are already issues to discuss this. Please continue discussion in #8112 (Graphical query builder) or #14272 (more control over how filters are added to the filter bar), which are the more appropriate places for discussion around the overall concept of the filter bar. Discussion in this thread should be about regex support in KQL so everyone can keep better track of it. |
++ Happy to keep discussion elsewhere - just wanted to flag that regex construction is complex and adding this might mark a tipping point in how much complexity we try to shoehorn into KQL. We hit this wall 15 years ago in Lucene's query syntax. We have a proposal for an elasticsearch API that you might want to incorporate as an aid to regex authors. It could help validate that the expressions people write are actually understood correctly by Lucene's parser. |
i like the suggestion of using a function |
I would prefer to avoid any functional syntax (like I would definitely prefer to go with something regex users are already used to (like Adding a completely new operator for regex when we already use |
A dependency you'll need to track - Lucene PR to add |
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery) |
+100 |
Closing this because it's not planned to be resolved in the foreseeable future. It will be tracked in our Icebox and will be re-opened if our priorities change. Feel free to re-open if you think it should be melted sooner. When using ES|QL in Kibana it's already possible to make use of RegExp, e.g. by using RLIKE |
Describe the feature:
KQL currently supports wildcard queries using the * character to denote "zero or more characters". It does not support ? to denote "one character", nor does it support searching using full regular expressions.
It would be nice if KQL supported searching using regular expressions. Internally, it could leverage Elasticsearch regexp queries or regex inside a query string query.
The syntax could be something like optionalFieldName: /my-regex-pattern/. (We will need to have a migration so that queries already using this syntax are escaped.)
Related: #126532
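To make the proposal a bit more concrete, here is a rough sketch of how a clause in the suggested optionalFieldName: /my-regex-pattern/ form might be translated into Elasticsearch query DSL. This is not Kibana code: RegexClause, parseRegexClause and toEsQuery are hypothetical names invented for illustration, and the slash-delimited syntax is only the one floated in this issue. The sketch assumes the Elasticsearch regexp query (with its case_insensitive parameter) for the fielded case and a query_string query with a /.../ pattern when no field is given.

```typescript
// Hypothetical shape for a parsed regex clause; not part of Kibana's KQL AST.
interface RegexClause {
  field?: string; // absent when the user typed only /pattern/
  pattern: string; // text between the slashes, passed through unchanged
  caseInsensitive?: boolean; // how to expose this in KQL is an open question
}

type EsQuery = Record<string, unknown>;

// Recognize "field: /pattern/" or "/pattern/" (the proposed, assumed syntax).
// Returns null for anything else so the caller can fall back to normal KQL.
function parseRegexClause(input: string): RegexClause | null {
  const match = /^\s*(?:([^\s:]+)\s*:\s*)?\/((?:\\.|[^\/\\])+)\/\s*$/.exec(input);
  if (!match) return null;
  const field = match[1] as string | undefined;
  const pattern = match[2] as string;
  return field === undefined ? { pattern } : { field, pattern };
}

// Translate the clause into Elasticsearch query DSL.
function toEsQuery(clause: RegexClause): EsQuery {
  if (clause.field === undefined) {
    // No field: rely on query_string, which accepts /regex/ patterns.
    return { query_string: { query: `/${clause.pattern}/` } };
  }
  return {
    regexp: {
      [clause.field]: {
        value: clause.pattern,
        ...(clause.caseInsensitive ? { case_insensitive: true } : {}),
      },
    },
  };
}

// Example: accounts whose name ends in a literal "$" (user.name is an assumed field).
const clause = parseRegexClause("user.name: /.*\\$/");
if (clause !== null) {
  console.log(JSON.stringify(toEsQuery(clause)));
}
```

Wrapping the result in a bool must_not would cover the "does not end in $" use case from the comments. The sketch deliberately ignores the backwards compatibility and escaping questions discussed in the thread.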