Is it possible to use regular expression in filter ? #719

Closed
stefansaye opened this issue Sep 21, 2016 · 24 comments

Comments

@stefansaye

stefansaye commented Sep 21, 2016

Hi all,

I am trying to use a regular expression in my rule, as below. It should alert when id is 1.1 and msg contains type.24:6

 filter:
 - and:
     - term:
         id: "1.1"
     - query:
         query_string:
           query: "msg:\"Type.[0-9]{1,2}\\:6\""

but it does not alert. Is it possible to use a regular expression in a filter?

Thanks in advance

@Qmando
Member

Qmando commented Sep 21, 2016

Yes. You need to use https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_regular_expressions. Basically, surround it in / ... /

Also, the and there is implicit, so you don't need it.
I also changed msg to msg.raw, which is probably what you want if you are using default mappings.

I think this is what you want:

 filter:
   - term:
         id: "1.1"
   - query:
        query_string:
            query: "msg.raw: /Type.[0-9]{1,2}\\:6/"

@stefansaye
Author

Hi Qmando,
I have changed my filter as you suggested, as below:

 filter:
   - term:
         id: "1.1"
   - query:
        query_string:
            query: "myfield:/Type.[0-9]{1,2}\\:6/"

but I don't get any alert when there is a matching document.
Do you have any idea what might be causing this?
Thanks,

@Qmando
Member

Qmando commented Sep 26, 2016

What is the EXACT value you are trying to match?
Did you try using .raw ?

@stefansaye
Author

stefansaye commented Sep 30, 2016

Hi Qmando,
for example, the exact value is

  Line protocol on Interface GE0/7/1/0/20, changed state to Down 

My filter is as below:

 filter:
 - term:
     id: "1.1"
 - regexp:
     log_desc: "TE[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}, changed state to Down"

but when I run the rule, I get this error output:

 File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
 File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/elastalert/elastalert/elastalert.py", line 1408, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/elastalert/elastalert/elastalert.py", line 1403, in main
    client = ElastAlerter(args)
  File "/elastalert/elastalert/elastalert.py", line 94, in __init__
    self.conf = load_rules(self.args)
  File "elastalert/config.py", line 400, in load_rules
    raise EAException('Error loading file %s: %s' % (rule_file, e))
elastalert.util.EAException: Error loading file ./rules/test.yaml: Could not parse file     ./rules/test.yaml: while scanning a double-quoted scalar
  in "./rules/test.yaml", line 23, column 24
found unknown escape character '/'
  in "./rules/test.yaml", line 23, column 43

@Qmando
Member

Qmando commented Sep 30, 2016

I don't think you need the backslashes (\) because / doesn't need to be escaped. Also note (https://www.elastic.co/guide/en/elasticsearch/reference/2.1/query-dsl-regexp-query.html) that regular expressions are anchored, so they must match the whole string. I'm not sure if you want to match GE0 or TE0, but you probably want

 filter:
 - term:
     id: "1.1"
 - regexp:
     log_desc.raw: "Line protocol on Interface TE[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,2}, changed state to Down"

You can probably simplify it to "Line protocol on Interface TE.*?changed state to Down"
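
To illustrate the anchoring point, here is a rough sketch that uses Python's `re.fullmatch` as a stand-in for Elasticsearch's anchored `regexp` matching. That substitution is an assumption made for illustration: Elasticsearch uses Lucene regular expression syntax, which is not identical to Python's `re`, so the sketch sticks to constructs that should behave the same in both. The sample value is the one quoted earlier in the thread (with surrounding whitespace trimmed), and `anchored_match` is a hypothetical helper name.

```python
import re

# Sample value quoted earlier in the thread (surrounding whitespace trimmed).
value = "Line protocol on Interface GE0/7/1/0/20, changed state to Down"


def anchored_match(pattern, text):
    """Approximate an anchored regexp: the pattern must cover the whole value."""
    return re.fullmatch(pattern, text) is not None


# A pattern that only describes the tail of the value fails when anchored.
partial = "GE[0-9]{1,2}(/[0-9]{1,2})+, changed state to Down"
# Spelling out the leading text lets the same pattern cover the whole value.
full = "Line protocol on Interface " + partial
# The simplified form lets ".*" absorb the interface name.
simplified = "Line protocol on Interface GE.*changed state to Down"

print(anchored_match(partial, value))     # False
print(anchored_match(full, value))        # True
print(anchored_match(simplified, value))  # True
```

Note that the sample interface name has five slash-separated parts (GE0/7/1/0/20), so a pattern that spells out exactly four numeric groups would not cover it; the sketch uses a repeated group instead.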

@stefansaye
Author

stefansaye commented Oct 7, 2016

Hi Qmando,
Is it necessary to use the .raw field?

I followed what you said:

   filter:
     - term:
         id: "1.1"
     - regexp:
         log_desc: "Line protocol on Interface TE.*,changed state to Down"

but it still does not alert when a document matches the rule.

@Qmando
Member

Qmando commented Oct 7, 2016

Yes. use .raw, like I said.

You also got rid of the space before "changed"

@stefansaye
Author

stefansaye commented Oct 11, 2016

Hi Qmando,
I have now followed your advice and am using the .raw field.
For example, the field value is

bla,bla,bla,bla,bla,bla,test::cerent84State.16785422.430:40,blablablablablablablablablablabla,b

I want to alert when it matches the pattern

       cerent84State.16785422.430:40

but 16785422 and 430 are randomly generated, so I used * in my rule.

My filter is as below:

  filter:
    - term:
        id: "1.1"
    - regexp:
        log_desc.raw: "State.*:40"

but it still does not alert when a document matches the rule.
What's wrong with my code? Thanks!

@Qmando
Member

Qmando commented Oct 11, 2016

Elasticsearch regexp is anchored, meaning you have to use ".cerent84State.:40". Leading wildcard is slow but it should match.

@stefansaye
Author

stefansaye commented Oct 12, 2016

Hi Qmando,

The dot . in my pattern is part of the original message:

          State.16785422.430:40

I followed what you said, but it still does not match my rule:

  filter:
    - term:
        id: "1.1"
    - regexp:
        log_desc.raw: ".State.:40"

I also tried using wildcards as below, and it still does not match:

  filter:
    - term:
        id: "1.1"
    - regexp:
        log_desc.raw: "*State*:40*"

What's wrong with my code?
Any help would be appreciated!
Thanks!

@Qmando
Member

Qmando commented Oct 12, 2016

".*cerent84State.*:40"
Sorry, GitHub treated the asterisks as markdown characters.
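
One way to sanity-check patterns like these against the sample value is sketched below, using Python's `re.fullmatch` as a rough stand-in for anchored matching. This is an assumption for illustration only, since Lucene regular expression syntax is not identical to Python's, and `anchored_match` and `is_valid` are hypothetical helper names. Because the sample value continues after `:40`, a whole-value match in this sketch also needs a trailing `.*`. A pattern that begins with a bare `*` is not a wildcard in regular expression syntax; Python's `re` rejects it outright, and how Elasticsearch treats it is not shown here.

```python
import re

# Sample log_desc value quoted in the thread.
value = ("bla,bla,bla,bla,bla,bla,test::cerent84State.16785422.430:40,"
         "blablablablablablablablablablabla,b")


def anchored_match(pattern, text):
    """Approximate an anchored regexp: the pattern must cover the whole value."""
    return re.fullmatch(pattern, text) is not None


def is_valid(pattern):
    """Report whether Python's re module accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(anchored_match("State.*:40", value))              # False
print(anchored_match(".*cerent84State.*:40", value))    # False
print(anchored_match(".*cerent84State.*:40.*", value))  # True
print(is_valid("*cerent84State.*:40*"))                 # False
```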

@stefansaye
Author

stefansaye commented Oct 13, 2016

Hi @Qmando,
the log_desc value is

bla,bla,bla,bla,bla,bla,test::cerent84State.16785422.430:40,blablablablablablablablablablabla,b

The field mapping is as below:

   "log_desc" : {
        "type" : "string",
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },

But following what you said, it still cannot match with the rule below:

 filter:
   - term:
       id: "1.1"
   - regexp:
       log_desc.raw: "*cerent84State.*:40*"

and I tried the filter

 filter:
   - term:
       id: "1.1"
   - regexp:
       log_desc.raw: "*cerent84State.*.*:40*"

but it still cannot match.

What's wrong with my code, please?
Any help would be appreciated!
Thanks!

@Qmando
Member

Qmando commented Oct 13, 2016

Again, you need to start with .* not just *

@stefansaye
Author

Hi @Qmando,

I have tried

filter:
  - term:
      id: "1.1"
  - regexp:
      log_desc.raw: ".*cerent84State.*.*:40*"

and

filter:
  - term:
      id: "1.1"
  - regexp:
      log_desc.raw: ".*cerent84State.*:40*"

and finally ,

filter:
  - term:
      id: "1.1"
  - regexp:
      log_desc.raw: ".*"

but none of these match.

Any help would be appreciated!
Thanks!

@Qmando
Member

Qmando commented Oct 14, 2016

Have you considered if id is not matching?

@badsector3

badsector3 commented Mar 20, 2017

Hi. I have a similar issue, so I thought I would add it here.

I am trying to alert on the following:
"message": "2017-03-19 01:15:53.625 CET [10651]: [2-1] user=,db=,client=, app_name=[] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000044444054440000040 has already been removed\n\t",

rule configuration:

type: any

filter:
- term:
    level: "FATAL"
- regexp:
    message: ".ERROR:."

realert:
  hours: 1

If I alert on the term only, I can see hits, but together with the regexp filter I get 0 hits.
It looks to me as if the regex is incorrect.
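
One possible explanation, sketched under assumptions: if `message` is an analyzed field, a `regexp` filter is compared against the individual indexed terms rather than the full original text, and with the default analyzer those terms are lowercased with punctuation stripped. The Python sketch below imitates that with a crude tokenizer (`crude_analyze` is a hypothetical helper, not Elasticsearch's actual analyzer) and uses `re.fullmatch` as a stand-in for anchored matching. It also assumes the intended pattern was `.*ERROR:.*`, which the thread does not confirm, and it omits the trailing `\n\t` from the sample message.

```python
import re

# Sample message from the comment above, without the trailing "\n\t".
message = ("2017-03-19 01:15:53.625 CET [10651]: [2-1] user=,db=,client=, "
           "app_name=[] FATAL: could not receive data from WAL stream: "
           "ERROR: requested WAL segment 000000044444054440000040 "
           "has already been removed")


def crude_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def any_term_matches(pattern, terms):
    """True if the pattern covers at least one whole term."""
    return any(re.fullmatch(pattern, term) for term in terms)


terms = crude_analyze(message)

# Against individual lowercase terms, an uppercase pattern with a colon finds nothing.
print(any_term_matches(".*ERROR:.*", terms))            # False
# A plain lowercase term does match.
print(any_term_matches("error", terms))                 # True
# Against the whole unanalyzed text, the original pattern would match.
print(re.fullmatch(".*ERROR:.*", message) is not None)  # True
```

If this is what is happening, it would be consistent with the advice earlier in the thread to query a not_analyzed sub-field such as .raw.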

@badsector3

badsector3 commented Mar 22, 2017

I found the following:
#790

The solution looks to be:

filter:
- query_string:
    query: "message: error AND (level: FATAL)"

(filter and - query_string should be aligned.)

@abhishekjiitr
Contributor

abhishekjiitr commented Nov 1, 2018

Example YAML for a regex matching documents whose message key contains error anywhere in the text.
Note: you have to surround the text to match with / (forward slash).

filter:
 - query:
      query_string:
        query: "message: /.*error.*/"

@alankis

alankis commented Nov 9, 2018

@abhishekjiitr What about using complex queries like this one below:

filter:
- query:
    query_string:
      query: "beat.hostname: some.hostname  AND system.process.name: java AND system.process.cmdline: /*java*/ AND system.process.cmdline: /*org\.apache\.spark\.deploy\.master\.Master*/"
      analyze_wildcard: true

I am getting the same error 'found unknown escape character'.

Edit: Each of the reserved characters should be escaped. So the last line should look like this:

filter: 
- query:
    query_string:
      query: "system.process.cmdline: /*org/.apache/.spark/.deploy/.master/.Master/*"
      analyze_wildcard: true

@abhishekjiitr
Contributor

abhishekjiitr commented Nov 9, 2018

@alankis is this query working fine after the edit? I think you need to add .* instead of * at the start of the query. What are you trying to match exactly?

filter: 
- query:
    query_string:
      query: "system.process.cmdline: /.*org/.apache/.spark/.deploy/.master/.Master.*/"
      analyze_wildcard: true

@alankis

alankis commented Nov 11, 2018

@abhishekjiitr Not exactly. Now I am getting hits, even when the process exists. I have rewritten my query slightly:

filter:
- query:
    query_string:
      query: "beat.hostname: ip-xx-xx-xx-xxx.ec2.internal"

- query:
    query_string:
      query: "system.process.name: java"

- query:
    query_string:
      query: "system.process.cmdline: /*java*/"
      analyze_wildcard: true

- query:
    query_string:
      query: "system.process.cmdline: /*org.apache.spark.deploy.master.Master/*"
      analyze_wildcard: true

Now my query is getting hit even when the process exists.

When I use your example, I get the following error:

WARNING:elasticsearch:GET http://xx.xx.xx.xx:9200/metricbeat-*/metricsets/_count?ignore_unavailable=true [status:400 request:0.065s]
ERROR:root:Error running count query: RequestError(400, u'search_phase_execution_exception', u'Failed to parse query [system.process.cmdline: /.*org/.apache/.spark/.deploy/.master/.Master.*/]')

I am trying to match the following 'cmdline' in my ES:

WARNING:elasticsearch:GET http://xx.xx.xx.xx:9200/metricbeat-*/metricsets/_count?ignore_unavailable=true [status:400 request:0.065s]
ERROR:root:Error running count query: RequestError(400, u'search_phase_execution_exception', u'Failed to parse query [system.process.cmdline: /.*org/.apache/.spark/.deploy/.master/.Master.*/]')

@alankis

alankis commented Nov 12, 2018

@abhishekjiitr If I am correct, I should also escape '.' (dot character) with a leading backslash? Eg.
/.*org\.apache\.spark\.deploy\.master\.Master\.*/

This is at least true from the ES documentation.

I am using the following query in Kibana discover pane and it is returning the expected value:

system.process.cmdline: *org\.apache\.spark\.deploy\.master\.Master*

@sivatarun

sivatarun commented May 14, 2019

@Qmando I have an issue with using a regex in a query_string filter:

filter:
- query:
     query_string:
       query: "field: /.*'.*/"

I tried multiple ways:
1. -query: 'field: /.*'.*/ --- which throws an error
2. -query: 'field: /.*''.*/' --- which doesn't throw an error but doesn't alert

My alert condition is to alert if the field has any data with an apostrophe ('). Kindly help in this regard.

@Qmando
Member

Qmando commented May 14, 2019

Is field analyzed? If so, you'll definitely need field.raw or field.keyword. I would have tried option # 1 so if that doesn't work I can't help you.


6 participants