Adding another grok pattern to the filebeats mongo module pipeline. #7560

Closed
wants to merge 4 commits into from

cwray

@cwray cwray commented Jul 10, 2018

Adding another grok pattern to the Filebeat MongoDB module's ingest pipeline.

We have some extended logging turned on in MongoDB. The log format that MongoDB produces for these extended logs cannot be parsed by the current grok processor, so this PR adds another pattern to the list of grok patterns to handle them.

See:
https://discuss.elastic.co/t/filebeat-mongo-module-grok-pattern-is-incorrect/138971/4

@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@karmi

karmi commented Jul 10, 2018

Hi @cwray, we have found your signature in our records, but it seems you signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails to your GitHub profile (they can be hidden), so we can match your e-mails to your GitHub profile?

@cwray
Author

cwray commented Jul 10, 2018

Added.

@ruflin
Contributor

ruflin commented Jul 11, 2018

@cwray Thanks for the contribution. Could you share a few log lines from a mongodb log file to test against the new pattern?

I initially wanted to ask you to add it to our existing test file but just realised it's missing :-( Will try to open a PR to add one so you can add it then.

Could you add an entry to the changelog file?

@kvch
Contributor

kvch commented Jul 11, 2018

Do you mind rebasing your PR on top of mine? This contains the missing example logs: #7568

@cwray
Author

cwray commented Jul 11, 2018

These are the log lines that don't match:

2018-07-09T17:05:25.294+0000 I -        [conn212435] end connection 127.0.0.1:43454 (479 connections now open)
2018-07-09T17:05:26.003+0000 I -        [conn212344]   Index Build (background): 18324200/222053333 8%

These log lines work fine:

2018-07-09T17:05:25.243+0000 I NETWORK  [thread1] connection accepted from 127.0.0.1:43454 #212435 (479 connections now open)
2018-07-09T17:05:25.266+0000 I ACCESS   [conn212435] Successfully authenticated as principal adcellerantClusterAdministrator on admin
2018-07-09T17:05:26.923+0000 I COMMAND  [conn212367] command adc command: aggregate ----REDACTED---- protocol:op_query 350ms
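
The difference between the two groups of lines above can be checked outside Elasticsearch with a rough sketch. This is only an approximation: `TIMESTAMP` and `WORD` are hand-written Python regex stand-ins for the grok definitions `TIMESTAMP_ISO8601` and `WORD`, and `which_pattern` is a hypothetical helper, not part of Filebeat. It illustrates that a `-` in the component position is not a word and therefore needs its own pattern.

```python
import re

# Rough regex stand-ins for the grok building blocks (assumptions, not the
# real grok definitions).
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}"
WORD = r"\b\w+\b"

# Existing pattern: the component must be a WORD, so "-" cannot match.
EXISTING = re.compile(
    rf"(?P<timestamp>{TIMESTAMP}) (?P<severity>{WORD}) (?P<component>{WORD})"
    rf" *\[(?P<context>{WORD})\] (?P<message>.*)"
)
# Pattern proposed in this PR: a literal "-" in place of the component.
ADDED = re.compile(
    rf"(?P<timestamp>{TIMESTAMP}) (?P<severity>{WORD}) -"
    rf" *\[(?P<context>{WORD})\] (?P<message>.*)"
)


def which_pattern(line):
    """Return "existing", "added", or None, trying the patterns in order."""
    for name, pattern in (("existing", EXISTING), ("added", ADDED)):
        if pattern.search(line):
            return name
    return None


if __name__ == "__main__":
    samples = [
        "2018-07-09T17:05:25.294+0000 I -        [conn212435] end connection 127.0.0.1:43454 (479 connections now open)",
        "2018-07-09T17:05:25.243+0000 I NETWORK  [thread1] connection accepted from 127.0.0.1:43454 #212435 (479 connections now open)",
    ]
    for sample in samples:
        print(which_pattern(sample), "<-", sample[:45])
```

With these stand-ins, the first sample is only matched by the added pattern and the second only by the existing one.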

It might also be good to catch anything with a COMMAND component and parse it differently, to be able to see which commands are running and how long they take. But that is for another day.
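
As a rough idea of what such COMMAND parsing could look like, here is a minimal Python sketch. It is not part of this PR or of the Filebeat module; the regex, the `parse_command` helper, and the field names `namespace`, `command`, and `duration_ms` are all assumptions, based only on the single redacted COMMAND line shown above.

```python
import re

# Hypothetical pattern for the message part of a COMMAND log line, e.g.
# "command adc command: aggregate ... protocol:op_query 350ms".
COMMAND_RE = re.compile(
    r"^command (?P<namespace>\S+) command: (?P<command>\w+) .* (?P<duration_ms>\d+)ms$"
)


def parse_command(message):
    """Extract target, command name, and duration from a COMMAND message.

    Returns None when the message does not look like a command entry.
    """
    match = COMMAND_RE.match(message)
    if match is None:
        return None
    return {
        "namespace": match["namespace"],
        "command": match["command"],
        "duration_ms": int(match["duration_ms"]),
    }


if __name__ == "__main__":
    msg = "command adc command: aggregate ----REDACTED---- protocol:op_query 350ms"
    print(parse_command(msg))
```

Real COMMAND lines vary more than this one sample, so the pattern would likely need loosening before it could be turned into a grok expression.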

@James-Quigley

Experiencing an issue because of this as well. Would love an update for this to make it in by default so I don't have to manage my own ingest pipelines

@James-Quigley

FWIW, I was experiencing an issue related to this: https://discuss.elastic.co/t/mongodb-module-grok-expression-does-not-match/139241/4

However, creating a pipeline manually with the grok patterns in this PR still did not fix my problem. The patterns pass when using the simulate pipeline API, but not when using Filebeat.

@cwray
Author

cwray commented Jul 11, 2018

@James-Quigley

What does your pipeline look like?

Here is what is currently working for me.

"filebeat-6.3.0-mongodb-log-pipeline": {
    "on_failure": [
      {
        "set": {
          "field": "error.message",
          "value": "{{ _ingest.on_failure_message }}"
        }
      }
    ],
    "description": "Pipeline for parsing MongoDB logs",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{TIMESTAMP_ISO8601:mongodb.log.timestamp} %{WORD:mongodb.log.severity} %{WORD:mongodb.log.component} *\\[%{WORD:mongodb.log.context}\\] %{GREEDYDATA:mongodb.log.message}",
            "%{TIMESTAMP_ISO8601:mongodb.log.timestamp} %{WORD:mongodb.log.severity} - *\\[%{WORD:mongodb.log.context}\\] %{GREEDYDATA:mongodb.log.message}"
          ],
          "ignore_missing": true
        }
      }
    ]
  }

You will need to add this pipeline, then remove the current index and restart Filebeat. I'm not sure why I had to remove the current index to get this all to work.
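
To compare what the pipeline does with a given line, the same two patterns can be wrapped in a request body for the ingest simulate API (`POST /_ingest/pipeline/_simulate`). The sketch below only builds and prints that body; `build_simulate_body` is a hypothetical helper, and sending the request to a cluster is left out on purpose.

```python
import json

# The two grok patterns from the pipeline above, as raw Python strings
# (json.dumps re-adds the JSON escaping of the backslashes).
PATTERNS = [
    r"%{TIMESTAMP_ISO8601:mongodb.log.timestamp} %{WORD:mongodb.log.severity} %{WORD:mongodb.log.component} *\[%{WORD:mongodb.log.context}\] %{GREEDYDATA:mongodb.log.message}",
    r"%{TIMESTAMP_ISO8601:mongodb.log.timestamp} %{WORD:mongodb.log.severity} - *\[%{WORD:mongodb.log.context}\] %{GREEDYDATA:mongodb.log.message}",
]


def build_simulate_body(messages):
    """Build a simulate request body with one test document per log line."""
    return {
        "pipeline": {
            "description": "Pipeline for parsing MongoDB logs",
            "processors": [
                {
                    "grok": {
                        "field": "message",
                        "patterns": PATTERNS,
                        "ignore_missing": True,
                    }
                }
            ],
        },
        "docs": [{"_source": {"message": m}} for m in messages],
    }


if __name__ == "__main__":
    line = "2018-07-09T17:05:25.294+0000 I -        [conn212435] end connection 127.0.0.1:43454 (479 connections now open)"
    print(json.dumps(build_simulate_body([line]), indent=2))
```

Note that a simulated document contains exactly the `message` text given here, so a line copied from the log file should be pasted unchanged, including its run of spaces.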

@James-Quigley

@cwray I copied the default filebeat pipeline, but just changed the grok patterns. I've totally purged ES of indexes related to filebeat and I removed the existing pipelines before adding the custom pipeline.

When the documents get ingested into ES though, I get "Grok expressions do not match".

Just so I'm clear though, what do you mean when you say "Remove the current index"?

@ruflin
Contributor

ruflin commented Jul 16, 2018

@cwray As #7568 is merged, could you rebase your changes on top of master?

@cwray
Author

cwray commented Jul 17, 2018

@ruflin I would love to but I'm a bit new to this process. What is the best way to handle the rebase and push back to this branch? I tried and I think I screwed it up.

@@ -77,3 +78,22 @@ func _GetExtendedTcpTable(pTcpTable uintptr, pdwSize *uint32, bOrder bool, ulAf
}
return
}

func _GetExtendedUdpTable(pTcpTable uintptr, pdwSize *uint32, bOrder bool, ulAf uint32, tableClass uint32, reserved uint32) (code syscall.Errno, err error) {

func _GetExtendedUdpTable should be _GetExtendedUDPTable
func parameter pTcpTable should be pTCPTable

@@ -57,6 +57,7 @@ var (
modiphlpapi = windows.NewLazySystemDLL("iphlpapi.dll")

procGetExtendedTcpTable = modiphlpapi.NewProc("GetExtendedTcpTable")
procGetExtendedUdpTable = modiphlpapi.NewProc("GetExtendedUdpTable")

var procGetExtendedUdpTable should be procGetExtendedUDPTable

owningPID uint32
}

type UDP6RowOwnerPID struct {

exported type UDP6RowOwnerPID should have comment or be unexported

@@ -51,6 +52,19 @@ type TCP6RowOwnerPID struct {
owningPID uint32
}

type UDPRowOwnerPID struct {

exported type UDPRowOwnerPID should have comment or be unexported

@@ -24,6 +24,7 @@ import (
)

const (
UDP_TABLE_OWNER_PID = 1

don't use ALL_CAPS in Go names; use CamelCase
exported const UDP_TABLE_OWNER_PID should have comment (or a comment on this block) or be unexported

// specific language governing permissions and limitations
// under the License.

package index_recovery

don't use an underscore in package name

// specific language governing permissions and limitations
// under the License.

package add_kubernetes_metadata

don't use an underscore in package name


return tuple
}

func (t *TCPTuple) ComputeHashebles() {
func (t *TCPTuple) ComputeHashables() {

exported method TCPTuple.ComputeHashables should have comment or be unexported


return tuple
}

func (t *IPPortTuple) ComputeHashebles() {
func (t *IPPortTuple) ComputeHashables() {

exported method IPPortTuple.ComputeHashables should have comment or be unexported

@@ -66,7 +66,7 @@ const (
type matcher func(last, current []byte) bool

var (
sigMultilineTimeout = errors.New("multline timeout")
sigMultilineTimeout = errors.New("multiline timeout")

error var sigMultilineTimeout should have name of the form errFoo

@kvch
Contributor

kvch commented Jul 17, 2018

Add elastic/beats to your remotes, if you don't have it already.

git remote add upstream https://github.com/elastic/beats
git fetch upstream

Rebase your branch.

git checkout add_mongo_grok_pattern
git rebase upstream/master

Resolve conflicts.

git add filebeat/module/mongodb/log/ingest/pipeline.json
git rebase --continue

Or, if you can't apply your patch, run git rebase --skip to skip it.

Force push it to your repo.

git push origin add_mongo_grok_pattern -f

Also, it's a good idea to create a backup branch before you start rebasing. If things get ugly, you can revert to your backup.
Let me know if you need further help.
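
The steps above, including the backup branch, can be written down as one ordered plan. This is a minimal sketch that only builds and prints the git commands and does not run them; `rebase_plan` and the `-backup` suffix are hypothetical, and the remote, upstream branch, and branch name are taken from the commands above.

```python
def rebase_plan(branch, upstream="upstream/master", remote="origin"):
    """Return the git commands for a rebase, with a backup branch first."""
    backup = f"{branch}-backup"  # hypothetical naming convention
    return [
        ["git", "branch", backup, branch],    # keep a copy of the current state
        ["git", "fetch", upstream.split("/")[0]],
        ["git", "checkout", branch],
        ["git", "rebase", upstream],          # resolve conflicts here if any
        ["git", "push", remote, branch, "-f"],
    ]


if __name__ == "__main__":
    for command in rebase_plan("add_mongo_grok_pattern"):
        print(" ".join(command))
```

If the rebase goes wrong, the branch can be reset to the backup created in the first step before trying again.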

@cwray cwray force-pushed the add_mongo_grok_pattern branch from f1c681f to f8687bd Compare July 17, 2018 15:38
@cwray cwray force-pushed the add_mongo_grok_pattern branch from f8687bd to 069b5e7 Compare July 17, 2018 16:15
@cwray
Author

cwray commented Jul 17, 2018

Ok, I think I managed to unscrew my commit from yesterday and rebase. Let me know if I can do anything else.

@James-Quigley

@cwray I'm still not able to get this working. I copied exactly what is in the commit and created a new pipeline. Deleted the indices I'm going to be writing to. Then restarted filebeat using the new pipeline and the grok expressions still don't match.

Am I missing something obvious? The exact same pipeline passes when using the simulate API

Contributor

@ruflin ruflin left a comment

@cwray Rebase looks good and all green.

What would be great now is if you could add one or multiple log lines which were not working before to this file here. https://github.com/elastic/beats/blob/master/filebeat/module/mongodb/log/test/mongodb-debian-3.2.11.log After that you can run GENERATE=1 INTEGRATION_TESTS=1 TESTING_FILEBEAT_MODULES=mongodb nosetests tests/system/test_modules.py to update the generated content. You will need to have an Elasticsearch instance running on localhost:9200 for this to work.

If it does not work, I could also do it for you if you give me access to your branch.

Could you also add a CHANGELOG entry?

@@ -4,7 +4,9 @@
"grok": {
"field": "message",
"patterns":[
"%{TIMESTAMP_ISO8601:mongodb.log.timestamp} %{WORD:mongodb.log.severity} %{WORD:mongodb.log.component} \\s*\\[%{WORD:mongodb.log.context}\\] %{GREEDYDATA:mongodb.log.message}"

Contributor

Can you remove this empty line?

Author

Sure, I will give all this build stuff a go if I get a chance today.

Author

@ruflin I tried to give this a go but ran into a few problems. I have given you access to my repo if you would like to give it a try.

Contributor

What is the error you are getting?
You need to put your example logs in filebeat/module/mongodb/log/test/mongodb-debian-3.2.11.log. Before running the generator command, you need to build Filebeat using make filebeat.test and then run GENERATE=1 INTEGRATION_TESTS=1 TESTING_FILEBEAT_MODULES=mongodb nosetests tests/system/test_modules.py.

@James-Quigley

FWIW I'm still having this issue. Can't get our mongo logs into ES because of it

@ruflin ruflin added the Team:Integrations label Jan 17, 2019
@cwray
Author

cwray commented Jan 17, 2019

Sorry, this has not been revisited in a while. I have been pulled off this project at my company and reassigned to a different task.

@kaiyan-sheng
Contributor

This problem is already fixed with the latest filebeat mongodb grok pattern.
With input log:

2018-07-10T20:42:37.863+0000 I NETWORK [thread1] connection accepted from 10.6.4.44:56076 #26676 (42 connections now open)

The expected JSON output is:

    {
        "@timestamp": "2018-07-10T20:42:37.863Z",
        "event.dataset": "mongodb.log",
        "event.module": "mongodb",
        "fileset.name": "log",
        "input.type": "log",
        "log.level": "I",
        "log.offset": 102,
        "message": "connection accepted from 10.6.4.44:56076 #26676 (42 connections now open)",
        "mongodb.log.component": "NETWORK",
        "mongodb.log.context": "thread1",
        "service.type": "mongodb"
    }

Will close this PR now.
