Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Haproxy filebeat module tcp and default formats #8637

Merged

Conversation

sayden
Copy link
Contributor

@sayden sayden commented Oct 17, 2018

Adds TCP and Default format parsing in parallel to HTTP format as described here https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#8.2 So the three types of logs can be parsed even if they are mixed in the file

Merges #8428 and #8526 into a single PR because now #8526 depends on #8428

@sayden sayden added enhancement in progress Pull request is currently in progress. module Filebeat Filebeat needs_backport PR is waiting to be backported to other branches. labels Oct 17, 2018
@sayden sayden requested a review from jsoriano October 17, 2018 18:27
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added comments with the discussions on the previous PRs so we don't forget.

filebeat/docs/fields.asciidoc Show resolved Hide resolved
Default HAProxy log format


*`haproxy.default.facility`*::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use default on these fields, as they can be common to other log formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, but are they specific of the default log format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I have spent some time digging in the doc without really understanding what it is. I was supported on the idea that if it's correct in the Grok patterns in that way, it is for us. I really don't care about placing it outside on inside of the default format.

So if you can clarify this, just tell me where you'd like that I place it and I'll do it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't put them under default, specially if they are common to other formats.

Also, if something looks quite general, look for them in ECS or existing filebeat fields, for example log.syslog.facility and log.syslog.priority already exist in filebeat, you could use them to store facility and priority.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are better going to do the ECS port in a different PR so we can decide between original and ECS naming conventions

- name: time_duration
description: time_duration is the time the request remained active in haproxy, which is the total time in milliseconds elapsed between the first byte of the request was received and the last byte of response was sent.
- name: time_backend_connect
description: time_backend_connect is the total time in milliseconds spent waiting for the connection to establish to the final server, including retries.
Copy link
Member

@jsoriano jsoriano Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to rewrite these descriptions on this PR do it, if not we can do it in another PR, but we should do it for consistency at some moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 884ef4f772540389e5c3e96ee8a1a91ef9db36c1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I think I didn't explain well, I meant the other way around 🙁 If you look at other modules they don't start with the name of the field.

@@ -0,0 +1,148 @@
###################### Metricbeat Configuration Example #######################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably not be in this PR.

@ruflin
Copy link
Contributor

ruflin commented Oct 18, 2018

@sayden Could you add more details to the PR description on what this PR is about?

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the format of the descriptions!

- name: process_name
description: Name of the process

- name: pid
description: Process ID
description: pid of the process
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use PID, or just left the previous description, it was fine. Or add a description about the process, something like PID of the process that executed the request

"haproxy.connections.frontend": 1,
"haproxy.connections.retries": 0,
"haproxy.connections.server": 0,
"haproxy.default.logsource": "127.0.0.1",
Copy link
Member

@jsoriano jsoriano Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding fields under default, this logsource looks common to default and TCP, it shouldn't be under default. If this is always going to be an IP, it could be haproxy.source.ip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it seems it could be an IP or a hostname. I'll place this field in the root (haproxy.logsource) anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin what would we do ECS-wise for a source that can be an ip or a hostname?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a solution here as one is ip type and other other is keyword type in ES. Probably best it to store it under keyword for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sayden umm, in haproxy documentation it says that this is an IP, also I think that source IP in default format and client IP in tcp format are the same, I'd store both in haproxy.source.ip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, they can also be a unix socket, tricky... @ruflin opinions for these source fields?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haproxy.source then? I wouldn't use haproxy.logsource because it seems to be the source of the log, not the source of the logged connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haproxy.source sounds like a good tradeoff for me at the moment. We should make it a keyword.

@sayden sayden force-pushed the haproxy-filebeat-module-tcp-and-default-formats branch from 88524da to 3f0acdb Compare October 18, 2018 11:42
@ruflin
Copy link
Contributor

ruflin commented Oct 18, 2018

Seems like the failing tests are related. The generated output files for the tests must be updated. Please ping me if you need assistance on how to generate them.

@sayden sayden added review in progress Pull request is currently in progress. and removed in progress Pull request is currently in progress. review labels Oct 18, 2018
@sayden sayden force-pushed the haproxy-filebeat-module-tcp-and-default-formats branch from 3f0acdb to 0b72a09 Compare October 18, 2018 15:23
@sayden sayden force-pushed the haproxy-filebeat-module-tcp-and-default-formats branch from 0b72a09 to 37511b1 Compare October 18, 2018 15:53
@sayden
Copy link
Contributor Author

sayden commented Oct 18, 2018

Failed test is related. Checking

fields:
- name: mode
type: text
description: mode that the frontend is operating (TCP or HTTP)
Copy link
Member

@jsoriano jsoriano Oct 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can also be common to multiple log formats, remove it too from the default namespace, and it can be set to tcp on the tcp log format.
A possible name for this could be haproxy.protocol, or network.protocol as of ECS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not use network.protocol yet from ECS as it still might change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can read in the HAProxy documentation,mode is only specific for Default format https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#8.2.1 In TCP format there is no "mode" at all (mode is always TCP).

About the naming, I thought that we already agreed that we were going to leave ECS apart in this PR and focus on maintain naming of the service, which is the most familiar naming for an HAProxy user who could use this module.

Copy link
Member

@jsoriano jsoriano Oct 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mode seems to be only specified in the default format, it wouldn't make sense in protocol-specific formats, but it can still be useful in events so users can differentiate logs from tcp and http connections.

We can set haproxy.protocol to the value of mode for the default formats, and for protocol-specific formats set it to the specific protocol depending on the matching pattern. I'm not sure how to set a field depending on the matching pattern, we can leave it for a future change, but I'd use a common name like haproxy.protocol instead of haproxy.default.mode in any case, thinking in the future uses of this field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are spending too much time and energy in a field that only appears (and gets parsed) in a deprecated log format.

I feel like writing complex conditional parsing for non-straightforward (invented) fields which doesn't usually appear in HAProxy and to get something that we haven't measure and that it has not come from a user requirements is over-engineering things a bit

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I said we can leave adding this field in other formats for future changes. I was just asking about using a generic name like haproxy.protocol (instead of haproxy.default.mode), no overengineering, just using a more future-proof name, wdyt about this?

Copy link
Member

@jsoriano jsoriano Oct 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or haproxy.mode if you want to keep haproxy naming, but without the default namespacing.

@jsoriano
Copy link
Member

Oh, and could you please add a changelog entry?

@sayden
Copy link
Contributor Author

sayden commented Oct 22, 2018

Changelog added! :)

@@ -134,6 +134,7 @@ https://github.com/elastic/beats/compare/v6.4.0...master[Check the HEAD diff]
- Release `docker` input as GA. {pull}8328[8328]
- Keep unparsed user agent information in user_agent.original. {pull}8537[8537]
- Better tracking of number of open file descriptors. {pull}7986[7986]
- Added default and TCP parsing formats {issue}8311[8311] {pull}8637[8637]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...to haproxy module? 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you're totally right

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failures are not related.

@sayden sayden merged commit e6f3a6e into elastic:master Oct 23, 2018
@sayden sayden added v6.5.0 and removed needs_backport PR is waiting to be backported to other branches. labels Oct 23, 2018
sayden added a commit that referenced this pull request Oct 23, 2018
…mats (#8690)

Haproxy filebeat module tcp and default formats
@sayden sayden deleted the haproxy-filebeat-module-tcp-and-default-formats branch August 25, 2022 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Filebeat Filebeat in progress Pull request is currently in progress. module v6.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants