Add 'Host' header value to 'hostname' in rule message field, if exists #2906

airween · 2023-05-22T09:56:56Z

With libmodsecurity3, the hostname field in the generated message always contains the server's IP address.

I had a quick look at how it works in mod_security2: if the request has a "Host" header, its value is used there; otherwise the server IP is used.
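As a rough illustration of that fallback, here is a minimal Python sketch. It is not the actual mod_security2 or libmodsecurity3 code; `log_hostname` and its parameters are hypothetical names chosen for this example.

```python
from typing import Mapping


def log_hostname(headers: Mapping[str, str], server_ip: str) -> str:
    """Return the value to log as 'hostname' (hypothetical helper).

    Uses the request's Host header when present and non-empty,
    otherwise falls back to the server's IP address.
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "host" and value:
            return value
    return server_ip
```

With a `Host` header present the header value is returned; with no `Host` header the server IP is returned, which matches the behavior described for mod_security2.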

This field is important because it shows what the Host header of the request was.

Normally Nginx's log contains the host: ... part, but in some very special cases, when the preceding request: ... part is too long, the host part isn't there, because Nginx truncates the log line after 2048 bytes. To avoid this, the user has to recompile Nginx, which in many cases is not possible.

With this patch we can ensure that, in almost all cases, the hostname is present.

airween · 2023-06-19T07:23:08Z

Is there any question about this PR? We need this patch in our servers.

Please let me know, if any discussion is necessary.

martinhsv · 2023-06-19T13:29:04Z

Hello @airween ,

I'm not entirely sure that this is a good idea.

The 'Host' header is untrusted user input. Which means it could be misleading to use it to populate report fields that are not pretty explicitly identifiable as such.

airween · 2023-06-19T14:37:26Z

The 'Host' header is untrusted user input. Which means it could be misleading to use it to populate report fields that are not pretty explicitly identifiable as such.

I see your argument, but I think most of the data in the error log is user input (this is why I sent this PR earlier).

As I described in my first comment, in some cases the error log line can be longer than the allowed 2048 bytes, and the hostname field will be missing from it. This is a huge problem, because we just see the (local) IP address of the server.

I also described that with this patch the behavior will be the same as with Apache (and mod_security2). For example, take a look at this request:

GET /dump.php HTTP/1.1
Host: ' UNION SELECT username, password FROM users--
Accept: */*
User-Agent: Mozilla/1.0

In case of Apache, I see this:

[tag "reporting"] [hostname "' union select username, password from users--"] [uri "/dump.php"]

In case of Nginx (with the patched version):

[accuracy "0"] [tag "anomaly-evaluation"]  [hostname "' UNION SELECT username, password FROM users--"] [uri "/dump.php"] [unique_id "168718507414.966399"] [ref ""], client: ::1, server: _, request: "GET /dump.php HTTP/1.1", host: "' UNION SELECT username, password FROM users--"

Please note that the untrusted user input is already in the log written by Nginx, but not in the correct place.

What would you think if I sanitized this value (as I did in PR #2854 with other fields)?

martinhsv · 2023-06-19T15:23:19Z

Hi @airween ,

My concern was not primarily about any content that might resemble a generic attack (like SQLi or XSS), but, rather, input that looks like a legitimate hostname, but is not the hostname servicing the request.

For example, suppose a request is being serviced by mysiiitemain.com, but the 'Host:' header actually has the false value mysiiitesub.com?

Would it really be a good idea, in this case, for the error.log to claim: hostname: mysiiitesub.com?

airween · 2023-06-19T16:03:43Z

that looks like a legitimate hostname, but is not the hostname servicing the request.

Sorry, now I'm a bit confused - if the given hostname is not configured on the server, then it really does not matter what it is; the server will serve the default config. It's like a request without a Host header (more or less). (But please correct me if I'm wrong.)

Would it really be a good idea, in this case, for the error.log to claim: hostname: mysiiitesub.com?

Yes, definitely! That's what we need, it's very informative and useful information. We know which virtualhosts are configured, and if we don't have any context with that name, then we shouldn't care about that request (in point of false positives).

martinhsv · 2023-06-19T17:52:41Z

I'm afraid I don't follow your line of reasoning.

My main concern is twofold:

- there is nothing in the output that suggests 'hostname:' is unreliable; indeed the surrounding information (file, line, id, unique_id, etc.) suggests that it is not a client-manipulatable value
- I don't see the positive use case where this untrusted data is safe to use for decision making

You state: "We know which virtualhosts are configured, and if we don't have any context with that name, then we shouldn't care about that request (in point of false positives)."

I don't follow. It could be the case that the true host and the false host in the 'Host:' header both represent hosts being protected by ModSecurity -- maybe one is of lower concern (like a host that serves only static content, while another is an administration portal).

In addition to the wariness I've outlined above, it should be noted that if a particular installation wants the value of the 'Host:' header in output:

- as you noted, they already have it (barring truncation, see below) by default
- truncation issues are largely under the control of the administrator anyway; truncation can often be eliminated by managing the output items that are requested in the rule (for example, by reducing the number of 'tag' actions)
- if that is still a problem, the user may add their own 'msg' action to rules to specifically output the value of the 'Host:' header

airween · 2023-06-19T19:13:26Z

there is nothing in the output that suggests 'hostname:' is unreliable; indeed the surrounding information (file, line, id, unique_id, etc.) suggests that it is not a client-manipulatable value

You mean that with the current behavior there is nothing which suggests the hostname is unreliable? I agree, but now it does not carry any information... (I know what my server's address is :), it's unnecessary to show it again.) Now it's a worthless information.

I don't see the positive use case where this untrusted data is safe to use for decision making

This untrusted data is in Apache's error.log, and is in Nginx error.log - but unfortunately sometimes it's truncated and not visible. But it's important information.

I think the administrator can decide that value is trusted or not - and this is the point.

I don't follow. It could be the case that both the the true host and the false host in the 'Host:' header both represent hosts being protected by ModSecurity -- maybe one is of lower concern (like a host that serves only static content, while another is an administration portal).

Wait: if both hosts are being protected by ModSecurity, then why is one of them fake? If ModSecurity protects a configured website, it means that hostname is not fake... Or - sorry, but - I don't see your argument here.

as you noted, they already have it (barring truncation, see below) by default

Yes, but in the case of one of our customers, about 20% of all lines (more than 300k per day) are truncated. The customer has about 200 virtual hosts.

truncation issues are largely under the control of the administrator anyway; truncation can often be eliminated by managing the output items that are requested in the rule (for example, by reducing the number of 'tag' actions)

First of all, unfortunately this would mean that the administrator should modify the whole rule set. Sorry to say but that makes no sense. (What about after an upgrade? Admin does it again?)

Secondly, the ModSecurity part of the error log line is limited to 1024 bytes. Tags can use a hundred bytes in extreme cases, so we would win 100 bytes - but I'm talking about a few hundred bytes. The total length of an Nginx error log line is limited to 2048 bytes. The prefix (date, severity, pid) is at most 50 bytes. Now we are at 1074 bytes. Now imagine that the URI is longer than the remaining 974 bytes. The total URI is typically more than 1k - removing tags is not a solution.
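The arithmetic above can be sketched in a few lines of Python. This is only a model built from the figures quoted in this comment (2048-byte line, 1024-byte ModSecurity part, 50-byte prefix); the function names and the simplified suffix layout are assumptions for illustration, not Nginx's real formatting code.

```python
NGINX_LINE_LIMIT = 2048   # total error.log line limit quoted above
MODSEC_PART_LIMIT = 1024  # ModSecurity part of the line
PREFIX_MAX = 50           # date, severity, pid (upper bound)


def remaining_budget() -> int:
    """Bytes left for the client/server/request/host suffix."""
    return NGINX_LINE_LIMIT - MODSEC_PART_LIMIT - PREFIX_MAX


def simulate_line(uri: str, host: str) -> str:
    """Build a worst-case line and cut it at the limit (simplified model)."""
    prefix = "p" * PREFIX_MAX          # stand-in for date, severity, pid
    modsec = "m" * MODSEC_PART_LIMIT   # stand-in for a full ModSecurity message
    suffix = (
        f', client: ::1, server: _, request: "GET {uri} HTTP/1.1", '
        f'host: "{host}"'
    )
    return (prefix + modsec + suffix)[:NGINX_LINE_LIMIT]
```

In this model a short URI such as `/dump.php` leaves the `host:` part intact, while a URI of about 1200 bytes pushes it past the cut-off, regardless of how many tags were removed from the ModSecurity part.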

if that is still a problem, the user may add their own 'msg' action to rules to specifically output the value of the 'Host:' header

Again: does the admin really need to modify the whole rule set and append an extra value to each rule? Instead of replacing a completely useless field value with a useful one?

Here the untrusted value, as a marker, IS THE information.

airween · 2023-06-19T19:39:39Z

A similar opinion like mine:
corazawaf/coraza#517 (comment)

martinhsv · 2023-06-19T21:01:52Z

"Now it's a worthless information."

Perhaps. But showing no information is almost always better than showing incorrect or misleading information. The same is true for preferring information of low utility over incorrect or misleading information.

And, is it really useless? It may be if there is only a single server IP address involved, but that may not be the case. And if the logs are aggregated somewhere, perhaps some admins would value having a reliable IP address in that field more than an entirely untrusted string.

"This untrusted data is ... in Nginx error.log"

You mean in the item called "host: ..." after the request? Yes, there is potential for that to be misunderstood as well, but I think less so; at least its name ('host') corresponds directly to a request header, so perhaps people will more easily deduce where it is sourced from.

"I think the administrator can decide that value is trusted or not - and this is the point."

But that presupposes the administrator (even a novice one) has knowledge of how it works. ModSecurity can already involve a lot to learn, and it's generally undesirable to make things harder by assuming even more background knowledge.

"the administrator should modify the whole rule set. Sorry to say but that makes no sense. (What about after an upgrade? Admin does it again?)"

I'm not sure I understand your concern here. ModSecurity includes many tools to modify rulesets for particular needs. In a case like this, for example, one option could be to simply make a change to SecDefaultAction to add something like tag:'Host-Hdr: %{REQUEST_HEADERS.Host}'. Every rule would not need to be modified individually. (Keep in mind that this is simply a possible workaround for those individuals for whom seeing the 'Host:' header value early in the error.log output is a high priority.)

More generally, the information in error.log is, by necessity, terse. It seems likely that there are many pieces of information that are valuable to at least some admins but are not contained in that log. The most usual source for more complete info is the audit log.

airween · 2023-06-20T10:21:14Z

showing no information is almost always better than showing incorrect or misleading information.

I can't agree with that. If a user sends a fake Host header, that's not incorrect information - that's THE information we need.

perhaps some admins would value having a reliable IP address in that field more than an entirely untrusted string.

Perhaps. Or perhaps they would like to see the real request from users. We're just guessing now :)

(Btw, Nginx's error.log contains the server field, so the administrator can access that information. And that field comes before the request field, so it is sure not to be truncated, unlike the host.)

But that presupposes the administrator (even a novice one) has knowledge of how it works.

You are right. But what if someone uses both mod_security2 and libmodsecurity3? Now these engines produce different outputs. Doesn't this confuse the user?

to simply make a change to SecDefaultAction to add something like tag:'Host-Hdr: %{REQUEST_HEADERS.Host}'

Sorry, I'm not sure I can follow you. Above you wrote that we should remove tags from rules to make the lines shorter. I explained why removing tags is not a solution. If we append a new tag, then in our special cases that will be the first thing the engine truncates.

More generally, the information in error.log is, by necessity, terse. It seems likely that there are many pieces of information that are valuable to at least some admins but are not contained in that log. The most usual source for more complete info is the audit log.

I would dispute that. First, there are very few cases where we need the audit.log; the error.log is more than enough. Secondly, if the user follows the default audit.log settings (SecAuditLogRelevantStatus "^(?:5|4(?!04))"), that log contains only the rejected requests. If the user uses CRS in anomaly scoring mode, the requests with scores below the threshold won't appear in audit.log.
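To make the effect of that pattern concrete, here is a small Python sketch of which response statuses it selects. It assumes SecAuditEngine is set to RelevantOnly, so that the pattern is what decides whether a transaction is logged; `is_relevant` is a hypothetical helper name, and Python's `re` module is used only because it handles this particular pattern (including the negative lookahead) in the same way.

```python
import re

# Pattern quoted above from the default SecAuditLogRelevantStatus setting.
# ModSecurity evaluates it with its own regex engine; Python's `re` is a
# stand-in for illustration only.
RELEVANT_STATUS = re.compile(r"^(?:5|4(?!04))")


def is_relevant(status: int) -> bool:
    """True if the response status would count as 'relevant' for the audit log."""
    return RELEVANT_STATUS.search(str(status)) is not None
```

Under this pattern, 5xx responses and 4xx responses other than 404 are selected, while a request that triggered rules but was still answered with 200 is not.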

Error.log is good enough.

==%==

I'm afraid our positions are not close to each other, so would it be acceptable if I added some code so that the user could choose this feature at compile time? I mean there would be an option for the ./configure script, e.g.:

./configure --enable-host-header-in-log

and in that case the engine will follow this mechanism, otherwise this code has no effect?

M4tteoP · 2023-06-20T17:03:32Z

Hi there,
I personally see valid reasoning on both sides. What I'm wondering is, would it be possible to just keep both of them (hostname and Host header) in the error log?
Something like:

[Mon Jun 19 ...] [security2:error] [pid 60:tid 28142730754440] [client 192.168.112.1:55555] [client 192.168.112.1] ModSecurity: Warning. [file "..."] [line "..."] [id "..."] [msg "..."] [ver "..."] [tag "modsecurity"] [tag "anomaly-evaluation"] [hostname "127.0.0.1"] [uri "/"] [host "evillocalhost"] [unique_id "..."]

I see that it would lead to increased verbosity, but it seems that host is, just like uri, a meaningful part of the request, and it would also make libmodsecurity3 error logs as complete as (actually more complete than) those of mod_security2.

As a side note, FYI, there are some ongoing discussions around it also in Coraza (Which currently is just providing the IP address, just like libModSecurity):

martinhsv · 2023-07-31T17:04:15Z

No, I don't think making the behaviour configurable helps meaningfully.

I'm going to go ahead and close this for the reasons stated.

airween · 2023-07-31T19:28:17Z

No, I don't think making the behaviour configurable helps meaningfully.

And what do you think about @M4tteoP's suggestion? Adding an extra field to the line?

Add 'Host' header value to 'hostname' in rule message field, if exists

abc84bc

This was referenced Jun 19, 2023

missing hostname in logs corazawaf/coraza#517

Merged

Missing hostname in WAF logs corazawaf/coraza-caddy#75

Open

martinhsv closed this Jul 31, 2023

airween mentioned this pull request Jul 29, 2024

Discussion about 'hostname' field in log #3200

Open
