Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multiple IP addresses in X-Forwarded-For header in Apache logger #218

Merged
merged 6 commits into from
Oct 11, 2024

Conversation

cdepillabout
Copy link
Contributor

@cdepillabout cdepillabout commented Oct 4, 2024

This PR slightly changes the parsing for the X-Forwarded-For header within the Apache logger. In this PR, the list of IPs in the X-Forwarded-For header is parsed and only the left-most IP (most likely to be the actual client IP) is written to the Apache log.

Here's an explanation of how the X-Forwarded-For header generally works: https://en.wikipedia.org/wiki/X-Forwarded-For

For instance, if a request goes through 3 proxies, the final Haskell backend might get a request where the X-Forwarded-For header looks like the following:

X-Forwarded-For: 203.0.113.195, 70.41.3.18, 150.172.238.178

In this situation, the actual client IP would be the left-most IP, 203.0.113.195.

Currently in wai-logger in master, the value of the X-Forwarded-For header isn't parsed, so you end up with Apache log lines that look like the following:

203.0.113.195, 70.41.3.18, 150.172.238.178 - - [01/Oct/2024:09:18:49 +0000] "GET /some/path HTTP/1.1" 200 - "" ""

Because there are multiple comma-separated IP addrs, this format of log line often can't be parsed correctly. (For instance, this exact log line isn't able to be parsed by Vector's parse_apache_log function. It also doesn't appear to match the format as specified in the Apache documentation itself.)

With the change in this PR, the above Apache log line will now look like the following:

203.0.113.195 - - [01/Oct/2024:09:18:49 +0000] "GET /some/path HTTP/1.1" 200 - "" ""

This matches the format most would be expecting.

Here's a more in-depth article about the pit-falls around the X-Forwarded-For header: https://adam-p.ca/blog/2022/03/x-forwarded-for/

…r logging

The X-Forwarded-For header has a "standard" format:
https://en.wikipedia.org/wiki/X-Forwarded-For

Here's an example value you may see:

```
X-Forwarded-For: 203.0.113.195, 70.41.3.18, 150.172.238.178
```

This would be a request that has passed through 3 proxies,
where the client's IP address is 203.0.113.195.

This commit modifies the Apache logger so that only the first IP address
from this X-Forwarded-For list is written to the Apache log.
@cdepillabout
Copy link
Contributor Author

@cevantes would you also be interested in doing a quick review of this PR? I added some doctests for the function added in #215.

Comment on lines +209 to +224
-- | Pull out the first IP in a comma-separated list of X-Forwarded-For IPs.
--
-- >>> firstIpInXFF "1.2.3.4, 5.6.7.8, 10.11.12.13"
-- "1.2.3.4"
--
-- If there are no commas, just return the whole input ByteString:
--
-- >>> firstIpInXFF "5.6.7.8"
-- "5.6.7.8"
--
-- Note that this function doesn't make sure the input is actually an IP address:
--
-- >>> firstIpInXFF "hello, world"
-- "hello"
firstIpInXFF :: ByteString -> ByteString
firstIpInXFF = BS.takeWhile (/= ',')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change in how the X-Forwarded-For header is handled is likely a breaking change, and probably deserves a major version bump. I can do the version bump in this PR if preferred.

Copy link
Contributor

@cevantes cevantes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments with more examples!

getSourceFromHeaderCustom :: [HeaderName] -> Request -> Maybe ByteString
getSourceFromHeaderCustom hs = getSourceFromHeaders hs
getSourceFromHeaderCustom hs = getSourceFromHeaders (fmap (,id) hs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the firstIpInXFF here as well? The case would be when you need "x-forwarded-for" but you cannot use getSource because it would read "x-real-ip" first when both are present.

Copy link
Contributor Author

@cdepillabout cdepillabout Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

We could use firstIpInXFF in getSourceFromHeaderCustom. However, my thought is that it would be somewhat surprising for a user, so I didn't end up adding it.

Really what would be nice is a new data constructor like:

data IPAddrSource
  = ...
  | FromRequest (Request -> ByteString)

so that the end-user could be in full control of exactly how to figure out the IP address. I didn't choose to add that in this PR, but I could if there is a desire to have it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, though ideally I'd prefer a plain function in the spirit of setApacheRequestFilter and setApacheUserGetter but this FromRequest would help with many edge cases.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about both using firstIpInXFF in getSourceFromHeaderCustom AND adding FromRequest above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea, this FromRequest supersedes all the other options, but it may be nice to have a choice.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdepillabout I you agree, please push a commit to implement the two things above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kazu-yamamoto I pushed a couple commits, with the important ones being:

  • e349e59: Add the FromRequest (Request -> ByteString) data constructor.
  • 1d519c6: Use firstIpInXFF in getSourceFromHeaderCustom. I made the choice to apply firstIpInXFF to all the headers passed in to getSourceFromHeaderCustom, since it should effectively be a no-op if the header value doesn't have a comma in it.
  • 2d25632: Bump wai-logger to 2.5.0. (Feel free to drop this commit if you'd prefer to do this yourself in a separate PR or something)

This allows the user full control over exactly how to pull out the IP
source address for the Apache logs.
@kazu-yamamoto kazu-yamamoto self-requested a review October 11, 2024 08:21
Copy link
Owner

@kazu-yamamoto kazu-yamamoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now LGTM

@kazu-yamamoto kazu-yamamoto merged commit 680d95c into kazu-yamamoto:main Oct 11, 2024
@kazu-yamamoto
Copy link
Owner

Merged.
v2.4.1 has been deprecated and v2.5.0 has been released.
Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants