Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multiple IP addresses in X-Forwarded-For header in Apache logger #218

Merged
merged 6 commits into from
Oct 11, 2024
105 changes: 95 additions & 10 deletions wai-logger/Network/Wai/Logger/Apache.hs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{-# LANGUAGE OverloadedStrings, CPP #-}
{-# LANGUAGE OverloadedStrings, CPP, TupleSections #-}

module Network.Wai.Logger.Apache (
IPAddrSource(..)
Expand All @@ -18,7 +18,7 @@ import qualified Data.ByteString.Char8 as BS
import Data.List (find)
import Data.Maybe (fromMaybe)
#if MIN_VERSION_base(4,5,0)
import Data.Monoid ((<>))
import Data.Monoid ((<>), First (..))
#else
import Data.Monoid (mappend)
#endif
Expand All @@ -37,10 +37,17 @@ data IPAddrSource =
-- | From the peer address of the HTTP connection.
FromSocket
-- | From X-Real-IP: or X-Forwarded-For: in the HTTP header.
--
-- This picks either X-Real-IP or X-Forwarded-For depending on which of these
-- headers comes first in the ordered list of request headers.
--
-- If the X-Forwarded-For header is picked, the value will be parsed and the
-- left-most IP address will be used (which is mostly likely to be the actual
-- client IP address).
| FromHeader
-- | From a custom HTTP header, useful in proxied environment.
| FromHeaderCustom [HeaderName]
-- | From the peer address if header is not found.
-- | Just like 'FromHeader', but falls back on the peer address if header is not found.
| FromFallback

-- | Apache style log format.
Expand Down Expand Up @@ -130,6 +137,20 @@ getSourceFromSocket = BS.pack . showSockAddr . remoteHost
-- "-"
-- >>> getSourceFromHeader defaultRequest { requestHeaders = [] }
-- "-"
--
-- 'getSourceFromHeader' uses the first instance of either @"X-Real-IP"@ or
-- @"X-Forwarded-For"@ that it finds in the ordered header list:
--
-- >>> getSourceFromHeader defaultRequest { requestHeaders = [ ("X-Real-IP", "1.2.3.4"), ("X-Forwarded-For", "5.6.7.8") ] }
-- "1.2.3.4"
-- >>> getSourceFromHeader defaultRequest { requestHeaders = [ ("X-Forwarded-For", "5.6.7.8"), ("X-Real-IP", "1.2.3.4") ] }
-- "5.6.7.8"
--
-- 'getSourceFromHeader' handles pulling out the first IP in the
-- comma-separated IP list in X-Forwarded-For:
--
-- >>> getSourceFromHeader defaultRequest { requestHeaders = [ ("X-Forwarded-For", "5.6.7.8, 10.11.12.13, 1.2.3.4") ] }
-- "5.6.7.8"
getSourceFromHeader :: Request -> ByteString
getSourceFromHeader = fromMaybe "-" . getSource

Expand All @@ -142,6 +163,20 @@ getSourceFromHeader = fromMaybe "-" . getSource
-- "0.0.0.0"
-- >>> getSourceFromFallback defaultRequest { requestHeaders = [] }
-- "0.0.0.0"
--
-- 'getSourceFromFallback' uses the first instance of either @"X-Real-IP"@ or
-- @"X-Forwarded-For"@ that it finds in the ordered header list:
--
-- >>> getSourceFromFallback defaultRequest { requestHeaders = [ ("X-Real-IP", "1.2.3.4"), ("X-Forwarded-For", "5.6.7.8") ] }
-- "1.2.3.4"
-- >>> getSourceFromFallback defaultRequest { requestHeaders = [ ("X-Forwarded-For", "5.6.7.8"), ("X-Real-IP", "1.2.3.4") ] }
-- "5.6.7.8"
--
-- 'getSourceFromFallback' handles pulling out the first IP in the
-- comma-separated IP list in X-Forwarded-For:
--
-- >>> getSourceFromFallback defaultRequest { requestHeaders = [ ("X-Forwarded-For", "5.6.7.8, 10.11.12.13, 1.2.3.4") ] }
-- "5.6.7.8"
getSourceFromFallback :: Request -> ByteString
getSourceFromFallback req = fromMaybe (getSourceFromSocket req) $ getSource req

Expand All @@ -154,15 +189,65 @@ getSourceFromFallback req = fromMaybe (getSourceFromSocket req) $ getSource req
-- Nothing
-- >>> getSource defaultRequest
-- Nothing
--
-- 'getSource' uses the first instance of either @"X-Real-IP"@ or
-- @"X-Forwarded-For"@ that it finds in the ordered header list:
--
-- >>> getSource defaultRequest { requestHeaders = [ ("X-Real-IP", "1.2.3.4"), ("X-Forwarded-For", "5.6.7.8") ] }
-- Just "1.2.3.4"
-- >>> getSource defaultRequest { requestHeaders = [ ("X-Forwarded-For", "5.6.7.8"), ("X-Real-IP", "1.2.3.4") ] }
-- Just "5.6.7.8"
--
-- 'getSource' handles pulling out the first IP in the comma-separated IP list
-- in X-Forwarded-For:
--
-- >>> getSource defaultRequest { requestHeaders = [ ("X-Forwarded-For", "5.6.7.8, 10.11.12.13, 1.2.3.4") ] }
-- Just "5.6.7.8"
getSource :: Request -> Maybe ByteString
getSource = getSourceFromHeaders ["x-real-ip", "x-forwarded-for"]
getSource = getSourceFromHeaders [("x-real-ip", id), ("x-forwarded-for", firstIpInXFF)]

getSourceFromHeaders :: [HeaderName] -> Request -> Maybe ByteString
getSourceFromHeaders headerNames req = addr
-- | Pull out the first IP in a comma-separated list of X-Forwarded-For IPs.
--
-- >>> firstIpInXFF "1.2.3.4, 5.6.7.8, 10.11.12.13"
-- "1.2.3.4"
--
-- If there are no commas, just return the whole input ByteString:
--
-- >>> firstIpInXFF "5.6.7.8"
-- "5.6.7.8"
--
-- Note that this function doesn't make sure the input is actually an IP address:
--
-- >>> firstIpInXFF "hello, world"
-- "hello"
firstIpInXFF :: ByteString -> ByteString
firstIpInXFF = BS.takeWhile (/= ',')
Comment on lines +219 to +234
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change in how the X-Forwarded-For header is handled is likely a breaking change, and probably deserves a major version bump. I can do the version bump in this PR if preferred.


getSourceFromHeaders :: [(HeaderName, ByteString -> ByteString)] -> Request -> Maybe ByteString
getSourceFromHeaders headerNamesAndPostProc req = getFirst $ foldMap f $ requestHeaders req
where
maddr = find (\(name,_) -> name `elem` headerNames) hdrs
addr = fmap snd maddr
hdrs = requestHeaders req
-- Take a header name and value from the request, and try match it against
-- the list of headers and post-processing functions. If it matches,
-- return the ByteString resulting from applying the post-processing function
-- to the header value.
f :: (HeaderName, ByteString) -> First ByteString
f (headerNameFromReq, headerValFromReq) =
let maybePostProc = find (\(headerNameFromPostProc, _) -> headerNameFromReq == headerNameFromPostProc) headerNamesAndPostProc
in First $ fmap (\(_, postProc) -> postProc headerValFromReq) maybePostProc

-- |
-- >>> getSourceFromHeaderCustom ["x-foobar"] defaultRequest { requestHeaders = [ ("X-catdog", "1.2.3.4"), ("X-Foobar", "5.6.7.8"), ("Other", "1.1.1.1") ] }
-- Just "5.6.7.8"
--
-- If none of the headers in the passed-in list are in the 'Request', then return 'Nothing':
--
-- >>> getSourceFromHeaderCustom ["x-foobar", "baz"] defaultRequest { requestHeaders = [ ("abb", "1.2.3.4"), ("xyz", "5.6.7.8") ] }
-- Nothing
--
-- 'getSourceFromHeaderCustom' uses the first instance of any header in the
-- passed in list that it finds in the ordered header list from the request:
--
-- >>> getSourceFromHeaderCustom ["x-foobar", "baz"] defaultRequest { requestHeaders = [ ("baz", "1.2.3.4"), ("x-foobar", "5.6.7.8") ] }
-- Just "1.2.3.4"
getSourceFromHeaderCustom :: [HeaderName] -> Request -> Maybe ByteString
getSourceFromHeaderCustom hs = getSourceFromHeaders hs
getSourceFromHeaderCustom hs = getSourceFromHeaders (fmap (,id) hs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the firstIpInXFF here as well? The case would be when you need "x-forwarded-for" but you cannot use getSource because it would read "x-real-ip" first when both are present.

Copy link
Contributor Author

@cdepillabout cdepillabout Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

We could use firstIpInXFF in getSourceFromHeaderCustom. However, my thought is that it would be somewhat surprising for a user, so I didn't end up adding it.

Really what would be nice is a new data constructor like:

data IPAddrSource
  = ...
  | FromRequest (Request -> ByteString)

so that the end-user could be in full control of exactly how to figure out the IP address. I didn't choose to add that in this PR, but I could if there is a desire to have it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, though ideally I'd prefer a plain function in the spirit of setApacheRequestFilter and setApacheUserGetter but this FromRequest would help with many edge cases.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about both using firstIpInXFF in getSourceFromHeaderCustom AND adding FromRequest above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea, this FromRequest supersedes all the other options, but it may be nice to have a choice.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdepillabout I you agree, please push a commit to implement the two things above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kazu-yamamoto I pushed a couple commits, with the important ones being:

  • e349e59: Add the FromRequest (Request -> ByteString) data constructor.
  • 1d519c6: Use firstIpInXFF in getSourceFromHeaderCustom. I made the choice to apply firstIpInXFF to all the headers passed in to getSourceFromHeaderCustom, since it should effectively be a no-op if the header value doesn't have a comma in it.
  • 2d25632: Bump wai-logger to 2.5.0. (Feel free to drop this commit if you'd prefer to do this yourself in a separate PR or something)