Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose a request ID to help with debug logging #407

Closed
markt-asf opened this issue Jul 9, 2021 · 13 comments
Closed

Expose a request ID to help with debug logging #407

markt-asf opened this issue Jul 9, 2021 · 13 comments

Comments

@markt-asf
Copy link
Contributor

markt-asf commented Jul 9, 2021

Inspired by what Netty currently does.

To assist with debug logging - particularly in the context of asynchronous I/O, HTTP upgrade and HTTP/2 - it would be helpful if the Servlet API exposed some form of tracking ID to help track the following:

  • network connection
    • identify requests that shared a connection (HTTP keep-alive, HTTP upgrade)
  • request within a connection
    • particularly for async requests that move between threads, link activity associated with the same request
    • must be unique for the current OS process when combined with the network connection ID
    • may have additional uniqueness properties
  • protocol specific IDs
    • HTTP/2 connection and stream
    • WebSocket sesison ID
  • local address and port
    • for correlation with network dumps, other processes
  • remote address and port
    • for correlation with network dumps, other processes
  • container specific
    • is there a requirement for this?

The intention is that the ID would correlate with container specific logging, assumming the container uses the same IDs for its internal logging.

"request within a connection" and "protocol specific" look to be equivalent.

The network address and port info could be relatively expensive to obtain. Container specific solutions often have short and long IDs. The would give us:

  • getShortID(), returns <connectionID>-<requestID>
  • getLongID(), returns <connectionID>-<requestID>-<localAddressAndPort>-<remoteAddressAndPort>[-<customContainer>]

Where:

  • connectionID is 8 hex digits (hash code of appropriate object as hex - avoids negative hashCodes as I want to use "-" as the separator)
  • requestID depends on protocol
    • HTTP - simple counter for the connection
    • HTTP/2 - connectionID:streamID
    • WebSocket - session ID
    • Other - "NONE" or protocol dependent
  • localAddressAndPort is IP:port in same format as URL
  • remoteAddressAndPort is IP:port in same format as URL
  • customContainer
    • optional

On ServletRequest or HttpServletRequest? I'm leaning towards ServletRequest.

@gregw
Copy link
Contributor

gregw commented Jul 10, 2021

Firstly, I'm not sure that a hashCode is sufficient as an ID, since it is not required to be unique. Most implementations will provide a practically unique hashcode, but is that sufficient?

Then, I'm not sure that providing an API that does the combination is the right thing to do. You've already had to suggest 2 combinations, but there will always be users that want other combinations. I fear that we will see applications parsing these IDs to access the individual components. Thus my preference would be to provide API to the raw components (connectionID, requestID, connectionRequestCount, etc.) and then applications would be free to combine however they want to get an ID.

It would also need to be made clear if this was just a debugging API (and thus perhaps optional or no great need to enforce uniqueness) or are they APIs that could be used for real application work (perhaps caching security credentials by connectionID for example, and thus needing a guarantee of uniqueness?). I'm not sure we can stop users using a debugging API for real application work, so I think the API would have to be the later.

@stuartwdouglas
Copy link
Contributor

This also sounds relatively expensive, so it needs to be able to be calculated in a lazy manner.

@markt-asf
Copy link
Contributor Author

The two combinations were based on what Netty provides but I take the point that an API that provides the components offers more flexibility.
If we take the component approach, the only elements missing on ServletRequest are:

  • String getConnectionID()
  • String getRequestID()

Definitions something long the lines of:

  • Connection ID: case insensitive alphanumeric identifier that is unique within the lifetime of the JVM that represents the network connection.
  • Request ID: case insensitive alphanumeric identifier that is unique for a given network connection.

Thinking about it some more, keeping these distinct from any protocol specific IDs offers some benefits - particularly around HTTP upgrade if you want to track a connection that starts as HTTP and then upgrades to something else. However, there is no API to obtain either the h2 connection ID or stream ID. How about exposing these via request attributes?

If containers and/or frameworks opt to provide some form of combined ID then I'd expect that implementation to utilise lazy creation and caching as appropriate.

@gregw
Copy link
Contributor

gregw commented Sep 2, 2021

Note that with a connectionID then we may need some kind of Listener so we can inform the application that a connection has closed.

@stuartwdouglas
Copy link
Contributor

I am not super enthusiastic about exposing connection related information to the user, as it can lead to the user making assumptions that are not always true (e.g. assuming that a connection is 1:1 with a user, assuming only one request active on the connection at a time).

Connection listeners are also really problematic, in terms of how their lifecycle is managed. If every request adds a connection listener and that connection lasts for a long time you have a potential memory leak.

@gregw
Copy link
Contributor

gregw commented Sep 2, 2021

@stuartwdouglas I'm kind of on the fence. Probably users shouldn't know about connections and may make wrong assumptions and listeners are complex..... but then you also get users that use other mechanism to reverse engineer connections... so there is a use-case. E.g. some large company app store is using HTTP/2 connections for authentication, so that one authed request over the connection makes the whole connection OK. Do we say to such users that servlets will never be for them?

@stuartwdouglas
Copy link
Contributor

I am not 100% against it, but we need to make sure all the pitfalls are well documented, and that whatever listener API we come up with makes it easy to avoid possible concurrency issues you can get with HTTP/2.

We would also need to specify the threading requirements for listeners, as connection close can happen as an async notification. We should add a requirement that all request processing is complete (committed and threads returned to container) before the listeners are invoked.

@markt-asf
Copy link
Contributor Author

The primary purpose of exposing these IDs is for debugging. I think connection tracking and associated listeners deserves a separate issue and discussion.

@gregw
Copy link
Contributor

gregw commented Sep 2, 2021

I'm not sure we can stop the IDs being used for uses other than debugging, specially as there is demand for connection management. The APN protocol is an example of something that uses connection state for a key part of their protocol. We currently could not implement APN in servlets because we have no way to know about connections nor to control parameters like h2 streams per connection.
Similarly many proxy style implementation in servlets could benefit from knowing about connections for significant performance improvements.

So I think this is precisely the kind of thing we need to consider in a major release... although I realise time is very short for 6.0 and it would be better to do nothing rather than get this wrong. But it is at least worth a couple of moments to think about what a full solution could look like and then make sure that any baby steps we take now are in the right direction.

Rather than just adding a connectionID method, when it is possible that we will need more methods later, perhaps we need to add a ServletConnection ServletRequest.getServletConnection() method?

public interface ServletConnection {
    String getConnectionID();
    InetSocketAddress getRemoteAddr();
    InetSocketAddress getLocalAddr();
    InetSocketAddress getRemote();
    InetSocketAddress getLocal();
    boolean isSecure();
    boolean isPersistent();
    int getRequestCount();
    int getMaxMultiplex();
    void setMaxMultiplex();
    void addListener(Listener listener);
    Object getAttribute(String name);
    void setAttribute(String name, Object value);

    interface Listener {
        onClose();
    }
}

Notes:

  • the difference between getLocal/Remote() and getLocal/RemoteAddr() is that the former is logical and the later actual. These may differ if PROXY protocol is used and h3 allows different ports to be used.
  • a connection may non be secure (ie not SSL), but individual requests might be secure because they have proxy headers saying they are secure. This together with making SSL certificates available in the attributes would go some way to letting a server know if a connection could be shared or not.

@joakime
Copy link

joakime commented Sep 3, 2021

Don't use InetSocketAddress.
That's incompatible with things like UnixSocket.

Mark, String getConnectionID(), what would that represent in HTTP/2 (the physical connection? the stream?) and with HTTP/3 it falls apart again.

@gregw
Copy link
Contributor

gregw commented Sep 3, 2021

@joakim, there is a well define connection concept in both h2 and h3.

@gregw
Copy link
Contributor

gregw commented Sep 4, 2021

Note also that I'm asking about our destination, which may not be the next step. If the destination might look something like what i purposed then it may be that the next step is just:

Interface ServletConnection {
  String getId();
}

@markt-asf
Copy link
Contributor Author

The first step is probably slightly more than that but not much. I like the general idea. I'll work up a PR for this as I think we are at the point where having something more concrete to discuss would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants