Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async servlet failed to send large data frames completely when HTTP/2 trailers are present #2016

Closed
dapengzhang0 opened this issue Jul 24, 2018 · 5 comments · Fixed by #2112

Comments

@dapengzhang0
Copy link

dapengzhang0 commented Jul 24, 2018

I tried to let a async servlet send out a sequence of large data (40k each) followed by trailers. The client does receive the trailer, but only receives partial data.

  • This issue does not happen if I use Undertow 2.0.12 instead of Glassfish 5.0.
  • The issue also does not happen if I Thread.sleep for 100ms (the shorter the time is, the more flaky the result will be) before asyncContext.complete()
  • This issue also does not happen if only sending out a few bytes.
  • The servlet will send out data frames completely if I do not send trailers.

Steps to reproduce:

  1. Deploy the following servlet with glassfish 5.0
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import javax.servlet.AsyncContext;
import javax.servlet.ServletOutputStream;
import javax.servlet.WriteListener;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = {"/large_output_with_trailer"}, asyncSupported = true)
public class LargeOutputWithTrailer extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    AsyncContext asyncCtx = req.startAsync();

    resp.setStatus(HttpServletResponse.SC_OK);
    resp.setContentType("application/binary");
    resp.setHeader("trailers", "hello");

    final TrailerSupplier trailerSupplier = new TrailerSupplier();
    resp.setTrailerFields(trailerSupplier);
    resp.flushBuffer();

    final ServletOutputStream outputStream = resp.getOutputStream();
    outputStream.setWriteListener(new WriteListener() {
      int i;

      @Override
      public void onWritePossible() throws IOException {

        System.out.println("onWritePossible");
        while (outputStream.isReady()) {
          if (i < 3) {
            byte[] bytes = new byte[40000];
            outputStream.write(bytes);
            i++;
          } else {
            // Allow more time to let buffer clear out can workaround the issue
            // try {
            //   Thread.sleep(100);
            // } catch (InterruptedException e) {
            // }

            System.out.println("set trailers");
            trailerSupplier.get().put("hello", "world");
            System.out.println("complete");
            asyncCtx.complete();
            return;
          }
        }

        System.out.println("not ready");
      }

      @Override
      public void onError(Throwable t) {
        t.printStackTrace();
      }
    });
  }

  private static final class TrailerSupplier implements Supplier<Map<String, String>> {
    final Map<String, String> trailers = new ConcurrentHashMap<>();

    @Override
    public Map<String, String> get() {
      return trailers;
    }
  }
}
  1. Run HTTP/2 client to send a POST request
nghttp "http://localhost:8080/large_output_with_trailer" -v

Console output:

[  2.445] recv DATA frame <length=16383, flags=0x00, stream_id=13>
[  2.445] recv DATA frame <length=16383, flags=0x00, stream_id=13>
[  2.445] recv DATA frame <length=7234, flags=0x00, stream_id=13>
[  2.445] recv DATA frame <length=16383, flags=0x00, stream_id=13>
[  2.445] recv DATA frame <length=9152, flags=0x00, stream_id=13>
[  2.445] send WINDOW_UPDATE frame <length=4, flags=0x00, stream_id=0>
          (window_size_increment=40000)
[  2.445] send WINDOW_UPDATE frame <length=4, flags=0x00, stream_id=13>
          (window_size_increment=40000)
[  2.446] recv DATA frame <length=14465, flags=0x00, stream_id=13>
[  2.455] recv DATA frame <length=16383, flags=0x00, stream_id=13>
[  2.456] recv DATA frame <length=9152, flags=0x00, stream_id=13>
[  2.456] recv (stream_id=13) hello: world
[  2.456] recv HEADERS frame <length=11, flags=0x05, stream_id=13>
          ; END_STREAM | END_HEADERS
          (padlen=0)
[  2.456] send GOAWAY frame <length=8, flags=0x00, stream_id=0>
          (last_stream_id=0, error_code=NO_ERROR(0x00), opaque_data(0)=[])

  1. Sum up the lengths of DATA frames the client received:
    Only 105535 bytes were received, expecting 120000 bytes to be received.
    If you try multiple times, occasionally the client can receive all the 120000 bytes, but it's rare.
@MattGill98
Copy link
Contributor

Hi @dapengzhang0,

After an initial investigation, I just wanted to clarify my understanding of the problem and make sure we're on the same wavelength!

The total data bytes received before a closed connection seem to mostly be one of three values: 65535, 105535, or 120000. If it's 120000, it means all data was received. 65535 is the default window size, and 105535 as per the log you posted means the window size was incremented once (by the client). From what I can tell on the server end, the data seems to be getting sent before the client has a chance to increase the window size, so the client is disregarding this data. I'm looking into why this data is not being held until a window size update. Is that similar to your own diagnosis?

Kind regards,

Matt

@dapengzhang0
Copy link
Author

@MattGill98 Thanks you very much for the investigation. I actually don't know what is really happening under the hood. I'm just experiencing data corruption when the Servlet on Glassfish sends large data followed by trailers. This is observed with Netty client as well as nghttp client. If there are no trailers, or the trailers are sent a bit later after the payload like with a 100 ms delay, then the data that the client receives will not be corrupted.

@MattGill98
Copy link
Contributor

Hi @dapengzhang0,

No worries at all, that's fine! I'm fairly sure I know where it's coming from. I'll let you know when I get any further!

Kind regards,

Matt

@MattGill98
Copy link
Contributor

Hi @dapengzhang0,

A pull request for a potential fix has been made here: payara#13.

Once it's been verified and merged to Payara, we can work on making an upstream PR.

Kind regards,

Matt

@benmccann
Copy link

The fix has been merged to Payara. Can it be ported back upstream here now?

dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
…implementations

- default is none (uses JDK's impl)
- profile npn uses grizzly-npn-bootstrap
- profile openjsse uses openjsse
- versions can be overriden from command line (to compare results)

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- important loggers are explicitly named in logging.properties

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
Create new phase in the handshake listener to allow registration of the
custom ALPN logic. This allows the HTTP/2 filter to work correctly.

Signed-off-by: Matthew Gill <[email protected]>
Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
Streams still allowed pushing resources when globally disabled as the
stream itself and the session both allow it. This change makes the
stream also respect the global configuration.

Signed-off-by: Matthew Gill <[email protected]>
Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
…nterface

- moving this interface out is a breaking change

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
…y connection closure.

Signed-off-by: Matt Gill <[email protected]>
Signed-off-by: David Matejcek <[email protected]>
dmatej pushed a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
Signed-off-by: Matt Gill <[email protected]>
Signed-off-by: David Matejcek <[email protected]>
dmatej pushed a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
Signed-off-by: Matt Gill <[email protected]>
Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- partial cleanup, but requires much more than this
- DefaultOutputSink
  - combination of "fixes by refactoring" mixed with the changes in the flow
    of the logic
  - sendTrailers now locks the deflater
- added some javadocs and much more logs


Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- can run with any compatible implementation, not only NPN Bootstrap
- with new JDK versions after 8u250 uses it's JSSE impl by default
- still can use older NPN bootstrap versions if configured
- can use also other implementations (openjsse)

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
…mes, but ...

- in this state grizzly:
  - fails TrailersTest (grizzly-http2; race condition, passes if I use stepping)
  - passes TCK tests with Payara (websocket)
  - still fails h2spec with Payara (race condition, different, but related issue)

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- 19001 is often occupied in Kubuntu

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- if the queue size is not same as reserved space, and if it is last content
  and the content is a trailer, send trailers
- if the queue size is same as reserved space, and record was already processed,
  don't send trailers.
- TrailersTest - just formatting and finals

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
Signed-off-by: Matthew Gill <[email protected]>
Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- original code was throwing NPE
- handler can be null
  - then we have nothing to handle the reason
- closeReason can be null
  - then we use "locally closed" as a default (may be incorrect)
- even ignored exception should be visible somewhere (logging)

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- based on h2spec results - client was able to cause ISE on the server simply
  by misformatted header
- assertions are worthless in production

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
- fixes another NPE seen in logs

Signed-off-by: David Matejcek <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 19, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
…implementations

- default is none (uses JDK's impl)
- profile npn uses grizzly-npn-bootstrap
- profile openjsse uses openjsse
- versions can be overriden from command line (to compare results)

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
Signed-off-by: Matthew Gill <[email protected]>
Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- original code was throwing NPE
- handler can be null
  - then we have nothing to handle the reason
- closeReason can be null
  - then we use "locally closed" as a default (may be incorrect)
- even ignored exception should be visible somewhere (logging)

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- based on h2spec results - client was able to cause ISE on the server simply
  by misformatted header
- assertions are worthless in production

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- fixes another NPE seen in logs

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
…implementations

- default is none (uses JDK's impl)
- profile npn uses grizzly-npn-bootstrap
- profile openjsse uses openjsse
- versions can be overriden from command line (to compare results)

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- important loggers are explicitly named in logging.properties

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
Create new phase in the handshake listener to allow registration of the
custom ALPN logic. This allows the HTTP/2 filter to work correctly.

Signed-off-by: Matthew Gill <[email protected]>
Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
Streams still allowed pushing resources when globally disabled as the
stream itself and the session both allow it. This change makes the
stream also respect the global configuration.

Signed-off-by: Matthew Gill <[email protected]>
Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
…nterface

- moving this interface out is a breaking change

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
…y connection closure.

Signed-off-by: Matt Gill <[email protected]>
Signed-off-by: David Matějček <[email protected]>
dmatej pushed a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
Signed-off-by: Matt Gill <[email protected]>
Signed-off-by: David Matějček <[email protected]>
dmatej pushed a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
Signed-off-by: Matt Gill <[email protected]>
Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- partial cleanup, but requires much more than this
- DefaultOutputSink
  - combination of "fixes by refactoring" mixed with the changes in the flow
    of the logic
  - sendTrailers now locks the deflater
- added some javadocs and much more logs

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- can run with any compatible implementation, not only NPN Bootstrap
- with new JDK versions after 8u250 uses it's JSSE impl by default
- still can use older NPN bootstrap versions if configured
- can use also other implementations (openjsse)

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
…mes, but ...

- in this state grizzly:
  - fails TrailersTest (grizzly-http2; race condition, passes if I use stepping)
  - passes TCK tests with Payara (websocket)
  - still fails h2spec with Payara (race condition, different, but related issue)

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- 19001 is often occupied in Kubuntu

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- if the queue size is not same as reserved space, and if it is last content
  and the content is a trailer, send trailers
- if the queue size is same as reserved space, and record was already processed,
  don't send trailers.
- TrailersTest - just formatting and finals

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
Signed-off-by: Matthew Gill <[email protected]>
Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- original code was throwing NPE
- handler can be null
  - then we have nothing to handle the reason
- closeReason can be null
  - then we use "locally closed" as a default (may be incorrect)
- even ignored exception should be visible somewhere (logging)

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- based on h2spec results - client was able to cause ISE on the server simply
  by misformatted header
- assertions are worthless in production

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- fixes another NPE seen in logs

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 20, 2020
- Test results:
- OpenJDK 8u275, openjsse 1.1.5 ... SUCCESS
- OpenJDK 11.0.8, openjsse 1.1.5 ... FAIL ... as expected
- OpenJDK 8u275, npn bootstrap 1.9 ... FAIL ... as expected
- OpenJDK 8u232, npn bootstrap 1.9.payara-p1 ... SUCCESS
- OpenJDK 8u275 ... SUCCESS
- OpenJDK 11.0.8 ... SUCCESS

Signed-off-by: David Matějček <[email protected]>
dmatej added a commit to dmatej/grizzly that referenced this issue Nov 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants