Jetty-12 HttpContent should have an async API #8790
So to take my proposal to its logical extreme, we should change the `handle` method in `ResourceHandler` from:

```java
@Override
public Request.Processor handle(Request request) throws Exception
{
    HttpContent content = _resourceService.getContent(Request.getPathInContext(request), request);
    if (content == null)
        return super.handle(request); // no content - try other handlers
    return (rq, rs, cb) -> _resourceService.doGet(rq, rs, cb, content);
}
```

to:

```java
@Override
public Request.Processor handle(Request request) throws Exception
{
    HttpContent content = _resourceService.getContent(Request.getPathInContext(request), request);
    if (content == null)
        return super.handle(request); // no content - try other handlers
    return content;
}
```

We may even save on the allocation of a lambda this way :)
@gregw Do you think that …
I'm just doing big picture here... details like that are for you :)
So thinking out loud here about how this could work if … At the simplest level, we would have a … We could then have a … A … Any …

With regards to caching, there are two types of cache we could use/implement. Firstly, because the … Secondly, we could have a … A …
The problem is that …, which is quite long and cumbersome, and it needs to be re-wrapped as a …

To write directories, we would need to pass to the … Also, I don't like that you now rely on …

For example, the CachingHCF would be hit, cache miss, delegate to the nested HCF, which returns an HC. Any non-Path based Resource would now be excluded from all the wrapping HCFs (for example, …). I am skeptical about this approach, even with …
This is not an issue. This design is only relevant to the server anyway, so it just needs to be in `o.e.j.server`:

```java
interface HttpContentProcessor extends Request.Processor
{
    HttpContent getHttpContent();
}
```

Or it could be a … Ultimately, the project structure is there to support good design, not to limit design choices.
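For illustration, a hedged sketch of how a wrapping factory might use such a processor. The `ContentFactory` shape and the `WelcomeFileProcessor` name are assumptions for the sketch, not actual Jetty API:

```java
// Sketch: a wrapping factory that swaps directory content for a processor
// able to serve the welcome file (or list the directory) asynchronously.
class WelcomeFileContentFactory implements ContentFactory
{
    private final ContentFactory wrapped;

    WelcomeFileContentFactory(ContentFactory wrapped)
    {
        this.wrapped = wrapped;
    }

    @Override
    public HttpContent getContent(String pathInContext) throws IOException
    {
        HttpContent content = wrapped.getContent(pathInContext);
        if (content != null && content.getResource().isDirectory())
            return new WelcomeFileProcessor(content); // a hypothetical HttpContentProcessor
        return content;
    }
}
```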
Oh, if … For now, I'll just call it …
Not necessarily. The policy decision could be made by the factory when the … You only need to give the … Again, configuration is important, but a second-order issue compared to the good design of how to serve content efficiently.
It is not relied on. Implementations are perfectly able to make decisions entirely on the basis of the intercepted/wrapped request/response/HttpContent. I gave the example of a cache that can cache any content and which looks at … This is no different from the current code, which, for example, needs to know there is a path before deciding to try to memory map the file. At least this design allows for path-independent implementations, because it has access to the response headers of the wrapped content. The …
It doesn't only have …
No - it would only be excluded from the wrapping HCFs that have path-specific behaviours.
Yet …
I'm still not convinced about this.

So, … And … What am I missing?
Yeah well, the convincing needs to go both ways! I'm not convinced about the currently checked-in code, and I only acquiesced to several PRs being merged for expediency, as the lack of static resource serving was holding up other development. We cannot allow incumbency in an alpha branch to be the determining factor of our final design... else a lot more PRs are going to be -1 reviewed until they are perfect and convince everybody. That is not a productive work cycle. So I don't need to make the case to replace the current alpha implementation. The new implementation needs to make the case for why it belongs in the final code base - other than expediency.
You're missing that we should have an API that is extensible, flexible and efficient. It is not good enough to have an API that is just good enough for basic cases. The problems with …

No. It writes the content to "a" response, which is not necessarily "the" response. The whole …

As described above, there are multiple options for a cache. One can be file system aware, so it can use … So is this proposal not workable? I think it embraces all the good work we have done to make … What am I missing?
@gregw If my understanding is correct, you're proposing to manage static content with a mechanism vaguely similar to the 11.0.x handler one: pass the request/response/callback trio to some … This raises two questions in my mind: …

I'm inclined to believe the answer to the 1st one is yes, meaning we should move on with this idea. But I really am on the fence about the 2nd one.
@lorban not really.... at least I don't think of it in that way. I'm proposing to embrace the new design. A HttpContent is obtained from a content factory more or less as done now in the alpha, but then the content serves itself rather than the service serving the content (by making a policy choice about calling getBuffer or creating an iterating callback that uses the channel from the resource). This would be done with a process-like API. Because it has the full request and response, it is more powerful than getBuffer and will allow such things as welcome files and pre-compression to be done in content wrappers rather than in the service. The HttpContent (or its wrappers) would make the policy decisions with full access to request, response, factory, service, resource and any cached data or metadata.
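As a hedged illustration of "the content serves itself", a pre-compression wrapper might look something like this. `HttpContent.Wrapper`, `getWrapped()`, `getByteBuffer()` and the constructor shape are assumptions for the sketch:

```java
// Sketch: a content wrapper that picks a pre-compressed variant based on the
// request's Accept-Encoding header, with no special case in the service.
class PreCompressedHttpContent extends HttpContent.Wrapper implements Request.Processor
{
    private final HttpContent gzipped; // pre-compressed variant located by the factory

    PreCompressedHttpContent(HttpContent plain, HttpContent gzipped)
    {
        super(plain);
        this.gzipped = gzipped;
    }

    @Override
    public void process(Request request, Response response, Callback callback)
    {
        String accept = request.getHeaders().get(HttpHeader.ACCEPT_ENCODING);
        boolean gzip = accept != null && accept.contains("gzip");
        HttpContent content = gzip ? gzipped : getWrapped();
        if (gzip)
            response.getHeaders().put(HttpHeader.CONTENT_ENCODING, "gzip");
        response.write(true, content.getByteBuffer().slice(), callback);
    }
}
```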
Well, the previous resource service was a great big blob of ifs, buts and elses. It was hard to understand and near impossible to maintain. It certainly was not easily extensible. It was your idea to simplify by moving the caching out to content factories and content wrappers - it was a good idea, as not only does it let us deconstruct our own caching, it opens the possibility for more complex caching (3rd party?) to be plugged in. I'm just proposing we follow that example/pattern for other special-case resource handling (welcome files, pre-compression, re-dispatch, things we have not thought of yet).
Pass. That detail can be sorted out by whoever actually does the implementation.
I agree that decomposing into factories and content wrappers will be easier to maintain. I'm flexible on exactly what kind of reassembly model is used.
@gregw it is currently not true that HC has access to request and response, because it lives in … Let's say that we move it to …

A cache implementation will have to wrap the response with some complex logic about either aggregating buffers into a single one that is going to be cached, or storing them aside and always using multiple writes to write them out.

Also, we would need to write Servlet-specific HCFs for every possible HCF that needs to perform redirects/writes, because we want to use Servlet APIs to do those, not core APIs.

Pseudo-code for file mapping:

```java
class FileMappingHC extends HC.Wrapper {
    Boolean couldMap;  // null until mapping has been attempted
    ByteBuffer buffer; // the mapped buffer, when mapping succeeded

    void process(Request req, Response res, Callback cbk) {
        if (couldMap == null) {
            buffer = tryMapping(getWrapped());
            couldMap = buffer != null;
        }
        if (couldMap == Boolean.TRUE) {
            // TODO: must have a Servlet version for this.
            res.write(true, buffer.slice(), cbk);
        } else {
            getWrapped().process(req, res, cbk);
        }
    }
}
```

Pseudo-code for caching:

```java
class CachingHC extends HC.Wrapper {
    Boolean couldCache; // null until caching has been attempted
    ByteBuffer buffer;  // the cached buffer, when caching succeeded

    void process(Request req, Response res, Callback cbk) {
        if (couldCache == null) {
            var res2 = wrapResponse(res);
            var cbk2 = Callback.NOOP;
            getWrapped().process(req, res2, cbk2);
            // Now we have a buffer stored in res2.
            couldCache = canBeCached(res2.buffer);
            if (couldCache) {
                // TODO: also cache headers that could have been "written".
                buffer = res2.buffer;
            }
        }
        if (couldCache == Boolean.TRUE) {
            // TODO: must have a Servlet version for this.
            res.write(true, buffer.slice(), cbk);
        } else {
            // TODO: replace this with NonCacheable(this) in the cache?
            getWrapped().process(req, res, cbk);
        }
    }
}
```
I didn't say it did? I'm OK with it being moved, or if the …

Why? I'm sure we already have code that can act as a …

I don't think so, as we can just wrap the servlet request/response as core request/responses and send them into the resource service, so it will go via any servlet response wrappers. We already do this in other places (although it perhaps has not been generalized). But we need to avoid having multiple versions of any such behavior, else we will end up with core, ee9, ee10, ee11, ... etc. versions. Far better to have a single core implementation and use wrapping to access that from servlets.
That's not how I see it at all. You would not wrap a `HttpContentProcessor` with a `FileMappedHttpContentProcessor` unless you already knew it could be mapped. So the implementation of `process` would be more like:

```java
class FileMappingHCP extends HCP.Wrapper {
    void process(Request req, Response res, Callback cbk) {
        writeHeaders(res, getHttpContent());
        Response.write(req, res, _preMappedBuffer, cbk);
    }
}
```

The only logic we'd put in `process` is logic that is dependent on something about the request and not on something about the resource. Decisions based on the resource would be made by the factory.
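To make that concrete, a hedged sketch of the factory making the resource-based decision up front, so the per-request fast path above stays trivial. `tryMapping`, the `ContentFactory` shape and the `FileMappingHCP` constructor are illustrative assumptions:

```java
// Sketch: the factory only wraps content it already knows can be mapped.
class FileMappingContentFactory implements ContentFactory
{
    private final ContentFactory wrapped;

    FileMappingContentFactory(ContentFactory wrapped)
    {
        this.wrapped = wrapped;
    }

    @Override
    public HttpContent getContent(String pathInContext) throws IOException
    {
        HttpContent content = wrapped.getContent(pathInContext);
        if (content == null)
            return null;
        ByteBuffer mapped = tryMapping(content); // null if not a mappable file
        return mapped == null ? content : new FileMappingHCP(content, mapped);
    }
}
```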
@gregw I thought you wanted … If you precompute the file mapping, the caching etc., now all the work is being done by the constructor at the … I thought we did not want to do that, to avoid as much as possible the stampede effect. So if we are still on that idea of … I just wanted to try to write it down to understand how good/bad it was.
No. Read all my comments above where I proposed:

```java
interface HttpContentProcessor extends Request.Processor
{
    HttpContent getHttpContent();
}
```

This is server-side-only stuff, so it can be in o.e.j.server. If …
Not necessarily. It can be done that way, and probably should be for a start (optimize the fast path first!). But because we can asynchronously wait with process, there are many ways of actually solving stampede (as opposed to just moving it to `getByteBuffer`). For example, the first request(s) to the factory could be given the un-mapped HCP whilst the mapped buffer is prepared and given to subsequent requests; or we could initially return an HCP that did the complexity and made process calls threadlessly wait, and then replaced itself in the factory once the buffer was available. I don't know if such complexity is warranted, as I think mapping is quick, but there are plenty of alternatives. Either way, it would be good to come up with a solution where, once the slow path was finished (i.e. the initial request(s)), the fast path was as simple as possible.
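A hedged sketch of the second alternative, where early `process()` calls wait threadlessly on a `CompletableFuture` until the mapped buffer is ready (class and field names are illustrative):

```java
// Sketch: early requests register on the future instead of blocking a thread;
// once mapping completes, the factory can swap in a simple mapped HCP.
class PendingMappedHCP implements HttpContentProcessor
{
    private final HttpContent content;
    private final CompletableFuture<ByteBuffer> mapping; // completed by a background task

    PendingMappedHCP(HttpContent content, CompletableFuture<ByteBuffer> mapping)
    {
        this.content = content;
        this.mapping = mapping;
    }

    @Override
    public HttpContent getHttpContent()
    {
        return content;
    }

    @Override
    public void process(Request request, Response response, Callback callback)
    {
        mapping.whenComplete((buffer, failure) ->
        {
            if (failure != null)
                callback.failed(failure);
            else
                response.write(true, buffer.slice(), callback);
        });
    }
}
```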
Sure, but there are many ways to skin that cat. The …
No, because you can have a fast/light … The point being that, if we want, the …
Stampede can be solved if we have async waiting, I'm just not sure that is necessary for us to implement initially. I'm sold on the idea of optimizing the fast path first and then seeing what slow paths turn up in the flame graphs before putting too much effort into them. So in that spirit, having a … Also, I think you are somewhat confusing what the stampede problem is. If we get 1000s of initial requests for a resource, then stampede is not making 999 of them wait for the first. A stampede is if the 999 all create the same expensive resource that the first one does, only for them to be thrown away. So we can solve stampede by: …
I think 1. allows for the best optimization of the fast path. If the global lock becomes a problem, then wrapping that fast HC in a generic impl of 3. or 4. should be easy to do and would trade off a virtual read plus extra delegation in the fast path for avoiding the global lock.
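For illustration, a minimal sketch of serializing creation so the expensive work happens at most once per path, assuming a simple map-backed factory (hypothetical names):

```java
// Sketch: computeIfAbsent locks per key, so 1000 concurrent first requests
// build the expensive HttpContent once instead of 1000 times.
class StampedeSafeContentFactory implements ContentFactory
{
    private final ConcurrentMap<String, HttpContent> cache = new ConcurrentHashMap<>();
    private final ContentFactory wrapped;

    StampedeSafeContentFactory(ContentFactory wrapped)
    {
        this.wrapped = wrapped;
    }

    @Override
    public HttpContent getContent(String pathInContext) throws IOException
    {
        return cache.computeIfAbsent(pathInContext, path ->
        {
            try
            {
                return wrapped.getContent(path); // the expensive creation
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```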
See also discussion at #8767 (comment).
@lachlan-roberts @sbordet @joakime @lorban I'm just re-pitching the need for an async API in resources in this comment. It is a distillation of what I think after all the conversations above (and elsewhere):

One of the goals of the refactoring that has been done on the … The very first website we deployed on the new … One such new behavior that has often been suggested is the use of a 3rd party caching library. However, we are currently unable to use any 3rd party HTTP caching library because the … The … A … With the … Finally, we have the async …

@lachlan-roberts do you want to give this a go, or should I?
@lachlan-roberts can you link your branch here? Was there ever a PR that actually got reviewed? I think we need to capture the discussions around this.
@gregw the branch is here: https://github.com/eclipse/jetty.project/tree/jetty-12.0.x-AsyncHttpContent It never had a PR, but we reviewed it in a hangout with @sbordet and @lorban and decided that it introduced too much unnecessary complexity.
Yeah, but I think it might also introduce some necessary correctness (i.e. #9079)!
In some ways this is related to #11094, where I'm adding async APIs to RetainableByteBuffer. A HttpContent is kind of similar …
Building on the work in #11598 that makes RetainableByteBuffer a more generic buffer API, we should look at changing … The key API would be …
Unfortunately, you cannot have …
The API of `HttpContent` is essentially:

```java
long getContentLengthValue();
ByteBuffer getByteBuffer();
void release();
```

and the new …
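For comparison, the overlapping part of `RetainableByteBuffer`'s API in Jetty 12 (simplified; `release()` is inherited from `Retainable`), which is what makes the is-a question below arise:

```java
// RetainableByteBuffer exposes a matching buffer accessor and release semantics.
ByteBuffer getByteBuffer();
boolean release(); // from Retainable
```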
So I think the simplest solution here is that we make …
@lorban I've had a quick start at the approach above (…)
@gregw Making … How would you serve uncached 2+ GB files without keeping …
@lorban you can have a `getByteBuffer()` that returns null (although technically it would be better for it to now throw `BufferOverflowException`), but the HttpContent type needs to implement the `writeTo` method. These implementations can use `Content.copy` and `ByteChannelContentSource` or `InputStreamContentSource` to keep the write async. I've played around with it and have a few tests working, but more work is needed. Have a look at the …
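A hedged sketch of such a `writeTo` for a large uncached file; the exact signature is an assumption here, and offset/length handling is omitted:

```java
// Sketch: stream the file through Content.copy so the write stays async and
// no full-file ByteBuffer is ever retained.
@Override
public void writeTo(Content.Sink sink, long offset, long length, Callback callback)
{
    try
    {
        InputStream in = Files.newInputStream(getResource().getPath());
        Content.copy(new InputStreamContentSource(in), sink, callback);
    }
    catch (IOException e)
    {
        callback.failed(e);
    }
}
```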
@gregw There is no need for … The … If we were still in the old Jetty 11 model, I'd even consider that …
@lorban but why isn't a HttpContent is-a RBB, just like a chunk is-a RBB? It has the same signature?
@gregw …

```java
ByteBuffer buffer = httpContent.getByteBuffer();
if (buffer != null)
    sendContent(buffer, callback);
else
    sendContent(httpContent.getResource(), callback);
```

I checked … I do not see a reason why … I'm going to try transposing my thoughts into code today to have a better basis for discussion.
@lorban I'm definitely not opposed to what you are describing. No … Standing by to see your code, and tools down on this PR now.
Note that I think a …
#8790 implement HttpContent.writeTo() async API
Signed-off-by: Ludovic Orban <[email protected]>
@lachlan-roberts has recently been working on the content caching mechanism for Jetty-12. He's been receiving contradictory advice: I've been advocating that `HttpContent` should have an async API (maybe even just implement `Request.Processor`), but I think others have been advocating for keeping the blocking `getBuffer()` approach.

So I'm opening this issue to capture some of the arguments/discussions, but ultimately I think we need to have a meeting to pick a single direction and go with it.
The `ResourceService` in Jetty-12 has been greatly simplified, primarily by moving a lot of functionality out of special-case handling into `ContentFactory` implementations and wrappers. Thus caching, cache invalidation etc. are all applied by wrapping / configuring / replacing the `ContentFactory`. I think this is a great approach, but without an async API it means that only blocking behaviours can be moved into a `ContentFactory`. Currently, we still have behaviours for directory listing, welcome files, precompressed content, range requests etc. handled as conditional behaviours in the `ResourceService`. If `HttpContent` was a `Processor`, most of these complexities could be moved out to `ContentFactory` wrappers and only applied if actually needed/used.

A good example was that @sbordet has a use-case for handling welcome files with a redispatch through the context handlers: the case where a welcome file might be index.php, and to serve it it needs to be redispatched via the handler that will proxy PHP to a different process. With the current code, the `ContentFactory` will produce a `HttpContent` that represents the source code of the `index.php` file, probably even blocking for a little while in the constructor to load a `ByteBuffer` with content that will never be used, and creating `PreEncodedHttpFields` that are not relevant! To handle this case, we must add a special welcome file mode to the `WelcomeFactory`, `WelcomeAction` and to the `ResourceService` itself. This adds complexity even if a deployment doesn't need this feature.

If instead the `HttpContent` was a `Request.Processor`, then a `WelcomeFileContentFactory` wrapper could replace any directory `HttpContent`s it receives from its nested factory with a processor that would generate the contents of the directory. This might be a redirect, generate an xhtml listing, or it may be a fully async redispatch through the handler tree, ultimately talking to a PHP server in another process. Better yet, this could be done before the caching content factory, so the welcome file content could be cached.

Better yet, new ways of handling directories and/or welcome files can be added simply by adding new `ContentFactory` wrappers rather than adding more ifs and buts into the already complex `ResourceService`.

As far as I can tell, the counter argument is that this is "optimising the slow path". I don't think that is the case, because the async API is perfectly capable of being used to implement the simple direct behaviour that we currently have, which is (simplified):
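A minimal sketch of that direct behaviour, following the `handle()` method quoted at the top of this thread:

```java
// Sketch: the service makes the policy decision and serves the content itself.
HttpContent content = _resourceService.getContent(Request.getPathInContext(request), request);
return (rq, rs, cb) -> _resourceService.doGet(rq, rs, cb, content);
```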
I think this should just become:
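Again as a sketch, following the proposal above where the `HttpContent` itself is the `Request.Processor`:

```java
// Sketch: the content serves itself; no policy logic left in the service.
HttpContent content = _resourceService.getContent(Request.getPathInContext(request), request);
return content;
```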
and by the miracle of object-oriented programming, the content would serve itself with the best technique available. If no welcome files are configured, then there is no welcome file handling code in the hot path. If welcome files are configured, the content provided for a directory resource will either already be the welcome file, or be a processor that can run the welcome file logic.
If we wanted to have a simple cache that blocks loading the content into a buffer, then the implementation would end up something like:
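A hedged sketch of such a blocking cache wrapper (hypothetical names, echoing the `CachingHC` pseudo-code earlier in the thread):

```java
// Sketch: block once to load the wrapped content, then always serve the buffer.
class BlockingCachedContent extends HttpContent.Wrapper implements Request.Processor
{
    private volatile ByteBuffer cached;

    @Override
    public void process(Request request, Response response, Callback callback)
    {
        ByteBuffer buffer = cached;
        if (buffer == null)
            cached = buffer = loadIntoBuffer(getWrapped()); // blocking load, done once
        response.write(true, buffer.slice(), callback);
    }
}
```

(`loadIntoBuffer` stands in for the utility mentioned below that drives the wrapped content with a fake request/response/callback.)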
If that is the style of cache we want to have, there is no significant cost to this API approach, as it simply replaces if statements with virtual method calls. There may be some extra complexity in loading the wrapped HttpContent into a ByteBuffer, as we will need a fake request/response/callback, but that can be done in a utility class and will be generally useful anyway. This is not "optimizing" the slow path. This is generalising the hot path.
But by organising the API in this way, we can:

- … the `ResourceService` …
- … `Vary` and `Cache-Control` …
In fact, we could almost go even further, and have the `HttpContent` as a `Request.Processor` be returned from the `handle` method of the `ResourceHandler`!?!?

In short, in Jetty-12 it is the `Request.Processor` API that we use to serve content. We wrap the processor to add behaviours like compression. We should use this same approach for static as well as dynamic content.