Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we change the body attribute of Request/Response to a method? #606

Closed
horo-t opened this issue Jan 20, 2015 · 55 comments
Closed

Can we change the body attribute of Request/Response to a method? #606

horo-t opened this issue Jan 20, 2015 · 55 comments

Comments

@horo-t
Copy link
Member

horo-t commented Jan 20, 2015

When the ServiceWorker responds to the FetchEvent with a cached response, ServiceWorker don't need to know the body content of the response object. In such case, we don't want to send the content to the ServiceWorker. For optimization, we want to send the content only when the ServiceWorker script accesses to the content.

self.addEventListener('fetch', function(event) {
  event.respondWith(
    cache.match('http://example.com/file/')
      .then(function(response) {
          return response;
        }));
  });

Response.body provides a way to access to the content. So we want to send the content when Response.body is accessed. This means Response.body is a side-effecting attribute.

I think side-effecting attributes are not good. For example, console.log(response); may cause unexpected behaviour. So I want Response.body to be a method.

Furthermore, if Response.body is a method, comparing two response objects in tests become easier. We are currently using custom assert_object_equals() in tests not to access to Response.body attribute while comparing two response objects.

@domenic
Copy link
Contributor

domenic commented Jan 20, 2015

Is it side-effecting from the author perspective, or only from the implementation perspective?

@horo-t
Copy link
Member Author

horo-t commented Jan 21, 2015

It is from the perspective of optimization for implementation.

And also it is from the perspective of the web developer who are using fetch API.
If response.body is a method, comparing two response objects become easier.

@kinu
Copy link
Contributor

kinu commented Jan 21, 2015

I'm probably missing something, but when developers want to touch the content of Response.body basically they use async methods that return promise (internally or explicitly), is it correct? Then why do we need to change the body itself to a method?

@horo-t
Copy link
Member Author

horo-t commented Jan 21, 2015

Response.body is a ReadableStream. The reader of ReadableStream calls ReadableStream.read() method only when ReadableStream.state is "readable". I want to start sending the content to the ServiceWorker, when some method is called. But there is no method to be called.

https://streams.spec.whatwg.org/#rs-intro

function streamToConsole(readableStream) {
  readableStream.closed.then(
    () => console.log("--- all done!"),
    e => console.error(e);
  );

  pump();

  function pump() {
    while (readableStream.state === "readable") {
      console.log(readableStream.read());
    }

    if (readableStream.state === "waiting") {
      readableStream.ready.then(pump);
    }

    // otherwise "closed" or "errored", which will be handled above.
  }
}

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

What scenarios do we envision where it's not useful for the body to be populated? In that case wouldn't developers just use HEAD requests?

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

Maybe I am confused about what cost we are avoiding.

Presumably any non-HEAD HTTP request body will be used by the service worker in some way. Whether that be to read chunks with req.body.read(), to read the whole thing with a method like req.text(), or to put it in a cache, or to forward it onward, or something else. So, the UA implementation will be fetching the body from the network and storing chunks of it into memory in all these cases, right?

In these cases, what is the optimization in the OP? Is the issue perhaps the cost of reifying the body chunks (which the UA already has, right?) into JavaScript?

@horo-t
Copy link
Member Author

horo-t commented Jan 21, 2015

I'm mainly thinking about the cost of IPC(Inter Process Communication).

In Chrome there are three processes.
(1) Renderer process for page.
(2) Browser process which handles requests and caches.
(3) Renderer process for ServiceWorker.

When the page sends a HTTP request with body, and the ServiceWorker executes "fetch(event.request)".

  • The renderer(1) sends the request to the browser process(2).
  • The browser process(2) sends it the ServiceWorker(3).
  • The ServiceWorker(3) sends it to the browser process(2).
  • The browser process(2) sends it to the network.
    In such case we don't need to send the body content to the ServiceWorker(3).

When the ServiceWorker(3) get the response in the cashe from the browser process(2) and returns it to the page(1), we don't need to send the body to the ServiceWorker(3).

@horo-t horo-t closed this as completed Jan 21, 2015
@horo-t horo-t reopened this Jan 21, 2015
@horo-t
Copy link
Member Author

horo-t commented Jan 21, 2015

Sorry miss operation.

@wanderview
Copy link
Member

@horo-t

We have similar data copying issues issues in gecko, although we don't have a separate process for ServiceWorker (yet?).

The way we avoid copying lots of data across process boundaries for something like this is to dup(2) the file descriptor for the cached response's body. This means we're only copying an integer across process boundaries at first and the child process is responsible for all the reading into memory.

Can you do something similar?

So in this case you would just be passing the file descriptor to the SW process and if it doesn't actually touch the data, then you never actually read any of the stream data into memory.

Overall, though, it seems part of the point of making the body methods like text() and json() async with promises is because there is an expectation the UA will have to do work to get the data into memory when you access them. I'm not sure I would really classify this as a "side-effect". Just my opinion, of course, though.

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

I think I understand now. I agree with @wanderview that "side effecting" is not quite the right framing. More, "will this cause any work" (and specifically, work that needs to be async---of which IPC is one case).

I think it would be a simpler mental model for developers if req.body was always "present" and accessible. That is, the mental model is "Requests have a body which is a stream, plus convenience methods like .text() for reading the stream."

Whereas if they have to call req.body() (or probably req.stream() would be a better name) the mental model becomes "Requests are opaque objects which don't necessarily have an associated body. You can call .stream() to reify the body as a stream, or .text() to reify the body as text, etc."

But I do not feel that strongly about this, so if implementers think that mental model 2 is a better fit and allows better optimizations, we should probably do that.

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

Although: if req.stream() requires IPC, then shouldn't it be asynchronous, i.e. promise-returning? That seems unfortunate for developers :(

@wanderview
Copy link
Member

IMO, body should not be asynchronous. The convenience methods are already async and reading data from a stream is obviously async. So all the async stuff hangs off body. If IPC has to be done, it can be done lazily on first async operation.

Given that, I would also vote to keep body an attribute. I think having a method with no content observable change would be confusing for developers.

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

If IPC has to be done, it can be done lazily on first async operation.

This is part of the problem though. As @horo-t's comment shows, the current API doesn't actually have any async operations.

I discuss this issue in more detail in whatwg/streams#269.

@wanderview
Copy link
Member

I understand now. I missed that the stream needs data in memory to be in the "readable" state as you describe in whatwg/streams#269.

I'll comment over there. I think requiring a stream to be a readable state synchronously at creation time might be a problem. (Or I am confused again!)

@wanderview
Copy link
Member

Or, does the stream get constructed in the "waiting" state, immediately tries to get the first chunk, and moves to "readable" when its in memory? If thats the case, then it seems stream() does not need to be async.

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

I think requiring a stream to be a readable state synchronously at creation time might be a problem. (Or I am confused again!)

Unclear if you are confused :). The stream doesn't necessarily need to be readable synchronously at creation time. But it does have to be "on its way" to becoming readable. That is, a very normal usage pattern would be something like

var stream = createStreamOrGetItFromSomewhereLikeReqDotBody();

assert(stream.state === "waiting");
stream.ready.then(() => {
  // The question is, do we ever get here?
  assert(stream.state == "readable");
  console.log(stream.read());
});

Or, does the stream get constructed in the "waiting" state, immediately tries to get the first chunk, and moves to "readable" when its in memory?

Yes, exactly! This came in as I was typing the above :).

If thats the case, then it seems stream() does not need to be async.

I guess so. The idea being that .stream() both returns the stream immediately, and then in the background asynchronously kicks off the IPC stuff necessary to start it filling up?

@wanderview
Copy link
Member

The idea being that .stream() both returns the stream immediately, and then in the background asynchronously kicks off the IPC stuff necessary to start it filling up?

I think that makes sense to me now. Now that I understand that we have this initial chunk load, I agree a stream() method makes sense.

And I like the concept that request is opaque with a bunch of reify methods like stream() and text() that all set bodyUsed. So devs can think of them all the same way where "call a reify method, it drains the body completely, and so bodyUsed is set".

@annevk
Copy link
Member

annevk commented Jan 21, 2015

I don't understand this thread. Does a method mean a promise or does a method simply mean a method, but otherwise the same as a getter (which is what I would like us to use here)?

@wanderview
Copy link
Member

I don't understand this thread. Does a method mean a promise or does a method simply mean a method, but otherwise the same as a getter (which is what I would like us to use here)?

Personally, I'd like stream() to behave similar to text(), json(), etc. On first call it returns an async thing (Promise or Stream). On subsequent calls it rejects with bodyUsed, etc.

Of course this can be done with a getter, but I think it would be confusing with this kind of state change.

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

Personally, I'd like stream() to behave similar to text(), json(), etc. On first call it returns an async thing (Promise or Stream). On subsequent calls it rejects with bodyUsed, etc.

This is kind of the heart of the issue. Namely, should subsequent calls return the same stream object, or a new one?

I think it'd be pretty weird for subsequent calls to return a new, always-errored stream. This would mean you can never pass around the req object to people who might operate on the stream. Instead you have to pass around the { req, stream }pair so that everyone operates on the same stream object.

Maybe a body getter is the better approach, even if it has to kick off an IPC in the background for Blink. The user-facing model is no different, after all---it is just an unobservable optimization that if you don't touch the body getter, no IPC happens.

@annevk
Copy link
Member

annevk commented Jan 21, 2015

@wanderview could you elaborate on why you would want to return a new object each time? Conceptually that does not make much sense to me. I would expect a Request/Response to have a single associated stream. Transforming that stream into a string however seems okay as a one off operation.

@wanderview
Copy link
Member

Maybe a body getter is the better approach, even if it has to kick off an IPC in the background for Blink. The user-facing model is no different, after all---it is just an unobservable optimization that if you don't touch the body getter, no IPC happens.

This sounds ok I guess. I just find its interaction with the bodyUsed state confusing.

@domenic
Copy link
Contributor

domenic commented Jan 21, 2015

I feel like over in #452 we are coming to a better understanding of bodyUsed. My #452 (comment) combined with tyoshino's exhaustive analysis in #452 (comment) seem to give something that would be useful for developers. However there are still questions, including apparently security concerns from horo-t.

(Not @-mentioning tyoshino or horo-t to avoid spamming their inboxes.)

@horo-t
Copy link
Member Author

horo-t commented Jan 22, 2015

If the response is from the cache, when response.body attribute is accessed, the browser starts seeking the disk.
I thought, this is curious. So I thought that the body attribute should be a method.

But I'm not familiar with the common sense of JavaScript.
If this behaviour is acceptable, the body attribute is OK for me.

@domenic
Copy link
Contributor

domenic commented Jan 22, 2015

I agree it is a border-line case. But I think it's OK since it's an implementation-specific disk seek, the result can be returned synchronously, and there's no developer-observable side effects.

@tyoshino
Copy link
Contributor

Ideally, we should be able to instruct whether the cache should start sending data immediately or not on cache.match() and fetch() call.


Anyway, I basically prefer the .body() (sync) "method" idea by Horo.

  1. .body() returns a Stream instance synchronously
  2. The initial call of .body() tells the implementation that the script is interested in reading the data through the ReadableStream interface. The details of what the implementation does is left undefined in the spec.
  3. Subsequent .body() calls also return the same Stream instance
  4. What's allowed after .body() call -> To be discussed at Backpressure on fetch integrated with Streams #452 (comment)

@horo-t
Copy link
Member Author

horo-t commented Jan 22, 2015

I discussed with @tyoshino, and I changed my mind. I think .body should be a method.

In my understanding, there are five players.
[1] The network server.
[2] The browser process who handles network requests and cache APIs.
[3] ServiceWorker in JavaScript world.
[4] The renderer process for page.
[5] Cache data storage.

  1. When the ServiceWorker calls fetch() and gets the Response object, Response has HTTP headers information and the pointer to the data flow ([1] -> [2]).
  2. When the ServiceWorker calls FetchEvent.respondWith(response), the ServiceWorker asks the browser to forward the flow ([1]->[2]) to the page [4].
  3. When the ServiceWorker calls cache.put(response), the ServiceWorker asks the browser to forward the flow ([1]->[2]) to the cache [5].
  4. When the ServiceWorker wants to read the body of the response, the ServiceWorker needs to ask the browser to forward the flow ([1]->[2]) to the ServiceWorker [3]. (I think the new flow ([2]->[3]) is what the ReadableStream represents.) So getting the ReadableStream is an action. So .body() should be a method.
  5. If the ServiceWorker reads some amount of the body from the ReadableStream, and call FetchEvent.respondWith(response), the remaining body data will go through this flow ([1]->[2]->[3]->[2]->[4])
  6. If the ServiceWorker reads some amount of the body from the ReadableStream, and call cache.put(response), the remaining body data will go through this flow ([1]->[2]->[3]->[2]->[5])

@tyoshino
Copy link
Contributor

Re: @annevk #606 (comment)

Domenic's summary at whatwg/streams#269 (comment) explains it. We want to be able to have the browser refrain from populating the ReadableStream until we say "Go" for that.

Once a ReadableStream instance is created, it runs start() of the underlying source https://streams.spec.whatwg.org/#underlying-source, and if it finishes, starts calling pull() of the underlying source. This starts retrieving data from the browser process and buffering in the renderer process.

This "buffering" is what Horo wanted to avoid. This happens when a ReadableStream is created and start() of its underlying source is finished. Not when .read() is called.

.read() is a synchronous method. So, the renderer must be able to return some data synchronously when .read() is called. So, .read() cannot work as the trigger of the data retrieval which is asynchronous. Only the followings can work as a trigger

  • construction of the ReadableStream
  • completion of start()

@tyoshino
Copy link
Contributor

(Cont'd) As you said, if the ReadableStream API had some interface to tell it explicitly to start, the issue can be addressed on Streams API spec side. That's the plan (2) of Domenic's post.

@annevk
Copy link
Member

annevk commented Jan 22, 2015

Okay, so it sounds like we should wait for the design of Streams to stabilize before we do anything here.

@domenic
Copy link
Contributor

domenic commented Jan 22, 2015

I don't think the design of streams will change that much here. Just like promises, creating the stream implies starting the underlying operation. We could try to re-design streams to no longer be a promise-like primitive representing flowing data, but instead be abstract "handles". But I think the latter is best represented by objects like Request.

I think at this point we are just debating whether body should be a method or a getter. Again I think it is fine to be a getter since from a developer point of view it doesn't have any side effects, and it is implementation-dependent whether it will have any side effects anyway. That is:

We want to be able to have the browser refrain from populating the ReadableStream until we say "Go" for that.

That is an implementation perspective. From a developer perspective, it is equally valid to look at the public API and say "req.body was already being populated!" They can't tell the difference.

@horo-t
Copy link
Member Author

horo-t commented Jan 22, 2015

In my understanding, a Response object is a handle. With this Response object the developer in the JavaScript world can decide whether to pass the response to the page or to the cache storage.
If the developer in the JavaScript world wants to read the body data of the Response, the developer have to say "Hey browser, please pass it to the JavaScript world! I want to read it!".
I think this is an operation, it should be a method.

I think it is same as Blob.
To read the Blob data, the developer needs to create a FileReader.

@wanderview
Copy link
Member

For whatever its worth, I'm fine with either a getter or method at this point.

The only real hazard I see with the getter is that you could accidentally start the stream reading process by iterating the Request/Response properties with toJSON() or something.

@domenic
Copy link
Contributor

domenic commented Jan 22, 2015

The only real hazard I see with the getter is that you could accidentally start the stream reading process by iterating the Request/Response properties with toJSON() or something.

Note that this won't have any script-observable consequences though; it will just cause an IPC.

I think part of the issue is that I am used to thinking about things from a self-hosting perspective. In that case there is no "Hey browser, please pass it to the JavaScript world! I want to read it!"---it's all JavaScript world already. I tend to think of developer perceptions of APIs from this perspective too. Which leads to stuff like my comment whatwg/streams#269 (comment)

@annevk
Copy link
Member

annevk commented Jan 22, 2015

Note that this won't have any script-observable consequences though; it will just cause an IPC.

If that is really the case we should use a getter (which always returns the same object). I thought that the suggestion was that getting the object should set some flag, but if that's not the case, I agree it can be a getter.

@wanderview
Copy link
Member

Note that this won't have any script-observable consequences though; it will just cause an IPC.

Right. I think what @horo-t is saying, though, is if we require script to be more explicit about its intent then we can make the UA implementation better. Performance is a feature.

Anyway, like I said, I'm ok with a getter. Just trying to look at both sides.

@horo-t
Copy link
Member Author

horo-t commented Jan 22, 2015

(Cont'd) As you said, if the ReadableStream API had some interface to tell it explicitly to start, the issue can be addressed on Streams API spec side. That's the plan (2) of Domenic's post.

Yes. If the ReadableStream API had some interface to tell it explicitly to start, .body getter is OK for me.

Note that this won't have any script-observable consequences though; it will just cause an IPC.

Yes.
If the response is from fetch() API, it will be an IPC for a data move from the network buffer in the kernel layer to the buffer in the JavaScipt world.
If the response is from Cache API, it will be an IPC for a data copy from the HDD to the buffer in the JavaScipt world.

I think I will be surprised if I hear the sound from the HDD when I just touched .body attribute.
But if it is not surprising for the developer who are using JavaScript, .body getter is OK for me.

@tyoshino
Copy link
Contributor

I'd categorize side-effects into the following three groups:

  1. Visible to the JS code
  2. Invisible to the JS code
    1. (though it depends on platform) Causes something big (IPC, HDD access, tell the socket to receive more data, etc.)
    2. Doesn't causes anything big

I and Horo (Ben also but slightly?) think that it's user friendly that both (1) and (2)-i are given the form of method.

On platforms where .body is not (2)-i but (2)-ii, it's strange if .body is implemented as a method. #606 (comment)

As it's just strange, not harmful, I think we should rather cover the demand of platforms where .body is (2)-i by making it a method.

Now it looks we are appealing one's perception to each other. Let's finish it and find some landing point...

Even if we add start() to the Streams API (whatwg/streams#269) start() is (2)-i on some platforms and (2)-ii on some platforms.

@domenic
Copy link
Contributor

domenic commented Jan 23, 2015

My perspective is that web APIs should not be designed according to platform-dependent concerns, and should instead be designed from the perspective of a JS programmer.

Now it looks we are appealing one's perception to each other. Let's finish it and find some landing point...

Agreed :). I don't know if anyone feels very strongly here. Maybe we should force the spec editor to make an executive decision for us?

@tyoshino
Copy link
Contributor

Sorry. I'd like to correct the last paragraph in my comment #606 (comment)

.start() does make a change on the ReadableStream which is visible to JS. Before .start(), the ReadableStream never becomes "readable", but after .start(), it will.

@tyoshino
Copy link
Contributor

Domenic,

Agreed :). I don't know if anyone feels very strongly here. Maybe we should force the spec editor to make an executive decision for us?

I'm fine with the plan.

@annevk
Copy link
Member

annevk commented Jan 23, 2015

If 2i is indeed platform dependent and we can imagine solutions that do not have this issue (and I think we can, e.g. if the whole stack was JavaScript), we should go with a getter.

And to be clear, there are some drawbacks against the method solution that I'm not sure have been mentioned:

  • A method typically returns a fresh object, whereas this would keep returning the same object. (x.y() === x.y().)
  • All methods on Request and Response that have to do with the body return a fresh promise.

@wanderview
Copy link
Member

@tyoshino @horo-t

Can you initialize your stream to a dormant "waiting" state and then only start the IPC when someone first tries to observe that state? So checking the state, adding a state event listener, trying to read, etc.

If possible, this would avoid the IPC being triggered in the getter.

@horo-t
Copy link
Member Author

horo-t commented Jan 26, 2015

Both readableStream.state and readableStream.ready are getter.
So when we call console.log(response); or move the mouse over the response in DevTools window, IPC and HDD access (for chaced response) are triggered.

@domenic
Copy link
Contributor

domenic commented Jan 26, 2015

Getters are not triggered by console.log or by devtools.

@tyoshino
Copy link
Contributor

Let's clarify the term "getter".

Horo said that, on Blink, if it's implemented as response.body not response.body(), it's evaluated by the DevTools and leads to triggering the side-effects.

Domenic said that if it's "getter" (maybe in the ES6 terminology, when the property is an accessor property), even if it's available for access in the form of response.body, it shouldn't be evaluated by the DevTools. If it does, it's a bug of the inspecting tool. Right?

Should the DevTools stop evaluation when [[Get]] (P, Receiver) algorithm reaches Otherwise, IsAccessorDescriptor(desc) must be true so ... step?

@tyoshino
Copy link
Contributor

Re: Anne (#606 (comment))

A method typically returns a fresh object, whereas this would keep returning the same object. (x.y() === x.y().)

If it's against the common sense of JS, we could add a separate method e.g. .fillStream() to Response which tells the underlying source of the ReadableStream returned at .body to complete start() to start invoking pull() to retrieve data from the far place (browser process in Chromium's case).

Though it requires two steps for stream users ...

@domenic
Copy link
Contributor

domenic commented Jan 27, 2015

DevTools generally does not evaluate getters, in Chrome at least. Try this in your console:

var x = { get y() { console.log('foo'); } };
console.log(x);

As you can see "foo" is not logged.

@tyoshino
Copy link
Contributor

Thanks for the example, Domenic. In Blink's V8 binding, we implement "getter"/"setter" for attributes in C++. It's exposed as a data property but when it's requested to return a value, asks the C++ impl to build the value. We've been calling them "getter". Sorry for confusing.

Right. Properties seen as accessor properties from V8's point of view are not printed by DevTools. This is just a problem with Blink V8 binding.

Now the issue posed by Horo is no longer spec issue. We can close this. It still looks slightly better to me if .body is used in the form of .body(). But it doesn't really matter.

@horo-t
Copy link
Member Author

horo-t commented Jan 27, 2015

Sorry, I didn't understand well about "getter".

@horo-t horo-t closed this as completed Jan 27, 2015
@tyoshino
Copy link
Contributor

@domenic, @horo-t: I talked to a guy who's working on Blink V8 binding. He said he understands the issue but didn't simply agree the idea that an accessor property shouldn't be evaluated automatically in DevTools. See also https://code.google.com/p/chromium/issues/detail?id=450623 where our DevTools team is considering evaluating and printing accessor properties automatically.

@tyoshino
Copy link
Contributor

Checked Firefox's Web Console. It doesn't evaluate an accessor.

var x = { get y() { console.log('foo'); } };
x
Object { y: Getter }

@annevk
Copy link
Member

annevk commented Jan 28, 2015

That using developer tools is a performance hit is already known. Why is it a concern?

@tyoshino
Copy link
Contributor

I just wanted to say that Domenic's #606 (comment) may become false on Chromium.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants