-
Notifications
You must be signed in to change notification settings - Fork 387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getFrameData() should return a new VRFrameData object instead of data-copying. #128
Comments
The rationale for having getFrameData fill in a VRFrameData object is to promote object reuse and avoid creating garbage objects every frame that need to be GCed. |
Yeah I totally understand that concern. While some APIs might need special treatment to avoid creating excessive garbage, for example something very low level you call thousands of times per frame, I don't think VRFrameData requires this. You only create a few/one per frame Browsers must cope with small amounts of garbage It's unlikely your app will produce zero garbage Further JS engines garbage collect the code being generated by the app. This means that you often see small garbage collections collecting the optimized, or de-optimized code, scope records, closures, or various other internal data structures in your program. Browsers use garbage collectors internally too For example calling requestAnimationFrame in Chrome produces garbage for the callback, even if you pass the same function every frame. |
One thing to bear in mind is that some VR devices refresh at 90 or 120 frames per second, thus the margins are potentially much smaller than for a typical web app that updates at 60Hz. The consequences of missing a frame are also potentially more severe (reprojection can help, but not without introducing artefacts that people react differently too, in terms of comfort). Also, for mobile hardware (e.g. GearVR, Cardboard, Day Dream, Hololens, and future stuff) the issue is obviously significantly harder to deal with due to higher memory latencies, slower CPUs, etc.. I can tell you that we hunted down even single allocations (Unity/.Net native) in the steady state game loop on all the Hololens apps I was involved with! :) |
Can you provide a code example of something that doesn't allocate any garbage? Are these examples where you need to remove every allocation only built on asm.js? We are trying to reconcile the fact that a lot of the current WebVR experiments use three.js or similar (which produce garbage each frame) vs. the position here that WebVR must never have any allocations per frame. |
It's probably not very likely that all (or even most) WebVR apps will run without any steady state allocations, but there's an argument to be made that the API should avoid making the situation worse, to give the app more head room. |
I think that if all code examples and major libraries do it properly, we'll be advocating for a performant way of handling VRFrameData at the same time that we have an API that's consistent with the rest of the platform |
I agree with Sebastian on reducing memory allocations as much as possible. Recently, the WebGL working group was asked to change multiple functions the WebGL 2.0 API (bufferData, bufferSubData, texImage2D, etc) to make it easier for callers of the API to avoid creating garbage in the form of ArrayBufferViews. This feedback came from the Emscripten, people who care about performance. So when it comes to other graphics APIs in the web platform, there is an increasing bias towards reducing or eliminating memory allocation, both internally to the browser as well in the caller. Obviously, we're not going to eliminate every memory allocation under the sun but the more we can improve, the better. |
cc/ @littledan @ajklein So there are a few things here: We are arguing that this is a very small allocation and this isn't worth diverging from the rest of the platform. If you have data that shows an allocation this size is bad for apps that would be interesting to see. I think your argument is that this is worth diverging from the rest of the platform? That said what you are arguing for here is a world where (for example) any new input event (e.g. pointermove) aren't event objects but done by object populating like in this example? |
I am saying there have already been cases where creating small objects in graphics API's like WebGL have forced us to add new API entrypoints to avoid these allocations. As we move towards WebVR applications with increasing computational and GC needs of their own, I would hate for us to be in the same boat as WebGL. |
It's not clear to me that the WebGL2 API avoiding the ArrayBufferView allocations makes sense either. I'm very concerned here that there's two very different platforms being built. The modern API designs like Promises generate small amounts of garbage, for example the return value of the Promise, and the result of then(). The argument here seems to be that the platform should avoid creating any garbage at all, which is arguing we should be adding versions of event listeners that don't create garbage, stop using Promises and switch back to callbacks, and many changes in the direction the whole platform has been moving in over the past five+ years. ex. having a form of Object.keys() which takes an existing array and fills it with the string keys instead of returning a new array. I think we need to either make a principled decision that the entire platform needs to provide pre-allocated methods and then change Object.keys, stop using Promises, and so on, or we need to assume that the rest of the platform makes sense and make new APIs like WebGL2 and WebVR match the same ergonomics as all the other DOM APIs. |
Games, graphics and in particular VR have much stricter performance requirements than other scenarios. For VR in particular, framerate issues can literally make people sick. If WebGL and WebVR make minor compromises on ergonomics to improve performance while less performance sensitive APIs do the opposite, I don't really see a problem with that. In fact, I think such an outcome would be desirable. Having APIs that are tuned for their intended use cases isn't a bad thing. |
I second @esprehn's points. Allocating small, short-lived objects is generally supposed to be a "good path" for JS engines due to generational GC (though that might not always work perfectly for platform-allocated objects in all browsers). If you have any benchmarks which suffer when allocating small, short-lived objects like this, it'd be great to send them in the direction of the V8 GC team, cc @mlippautz |
I think we might be slightly talking past each other here, so just to clarify: I'm not suggesting that this one allocation will trigger some kind of pathological corner case or anything, but rather expressing concern that even the "good path" for typical web apps is actually problematic for VR due to higher frame rates (90 or even 120Hz) and higher stakes for missed frames (physical discomfort). Some back-of-the-napkin math suggests that this object, and its associated VRPose, would weigh in at about 1KB or more (depending on 32-bit vs 64-bit etc.), which means that if you only have a few megs of young generation space, this one allocation alone would cause a GC (and thus a missed frame) several times a minute. Given the special performance circumstances for VR, and since WebGL already had to go back and make changes for this same reason, I think we should be a bit more specific about what we're actually "signing up for" by making this proposed change. @littledan do you have any back-of-the-envelope numbers for what "good path" performance means for your GC for desktop, mobile, etc.? All I can find online with cursory searching is that v8 targets 5ms, which would mean potentially 1 or more missed frames even at 60Hz, and much worse for 90 or 120Hz. Is that accurate? What kind of gen 0 heap sizes do you typically use? |
We're working towards a 60fps goal, but really we're trying to get things "as good as possible" balancing several metrics (responsiveness vs memory usage being a big one). The parameters for GC are different on different platforms, depending on the memory size, etc. Is it really possible to write any sort of WebVR thing which doesn't invoke GC at all? I find it hard to believe that you wouldn't be "causing GC several times a minute". Anyway, recent changes in V8 are exploring concepts like preemptively launching parts of full (not just young generation) GC even in the absence of tons of pressure, based on idle time or just fixed timers. |
Can only repeat what @littledan already said. If you have a repro of something janky, even if it's just a prototype for a spec, let us know. I cannot comment on the spec changes other than we really suggest not using any sort of object pooling, as the runtime (compilers) will have a really hard time optimizing for it. E.g. escape analysis, write barrier elimination, allocation folding, and other optimizations will not be applied. Ever changing GC characteristics: Young generation in V8 currently grows up to 8M when the allocation rate is high. Workloads that produce lots of dead memory are cheap from a GC perspective as we only need to touch live memory (sub-ms young generation collection). We also employ idle time GC, which especially helped WebGL workloads where in some cases all GC is performed during idle times. |
@littledan I don't think it's likely that apps will have no GC, but I don't think that's an argument for making the API generate garbage (see WebGL). The less garbage we generate, the more the app can generate. The fact that WebGL already had to go back and change their APIs to reduce generated garbage is a pretty strong data point IMO. We should learn from that exercise and not introduce garbage in our core per-frame APIs without some solid reasons for why we think the lessons learned with WebGL don't apply. Especially since VR has significantly higher performance constraints than typical WebGL apps. |
FWIW, I just loaded up the first scene I could find on Sketchfab in the latest Chrome WebVR build and found on my machine:
This is using the existing API, which does not generate garbage directly in the getFrameData call, so the above is what you get from whatever ambient allocations chrome does internally, as well as what the app does. There's a few conclusion you could draw:
So you could either say "we're hosed either way, so no harm adding even more garbage", or you can say "wow, it's already pretty hard to do a good job performance-wise here even on a relatively beefy desktop PC, so we should definitely avoid adding to the problem". So not sure if that this helps us come to a conclusion either way. Sketchfab may be a particularly unoptimized scenario and thus not representative, or it may be typical, hard to say. As I've mentioned before I'm inclined to get out of the way as much as possible so that if a developer does want to heavily optimize their app to get rid of GC pauses, we won't stand in their way. |
Can you provide a trace file? I ran the regular WebGL version of the scene on my Macbook on a recent Canary yesterday. The vast majority of GCs were triggered in idle time. These GCs are triggered by the scheduler and have 0 impact on latency. The more idle time we have, the more GCs we will usually do. I also just downloaded the Windows Chrome WebVR build and ran a trace on the provided link. I do not know anything about the build but would assume that it is configured to try to run with 120Hz, even on a regular machine. I barely get to 60fps, but GC is not an issue there. I can also see the compositor trying to output a frame every 16ms, so I wonder what's actually going on there. Repeating myself: Applications producing short-living objects find themselves in the sweet spot of the GC; this will be a fast path. I recommend to not trade this sweet spot for a different approach that prevents compiler optimizations. |
@mlippautz: I'm assuming you weren't using VR hardware with the WebVR build? In the case that you're not actively presenting to a VR device the content will still run at the screen refresh rate (60Hz in most cases). In reality the performance in that mode should not be significantly different than top of tree Chrome. |
I was using an Oculus yes, and the GCs were not during idle time, but rather in the middle of the rendering code. This presumably varies a bit by machine too. EDIT: EDIT2: |
We were doing some testing to see how much of your RAF frame budget you could use and still hit frame-rate. Could you use 16ms? Probably not, since you may not have a full 16ms depending on the delay of the RAF by the event loop, but maybe 12 would work, etc... As part of this we were putting dead loops in that just timed out a sample update/render. We used performance.now(). This caused a frame hiccup every couple hundred frames due to GC with NOTHING ELSE happening because of the doubles allocated by performance.now(). Now, these doubles didn't leak (they were used within a single function, in a local variable loop, and could have potentially been optimized away entirely. But they weren't. And the fact that a couple of megs of doubles coming from a timing function which we expect people to use fairly heavily for their timings was causing a frame skip every 200 frames is pretty startling. This means the fill in the buffer model we are using is certainly helping out in some circumstances. We are trying to find an avenue to share some of this code with the general group. It does bring up the potential that it is impossible with some architectures to achieve 60fps with no frame skips, ever, given the way they work. I don't know if the best possible fix here is to ensure that we get GC updates for this issue or that we keep our APIs as friendly as possible to current GC architectures. |
I work on Chakra's GC here. My two cents is that it is always good to avoid creating garbage. We try hard to build heuristics so that we can avoid GC when script is running, and try to find idle time to do GC, although it doesn't always work because of the amount of object the script is creating. Even within the engine itself, we try to not allocate short lived objects. Generally, the more garbage is generate in the script before idle time, the more likely that we will have to GC and interrupt at an inopportune time. While GC should be able to deal with some amount of garbage, you already have a very tight budget here in this scenario between frames to do what program script needs to do. Why add more burden to the system with extra allocation and pressure on the GC? |
Here's a trace from my machine, looks like the GC cost dropped a bit (typically 1.3ms now): https://1drv.ms/u/s!AjL69IOnjhAA4Z03cS_uQZ4AbNb0Eg A lot of the GCs seem to happen at the start of a frame, before the RAFs but after the frame boundary (e.g. the first GC), and some smack in the middle of the frame (e.g. the second one). |
If it weren't for the Emscriptem folks, I would be inclined to let getFrameData allocate. However, the fact they ran into trouble and convinced the WebGL working group to make this change gives me pause. The problem is no longer theoretical in nature as far as I am concerned. As user agents, we should create as much headroom as possible for Javascript developers to be excellent at more than just demos or simple pages. I have tremendous respect for the people that build Javascript engines. However, when it comes to allocations for high usage APIs like getFrameData, I think we're better off adopting a "the fastest code is code that doesn't run" approach. |
Alright, thanks for taking the trace. Did you include the V8 category when recording the trace? If yes, then all I see is MinorGCs (=Scavenges) directly executed from tasks. We currently only schedule idle tasks for Scavenges. This means that the scheduler thought it might be a good idea to do a GC here. If you didn't enable the V8 category specifically, could you take another trace? |
I didn't enable anything in particular - I'm sorry I'm not super familiar with Chrome dev tools. I'll try to find time to get another trace at some point. E.g. the 2nd GC looks like it interrupted rendering work. |
@DigiTec: If I recall correctly we left this conversation at the F2F with the next steps of Oculus providing some traces or traceable examples to the Chrome team to better evaluate the impact of returning a new object each frame? Where are we with that? |
I'm quite confused by the GC arguments here. The one implementation of getFrameData that I've looked at (Mozilla's) doesn't allocate a new VRFrameData object, but does allocate new objects for all the properties if you touch them, so you're only avoiding generating garbage if you don't examine your returned VRFrameData at all. Now maybe that's not how this is meant to be implemented? Per #197 the spec doesn't define how getFrameData works... |
I would like to clarify if this discussion relates only to WebVR 2.0 or if anyone intends on amending WebVR 1.1. Could we ship with the existing WebIDL in 1.1 and apply any changes resulting from this discussion to WebVR 2.0? |
I agree with @kearwood that we should address this issue one way or another as part of WebVR 2.0. FWIW, Edge allocates all subobjects (matrices and VRPose) of VRFrameData when you construct it. Zero memory allocations happen inside of getFrameData. |
I'll also add that I am OK with changing getFrameData to return a new object if others feel strongly about it. Since we're likely going to have getFrameData take parameters in the future, the returned object is going to have a more temporary status moving forward. |
After a lot of swaying between positions I'm with Rafael here. Sticking closer to the web platform norms feels better at this point. If this is something that presents an unreasonable burden for anyone, I would strongly encourage you to reply with traces/logs/graphs to demonstrate the specific issue, at which point it can be addressed more concretely. |
@bzbarsky I don't think V8 has the same kind of lazy allocation infrastructure that SpiderMonkey does. |
SpiderMonkey doesn't have any interesting lazy allocation infrastructure I'm aware of, so I'm not sure what you mean.... |
This was resolved a long time ago in WebXR (We're returning an new XRDevicePose when the equivalent function is queried.) We still need to be sensitive to garbage creation, but I don't think keeping this issue open helps that cause. Closing now. |
We don’t have this pattern on the web today and I don’t think that we should introduce it.
I can’t see a good reason to have this API over one that returns a new object each time. Both versions of the API require the same amount of mem-copying from what I can see.
E.g.
A
null
return value could represent the false value in the spec today.The text was updated successfully, but these errors were encountered: