Add Coordinate systems and reference frames to WebVR #149
Conversation
Currently, the WebVR standard assumes that at any given time there is a single world space frame of reference (which can be reset by resetPose). While this is a simple model that corresponds to how developers are used to thinking about things, it does not map 1:1 to how inside-out trackers see the world. Specifically, as you move away from the origin, an inside-out tracker has less information available to accurately locate where that original "world origin" is in relation to the HMD's current position. For that reason, placement of objects can start showing precision issues and drift.

To solve this, we need to reason about multiple frames of reference in some way, so that apps can more accurately render their experiences as users move away from the original origin. This pull request represents a path to explicitly representing the "squishy" nature of tracking technologies to developers. In this proposal there is no special "blessed" world space frame of reference. The developer can create any number of frames of reference of various kinds, and explicitly decides which one to use for the experience they are building. They then supply this frame of reference whenever querying for transform data (such as getFrameData). In the near future we expect this to be extended to include other types, such as anchors to specific locations in the world and surface reconstruction meshes.

One benefit of this system is that it maps very directly to inside-out tracking algorithms, which reduces the risk that we over-simplify the mental model for developers. It also means we can support truly large-scale applications where the user moves far from their original position (e.g. Pokemon Go in MR). In addition, it means that things like the stage can be naturally expressed as yet another frame of reference. A risk is that we could be introducing some relatively mind-bending concepts in core areas of the API.

This pull request is intended as a starting point for this conversation, and we're looking forward to feedback from consumers of the API, other implementers, and HMD makers.
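For illustration, here is a minimal sketch of the render loop this description implies, assuming a hypothetical `createStationaryFrameOfReference()` method (one of the frame-of-reference variants discussed in this PR), an already-acquired `vrDisplay`, and a placeholder `drawScene()`:

```js
// Minimal sketch; createStationaryFrameOfReference() and drawScene() are
// assumptions, not final API.
const frameData = new VRFrameData();
const frameOfReference = vrDisplay.createStationaryFrameOfReference();

function onVRFrame() {
  vrDisplay.requestAnimationFrame(onVRFrame);
  // Poses are queried against an explicit frame of reference rather than an
  // implicit global origin.
  if (vrDisplay.getFrameData(frameOfReference.coordinateSystem, frameData)) {
    drawScene(frameData);
  }
  vrDisplay.submitFrame();
}
vrDisplay.requestAnimationFrame(onVRFrame);
```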
Looking really good! Left some comments, with one really big concern about gamepads. Please excuse some bikeshedding on the names. :)
<dfn>Face-locked</dfn>
Content that is not related to the user's environment. Regardless of the user changing orientation or position, the content stays at the same place in the user's field of view.

<dfn>Head-locked</dfn>
How about we just use one of these terms to prevent confusion? (Mild preference for "Head-Locked" on my part.)
I'd agree with this; is there some difference between face-locked and head-locked somewhere else? Aside from that, there may be people using both of these.
I've seen both terms used, so I wanted to be clear that they are referring to the same concept to reduce confusion. That said, I'm happy to have the definition on "head locked" rather than on "face locked" since the former is more commonly used.
If we want to reduce confusion maybe tag onto the end of the head locked description "This is sometimes referred to as 'Face-locked'". That way we establish that they're the same concept, but also that we will be using a single canonical term to describe it.
intent and often does not provide the precision necessary for high-quality VR/MR. The WebVR API provides purpose-built interfaces
to VR/MR hardware to allow developers to build compelling, comfortable VR/MR experiences.

## Terminology ## {#intro-terminology}
I feel like this section has been a long time coming. Thanks! :)
awesome work here! 👍
<dfn method for="VRDisplay">createAttachedFrameOfReference()</dfn>
Creates a new {{VRAttachedFrameOfReference}} for the {{VRDisplay}}. The returned frame of reference's {{VRCoordinateSystem}} should be supplied to {{getFrameData()}} for 3DOF experiences (such as 360 video) and as a fallback for other experiences when positional tracking is unavailable.

While the returned {{VRAttachedFrameOfReference}} is body-locked, neck-modeling may be included and, as such, {{VRFrameData}} objects filled in by calls to {{getFrameData()}} using the {{VRAttachedFrameOfReference}}.{{VRAttachedFrameOfReference/coordinateSystem}} MAY include position information.
Would also encourage adding something to the effect of "Use of a VRAttachedFrameOfReference may provide power savings on some devices relative to using a VRStationaryFrameOfReference or VRStageFrameOfReference" to give devs a little more motivation to use this when applicable rather than always defaulting to using a VRStationaryFrameOfReference and stripping positional data out.
I like this. Encouraging devs to do the right thing by making it explicitly in their best interest.
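A sketch of the fallback pattern being described, assuming `createStageFrameOfReference()` returns null when no stage is available (consistent with its nullable return type in this proposal):

```js
// Prefer a stage frame of reference; fall back to the attached
// (orientation-only, possibly neck-modeled) one when positional tracking
// is unavailable.
let frameOfReference = vrDisplay.createStageFrameOfReference();
if (!frameOfReference) {
  frameOfReference = vrDisplay.createAttachedFrameOfReference();
}
vrDisplay.getFrameData(frameOfReference.coordinateSystem, frameData);
```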
 * sitting-space experiences.
 */
void resetPose();
VRStageFrameOfReference? createStageFrameOfReference();
I know you're inheriting our previous name here, but 'stage' feels really awkward to me in this context. Not sure what a better one would be, though. "FloorLevel" seems not terrible? I'll give it some more thought.
Open to suggestions!
 * the current frame.
 */
boolean getFrameData(VRFrameData frameData);
VRAttachedFrameOfReference createAttachedFrameOfReference();
I know that this terminology comes from Win Holographic, and I'm not really opposed to it, but it seems a little odd to me that we go out of our way to define things like "Body Locked" and "3DOF experience" above, and then use an unrelated term here. Could this maybe be `create3DOFFrameOfReference`? Reads a little weird. :P I mostly want to cut down on the amount of jargon we produce.
Yeah, I hear ya. The concern I have with saying 3DOF is that there are positional elements when neck modeling is included so it's not actually 3DOF. Open to discussion!
Maybe the term "frame of reference" could also be explained in the terminology list above. I know, it does not solve the naming issue but I think it might be useful for developers (if we agree upon the name "frame of reference" and its need).
readonly attribute boolean hasPosition;
readonly attribute boolean hasOrientation;

boolean getPose(VRCoordinateSystem coordinateSystem, VRPose pose);
Okay, so here's the big issue for Gamepad that I still don't have a good answer for: a gamepad may have a pose and not be related to a `VRDisplay` at all. This covers basically anything with a gyro in it (like the PS4 or Wii controllers). They'll need a way to report values in their own CoordinateSystem, which will likely look like an Attached one, but it's really hard to say.
Like I said, I don't have a good answer for this. Suggestions welcome!
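For reference, a sketch of how the proposed `Gamepad.getPose()` would be consumed, assuming `VRPose` becomes constructible (it is not in WebVR 1.1) and with a placeholder `drawController()`:

```js
// getPose() returns false when the pose can't be expressed in the supplied
// coordinate system -- e.g. the standalone gyro-only controllers raised above.
const pose = new VRPose();  // assumed constructible, like VRFrameData
for (const gamepad of navigator.getGamepads()) {
  if (gamepad && gamepad.getPose(frameOfReference.coordinateSystem, pose)) {
    drawController(gamepad, pose);  // placeholder rendering helper
  }
}
```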
<pre class="idl">
interface VRStageFrameOfReference {
  readonly attribute VRCoordinateSystem coordinateSystem;

  readonly attribute float sizeX;
I'm not sure why we felt sizeX and sizeZ were appropriate here? I have an urge to change it to width and depth while we're mucking around anyway.
Actually, how do folks feel about defining the stage boundaries as a polygon rather than a rect?
I was actually wondering about that, especially after seeing Oculus' Guardian system. I think it would be a good idea to expose a boundary polygon (defined as 2D points at floor level), but for developers' sanity maybe also provide a simple quad that fits within that as the prescriptive play area.
Worth noting, however, that the boundary polygon could be a hell of a fingerprinting data source.
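To make the polygon idea concrete, here is a sketch of how content might test whether a floor-level point sits inside such a boundary, assuming a hypothetical polygon exposed as an array of `{x, z}` vertices (this attribute is not part of the PR, just the idea being floated above):

```js
// Standard ray-casting point-in-polygon test, adapted to the X/Z floor plane.
function insideBounds(point, polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const a = polygon[i], b = polygon[j];
    // Does the edge a->b cross a ray extending from the point along +X?
    const crosses = (a.z > point.z) !== (b.z > point.z) &&
        point.x < (b.x - a.x) * (point.z - a.z) / (b.z - a.z) + a.x;
    if (crosses) inside = !inside;
  }
  return inside;
}
```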
<dfn attribute for="VRStageParameters">sizeX</dfn>
<dfn attribute for="VRStageFrameOfReference">sizeX</dfn>
Width of the play-area bounds in meters. The bounds are defined as an axis-aligned rectangle on the floor. The center of the rectangle is at (0,0,0) in standing-space coordinates. These bounds are defined for safety purposes. Content should not require the user to move beyond these bounds; however, it is possible for the user to ignore the bounds resulting in position values outside of this rectangle.
While we're here can we add a note that this may be zero if the bounds aren't known?
<dfn attribute for="VRStageParameters">sizeZ</dfn>
<dfn attribute for="VRStageFrameOfReference">sizeZ</dfn>
Depth of the play-area bounds in meters. The bounds are defined as an axis-aligned rectangle on the floor. The center of the rectangle is at (0,0,0) in standing-space coordinates. These bounds are defined for safety purposes. Content should not require the user to move beyond these bounds; however, it is possible for the user to ignore the bounds resulting in position values outside of this rectangle.
Ditto
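A sketch of the guard this note implies, treating zero-sized bounds as "unknown" (`drawPlayAreaBounds()` is a placeholder):

```js
// Only draw a play-area indicator when the bounds are actually known.
const stage = vrDisplay.createStageFrameOfReference();
if (stage && stage.sizeX > 0 && stage.sizeZ > 0) {
  drawPlayAreaBounds(stage.sizeX, stage.sizeZ);
}
```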
// Render the right eye's view to the right half of the canvas
gl.viewport(canvas.width * 0.5, 0, canvas.width * 0.5, canvas.height);
gl.uniformMatrix4fv(projectionMatrixLocation, false, frameData.rightProjectionMatrix);
gl.uniformMatrix4fv(viewMatrixLocation, false, frameData.rightViewMatrix);
drawGeometry();
drawFunction();

// Indicate that we are ready to present the rendered frame to the VRDisplay
vrDisplay.submitFrame();
So here's a fun question, though it seems Windows Holographic must already handle this: I render my scene using N different CoordinateSystems. Maybe a couple of stationary ones for objects around my play space and an attached one for the UI, because reasons. When it comes time to call `submitFrame`, what do I feed back to the VR API for reprojection purposes?
I think it actually works out fine on most devices that I'm familiar with, because they won't actually be reporting N different poses. They'll report one internally and the various CoordinateSystems will just be transforms from that, so the one internal one will be what's used as the basis for reprojection. Do we feel like that'll hold true for every device type, though?
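A sketch of the multi-coordinate-system frame being described, with `stationaryFoR`, `attachedFoR`, and the draw helpers as placeholders; which pose the UA uses for reprojection after `submitFrame()` is exactly the open question here:

```js
// World geometry rendered against a stationary frame of reference...
vrDisplay.getFrameData(stationaryFoR.coordinateSystem, worldFrameData);
drawWorld(worldFrameData);

// ...UI rendered against an attached (body-locked) one...
vrDisplay.getFrameData(attachedFoR.coordinateSystem, uiFrameData);
drawUI(uiFrameData);

// ...and a single submit for the whole frame.
vrDisplay.submitFrame();
```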
<pre class="idl">
interface VRAttachedFrameOfReference {
  readonly attribute VRCoordinateSystem coordinateSystem;
I think we want the ability to call `reset()` on the FrameOfReferences to reorient the CoordinateSystem using the user's current orientation as the new forward vector. The complication is that sometimes it won't be resettable. For example: resetting a Vive's room-scale CoordinateSystem would be a no-op (always oriented to the room), and I've just found out that resetting a Daydream device's orientation programmatically is forbidden (always done with a user gesture on the controller), BUT if you use the same phone with a Cardboard harness, resetting is just fine. :P
At its most basic, supporting that feature would probably mean including a readonly boolean `canReset` attribute and a `void reset()` method. Maybe you could also argue that you can implicitly reset if you just call `create___FrameOfReference` again, and thus you don't need the method? Not sure how I feel about that.
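A sketch of how content would consume the `canReset`/`reset()` shape floated here (not part of this PR; `recenterButton` is a placeholder DOM element):

```js
// Only offer a recenter affordance when the device actually supports it,
// e.g. hide it on a Vive room-scale setup or a Daydream controller.
if (frameOfReference.canReset) {
  recenterButton.addEventListener('click', () => frameOfReference.reset());
} else {
  recenterButton.hidden = true;
}
```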
Yeah, I considered having a reset() function and ended up removing it because I didn't want to have two ways of accomplishing the same end result.
If we do just use create___FrameOfReference() to function as a "reset", do you have a scenario concern about the "origin" of the new FrameOfReference being the same as the previous one on Daydream devices?
I'm not concerned about multiple instances of a FoR having the same origin. That'll implicitly be the case for a room-scale FoR anyway. I guess what I'm most interested in is finding a reasonable way to communicate to the developer whether or not a certain action will reset the origin or not, so they can design the UI accordingly.
Data returned in calls to {{VRDisplay}}.{{getFrameData()}} using the reference frame's {{VRCoordinateSystem}} will be relative to that fixed orientation and may also include position data if the {{VRDisplay}} performs neck-modeling.

<pre class="idl">
interface VRAttachedFrameOfReference {
The Attached and Stationary FrameOfReference interfaces are identical at this point. Why not just have a single VRFrameOfReference that's returned by both? And then the StageFoR could inherit from it.
I'm glad to see this. I've been thinking about this very thing lately, as I've been looking at WebVR. I'm going to jump in here (even though I will say up front that I'm just coming up to speed on this) because I really want to be part of this discussion. My group at Georgia Tech has been working with AR (what the MS folks would call MR) and the web for a while (see the current version of argon.js, our AR framework, at @argonjs and http://argonjs.io on github), and we have gravitated to making frames of reference a very explicit, first-class concept that developers use to specify where things are in the world, for similar reasons to those mentioned here. I've just joined Mozilla to work on AR, and I've started contemplating whether the best route to supporting AR/MR on the web would be to extend WebVR to support AR (MR). This proposal seems like a great step toward that.

In support of these suggested changes, I will say that we've also found it really important to provide an abstraction that separates the local coordinate systems used for rendering (e.g., what is the origin of the graphics system) from the world reference frames, especially when pursuing examples that use full geospatial data (e.g., the mentioned "Pokemon GO for MR" example). We've gone "full geospatial" in @argonjs by leveraging the math libraries from @AnalyticalGraphicsInc's cesiumjs software, something that turned out to be a very powerful approach (in contrast, say, to focusing on having the programmer/developer define their own local coordinate frames for their application content).

Considering what it would mean to support full-blown geospatial coordinate frames may be something you want to consider here, if we want to start talking about adding AR/MR concepts to WebVR. I'm not suggesting using geospatial coordinates directly, but rather thinking about what it would take to let apps work with them cleanly, if they want. It may be that this should be managed at a higher layer (e.g., via a library like argon.js, which builds on cesium.js, and which could manage the WebVR frames of reference); but you should decide if that's the approach you want.

The ability to create many frames, use one as the local rendering frame, and then find out the transforms between that frame and other frames is essential. So the suggestions here are really good. Especially when we start mixing very different frames of reference for the user and for content (e.g., geospatial frames of reference, local frames of reference from inside-out or outside-in tracking, and frames of reference used by other kinds of sensing, such as computer vision tracking like PTC's Vuforia), having more explicit queries to determine if one frame is known relative to another (the "getTransformTo" on VRCoordinateSystem suggestion) will end up being a core capability to make apps work robustly.

Here's an example of an issue when doing large-scale AR/MR. We eventually decided not to have the programmer specify the coordinate frame to use as the local origin for graphics rendering, but rather to have the system pick it. We want to support experiences over a very large scale, not just rooms or buildings, but experiences that may run as the user moves through a town or city, or even travels across the country. To support this, we have the system decide when to "recenter" the local coordinates used by the system for rendering and math (i.e., the local euclidean system) at a new geospatial location, and have the programmer react when it changes.

When you start working at geospatial scale (objects on the earth, likely represented in ECEF coordinates, or even off the earth, such as near-earth satellites or other planetary bodies), you need to keep the rendering local, but still be able to represent the objects in their natural frames. I'm not sure of the best way to manage that in WebVR. If you want a VR or MR app that will work as the user travels (kids playing "Pokemon GO for MR" in the backseat of the car on a road trip, or a VR app that lets you travel around a city or country), this will be a concern. Forcing each programmer to decide when to move their local frame of reference based on things like accuracy issues (do I need to get my head pose relative to a new frame of reference now?) seemed impractical to us. Certainly, devices like Hololens will have to deal with this as they move from working at "building scale" to working at "world scale" smoothly.
An experience that utilizes knowledge of the floor plane and encourages users to walk around within specific bounds. An example of this category of experience is CAD modeling which allows a user to walk around the object being modeled.

<dfn>World-scale experience</dfn>
An experience that takes advantage of the ability to walk anywhere without bounds. In such experiences, there is no single floor plane. An example of this category of experience is turn-by-turn directions within a multistory building.
I think you may want to differentiate between "building-scale" and "world-scale". As we move beyond "a single room with a single ground plane" into "a local world that is more complex", there are issues to be dealt with. But a local 3D coordinate system is still sufficient to relate things to each other and do rendering. As you move to geospatial coordinates (true world scale), the use of a local coordinate system becomes difficult. For example, a natural coordinate system for the world is ECEF (earth-centered, earth-fixed), which has the "ground plane from the user's viewpoint" not sitting on the X/Z plane with Y up (rather, the ground plane is a tangent plane to the earth). So perhaps world-scale isn't the right term here.
I'm happy to see discussion of including base-level support for working in both geospatial and 3D computer graphics coordinate systems. I expect that there are plenty of good use cases where people would like to bring data from geospatial databases into a VR scene.
X3D resolved to use ECEF (WGS84) as the base world coordinate system, with an optional origin shift due to precision issues, while maintaining a local, rotated Y-up coordinate system to support standard navigation functionality in the browser. We also realized that scene authors need the ability to work in both coordinate systems. Perhaps WebVR could have a `VREarthFrameOfReference` with a `.getTransformTo()` method for getting the transformation matrices between the coordinate systems?
One of the things Cesium does is provide a convenient way of asking for the East-North-Up coordinate system on the surface of the WGS84 ellipsoid at a point on the surface (which is how we get a reasonable local coordinate system in @argonjs). The downside is that it's not a cheap calculation, so you don't want people asking for this on a regular basis. But yes, a `VREarthFrameOfReference` (ECEF) would be a nice extension. Especially since, once you can go between ECEF and WebVR coordinates, you can then choose to use libraries like cesium.js to do much more complex things in geospatial coordinates.
@@ -730,6 +822,15 @@ partial interface Gamepad {
<dfn attribute for="Gamepad" id="gamepad-getvrdisplays-attribute">displayId</dfn>
Return the {{VRDisplay/displayId}} of the {{VRDisplay}} this {{Gamepad}} is associated with. A {{Gamepad}} is considered to be associated with a {{VRDisplay}} if it reports a pose that is in the same space as the {{VRDisplay}} pose. If the {{Gamepad}} is not associated with a {{VRDisplay}}, this should return 0.

<dfn attribute for="Gamepad">hasPosition</dfn>
Here's a thought: Seems like the concepts of having a position/orientation/etc are something that is actually tied into the FrameOfReference? The attached coordinate system has no position, while the others do, etc. Would it be reasonable to move these capability bits onto the FrameOfReference, and then make it so that the devices capabilities are inferred based on the FramesOfReference that it supports? that would eliminate the need to awkwardly patch these values into the Gamepad, at least. Or maybe I'm overthinking it. :)
I was having similar thoughts. When reviewing getPose() for instance, there is this magic between the Gamepad and the VRCoordinateSystem of the VRFrameData...
In general, a "pose" requires both position and orientation. When one isn't available, it's pretty common for a system to use a default (identity, for example). Having a flag that says which of these is "valid" would be great.
While we're at it, I'd love to see all the pose values we report from sensors have some accuracy estimates associated with them. For example, you could use them to estimate how precise the alignment of a virtual and a physical thing is (we did this years ago for AR; it was super useful when you start getting out into real spaces, see http://www.cc.gatech.edu/projects/ael/projects/accounting.html).
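A sketch of consuming the validity flags discussed in this thread, substituting identity defaults rather than trusting whatever the pose struct happens to contain (`drawController()` is a placeholder):

```js
// Fall back to identity defaults for untracked components.
const orientation = gamepad.hasOrientation && pose.orientation
    ? pose.orientation
    : [0, 0, 0, 1];  // identity quaternion
const position = gamepad.hasPosition && pose.position
    ? pose.position
    : [0, 0, 0];     // origin of the coordinate system
drawController(orientation, position);
```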
The {{Gamepad}}.{{Gamepad/hasOrientation}} attribute MUST return whether the {{Gamepad}}'s orientation is capable of being tracked.

<dfn method for="Gamepad">getPose()</dfn>
Retrieves a {{VRPose}} in the supplied {{VRCoordinateSystem}} for the current {{VRFrameData}}. This function will return false if the Gamepad's pose cannot be expressed in the supplied {{VRCoordinateSystem}} and will return true otherwise.
Gamepads can gain value from being tracked at a faster refresh rate, or separately from the HMD, in some cases allowing for better embodied presence. Is it correct then to state that the Gamepad's pose must be related to the current VRFrameData?
Maybe this is how it has to be for WebVR to make sense, though; some normalization process has to occur, otherwise having Gamepads tracked in completely different ways from the HMD and maybe other tracked objects would be hard to reason about.
void resetPose();
VRStageFrameOfReference? createStageFrameOfReference();

boolean getFrameData(VRCoordinateSystem coordinateSystem, VRFrameData frameData);
If you moved the create methods into an option on the new VRFrameData() constructor, then each VRFrameData could carry a read-only value for its frame of reference. I recognize that a frame of reference has to carry implementation-specific data internally to make sense, and thus its object identity is important. If it didn't, we could reduce it down to an enumeration.
We'd still need a way to query what frames of reference the VRDisplay supports.
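A sketch of the constructor-option alternative being suggested here; this is purely hypothetical and not part of the PR:

```js
// The frame of reference travels with the VRFrameData instead of being
// passed on every getFrameData() call.
const frameData = new VRFrameData({
  frameOfReference: vrDisplay.createStageFrameOfReference(),
});
vrDisplay.getFrameData(frameData);        // hypothetical one-argument form
console.log(frameData.frameOfReference);  // hypothetical read-only back-reference
```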
<dfn method for="VRDisplay">createAttachedFrameOfReference()</dfn>
Creates a new {{VRAttachedFrameOfReference}} for the {{VRDisplay}}. The returned frame of reference's {{VRCoordinateSystem}} should be supplied to {{getFrameData()}} for 3DOF experiences (such as 360 video) and as a fallback for other experiences when positional tracking is unavailable.

While the returned {{VRAttachedFrameOfReference}} is body-locked, neck-modeling may be included and, as such, {{VRFrameData}} objects filled in by calls to {{getFrameData()}} using the {{VRAttachedFrameOfReference}}.{{VRAttachedFrameOfReference/coordinateSystem}} MAY include position information.
We should assume that a developer will use all possible combinations of a frame of reference with all possible types of devices, and describe the fallbacks explicitly. The line on 197 (the last bit of it) implies that it should be "supplied" as a fallback; however, that implies that the author had to do some work to choose this attachment. Shouldn't it just be that they create the attachment they want, and then we have some guarantees about how an attachment of that sort works no matter what the underlying device supports?
Thoughts that grew out of discussing this proposal with zSpace today: I'm still generally concerned about how to make this work with most gamepads, because they seem to fall into two camps:

Of course, if your device doesn't have any external frame of reference, there's only so much it can report. Basically just the deltas from its last state. Which... sounds an awful lot like the proposed `VRAttachedFrameOfReference`. So given that, does it make sense that the

Or:

This would affect where we handle pose resetting, since it would no longer be practical to do it on an object like this. In any case, any other FoR would be queried from the device somehow because they're all going to have something to do with the hardware's capabilities and how it views the world, but this one is "special" and can be used with minimal hassle anywhere. I feel like I'm maybe overlooking something here. Thoughts?
The reference frame used isn't just informational; it can also affect behavior directly. I think we should make sure that the spec covers this distinction.

For example, on an OpenVR (Vive) system, requesting seated relative poses modifies the way that the Chaperone warning system works. As long as your headset remains close enough to the seated origin, the Chaperone display is suppressed. This is useful for cases where the seated position is at the edge of the roomscale stage area, or possibly even outside it. Using standing mode in this scenario would be very annoying if it permanently shows Chaperone boundaries in your field of view. In this case, the underlying implementation should be able to request seated relative poses from OpenVR.

I think this distinction is covered if poses are specifically requested for an appropriate coordinate system, but would be difficult to implement if it's based on providing conversion matrices where inferring the intended usage scenario may be difficult.
This concept has been incorporated into the 2.0 explainer for a while now, so I'm closing this as a matter of housekeeping. |