Should createAnchor be an API on XRSession rather than XRFrame? #37
I deliberately moved it from XRSession to XRFrame to make it clearer to the application that it's using data that is potentially only valid during this specific frame to create an anchor. For example, if the application decided that it wants to put an anchor 3m above local space because it computed that using some state relevant only within an XRFrame (since that's when it can obtain poses from XR), we will create the anchor 3m above the local space, but by the time of anchor creation, the computation the application used to come up with the desired anchor location may no longer be valid. I'm not sure how reasonable the above concern is (or whether putting the method on XRFrame actually sends this signal), so I think we might be fine with exposing this method on an XRSession instead. Note that we also have XRHitTestResult that exposes |
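A minimal sketch of the scenario described above, for illustration only (the variable names, the 3 m offset, and the use of `anchorSpace` are assumptions; the exact `createAnchor` signature is the one being debated in this issue):

```js
// Inside a requestAnimationFrame callback.
// The app decides it wants an anchor 3m above the local space origin,
// based on something it computed from this frame's data.
let desiredPose = new XRRigidTransform({ x: 0, y: 3, z: 0 });
// By the time the UA actually services the request, the frame data that
// informed this decision may already be stale.
frame.createAnchor(desiredPose, localSpace).then((anchor) => {
  // Track anchor.anchorSpace in subsequent frames.
});
```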
For clarification, w.r.t. the specification: are there other specifications that declare an API on a different partial interface than the one where the actual API will be implemented? I am not sure. Maybe @toji or Manish would know it better. |
Yes, the entire flow that you describe here is how I imagine it will work. The point I'm trying to make is that the position and the orientation of the anchor may depend in some way on something that the application computed within a rAF callback - the result of such a computation becomes outdated the moment the rAF callback completes, but we will still use it for anchor creation & assume it's supposed to be interpreted relative to the space that was passed to the call. It actually makes me wonder if the example I'm giving isn't actually arguing in favor of moving the method to XRSession, exactly as you are proposing. @toji, @thetuvix, @Manishearth - do you have any opinions here?
I don't think I understand - if we think the method should be implemented in a different partial interface, we should just move it there in the spec and update the algorithms to account for the change. Is this what you meant? |
This is my understanding: |
I think the
No, in reality it's a method on |
I'm now on the fence about this ("frame must be involved in the API") - on one hand, in the example I wrote about above it may be important to make it clear to the app that if it computed some data derived from things valid only during an |
Okay.
I think I might have been confused by this example: line 328 @bialpio sorry for the confusion. I do agree that anchor creation needs some kind of snapshot data in a rAF, even though anchors' lifetimes/relevance don't match other frame-specific data. Anchors in theory will have their lifetime match that of a session. But I don't have a strong opinion. |
Aren't they? We need frames to get pose data for spaces. |
The way it's currently specified, we only need to grab the space's native origin and treat the application-provided rigid transform as expressed relative to that - the current pose of the space is not relevant to the algorithm. In fact, the pose may already be outdated the moment JS gets its hands on it (in Chrome, the device process will not wait for the renderer/JS app to finish with its rAF callback - the device-side state may be updated while the rAF is still executing). Speaking of anchor creation algorithm, it should actually be described that the rigid transform passed in to it is expressed relative to effective origin (opened issue #39), but that is still not dependent on XRFrame - origin-offset of a space is immutable once the space is created. |
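A small illustration of the point above, under the currently drafted algorithm (names are illustrative): only the identity of the space matters, not any pose queried during the frame.

```js
// Only localSpace's native origin (plus its immutable origin-offset) is used
// by the creation algorithm - no frame-time pose is consulted.
let pose = new XRRigidTransform({ x: 0, y: 2, z: 0 });
frame.createAnchor(pose, localSpace).then((anchor) => {
  // The pose is interpreted relative to localSpace's effective origin at the
  // time the request is actually serviced, which may be a frame or two later.
});
```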
Ah, makes sense! |
In reading through this I realized that the anchor's API may not work the way I had been expecting in my head, and clarifying that would help me decide whether or not a frame needs to be involved. Let's say I created one of these transform-based anchors relative to an
|
@toji only 1 makes sense to me. The spaces were there to provide a convenient frame of reference to specify the offset in. And I agree it needs to be on the XRFrame.
That does not make sense to me. Yes, it's outdated when we get it (the same will be true of any platform/implementation that doesn't offer time-travel 😉), BUT knowing exactly which data was used (i.e., the frame) for the spaces is required to allow the platform to do the best job of getting it right. For example: internally, you know the set of transforms for the frame, so you know what the final anchor position is relative to device coordinates for that frame, and when the anchor is created, you also should be able to estimate (or know) the relationship between that frame's coordinates and the current device coordinates. That will allow the implementation to create the anchor in the correct coordinates. Now, some implementations are free to do "the dumb thing" and just use the current values of the spaces, etc., and ignore any internal changes that might have happened to device coordinates, but that's their choice. On ARKit, as we move from one frame to the next, the device coordinates may change, and thus the values of the various spaces relative to the world, each other, and any anchors. But knowing what the values were on the frame will let us "do the right thing" with the data when we call the underlying "create anchor" command, assuming we keep the frame data around that was used by the app when it created the anchor. |
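A rough sketch of the "smarter thing" described above, from the UA's side. Everything here is hypothetical bookkeeping for illustration; none of these helpers exist in WebXR or in any particular platform API.

```js
// Hypothetical UA-internal pseudocode.
function resolveCreateAnchor(space, requestedPose, frameId) {
  // Anchor pose in the requesting frame's device coordinates:
  let anchorInFrameDevice =
      multiply(frameDeviceFromSpace(space, frameId), requestedPose);
  // If the platform can relate that frame's device coordinates to the current
  // ones, re-express the anchor before creating it natively:
  let anchorInCurrentDevice =
      multiply(currentDeviceFromFrameDevice(frameId), anchorInFrameDevice);
  return createPlatformAnchor(anchorInCurrentDevice);
}
```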
I thought the initial spec of anchors could not do 2 as mentioned by @toji. My understanding is that anchors will "always" be fixed to a feature (like a plane, or a hit-test result) or arbitrary (floating, but fixed in world space). Tracking a controller was not something anchors would have to do. |
Yes, this is my understanding also. Basically, at anchor creation time, we record frame and device transform snapshots and use those to create platform anchors in world space. Of course that anchor can move a bit (at least on our platform) as understanding of the world improves, but we would not expect it to deviate a lot. |
Anchors aren't needed for this anyway. If you want something "above and to the left of a space, like a controller" just render it above and to the left of the space each frame. I don't understand what (2) means aside from that. |
That was a typo. I meant the other way. Fixed the comment. |
Thanks Blair and Ravi. That matches what I was expecting, but there were a couple of things in this thread that made me question the intent. I appreciate the clarification! And to reiterate: Since spaces move relative to one another and the environment, placing an object relative to a space needs to know the time at which you wanted to place it. |
Let's start with a clarification: the intended behavior is no. 1 after the anchor gets created. The problem is that anchor creation may be requested assuming state at t_0, but the device may have already progressed to t_n. While the creation request is in flight, things may be changing, and my approach was to spec it in a way that would take the most recent data into account (i.e. "assume the pose is relative to some effective origin and create the anchor there").
One person's "dumb thing" may be another person's treasure. :) Let's say that you want to create a free-floating anchor 2 meters above On the other hand, if the application's intended behavior is to just create the anchor using the snapshot, it could simply compute the anchor's pose w.r.t. a space that is more stable (for example, compute the desired anchor's pose in local space and use that to create the anchor) & issue anchor creation request with that pose and local space. In case local space was reset, the app could decide that anchor creation requests that were in flight when the reset happened should be ignored (because, per spec draft, we will use latest information about a space to create anchors) and create new ones.
On ARCore, the world space is not stable from frame to frame, so we cannot just re-use poses computed on old frames and assume they'll work. Additionally, I think there is no way for us to learn about the transform between device coordinates from frame_1 to frame_2, even if I keep the frame data around. ISTM that the only choice left is to use something that the device is able to track and describe poses relative to it, and this aligns with WebXR's notion of native origins perfectly. |
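A sketch of the workaround described in the comment above (within a rAF callback; names are illustrative, and the snippet assumes the app wants the anchor where a controller currently is):

```js
// Express the desired anchor pose relative to a space that is expected to
// stay stationary (local space), then request the anchor against that space.
let gripPoseInLocal = frame.getPose(inputSource.gripSpace, localSpace);
if (gripPoseInLocal) {
  frame.createAnchor(gripPoseInLocal.transform, localSpace);
}
```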
We're probably thinking similar things but talking across each other. I think we all agree that all anchor creation is done in some space. The issue really is that some spaces move in the world over time (e.g., hands, viewer, eventually a tracked object, etc.), and some are meant to be stationary but will change over time as well because the underlying system refines its knowledge (e.g., your 

When I say "create an anchor 2m above 

When I say "create an anchor 2cm above 

In the end, the time and offset differences are pretty small. As long as we're just doing things relative to things that we keep track of abstractly, like platform anchors (but not the ever-changing global coordinates), the results will be mostly correct. Which, I think, is what you are saying too. So, I expect many implementations would do "the dumb thing", since it's usually going to be correct (I didn't mean that as pejoratively as it sounds). But it would be possible to be smarter. For example, creating a platform anchor (updating it occasionally as needed) and saving the relationship between these spaces and the anchor for the frames would give you a good estimate of how things changed. But it's not clear the end result is worth that effort -- I suspect it depends on the platform. In the end, making this call explicitly related to the frame gives platforms the option of doing smarter things; decoupling it from the frame prevents it. So, I'd opt for associating it with the frame, even though some current implementations might not benefit. |
That may very well be the case. :)
I think there's still one correct choice: if the user wanted me to create an anchor at the point where the hand was, they would've computed its pose in a space that is assumed to be stationary across frames (i.e. local space, or maybe local-floor space, both modulo reset events & small adjustments that could happen) and passed that pose along with the space of choice to the request. Since they chose to say something else, we must assume that this is what they meant (customer's always right!). The API in this case is expressive enough to cover both approaches, and we need to trust the user to say what they mean. The user may need to be in an active XRFrame to do the calculations to do the Right Thing that you refer to, but the API can be exposed through an XRSession since it does not care about timings.
I'm arguing that this is not possible in general in ARCore. I can get 2 poses of the hand from the system at different times, pose_1, relative to world space space_1, and pose_2, relative to world space space_2. There is no way for me to know the relationship between space_1 and space_2, so I cannot do the smart thing and compute hand's pose pose_1 relative to space_2. I can place a native origin (platform anchor) that is supposed to be stationary relative to the world and do the offsetting relative to it, but we already should have such native origin - it's the one backing up local space! So the user can simply say "place anchor relative to local space" and we will automatically do the right thing (by doing the dumb thing!).
I didn't treat it as pejorative (and I have thick skin ;)). I'm trying to argue that it's not really our choice to ignore the internal device coordinate system changes, because I think we have no way of observing them in ARCore. With world space being unstable from frame to frame, we theoretically cannot reason about the platform anchor location changes ("it moved 0.1m to the right - does that mean world space moved, or the anchor moved, or a bit of both?"). The KIDS ("keep it dumb, stupid") principle is forced on us here, unless I'm missing something. :) TL;DR: native origins are, I think, the only things that we can reason about & we need to always assume poses are specified relative to them. Some native origins are assumed to be stationary relative to the world (local, local-floor spaces) - users can use those for more deterministic anchor creation. Others are not stationary (input source spaces) - users would have to compute the poses for anchor creation relative to some stationary native origins (this needs to happen within a frame) if they wanted to have more control over where exactly the anchor will be created. |
My closing comment on the previous message essentially said "sure, that's fine, I imagine many platforms will choose to do what you suggest" and "since some platforms might benefit from knowing which frame the data was computed from, we should really associate create anchor with a frame" ... I'm not arguing that ARCore should or shouldn't do something; just that, some other platform might benefit from the greater knowledge of knowing the frame associated with it. So, given that there isn't a significant downside to associating the method with the frame, why not do that? |
I chatted with @toji offline; given the constraints we have, it may make sense to change the signature of the anchor creation method to accept only 

Also, moving the API to be exposed on 

@raviramachandra, @toji, @Manishearth - is this acceptable?

@blairmacintyre - I just saw your message before hitting "Comment", quick response:
This is acceptable to me from an API standpoint. I haven't fully followed this entire discussion and might be less aware of the nuances of anchors so I defer to Blair, Brandon, and others on that point |
I am not, on the surface, in favor of this, but I might not be thinking about all the possible spaces. If I am using a phone and want to put an anchor 10cm behind the phone (let's say along the viewer z-axis), how do I express this concisely without using the |
That is not really a reasonable choice. The pose should be interpreted relative to the space based on the data the program is using to compute it - in other words, the data in the current frame. If (as a programmer) I react to a click on a button on a controller by saying "create an anchor at the tip of the controller" and the controller happens to be moving, I want the system to do its very best to put the anchor where the controller was when the button was pressed, not at some arbitrary position later. Arguments based on current implementation limitations of ARCore are not persuasive, TBH. We should draft a spec that aims to be correct. We can put in the spec description that different platforms may not be able to achieve this (in fact, many won't), but we shouldn't hamstring future implementations this way. |
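A sketch of the behavior described above, assuming (for illustration) that anchor creation were allowed on input-event frames - which is part of what's being debated here - and using `targetRaySpace` as a stand-in for "the tip of the controller":

```js
session.addEventListener('select', (event) => {
  let tipSpace = event.inputSource.targetRaySpace;  // stand-in for the controller tip
  // event.frame pins the request to the moment the button was pressed,
  // if the API allows anchor creation on such frames.
  event.frame.createAnchor(new XRRigidTransform(), tipSpace)
      .then((anchor) => { /* keep track of the anchor */ });
});
```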
I agree with this fully, but in my (admittedly brief) time researching the various AR APIs out there it appeared to me none of them actually offered any way of controlling the timing of the anchor creation, and certainly none allowed you to specify that the anchor should be created against a past world state (which a strictly frame-based API would require for at least some browser architectures).

In principle I feel like specifying the anchor in terms of a specific 

(I would love to be wrong about this, so if anyone who knows the APIs better can inform us otherwise, I'd love to hear it!)
The idea would be that you'd query the 

On the other hand, if we figure out how to have a stronger creation timing guarantee then it's a moot point and there's not much reason to restrict the spaces that can be used. |
Something like:

```js
let viewerSpace = ...;
let localSpace = ...;
// in rAFcb:
let offsetViewerSpace = viewerSpace.getOffsetReferenceSpace(
    new XRRigidTransform(
        new DOMPoint(0, 0, 0.1)));  // let the UA do the math!
let offsetPoseInLocalSpace = xrFrame.getPose(offsetViewerSpace, localSpace);
xrFrame.createAnchor(offsetPoseInLocalSpace.transform, localSpace);
// ... or XRSession ...
```
I think the only way we could achieve that is if the API was not promise-based and lived in a single-threaded, single-process application, or maybe if we were to pause processing on the device side while the rAFcb is running - both of which are not realistic. To me the API is, by its nature, speaking about something that will happen in the future (& it's a future in which the device may already be living), so we need to find a way to talk about things that will happen in the future. The current draft proposes one way to do so, and given the assumption that the platform is doing the right thing regarding native origins that should be stationary, I think the proposed approach is workable.
If it turns out that we can do better given a frame once some new devices are available, it will be easy to add an anchor creation method on an XRFrame & spec it in a way that takes the frame into account. To me, it sounds like it may not be possible w/o some absolute frame of reference for the device to use - ARCore, ARKit, HoloLens and Magic Leap are, I think, all in the same boat here (*). But let's say that there's now "AR with base stations", which knows with a high degree of accuracy what internal device coordinate changes have happened & can account for them - having it expose an API on XRFrame may be beneficial. But then, if the device is 99.9% certain where it is located, it can expose truly stationary native origins and the current wording of the spec Just Works for it... (*) I'm making some assumptions here - @Yonet, @thetuvix, @raviramachandra, @grorg & others - please keep me honest! |
@toji - I do not think I understand the "laggy" part, can you explain what you meant? |
@bialpio these are my thoughts:

Should 

How I originally thought Anchors would work on MLOne device?

Should the API be changed only to take |
When calling
When you create an anchor that way, the
The reason that the 

In the example above, the app did the following:

```js
var created = await frame.createAnchor(
    new XRRigidTransform({ x: 0, y: 2, z: 0 }), localFloorSpace);
```

This should have precisely the same behavior as if the app wrote this instead:

```js
var created = false;
var localSpacePose = frame.getPose(localFloorSpace, localSpace);
if (localSpacePose) {
  var transform = localSpacePose.transform;
  created = await frame.createAnchor(
      new XRRigidTransform({ x: transform.position.x,
                             y: transform.position.y + 2,
                             z: transform.position.z }),
      localSpace);
}
```

This would work for either the current frame or a historical input event frame.
I don't follow here. Let's say the app is intending to place an anchor right at the user's hand, where the hand was when the user pressed the trigger. If the app takes the historical
I'm not sure I follow here - your UA must already solve this problem to correctly implement the 

Here is the definition of |
Basically, 

To correctly implement the WebXR spec, I have thus far expected that an ARCore-backed UA would:
With that
For HoloLens, we will implement 

As a side note, the equivalent method in our previous native API, |
Thanks @thetuvix, you've confirmed what I was trying to get at. Appreciate it. |
No arguments here.
I think it introduces IPC concerns.
What I'm arguing is that it should be a different behavior (& then we could expose the API on 

Additionally, do we want this snippet to work w/o logging an error?

```js
// In RAF1:
let anchorPromise = frame.createAnchor(desiredPose, localFloorSpace);

// Anchor creation happens, & then, in the first RAF after the anchor was created:
let anchorPose = frame.getPose(anchorSpace, localFloorSpace);
if (anchorPose.transform !== desiredPose) {
  // anchorPose should be equal to desiredPose,
  // at least in the first frame after the anchor was created, right?
  console.error("whaaat?");
}
```
I think it would not (inactive frames don't play) - the only way we have of relating the past to the present or the future is expressing things relative to native origins, and if we express something as relative to a native origin, by definition it always stays valid even when the pose of the native origin changes. Do you think you (or anyone else) could come up with an algorithm for anchor creation that would describe the steps that need to happen when an anchor gets created on a frame? The currently proposed algorithm works on
They cannot take the historical frame, because anchor creation (as drafted now) does not work on inactive frames, same as 

With my proposed change to the API (move it to XRSession, keep the algorithm the way it is, optionally limit the args to only accept stationary XRReferenceSpaces), they'd have to choose:

```js
// Option 1 (within or outside of rAFcb):
session.createAnchor(identityTransform, gripSpace);
// Will use the state of grip space as of the "future now" - the space is not
// really stationary, so the result will not be great.

// Option 2 (only within rAFcb):
let desiredPose = frame.getPose(gripSpace, localSpace).transform;
// ... or "unbounded", or "local-floor", or another space that will be mostly stationary.
session.createAnchor(desiredPose, localSpace);
// Will use the state of local space as of the "future now" - since the space
// should be stationary w.r.t. the real world, the result is probably what the app wants.
```
What I'm trying & failing to say here is that changes to
No arguments about how implementation of local and unbounded should behave on ARCore.
That's a really interesting approach! To me, WebXR has a slimmed-down version of time-indexing - an inactive frame is useless for all practical purposes, so the only valid time index is "now". Even with that, the anchor creation happens in the "future", and there is no good way for me to translate from "now" to "future" (in general, there's no way to observe changes to spaces, so no way to adjust app-provided poses to account for the changes). Does this all make sense, or am I missing something completely here? Alternative way to look at the problem: |
/agenda to make sure we discuss it in the CG in case we cannot come up with good solution prior to the call. |
IIUC, it seems that OpenXR is introducing some implicit space in the anchor creation extension ("If space cannot be located relative to the environment at the moment of the call to xrCreateSpatialAnchorMSFT, the runtime (...)."), so it looks like the API is actually taking 2 spaces into account - the one passed in by the user, and some implicit space of the environment, and that enables the system to relate the app-provided space (& pose relative to it) at 2 different timestamps. |
I believe this may be the core of our disconnect here. Per the
The design of the core WebXR spec is carefully balanced in this way to serve the needs of both phone/tablet-style and headset-style XR devices. Whether an
As discussed above, UAs for phones/tablets that only ever serve up
As you point out, "option 1" above is what the developer would most naturally want to type, but it has a subtle gotcha that would manifest on VR/AR headsets in mispositioning of the user's placed anchor by 20-100ms worth of hand motion. If instead we keep the current design, the primary gotcha you pointed out is that an app might miss a floor adjustment that occurs in the next frame or two (however long the IPC takes). However, the app is already signing up to permanently miss all future floor adjustments when they chose to create an independent anchor - any included or excluded adjustment there will be in the noise relative to missing up to 1/10 of a second of hand motion. However, as discussed below, I believe that the difference between static and dynamic spaces lets us still get the best of all worlds here.
That's true! No matter how clever we are in a WebXR process, any reference space we are relying on could theoretically adjust out from underneath us. However, this is true for all reference spaces including 

I generally reason about spaces as being in two categories: |
Even in OpenXR where you time-index all pose requests, what you are choosing is the target time for the prediction/interpolation: the moment in the future or past you intend to reason about if the space is dynamic, tracking a physical object that actually moves around in the real world. You are explicitly not asking for the prediction the system had for a static object at that moment in the past - in fact, in OpenXR if you make the same 

OpenXR runtimes are not expected to keep any history of less-informed beliefs about where static objects are - they are only expected to keep a history, to interpolate into, of where dynamic continuously-moving objects were (or are predicted to be) for some short time window. Given the careful restrictions on when WebXR 

A key principle we can extract given the above is that: For async APIs that span frames, it is fine for UAs to absorb late-breaking platform adjustments to both static and dynamic spaces as device understanding improves during the operation, so long as the target time is maintained for dynamic spaces. This continues to maintain the invariant for WebXR apps that pose data is snapshotted for the duration of a rAF, while unblocking anchor creation and other latent pose-based requests to improve their accuracy during IPC.

Building on that principle, I believe that the current API shape is the right balance that accommodates phones, tablets and headsets as we've all intended here:
Note that I am excluding major adjustments of reference spaces here (e.g. when the user on a VR headset recalibrates where |
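One possible reading of the principle stated above, sketched as UA-side pseudocode (all helper names here are hypothetical, for illustration only):

```js
// Hypothetical UA-internal handling of a createAnchor() request that was
// issued against some space during an earlier frame.
function serviceAnchorRequest(request) {
  let basePose = request.space.isDynamic
      // Dynamic space (e.g. grip): honor the originating frame's target time,
      // so the anchor lands where the hand *was* when the app asked.
      ? nativePoseAt(request.space, request.targetTime)
      // Static space (e.g. local-floor): absorb any late-breaking adjustments.
      : latestNativePose(request.space);
  return createPlatformAnchor(multiplyTransforms(basePose, request.pose));
}
```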
"cannot be located relative to the environment" here represents whether tracking has been lost or not. Any 6DoF tracking system implicitly involves two spaces: ground truth and the platform's best estimate of that ground truth. If positional tracking was just lost, the platform's virtual space "cannot be located relative to the environment at the moment of the call" and so anchor creation fails. |
Thanks, this definitely helps and points out the wrong assumption I had! I do have a few follow-up questions.
I'm worried here that as implementers / spec authors, we'd be the ones having to choose a reference space that gets elevated to serve as "aether". If we have a session that uses both
I've been treating native origins as proxies for the ground truth, since those to me are the only things the system can communicate about - isn't OpenXR's approach here similar to silently picking a space to act as an aether for the user? From the system's perspective, there's no such thing as "ground truth", there's only its best available approximation of it (in the form of various native origins), so saying that something is relative to the environment to me means that it's relative to some space (backed by some native origin), but w/o specifying which one. So far, the possible approaches that I could see us try to pursue:
Let me digest it a bit more, great discussion! If there's anything I might be missing or misunderstanding here, please let me know! |
I had a quick chat with Alex last Friday to ensure I'm not missing anything here - after weighing the options, it seems that option 4 is the one we should attempt to proceed with. This gives the UAs freedom to provide the best experience that they are able to (while also avoiding potential gotchas & footguns of the other proposals). It does mean that the algorithm becomes a bit hand-wavy, though. Please take a look at the PR that I have issued. The hope is that the non-normative text sufficiently expresses the intent of the API - it would allow us to clarify the meaning of the API w/o diving too deeply into how the sausage is made. |
Closing the issue now since it seems it was resolved in a way that works for everyone involved - please file a new issue if there's still something unclear. Thanks for being patient with me! |
From the spec:
```webidl
partial interface XRFrame {
  Promise<XRAnchor> createAnchor(XRRigidTransform pose, XRSpace space);
};
```
Should it not be XRSession ??
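For illustration, a minimal sketch of the two shapes under discussion (assuming the rest of the anchors module stays the same and only the interface hosting the method changes):

```js
// As currently drafted - the method lives on the frame:
frame.createAnchor(pose, space).then((anchor) => { /* ... */ });

// The alternative raised in this issue - the method lives on the session:
session.createAnchor(pose, space).then((anchor) => { /* ... */ });
```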