Should createAnchor be an API on XRSession rather than XRFrame? #37
I deliberately moved it from XRSession to XRFrame to make it clearer to the application that it's using data that is potentially only valid during this specific frame to create an anchor. For example, if the application decided that it wants to put an anchor 3m above local space because it computed that using some state relevant only within an XRFrame (since that's when it can obtain poses from XR), we will create the anchor 3m above the local space, but by the time of anchor creation, the computation the application used to come up with the desired anchor location may no longer be valid. I'm not sure how reasonable the above concern is (or whether putting the method on XRFrame actually sends this signal), so I think we might be fine with exposing this method on an XRSession instead. Note that we also have XRHitTestResult that exposes |
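A minimal sketch of the scenario described above, for illustration only (the variable names, the 3 m offset, and the use of `anchorSpace` are assumptions; the exact `createAnchor` signature is the one being debated in this issue):

```js
// Inside a requestAnimationFrame callback.
// The app decides it wants an anchor 3m above the local space origin,
// based on something it computed from this frame's data.
let desiredPose = new XRRigidTransform({ x: 0, y: 3, z: 0 });
// By the time the UA actually services the request, the frame data that
// informed this decision may already be stale.
frame.createAnchor(desiredPose, localSpace).then((anchor) => {
  // Track anchor.anchorSpace in subsequent frames.
});
```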
For clarification, w.r.t. the specification: are there other specifications that declare an API on a different partial interface than the one where the actual API will be implemented? I am not sure. Maybe @toji or Manish would know it better. |
Yes, the entire flow that you describe here is how I imagine it will work. The point I'm trying to make is that the position and the orientation of the anchor may depend in some way on something that the application computed within a rAF callback - the result of such a computation becomes outdated the moment the rAF callback completes, but we will still use it for anchor creation & assume it's supposed to be interpreted relative to the space that was passed to the call. It actually makes me wonder if the example I'm giving isn't actually arguing in favor of moving the method to XRSession, exactly as you are proposing. @toji, @thetuvix, @Manishearth - do you have any opinions here?
I don't think I understand - if we think the method should be implemented in a different partial interface, we should just move it there in the spec and update the algorithms to account for the change. Is this what you meant? |
This is my understanding: |
I think the
No, in reality it's a method on |
I'm now on the fence about this ("frame must be involved in the API") - on one hand, in the example I wrote about above it may be important to make it clear to the app that if it computed some data derived from things valid only during an |
Okay.
I think I might have been confused by this example: line 328 @bialpio sorry for the confusion. I do agree that anchor creation needs some kind of snapshot data in a rAF, even though anchors' lifetimes/relevance don't match other frame-specific data. Anchors in theory will have their lifetime match that of a session. But I don't have a strong opinion. |
Aren't they? We need frames to get pose data for spaces. |
The way it's currently specified, we only need to grab the space's native origin and treat the application-provided rigid transform as expressed relative to that - the current pose of the space is not relevant to the algorithm. In fact, the pose may already be outdated the moment JS gets its hands on it (in Chrome, the device process will not wait for the renderer/JS app to finish with its rAF callback - the device-side state may be updated while the rAF is still executing). Speaking of anchor creation algorithm, it should actually be described that the rigid transform passed in to it is expressed relative to effective origin (opened issue #39), but that is still not dependent on XRFrame - origin-offset of a space is immutable once the space is created. |
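A small illustration of the point above, under the currently drafted algorithm (names are illustrative): only the identity of the space matters, not any pose queried during the frame.

```js
// Only localSpace's native origin (plus its immutable origin-offset) is used
// by the creation algorithm - no frame-time pose is consulted.
let pose = new XRRigidTransform({ x: 0, y: 2, z: 0 });
frame.createAnchor(pose, localSpace).then((anchor) => {
  // The pose is interpreted relative to localSpace's effective origin at the
  // time the request is actually serviced, which may be a frame or two later.
});
```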
Ah, makes sense! |
In reading through this I realized that the anchor's API may not work the way I had been expecting in my head, and clarifying that would help me decide whether or not a frame needs to be involved. Let's say I created one of these transform-based anchors relative to an
|
@toji only 1 makes sense to me. The spaces were there to provide a convenient frame of reference to specify the offset in. And I agree it needs to be on the XRFrame.
That does not make sense to me. Yes, it's outdated when we get it (the same will be true of any platform/implementation that doesn't offer time-travel 😉), BUT knowing exactly which data was used (i.e., the frame) for the spaces is required to allow the platform to do the best job of getting it right. For example: internally, you know the set of transforms for the frame, so you know what the final anchor position is relative to device coordinates for that frame, and when the anchor is created, you also should be able to estimate (or know) the relationship between that frame's coordinates and the current device coordinates. That will allow the implementation to create the anchor in the correct coordinates. Now, some implementations are free to do "the dumb thing" and just use the current values of the spaces, etc., and ignore any internal changes that might have happened to device coordinates, but that's their choice. On ARKit, as we move from one frame to the next, the device coordinates may change, and thus the values of the various spaces relative to the world, each other, and any anchors. But knowing what the values were on the frame will let us "do the right thing" with the data when we call the underlying "create anchor" command, assuming we keep the frame data around that was used by the app when it created the anchor. |
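A rough sketch of the "smarter thing" described above, from the UA's side. Everything here is hypothetical bookkeeping for illustration; none of these helpers exist in WebXR or in any particular platform API.

```js
// Hypothetical UA-internal pseudocode.
function resolveCreateAnchor(space, requestedPose, frameId) {
  // Anchor pose in the requesting frame's device coordinates:
  let anchorInFrameDevice =
      multiply(frameDeviceFromSpace(space, frameId), requestedPose);
  // If the platform can relate that frame's device coordinates to the current
  // ones, re-express the anchor before creating it natively:
  let anchorInCurrentDevice =
      multiply(currentDeviceFromFrameDevice(frameId), anchorInFrameDevice);
  return createPlatformAnchor(anchorInCurrentDevice);
}
```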
I thought the initial spec of anchors could not do 2 as mentioned by @toji. My understanding is that anchors will "always" be fixed to a feature (like a plane, or a hit-test result) or arbitrary (floating, but fixed in world space). Tracking a controller was not something anchors would have to do. |
Yes, this is my understanding also. Basically, at anchor creation time, we record frame and device transform snapshots and use those to create platform anchors in world space. Of course that anchor can move a bit (at least on our platform) as understanding of the world improves, but we would not expect it to deviate a lot. |
Anchors aren't needed for this anyway. If you want something "above and to the left of a space, like a controller" just render it above and to the left of the space each frame. I don't understand what (2) means aside from that. |
That was a typo. I meant the other way. Fixed the comment. |
Thanks Blair and Ravi. That matches what I was expecting, but there were a couple of things in this thread that made me question the intent. I appreciate the clarification! And to reiterate: Since spaces move relative to one another and the environment, placing an object relative to a space needs to know the time at which you wanted to place it. |
Let's start with a clarification: the intended behavior is no. 1 after the anchor gets created. The problem is that anchor creation may be requested assuming state at t_0, but the device may have already progressed to t_n. While the creation request is in flight, things may be changing, and my approach was to spec it in a way that would take the most recent data into account (i.e. "assume the pose is relative to some effective origin and create the anchor there").
One person's "dumb thing" may be another person's treasure. :) Let's say that you want to create a free-floating anchor 2 meters above On the other hand, if the application's intended behavior is to just create the anchor using the snapshot, it could simply compute the anchor's pose w.r.t. a space that is more stable (for example, compute the desired anchor's pose in local space and use that to create the anchor) & issue anchor creation request with that pose and local space. In case local space was reset, the app could decide that anchor creation requests that were in flight when the reset happened should be ignored (because, per spec draft, we will use latest information about a space to create anchors) and create new ones.
On ARCore, the world space is not stable from frame to frame, so we cannot just re-use poses computed on old frames and assume they'll work. Additionally, I think there is no way for us to learn about the transform between device coordinates from frame_1 to frame_2, even if I keep the frame data around. ISTM that the only choice left is to use something that the device is able to track and describe poses relative to it, and this aligns with WebXR's notion of native origins perfectly. |
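A sketch of the workaround described in the comment above (within a rAF callback; names are illustrative, and the snippet assumes the app wants the anchor where a controller currently is):

```js
// Express the desired anchor pose relative to a space that is expected to
// stay stationary (local space), then request the anchor against that space.
let gripPoseInLocal = frame.getPose(inputSource.gripSpace, localSpace);
if (gripPoseInLocal) {
  frame.createAnchor(gripPoseInLocal.transform, localSpace);
}
```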
We're probably thinking similar things but talking across each other. I think we all agree that all anchor creation is done in some space. The issue really is that some spaces move in the world over time (e.g., hands, viewer, eventually a tracked object, etc.), and some are meant to be stationary but will change over time as well because the underlying system refines its knowledge (e.g., your 

When I say "create an anchor 2m above 

When I say "create an anchor 2cm above 

In the end, the time and offset differences are pretty small. As long as we're just doing things relative to things that we keep track of abstractly, like platform anchors (but not the ever-changing global coordinates), the results will be mostly correct. Which, I think, is what you are saying too. So, I expect many implementations would do "the dumb thing", since it's usually going to be correct (I didn't mean that as pejoratively as it sounds). But it would be possible to be smarter. For example, creating a platform anchor (updating it occasionally as needed) and saving the relationship between these spaces and the anchor for the frames would give you a good estimate of how things changed. But it's not clear the end result is worth that effort -- I suspect it depends on the platform. In the end, making this call explicitly related to the frame gives platforms the option of doing smarter things; decoupling it from the frame prevents it. So, I'd opt for associating it with the frame, even though some current implementations might not benefit. |
That may very well be the case. :)
I think there's still one correct choice: if the user wanted me to create an anchor at the point where the hand was, they would've computed its pose in a space that is assumed to be stationary across frames (i.e. local space, or maybe local-floor space, both modulo reset events & small adjustments that could happen) and passed that pose along with the space of choice to the request. Since they chose to say something else, we must assume that this is what they meant (customer's always right!). The API in this case is expressive enough to cover both approaches, and we need to trust the user to say what they mean. The user may need to be in an active XRFrame to do the calculations to do the Right Thing that you refer to, but the API can be exposed through an XRSession since it does not care about timings.
I'm arguing that this is not possible in general in ARCore. I can get 2 poses of the hand from the system at different times, pose_1, relative to world space space_1, and pose_2, relative to world space space_2. There is no way for me to know the relationship between space_1 and space_2, so I cannot do the smart thing and compute hand's pose pose_1 relative to space_2. I can place a native origin (platform anchor) that is supposed to be stationary relative to the world and do the offsetting relative to it, but we already should have such native origin - it's the one backing up local space! So the user can simply say "place anchor relative to local space" and we will automatically do the right thing (by doing the dumb thing!).
I didn't treat it as pejorative (and I have thick skin ;)). I'm trying to argue that it's not really our choice to ignore the internal device coordinate system changes, because I think we have no way of observing them in ARCore. With world space being unstable from frame to frame, we theoretically cannot reason about the platform anchor location changes ("it moved 0.1m to the right - does that mean world space moved, or the anchor moved, or a bit of both?"). The KIDS ("keep it dumb, stupid") principle is forced on us here, unless I'm missing something. :) TL;DR: native origins are, I think, the only things that we can reason about & we need to always assume poses are specified relative to them. Some native origins are assumed to be stationary relative to the world (local, local-floor spaces) - users can use those for more deterministic anchor creation. Others are not stationary (input source spaces) - users would have to compute the poses for anchor creation relative to some stationary native origins (this needs to happen within a frame) if they wanted to have more control over where exactly the anchor will be created. |
My closing comment on the previous message essentially said "sure, that's fine, I imagine many platforms will choose to do what you suggest" and "since some platforms might benefit from knowing which frame the data was computed from, we should really associate create anchor with a frame" ... I'm not arguing that ARCore should or shouldn't do something; just that, some other platform might benefit from the greater knowledge of knowing the frame associated with it. So, given that there isn't a significant downside to associating the method with the frame, why not do that? |
I chatted with @toji offline; given the constraints we have, it may make sense to change the signature of the anchor creation method to accept only 

Also, moving the API to be exposed on 

@raviramachandra, @toji, @Manishearth - is this acceptable?

@blairmacintyre - I just saw your message before hitting "Comment", quick response:
This is acceptable to me from an API standpoint. I haven't fully followed this entire discussion and might be less aware of the nuances of anchors so I defer to Blair, Brandon, and others on that point |
I am not, on the surface, in favor of this, but I might not be thinking about all the possible spaces. If I am using a phone and want to put an anchor 10cm behind the phone (let's say along the viewer z-axis), how do I express this concisely without using the |
That is not really a reasonable choice. The pose should be interpreted relative to the space based on the data the program is using to compute it - in other words, the data in the current frame. If (as a programmer) I react to a click on a button on a controller by saying "create an anchor at the tip of the controller" and the controller happens to be moving, I want the system to do its very best to put the anchor where the controller was when the button was pressed, not at some arbitrary position later. Arguments based on current implementation limitations of ARCore are not persuasive, TBH. We should draft a spec that aims to be correct. We can put in the spec description that different platforms may not be able to achieve this (in fact, many won't), but we shouldn't hamstring future implementations this way. |
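A sketch of the behavior described above, assuming (for illustration) that anchor creation were allowed on input-event frames - which is part of what's being debated here - and using `targetRaySpace` as a stand-in for "the tip of the controller":

```js
session.addEventListener('select', (event) => {
  let tipSpace = event.inputSource.targetRaySpace;  // stand-in for the controller tip
  // event.frame pins the request to the moment the button was pressed,
  // if the API allows anchor creation on such frames.
  event.frame.createAnchor(new XRRigidTransform(), tipSpace)
      .then((anchor) => { /* keep track of the anchor */ });
});
```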
I agree with this fully, but in my (admittedly brief) time researching the various AR APIs out there it appeared to me none of them actually offered any way of controlling the timing of the anchor creation, and certainly none allowed you to specify that the anchor should be created against a past world state (which a strictly frame-based API would require for at least some browser architectures).

In principle I feel like specifying the anchor in terms of a specific 

(I would love to be wrong about this, so if anyone who knows the APIs better can inform us otherwise, I'd love to hear it!)
The idea would be that you'd query the 

On the other hand, if we figure out how to have a stronger creation timing guarantee then it's a moot point and there's not much reason to restrict the spaces that can be used. |
Something like:

```js
let viewerSpace = ...;
let localSpace = ...;
// in rAFcb:
let offsetViewerSpace = viewerSpace.getOffsetReferenceSpace(
    new XRRigidTransform(
        new DOMPoint(0, 0, 0.1)));  // let the UA do the math!
let offsetPoseInLocalSpace = xrFrame.getPose(offsetViewerSpace, localSpace);
xrFrame.createAnchor(offsetPoseInLocalSpace.transform, localSpace);
// ... or XRSession ...
```
I think the only way we could achieve that is if the API was not promise-based and lived in a single-threaded, single-process application, or maybe if we were to pause processing on the device side while the rAFcb is running - both of which are not realistic. To me the API is, by its nature, speaking about something that will happen in the future (& it's a future in which the device may already be living), so we need to find a way to talk about things that will happen in the future. The current draft proposes one way to do so, and given the assumption that the platform is doing the right thing regarding native origins that should be stationary, I think the proposed approach is workable.
If it turns out that we can do better given a frame once some new devices are available, it will be easy to add an anchor creation method on an XRFrame & spec it in a way that takes the frame into account. To me, it sounds like it may not be possible w/o some absolute frame of reference for the device to use - ARCore, ARKit, HoloLens and Magic Leap are, I think, all in the same boat here (*). But let's say that there's now "AR with base stations", which knows with a high degree of accuracy what internal device coordinate changes have happened & can account for them - having it expose an API on XRFrame may be beneficial. But then, if the device is 99.9% certain where it is located, it can expose truly stationary native origins and the current wording of the spec Just Works for it... (*) I'm making some assumptions here - @Yonet, @thetuvix, @raviramachandra, @grorg & others - please keep me honest! |
@toji - I do not think I understand the "laggy" part, can you explain what you meant? |
@bialpio these are my thoughts:

Should 

How I originally thought Anchors would work on MLOne device?

Should the API be changed only to take |
When calling
When you create an anchor that way, the
The reason that the 

In the example above, the app did the following:

```js
var created = await frame.createAnchor(
    new XRRigidTransform({ x: 0, y: 2, z: 0 }), localFloorSpace);
```

This should have precisely the same behavior as if the app wrote this instead:

```js
var created = false;
var localSpacePose = frame.getPose(localFloorSpace, localSpace);
if (localSpacePose) {
  var transform = localSpacePose.transform;
  created = await frame.createAnchor(
      new XRRigidTransform({ x: transform.position.x,
                             y: transform.position.y + 2,
                             z: transform.position.z }),
      localSpace);
}
```

This would work for either the current frame or a historical input event frame.
I don't follow here. Let's say the app is intending to place an anchor right at the user's hand, where the hand was when the user pressed the trigger. If the app takes the historical
I'm not sure I follow here - your UA must already solve this problem to correctly implement the 

Here is the definition of |
Basically, 

To correctly implement the WebXR spec, I have thus far expected that an ARCore-backed UA would:
With that
For HoloLens, we will implement 

As a side note, the equivalent method in our previous native API, |
Thanks @thetuvix, you've confirmed what I was trying to get at. Appreciate it. |
No arguments here.
I think it introduces IPC concerns.
What I'm arguing is that it should be a different behavior (& then we could expose the API on 

Additionally, do we want this snippet to work w/o logging an error?

```js
// In RAF1:
let anchorPromise = frame.createAnchor(desiredPose, localFloorSpace);

// Anchor creation happens, & then, in the first RAF after the anchor was created:
let anchorPose = frame.getPose(anchorSpace, localFloorSpace);
if (anchorPose.transform !== desiredPose) {
  // anchorPose should be equal to desiredPose,
  // at least in the first frame after the anchor was created, right?
  console.error("whaaat?");
}
```
I think it would not (inactive frames don't play) - the only way we have of relating the past to the present or the future is expressing things relative to native origins, and if we express something as relative to a native origin, by definition it always stays valid even when the pose of the native origin changes. Do you think you (or anyone else) could come up with an algorithm for anchor creation that would describe the steps that need to happen when an anchor gets created on a frame? The currently proposed algorithm works on
They cannot take the historical frame, because anchor creation (as drafted now) does not work on inactive frames, same as 

With my proposed change to the API (move it to XRSession, keep the algorithm the way it is, optionally limit the args to only accept stationary XRReferenceSpaces), they'd have to choose:

```js
// Option 1 (within or outside of rAFcb):
session.createAnchor(identityTransform, gripSpace);
// Will use the state of grip space as of the "future now" - the space is not
// really stationary, so the result will not be great.

// Option 2 (only within rAFcb):
let desiredPose = frame.getPose(gripSpace, localSpace).transform;
// ... or "unbounded", or "local-floor", or another space that will be mostly stationary.
session.createAnchor(desiredPose, localSpace);
// Will use the state of local space as of the "future now" - since the space
// should be stationary w.r.t. the real world, the result is probably what the app wants.
```
What I'm trying & failing to say here is that changes to
No arguments about how implementation of local and unbounded should behave on ARCore.
That's a really interesting approach! To me, WebXR has a slimmed-down version of time-indexing - an inactive frame is useless for all practical purposes, so the only valid time index is "now". Even with that, the anchor creation happens in the "future", and there is no good way for me to translate from "now" to "future" (in general, there's no way to observe changes to spaces, so no way to adjust app-provided poses to account for the changes). Does this all make sense, or am I missing something completely here? Alternative way to look at the problem: |
/agenda to make sure we discuss it in the CG in case we cannot come up with good solution prior to the call. |
IIUC, it seems that OpenXR is introducing some implicit space in the anchor creation extension ("If space cannot be located relative to the environment at the moment of the call to xrCreateSpatialAnchorMSFT, the runtime (...)."), so it looks like the API is actually taking 2 spaces into account - the one passed in by the user, and some implicit space of the environment, and that enables the system to relate the app-provided space (& pose relative to it) at 2 different timestamps. |
I believe this may be the core of our disconnect here. Per the
The design of the core WebXR spec is carefully balanced in this way to serve the needs of both phone/tablet-style and headset-style XR devices. Whether an
As discussed above, UAs for phones/tablets that only ever serve up
As you point out, "option 1" above is what the developer would most naturally want to type, but it has a subtle gotcha that would manifest on VR/AR headsets in mispositioning of the user's placed anchor by 20-100ms worth of hand motion. If instead we keep the current design, the primary gotcha you pointed out is that an app might miss a floor adjustment that occurs in the next frame or two (however long the IPC takes). However, the app is already signing up to permanently miss all future floor adjustments when they chose to create an independent anchor - any included or excluded adjustment there will be in the noise relative to missing up to 1/10 of a second of hand motion. However, as discussed below, I believe that the difference between static and dynamic spaces lets us still get the best of all worlds here.
That's true! No matter how clever we are in a WebXR process, any reference space we are relying on could theoretically adjust out from underneath us. However, this is true for all reference spaces including 

I generally reason about spaces as being in two categories: |
Even in OpenXR where you time-index all pose requests, what you are choosing is the target time for the prediction/interpolation: the moment in the future or past you intend to reason about if the space is dynamic, tracking a physical object that actually moves around in the real world. You are explicitly not asking for the prediction the system had for a static object at that moment in the past - in fact, in OpenXR if you make the same 

OpenXR runtimes are not expected to keep any history of less-informed beliefs about where static objects are - they are only expected to keep a history, to interpolate into, of where dynamic continuously-moving objects were (or are predicted to be) for some short time window. Given the careful restrictions on when WebXR 

A key principle we can extract given the above is that: For async APIs that span frames, it is fine for UAs to absorb late-breaking platform adjustments to both static and dynamic spaces as device understanding improves during the operation, so long as the target time is maintained for dynamic spaces. This continues to maintain the invariant for WebXR apps that pose data is snapshotted for the duration of a rAF, while unblocking anchor creation and other latent pose-based requests to improve their accuracy during IPC.

Building on that principle, I believe that the current API shape is the right balance that accommodates phones, tablets and headsets as we've all intended here:
Note that I am excluding major adjustments of reference spaces here (e.g. when the user on a VR headset recalibrates where |
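One possible reading of the principle stated above, sketched as UA-side pseudocode (all helper names here are hypothetical, for illustration only):

```js
// Hypothetical UA-internal handling of a createAnchor() request that was
// issued against some space during an earlier frame.
function serviceAnchorRequest(request) {
  let basePose = request.space.isDynamic
      // Dynamic space (e.g. grip): honor the originating frame's target time,
      // so the anchor lands where the hand *was* when the app asked.
      ? nativePoseAt(request.space, request.targetTime)
      // Static space (e.g. local-floor): absorb any late-breaking adjustments.
      : latestNativePose(request.space);
  return createPlatformAnchor(multiplyTransforms(basePose, request.pose));
}
```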
"cannot be located relative to the environment" here represents whether tracking has been lost or not. Any 6DoF tracking system implicitly involves two spaces: ground truth and the platform's best estimate of that ground truth. If positional tracking was just lost, the platform's virtual space "cannot be located relative to the environment at the moment of the call" and so anchor creation fails. |
Thanks, this definitely helps and points out the wrong assumption I had! I do have a few follow-up questions.
I'm worried here that as implementers / spec authors, we'd be the ones having to choose a reference space that gets elevated to serve as "aether". If we have a session that uses both
I've been treating native origins as proxies for the ground truth, since those to me are the only things the system can communicate about - isn't OpenXR's approach here similar to silently picking a space to act as an aether for the user? From the system's perspective, there's no such thing as "ground truth", there's only its best available approximation of it (in the form of various native origins), so saying that something is relative to the environment to me means that it's relative to some space (backed by some native origin), but w/o specifying which one. So far, the possible approaches that I could see us try to pursue:
Let me digest it a bit more, great discussion! If there's anything I might be missing or misunderstanding here, please let me know! |
I had a quick chat with Alex last Friday to ensure I'm not missing anything here - after weighing the options, it seems that option 4 is the one we should attempt to proceed with. This gives the UAs freedom to provide the best experience that they are able to (while also avoiding potential gotchas & footguns of the other proposals). It does mean that the algorithm becomes a bit hand-wavy, though. Please take a look at the PR that I have issued. The hope is that the non-normative text sufficiently expresses the intent of the API - it would allow us to clarify the meaning of the API w/o diving too deeply into how the sausage is made. |
Closing the issue now since it seems it was resolved in a way that works for everyone involved - please file a new issue if there's still something unclear. Thanks for being patient with me! |
From the spec:
```webidl
partial interface XRFrame {
  Promise<XRAnchor> createAnchor(XRRigidTransform pose, XRSpace space);
};
```
Should it not be XRSession ??
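For illustration, a minimal sketch of the two shapes under discussion (assuming the rest of the anchors module stays the same and only the interface hosting the method changes):

```js
// As currently drafted - the method lives on the frame:
frame.createAnchor(pose, space).then((anchor) => { /* ... */ });

// The alternative raised in this issue - the method lives on the session:
session.createAnchor(pose, space).then((anchor) => { /* ... */ });
```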