
Explore using meshes for all RWG #6

Open
johnpallett opened this issue Apr 19, 2019 · 9 comments

@johnpallett
Contributor

Discussion about data types relating to real-world geometry and whether a mesh should be used.

@johnpallett
Contributor Author

From #2, Blair says: I'm less keen on having the planes be a field in the returned value; this implies that every kind of info will be in its own area. Why aren't planes a kind of mesh? If all mesh-like things are meshes, then an app that cares only about meshes (for occlusion, for example, or for drawing pretty effects on anything known in the world) can just deal with that.

meshes.forEach(mesh => {
    let pose = mesh.getPose(xrReferenceSpace); // origin of mesh or plane
    if (mesh.isPlane) {
        // mesh.polygon is an array of vertices (x, y, z coordinates)
        // defining the boundary of the 2D polygon
        let planeVertices = mesh.polygon;
        // ...draw planeVertices relative to pose...
    } else {
        // draw a more general mesh that isn't a special kind of mesh
        // (the plane mesh would also have these fields)
        let vertices = mesh.vertices;   // vertices for the mesh; for planes, might be the same
                                        // as mesh.polygon plus one at the origin
        let triangles = mesh.triangles; // the triangles using the vertices

        // ...draw mesh relative to pose...
    }
});

@johnpallett
Contributor Author

From #2, bialpio says: Not all things that are related to world knowledge are meshes (although in this repo we’ll focus on RWG). Meshes are definitely less redundant (when they are available) but we feel they need to be separate detectable features to allow flexibility in the future.

There is also a question of confidence differences between different types of real-world data. For example, ARKit and ARCore may have different confidence requirements for returning a mesh or a plane. If we say we can represent a plane as a mesh, that is kind of true but not really: an accurate reflection of the data returned by ARKit/ARCore would require attaching metadata to the mesh for things like plane center/extents or even the surface normal, which basically puts us back in the situation of having multiple types of data.

@johnpallett
Contributor Author

From #2, Blair says:

not all things that are related to world knowledge are meshes (although in this repo we’ll focus on RWG). Meshes are definitely less redundant (when they are available) but we feel they need to be separate detectable features to allow flexibility in the future.

That's orthogonal to my point. Obviously there will be world knowledge that has nothing to do with meshes.

But the idea of having a "planes" member, with a convex boundary, is too specific. In this case, for example, I would posit it will be obsolete before this proposal is finished -- it's pretty easy to imagine that some time soon planes (in ARKit and/or ARCore) will support arbitrary concave boundaries, with holes in them. The current planes are woefully inadequate.

My suggestion to have (when the underlying object is compatible, obviously) a generic mesh type, which has a relatively simple representation, but can have additional fields based on what it actually represents, has all the advantages of having a specific "plane" field, with a bunch of other advantages. As we move toward additional capabilities (e.g., world meshes like on Hololens/ML1), they can be represented similarly; future things (like segmenting moving objects out of a static world) will also fit. But for applications that DON'T know (or care) about these new types, they can fall back to just using the mesh, if they want.
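As a rough sketch (reusing the hypothetical mesh fields from the snippet above), that fallback could be as simple as:

// Rough sketch: an app that only cares about geometry ignores the specific type
// and just uses the common mesh fields. drawMesh is app code, not part of any API.
meshes.forEach(mesh => {
    let pose = mesh.getPose(xrReferenceSpace);
    drawMesh(mesh.vertices, mesh.triangles, pose);
});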

There is also a question of confidence differences between different types of real-world data. For example, ARKit and ARCore may have different confidence requirements for returning a mesh or a plane. If we say we can represent a plane as a mesh, that is kind of true but not really: an accurate reflection of the data returned by ARKit/ARCore would require attaching metadata to the mesh for things like plane center/extents or even the surface normal, which basically puts us back in the situation of having multiple types of data.

You are right we need additional metadata (that's what I proposed, right?), but adding it absolutely does not put us back in the same place. If I don't know what a "segmentedObject" or (in this case) "plane" or some future "concavePlane" is, but they are all "mesh" with the required "mesh" data provided, I can use them.

I'm very concerned about evolution and headway over time. Having simple convex planes with a simplistic geometric boundary as a base data type doesn't really work.

@blairmacintyre

Here's the post in #1 that kicked off my thinking on this; I'm including it because it summarizes a few things that are only "implied" above:

A simple solution to world geometry is to pick a lowest-common representation, like "a forest of meshes".

But, then ARKit/ARCore planes become meshes, and lose semantics: we no longer know that they are meant to correspond to vertical or horizontal planes. Similarly, faces could be exposed as a mesh that comes and goes, as could moving objects or detected images, etc.

So, a slightly less "wasteful" approach might be to say

  • we expose a forest of meshes
  • a mesh does not need to satisfy any particular geometric properties, aside from sharing a set of vertices that are used to construct the "mesh".
  • each mesh has an origin in world coordinates, and is defined relative to it. That origin could change each frame (e.g., so ARKit/ARCore planes/faces can be represented this way, as can tracked objects if the browser supports such a thing)
  • each mesh can be typed, perhaps just using string names, and have additional information with it beyond the mesh that depends on the type. Apps that know of the type can use it; others will just use the mesh (see the sketch after this list).
    • planes might have a surface normal
    • faces might have an array of blend shapes
    • when other things are detectable, they can be exposed similarly. So, if we exposed "image detection and tracking", the detected image could have a mesh defined for it that corresponds to the image in the real world -- same for objects being detected (the real-world equivalent OR the thing used for detection could be returned)
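A minimal sketch of how such typed meshes might be consumed (all names here are illustrative, nothing is spec'd):

// Illustrative only: every entry exposes the common mesh fields; typed entries
// carry extra, type-specific data that apps are free to ignore.
worldMeshes.forEach(mesh => {
    let pose = mesh.getPose(xrReferenceSpace);
    drawMesh(mesh.vertices, mesh.triangles, pose);  // drawMesh is app code; works for every type

    if (mesh.type === 'plane') {
        alignToSurface(mesh.normal, pose);          // hypothetical plane-specific extra field
    } else if (mesh.type === 'face') {
        applyBlendShapes(mesh.blendShapes);         // hypothetical face-specific extra field
    }
    // unknown future types still render via the common mesh fields above
});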

@bialpio
Contributor

bialpio commented Apr 24, 2019

Having simple convex planes with a simplistic geometric boundary as a base data type doesn't really work.

I don’t think that we are proposing planes as a base data type. My approach here is that I don’t think we need to have a basic data type - we can return different kinds of objects & just specify that each of them has a mesh (contrast with “each of them is a mesh” as with inheritance). This also allows us to enable feature detection based on the object type (detectedPlanes array is undefined => device doesn’t support plane detection).
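As a sketch, feature detection under this model might look like the following (using the detectedPlanes / detectedMeshes names from the snippets below):

// Sketch only: absence of the array signals absence of the capability.
let worldInfo = xrFrame.worldInformation;
if (worldInfo.detectedPlanes === undefined) {
    // the device/UA doesn't support plane detection; fall back to meshes or a polyfill
} else {
    worldInfo.detectedPlanes.forEach(plane => {
        // use plane-specific data here (plus, hypothetically, the mesh each plane "has")
    });
}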

Additionally, to me the following 2 code snippets are almost equivalent (I prefer the 2nd one as it doesn't require an app to do something akin to if(plane instanceof XRPlane)):

meshes.forEach(mesh => {
    let pose = mesh.getPose(xrReferenceSpace); // origin of mesh or plane
    if (mesh.isPlane) {
        // mesh.polygon is an array of vertices (x, y, z coordinates)
        // defining the boundary of the 2D polygon
        let planeVertices = mesh.polygon;
        // ...draw planeVertices relative to pose...
    } else {
        // draw a more general mesh that isn't a special kind of mesh
        // (the plane mesh would also have these fields)
        let vertices = mesh.vertices;   // vertices for the mesh; for planes, might be the same
                                        // as mesh.polygon plus one at the origin
        let triangles = mesh.triangles; // the triangles using the vertices
        // ...draw mesh relative to pose...
    }
});

Compare with:

let planes = xrFrame.worldInformation.detectedPlanes;
let meshes = xrFrame.worldInformation.detectedMeshes;
// let objects = xrFrame.worldInformation.detectedObjects;  // planes + meshes + all other future types
// not sure if it's worth adding it if it's simply a concatenation of all the arrays

planes.forEach(plane => {
    let pose = plane.getPose(xrReferenceSpace); // origin of plane
    // plane.polygon is an array of vertices (x, y, z coordinates)
    // defining the boundary of the 2D polygon
    let planeVertices = plane.polygon;
    // ...draw planeVertices relative to pose...
});

meshes.forEach(mesh => {
    // draw a more general mesh that isn't a special kind of mesh
    let vertices = mesh.vertices;   // vertices for the mesh; for planes, might be the same
                                    // as mesh.polygon plus one at the origin
    let triangles = mesh.triangles; // the triangles using the vertices
    // ...draw mesh relative to pose...
});

@blairmacintyre

That code seems (a) overly complex and (b) subject to failure in the future.

In this design, if a web page wants to render all the geometry it gets as a pretty world mesh (think of the mesh rendering you see when you tap on the world with HoloLens), it has to know about all of the various fields. If a new type of object is added ("movingObjects"), any old web page will fail to render it because it doesn't have a case handler for that type.

I took inspiration from https://docs.microsoft.com/en-us/uwp/api/windows.perception.spatial.surfaces.spatialsurfacemesh on what kind of API to expose for meshes.

I just implemented my version; here is what the code looks like.

I'm including it all below, since it's a complete example that renders planes in one color, and faces in another (I currently expose ARKit faces as a kind of mesh, just so I have two kinds). The only time I look at the type of the geometry is to decide on the color. Otherwise, both expose a common "mesh" api (vertices, triangles and optional normals and texture coordinates).

In my current implementation, I expose the normals on the plane (by assigning the plane normal to all vertex normals), just to test, and don't yet compute the normals for the face mesh. Since this is expensive, I plan to add a property to the request:

  let sensingState = xrSession.updateWorldSensingState({
                    illuminationDetectionState : {
                        enabled : true
                    },
                    meshDetectionState : {
                        enabled : true,
                        normals: true
                    }
                })
            /// called from my rAF handler
            function updateScene(frame){
                let worldInfo = frame.worldInformation
                if(worldInfo.estimatedLight){
                    let ambientIntensity = worldInfo.estimatedLight.ambientIntensity
                    ambientLight.intensity = ambientIntensity;
                    directionalLight.intensity = ambientIntensity * 0.5;
                }
                if(worldInfo.meshes){
                    meshMap.forEach(object => { object.seen = false })

                    worldInfo.meshes.forEach(worldMesh => {
                        var object = meshMap.get(worldMesh.uid);
                        if (object) {
                            handleUpdateNode(worldMesh, object)
                        } else {
                            handleNewNode(worldMesh)
                        }
                    })

                    meshMap.forEach(object => { 
                        if (!object.seen) {
                            handleRemoveNode(object.worldMesh, object)
                        } 
                    })
                }
            }

            function handleUpdateNode(worldMesh, object) {
                object.seen = true

                // we don't need to do anything if the timestamp isn't updated
                if (worldMesh.timeStamp <= object.ts) {
                    return;
                }

                if (worldMesh.vertexCountChanged) {
                    let newMesh = newMeshNode(worldMesh)
                    object.threeMesh.geometry.dispose()
                    object.node.remove(object.threeMesh)
                    object.node.add(newMesh)
                    object.threeMesh = newMesh
                } else {
                    if (worldMesh.vertexPositionsChanged) {
                        let position = object.threeMesh.geometry.attributes.position
                        if (position.array.length != worldMesh.vertexPositions.length) {
                            console.error("position and vertex arrays are different sizes", position, worldMesh)
                        }
                        position.setArray(worldMesh.vertexPositions);
                        position.needsUpdate = true;
                    }
                    if (worldMesh.textureCoordinatesChanged) {
                        let uv = object.threeMesh.geometry.attributes.uv
                        if (uv.array.length != worldMesh.textureCoordinates.length) {
                            console.error("uv and vertex arrays are different sizes", uv, worldMesh)
                        }
                        uv.setArray(worldMesh.textureCoordinates);
                        uv.needsUpdate = true;
                    }
                    if (worldMesh.triangleIndicesChanged) {
                        let index = object.threeMesh.geometry.index
                        if (index.array.length != worldMesh.triangleIndices.length) {
                            console.error("index and triangleIndices arrays are different sizes", index, worldMesh)
                        }
                        index.setArray(worldMesh.triangleIndices);
                        index.needsUpdate = true;
                    }
                    if (worldMesh.vertexNormalsChanged && worldMesh.vertexNormals.length > 0) {
                        // normals are optional
                        let normals = object.threeMesh.geometry.attributes.normal
                        if (normals.array.length != worldMesh.vertexNormals.length) {
                            console.error("normal and vertexNormals arrays are different sizes", normals, worldMesh)
                        }
                        normals.setArray(worldMesh.vertexNormals);
                        normals.needsUpdate = true;
                    }
                }
            }

            function handleRemoveNode(worldMesh, object) {
                object.threeMesh.geometry.dispose()
                engine.removeAnchoredNode(worldMesh);
                meshMap.delete(worldMesh.uid)
            }

            function handleNewNode(worldMesh) {
                let worldMeshGroup = new THREE.Group();
                var mesh = null;

                mesh = newMeshNode(worldMesh)

                worldMeshGroup.add(mesh)

                var axesHelper = engine.createAxesHelper([0.1,0.1,0.1])
                worldMeshGroup.add( axesHelper );
                
                //worldMesh.node = worldMeshGroup;
                engine.addAnchoredNode(worldMesh, worldMeshGroup)

                meshMap.set(worldMesh.uid, {
                    ts: worldMesh.timeStamp, 
                    worldMesh: worldMesh, 
                    node: worldMeshGroup, 
                    seen: true, 
                    threeMesh: mesh
                })
            }

            function newMeshNode(worldMesh) {
                let edgeColor, polyColor
                if (worldMesh instanceof XRFaceMesh) {
                    edgeColor = '#999999'
                    polyColor = '#999900'
                } else {
                    edgeColor = '#11FF11'
                    polyColor = '#009900'
                }

                let mesh = new THREE.Group();
                let geometry = new THREE.BufferGeometry()

                let indices = new THREE.BufferAttribute(worldMesh.triangleIndices, 1)
                indices.dynamic = true
                geometry.setIndex(indices)
                
                let verticesBufferAttribute = new THREE.BufferAttribute( worldMesh.vertexPositions, 3 )
                verticesBufferAttribute.dynamic = true
                geometry.addAttribute( 'position', verticesBufferAttribute );

                let uvBufferAttribute = new THREE.BufferAttribute( worldMesh.textureCoordinates, 2 )
                uvBufferAttribute.dynamic = true
                geometry.addAttribute( 'uv', uvBufferAttribute );

                if (worldMesh.vertexNormals.length > 0) {
                    let normalsBufferAttribute = new THREE.BufferAttribute( worldMesh.vertexNormals, 3 )
                    normalsBufferAttribute.dynamic = true
                    geometry.addAttribute( 'normal', normalsBufferAttribute );
                } else {
                    geometry.computeVertexNormals()
                }

                // transparent mesh
                var wireMaterial = new THREE.MeshPhongMaterial({color: edgeColor, wireframe: true})
                var material = new THREE.MeshPhongMaterial({color: polyColor, transparent: true, opacity: 0.25})

                mesh.add(new THREE.Mesh(geometry, material))
                mesh.add(new THREE.Mesh(geometry, wireMaterial))

                mesh.geometry = geometry;  // for later use

                //worldMesh.mesh = mesh;
                return mesh
            }

@bricetebbs

I'd like to propose another way of looking at this question. Rather than thinking about what best represents the underlying system's mapping of the environment, we could consider what is the thing that many applications seem to want. That is, I have an object I'd like to place on a flat surface, so what I want is for the system (regardless of its underlying representations) to give me the best planar surface it can, since that's what my app understands. The app's request for planes is telling the system to optimize its efforts into providing this for the app. Other use cases might need more detailed meshes, and apps could also ask for those. Maybe not all devices could fulfill that, but I think all the devices we currently have could provide planes, much in the same way they can all resolve a hit test.

If we have an option for meshes, it doesn't have to mean that you need to check both the meshes and the planes. It could mean that, at that point, all planes will also show up as meshes. It's more a question of the kind of query you are making of the environment.
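A rough sketch of what such a query-style request might look like, reusing the updateWorldSensingState shape that appears earlier in this thread (planeDetectionState is purely illustrative; none of these names are settled):

// Rough sketch only: the app states what kind of geometry it actually wants,
// and the system optimizes for that. planeDetectionState is a hypothetical name.
let sensingState = xrSession.updateWorldSensingState({
    planeDetectionState: { enabled: true },  // "give me the best planar surfaces you can"
    meshDetectionState: { enabled: false }   // this app doesn't need detailed world meshes
})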

@blairmacintyre

@bricetebbs I like what you are suggesting. Essentially, allow multiple kinds of "geometry-like" things to be requested, and provide the ones you can as appropriate. If the user wants occlusion, and they want "tables" (planes above the ground) and they want "an indication of where the ground is" and so on, we can focus on just giving them those things, or indicating we can't (and letting them polyfill if appropriate).

That really gels with my current thinking, too, after I've spent the last month building an authoring tool for AR on top of WebXR. Part of where I was starting above is "we need a common, lowest-common-denominator representation" that developers can rely on. But we definitely want the option for UAs to provide (and, perhaps eventually, the standard to require) more semantically meaningful representations that developers expect.

Planes are an obvious one; when I was at MSR (Microsoft Research) for a summer working on RoomAlive (projection-based AR), there were folks doing excellent work on extracting planes and other semantically meaningful structures from depth data, and it's super useful.

The other obvious one is "ground" ... it would be great if UAs could provide (assuming permission and capabilities) an estimate of the height above ground (below the device) and (in the case of planes and meshes) tag one or more with an indication they are "the ground or floor".
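For example (with purely hypothetical names), a UA-provided tag might be consumed like this:

// Purely hypothetical: the UA labels some of the returned planes/meshes.
worldInfo.meshes.forEach(worldMesh => {
    if (worldMesh.semanticLabel === 'floor') {
        // anchor ground-level content here, or use this mesh as the occlusion floor
    }
});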

@cabanier
Member

@bricetebbs I like what you are suggesting. Essentially, allow multiple kinds of "geometry-like" things to be requested, and provide the ones you can as appropriate. If the user wants occlusion, and they want "tables" (planes above the ground) and they want "an indication of where the ground is" and so on, we can focus on just giving them those things, or indicating we can't (and letting them polyfill if appropriate).

I agree. Processing a mesh and asking for plane data are very different.
Meshes can be huge and constantly changing, and the processing of them is likely UA dependent.

If people want access to an occlusion mesh or if they want to detect a plane, we need to provide them with separate APIs.

The other obvious one is "ground" ... it would be great if UAs could provide (assuming permission and capabilities) an estimate of the height above ground (below the device) and (in the case of planes and meshes) tag one or more with an indication they are "the ground or floor".

Ideally, the author should be able to request other types of planes (i.e. walls, ceilings, general surfaces, etc.).
There could be a lot of "planes" in a user's environment. Usually, planes are detected when the user points or looks at an area of their environment, and the author knows what to look for (i.e. a floor), so the API could be a simple function to look in a certain area for a type of plane.
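For instance (hypothetical function and option names only), such a scoped query might look like:

// Hypothetical sketch of an area-scoped plane query; nothing here is spec'd.
// viewerSpace is assumed to be an XRSpace obtained elsewhere.
let floorPlanes = await xrSession.requestPlanes({
    orientation: 'horizontal-up',  // the author says what to look for (e.g. a floor)
    searchSpace: viewerSpace       // only look where the user is pointing or looking
});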
