Stricter Skinning Requirements #1665

Open
marstaik opened this issue Aug 23, 2019 · 64 comments · May be fixed by #1747

Comments

@marstaik

After messing around with the gltf spec and various engines, I feel there are quite a few cases with the current specification that make it extremely difficult for most importers to handle with their own internal rendering engines. Either this leads to a bunch of extraneous work and guessing on the importer side, or leads to duplication of data by the exporters.

Here are some ideas that I believe could tighten up the specification:

  1. The skeleton root should be defined; otherwise, the direct parent of the highest joint in the skin hierarchy must be used. The direct parent does not have to be a joint, which allows for multi-rooted skeletons. Note that the direct parent may happen to be the scene root, but the scene root should no longer be the default, as it was when the skeleton property was undefined.

  2. All joints in a skin must form a connected subtree with the skeleton root/direct parent (1)
    This means no non-joints (or joints of other skins) lie between the joints of a skin as connecting nodes. You can still attach other nodes to joints, as long as no further joints of the skin follow below them.
    A lot of engines treat the skeleton/skin as a single entity, and allowing non-joints embedded in the tree of joints forces the engine to create a bunch of shadow bones to get things to work properly.

  3. A mesh can only bind to a single skin
    I know this is pretty implicit, but I believe it should be defined.

  4. As per a previous discussion here: Multiple duplicated skins are exported for each mesh child of the armature glTF-Blender-IO#566 (comment)
    All meshes skinned must be normalized to the local space of the skeleton.

Now I have another requirement that I am torn between two options: (5) and (6)

  5. Each joint shall belong to only one skin
    This is the ideal choice as it makes things extremely simple for importers: they do not need to resolve/union skinning trees to find the master skeleton/skin (see below).

  6. Each skin shall either define a new tree of unused joints, or, explicitly be a subtree of a previously defined skin.
    This allows for multiple skins per skeleton, but the subtree definition implies that all skins for a single skeleton must have a "master" skin that holds all of the joints for a skeleton. This makes it easy for an importer to map smaller skin definitions to a master skeleton containing all the joints.
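
As an illustration of point (1), here is a minimal Python sketch of how an importer might derive the implied skeleton root when the skeleton property is absent. It is not spec text: the `parents` map (node index to parent index, `None` for scene roots) and the function names are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Optional

def depth(node: int, parents: Dict[int, Optional[int]]) -> int:
    """Number of ancestors above `node`."""
    d = 0
    while parents[node] is not None:
        node = parents[node]
        d += 1
    return d

def implied_skeleton_root(joints: List[int],
                          parents: Dict[int, Optional[int]]) -> Optional[int]:
    """Return the direct parent of the highest joint(s) in the skin.

    Returns None when the highest joint is itself a scene root, or when
    the highest joints do not share one direct parent (ambiguous case).
    """
    min_depth = min(depth(j, parents) for j in joints)
    top = [j for j in joints if depth(j, parents) == min_depth]
    candidates = {parents[j] for j in top}
    return candidates.pop() if len(candidates) == 1 else None

# Hypothetical hierarchy: node 0 is a plain node, 1 and 2 are sibling
# joint roots (a multi-rooted skeleton), 3 is a child joint of 1.
parents = {0: None, 1: 0, 2: 0, 3: 1}
root = implied_skeleton_root([1, 2, 3], parents)
```

Here `root` is node 0, which is not a joint itself, matching the multi-rooted case described in point (1).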

@vpenades
Contributor

Reusing nodes on multiple skins should not only be allowed, but it is also a great feature!

Consider a character model with 150 bones, made with a mesh and a skin; there are toolsets that might split the skin and the mesh to allow loading it in engines that have a limited number of bones per shader. For example, MonoGame has a limit of 72 bones per skin. And when you do the split, you really need the extra skins to keep pointing to the original nodes, so yes, you do need to share nodes between skins.
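
To make the splitting idea concrete, here is a rough Python sketch of one possible greedy approach. It is not how any particular toolset does it: `split_by_joint_limit`, the `vertex_joints` map and the tiny data set are all hypothetical. The point it illustrates is that the resulting joint lists still refer to the original node indices, so the split skins share nodes.

```python
from typing import Dict, List, Set, Tuple

Triangle = Tuple[int, int, int]

def split_by_joint_limit(triangles: List[Triangle],
                         vertex_joints: Dict[int, Set[int]],
                         max_joints: int) -> List[Tuple[List[Triangle], List[int]]]:
    """Greedily group triangles so each group uses at most `max_joints` joints.

    Returns (triangles, sorted joint node indices) per group. Each group
    would become its own mesh primitive plus skin.
    """
    groups: List[Tuple[List[Triangle], Set[int]]] = []
    for tri in triangles:
        needed: Set[int] = set()
        for v in tri:
            needed |= vertex_joints[v]
        if len(needed) > max_joints:
            raise ValueError("a single triangle exceeds the joint limit")
        for tris, used in groups:
            if len(used | needed) <= max_joints:
                tris.append(tri)
                used |= needed          # in-place update of the group's set
                break
        else:
            groups.append(([tri], set(needed)))
    return [(tris, sorted(used)) for tris, used in groups]

# Joint ids are node indices in the original scene (10..13 here).
vertex_joints = {0: {10, 11}, 1: {11}, 2: {11, 12}, 3: {12, 13}, 4: {13}}
triangles = [(0, 1, 2), (2, 3, 4)]
parts = split_by_joint_limit(triangles, vertex_joints, max_joints=3)
```

With a limit of 3 joints, the two triangles end up in separate groups whose joint lists both contain nodes 11 and 12.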

There are more use cases where you want multiple skins to share nodes. Consider a model with multiple LODs: several LOD meshes of a character would use their own skins, pointing to a commonly animated skeleton.

About requiring "Skeleton", I think it's been discussed before, and I believe the conclusion was that it was just giving redundant information that could be calculated from the nodes themselves. Truth is I am doing full skinning animation and I don't need it at all, so I believe requiring Skeleton would actually complicate things, especially when Skeleton is defined in a way that conflicts with what can be inferred from the node tree.

Trying to understand how skinning works is difficult given the way the schema has been laid out. The fact that a skinned mesh is defined within a node is very misleading, so I came up with these thoughts:

  • Nodes are just WorldMatrix providers, nothing more, nothing less.
  • Every Node with a mesh can be interpreted as a MeshInstance that needs to be rendered.
  • A MeshInstance without skin will be transformed by the node that contains it.
  • A MeshInstance with a skin will be transformed by the nodes pointed by the skin.

So a renderer just needs to do this:

  • calculate the world transform of every node in the scene into a Matrix[] Table.
  • for every mesh instance (a node with a mesh):
    • if the mesh has no skin, upload just one matrix from the table.
    • if the mesh has a skin, upload all the matrices from the table pointed by the skin.

As you can see, you don't need the Skeleton property anywhere in the process.
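
The selection step above can be sketched roughly as follows in Python. This is only an illustration: the function name is hypothetical, the node and skin dictionaries loosely mimic glTF JSON, strings stand in for 4x4 matrices, and inverse bind matrices are left out to keep the focus on which table entries get picked.

```python
from typing import Any, Dict, List

def matrices_to_upload(node_index: int,
                       nodes: List[Dict[str, Any]],
                       skins: List[Dict[str, Any]],
                       world_table: List[Any]) -> List[Any]:
    """Pick the world matrices one mesh instance needs from the baked table."""
    node = nodes[node_index]
    if "mesh" not in node:
        return []                                  # not a mesh instance
    if "skin" in node:
        joints = skins[node["skin"]]["joints"]
        return [world_table[j] for j in joints]    # one matrix per joint
    return [world_table[node_index]]               # unskinned: own matrix

# glTF-style node and skin lists; strings stand in for 4x4 matrices.
nodes = [{"mesh": 0}, {"mesh": 1, "skin": 0}, {}, {}]
skins = [{"joints": [2, 3]}]
world_table = ["W0", "W1", "W2", "W3"]
```

Node 0 uploads only its own matrix, while node 1 uploads the matrices of joints 2 and 3. Neither case consults a skeleton property.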

@marstaik
Author

marstaik commented Aug 26, 2019

I agree that the skeleton property isn't useful, if constraints are still met.

Imagine Node A having children B, C, and C has children D, E.

Nothing in the spec prevents me from exporting a skin with nodes B, D, E with no skeleton property defined.

A
| \
b C
   | \
   d e

What is an importer supposed to do in this case? You have three disjoint sets of nodes, of which, they aren't even on the same hierarchy/level. How is an importer supposed to correctly interpret that?
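
A minimal Python sketch of a check an importer could run to detect this situation, under the reading of points (1) and (2) in the opening post. The function name and the `parents` map are hypothetical. The idea: joints whose parent is not a joint of the same skin are "top" joints, and all top joints must share one direct parent.

```python
from typing import Dict, List, Optional

def forms_connected_subtree(joints: List[str],
                            parents: Dict[str, Optional[str]]) -> bool:
    """True if the joints hang together under a single direct parent.

    Every joint is either a top joint (its parent is not in the skin) or
    has a joint as parent. Multi-rooted skeletons pass as long as all top
    joints share the same direct parent.
    """
    joint_set = set(joints)
    top = [j for j in joint_set if parents.get(j) not in joint_set]
    return len({parents.get(j) for j in top}) == 1

# The hierarchy from the example: A has children B and C; C has D and E.
parents = {"A": None, "B": "A", "C": "A", "D": "C", "E": "C"}

disjoint = forms_connected_subtree(["B", "D", "E"], parents)        # C is skipped
connected = forms_connected_subtree(["B", "C", "D", "E"], parents)  # roots B and C under A
```

For the skin B, D, E the top joints have two different parents (A and C), so the check fails. Adding C makes it pass.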

An example of a strange solution is to duplicate C into a phantom bone C* and create a multi rooted skeleton using B C* D E. Then you can parent C to C*, and have C* steal C's transform. Remember, C may not even be a joint, or maybe it is, of a different skin. Or what if it's a mesh? How do I treat those joints which belong to a single skin as an entity I can't represent correctly in a node tree?

Now, I understand your argument for multiple skins per joint hierarchy, and that's why I would like to point out #6. You can still have multiple skins, but they would at least have to be a subset of an explicit node tree, or each be a separate skin whose joints no other skin uses.

Example using #6 and above points (and assuming they are all joints)
Skin #1: A B C D E
Skin #2: B
Skin #3: D
Skin #4: E

The current spec allows for a bit too much flexibility, and then the importer has to handle a bunch of additional strange cases.
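
As a rough sketch of how rule #6 could be validated, the following Python function walks the skins in definition order. It is a simplification under stated assumptions: it only checks set membership (subset of an earlier skin, or entirely unused joints) and does not verify the subtree shape, and the function name is made up for this example.

```python
from typing import List, Optional, Set

def first_rule6_violation(skins: List[List[str]]) -> Optional[int]:
    """Return the index of the first skin breaking rule #6, or None.

    A skin is accepted if its joints are all unused so far, or if they
    are a subset of one previously defined skin.
    """
    seen: List[Set[str]] = []
    used: Set[str] = set()
    for index, joints in enumerate(skins):
        current = set(joints)
        is_new_tree = current.isdisjoint(used)
        is_subset = any(current <= previous for previous in seen)
        if not (is_new_tree or is_subset):
            return index
        seen.append(current)
        used |= current
    return None

# The example from the comment: a master skin followed by subsets.
ok = first_rule6_violation([["A", "B", "C", "D", "E"], ["B"], ["D"], ["E"]])
# Master skin defined last: it overlaps earlier skins without being a subset.
bad = first_rule6_violation([["B"], ["D"], ["A", "B", "C", "D", "E"]])
```

The second call shows why the "master" skin has to come first under this rule: defined last, it is flagged at index 2.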

@vpenades
Contributor

vpenades commented Aug 26, 2019

I believe the current spec already says that all the nodes of a skin must share a common "parent" node, so there should be no need for a "skeleton" node.

But in my experience, even the requirement of having a common ancestor is not needed! Remember, as I said before, Nodes are just world transform providers.

Consider this pipeline:

     Node tree
       🡇
  World Matrix Table
       🡇
     Skins

First, you calculate the world matrices of ALL the nodes of the scene into a world matrix table (you can even use the node indices to build a flat list), once you have the world matrix table, you no longer need the nodes for anything, you just have a bunch of matrices in their final, world space positions.

Second, you loop through the MeshInstances, picking the matrices you need from the matrix table.

I'll give an extreme example: imagine a scene with two characters who happen to be a Siamese couple, so both characters are made with just one mesh and one skin, but pointing to two full skeletons, each with its own root node. So what would be the problem? In the end, all these nodes boil down to a plain list of world transforms!

So in the end, how the nodes are arranged is completely meaningless, as long as you precalculate the world transforms, before traversing the mesh instances.
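
A small Python sketch of the baking step, assuming row-major 4x4 matrices stored as nested lists and a hypothetical `parents` array (parent index per node, `None` for roots). The scene has two separate trees, as in the extreme example; once baked, a skin listing joints 1 and 3 only ever indexes into the flat table.

```python
from typing import List, Optional

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x: float, y: float, z: float) -> Matrix:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def bake_world_matrices(local: List[Matrix],
                        parents: List[Optional[int]]) -> List[Matrix]:
    """world[i] = world[parent(i)] * local[i], indexed like the node list."""
    world: List[Optional[Matrix]] = [None] * len(local)

    def resolve(i: int) -> Matrix:
        if world[i] is None:
            p = parents[i]
            world[i] = local[i] if p is None else matmul(resolve(p), local[i])
        return world[i]

    return [resolve(i) for i in range(len(local))]

# Two independent trees: 0 -> 1 and 2 -> 3.
local = [translation(1, 0, 0), translation(0, 2, 0),
         translation(5, 0, 0), translation(0, 0, 1)]
parents = [None, 0, None, 2]
world = bake_world_matrices(local, parents)
skin_matrices = [world[j] for j in [1, 3]]   # joints taken from both trees
```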

@vpenades
Contributor

@marstaik Okay, to be fair, I believe the problem is, as you suggested, in importing glTF into existing engines that have their own interpretation of how nodes, meshes and skins are related to each other.

I recently suffered this problem when loading glTF files into MonoGame. As it happens, MonoGame's default Model object only supports one skeleton tree and one skinned mesh (with up to 72 bones) per model, and these limitations are impossible to overcome.

And I've seen more engines that try the "node centric" approach; that is, Node trees rule over how the scene is rendered, and that approach is very limiting.

In the end, I made my own MonoGame model, which is "mesh instance" centric, where nodes are only used to provide the world transforms, and suddenly all the problems were gone.

@marstaik
Author

marstaik commented Aug 26, 2019

@vpenades I'm not trying to be rude but please hear me out:

What is the point of a scene node hierarchy if you just choose to ignore its existence?

What you define isn't a method for exporting and importing scenes. You've put down a completely custom implementation that discards the scene graph. And that's fine to do, but that doesn't mean it should affect the standard (as you said for MonoGame)

Your Siamese couple may have two full skeletons but I would hope they at least have a common direct parent. Then, in that case it is a single skin with multiple roots that can indeed be one skeleton. But they start at the same hierarchy level.

Maybe when I get back from vacation I'll draw diagrams to better represent these issues.

The main issue is as you said that most engines have strict requirements.

But that's an easy solution, just force every skin to be a complete tree and you fix all these issues.

@vpenades
Contributor

@marstaik I didn't say I ignore the existence of the node hierarchy, I said that once you have baked the node hierarchy into world matrices, the hierarchy itself becomes irrelevant, only the world matrices matter, because after all, the Skin object only needs world matrices to do its job, the skin doesn't know or care from where these world matrices come from, or which was the relationship between them.

@marstaik
Author

marstaik commented Aug 26, 2019

@vpenades I am sorry, I am still confused with what you are trying to say. What's the point of doing this? The skin should already have precalculated IBMs and the nodes themselves have transforms. Why are you calculating the transforms in the first place?

Also the skin object is a definition, not an actual entity. 90% of all implementations will turn the skin into a skeleton, because that's what is expected of a series of linked nodes defined as joints.

That's how we create and target animations efficiently across multiple skeletons. The animation channels target said joints, which most engines will bake down to the skeleton level.

But the main idea I'm trying to convey is that almost every engine treats a skin as a skeleton, as a skin defines nodes as joints. Most importers expect a skeleton to be a subtree of some sort. The current spec allows hodgepodge monstrosities to exist that really don't have any benefit.

@vpenades
Contributor

vpenades commented Aug 28, 2019

@marstaik I believe the problem lies in treating a skin as a skeleton. A skin binds a mesh to a selection of joints of a skeleton tree, but it is not a skeleton in itself. If importers expect a skin to be a full skeleton by itself, well, I am sorry but it is a wrong, or at least, an outdated approach.

What you call monstrosities I call a beautiful design. You see, I've been struggling with skinning for years. Most formats before glTF used to handle skinning as if they did not know very well what they were doing, almost with fear, or by adding a lot of restrictions to cope with the limitations of the engines they were intended for.

But glTF is the first format that treats skinning as a first-class citizen, and handles it with a standardized approach, so no wonder many engines, used to handling older formats with more limitations, now struggle to handle glTF.

So what now? Do we move engines forward to adopt glTF's standardized skinning, or do we move glTF back because some engines have many limitations? In that case I would vote to limit skins to 72 bones as well.

Why should I not be allowed to create a model like this which happens to break every single point in your list? Yes, today some engines are not able to render it correctly. But some others can, and over time, there will be many more.

May I ask which engine you are using?

@marstaik
Author

marstaik commented Aug 28, 2019

The file you provided fails to open in blender and Houdini, can you post the separate gltf json + binary glb?

I've used a variety of engines, UE4, godot, and I've made my own in the past. But it's important to note that it's not always a matter of limitations. You could make every engine in some way handle every type of format sure, but the reason many game engines have a skeleton as a single entity is for performance reasons, gpu bindings, etc.

There's a reason we bake data entities (matrices, points, normals) to flat buffers so we can pass them to the gpu. Allowing a skin to point to joints from different skeleton hierarchies makes the optimizations game engines put in for real time graphics pointless, or at the very least tedious and CPU cycle wastebins.

The other side of that argument is that if you know something doesn't work well in that format then just don't use it for that project. If I'm using a game engine that doesn't support more than 72 skeleton nodes, maybe I just shouldn't export more than 72 nodes.

But all of this was not the point of the post I made.

I made this point because the current definitions have way too much ambiguity.

If two skins define a node as a joint, what the hell am I supposed to do? You've basically given a node two parents in the sense that two skins need to know about this joint. There are no problems in an engine where everything is a node as you said, where every "joint" is a node in a tree and not part of a skeleton. But to do so is absolutely destructive for any real game engine that aims to have multiple animated skeletons in the viewport that need to render 60fps. It's just not feasible.

Remember that a skin just tells a mesh how to bind to the joints. It's still the transforms on the nodes themselves that cause animations to happen.

Almost every modeling program known to man (Maya, blender, Houdini, etc) have a skeleton system just for this purpose.

I thought the purpose of GLTF was to get a decent standard for passing scene data between programs and game engines, not a global, infinitely large expandable specification like USD.

If I can't even safely import a scene into 90% of game engines without having to write custom interpreting code then what the hell is the point of gltf? Might as well go back and use fbx if we're going to stick around with broken definitions.

@A-Lamia

A-Lamia commented Aug 28, 2019

I have to agree with @marstaik on this. The only thing an importer should be doing is a 1:1 construction of the file. I don't think importers should be doing crazy logic to try to figure out how to construct a file; there should be a clear consensus that makes implementation simple and clean.

@vpenades
Contributor

vpenades commented Aug 28, 2019

The file you provided fails to open in blender and Houdini, can you post the separate gltf json + binary glb?

Yes, here you have the zipped glTF; you can also find the source code that generated it, and some info about it here.

Right now, if you want to preview it, you have to drop it on BabylonJS or use the latest version of the Windows 10 3D Viewer (the previous version was also having trouble with the skins, but they fixed it recently).

That model might look silly, but this way of creating and reusing meshes can be very useful to create vegetation meshes, like grass, plants, trees, etc., where you have very few meshes instanced many times, and you want them to move with the wind.

I don't know about UE4, but I did report some skinning issues to godot a while ago, they're resolving them here.

glTF already forbids one skin from pointing to joints of different node trees within the scene. What it doesn't forbid is having multiple skins pointing to the same skeleton, or multiple meshes using the same skin, which can be useful:

  • To overcome the 72-bone maximum in engines that have that limitation.
  • To have the same mesh instantiated AND animated multiple times in the same scene.
  • For characters with multiple meshes, where you want to enable/disable some meshes at runtime.

@marstaik
Author

marstaik commented Aug 29, 2019

@vpenades I inspected your file, and I fail to see how it invalidates any of the points I made. None of the skins reuse the same joints. None of the skins have extraneous joints in between themselves; 0 > 1 > 2 > 3 ... 9 are just a single tree of joint nodes. I may have missed something since I'm using my tablet.

Yes, there are multiple nodes that use the same mesh definition (instances) and a different skin, but that's fine. For #3 I meant that a single mesh instance node can only point to a single skin (which is why I said it was implicit).

From a rough look, the file looks like a good example of what to do. None of the skin definitions share common nodes that would require some sort of skin merging for skeletons. All the skins have 1 root.

As for the "skeleton" node, I'm all for having it always defined or never defined. Having it be optional is useless.

@marstaik
Author

marstaik commented Sep 2, 2019

https://github.com/KhronosGroup/glTF-Asset-Generator/tree/master/Output/Positive/Animation_Skin

skinD is a case that I believe to be a problem. Leaving non-joints in a joint hierarchy just causes so much extra effort on most importers to handle correctly.

@jbherdman

Almost every modeling program known to man (Maya, blender, Houdini, etc) have a skeleton system just for this purpose.

I have to disagree with this assessment. I have dealt with data conversion to and from most 3D modeling programs over the past 18 years of my career. While a given UI may (or may not) encourage creation of "connected skeletons", there is often nothing inherently wrong or prohibited about using multi-rooted/disconnected sets of joints. A lot of systems, such as 3DS MAX, allow basically any transform-node to be referenced as a "joint" by skinning, regardless of where it may lie within the scenegraph.

If we impose these arbitrary restrictions on the glTF format, that just shifts the work from the importers to the exporters. That could benefit some importers, but it also has the potential to negatively impact importers that are actually flexible enough to support the current (and more generic) spec.

Surely, if there is a common set of "scene optimizations" for skinning, which benefit only a subset of potential importers/runtimes, shouldn't this be handled in some sort of transform/optimize tool (e.g. glTF->glTF), rather than imposing arbitrary restrictions on the spec? That potentially offloads the work from both the importers + exporters, and places it in a central tool that could be applied as-needed.

@A-Lamia

A-Lamia commented Sep 7, 2019

I understand what you're saying, but I don't agree with some of your points. Currently, working with Blender, Godot and Houdini, I'm having an issue where there's always some sort of unique problem with how my glTF files are imported or exported.

If we impose these arbitrary restrictions on the glTF format, that just shifts the work from the importers to the exporters.

I'm not sure why that's arbitrary. The restrictions are there so that when you use glTF in different software you can always expect it to work as it should. Throughout the software I'm using, it seems like the importers are playing some crazy logic guessing game and things are not being depicted as they should. If the goal is a consistent experience across many applications, it's currently not working.

If the exporter has a standardized logical system to follow, then the importer will always know how to build your file because the exporter will always export the same data.

@jbherdman

Your comment indicates that you think the current skinning system is not standardized and/or logical. I would argue that it is. Where it maybe needs to be "cleaned up" is in terms of the raw math involved.

The spec itself is fairly quiet on the exact math behind the skinning calculations. It references the "glTF overview", but I'm not sure there is quite enough information to uniquely resolve all the corner cases that come up in skinning. That is probably where a lot of the inconsistent behaviour is coming from, in terms of importers/runtimes.

The core problem, as I see it, is that there needs to be a definitive spec for "glTF skinning math", including either pseudo-code or math formulae to describe the exact post-skinning position of a given vertex 'v' (in specific terms of the related glTF skin/joint/mesh elements). It probably also needs to be expressed in terms of "world space" coordinates, for clarity.
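
For reference, a commonly used world-space formulation of linear blend skinning is: skinned position = sum over the vertex's joints of weight * (jointWorldMatrix * inverseBindMatrix) * position. The Python sketch below illustrates that formula under the reading, discussed in this thread, that the skinned mesh node's own transform is not applied. It is an illustration only, not normative spec text, and all names in it are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x: float, y: float, z: float) -> Matrix:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def transform_point(m: Matrix, p: Sequence[float]) -> List[float]:
    """Apply a 4x4 affine matrix to a 3D point (w = 1)."""
    return [m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3]
            for i in range(3)]

def skin_vertex(position: Sequence[float], joints: Sequence[int],
                weights: Sequence[float], joint_world: List[Matrix],
                inverse_bind: List[Matrix]) -> List[float]:
    """World-space position: sum_i w_i * (World_i * IBM_i) * position."""
    out = [0.0, 0.0, 0.0]
    for j, w in zip(joints, weights):
        joint_matrix = matmul(joint_world[j], inverse_bind[j])
        p = transform_point(joint_matrix, position)
        out = [o + w * c for o, c in zip(out, p)]
    return out

# Bind pose: joint 0 at the origin, joint 1 at (0, 1, 0).
inverse_bind = [translation(0, 0, 0), translation(0, -1, 0)]
# Animated pose: joint 1 has moved to (2, 1, 0); joint 0 is unchanged.
joint_world = [translation(0, 0, 0), translation(2, 1, 0)]
skinned = skin_vertex([0.0, 1.0, 0.0], [0, 1], [0.5, 0.5],
                      joint_world, inverse_bind)
```

The vertex is pulled halfway along joint 1's motion and ends up at (1, 1, 0).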

What the spec provides at the moment seems a little hand-wavy in sections, mostly a "go do skinning; your engine already has some stuff, I bet". That leads to all sorts of trouble, because every runtime engine (and modeller) out there is going to make slightly different decisions about how to handle the corner-cases of skinning. That's not something that can be solved by restricting skinning to "a single connected skeleton-tree".

Once the "glTF skinning math" is locked down, it would be up to the importers + exporters to ensure that they are being consistent with that math.

One prime example of a "corner case" is whether the transform on the (skinned) mesh-instance factors into the "post-skinning, world-space" position of each vertex. I've seen systems go either direction on that decision, and there is no solution other than to recognize this and account for it during import/export. glTF would need to have a policy covering that case, and then it would be up to the importers + exporters to modify the data as-needed according to their own internal logic/rules/math.

@vpenades
Contributor

vpenades commented Sep 8, 2019

One prime example of a "corner case" is whether the transform on the (skinned) mesh-instance factors into the "post-skinning, world-space" position of each vertex.

I think that glTF already agreed that the transform of a skinned node does not factor in and it should be discarded. Actually, I believe the glTF validator gives an error if it finds a node with a Skin and a transform.

Part of the discussion before is that @marstaik needs to interpret Skin as a full skeleton, which, as I tried to explain, is a wrong interpretation of what a Skin in glTF is.

To clarify, the definition of a Node's transform could be reworded as this:

A node can have either:

  • A Simple transform
  • A Complex Transform

if a Node has a Skin, then it can be considered to have a "complex transform", otherwise it is a simple transform, so they're mutually exclusive.

  • When in simple transform mode, the mesh is brought to world space by the world transform of the node.
  • When in complex transform mode, the mesh is brought to world space by the world transforms of the nodes pointed by the skin.

I think this way of seeing how skinning works is a bit clearer, but I agree the current design may be deceiving. For that purpose I proposed #1660, which enforces the exclusivity of simple/complex behavior explicitly.

@jbherdman

I think that glTF already agreed that the transform of a skinned node does not factor in and it should be discarded. Actually, I believe the glTF validator gives an error if it finds a node with a Skin and a transform.

That does seem to be implied in the spec (2nd "implementation note" under the '#skins' section of the spec), but the wording also seems open to potential misinterpretation. Rather, it is hard to decode the meaning of that implementation note, as currently worded, if you don't already realize that there is a choice to be made in how skinning-math can be implemented.

In contrast, in the 'glTF overview' images, there is an example vertex-shader for skinning, which includes the line "gl_Position = modelViewProjection * skinMatrix * position". Without further context, it seems easy to misinterpret that line as saying that the skinned node's transform should be taken into account.

Part of the discussion before is that @marstaik needs to interpret Skin as a full skeleton, which, as I tried to explain, is a wrong interpretation of what a Skin in glTF is.

I very much agree with you on that point, and disagree with any requirements to define any restrictive rules pertaining to "full skeletons".

While external runtimes may have their own rules about what support structures are needed for skinning/skeletons (such as fully-connected skeletons, etc), I don't see why those restrictions need to be pushed upstream into glTF. The existing skinning system is flexible, and fairly elegant. It may need some corner cases tightened up, or at least existing decisions to be reflected more clearly in the spec, but it seems to be on the right track without needing additional restrictions on the layout of joints/skeletons.

@reduz
Contributor

reduz commented Sep 8, 2019

@jbherdman

If we impose these arbitrary restrictions on the glTF format, that just shifts the work from the importers to the exporters. That could benefit some importers, but it also has the potential to negatively impact importers that are actually flexible enough to support the current (and more generic) spec.

Sorry, but this logic is broken. This will negatively affect 1% of the importers and positively affect the remaining 99%, by making them immensely simpler. For a very rare corner case you are making the specification an order of magnitude more complex for every importer. I don't think this is in the spirit of GLTF2 either, which aims to enforce a single way of doing things to ensure the best possible compatibility. Your way of thinking is what made Collada a failure; we should stay away from that.

I really think @marstaik's suggestions should be made core for version 3.0.

@reduz
Contributor

reduz commented Sep 8, 2019

@donmccurdy @pjcozzi We are probably never ever going to support these situations in our importer, and I doubt any large game engine will either. I really do suggest Khronos or those responsible for GLTF spec do some damage control on this situation before more exporters keep exporting unusable files and GLTF2 ends up becoming another Collada.

As it stands, the format will keep finding incompatibility between exporters and importers for trying to be too flexible, and I believe this is entirely what GLTF2 tried to prevent.

I am sure that, when the spec was originally created, it was never intended to be used this way, so I would really try to close this gap in the 2.0 spec by adding extra clarifications, else by the time 3.0 comes out, it may be too late.

@reduz
Contributor

reduz commented Sep 9, 2019

The clarifications I would add to the spec, taking from the OP:

  • Making the skeleton property in skin mandatory, which should always point to the parent of a bone (this way we can easily tell that a bindpose is relative to this node). This will greatly reduce ambiguity in the current situation of the spec, where it is optional. I know importers can somehow guess this, but not including this property forces more complex, bug-prone importers, whereas for exporters adding this property (what bindposes are relative to) is no effort.
  • All joints in a skin must be connected. No game engine supports disconnected joints. Having them disconnected may work in a GLTF2 viewer (which plays single animations in skeleton local space and does no blending), but it makes importing into game engines hell (we need bind poses for all bones, because we convert animations to bone local space for animation blending; skeleton local space does not work for blending. If bones are missing, we need to do heavy guesswork and invent incorrect bind matrices). If an exporter really wants to have disconnected joints, then it needs to export another skeleton.
  • A mesh must only be able to bind to a single skin. No game engine supports binding a mesh to multiple skins. If an exporter wants to do this, it needs to join both skeletons.
  • While this is implicit in the spec, it should be made clearer that the expected way to export skinned meshes is by making the geometry skeleton-local, by applying skeleton_xform_world_inv * mesh_xform to the mesh vertices, tangents and normals.
  • Having multiple skins share joints should be outright forbidden. I've seen many exporters doing this and it also makes it hell for game engine importers. No game engine supports this, so we are forced to do really complex guesswork and duplicate everything (considerably reducing performance). If they really want to do this, they should just create the joints multiple times, but I've seen exporters do this to work around the "mesh needs to be local to skeleton" limitation, duplicating skeletons instead of making the meshes local. Having this as a requirement will force the exporters to properly localize meshes to the skeleton (which is why, again, I suggest clarifying how this process is done, as the math is not super obvious for most).

All the above would make the spec regarding skins strict enough so exporters are forced to make gltf files that are easy to open and don't need any guesswork for importers.
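
The localization step from the fourth bullet, applying skeleton_xform_world_inv * mesh_xform to the geometry, could look roughly like this in Python. This is a sketch under explicit assumptions: both world transforms are rigid (rotation plus translation, no scale or shear), so a cheap closed-form inverse is used, and only positions are handled. The helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x: float, y: float, z: float) -> Matrix:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(m: Matrix) -> Matrix:
    """Inverse of a rotation + translation matrix: [R^T | -R^T t].

    Assumption: no scale or shear. A general 4x4 inverse is needed otherwise.
    """
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def localize_positions(positions: Sequence[Sequence[float]],
                       skeleton_world: Matrix,
                       mesh_world: Matrix) -> List[List[float]]:
    """Re-express mesh-local positions in the skeleton's local space."""
    to_skeleton = matmul(rigid_inverse(skeleton_world), mesh_world)
    return [[to_skeleton[i][0] * p[0] + to_skeleton[i][1] * p[1]
             + to_skeleton[i][2] * p[2] + to_skeleton[i][3] for i in range(3)]
            for p in positions]

# Skeleton root at x = 10, mesh node at x = 12, one vertex at local x = 1.
localized = localize_positions([[1.0, 0.0, 0.0]],
                               translation(10, 0, 0), translation(12, 0, 0))
```

The vertex sits at x = 13 in world space, which is x = 3 relative to the skeleton. Normals and tangents would be transformed as directions (no translation), and with non-uniform scale normals would need the inverse-transpose of the upper 3x3 instead.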

I know many files exist that will be broken after these changes, but this ensures that, from now on, exporters have clear rules they have to follow to produce unambiguous gltf2 files that will always open in importers (which won't need to do guesswork, or write an implementation only to later realize some files don't work).

@jbherdman

* While this is implicit in the spec, it should be made clearer that the expected way to export skinned meshes is by making the geometry skeleton-local, by applying `skeleton_xform_world_inv * mesh_xform` to the mesh vertices, tangents and normals.

Is that actually the correct behaviour? I would like for someone with more knowledge than me to rough out the exact math-equation for "postSkinnedPosition = someMatrix * origPosition". There are a lot of bits-and-pieces related to this within the spec (and glTF-overview) at the moment, but there isn't any one cohesive end-to-end layout of all the math. Note: implementations/runtimes may vary on how they get to the same mathematical answer, but there needs to be some standard equation in the spec, to avoid guesswork + errors.

@reduz
Contributor

reduz commented Sep 9, 2019

@jbherdman Yes, this is the intended behavior, it's explained here:

https://github.com/KhronosGroup/glTF/tree/master/specification/2.0#skins

Implementation Note: Client implementations should apply only the transform of the skeleton root node to the skinned mesh while ignoring the transform of the skinned mesh node. In the example below, the translation of node_0 and the scale of node_1 are applied while the translation of node_3 and rotation of node_4 are ignored.

This limitation is vital, because otherwise exporters can easily screw up and make skeletons and meshes not share the same space (which was a common problem in exported Collada files).

This is why I say that it should be made clearer how to do this, because exporters half of the time do it the wrong way. They make the skeleton mesh local instead, which does not work when you have multiple meshes affected by one skeleton, so in this case they duplicate the skeleton, and make the copies share joints.

This is a waste, because when this is imported into game engines (none of which support this, of course), you end up with one copy of the skeleton per mesh and all animation tracks duplicated, which is a lot more inefficient.

This is why the right thing to do is to force exporters to do this process properly, by forbidding skeletons from sharing joints, and then explicitly explaining the process (math) of making meshes local to the skeleton. This would ensure importers get GLTF2 files without waste.
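
A minimal sketch of what "making meshes local to the skeleton" could look like on the exporter side, assuming row-major 4x4 matrices and positions only. The helper names (`mat_mul`, `transform_point`, `translation`, `bake_to_skeleton_space`) are made up for illustration, and the inverse of the skeleton's world transform is passed in rather than computed.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def transform_point(m, p):
    """Apply a 4x4 affine matrix to a 3D point (implicit w = 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))

def translation(tx, ty, tz):
    """Build a pure translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def bake_to_skeleton_space(vertices, skeleton_world_inv, mesh_world):
    """Apply skeleton_xform_world_inv * mesh_xform to every vertex position."""
    to_skeleton = mat_mul(skeleton_world_inv, mesh_world)
    return [transform_point(to_skeleton, v) for v in vertices]

# Hypothetical scene: skeleton root at x = 1, mesh node at x = 3.
skeleton_world_inv = translation(-1.0, 0.0, 0.0)  # inverse of translation(1, 0, 0)
mesh_world = translation(3.0, 0.0, 0.0)
baked = bake_to_skeleton_space(
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)], skeleton_world_inv, mesh_world)
```

Normals and tangents are directions rather than points, so they would need separate handling (no translation, and care with non-uniform scale) that this sketch leaves out.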

@marstaik
Author

marstaik commented Sep 9, 2019

One of the biggest issues is that in many, many files, the InverseBindMatrices exported for a skin represent some arbitrary bind pose that is different from the pose the joints create in the scene graph!

You will see that a lot of the time the InverseBindMatrices expect a joint transformation that is different than what is actually seen by the transforms shown in the node tree.

I have seen issues where the "scene pose" of the joints is in A-Pose, and half of the skins are bound to A-Pose (and their IBM's reflect A-Pose) and yet the other half of the skins are bound to T-Pose (and their IBM's reflect T-Pose, presumably because that mesh was made for T-Pose).

These meshes need to be transformed to the space that the skeleton is represented with in the scene graph. There is way too much broken behavior and too many loopholes here.

You can no longer ignore the IBM's exported, as they contain encoded pose data for when the mesh was bound! There is no elegant way to work around this either.

@jbherdman

Based on my own understanding of "skinning" in general, which lines up with the viewpoint that @vpenades seems to be taking, I'm actually surprised to see that there is a 'skeleton' attribute on the 'skin' at all.

@reduz After re-reading that particular implementation note for the 20th time or so, and paying very-close-attention to the term "skeleton root node" in there, I now have a very different understanding of how glTF probably intends to do the math. It doesn't surprise me at all that this is a point of confusion for all sorts of people/importers/exporters. The spec probably needs to be less subtle & implicit about what is being declared.

In my own view of the world (perhaps not shared by glTF) bind-poses are either done relative to world-space, or the mesh carries its own bind-matrix (relative to the IBM's stored for the joints).

@reduz
Contributor

reduz commented Sep 9, 2019

@marstaik The inverse bind matrix is usually just the rest pose from your modelling program. You can obtain it easily in Maya, Blender, Max, etc. The only confusing part is what they are relative to. This is why I think the skeleton tag must be mandatory: it removes the guesswork entirely for us game engines, which all have the concept of a skeleton.

If this tag is not included, then we need to kind of guess where to put the skeleton node, and there is room for exporters to screw up. Collada has a much better concept of Skin in this regard. Still this is probably the least harmful of the points I listed (it just makes importers less bug prone) and worst case it could be left as-is. The others should definitely be mandatory changes.

@reduz
Contributor

reduz commented Sep 9, 2019

@jbherdman Yes, the same happened to me. At first I didn't understand why exporters were doing such convoluted things as using multiple skeletons sharing a skin, but then it became obvious that they were trying to work around this limitation the wrong way.

This is why I insist that we need to combine forbidding the sharing of joints between multiple skeletons with a good description of how to localize the meshes to the skeleton, or else we'll continue seeing exporters that produce files that are unreadable for game engines.

@jbherdman

@reduz This might be a silly question, but if all joints were to have the same bind-pose (IBMs) for all skins, does that get around your desire to forbid multiple skins sharing joints? I'm just trying to get my head around the "real problem" here.

@marstaik
Author

marstaik commented Sep 10, 2019

@jbherdman I think I can see now where a lot of our confusion stems from. I will try my best to describe the situation I am seeing.

You are right that I was defining a bind-pose, but that in itself is not the issue.

In the current skin-centric model, the skin describes the bind pose of a mesh relative to a set of joints, with the position they were in when bound recorded into the IBM's. This is the "bind pose" we are talking about.

The problem that I am seeing is that this allows another skin to have IBM's that describe the joints as if they were posed in a different "bind pose".

Again, in a skin-centric model, this may be fine. But that assumes that you have what you mention earlier:

Dealer's choice, really, I just don't have a "mesh bind-pose" slot to insert it, which is fine. Then, I go take my joint-bind-poses, invert them (because that's how they are most useful at runtime), and store them into the IBM's for the skin

It relies on the existence of per-mesh bind poses. And why is that?

Well, most game engines are going to put these joints in one skeleton (and for good reason), and the meshes then link to the skeleton, and therefore will use the skeletons IBM's. They then get copied into the GPU buffer straight from the skeleton.

And this is the problem. If you have one skeleton, you can only have one bind pose for said skeleton. But above I mentioned that two skins referring to the same joints can have two different bind poses.

Say we use the first mesh's IBM's for the skeletons bind pose. Now the second mesh will bind to the skeleton incorrectly. We would have to make a custom node to map the skeletons bind pose to the bind pose that the second mesh expects.

But now, there's even more ambiguity. Say that mesh one and mesh two had two different bind-poses, but now the joints themselves were exported in an entirely different pose. What do I decide to use as the actual bind-pose?

But then, here is another issue: To even create said bind-pose mappings for the second mesh, the first and second IBM's need to have their world space matrix removed. So I need to use the data from the mesh + skin and compare it to the root joint's location in the scene to compute the world transform of the root, and extract that out from all of the IBM's. This ends up being a huge mess.

Please let me know if something needs additional explanation.

@jbherdman

@marstaik Yes, thank you, I think we are both on the same page now, in terms of understanding the problem.

What we're talking about, though, are the fundamental limitations of "skeleton"-based systems vs. "skin"-based systems. At the end of the day, if the skeleton-based system can only store one IBM per joint, then "some modification" needs to happen. In the past, I've specifically written exporters that go from "skin"-based systems to "skeleton"-based systems. It is a pain. Basically, the only perfect solution is to duplicate the joint-transform-nodes into different hierarchies, just so that each joint-transform-node can have its own unique IBM. That is terribly expensive at runtime, of course, because now you are unnecessarily computing the animation/transforms for 2x (or 3x or worse) as many joint-transform-nodes.

One step shy of that, if you get really lucky on how the artist created the input-data, you can try to massage the bind-poses so that they overlap properly. That is, you can sometimes get lucky and make the IBMs 'shared' across multiple meshes/skins, if the problem is simple enough that you can solve it by multiplying the mesh-bind-pose into the mesh.

For example, let's go with that example of a skinned-character who has a separate "head" and "body" mesh. If you're lucky, both meshes were bound in the same set of joint-bind-poses, but probably have different mesh-bind-poses (as their mesh-local origins are likely different). As long as you multiply the mesh-bind-poses into each mesh (rather than into the joint-bind-poses), then you can successfully share the same joints in a "skeleton"-based system. But, if the artist did "something weirder" (such as binding the head to a bind-pose where the characters neck-joints are at a different set of relative angles vs. the body-bind-pose), then you will be left with no choice but to duplicate the joint-hierarchy when exporting to a "skeleton"-based system.

The more optimistic situation is that ideally the game-engine could shift from "skeleton-based" to "skin-based". It requires just one extra layer of indirection -- each skinned mesh would have a collection of "joint" objects which held the IBM + a pointer to the scenegraph-node from which that "joint" receives its animated transform-data. It's a relatively minor shift, and it lets you get away from the worst-case-scenario where you need the exporter (or importer) to potentially duplicate the hierarchy of joint-transform-nodes.
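
As a rough illustration of that extra layer of indirection, here is a sketch under assumed names (`Node`, `Joint`, `Skin`, `joint_matrices` are hypothetical, not taken from any engine). Each skin owns small joint records that pair its own IBM with a reference to a shared scene-graph node, so two skins can reuse one animated node with different bind poses.

```python
from dataclasses import dataclass, field
from typing import List

Matrix = List[List[float]]

def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Build a pure translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

@dataclass
class Node:
    """Scene-graph node; `world` is its animated global transform."""
    name: str
    world: Matrix

@dataclass
class Joint:
    """Per-skin record: a shared node reference plus this skin's own IBM."""
    node: Node
    inverse_bind: Matrix

@dataclass
class Skin:
    joints: List[Joint] = field(default_factory=list)

    def joint_matrices(self) -> List[Matrix]:
        # world * IBM for every joint, in the order the skin lists them.
        return [mat_mul(j.node.world, j.inverse_bind) for j in self.joints]

# One animated node shared by two skins that were bound in different poses.
neck = Node("neck", translation(0.0, 2.0, 0.0))
body = Skin([Joint(neck, translation(0.0, -1.0, 0.0))])
clothes = Skin([Joint(neck, translation(0.0, -2.0, 0.0))])

body_palette = body.joint_matrices()
clothes_palette = clothes.joint_matrices()
```

The node is animated once; only the cheap `world * IBM` product is repeated per skin.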

I personally think that glTF supporting a "skin"-based system is great, and still the way to go. If a lot of runtime engines need support transforming data from a "skin"-based system to a "skeleton"-based system, that seems like a common tool that could be written (e.g. glTF->glTF transformation). Exporters could try to be more aware of the situation & challenges, but sometimes their hands are tied by the source data being "weird" because of something the artist did.

@vpenades
Contributor

vpenades commented Sep 10, 2019

Having a glTF with multiple skins that share joints, but don't share IBMs is required for some artistic workflows.

One of the classic issues with skinning is extreme vertex deformation at acute joint angles. I've seen very skilled artists come with a very clever solution: Instead of modelling the character in the classic T-Pose, they model it in a fetal or relaxed pose. This way of modeling allows for a more natural vertex deformation at the joints.

Skin_Pose

But when they want to put clothes over the naked character, and the clothes are made of separated meshes, they move the skeleton into a classic T-Pose, because it's easier to model clothes in that pose.

combination
glTF clothing live demo

So, since the base body and the clothes have been bound to the skeleton at different times, the IBMs of the base body skin and the clothes skins will be different, which is good because a skilled artists can take advantage of this to reduce deformation artifacts.

This is by no means a rare case. Now, the solution that @marstaik proposes is that the exporter merges all the meshes and skins sharing joints into a single big mesh-skin, doing the reverse maths and duplicating joints if necessary.

The problems I see to it are:

  • This effectively changes glTF from a skin-centric file format to a skeleton-centric file format. It is a huge paradigm change, which is probably too late to make, especially when some engines have been successful in taking advantage of the skin-centric approach of glTF.
  • What if I don't want to merge the meshes, because in my application's logic I want to enable/disable the visibility of some of the meshes?
  • What if one of the meshes you want to merge has morph targets? How the heck are you going to merge that?
  • And why should we do it? The skin-centric paradigm is superior to the skeleton-centric paradigm, because it has fewer limitations and lets artists' workflows be exported seamlessly. I don't see the point in moving to a lower standard with more limitations.

Now, being practical: for those skeleton-centric engines that have a hard time importing glTF, maybe they need to rethink how they're handling glTF, instead of pretending it's a skeleton-centric format and failing to import 30% of the glTFs around. Maybe what's needed for those cases is to wrap the glTF scene into a master glTF node within their hierarchical node system.

@marstaik keep in mind that not everybody is doing videogames, in our case, we're using glTF for biomechanical and anthropometric research and visualization, so rendering performance is not a priority, but having a consistent skinning-centric file format is absolutely critical.

Skinning with shared joints with different IBMs is not a monstrosity, it's a much needed feature, and you cannot ask for it to be removed just because you don't consider it important.

@marstaik
Author

marstaik commented Sep 10, 2019

@vpenades @jbherdman What if we can have the best of both worlds?

I think now that we have identified the multiple-bind issue, we can make a solution that benefits everyone.

The good thing is, for the IBM's at least, there is a "performant" solution for game engines to this problem. We can add a final bind transform on the mesh instance, in the same flat buffer format as the GPU, and call this the bind_pose_offset. The shader can then take:
final_ibm[i] = bind_pose_offset[i].inverse() * skeleton_bind_pose[i].inverse(), where i is the joint index, since final_bind_pose[i] = skeleton_bind_pose[i] * bind_pose_offset[i],
to compute the actual IBM's/bind pose needed. The biggest cost of this is memory, as we need to hold the skin's bind_pose differences somewhere.

You have to understand that there is a big downside to a skin based system, because if I have to follow the references a skin has every frame (let it be a c++ pointer) and have to copy the transforms into a buffer to get sent to the GPU, that is a lot of wasted cycles and memory copies every frame. Especially since it now goes from one skeleton per frame, to one mesh_instance per frame. Ouch.
The above solution could make it very minimal, with only a larger memory footprint.
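
The IBM composition proposed above can be checked numerically. This is only a sketch with made-up example matrices and hypothetical helpers (`mat_mul`, `mat_inv`, `close`); it confirms that inverting the offset first and the skeleton bind pose second undoes `skeleton_bind_pose * bind_pose_offset`, while the opposite order generally does not.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as row-major nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def mat_inv(m):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(m)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def close(a, b, eps=1e-9):
    """Element-wise comparison with a small tolerance."""
    return all(abs(x - y) < eps for ra, rb in zip(a, b) for x, y in zip(ra, rb))

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

# Example data for one joint: 90 degrees about Z plus a translation,
# and an offset of 90 degrees about X plus a translation.
skeleton_bind_pose = [[0.0, -1.0, 0.0, 1.0],
                      [1.0,  0.0, 0.0, 2.0],
                      [0.0,  0.0, 1.0, 0.0],
                      [0.0,  0.0, 0.0, 1.0]]
bind_pose_offset = [[1.0, 0.0,  0.0, 0.0],
                    [0.0, 0.0, -1.0, 0.5],
                    [0.0, 1.0,  0.0, 0.0],
                    [0.0, 0.0,  0.0, 1.0]]

final_bind_pose = mat_mul(skeleton_bind_pose, bind_pose_offset)
final_ibm = mat_mul(mat_inv(bind_pose_offset), mat_inv(skeleton_bind_pose))
swapped = mat_mul(mat_inv(skeleton_bind_pose), mat_inv(bind_pose_offset))

order_is_correct = close(mat_mul(final_ibm, final_bind_pose), IDENTITY)
swapped_also_works = close(mat_mul(swapped, final_bind_pose), IDENTITY)
```

With these inputs `order_is_correct` is true and `swapped_also_works` is false, since the two rotations do not commute.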

However, I still think some of the points I made above stand.

  • The "joints" defined in a tree need to be a strict sub-tree. Removing the existence of this:
    A[j] > B[j] > C[n] > D[j] where the skin is defined as: Skin: A, B, D and forcing the exporter to make either C a joint, or, parent the real C to a joint Cb so that the Skin definition is a complete sub-tree should be required.
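
A small sketch of how an importer or validator might test the strict sub-tree rule. The `parent` dictionary (child name to parent name, `None` for a root) and the function name are assumptions for illustration. The idea is that the joints form one connected piece exactly when a single joint has a parent that is not itself a joint.

```python
def is_connected_subtree(parent, joints):
    """Return True if the joints form one connected subtree of the node tree."""
    joint_set = set(joints)
    # Each joint whose parent is not a joint starts a separate joint-only piece.
    roots = [j for j in joint_set if parent.get(j) not in joint_set]
    return len(roots) == 1

# The chain from the example: A > B > C > D, where C is not a joint.
parent = {"A": None, "B": "A", "C": "B", "D": "C"}

broken = is_connected_subtree(parent, ["A", "B", "D"])       # C interrupts the chain
fixed = is_connected_subtree(parent, ["A", "B", "C", "D"])   # C promoted to a joint
```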

Now, we can do something similar to point [6] in the OP, and either make a "skeleton" or call it a "master skin", but it would be ideal that there is a way to gracefully resolve the different IBM's.

Something like this (rough idea):

  • Every skin must be a complete subtree with the skeleton property defined (no longer optional).
  • Each skin shall either define a new tree of unused joints, or, explicitly be a subtree of a previously defined skin. Let this be known as the "master skin". If a skin is a subtree of a master skin, let it be called a "child skin".
  • The skeleton property points to a node that may or may not be a joint in the skin, but it must have its child be a joint in the master skin.
  • The child skins skeleton property is the same as the master skins. (This lets us group them together easily and find the relations)
  • Let the IBM's of both master skins and child skins be localized to the skeleton node (ie, treat the skeleton node as the origin).
  • The joint indices defined in JOINTS_0 of the mesh should be the indices of the master skin.

The benefits I see here are:

  • No more random nodes in-between joints
  • You can directly parent the skinned-mesh to the skeleton node so that it inherits the transform, and the IBM's agree with this, since they are skeleton local. Transform of the skeleton node + IBM's should always create a valid bind.
  • You can still have multiple bind-poses for meshes. All of the skins can define the IBM's. If it is a child skin, we can easily compute the difference between the bind pose transform of the child skin compared to the master skin, and accommodate that with a solution similar to the one I posted above.
  • Still lets you have a skin-centric model. Multiple skins can use the same joints, we just have to be more clear about it and have them be a subset.
  • If an importer was written correctly from the start, and obeyed the skeleton property being defined, then this should work with all existing importers without much modification, as if they supported the skin-centric approach they wouldn't even have to deal with the master skin except for the JOINTS_0 of the mesh being the indices of the master skin and not the child skins (but that's a really easy map).
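
The index mapping mentioned in the last point could look roughly like this. It is a sketch of the proposal only, not of anything in the current spec: `master_joints` and `child_joints` are assumed to be lists of node indices, and the function converts per-vertex indices expressed against the child skin into indices against the master skin, as an exporter targeting this layout would.

```python
def remap_joint_indices(joints_0, child_joints, master_joints):
    """Convert child-skin joint indices into master-skin joint indices."""
    master_index = {node: i for i, node in enumerate(master_joints)}
    # Position k in the child skin maps to the master slot holding the same node.
    child_to_master = [master_index[node] for node in child_joints]
    return [tuple(child_to_master[i] for i in influences)
            for influences in joints_0]

master_joints = [10, 11, 12, 13]   # node indices used by the master skin
child_joints = [12, 10]            # a child skin using a subset of them
joints_0 = [(0, 1, 0, 0), (1, 1, 0, 0)]  # per-vertex indices into child_joints

remapped = remap_joint_indices(joints_0, child_joints, master_joints)
```

An importer that prefers child-skin indices would build the inverse dictionary instead. A child joint that is missing from the master skin raises a `KeyError` here, which is the violation the proposal is meant to rule out.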

The downsides:

  • Exporters need to do a bit more work.
  • Current importers need minor modification: While importers that could handle the current skins correctly could also handle the new skins almost perfectly, there is still a little bit of work to do.

We could keep GLTF skin-centric, which I agree is more flexible with binding meshes to joints, but also make it much, much easier for importers (not limited to game engines) to parse and represent properly.

@jbherdman

@marstaik It seems like you didn't really absorb the excellent use-cases that @vpenades put forward, though? In those cases, the multiple/conflicting IBMs used in different skins can be a feature, not a bug. So, I'm not really sure why you still want to move towards a glTF restriction that seems to move away from those capabilities?

Every skin must be a complete subtree with the skeleton property defined

This really doesn't matter (or even make sense) to a "skin"-based skinning system. I'm still unclear on why you consider this necessary, except that it might make life easier for your particular skeleton-based runtime engine?

Likewise, the performance issue you raised seems negligible. Under a "skin"-based system, you are still only paying a proportional cost to what you are rendering. You never even have to "duplicate" the node-transforms, if you can index into them appropriately. And you still only have 'N' node-transforms in your glTF file, even if some of them are being referenced by some 'k > 1' number of skins. You can just build an array of IBMs per-skin, and index into the appropriate IBMs-array as needed. Assuming that the engine/runtime is doing anything other than just throwing skinned meshes at the graphics card, I'm not convinced the performance difference would even be measurable.

At a certain point, if you are using a target runtime-engine that is "less capable" than glTF (e.g. skeleton-based skinning), the only real option is to control your art pipeline. I would argue that, what it sounds like you want to do is actually best served by:

  • Ensure that your artists aren't creating files that have multiple meshes/skeletons/skins, or disconnected-skeletons, or whatever else is causing grief for your engine
  • Perhaps contribute a fix to whatever glTF exporter is causing you grief. If, for example, the incompatible-IBMs are being caused in cases where you "know they shouldn't be", maybe an exporter somewhere could benefit from a change to multiply the mesh-bind-pose into the mesh-data instead of the IBMs?

Or, as I previously suggested, you could "pre-process" glTF files to more directly suit your purposes. With a good understanding of the "skin"-based system that currently exists, you could manually process the glTF files to be more compatible with your skeleton-based engine. If you add that processing into your own art-pipeline, then you can prevent "bad data" from hitting your engine (where the definition of "bad data" is purely from your engine's perspective).

Think of it this way, if your runtime engine didn't support morph-targets, you wouldn't be complaining about morph-targets being part of the glTF spec. You would be controlling your art-pipeline so that your engine didn't have to deal with morph-targets.

@marstaik
Author

marstaik commented Sep 10, 2019

@jbherdman Please re-read my post. What I posted supports @vpenades use-cases. It still allows child-skins to define their own IBM's. They just need to be relative to the skin root.

This really doesn't matter (or even make sense) to a "skin"-based skinning system. I'm still unclear on why you consider this necessary, except that it might make life easier for your particular skeleton-based runtime engine?

Yes, it makes life much easier for skeleton based systems. Is that such a bad thing?

Likewise, the performance issue you raised seems negligible. Under a "skin"-based system, you are still only paying a proportional cost to what you are rendering. You never even have to "duplicate" the node-transforms, if you can index into them appropriately.

You are neglecting that the "joints" in the scene get posed by animations, and without a skeleton saving those in some flat buffer, you need to visit those nodes one by one and get their transforms, every frame, and put them into a buffer for the GPU. That is a lot of wasted CPU cycles.

What is the point of a format that is good for everything other than high performance rendering? The current reasoning is "Let's just make some of the primary consumers of GLTF data go through many hoops to get a satisfactory result." It sucks. You could be a little bit stricter on the skin definitions and make life not a living hell for them.

@jbherdman

You are neglecting that the "joints" in the scene get posed by animations, and without a skeleton saving those in some flat buffer, you need to visit those nodes one by one and get their transforms, every frame, and put them into a buffer for the GPU. That is a lot of wasted CPU cycles.

@marstaik The transform-nodes get updated by animation data, sure. And computing the transform-node matrices from the animation-data each frame is expensive on the CPU -- far more expensive than copying those computed matrices around. And in the bigger picture, that is all still basically "free" compared to the rest of the CPU cycles you are likely to be spending each frame.

That said, I don't see why you couldn't design a system to place all the transform-nodes matrices into a single flat-buffer (indexed by transform-node-id), and send that to the GPU the same way your "skeleton" case would. I'm just not convinced that optimization would make any measurable difference to the runtime performance. (Sure, it might technically save you CPU cycles, but would you ever be able to measure the performance difference between a system that had that optimization, and one that didn't?)

@vpenades
Contributor

Regarding optimization and performance, it is possible to create a very inefficient glTF file, in the same way you can create a JPEG with a very bad compression algorithm that takes a lot of space while the quality of the image is bad.

For example, you can create a glTF model with 100 meshes where each mesh has its own vertex/index buffer. So it's 100 buffer bindings and 100 render calls.

A glTF optimization pipeline can take that glTF file, analyze it and squeeze every bit from it so it produces a new glTF with a single mesh, or multiple meshes if the meshes are not compatible, but maybe with a single vertex/index buffer, so it will render faster.

But this doesn't mean that an engine, any engine, should only be able to display the optimized mesh, and complain about the unoptimized mesh. A glTF compatible engine should try to replay the contents of a glTF model with as much precision as possible, optimized or not.

If a glTF model comes with 17 mesh-instances and an engine needs to do 17 rendering calls, so be it, it's what's needed to display that particular glTF model. If performance is an issue, we can develop tools to try to merge what's mergeable and optimize the vertex/index buffers, and convert the textures from JPEG to DDS or whatever.

But ultimately, any engine should try to render the contents of a glTF as it comes, and not try to overthink how it should have been rearranged.

One solution I do like is what the Windows 10 3D View app does: it tells you the number of render calls, along with the number of polygons, so you can get an idea of how expensive it is to render a particular model, and then an artist or a developer can try to improve it.

@marstaik, @jbherdman, and guys, I feel like we're running in circles. All the arguments have been laid out politely, and I don't think I have much more to say, so I'll leave this open so people from Khronos can read this thread and leave their opinion or verdict on this.

On my side, I'll probably release my MonoGame glTF code soon, so it might serve as a use case.

Peace! 😄

@Selmar

Selmar commented Oct 2, 2019

Just to give you another idea of what happens out there in the wild, I figured I could contribute my experience, for what it's worth. I am by no means a skinning expert, though I believe I understand the individual parts by now.

Shortly after I started at my current company, I used our engine's (skeleton-centered) skinning system to implement glTF skinned meshes. There are still a number of issues outside of the mentioned limitations below.

Specification issues

I've had my share of issues with ambiguities and guesses, which came in part from a lack of understanding and in part from a lack of clarity on the specification. I've made an attempt to summarize the problems encountered:

  • Why are inverse-bind matrices necessary, when I can derive the t-pose matrices from the scene graph?

    I currently believe this is only really necessary for advanced skinning techniques.

  • What are inverse-bind matrices relative to?

    I still do not know for sure. I was assuming inverse bind matrix * mesh's node transform (as defined by the initial node hierarchy) * vertex transform == local joint space vertex transform, so relative to the mesh node transform, though we do not use the IBM's, so I haven't dug into it.

  • Is the joints array the skeleton?

    No, a skeleton does not really exist in glTF. The skeleton is implicitly represented by the node hierarchy.

  • What does the skeleton property mean?

    It is the root of the implicit skeleton, but the exporter we worked with did not point to the correct node, leading me to calculate the skeleton root myself.

  • What spaces are the different matrices/objects in that are described in the glTF skinning explanation (this image)? Most sentences say either what they transform from or to, not both, or their starting point is ambiguous (i.e. "transforms the mesh into local space of the joint" does not explicitly mention in what space the mesh should be).

    I thought several times that I knew, but I'm still not sure whether our implementation (and by extension the implementation of the exporter we use) on this topic is correct, although things are working.

Engine implementation

Our engine has a 1-1 pairing of skin-skeleton and treats nodes and bones as separate entities. Thus, with the limited knowledge I had back then (and time constraints), the result is something rather inefficient, but mostly functional:

  • for every skin, we have a unique skin/skeleton pair
  • the explicit skeleton hierarchy required by our engine is built from the skin's joint list, finding the actual skeleton root and recreating the hierarchy
  • nodes used by a skin exist as both a bone and a node (duplicate transforms)
  • nodes used by multiple skins exist as a separate bone in every skin-skeleton pair (duplicate transforms)
  • supplied inverse bind matrices are ignored (our engine didn't have a way to use them directly anyway)
  • we do not support non-uniform scaling for nodes, but we do for bones, leading to potential mismatches if non-uniform scaling is used for bones (we currently enforce uniform scales)
  • not really related, but we use material names as unique identifiers for the materials of the skinned object, meaning we have to enforce unique material names when importing a glTF file.

Conclusion

To me, the above results are purely engine limitations; the engine is less flexible than the glTF specification. This can be annoying, but I like the glTF approach more, on a conceptual level. It can do everything a skeleton-centered approach can do, and more. When done right, I believe an implementation does not need to have a larger complexity or runtime cost than a skeleton-centered approach, either. But, of course, we usually don't take the time to rewrite our skinning code.

@marstaik
Author

marstaik commented Oct 4, 2019

I'm coming back to this issue after spending some more time with Maya, Blender, and a few different Importers and Exporters.

I have accepted that the joints array in the skin definition does not need to be a strict hierarchy. In terms of exporting a closer rendition of the scenes defined in 3d modeling programs, this is now reasonable to me.

From Maya and Blender, I was able to bind to joints/bones (not any random node) not part of the same hierarchy:

2c3

Note that in Blender, I had to bind to two separate armatures to mimic this behavior, but yes, it is possible:

2c2

However, I was not able to get a mesh to skin itself to another mesh (or other non-joint object).

I was able to handle importing these non-strict-tree skin definitions in Godot's glTF importer by performing union of disjoint sets and creating fake joints where non-joints lie in between joints.
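
The "fake joints" part of that can be sketched roughly as follows. This is not Godot's importer code, just one possible reading, with a hypothetical `parent` dictionary (child name to parent name, `None` for a root). It collects every non-joint node that sits between a joint and its nearest joint ancestor, and leaves out the disjoint-set step.

```python
def find_in_between_nodes(parent, joints):
    """Return non-joint nodes lying on a path between two joints."""
    joint_set = set(joints)
    fake = set()
    for joint in joint_set:
        path = []
        node = parent.get(joint)
        # Walk up until we meet another joint or run out of ancestors.
        while node is not None and node not in joint_set:
            path.append(node)
            node = parent.get(node)
        if node is not None:
            # A joint ancestor exists, so everything in between must be promoted.
            fake.update(path)
    return fake

# Root > A > B > C > D, where only A, B and D are joints of the skin.
parent = {"Root": None, "A": "Root", "B": "A", "C": "B", "D": "C"}
fake_joints = find_in_between_nodes(parent, ["A", "B", "D"])
```

Nodes above the topmost joint, such as `Root` here, are not promoted because no joint sits above them.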

But, now having used/dealt with various importers and exporters, I have come to the conclusion that the real issue that creates ambiguity in exported files is this:

Implementation Note: A node definition does not specify whether the node should be treated as a joint. Client implementations may wish to traverse the skins array first, marking each joint node.
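
The traversal that note suggests is short. A sketch, assuming the glTF JSON has already been loaded into a Python dictionary (the function name is made up):

```python
def collect_joint_nodes(gltf):
    """Return the set of node indices referenced as joints by any skin."""
    joint_indices = set()
    for skin in gltf.get("skins", []):
        joint_indices.update(skin.get("joints", []))
    return joint_indices

# Tiny hand-written document: node 0 carries a mesh, nodes 1-3 are joints.
gltf = {
    "nodes": [{"mesh": 0, "skin": 0}, {"children": [2]}, {"children": [3]}, {}],
    "skins": [{"joints": [1, 2]}, {"joints": [2, 3]}],
}
joint_nodes = collect_joint_nodes(gltf)
```

A node that no skin references, or whose weights an exporter stripped out, never shows up in this set, which is the ambiguity described below.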

Joints need to become explicit in the glTF specification, and I'll show you why:

joints

The current specification implies that skins define what the joints are in the scene. This too however is incorrect. It's the modeling program that defines the joints, not the glTF file.

If you export from whatever modeling program and re-import the exported file, you will not get back the same result most of the time.

Imagine trying to export separate meshes bound to joints and bring them into a single scene later.
Imagine trying to export animations separately from meshes and bring them into a different scene later.
Imagine trying to export just a skeleton tree (connected joints) and you can't (without an empty skin).

Now imagine trying to do all of the above while having to insert fake bones and create a skeleton definition for a game engine. Because each scene in the modeling program has skins that mark different nodes as joints, the logic required to infer a skeleton may never produce the same skeleton for different scenes.

Now try to deal with assets from Maya and Blender, with exporters written by different people.
Some strip out zero skin weights - and why shouldn't they? They are unused by the skin.
See: iimachines/Maya2glTF#93

So should exporters get around this by exporting a skin with no IBM's just to mark joints? This seems extremely stupid.

I believe that glTF needs to treat joints as first class citizens. They need to be marked on the nodes, the same way that meshes and skins are marked, even if it's just a boolean flag.

Further, since I cannot get Blender or Maya to bind to anything other than joints/bones, I would propose that any "joints" in the "skin" must actually be marked "joints" in the node hierarchy.

Finally, the modification to the specification should look something like this:

  • Joints must be marked clearly in the "nodes" array, and thus be promoted to first-class citizens.
  • Joints must not be a mesh, camera, or anything other than a transform.
  • Skin joints array must point to nodes which are marked as being "joints".
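
To make the proposed rules concrete, here is a sketch of a validator. The per-node boolean `"joint"` property is purely hypothetical: it does not exist in glTF 2.0 and stands in for whatever marker the proposal would introduce. The function name and error strings are likewise invented.

```python
def validate_joint_rules(gltf, flag="joint"):
    """Check the proposed rules against a glTF document loaded as a dict."""
    errors = []
    nodes = gltf.get("nodes", [])
    for index, node in enumerate(nodes):
        # Proposed rule: a joint carries nothing but a transform.
        if node.get(flag) and any(key in node for key in ("mesh", "camera", "skin")):
            errors.append(f"node {index}: joint must be a pure transform")
    for s, skin in enumerate(gltf.get("skins", [])):
        for index in skin.get("joints", []):
            # Proposed rule: skins may only reference nodes marked as joints.
            if not nodes[index].get(flag):
                errors.append(f"skin {s}: node {index} is not marked as a joint")
    return errors

gltf = {
    "nodes": [
        {"name": "root"},
        {"name": "hip", "joint": True},
        {"name": "cam", "joint": True, "camera": 0},
        {"name": "prop"},
    ],
    "skins": [{"joints": [1, 3]}],
}
problems = validate_joint_rules(gltf)
```

In this example the camera node marked as a joint and the unmarked node referenced by the skin are both reported.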

So, could an exporter still not mark the original joints in the modeling software as joints?
Yes. But it would be completely stupid to do so.

I believe that these simple changes (in addition to perhaps some clarification of the IBM's) could greatly improve the consistency of scene exports/imports across multiple applications.

@jbherdman

I'm glad to see that you are starting to come around.

I believe that glTF needs to treat joints as first class citizens. They need to be marked on the nodes, the same way that meshes and skins are marked, even if it's just a boolean flag.

Further, since I cannot get Blender or Maya to bind to anything other than joints/bones, I would propose that any "joints" in the "skin" must actually be marked "joints" in the node hierarchy.

The thing is, I would claim that Blender and Maya fail to treat their joints as first-class citizens. I had to go remind myself, but Maya more or less limits skinning to use its special "joint" nodes. (You may or may not be able to get around that at a lower API level, but it would probably give the UI a headache.) I haven't dealt with Blender much, but the "armature" system brings to mind Lightwave-style bone-skeletons (and other "bones are different than transform nodes" systems from the 90's).

A system like 3DS MAX doesn't have those restrictions. You can happily fire up MAX and skin a mesh using 2 camera-nodes as its "joints". That is because a "joint" isn't a special/distinct node type; you can use anything that has a transform-node as a joint, assuming that you store the appropriate bind-pose data somewhere.

So, in my mind, glTF is already treating its joints as "first class citizens" by simply allowing any transform-node to be referenced as a joint, and not requiring joints to be specially marked via some separate mechanism.

If you export from whatever modeling program and re-import the exported file, you will not get back the same result most of the time.

That is generally true for all non-trivial data conversion. It is much like running a sentence through Google Translate into a different language, and back again. Best case scenario, you will get something "functional", but the process will strip off a lot of nuance and artistic style from the original.

@marstaik
Author

marstaik commented Oct 4, 2019

So, in my mind, glTF is already treating its joints as "first class citizens" by simply allowing any transform-node to be referenced as a joint, and not requiring joints to be specially marked via some separate mechanism.

This seems extremely contradictory. If you want to say that skins just use node transforms, that is fine - by your logic skins don't define joints, they just use nodes. Then keep it that way in the definition. But 3D modeling applications and many importers expect to be able to easily tell what a joint is.

A joint is a named entity in almost all modeling software. I see absolutely no reason to ignore its existence and shove it under the rug.

That is generally true for all non-trivial data conversion. It is much like running a sentence through Google Translate into a different language, and back again. Best case scenario, you will get something "functional", but the process will strip off a lot of nuance and artistic style from the original.

If you are a bad translator, sure. But I would expect a proper open source specification to allow, let's say, a Blender document to be exported via glTF to Maya/MotionBuilder to do some proper motion capture handling, and then be able to be brought back into Blender. Or maybe go from MAX to Blender and back. A lot of 3D pipelines require consistent transfer between applications.

What is the harm in representing actual joint nodes in the specification? If the importing application doesn't care, then it doesn't care. But most of them do.

If you wanted to be less strict you could:

  • Rename the "joints" array in skin to "nodes"
  • Add in the "joint" flag on actual nodes, and force them to only be a transform

Maybe it's better that way. At least the specification isn't trying to lie to itself.
You can let any skin bind to whatever nodes it wants. But if we still mark actual joint nodes, importers can easily tell the user "Hey, we don't support skins on non-joint nodes" and call it a day. The current specification makes it difficult to even do that, because the skin defines "joints".

Sadly glTF doesn't have the weight that Autodesk has with FBX, since Autodesk has a complete modeling/animation pipeline and age to back it. And at this rate it never will if you don't allow game engines to make better use of this format. I and many others may as well go back and use FBX. It may be broken/inconsistent but at least anyone that uses the provided SDKs can generally make an importer that doesn't explode.

@lexaknyazev
Member

@julienduroure
Could you please provide Blender-IO perspective on #1665 (comment)?

@Selmar

Selmar commented Oct 4, 2019

Imagine trying to export separate meshes bound to joints and bring them into a single scene later.

This is already problematic because joints are indices into the nodes array. To match indices, the entire hierarchy would have to be exported in every scene. I don't think partial exports are in any exporter's mind, currently. We're starting an implementation ourselves, where the most difficult challenge is animation target identification across glTF files. But that's a different topic.

Imagine trying to export animations separately from meshes and bring them into a different scene later.

I don't see why this is problematic specifically with skins and joints. Exporting animations separately is already difficult, because, as described above, the only thing you have to identify nodes across different glTF files is the name, which isn't necessarily unique. Animations work just like the skins; they can reference any node arbitrarily. Since joints are nodes, I don't see a joint-related problem here.
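To make the identification problem concrete, here is a rough sketch in Python of matching a skin's joints across two glTF files by node name. The function name `remap_joints_by_name` and the decision to raise `ValueError` on missing or duplicate names are illustrative assumptions, not something an existing exporter is known to do.

```python
def remap_joints_by_name(source, target, skin_index=0):
    """Translate a skin's joint indices from `source` into indices in `target`.

    Both arguments are parsed glTF JSON documents (dicts). Nodes are matched
    by their optional "name" property, which glTF does not require to be
    unique, so ambiguous or missing names raise ValueError.
    """
    target_index_by_name = {}
    ambiguous = set()
    for index, node in enumerate(target.get("nodes", [])):
        name = node.get("name")
        if name is None:
            continue
        if name in target_index_by_name:
            ambiguous.add(name)
        target_index_by_name[name] = index

    remapped = []
    for source_index in source["skins"][skin_index]["joints"]:
        name = source["nodes"][source_index].get("name")
        if name is None:
            raise ValueError(f"source node {source_index} has no name")
        if name in ambiguous:
            raise ValueError(f"name {name!r} is not unique in the target")
        if name not in target_index_by_name:
            raise ValueError(f"no node named {name!r} in the target")
        remapped.append(target_index_by_name[name])
    return remapped
```

The error paths are the point of the sketch: as soon as a name is missing or repeated, there is nothing else in the file to fall back on.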

Imagine trying to export just a skeleton tree (connected joints) and you can't (without an empty skin).

There would be no way to mark it as a skeleton tree, indeed, but you should still be able to export the hierarchy as usual.

Now imagine trying to do all of the above while having to insert fake bones and create a skeleton definition for a game engine. Because each scene in the modeling program has skins that mark different nodes as joints, the logic required to interpolate a skeleton may never produce the same skeleton for different scenes.

Assuming the above problems are solvable, it may not be the same skeleton, but it should still have consistent results, correct?

Add in the "joint" flag on actual nodes, and force them to only be a transform

It seems to me this is the centerpiece of this discussion right now.

In an optimal world, it wouldn't be necessary to force a joint to be "just" a transform, for the same reason that a node can have both a camera and a mesh. Perhaps even a light on top of this, if you use this extension. This is currently possible, in the specification. Our own engine doesn't support this, so to this end I have made camera nodes be children of the nodes they are attached to in the glTF.
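As an illustration of that workaround, here is a minimal Python sketch that rewrites a parsed glTF document so that no node carries both a mesh and a camera. It is not the engine code referred to above; the function name `split_cameras_into_children` and the `_camera` name suffix are assumptions made for the example.

```python
import copy


def split_cameras_into_children(gltf):
    """Return a copy of `gltf` in which no node carries both a camera and a mesh.

    For each such node, the camera moves to a new child node that has no
    transform of its own, so it inherits the parent's world transform.
    """
    result = copy.deepcopy(gltf)
    nodes = result.get("nodes", [])
    # The range is fixed up front, so nodes appended below are not revisited.
    for index in range(len(nodes)):
        node = nodes[index]
        if "camera" in node and "mesh" in node:
            child = {"camera": node.pop("camera")}
            if "name" in node:
                child["name"] = node["name"] + "_camera"  # hypothetical naming scheme
            nodes.append(child)
            node.setdefault("children", []).append(len(nodes) - 1)
    return result
```

New nodes are appended at the end of the `nodes` array, so existing node indices used by skins, scenes, and animations stay valid.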

If "joint" were an exclusive property of a node, then from my perspective cameras, meshes and lights should also be exclusive. In 3DSMax, this is already the case, though I don't know about Blender and Maya. Whether that's a good idea or not I don't know, but it seems unnecessary to enforce this in the specification.

Personally, I don't see a compelling reason to treat joint nodes any differently from regular nodes, other than implementation details which differ across engines. Our engine's implementation would not gain anything from this, currently.

If you are a bad translator, sure.

The JPEG analogy may work better for his argument, I think.

@vpenades
Contributor

vpenades commented Oct 4, 2019

@marstaik @Selmar If I understand correctly, you're trying to import glTF by taking its internal building blocks and trying to convert them into their respective engine specific counterparts.

If that's the case, then I understand why you're having so much trouble trying to import glTFs; if the glTF components and relationships don't have a perfect match with the engine's component counterparts, then some glTF configurations cannot be imported correctly.

I believe a good alternative approach to import glTF is with Sandboxing. So whenever you import a glTF model, all their internal structures are preserved within the sandbox. The engine interacts with the glTF through the sandbox, instead of trying to import all the components.

In this way, you don't have conflicting issues between glTFs, and you protect your engine from future changes in the glTF specification.

If the issue is about sharing resources across multiple glTFs, then I believe the right approach is to use a glTF toolchain to merge the scenes of multiple glTF files into a single big glTF with all the scenes contained inside. So the engine only needs to import the master glTF to access all the scenes through the sandbox.

BTW, a while ago I published a showcase of loading and rendering glTF files in MonoGame; you can find the example here.

The MonoGame loader example loads every glTF model into a "sandbox". The interaction with MonoGame's graphics engine is minimal, since only glTF meshes and materials are converted to MonoGame's counterparts. But nodes, animations, hierarchy, etc. are preserved within the glTF sandbox.

@marstaik
Author

marstaik commented Oct 4, 2019

I believe a good alternative approach to import glTF is with Sandboxing. So whenever you import a glTF model, all their internal structures are preserved within the sandbox. The engine interacts with the glTF through the sandbox, instead of trying to import all the components.

What? You want to render a glTF file's JSON straight to the renderer every time? Why wouldn't you want to match scene entities to the engine's version and construct a scene? This seems absolutely stupid.

If the issue is about sharing resources across multiple glTFs, then I believe the right approach is to use a glTF toolchain to merge the scenes of multiple glTF files into a single big glTF with all the scenes contained inside. So the engine only needs to import the master glTF to access all the scenes through the sandbox.

Why on Earth would this be a correct solution? If an RPG had 1000 armors, you want me to have to import for an entire day every time a mesh gets added?

If "joint" were an exclusive property of a node, then from my perspective cameras, meshes and lights should also be exclusive.

To be honest, I don't know any engine that could support a camera-mesh-light node anyway. Seems like stupid design.

At this point we may as well take away cameras. Oh, and you know what, maybe lights should go too, they are not that special. The importer can figure it out. Hmm, now why should I bother exporting a mesh? The importer can also figure that out...

Aha, let's just only export nodes with no attributes, that will definitely make the format much more useful.

And again, no one seems to care about consistency of export.

By most of the logic presented here, we should just leave glTF to be a showgirl format and abandon it for something more practical. There goes COLLADA, glTF, glTF 2; maybe COLLADA 2 will finally solve it.

Or maybe USD.

@donmccurdy
Contributor

donmccurdy commented Oct 4, 2019

In the interest of keeping this discussion productive, let's constrain the scope a bit — nearly all engines and DCC tools have some pre-existing concept of skins, skeletons, bones, and meshes, or at least some of those things. What would a strict skinning specification look like, maximizing portability of a glTF file across existing tools, with the assumption that the glTF file will be loaded into the tool's native object representations?


For my own opinion, while the current skinning specification does a sufficient job of defining a skinning representation that offers flexible technical features, it is perhaps not specific enough about the structure and best practices that allow a skin to actually be broadly portable across tools.

Unfortunately, I'm not at all confident that I know what a broadly portable skinning specification would look like. I would certainly be curious to get more feedback on @marstaik's suggestions in #1669.

If there is consensus on useful restrictions, clarifications or best practice – here is how I would imagine the process could proceed. We can't simply add the restrictions listed here to the glTF 2.X specification; doing so would invalidate many existing models, and is not compatible with our versioning process. Because glTF 3.X is likely to be some ways off, a near-term alternative could be to provide an extension (KHR_skinning_strict?) that imposes additional requirements without modifying the schema. Implementations can begin using that extension, and if things go well, the extension could become part of the glTF 3.0 specification later on.

Ideally, the extension would add new restrictions to the existing specification, rather than introducing a new representation that loses backward-compatibility with tools that support the existing spec.
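For illustration only, here is how an importer might branch on such an extension, sketched in Python. `KHR_skinning_strict` is just the tentative name floated above and is not a ratified extension; `extensionsUsed` is the existing top-level glTF 2.0 property, and the function name and return values are hypothetical.

```python
# Tentative name from the comment above; not a ratified Khronos extension.
STRICT_SKINNING = "KHR_skinning_strict"


def choose_skin_import_path(gltf):
    """Pick an import strategy for a parsed glTF JSON document (a dict).

    Returns "strict" when the file declares the hypothetical extension, so
    the importer may rely on its extra guarantees, and "general" otherwise.
    """
    if STRICT_SKINNING in gltf.get("extensionsUsed", []):
        return "strict"
    return "general"
```

Because the extension would only add restrictions, a loader that ignores it could still read the same file through its general path.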

@donmccurdy
Contributor

Here is an attempt to define a stricter skinning subset, for greater portability across engines at the cost of some flexibility: #1747.

@WyattKimble

This comment has been minimized.

@donmccurdy
Contributor

@WyattKimble I've marked your comment as off-topic. You may disagree with @reduz's claims, but please refrain from personal criticism and review the Khronos Group Code of Conduct on respecting differing experiences. Constructive disagreement is welcome, but this is already a complex and challenging thread, so please be conscious of that.
