Stricter Skinning Requirements #1665
Reusing nodes in multiple skins should not only be allowed, it is a great feature! Consider a character model with 150 bones, made with one mesh and one skin; there are toolsets that might split the skin and the mesh to allow loading in engines that have a limited number of bones per shader. For example, MonoGame has a limit of 72 bones per skin. And when you do the split, you really, really need the extra skins to keep pointing to the original nodes, so yes, you do need to share nodes between skins. There are more use cases where you want multiple skins to share nodes. Consider a model with multiple LODs: several LOD meshes of a character would use their own skins, pointing to a commonly animated skeleton.
About requiring "Skeleton": I think it's been discussed before, and I believe the conclusion was that it just gives redundant information that can be calculated from the nodes themselves. The truth is I am doing full skinning animation and I don't need it at all, so I believe requiring Skeleton would actually complicate things, especially when Skeleton is defined in a way that conflicts with what can be inferred from the node tree.
Trying to understand how skinning works is difficult given the way the schema has been laid out; the fact that a skin is defined within a node is very misleading. So I came up with these thoughts:
So a renderer just needs to do this:
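A minimal sketch of that process in plain Python, with hypothetical names (a real renderer would use a math library and run the blend on the GPU):

```python
# Sketch of the renderer loop described above: per influencing joint,
# combine the joint's world transform with its inverse bind matrix,
# then blend by vertex weight. Names are illustrative, not from the spec.

def mat_mul(a, b):  # 4x4 row-major multiply
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def transform_point(m, p):
    return tuple(m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3]
                 for i in range(3))

def skin_vertex(position, joints, weights, joint_world, ibms):
    """Blend jointWorld * inverseBindMatrix over the influencing joints."""
    out = (0.0, 0.0, 0.0)
    for j, w in zip(joints, weights):
        if w == 0.0:
            continue
        joint_matrix = mat_mul(joint_world[j], ibms[j])
        p = transform_point(joint_matrix, position)
        out = tuple(o + w * c for o, c in zip(out, p))
    return out

# One joint, bound at y=1 (so its IBM is the inverse of the bind transform),
# now animated to y=2; a vertex glued to it should follow.
joint_world = [translation(0, 2, 0)]
ibms = [translation(0, -1, 0)]
print(skin_vertex((0.0, 1.0, 0.0), [0], [1.0], joint_world, ibms))  # (0.0, 2.0, 0.0)
```

Note that nothing in the loop consults a skeleton property: the joint indices and the world matrices are all it needs.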
As you can see, you don't need the Skeleton property anywhere in the process.
I agree that the skeleton property isn't useful, provided the other constraints are still met. Imagine node A with children B and C, where C has children D and E. Nothing in the spec prevents me from exporting a skin with joints B, D, E and no skeleton property defined.
What is an importer supposed to do in this case? You have three disjoint sets of nodes, which aren't even on the same hierarchy level. How is an importer supposed to interpret that correctly? One strange solution is to duplicate C into a phantom bone C* and create a multi-rooted skeleton using B, C*, D, E. Then you can parent C to C* and have C* steal C's transform. Remember, C may not even be a joint, or maybe it is a joint of a different skin. Or what if it's a mesh? How do I treat joints which belong to a single skin as an entity I can't represent correctly in a node tree?
Now, I understand your argument for multiple skins per joint hierarchy, and that's why I would like to point out #6. You can still have multiple skins, but they would at least have to be a subset of an explicit node tree, or each be a separate skin that no one else uses. Example using #6 and the above points (and assuming they are all joints). The current spec allows for a bit too much flexibility, and then the importer has to handle a bunch of additional strange cases.
I believe the current spec already says that all the nodes of a skin must share a common "parent" node, so there should be no need for a "skeleton" node. But in my experience, even the requirement of having a common ancestor is not needed! Remember, as I said before, nodes are just world transform providers. Consider this pipeline:
First, you calculate the world matrices of ALL the nodes of the scene into a world matrix table (you can even use the node indices to build a flat list). Once you have the world matrix table, you no longer need the nodes for anything; you just have a bunch of matrices in their final, world space positions. Second, you loop through the mesh instances, picking the matrices you need from the matrix table.
I'll give an extreme example: imagine a scene with two characters who happen to be a Siamese couple, so both characters are made with just one mesh and one skin, but pointing to two full skeletons, each with its own root node. So what would be the problem? In the end, all these nodes boil down to a plain list of world transforms! So how the nodes are arranged is completely meaningless, as long as you precalculate the world transforms before traversing the mesh instances.
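The two-pass pipeline described above can be sketched as follows (illustrative names; the sketch assumes parents precede children in the node list, which is not something glTF guarantees in general):

```python
# Pass 1: bake every node's world matrix into a flat table indexed by node
# index. Pass 2: a skin just picks matrices out of that table, so it does
# not matter whether its joints live in one tree or several.

def mat_mul(a, b):  # 4x4 row-major multiply
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def bake_world_matrices(nodes):
    """nodes: list of (parent_index_or_None, local_matrix).
    Assumes parents appear before their children."""
    world = []
    for parent, local in nodes:
        world.append(local if parent is None else mat_mul(world[parent], local))
    return world

# Two disjoint roots in one scene; the skin below doesn't care.
nodes = [
    (None, translation(1, 0, 0)),   # node 0: root A
    (0,    translation(0, 1, 0)),   # node 1: child of A, world = (1, 1, 0)
    (None, translation(5, 0, 0)),   # node 2: root B, a separate tree
]
world = bake_world_matrices(nodes)

# Pass 2: the skin indexes the flat table; the hierarchy is now irrelevant.
skin_joints = [1, 2]
joint_matrices = [world[j] for j in skin_joints]
print(joint_matrices[0][0][3], joint_matrices[0][1][3])  # 1 1
```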
@marstaik Okay, to be fair, I believe the problem is, as you suggested, in importing glTF into existing engines that have their own interpretation of how nodes, meshes and skins are related to each other. I recently suffered this problem when loading glTF files into MonoGame. As it happens, MonoGame's default Model object only supports one skeleton tree and one skinned mesh (with up to 72 bones) per model, and these limitations are impossible to overcome. And I've seen more engines that try the "node centric" approach, that is, node trees rule over how the scene is rendered, and that approach is very limiting. In the end, I made my own MonoGame model which is "mesh instance" centric, where nodes are only used to provide the world transforms, and suddenly all the problems were gone.
@vpenades I'm not trying to be rude, but please hear me out: what is the point of a scene node hierarchy if you just choose to ignore its existence? What you describe isn't a method for exporting and importing scenes; you've put down a completely custom implementation that discards the scene graph. And that's fine to do, but it doesn't mean it should affect the standard (as you said, for MonoGame). Your Siamese couple may have two full skeletons, but I would hope they at least have a common direct parent. In that case it is a single skin with multiple roots that can indeed be one skeleton, and the roots start at the same hierarchy level. Maybe when I get back from vacation I'll draw diagrams to better represent these issues. The main issue is, as you said, that most engines have strict requirements. But that has an easy solution: just force every skin to be a complete tree and you fix all these issues.
@marstaik I didn't say I ignore the existence of the node hierarchy. I said that once you have baked the node hierarchy into world matrices, the hierarchy itself becomes irrelevant; only the world matrices matter. After all, the Skin object only needs world matrices to do its job; the skin doesn't know or care where these world matrices come from, or what the relationship between them was.
@vpenades I am sorry, I am still confused by what you are trying to say. What's the point of doing this? The skin should already have precalculated IBMs, and the nodes themselves have transforms. Why are you calculating the transforms in the first place? Also, the skin object is a definition, not an actual entity. 90% of all implementations will turn the skin into a skeleton, because that's what is expected of a series of linked nodes defined as joints. That's how we create and target animations efficiently across multiple skeletons: the animation channels target said joints, which most engines will bake down to the skeleton level. But the main idea I'm trying to convey is that almost every engine treats a skin as a skeleton, since a skin defines nodes as joints. Most importers expect a skeleton to be a subtree of some sort. The current spec allows hodgepodge monstrosities to exist that really don't have any benefit.
@marstaik I believe the problem lies in treating a skin as a skeleton. A skin binds a mesh to a selection of joints of a skeleton tree, but it is not a skeleton in itself. If importers expect a skin to be a full skeleton by itself, I am sorry, but that is a wrong, or at least outdated, approach. What you call monstrosities I call a beautiful design.
You see, I've been struggling with skinning for years. Most formats before glTF used to handle skinning as if they did not know very well what they were doing, almost with fear, or by adding a lot of restrictions to cope with the limitations of the engines they were intended for. But glTF is the first format that treats skinning as a first class citizen and handles it with a standardized approach, so no wonder many engines, used to handling older formats with more limitations, now struggle to handle glTF. So what now? Do we move engines forward to adopt glTF's standardized skinning, or do we move glTF back because some engines have many limitations? In that case I would vote to limit skins to 72 bones.
Why should I not be allowed to create a model like this, which happens to break every single point in your list? Yes, today some engines are not able to render it correctly. But some others can, and over time there will be many more. May I ask which engine you are using?
The file you provided fails to open in Blender and Houdini; can you post the separate glTF JSON + binary GLB? I've used a variety of engines, UE4, Godot, and I've made my own in the past. But it's important to note that it's not always a matter of limitations. Sure, you could make every engine handle every type of format in some way, but the reason many game engines have a skeleton as a single entity is performance: GPU bindings, etc. There's a reason we bake data entities (matrices, points, normals) into flat buffers so we can pass them to the GPU. Allowing a skin to point to joints from different skeleton hierarchies makes the optimizations game engines put in for real-time graphics pointless, or at the very least tedious and a waste of CPU cycles.
The other side of that argument is that if you know something doesn't work well in a format, then just don't use it for that project. If I'm using a game engine that doesn't support more than 72 skeleton nodes, maybe I just shouldn't export more than 72 nodes.
But all of this was not the point of the post I made. I made this point because the current definitions have way too much ambiguity. If two skins define a node as a joint, what the hell am I supposed to do? You've basically given a node two parents, in the sense that two skins need to know about this joint. There are no problems in an engine where everything is a node, as you said, where every "joint" is a node in a tree and not part of a skeleton. But doing so is absolutely destructive for any real game engine that aims to have multiple animated skeletons in the viewport rendering at 60fps. It's just not feasible. Remember that a skin just tells a mesh how to bind to the joints; it's still the transforms on the nodes themselves that cause animations to happen. Almost every modeling program known to man (Maya, Blender, Houdini, etc.) has a skeleton system just for this purpose.
I thought the purpose of glTF was to get a decent standard for passing scene data between programs and game engines, not a global, infinitely expandable specification like USD. If I can't even safely import a scene into 90% of game engines without having to write custom interpreting code, then what the hell is the point of glTF? Might as well go back to FBX if we're going to stick with broken definitions.
I have to agree with @marstaik on this situation. The only thing an importer should be doing is a 1:1 construction of the file; I don't think importers should be doing crazy logic to try to figure out how to construct a file. There should be a clear consensus to make implementation simple and clean.
Yes, here you have the zipped glTF; you can also find the source code that generated it, and some info about it, here. Right now if you want to preview it you have to drop it on BabylonJS, or use the latest version of the Windows 10 3D Viewer (the previous version also had trouble with the skins, but they fixed it recently). That model might look silly, but this way of creating and reusing meshes can be very useful for vegetation meshes, like grass, plants, trees, etc., where you have very few meshes instanced many times and you want them to move with the wind. I don't know about UE4, but I did report some skinning issues to Godot a while ago; they're resolving them here. glTF already forbids one skin from pointing to joints of different node trees within the scene; what it doesn't forbid is having multiple skins pointing to the same skeleton, or multiple meshes using the same skin, which can be useful:
@vpenades I inspected your file, and I fail to see how it invalidates any of the points I made. None of the skins reuse the same joints. None of the skins have extraneous joints in between themselves; 0 > 1 > 2 > 3 ... 9 are just a single tree of joint nodes. I may have missed something, since I'm using my tablet. Yes, there are multiple nodes that use the same mesh definition (instances) and a different skin, but that's fine. For #3 I was referring to the fact that a single mesh instance node can only point to a single skin (which is why I said it was implicit). From a rough look, the file looks like a good example of what to do: none of the skin definitions share common nodes that would require some sort of skin merging for skeletons, and all the skins have one root. As for the "skeleton" node, I'm all for having it always defined or never defined; having it be optional is useless.
skinD is a case that I believe to be a problem. Leaving non-joints in a joint hierarchy just causes so much extra effort for most importers to handle correctly.
I have to disagree with this assessment. I have dealt with data conversion to and from most 3D modeling programs over the past 18 years of my career. While a given UI may (or may not) encourage creation of "connected skeletons", there is often nothing inherently wrong or prohibited about using multi-rooted/disconnected sets of joints. A lot of systems, such as 3DS MAX, allow basically any transform node to be referenced as a "joint" by skinning, regardless of where it may lie within the scenegraph. If we impose these arbitrary restrictions on the glTF format, that just shifts the work from the importers to the exporters. That could benefit some importers, but it also has the potential to negatively impact importers that are actually flexible enough to support the current (and more generic) spec. Surely, if there is a common set of "scene optimizations" for skinning, which benefit only a subset of potential importers/runtimes, shouldn't this be handled in some sort of transform/optimize tool (e.g. glTF -> glTF), rather than imposing arbitrary restrictions on the spec? That potentially offloads the work from both the importers and exporters, and places it in a central tool that can be applied as needed.
I understand what you're saying, but I don't agree with some of your points. Currently, working with Blender, Godot and Houdini, I'm having an issue where there's always some sort of unique problem with how my glTF files are imported or exported.
I'm not sure why that's arbitrary. The restrictions are there so that when you use glTF in different software you can always expect it to work as it should. Throughout the software I'm using, it seems like the importers are playing some crazy logic guessing game, and things are not being depicted as they should. If the goal is a consistent experience throughout many software packages, it's currently not working. If the exporter has a standardized, logical system to follow, then the importer will always know how to build your file, because the exporter will always export the same data.
Your comment indicates that you think the current skinning system is not standardized and/or logical. I would argue that it is. Where it maybe needs to be "cleaned up" is in terms of the raw math involved. The spec itself is fairly quiet on the exact math behind the skinning calculations. It references the "glTF overview", but I'm not sure there is quite enough information to uniquely resolve all the corner cases that come up in skinning. That is probably where a lot of the inconsistent behaviour is coming from, in terms of importers/runtimes.
The core problem, as I see it, is that there needs to be a definitive spec for "glTF skinning math", including either pseudo-code or math formulae to describe the exact post-skinning position of a given vertex 'v' (in specific terms of the related glTF skin/joint/mesh elements). It probably also needs to be expressed in terms of "world space" coordinates, for clarity. What the spec provides at the moment seems a little hand-wavy in sections, mostly a "go do skinning; your engine already has some stuff, I bet". That leads to all sorts of trouble, because every runtime engine (and modeller) out there is going to make slightly different decisions about how to handle the corner cases of skinning. That's not something that can be solved by restricting skinning to "a single connected skeleton-tree".
Once the "glTF skinning math" is locked down, it would be up to the importers and exporters to ensure that they are being consistent with that math. One prime example of a corner case is whether the transform on the (skinned) mesh instance factors into the "post-skinning, world-space" position of each vertex. I've seen systems go either direction on that decision, and there is no solution other than to recognize this and account for it during import/export. glTF would need to have a policy covering that case, and then it would be up to the importers and exporters to modify the data as needed according to their own internal logic/rules/math.
I think glTF already agreed that the transform of a skinned node does not factor in and should be discarded. Actually, I believe the glTF validator gives an error if it finds a node with a skin and a transform. Part of the earlier discussion is that @marstaik needs to interpret Skin as a full skeleton, which, as I tried to explain, is a wrong interpretation of what a Skin in glTF is. To clarify, the definition of a node's transform could be reworded like this. A node can have either:
- a simple transform, given by its TRS or matrix properties, or
- a complex transform, given by a skin.
If a node has a skin, then it can be considered to have a "complex transform"; otherwise it has a simple transform, so the two are mutually exclusive.
I think this way of seeing how skinning works is a bit clearer, but I agree the current design may be deceiving. For that purpose I proposed #1660, which enforces the exclusivity of the simple/complex behavior explicitly.
That does seem to be implied in the spec (2nd "implementation note" under the '#skins' section of the spec), but the wording also seems open to potential misinterpretation. Rather, it is hard to decode the meaning of that implementation note, as currently worded, if you don't already realize that there is a choice to be made in how skinning-math can be implemented. In contrast, in the 'glTF overview' images, there is an example vertex-shader for skinning, which includes the line "gl_Position = modelViewProjection * skinMatrix * position". Without further context, it seems easy to misinterpret that line as saying that the skinned node's transform should be taken into account.
I very much agree with you on that point, and disagree with any requirements to define any restrictive rules pertaining to "full skeletons". While external runtimes may have their own rules about what support structures are needed for skinning/skeletons (such as fully connected skeletons, etc.), I don't see why those restrictions need to be pushed upstream into glTF. The existing skinning system is flexible, and fairly elegant. It may need some corner cases tightened up, or at least existing decisions to be reflected more clearly in the spec, but it seems to be on the right track without needing additional restrictions on the layout of joints/skeletons.
Sorry, but this logic is broken. This will negatively affect 1% of importers and positively affect the remaining 99% by making them immensely simpler. For a very rare corner case, you are making the specification an order of magnitude more complex for every importer. I don't think this is in the spirit of glTF 2.0 either, which aims to enforce a single way of doing things to ensure the best possible compatibility. Your way of thinking is what made Collada a failure; we should stay away from that. I really think @marstaik's suggestions should be made core for version 3.0.
@donmccurdy @pjcozzi We are probably never going to support these situations in our importer, and I doubt any large game engine will either. I really suggest Khronos, or those responsible for the glTF spec, do some damage control on this situation before more exporters keep exporting unusable files and glTF 2.0 ends up becoming another Collada. As it stands, the format will keep running into incompatibilities between exporters and importers by trying to be too flexible, and I believe this is exactly what glTF 2.0 tried to prevent. I am sure that when the spec was originally created it was never intended to be used this way, so I would really try to close this gap in the 2.0 spec by adding extra clarifications; by the time 3.0 comes out, it may be too late.
The clarifications I would add to the spec, taking from the OP:
All of the above would make the spec strict enough regarding skins that exporters are forced to produce glTF files that are easy to open and don't need any guesswork from importers. I know many existing files will be broken by these changes, but this ensures that, from now on, exporters have clear rules to follow to produce unambiguous glTF 2.0 files that will always open in importers (which won't need to do guesswork, or write an implementation only to later realize some files don't work).
Is that actually the correct behaviour? I would like someone with more knowledge than me to rough out the exact math equation for "postSkinnedPosition = someMatrix * origPosition". There are a lot of bits and pieces related to this within the spec (and the glTF overview) at the moment, but there isn't any one cohesive end-to-end layout of all the math. Note: implementations/runtimes may vary in how they get to the same mathematical answer, but there needs to be some standard equation in the spec to avoid guesswork and errors.
@jbherdman Yes, this is the intended behavior, it's explained here: https://github.com/KhronosGroup/glTF/tree/master/specification/2.0#skins
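For reference, the math scattered through that section of the spec can be condensed roughly as follows (notation mine, not the spec's: $G(\cdot)$ is a node's global transform, $\mathrm{IBM}_j$ the inverse bind matrix of joint $j$, and $w_j$ the vertex weights):

```latex
\mathrm{jointMatrix}_j = G(\mathrm{meshNode})^{-1} \cdot G(\mathrm{joint}_j) \cdot \mathrm{IBM}_j

\mathrm{skinMatrix} = \sum_j w_j \, \mathrm{jointMatrix}_j

v_{\mathrm{skinned}} = \mathrm{skinMatrix} \cdot v_{\mathrm{local}}
```

Because $G(\mathrm{meshNode})$ appears both in the model-view-projection chain and inverted inside each $\mathrm{jointMatrix}_j$, it cancels, which is why the transform of the skinned node itself ends up having no effect on the final vertex position.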
This limitation is vital, because otherwise exporters can easily screw up and make skeletons and meshes not share the same space (which was a common problem in exported Collada files). This is why I say it should be made clearer how to do this, because exporters do it the wrong way half of the time. They make the skeleton mesh-local instead, which does not work when you have multiple meshes affected by one skeleton, so in this case they duplicate the skeleton and make the copies share joints. This is a waste, because when this is imported into game engines (of course, none support this), you end up with one copy of the skeleton per mesh and all animation tracks duplicated, which is a lot more inefficient. This is why the right thing to do is to force exporters to do this process properly, by forbidding joint sharing between skeletons, and then explicitly explaining the process (math) of making meshes local to the skeleton. This would ensure importers get glTF 2.0 files without waste.
One of the biggest issues is that in many, many files, the inverseBindMatrices exported for a skin represent some arbitrary bind pose that is different from the pose the joints create in the scene graph! You will see that, a lot of the time, the inverseBindMatrices expect a joint transformation that is different from what is actually seen in the node tree. I have seen issues where the "scene pose" of the joints is an A-pose, half of the skins are bound to the A-pose (and their IBMs reflect the A-pose), and yet the other half of the skins are bound to a T-pose (and their IBMs reflect the T-pose, presumably because that mesh was made for the T-pose). These meshes need to be transformed into the space the skeleton is represented with in the scene graph. There is way too much broken behavior and too many loopholes going around here. You can no longer ignore the exported IBMs, as they contain encoded pose data from when the mesh was bound! There is no elegant way to work around this either.
Based on my own understanding of skinning in general, which lines up with the viewpoint @vpenades seems to be taking, I'm actually surprised that there is a 'skeleton' attribute on the 'skin' at all. @reduz After re-reading that particular implementation note for the 20th time or so, and paying very close attention to the term "skeleton root node" in there, I now have a very different understanding of how glTF probably intends to do the math. It doesn't surprise me at all that this is a point of confusion for all sorts of people/importers/exporters. The spec probably needs to be less subtle and implicit about what is being declared. In my own view of the world (perhaps not shared by glTF), bind poses are either done relative to world space, or the mesh carries its own bind matrix (relative to the IBMs stored for the joints).
@marstaik The inverse bind matrix is usually just the rest pose from your modelling program. You can obtain it easily in Maya, Blender, Max, etc. The only confusing part is what they are relative to. This is why I think the skeleton tag must be mandatory: it entirely removes the guesswork for us game engines, which all have the concept of a skeleton. If this tag is not included, then we need to kind of guess where to put the skeleton node, and there is room for exporters to screw up. Collada has a much better concept of a Skin in this regard. Still, this is probably the least harmful of the points I listed (it just makes importers less bug prone), and worst case it could be left as-is. The others should definitely be mandatory changes.
@jbherdman Yes, the same happened to me. At first I didn't understand why exporters were doing such convoluted things, like using multiple skeletons sharing a skin, but then it became obvious that they were trying to work around this limitation the wrong way. This is why I insist that we need to combine forbidding the sharing of joints between multiple skeletons with a good description of how to localize the meshes to the skeleton; otherwise we'll continue seeing exporters that produce files that are unreadable for game engines.
@reduz This might be a silly question, but if all joints had the same bind pose (IBMs) for all skins, would that get around your desire to forbid multiple skins sharing joints? I'm just trying to get my head around the "real problem" here.
@jbherdman I think I can see now where a lot of our confusion stems from. I will try my best to describe the situation I am seeing. You are right that I was defining a bind pose, but that in itself is not the issue. In the current skin-centric model, the skin describes the bind pose of a mesh relative to a set of joints, with the positions they were in when bound recorded into the IBMs. This is the "bind pose" we are talking about. The problem I am seeing is that this allows another skin to have IBMs that describe the joints as if they were posed in a different "bind pose". Again, in a skin-centric model this may be fine. But that assumes you have what you mentioned earlier:
It relies on the existence of per-mesh bind poses. And why is that? Well, most game engines are going to put these joints in one skeleton (for good reason), and the meshes then link to the skeleton and therefore use the skeleton's IBMs. They then get copied into the GPU buffer straight from the skeleton. And this is the problem: if you have one skeleton, you can only have one bind pose for that skeleton. But above I mentioned that two skins referring to the same joints can have two different bind poses. Say we use the first mesh's IBMs for the skeleton's bind pose. Now the second mesh will bind to the skeleton incorrectly. We would have to make a custom node to map the skeleton's bind pose to the bind pose the second mesh expects.
But now there's even more ambiguity. Say mesh one and mesh two had two different bind poses, but the joints themselves were exported in an entirely different pose. What do I decide to use as the actual bind pose?
And here is another issue: to even create said bind-pose mappings for the second mesh, the first and second IBMs need to have their world-space component removed. So I need to use the data from the mesh + skin, compare it to the root joint's location in the scene to compute the world transform of the root, and extract that out of all of the IBMs. This ends up being a huge mess. Please let me know if something needs additional explanation.
@marstaik Yes, thank you, I think we are both on the same page now in terms of understanding the problem. What we're talking about, though, are the fundamental limitations of "skeleton"-based systems vs. "skin"-based systems. At the end of the day, if the skeleton-based system can only store one IBM per joint, then "some modification" needs to happen. In the past, I've written exporters that go from "skin"-based systems to "skeleton"-based systems. It is a pain. Basically, the only perfect solution is to duplicate the joint transform nodes into different hierarchies, just so that each joint transform node can have its own unique IBM. That is terribly expensive at runtime, of course, because now you are unnecessarily computing the animation/transforms for 2x (or 3x or worse) as many joint transform nodes.
One step shy of that, if you get really lucky with how the artist created the input data, you can try to massage the bind poses so that they overlap properly. That is, you can sometimes get lucky and make the IBMs shared across multiple meshes/skins, if the problem is simple enough that you can solve it by multiplying the mesh bind pose into the mesh. For example, let's go with that example of a skinned character who has a separate "head" and "body" mesh. If you're lucky, both meshes were bound with the same set of joint bind poses, but probably have different mesh bind poses (as their mesh-local origins are likely different). As long as you multiply the mesh bind poses into each mesh (rather than into the joint bind poses), then you can successfully share the same joints in a "skeleton"-based system. But if the artist did "something weirder" (such as binding the head in a bind pose where the character's neck joints are at a different set of relative angles vs. the body bind pose), then you will be left with no choice but to duplicate the joint hierarchy when exporting to a "skeleton"-based system.
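The "multiply the mesh bind pose into the mesh" trick can be checked numerically. A minimal sketch using translation-only 4x4 matrices (all names are mine, not from any engine; real bind poses include rotation, but the algebra is the same):

```python
# Claim: if a mesh's private IBM is (shared joint IBM) * (mesh bind matrix),
# then baking the mesh bind matrix into the vertices lets the mesh be skinned
# with the shared, skeleton-level IBM instead.

def mat_mul(a, b):  # 4x4 row-major multiply
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def transform_point(m, p):
    return tuple(m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3]
                 for i in range(3))

joint_global = translation(0, 2, 0)   # animated joint transform
shared_ibm   = translation(0, -1, 0)  # the skeleton's single IBM for this joint
mesh_bind    = translation(3, 0, 0)   # this mesh's own bind matrix

# Skin-based view: the mesh's private IBM folds the mesh bind in.
private_ibm = mat_mul(shared_ibm, mesh_bind)

v = (0.0, 1.0, 0.0)

# (a) mesh skinned with its private IBM
a = transform_point(mat_mul(joint_global, private_ibm), v)

# (b) mesh pre-transformed by its bind matrix, then skinned with the shared IBM
v_baked = transform_point(mesh_bind, v)
b = transform_point(mat_mul(joint_global, shared_ibm), v_baked)

print(a == b)  # True: the shared skeleton IBM now works for this mesh too
```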
The more optimistic situation is that ideally the game engine could shift from "skeleton-based" to "skin-based". It requires just one extra layer of indirection: each skinned mesh would have a collection of "joint" objects which hold the IBM plus a pointer to the scenegraph node from which that "joint" receives its animated transform data. It's a relatively minor shift, and it lets you get away from the worst-case scenario where you need the exporter (or importer) to potentially duplicate the hierarchy of joint transform nodes. I personally think that glTF supporting a "skin"-based system is great, and still the way to go. If a lot of runtime engines need support for transforming data from a "skin"-based system to a "skeleton"-based system, that seems like a common tool that could be written (e.g. a glTF -> glTF transformation). Exporters could try to be more aware of the situation and its challenges, but sometimes their hands are tied by the source data being "weird" because of something the artist did.
Having a glTF with multiple skins that share joints but don't share IBMs is required for some artistic workflows. One of the classic issues with skinning is extreme vertex deformation at acute joint angles. I've seen very skilled artists come up with a very clever solution: instead of modelling the character in the classic T-pose, they model it in a fetal or relaxed pose. This way of modeling allows for a more natural vertex deformation at the joints. But when they want to put clothes over the naked character, and the clothes are made of separate meshes, they move the skeleton into a classic T-pose, because it's easier to model clothes in that pose. So, since the base body and the clothes have been bound to the skeleton at different times, the IBMs of the base body skin and the clothes skins will be different, which is good, because a skilled artist can take advantage of this to reduce deformation artifacts. This is by no means a rare case. Now, the solution that @marstaik proposes is that the exporter merges all the meshes and skins sharing joints into a single big mesh+skin, doing the reverse maths and duplicating joints if necessary. The problems I see with it are:
Now, being practical: for those skeleton-centric engines that have a hard time importing glTF, maybe they need to rethink how they're handling glTF, instead of pretending it's a skeleton-centric format and failing to import 30% of the glTFs around. Maybe what's needed for those cases is to wrap the glTF scene into a master glTF node within their hierarchical node system. @marstaik keep in mind that not everybody is doing videogames; in our case, we're using glTF for biomechanical and anthropometric research and visualization, so rendering performance is not a priority, but having a consistent skinning-centric file format is absolutely critical. Skinning with shared joints with different IBMs is not a monstrosity, it's a much-needed feature, and you cannot ask for it to be removed just because you don't consider it important. |
@vpenades @jbherdman What if we can have the best of both worlds? I think now that we have identified the multiple-bind issue, we can make a solution that benefits everyone. The good thing is, for the IBMs at least, there is a "performant" solution for game engines to this problem. We can add a final bind transform on the mesh instance, in the same flat buffer format as the GPU; call this the bind_pose_offset. The shader can then take: You have to understand that there is a big downside to a skin-based system, because if I have to follow the references a skin has every frame (let it be a C++ pointer) and have to copy the transforms into a buffer to get sent to the GPU, that is a lot of wasted cycles and memory copies every frame. Especially since it now goes from one skeleton per frame to one mesh_instance per frame. Ouch. However, I still think some of the points I made above stand.
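A rough sketch of how such an offset might compose with the standard glTF joint matrix (per the spec: jointMatrix = inverse(globalMeshTransform) * globalJointTransform * inverseBindMatrix). The `bind_pose_offset` placement below is one possible reading of the proposal, not part of the specification; translations are used so the arithmetic is easy to follow.

```python
# Standard glTF joint-matrix computation, plus a *hypothetical*
# bind_pose_offset pre-multiply as proposed above. The offset name and
# its placement are an interpretation of the idea, not spec behavior.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x):
    m = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    m[0][3] = x
    return m

global_joint = translation(3.0)       # animated world transform of the joint
ibm = translation(-1.0)               # inverse bind matrix from the skin
bind_pose_offset = translation(0.5)   # hypothetical per-mesh-instance offset

# Per the glTF spec: jointMatrix = inverse(meshGlobal) * jointGlobal * IBM.
# (meshGlobal is taken as identity here for brevity.)
joint_matrix = mat_mul(global_joint, ibm)
adjusted = mat_mul(bind_pose_offset, joint_matrix)
print(joint_matrix[0][3], adjusted[0][3])  # 2.0 2.5
```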
Now, we can do something similar to point [6] in the OP, and either make a "skeleton" or call it a "master skin", but it would be ideal that there is a way to gracefully resolve the different IBM's. Something like this (rough idea):
The benefits I see here are:
The downsides:
We could keep GLTF skin-centric, which I agree is more flexible with binding meshes to joints, but also make it much, much easier for importers (not limited to game engines) to parse and represent properly. |
@marstaik It seems like you didn't really absorb the excellent use-cases that @vpenades put forward, though? In those cases, the multiple/conflicting IBMs used in different skins can be a feature, not a bug. So, I'm not really sure why you still want to move towards a glTF restriction that seems to move away from those capabilities?
This really doesn't matter (or even make sense) to a "skin"-based skinning system. I'm still unclear on why you consider this necessary, except that it might make life easier for your particular skeleton-based runtime engine? Likewise, the performance issue you raised seems negligible. Under a "skin"-based system, you are still only paying a cost proportional to what you are rendering. You never even have to "duplicate" the node-transforms, if you can index into them appropriately. And you still only have 'N' node-transforms in your glTF file, even if some of them are being referenced by some 'k > 1' number of skins. You can just build an array of IBMs per skin, and index into the appropriate IBMs array as needed. Assuming that the engine/runtime is doing anything other than just throwing skinned meshes at the graphics card, I'm not convinced the performance difference would even be measurable. At a certain point, if you are using a target runtime engine that is "less capable" than glTF (e.g. skeleton-based skinning), the only real option is to control your art pipeline. I would argue that what it sounds like you want to do is actually best served by:
Or, as I previously suggested, you could "pre-process" glTF files to more directly suit your purposes. With a good understanding of the "skin"-based system that currently exists, you could manually process the glTF files to be more compatible with your skeleton-based engine. If you add that processing into your own art-pipeline, then you can prevent "bad data" from hitting your engine (where the definition of "bad data" is purely from your engine's perspective). Think of it this way, if your runtime engine didn't support morph-targets, you wouldn't be complaining about morph-targets being part of the glTF spec. You would be controlling your art-pipeline so that your engine didn't have to deal with morph-targets. |
@jbherdman Please re-read my post. What I posted supports @vpenades use-cases. It still allows child-skins to define their own IBM's. They just need to be relative to the skin root.
Yes, it makes life much easier for skeleton based systems. Is that such a bad thing?
You are neglecting that the "joints" in the scene get posed by animations, and without a skeleton saving those in some flat buffer, you need to visit those nodes one by one, get their transforms every frame, and put them into a buffer for the GPU. That is a lot of wasted CPU cycles. What is the point of a format that is only good for everything other than high-performance rendering? The current reasoning is "Let's just make some of the primary consumers of glTF data go through many hoops to get a satisfactory result." It sucks. You could be a little bit stricter on the skin definitions and make life not a living hell for them. |
@marstaik The transform-nodes get updated by animation data, sure. And computing the transform-node matrices from the animation-data each frame is expensive on the CPU -- far more expensive than copying those computed matrices around. And in the bigger picture, that is all still basically "free" compared to the rest of the CPU cycles you are likely to be spending each frame. That said, I don't see why you couldn't design a system to place all the transform-nodes matrices into a single flat-buffer (indexed by transform-node-id), and send that to the GPU the same way your "skeleton" case would. I'm just not convinced that optimization would make any measurable difference to the runtime performance. (Sure, it might technically save you CPU cycles, but would you ever be able to measure the performance difference between a system that had that optimization, and one that didn't?) |
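The flat-buffer idea described here could look something like the following sketch (all names are illustrative, and each "matrix" is collapsed to a single number for brevity): node matrices are computed once per frame into one array indexed by node id, and every skin merely gathers its GPU palette by index.

```python
# Sketch: compute each animated node matrix once per frame into a single
# flat buffer indexed by node id; skins gather their palettes from it,
# so shared joints are never recomputed or duplicated per skin.

node_count = 4
node_matrices = [None] * node_count  # one slot per scene node

def update_frame(animated_values):
    # A real engine would evaluate animation channels and the node
    # hierarchy here; each "matrix" is just a number in this sketch.
    for node_id, value in animated_values.items():
        node_matrices[node_id] = value

skins = {
    "body": [0, 1, 2],   # joint node ids referenced by each skin
    "cloth": [1, 2],     # shares nodes 1 and 2 with "body"
}

update_frame({0: 10, 1: 11, 2: 12, 3: 13})

# Each skin's GPU palette is a simple gather over the shared buffer:
palettes = {name: [node_matrices[i] for i in ids] for name, ids in skins.items()}
print(palettes["cloth"])  # [11, 12]
```

Whether the gather step is measurable next to animation evaluation is exactly the disagreement in this thread; the sketch only shows that sharing joints does not force per-skin recomputation.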
Regarding optimization and performance, it is possible to create a very inefficient glTF file, in the same way you can create a JPEG with a very bad compression algorithm that takes a lot of space while the quality of the image is bad. For example, you can create a glTF model with 100 meshes where each mesh has its own vertex/index buffer, so it's 100 buffer bindings and 100 render calls. A glTF optimization pipeline can take that glTF file, analyze it, and squeeze every bit from it so it produces a new glTF with a single mesh, or multiple meshes if they are not compatible, but maybe with a single vertex/index buffer, so it will render faster. But this doesn't mean that an engine, any engine, should only be able to display the optimized mesh and complain about the unoptimized one. A glTF-compatible engine should try to replay the contents of a glTF model with as much precision as possible, optimized or not. If a glTF model comes with 17 mesh instances and an engine needs to do 17 rendering calls, so be it; it's what's needed to display that particular glTF model. If performance is an issue, we can develop tools to try to merge what's mergeable, optimize the vertex/index buffers, and convert the textures from JPEG to DDS or whatever. But ultimately, any engine should try to render the contents of a glTF as it comes, and not try to overthink how it should have been rearranged. One solution I do like is what the Windows 10 3D View app does: it tells you the number of render calls, along with the number of polygons, so you can get an idea of how expensive a particular model is to render, and then an artist or a developer can try to improve it. @marstaik, @jbherdman, and guys, I feel like we're running in circles; all the arguments have been laid out politely, and I don't think I have much more to say, so I'll leave this open so people from Khronos can read this thread and leave their opinion or verdict. 
On my side, I'll probably release my MonoGame glTF code soon, so it might serve as a use case. Peace! 😄 |
Just to give you another idea of what happens out there in the wild, I figured I could contribute my experience, for what it's worth. I am by no means a skinning expert, though I believe I understand the individual parts by now. Shortly after I started at my current company, I used our engine's (skeleton-centered) skinning system to implement glTF skinned meshes. There are still a number of issues outside of the limitations mentioned below.

Specification issues

I've had my share of issues with ambiguities and guesses, which came in part from a lack of understanding and in part from a lack of clarity in the specification. I've made an attempt to summarize the problems encountered:
Engine implementation

Our engine has a 1-1 pairing of skin-skeleton and treats nodes and bones as separate entities. Thus, with the limited knowledge I had back then (and time constraints), the result is something rather inefficient, but mostly functional:
Conclusion

To me, the above results are purely engine limitations; the engine is less flexible than the glTF specification. This can be annoying, but I like the glTF approach more, on a conceptual level. It can do everything a skeleton-centered approach can do, and more. When done right, I believe an implementation does not need to have a larger complexity or runtime cost than a skeleton-centered approach, either. But, of course, we usually don't take the time to rewrite our skinning code. |
I'm coming back to this issue after spending some more time with Maya, Blender, and a few different importers and exporters. I have accepted that the joints array in the skin definition does not need to be a strict hierarchy. In terms of exporting a closer rendition of the scenes defined in 3D modeling programs, this is now reasonable to me. From Maya and Blender, I was able to bind to joints/bones (not any random node) that are not part of the same hierarchy. Note that in Blender, I had to bind to two separate armatures to mimic this behavior, but yes, it is possible. However, I was not able to get a mesh to skin itself to another mesh (or other non-joint object). I was able to handle importing these non-strict-tree skin definitions in Godot's glTF importer by performing a union of disjoint sets and creating fake joints where non-joints lie in between joints. But, now having used and dealt with various importers and exporters, I have come to the conclusion that the real issue that creates ambiguity in exported files is this:
Joints need to become explicit in the glTF specification, and I'll show you why. The current specification implies that skins define what the joints are in the scene. This too, however, is incorrect: it's the modeling program that defines the joints, not the glTF file. If you export from whatever modeling program and re-import the exported file, you will not get back the same result most of the time. Imagine trying to export separate meshes bound to joints and bring them into a single scene later. Now imagine trying to do all of the above while having to insert fake bones and create a skeleton definition for a game engine. Because each scene in the modeling program has skins that mark different nodes as joints, the logic required to interpolate a skeleton may never produce the same skeleton for different scenes. Now try to deal with assets from Maya and Blender, with exporters written by different people. So should exporters get around this by exporting a skin with no IBMs just to mark joints? This seems extremely stupid. I believe that glTF needs to treat joints as first-class citizens. They need to be marked on the nodes, the same way that meshes and skins are marked, even if it's just a boolean flag. Further, since I cannot get Blender or Maya to bind to anything other than joints/bones, I would propose that any "joints" in the "skin" must actually be marked as "joints" in the node hierarchy. Finally, the modification to the specification should look something like this:
So, could an exporter still not mark the original joints in the modeling software as joints? I believe that these simple changes (in addition to perhaps some clarification of the IBM's) could greatly improve the consistency of scene exports/imports across multiple applications. |
I'm glad to see that you are starting to come around.
The thing is, I would claim that Blender and Maya fail to treat their joints as first-class citizens. I had to go remind myself, but Maya more or less limits skinning to use its special "joint" nodes. (You may or may not be able to get around that at a lower API level, but it would probably give the UI a headache.) I haven't dealt with Blender much, but the "armature" system brings to mind Lightwave-style bone-skeletons (and other "bones are different than transform nodes" systems from the 90's). A system like 3DS MAX doesn't have those restrictions. You can happily fire up MAX and skin a mesh using 2 camera-nodes as its "joints". That is because a "joint" isn't a special/distinct node type; you can use anything that has a transform-node as a joint, assuming that you store the appropriate bind-pose data somewhere. So, in my mind, glTF is already treating its joints as "first class citizens" by simply allowing any transform-node to be referenced as a joint, and not requiring joints to be specially marked via some separate mechanism.
That is generally true for all non-trivial data conversion. It is much like running a sentence through Google Translate into a different language, and back again. Best case scenario, you will get something "functional", but the process will strip off a lot of nuance and artistic style from the original. |
This seems extremely contradictory. If you want to say that skins just use node transforms, that is fine; by your logic, skins don't define joints, they just use nodes. Then keep it that way in the definition. But 3D modeling applications, and many importers, expect to be able to easily tell what a joint is. A joint is a named entity in almost all modeling software. I see absolutely no reason to ignore its existence and shove it under the rug.
If you are a bad translator, sure. But I would expect a proper open-source specification to allow, let's say, a Blender document to be exported via glTF to Maya/MotionBuilder for some proper motion-capture handling, and then be brought back into Blender. Or maybe go from MAX to Blender and back. A lot of 3D pipelines require consistent transfer between applications. What is the harm in representing actual joint nodes in the specification? If the importing application doesn't care, then it doesn't care. But most of them do. If you wanted to be less strict you could:
Maybe it's better that way. At least the specification isn't trying to lie to itself. Sadly, glTF doesn't have the weight that Autodesk has with FBX, since Autodesk has a complete modeling/animation pipeline and age to back it. And at this rate it never will, if you don't allow game engines to make better use of this format. I and many others may as well go back to using FBX. It may be broken and inconsistent, but at least anyone who uses the provided SDKs can generally make an importer that doesn't explode. |
@julienduroure |
This is already problematic because joints are indices into the nodes array. To match indices, the entire hierarchy would have to be exported in every scene. I don't think partial exports are in any exporter's mind, currently. We're starting an implementation ourselves, where the most difficult challenge is animation target identification across glTF files. But that's a different topic.
I don't see why this is problematic specifically with skins and joints. Exporting animations separately is already difficult, because, as described above, the only thing you have to identify nodes across different glTF files is the name, which isn't necessarily unique. Animations work just like the skins; they can reference any node arbitrarily. Since joints are nodes, I don't see a joint-related problem here.
There would be no way to mark it as a skeleton tree, indeed, but you should still be able to export the hierarchy as usual.
Assuming the above problems are solvable, it may not be the same skeleton, but it should still have consistent results, correct?
It seems to me this is the centerpiece of the discussion right now. In an optimal world, it wouldn't be necessary to force a joint to be "just" a transform, for the same reason that a node can have both a camera and a mesh, and perhaps even a light on top of this, if you use this extension. This is currently possible in the specification. Our own engine doesn't support this, so to this end I have made camera nodes be children of the nodes they are attached to in the glTF. If "joint" were an exclusive property of a node, then from my perspective cameras, meshes and lights should also be exclusive. In 3DS Max this is already the case, though I don't know about Blender and Maya. Whether that's a good idea or not I don't know, but it seems unnecessary to enforce this in the specification. Personally, I don't see a compelling reason to treat joint nodes any differently from regular nodes, other than implementation details which differ across engines. Our engine's implementation would not gain anything from this, currently.
The JPEG analogy may work better for his argument, I think. |
@marstaik @Selmar If I understand correctly, you're trying to import glTF by taking its internal building blocks and converting them into their respective engine-specific counterparts. If that's the case, then I understand why you're having so much trouble importing glTFs; if the glTF components and relationships don't have a perfect match with the engine's component counterparts, then some glTF configurations cannot be imported correctly. I believe a good alternative approach to importing glTF is sandboxing. Whenever you import a glTF model, all its internal structures are preserved within the sandbox. The engine interacts with the glTF through the sandbox, instead of trying to import all the components. This way, you don't have conflicting issues between glTFs, and you protect your engine from future changes in the glTF specification. If the issue is about sharing resources across multiple glTFs, then I believe the right approach is to use a glTF toolchain to merge the scenes of multiple glTF files into a single big glTF with all the scenes contained inside, so the engine only needs to import the master glTF to access all the scenes through the sandbox. BTW, a while ago I published a showcase of loading and rendering glTF files in MonoGame; you can find the example here. The MonoGame loader example loads every glTF model into a "sandbox". The interaction with MonoGame's graphics engine is minimal, since only glTF meshes and materials are converted to MonoGame's counterparts, while nodes, animations, hierarchy, etc. are preserved within the glTF sandbox. |
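A minimal sketch of the sandbox idea, assuming a parsed glTF-style dict. The `GltfSandbox` class and its methods are hypothetical illustrations, not MonoGame's API; translations stand in for full matrices.

```python
# Sketch of the "sandbox" approach: keep the parsed glTF structures
# intact and answer engine queries from them (here, an x-translation up
# the hierarchy), instead of converting every object into engine types.

class GltfSandbox:
    def __init__(self, gltf):
        self.gltf = gltf  # parsed glTF kept as-is inside the sandbox
        self.parent = {}
        for i, node in enumerate(gltf["nodes"]):
            for child in node.get("children", []):
                self.parent[child] = i

    def world_translation_x(self, node_index):
        # Compose translations up the hierarchy (full 4x4s in practice).
        x = 0.0
        i = node_index
        while i is not None:
            x += self.gltf["nodes"][i].get("translation", [0, 0, 0])[0]
            i = self.parent.get(i)
        return x

doc = {"nodes": [
    {"children": [1], "translation": [1.0, 0.0, 0.0]},
    {"translation": [2.0, 0.0, 0.0]},
]}
sandbox = GltfSandbox(doc)
print(sandbox.world_translation_x(1))  # 3.0
```

The engine never needs a native counterpart for the node hierarchy; it only queries the sandbox for whatever it needs to render.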
What? You want to render a glTF file's JSON straight to the renderer every time? Why wouldn't you want to match scene entities to the engine's versions and construct a scene? This seems absolutely stupid. |
Why on Earth would this be a correct solution? If an RPG had 1000 armors, you want me to have to import for an entire day everytime a mesh gets added?
To be honest, I don't know any engine that could support a camera-mesh-light node anyway. Seems like stupid design. At this point we may as well take away cameras. Oh, and you know what, maybe lights should go too; they are not that special. The importer can figure it out. Hmm, now why should I bother exporting a mesh? The importer can also figure that out... Aha, let's just only export nodes with no attributes, that will definitely make the format much more useful. And again, no one seems to care about consistency of export. By most of the logic presented here, we should just leave glTF to be a showgirl format and abandon it for something more practical. There goes… Or maybe USD. |
In the interest of keeping this discussion productive, let's constrain the scope a bit — nearly all engines and DCC tools have some pre-existing concept of skins, skeletons, bones, and meshes, or at least some of those things. What would a strict skinning specification look like, maximizing portability of a glTF file across existing tools, with the assumption that the glTF file will be loaded into the tool's native object representations? For my own opinion, while the current skinning specification does a sufficient job of defining a skinning representation that offers flexible technical features, it is perhaps not specific enough about the structure and best practices that allow a skin to actually be broadly portable across tools. Unfortunately, I'm not at all confident that I know what a broadly portable skinning specification would look like. I would certainly be curious to get more feedback on @marstaik's suggestions in #1669. If there is consensus on useful restrictions, clarifications or best practice – here is how I would imagine the process could proceed. We can't simply add the restrictions listed here to the glTF 2.X specification; doing so would invalidate many existing models, and is not compatible with our versioning process. Because glTF 3.X is likely to be some ways off, a near-term alternative could be to provide an extension ( Ideally, the extension would add new restrictions to the existing specification, rather than introducing a new representation that loses backward-compatibility with tools that support the existing spec. |
Here is an attempt to define a stricter skinning subset, for greater portability across engines at the cost of some flexibility: #1747. |
@WyattKimble I've marked your comment as off-topic. You may disagree with @reduz's claims, but please refrain from personal criticism and review the Khronos Group Code of Conduct on respecting differing experiences. Constructive disagreement is welcome, but this is already a complex and challenging thread, so please be conscious of that. |
After messing around with the glTF spec and various engines, I feel there are quite a few cases in the current specification that make it extremely difficult for most importers to handle with their own internal rendering engines. This either leads to a bunch of extraneous work and guessing on the importer side, or to duplication of data by the exporters.
Here are some ideas that I believe could tighten up the specification:
The skeleton root should be defined; otherwise, the direct parent of the highest joint in the skin hierarchy must be used. The direct parent does not have to be a joint, to allow for multi-rooted skeletons. Note that the direct parent may happen to be the scene root, but the scene root should no longer be the implicit default when the skeleton property is undefined, as it was before.
All joints in a skin must form a connected subtree with the skeleton root/direct parent (1)
This means that no non-joints (or joints belonging to other skins) may lie as connecting nodes between the skin's joints. It still allows you to attach other nodes to joints, as long as no subsequent joints of the skin follow below them.
A lot of engines treat the skeleton/skin as a single entity, and allowing for non joints embedded in the tree of joints makes the engine have to create a bunch of shadow bones to get things to work properly.
A mesh can only bind to a single skin
I know this is pretty implicit, but I believe it should be defined.
As per the previous discussion here: Multiple duplicated skins are exported for each mesh child of the armature glTF-Blender-IO#566 (comment)
All skinned meshes must be normalized to the local space of the skeleton.
Now I have another requirement that I am torn between two options: (5) and (6)
Each joint shall belong to only one skin
This is the ideal choice as it makes things extremely simple for importers as they do not need to resolve/union skinning trees to find the master skeleton/skin (see below)
Each skin shall either define a new tree of unused joints, or explicitly be a subtree of a previously defined skin.
This allows for multiple skins per skeleton, but the subtree definition implies that all skins for a single skeleton must have a "master" skin that holds all of the joints for a skeleton. This makes it easy for an importer to map smaller skin definitions to a master skeleton containing all the joints.
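For illustration, the connected-subtree requirement and the "each joint belongs to only one skin" option above could be machine-checked roughly as follows. The input shape and function names are assumptions about a parsed glTF-style dict, and the subtree check handles only the single-rooted case for brevity (multi-rooted skeletons would need the direct-parent rule from above).

```python
# Sketch of a validator for two of the proposed restrictions:
# (a) each joint belongs to at most one skin, and
# (b) a skin's joints form a connected subtree under a single root joint.
# Input shape and names are illustrative, not the glTF schema itself.

def build_parents(nodes):
    parent = {}
    for i, node in enumerate(nodes):
        for child in node.get("children", []):
            parent[child] = i
    return parent

def joints_are_disjoint(skins):
    seen = set()
    for skin in skins:
        for j in skin["joints"]:
            if j in seen:
                return False  # joint shared between two skins
            seen.add(j)
    return True

def joints_form_subtree(skin, parent):
    joints = set(skin["joints"])
    # Every joint's parent must be another joint of the same skin, except
    # exactly one root joint whose parent (possibly None) lies outside it.
    roots = [j for j in joints if parent.get(j) not in joints]
    return len(roots) == 1

nodes = [{"children": [1, 2]}, {"children": [3]}, {}, {}]
parent = build_parents(nodes)
skins = [{"joints": [1, 3]}, {"joints": [2]}]
print(joints_are_disjoint(skins), joints_form_subtree(skins[0], parent))  # True True
```

A validator like this is the kind of tooling an extension with stricter requirements (as suggested later in the thread) could ship alongside the spec text.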