Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lightweight, centralized object storage #1021

Closed
14 tasks done
hannobraun opened this issue Aug 31, 2022 · 14 comments · Fixed by #1255
Closed
14 tasks done

Implement lightweight, centralized object storage #1021

hannobraun opened this issue Aug 31, 2022 · 14 comments · Fixed by #1255
Labels
topic: core Issues relating to core geometry, operations, algorithms type: development Work to ease development or maintenance, without direct effect on features or bugs

Comments

@hannobraun
Copy link
Owner

hannobraun commented Aug 31, 2022

Not too long ago, we used to have the Shape data structure which represented a single shape. It stored the shape's objects and provided a whole lot of infrastructure to manage them. Shape was overly complex, and was removed after simpler ways to do the things it did were found (#697). Since then, we no longer have any form of centralized storage of objects.

Even back then, I suspected that we'd need to grow back some form of centralized object storage eventually. Although much lighter-weight, and without the problematic parts that Shape had. I've been thinking about this in the back of my mind, and this issue is the result of that.

Why do we need centralized object storage?

So why do we need it? Centralized object storage gives us one thing we can't get any other way: We'll be able to refer to existing objects using a handle, which is kind of a reference to the object in the store.

But how is that an advantage? There are two ways that I can see:

  1. It's lighter-weight. In principle, objects reference other objects. In practice, right now, it's not a reference. Instead, the referencing object owns a copy of the referenced object, resulting in much duplication.
  2. It allows to distinguish between object identity and object equality. This allows for more stringent data structures that could prevent some bugs.

Advantage 1. doesn't really matter right now. Yes, it would be nice to slim down the data structures somewhat (I don't have numbers, but I suspect that they are many times bigger than they'd need to be), but that doesn't cause any problems right now. Long-term, it might make a difference in performance.

Advantage 2. is a bit more substantial, in that it gives us back a capability that we lost when removing Shape. Back then, I said that object identity wasn't as important as I thought it would be, but I'm no longer sure. Accidentally creating an object that is identical to an already existing one, when you should just use the already existing one instead, might be a sign of sloppy code that could cause other problems.

Although, to be honest, I can't point to evidence of that right now. This is just an impression that has grown over time, as I'm working on the kernel code. But in any case, centralized object storage gives us the option of being strict about object identity, while still allowing for escape valves where necessary (in the form of a method like store.get_handle_for_or_insert(object)).

How is that going to be different from Shape?

How can we make sure not to repeat the same mistakes though? Here's where I see the differences between the new object store and Shape:

  • It's going to be a global store for all shapes, instead of a per-shape store. This avoids most of the redundancy that was inherent to Shape.
  • Most of Shape's problematic use cases have already been solved in other ways. It simply won't be necessary to rebuild them in the new API.
  • The new API will strictly be an append-only data structure.
  • It will use a lighter-weight handle type. Since there will be no mutation of objects, the handle type can just implement Deref, which will make it convenient to use.

Conclusion

Centralized object storage will allow us to de-duplicate the currently highly redundant data structures, likely reducing their size by a significant factor. It also gives us the option for being more strict about object identity, without forcing us into anything.

Neither of these solve any problems that are particularly pressing right now, making this issue a low priority. Other than that, it seems like a clear advantage though, so I'm sure it will happen eventually.

Progress

Work on this issue has started. Centralized object storage exists, but not all objects are being st. The following list reflects the current state of the implementation:

@hannobraun hannobraun added type: development Work to ease development or maintenance, without direct effect on features or bugs topic: core Issues relating to core geometry, operations, algorithms labels Aug 31, 2022
@hannobraun
Copy link
Owner Author

This issue just entered the critical path towards #42 (via #1079 and #993). Adding to the active milestone and assigning myself.

@dwcarr: I know you expressed interest in working on this, and I don't mean to step on your toes. But considering that this issue is now blocking my main priority (#42), and that I'm probably in the best position to address it in a timely manner (Fornjot being my full-time job and all), I'm taking it. Please let me know, if you want to collaborate in any way.

@dwcarr
Copy link

dwcarr commented Sep 15, 2022

@dwcarr: I know you expressed interest in working on this, and I don't mean to step on your toes.

No worries. I am having a down week (brain fog!!!), so don't wait on me. My thought on this are that we should have a centralized data store, along with a state manager that is the sole place where the data can be modified. This would make the future implementation of sketch constraints more straightforward as well, as constraints would be implemented as callbacks on data changes from within that state manager.

@hannobraun
Copy link
Owner Author

Thank you for the feedback!

I'll have to think about that state manager some more. Right now, I don't see how it fits into what we have, since objects are immutable and therefore don't change, and I meant to carry that forward by making the store append-only. (The old Shape storage was mutable, as I implied but didn't spell out in the issue description. That didn't work so well, in my opinion.)

This whole model kind of follows naturally from the code-based approach (for now), as the model is a program that needs to be evaluated when it changes, and from each evaluation you get a new model. I can see reasons why that wouldn't work well anymore going forward, once models become complex enough so re-evaluating them all at once becomes impractical. I have that filed away in my head under "not today's problem", like so many other things 😄

@dwcarr
Copy link

dwcarr commented Sep 16, 2022

Right, I keep getting tripped up by the code-based approach and how that affects the design workflow. I do see a better way now, I think, though I'm still having trouble with how "removal" operations, boolean or otherwise, can work with an append-only data store. My fog is slowly lifting, so this stuff will get clearer for me.

@hannobraun
Copy link
Owner Author

Before I continue, let me emphasize that I don't want to come across as dictating how things must be done. As I wrote elsewhere, there is a limit to how far I can plan ahead without sabotaging progress on what's needed right now. And given the complexity of the subject matter (and my own limits of knowledge and experience), this planning horizon doesn't go very far.

So if I express an opinion about something, it should be understood within that context. I might strongly believe that something is the right choice right now, but I never expect anything to be the "final" design.

Having said that, let me try to address your confusion:

I do see a better way now, I think, though I'm still having trouble with how "removal" operations, boolean or otherwise, can work with an append-only data store. My fog is slowly lifting, so this stuff will get clearer for me.

To modify an object, you add a new version of it to the store, as well as new versions of all objects that reference it (and all objects that reference them, recursively). Basically, it's a copy-on-write situation.

The objects in the store form a graph, and a shape is just a pointer into that graph.

@LegNeato
Copy link

You might want to look into https://github.com/TimelyDataflow/differential-dataflow

@hannobraun
Copy link
Owner Author

@LegNeato Wow, that looks really interesting! I can't claim to understand what's going on there, but it's definitely worth a close look, once we have the need to incrementally update our data. I've added it to our list of useful resources.

@hannobraun
Copy link
Owner Author

I've implemented the infrastructure required for centralized object storage in #1108, and also added a store for GlobalCurves in that pull request. Since that should be enough to un-block #1079, I'm unassigning myself from this issue, and will remove it from the milestone I'm currently working on.

I've updated the issue description with a list that displays the current implementation status. We can continue work on this at our leisure.

@hannobraun
Copy link
Owner Author

My main priority (#42) is currently blocked by #1162. To help address this issue, I'm going to expand the scope of centralized object storage, by adding HalfEdge and all objects it references (directly or indirectly).

Assigning myself and adding to current milestone.

@hannobraun
Copy link
Owner Author

I've integrated Surface in #1163 and updated the list in the issue description.

So far surface equality is still used in validation code. Surface identity should be used instead, as that would make the validation stricter and prevent possibly bugs. I have that change in a local branch, but I'm getting test failures. I will look into that next week.

@hannobraun
Copy link
Owner Author

All validation code has been updated to rely on surface identity instead of equality. Moving on to integrating Curve into the centralized object storage now.

@hannobraun
Copy link
Owner Author

And Curves are done (#1176)! After spending days getting Surfaces fully integrated (including the stricter validation code), I expected more of a battle 😂

@hannobraun
Copy link
Owner Author

GlobalVertex and SurfaceVertex have been added to the centralized object storage. All validation code has been enhanced to make use of this, making it much stricter by making any comparisons of these objects using object identity instead of object equality.

I will now continue with adding Vertex or GlobalEdge to the centralized object storage.

@hannobraun
Copy link
Owner Author

Vertex, GlobalEdge, and HalfEdge are now fully integrated into the centralized object storage. Those went pretty quick! In the case of Vertex and HalfEdge, because those don't tend to get shared much between other objects, giving little opportunity for object duplication issues to spring up. In the case of GlobalEdge, I suspect it's rather the lack of validation code that could detect such issues (#1129).

This concludes my current round of work on this issue. We now have enough objects in storage for me to continue with my plan to address #1162.

This was some really interesting work. Putting all those different objects into centralized storage allowed the validation code that worked with those objects to become much more strict. As a result, I've fixed many potential and some confirmed bugs. This also made any code that's constructing objects much more complicated. I invented a whole new API to make it possible at all, but I have yet to figure out how to express these new concepts in a way, that doesn't turn a whole lot of object construction code into a huge mess.

I think this is fine for now. Despite being really messy and hard to understand, that code is not fragile. It's supported by the object validation code that uncovered all those issues in the first place. It will be interesting to see how things progress from here. I'm confident that it's possible to express that code in a much cleaner way, but the high-level concepts to do so have yet to be discovered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: core Issues relating to core geometry, operations, algorithms type: development Work to ease development or maintenance, without direct effect on features or bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants