Strengths and weaknesses

Datavirke · May 24, 2024 · d728f8c
1 parent 4f75585
commit d728f8c
Showing 1 changed file with 227 additions and 4 deletions.
diff --git a/content/posts/contemplating-entity-component-architecture/index.md b/content/posts/contemplating-entity-component-architecture/index.md
@@ -4,6 +4,8 @@ date = 2024-05-22
 draft = false
 [taxonomies]
 tags = ["ecs", "programmaing", "rust", "correctness"]
+[extra]
+toc = true
 +++
 
 Ever since discovering the [Entity Component System](https://en.wikipedia.org/wiki/Entity_component_system) pattern
@@ -50,7 +52,7 @@ in an attempt to describe the entire animal domain, it also allows us to maintai
 
 It's just functions. Really. The beauty of them is that they act on components themselves and make no assumptions about
 the entity to which they belong. In contrast to object methods, this makes them very robust because the reduced scope means
-that the basic assumptions about a Component are far less likely to change [than that of an Object](https://en.wikipedia.org/wiki/God_object).
+that the basic assumptions about a Component are far less likely to change than that of an Object.
 
 ### Practical Applications
 
@@ -379,12 +381,233 @@ Simple enough. Let's get it reviewed, and compare it to the feedback from last t
     Solution: Add a checkbox which employees can check and thereby add the `ObservesCelebratemas` marker to their entities.
 
 
+A lot of our fabricated problems melt away, and the ones that remain are pretty easy to solve. But why?
 
+### 360 No-Scope
 
-## Table Storage
+In an OOP world, your objects have to be *exhaustive*.
 
+When you set out to design your `Supplier` type, you must engage in dialectic with your entire organization about what `Supplier` means to each department that might interact with your software, and *none* of these departments are going to be in agreement, so you end up with an [all-encompassing](https://en.wikipedia.org/wiki/God_object) `Supplier` class which contains every single aspect of what a supplier *might* be, in order to satisfy everyone.
 
+This Supplier object has now become a binding *contract* with unbounded scope, which your entire organization has to uphold for all eternity. When the domain shifts (and it *will* shift!), you've maximized the effort required to enact change, and every assumption every department ever made about what a Supplier is has to be challenged.
 
-## Archetypes
+With ECS, you essentially get exactly what Microservices proclaims to give you: Narrow scope and distributed responsibility. When the domain shifts, you don't have to rethink the world from scratch, or bring in representative greybeards with tacit knowledge from every department: The effort of discovering how each component involved is used is necessarily much smaller than that of the entities they represent.
+
+These domain shifts are still going to be painful, but by reducing the scope of the assumptions that need to be challenged, you've made the work a lot easier for yourself and for those who come after you.
+
+## Keeping Score
+
+I hope that by this point I have convinced you that the idea at the very least has merit. If you're itching to apply this pattern in your own project, or just to play around with it elsewhere, I don't think you're going to learn much more about it from me, but I would like to sketch out some of the strengths and weaknesses of this approach that I've discovered while using it.
+
+### Strength: Table Storage
+
+ECS lends itself extremely well to the good old-fashioned way of storing data: tables.
+
+Components should generally be kept small, so as to minimize the contract they imply, and they usually contain only a handful of properties, so storing your components in whatever database software you prefer is pretty easy.
+
+The following components map effortlessly to SQL:
+
+```rust
+struct Todo {
+    thing: String,
+}
+
+struct DueDate {
+    datetime: DateTime,
+}
+
+struct Completion {
+    done_by: Entity,
+    completed_at: DateTime,
+}
+
+struct Name {
+    display_name: String,
+}
+```
+
+As tables, they look like this:
+
+```sql
+create table todos(
+    entity uuid primary key,
+    thing text not null
+);
+
+create table due_dates(
+    entity uuid primary key,
+    due_date datetime not null
+);
+
+create table completions(
+    entity uuid primary key,
+    done_by uuid not null,
+    completed_at datetime not null
+);
+
+create table names(
+    entity uuid primary key,
+    display_name text not null
+);
+```
+<small>The lack of foreign keys is intentional, since a foreign key encodes assumptions about the components themselves and how they might be used. You could perhaps reasonably create a foreign key constraint between the `entity` field of `due_dates` and `todos`, since obviously the former requires the latter, but by doing so you would also prevent others from re-using your `DueDate` component for other purposes in the future!</small>
+
+With this we can represent our todo app using "Items" made up of `Todo` components and, optionally, `DueDate` components. Upon completion, a `Completion` component is simply added.
+
+Our `Completion` component contains a reference to another entity, which is (probably) a user with a `Name` component, but could theoretically also be an automated task which marks tasks with expired due dates as completed.
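
To make the "one table per component" idea concrete, here is a minimal in-memory sketch of the same model. The names `World`, `complete`, `open_todos`, `completed_by` and `demo_world` are hypothetical, `DueDate` is left out for brevity, and plain integers stand in for UUIDs and timestamps; it illustrates the shape of the data rather than a real storage layer.

```rust
use std::collections::HashMap;

/// An entity is only an identifier (an integer here, a UUID in the SQL schema).
type Entity = u64;

struct Todo {
    thing: String,
}

struct Completion {
    done_by: Entity,
    completed_at: u64, // assumption: a Unix timestamp stands in for DateTime
}

struct Name {
    display_name: String,
}

/// One map per component type, mirroring one table per component.
#[derive(Default)]
struct World {
    todos: HashMap<Entity, Todo>,
    completions: HashMap<Entity, Completion>,
    names: HashMap<Entity, Name>,
}

impl World {
    /// Completing an item only adds a component; the `Todo` itself is untouched.
    fn complete(&mut self, item: Entity, done_by: Entity, completed_at: u64) {
        self.completions.insert(item, Completion { done_by, completed_at });
    }

    /// Open items: entities with a `Todo` but no `Completion`, sorted by text.
    fn open_todos(&self) -> Vec<&str> {
        let mut open: Vec<&str> = self
            .todos
            .iter()
            .filter(|(entity, _)| !self.completions.contains_key(*entity))
            .map(|(_, todo)| todo.thing.as_str())
            .collect();
        open.sort();
        open
    }

    /// Who completed an item, if the completing entity happens to have a `Name`.
    fn completed_by(&self, item: Entity) -> Option<&str> {
        let completion = self.completions.get(&item)?;
        self.names
            .get(&completion.done_by)
            .map(|name| name.display_name.as_str())
    }
}

/// Small fixture: one named user, two todos, one of them completed.
fn demo_world() -> World {
    let mut world = World::default();
    world.names.insert(1, Name { display_name: "Alice".to_string() });
    world.todos.insert(10, Todo { thing: "Water plants".to_string() });
    world.todos.insert(11, Todo { thing: "File taxes".to_string() });
    world.complete(10, 1, 1_716_500_000);
    world
}
```

Note that `completed_by` returns `None` both for open items and for items completed by an entity without a `Name`, such as an automated task.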
+
+It might look like we've just re-invented [database normalization](https://en.wikipedia.org/wiki/Database_normalization), but ECS is a lot more than just the structure of data: it's a method of separating *Objects* into their constituent parts.
+
+A derivative win of this is that [Object-relational mapping](https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping), or *ERM* I suppose, actually stops being something you have to fight into submission. ORMs typically break down when you attempt to express complex relations between objects and then effortlessly map those into multi-layered, nested object structures so your API can express them in JSON terms, only for your client or frontend to throw away 80% of the information. Of course, you could implement your API using GraphQL and increase the complexity of both your frontend and backend a hundredfold, or you could expose an API endpoint where your frontend or clients simply choose which components they actually need at that particular time.
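
As a rough sketch of what "choose which components you need" could look like behind such an endpoint, the lookup below takes an entity and a list of requested component kinds and returns only those that exist. `ComponentKind`, `Store`, `fetch` and `demo_store` are hypothetical names, and component values are kept as pre-serialized strings purely to keep the example short.

```rust
use std::collections::HashMap;

type Entity = u64;

/// The component types a client may ask for (an illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ComponentKind {
    Todo,
    DueDate,
    Completion,
}

/// Stand-in for the component tables, keyed by (entity, component kind).
#[derive(Default)]
struct Store {
    rows: HashMap<(Entity, ComponentKind), String>,
}

impl Store {
    fn insert(&mut self, entity: Entity, kind: ComponentKind, value: &str) {
        self.rows.insert((entity, kind), value.to_string());
    }

    /// Returns only the requested components the entity actually has,
    /// in the order they were asked for.
    fn fetch(&self, entity: Entity, wanted: &[ComponentKind]) -> Vec<(ComponentKind, &str)> {
        wanted
            .iter()
            .filter_map(|kind| {
                self.rows
                    .get(&(entity, *kind))
                    .map(|value| (*kind, value.as_str()))
            })
            .collect()
    }
}

/// Small fixture: one item with a `Todo` and a `DueDate`, but no `Completion`.
fn demo_store() -> Store {
    let mut store = Store::default();
    store.insert(10, ComponentKind::Todo, "Water plants");
    store.insert(10, ComponentKind::DueDate, "2024-06-01T12:00:00Z");
    store
}
```

A list view might request only `Todo`, while a detail view asks for all three kinds and simply receives the ones that are present.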
+
+Of course, this segues neatly into our first weakness:
+
+### Weakness: Many to Many
+
+When going beyond the examples I've shown above, a few cracks appear. Usually ECS architecture is designed around the idea that each (entity, component) pair is unique. Your entity cannot have multiple `Name` or `DueDate` components associated with it. Most of the time this makes sense, since multiples of either of these would be pretty ambiguous, but in some cases having several is perfectly reasonable.
+
+It is for example not at all unreasonable to have multiple `ContactEmail` components associated with a company!
+
+One way of resolving this is to introduce a relational component, like `IsCompanyContact`:
+
+```rust
+struct IsCompanyContact {
+    // This would point back at the *Company* entity
+    company: Entity,
+}
+```
+
+This could then be attached to *Employee* entities which each have a single `ContactEmail` component, linking the two together. Finding the contacts for a company is now only a *slightly* more involved process:
+```sql
+select
+    contact_emails.email
+from
+    contact_emails
+join
+    is_company_contact
+on
+    is_company_contact.entity = contact_emails.entity
+where 
+    is_company_contact.company = $company_entity_id
+```
+
+But this query is not complicated, and it is easy to express in most ORMs.
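
For comparison, the same lookup can be sketched over in-memory component maps. This is only an illustration under assumed names (`World`, `add_contact`, `contact_emails_for` and `demo_world` are hypothetical); the filter and lookup play the roles of the `where` clause and the join.

```rust
use std::collections::HashMap;

type Entity = u64;

struct ContactEmail {
    email: String,
}

struct IsCompanyContact {
    // Points back at the *Company* entity.
    company: Entity,
}

#[derive(Default)]
struct World {
    contact_emails: HashMap<Entity, ContactEmail>,
    is_company_contact: HashMap<Entity, IsCompanyContact>,
}

impl World {
    /// Attaches both components to a single employee entity.
    fn add_contact(&mut self, employee: Entity, company: Entity, email: &str) {
        self.contact_emails
            .insert(employee, ContactEmail { email: email.to_string() });
        self.is_company_contact
            .insert(employee, IsCompanyContact { company });
    }

    /// Emails of all entities marked as a contact for `company`, sorted.
    fn contact_emails_for(&self, company: Entity) -> Vec<&str> {
        let mut emails: Vec<&str> = self
            .is_company_contact
            .iter()
            .filter(|(_, link)| link.company == company)
            .filter_map(|(employee, _)| self.contact_emails.get(employee))
            .map(|contact| contact.email.as_str())
            .collect();
        emails.sort();
        emails
    }
}

/// Small fixture: two contacts for company 100, one for company 200.
fn demo_world() -> World {
    let mut world = World::default();
    world.add_contact(1, 100, "alice@example.com");
    world.add_contact(2, 100, "bob@example.com");
    world.add_contact(3, 200, "carol@example.com");
    world
}
```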
+
+Another way of going about this is to have the component itself be multi-dimensional:
 
-## Kubernetes
+```rust
+struct ContactEmails(
+    Vec<String>,
+);
+```
+
+As long as your system of record supports it, this doesn't violate the principles of ECS in any way. I would caution against storing *large* amounts of data in this fashion.
+
+The problem is exacerbated by Many-to-Many relationships. These relationships are hard to express succinctly in any format, and in ECS they require a third entity to express, just as they would in SQL.
+
+Building on our above example, we could theorize that a person might be the contact for multiple companies, necessitating an external mapping:
+
+```rust
+struct CompanyContact {
+    contact: Entity,
+    company: Entity
+}
+```
+
+These `CompanyContact` components would belong to entirely separate entities, most likely never containing any other components and existing merely as "glue" entities.
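
A small sketch of how such glue entities could be queried in both directions. The map is keyed by the glue entity's own id; `companies_for`, `contacts_for` and `demo_links` are hypothetical helper names, not part of any particular ECS library.

```rust
use std::collections::HashMap;

type Entity = u64;

struct CompanyContact {
    contact: Entity,
    company: Entity,
}

/// All companies a given person is a contact for, sorted by id.
fn companies_for(links: &HashMap<Entity, CompanyContact>, contact: Entity) -> Vec<Entity> {
    let mut companies: Vec<Entity> = links
        .values()
        .filter(|link| link.contact == contact)
        .map(|link| link.company)
        .collect();
    companies.sort();
    companies
}

/// All contacts registered for a given company, sorted by id.
fn contacts_for(links: &HashMap<Entity, CompanyContact>, company: Entity) -> Vec<Entity> {
    let mut contacts: Vec<Entity> = links
        .values()
        .filter(|link| link.company == company)
        .map(|link| link.contact)
        .collect();
    contacts.sort();
    contacts
}

/// Fixture: person 1 is a contact for companies 100 and 200, person 2 for 100.
/// The keys (1000..) are the glue entities, which carry no other components.
fn demo_links() -> HashMap<Entity, CompanyContact> {
    let mut links = HashMap::new();
    links.insert(1000, CompanyContact { contact: 1, company: 100 });
    links.insert(1001, CompanyContact { contact: 1, company: 200 });
    links.insert(1002, CompanyContact { contact: 2, company: 100 });
    links
}
```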
+
+### Strength: Extensibility
+
+Shifting gears a bit, I'd like to talk a bit about Kubernetes.
+
+I'm a big fan of Kubernetes, and especially its extensible API model and Controller/Operator patterns. One thing I *don't* like about Kubernetes is having to sprinkle annotations all over the place in order to get things to interoperate properly.
+
+As a brief introduction, Kubernetes models the world internally as *Objects*: Services, Pods, Deployments, and so on. An example:
+
+```yaml
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: my-website
+  labels: # The Service below selects Pods by this label
+    app.kubernetes.io/name: my-website
+spec:
+  containers:
+  - name: nginx
+    image: nginx:1.14.2
+    ports:
+    - name: http
+      containerPort: 80
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-service
+  annotations: # Here's your API, good luck!
+    some-key: some-value
+spec:
+  selector:
+    app.kubernetes.io/name: my-website
+  ports:
+    - protocol: TCP
+      port: 80
+      targetPort: http
+```
+
+The problem with this approach is that these objects are very much not extensible, except through this untyped metadata field known as *Annotations*.
+
+Annotations are simple key-value fields which can contain arbitrary (string) data, and are used by developers and Controllers (*systems*) to communicate with each other.
+
+For example, you might configure an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) object (which is used to route traffic from outside the cluster) with an annotation telling the [Certificate Manager](https://cert-manager.io/) to procure some Let's Encrypt certificates matching the hostname for the ingress:
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: my-ingress
+  annotations:
+    cert-manager.io/issuer: "letsencrypt"
+spec:
+  ingressClassName: nginx
+  rules:
+  - http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: my-service
+            port:
+              number: 80
+```
+
+As a Kubernetes administrator you get used to this type of thing, but it's such an ugly *hack*. These fields are stringly typed and not validated at all, so if you misspell the key or value, whatever you're trying to accomplish just won't work and you'll need to troubleshoot it somehow.
+
+But it didn't have to be this way! Annotations were never meant to be critical to the functioning of cluster resources, they're just *metadata*, but nonetheless this approach is everywhere in Kubernetes today, simply because it is the easiest way of augmenting existing resources.
+
+If Kubernetes had instead been built in an Entity-Component fashion, you could have represented these things in a *much* more extensible way!
+
+Instead of Pods and Containers, you could simply have `Pod` entities defining a context in which to run containers. Containers would exist as independent entities with `Image`, `SecurityContext` and `RunInContext` components pointing to a pod.
+
+When you want to expose a port from one of your containers, you could just attach an `Endpoint` component to your pods *or* containers and model each exposed port as its own entity (with reference to the *Container* entity), allowing `HttpRoutes` and `Certificate` components to be assigned to each of them.
+
+And why stop there? Instead of having distinct `DaemonSet`, `StatefulSet` and `Deployment` objects, wherein you duplicate the entire specification for a Pod, you could just create distinct deployment strategy components for each use case, or even express the statefulness of your deployment as yet another component, instead of having to build this information directly into your object definition.
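
To give the thought experiment some shape, here is a purely hypothetical sketch of a few such components and the query a certificate controller (a *system*) might run. None of this corresponds to real Kubernetes or cert-manager APIs: `Cluster`, `Issuer`, `certificate_requests` and `demo_cluster` are invented for illustration. The point is that the issuer is a typed value rather than a string annotation, so a misspelling becomes a compile error instead of a silent no-op.

```rust
use std::collections::HashMap;

type Entity = u64;

struct Image {
    reference: String,
}

/// Points a container entity at the pod entity it runs in.
struct RunInContext {
    pod: Entity,
}

/// An exposed port, modeled as its own entity referring back to a container.
struct Endpoint {
    container: Entity,
    port: u16,
}

/// A typed replacement for the `cert-manager.io/issuer` string annotation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Issuer {
    LetsEncrypt,
}

struct Certificate {
    issuer: Issuer,
}

#[derive(Default)]
struct Cluster {
    images: HashMap<Entity, Image>,
    run_in_context: HashMap<Entity, RunInContext>,
    endpoints: HashMap<Entity, Endpoint>,
    certificates: HashMap<Entity, Certificate>,
}

impl Cluster {
    /// Endpoint entities that also carry a `Certificate` component,
    /// returned as (entity, port, issuer) and sorted by entity id.
    fn certificate_requests(&self) -> Vec<(Entity, u16, Issuer)> {
        let mut requests: Vec<(Entity, u16, Issuer)> = self
            .certificates
            .iter()
            .filter_map(|(entity, certificate)| {
                self.endpoints
                    .get(entity)
                    .map(|endpoint| (*entity, endpoint.port, certificate.issuer))
            })
            .collect();
        requests.sort_by_key(|(entity, _, _)| *entity);
        requests
    }
}

/// Fixture: pod 1, container 2 running in it, endpoint 3 exposing port 80.
fn demo_cluster() -> Cluster {
    let mut cluster = Cluster::default();
    cluster.images.insert(2, Image { reference: "nginx:1.14.2".to_string() });
    cluster.run_in_context.insert(2, RunInContext { pod: 1 });
    cluster.endpoints.insert(3, Endpoint { container: 2, port: 80 });
    cluster.certificates.insert(3, Certificate { issuer: Issuer::LetsEncrypt });
    cluster
}
```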
+
+Of course this approach can lead to slightly confusing scenarios... Like what if you end up with an entity consisting only of `Image` and `Certificate`?
+
+### Weakness: Fluid Objects
+
+One of the key benefits of ECS is that we don't have to define exactly what an entity *is*, but this comes with a few downsides as well. As a huge fan of [sum types](https://en.wikipedia.org/wiki/Tagged_union) as implemented in Rust for example, I really appreciate being able to use pattern matching to know *exactly* what kind of object I'm dealing with. With the ECS approach, emergent behavior becomes possible, for better or [worse](https://www.bay12games.com/dwarves/mantisbt/view.php?id=9195).
+
+What this means in practice is that you need to be very precise when designing your components. Imagine you work for a winery and decide to delete customers younger than 18 by targeting all entities with the components `Name` and `Age`. You might inadvertently mangle half your inventory, because the stock-keeping department decided to use `Age` for a slightly different purpose.
+
+I would still argue that this is *better* than OOP, since the boundaries of what needs to be agreed upon are that much smaller, but it is a potential foot-gun.
+
+There are two ways to guard against this. One is to store data which absolutely does not belong to the same domain (inventory and people, for example) in separate databases entirely. The other, simpler option, which leaves room for some interoperability in the future (personalized bottles?), is to add *Markers* to differentiate between entities.
+
+The latter solution might sound like regressing to an OOP worldview, but the difference here is that these markers are composable. An entity could simultaneously be an `ECommerceCustomer` and an `Employee`, without causing a schism.
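
The winery foot-gun and the marker fix can be sketched as follows. This is a minimal illustration with hypothetical names (`World`, `named_and_under_18`, `customers_under_18`, `demo_world`), with the `ECommerceCustomer` marker stored as a set of entities since it carries no data.

```rust
use std::collections::{HashMap, HashSet};

type Entity = u64;

struct Name(String);
struct Age(u32);

#[derive(Default)]
struct World {
    names: HashMap<Entity, Name>,
    ages: HashMap<Entity, Age>,
    /// Marker component: no data, so membership in a set is enough.
    ecommerce_customers: HashSet<Entity>,
}

impl World {
    /// The naive query: every entity with a `Name` and an `Age` below 18.
    fn named_and_under_18(&self) -> Vec<Entity> {
        let mut hits: Vec<Entity> = self
            .ages
            .iter()
            .filter(|(entity, age)| age.0 < 18 && self.names.contains_key(*entity))
            .map(|(entity, _)| *entity)
            .collect();
        hits.sort();
        hits
    }

    /// The guarded query: the same filter, restricted to marked customers.
    fn customers_under_18(&self) -> Vec<Entity> {
        self.named_and_under_18()
            .into_iter()
            .filter(|entity| self.ecommerce_customers.contains(entity))
            .collect()
    }
}

/// Fixture: two customers (1 and 2) and one bottle of wine (3) whose `Age`
/// means something else entirely to the stock-keeping department.
fn demo_world() -> World {
    let mut world = World::default();
    world.names.insert(1, Name("Bob".to_string()));
    world.ages.insert(1, Age(17));
    world.ecommerce_customers.insert(1);

    world.names.insert(2, Name("Carol".to_string()));
    world.ages.insert(2, Age(42));
    world.ecommerce_customers.insert(2);

    world.names.insert(3, Name("Merlot Reserve".to_string()));
    world.ages.insert(3, Age(3));
    world
}
```

With this fixture the naive query matches both the 17-year-old customer and the three-year-old bottle, while the marker-restricted query matches only the customer.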
+
+
+
+
+
+## Archetypes