Algebraic Property Graphs
npm i @underlay/apg
@underlay/apg is a TypeScript implementation of an algebraic graph data model, generally adapted from a paper by Shinavier and Wisnesky. This repo contains type definitions for schemas, values, and mappings (schema-to-schema transformations) for the data model, along with functions for creating, manipulating, validating, and applying them.
Schemas are self-hosting, which means that schemas themselves are serialized as instances of a "schema schema".
The bird's-eye view is that this library defines a collection of structures that you can use to model, serialize, and parse data, similar to JSON or Protobuf. The reason you'd want to do this is that this particular data model is a little bit magical: it's unusually good at representing most other data models, and it also gives us a small grammar of schema mappings that we can use to transform, migrate, and integrate data more reliably than we could by writing ad-hoc code.
Type | Value | Expression |
---|---|---|
reference | Pointer | dereference |
uri | NamedNode | identifier |
literal | Literal | constant |
product | Record | tuple / projection |
coproduct | Variant | injection / match |
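To make the Expression column a little more concrete, here is a minimal sketch of what each expression might do to a value. It assumes the value shapes from the Instance namespace described further down (reference values are indices, products hold named components, coproducts hold one tagged option). The helper names simply mirror the table; they are hypothetical and are not exports of @underlay/apg.

```typescript
// Value shapes copied from the Instance namespace in this README.
type Value =
  | { kind: "reference"; index: number }
  | { kind: "uri"; value: string }
  | { kind: "literal"; value: string }
  | { kind: "product"; components: Record<string, Value> }
  | { kind: "coproduct"; option: string; value: Value }

// Hypothetical helpers named after the Expression column; not @underlay/apg exports.

// constant: produce a literal value
function constant(value: string): Value {
  return { kind: "literal", value }
}

// identifier: produce a uri value
function identifier(value: string): Value {
  return { kind: "uri", value }
}

// tuple: build a product value from named components
function tuple(components: Record<string, Value>): Value {
  return { kind: "product", components }
}

// projection: read one named component out of a product value
function projection(key: string, value: Value): Value {
  if (value.kind !== "product") throw new Error("projection expects a product")
  const component = value.components[key]
  if (component === undefined) throw new Error(`missing component ${key}`)
  return component
}

// injection: tag a value as one option of a coproduct
function injection(option: string, value: Value): Value {
  return { kind: "coproduct", option, value }
}

// match: run the handler registered for a coproduct's option
function match<T>(value: Value, cases: Record<string, (inner: Value) => T>): T {
  if (value.kind !== "coproduct") throw new Error("match expects a coproduct")
  const handler = cases[value.option]
  if (handler === undefined) throw new Error(`no case for option ${value.option}`)
  return handler(value.value)
}

// dereference: follow a reference (an index) into the values stored for its target label
function dereference(value: Value, values: Value[]): Value {
  if (value.kind !== "reference") throw new Error("dereference expects a reference")
  const target = values[value.index]
  if (target === undefined) throw new Error(`dangling reference ${value.index}`)
  return target
}
```

The pairs in the table fall out naturally: tuple and projection build and take apart products, while injection and match build and take apart coproducts.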
There are three basic kinds of structures defined in src/apg.ts: labels, which are terms in a grammar of types; schemas, which are collections of labels; and instances, which are collections of values.
A schema is a set of labels, each of which has an associated type:
Here, the grey rectangles are labels and the white ellipses are types. The broad intuition is that types can be complex structures composed of other types, and that labels are like "handles" or variables to which some of those types are assigned.
There are a few different kinds of types. Primitive (or "scalar") types are types like "number" or "string". Then there are two kinds of composite types, which are made up of other types. Lastly, there are reference types, which point back up to one of the labels.
Type | Kind | Interpretation |
---|---|---|
reference | primitive | label, pointer, recursion |
uri | primitive | RDF Named Nodes, "identifier", "key" |
literal | primitive | RDF Literals, "value" |
product | composite | tuple, record, struct, "AND" |
coproduct | composite | sum, variant, union, "OR" |
Literal types are "configured" with a fixed datatype. In other words, there's no generic "RDF literal" type - literal types are always "RDF literals with datatype ${some IRI}". Similarly, products and coproducts are "configured" to be over a fixed, finite set of other types, and references are configured to point to a fixed label in the same schema.
Except for references, there can't be any cycles in the "type tree" - for example, a product can't have itself as a child component. In this sense, labels can work like explicit "re-entry points" for recursive schemas.
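As an illustration of labels acting as re-entry points, the sketch below encodes a recursive list of strings: the only way the list type refers to itself is through a reference to its own label, so the nested type tree stays finite. It reuses the type shapes from the Schema namespace shown below; the example.com URIs, the choice of the empty product as "nil", and the checkSchema helper (which only looks for references to labels that don't exist) are all assumptions made for this sketch, not part of the library.

```typescript
// Type shapes copied from the Schema namespace in this README.
type Type =
  | { kind: "reference"; value: string }
  | { kind: "uri" }
  | { kind: "literal"; datatype: string }
  | { kind: "product"; components: { [key: string]: Type } }
  | { kind: "coproduct"; options: { [key: string]: Type } }

type Schema = Record<string, Type>

// Illustrative URIs; the example.com keys are made up for this sketch.
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
const LIST = "http://example.com/list"

// A list of strings is either nil (the empty product) or a cons cell whose
// tail is a reference back to the LIST label, so the type tree stays finite.
const listSchema: Schema = {
  [LIST]: {
    kind: "coproduct",
    options: {
      "http://example.com/nil": { kind: "product", components: {} },
      "http://example.com/cons": {
        kind: "product",
        components: {
          "http://example.com/head": { kind: "literal", datatype: XSD_STRING },
          "http://example.com/tail": { kind: "reference", value: LIST },
        },
      },
    },
  },
}

// Collect every reference inside a type whose target is not a label of the schema.
function danglingReferences(schema: Schema, type: Type): string[] {
  switch (type.kind) {
    case "reference":
      return type.value in schema ? [] : [type.value]
    case "uri":
    case "literal":
      return []
    case "product":
      return Object.values(type.components).flatMap((t) => danglingReferences(schema, t))
    case "coproduct":
      return Object.values(type.options).flatMap((t) => danglingReferences(schema, t))
  }
}

// Hypothetical check: returns the dangling reference targets across all labels.
function checkSchema(schema: Schema): string[] {
  return Object.values(schema).flatMap((type) => danglingReferences(schema, type))
}
```

Because the recursion in danglingReferences only descends into components and options, and stops at references, it always terminates, which is exactly what the no-cycles rule buys.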
So how is all of this represented?
A schema is a map from URI keys to Type values, and there are five kinds of types.
namespace Schema {
type Schema = Record<string, Type>
type Type = Reference | Uri | Literal | Product | Coproduct
type Reference = { kind: "reference"; value: string }
type Uri = { kind: "uri" }
type Literal = { kind: "literal"; datatype: string }
type Product = { kind: "product"; components: { [key: string]: Type } }
type Coproduct = { kind: "coproduct"; options: { [key: string]: Type } }
}
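Since Type is a discriminated union on kind, code that consumes schemas is typically an exhaustive switch over the five kinds. The sketch below renders a type as a compact string and builds a small hypothetical person schema with a name, a homepage, and an optional friend. The show function, its notation, and the example.com URIs are invented for illustration and are not part of @underlay/apg.

```typescript
// Type shapes copied from the Schema namespace above.
type Type =
  | { kind: "reference"; value: string }
  | { kind: "uri" }
  | { kind: "literal"; datatype: string }
  | { kind: "product"; components: { [key: string]: Type } }
  | { kind: "coproduct"; options: { [key: string]: Type } }

type Schema = Record<string, Type>

// Render a type as a compact string; the notation is invented for this sketch.
function show(type: Type): string {
  switch (type.kind) {
    case "reference":
      return `*${type.value}`
    case "uri":
      return "uri"
    case "literal":
      return `literal<${type.datatype}>`
    case "product": {
      const parts = Object.entries(type.components).map(([key, t]) => `${key}: ${show(t)}`)
      return `{${parts.join(", ")}}`
    }
    case "coproduct": {
      const parts = Object.entries(type.options).map(([key, t]) => `${key}: ${show(t)}`)
      return `(${parts.join(" | ")})`
    }
  }
}

// A hypothetical one-label schema: an optional friend is a coproduct of
// "none" (the empty product) and "some" (a reference back to the person label).
const ex = "http://example.com/"
const xsdString = "http://www.w3.org/2001/XMLSchema#string"
const schema: Schema = {
  [`${ex}person`]: {
    kind: "product",
    components: {
      [`${ex}name`]: { kind: "literal", datatype: xsdString },
      [`${ex}homepage`]: { kind: "uri" },
      [`${ex}friend`]: {
        kind: "coproduct",
        options: {
          [`${ex}none`]: { kind: "product", components: {} },
          [`${ex}some`]: { kind: "reference", value: `${ex}person` },
        },
      },
    },
  },
}

for (const [label, type] of Object.entries(schema)) {
  console.log(label, "=", show(type))
}
```

If a sixth kind were ever added to Type, the compiler would flag show as no longer returning on every path, which is one practical reason to keep the union closed.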
The "parts" of a product type are called components, and the parts of a coproduct type are called options.
So we've seen how schemas and types are represented - what do values of those types look like?
namespace Instance {
type Value = Reference | Uri | Literal | Product | Coproduct
type Reference = { kind: "reference"; index: number }
type Uri = { kind: "uri"; value: string }
type Literal = { kind: "literal"; value: string }
type Product = { kind: "product"; components: Record<string, Value> }
type Coproduct = { kind: "coproduct"; option: string; value: Value }
type Instance = Record<string, Value[]>
}
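One way to read these two namespaces together: an instance stores, for each label, an array of values of that label's type, and a reference value is an index into the array for the label its type points to. The sketch below checks an instance against a schema under that reading. The validateValue and validateInstance helpers are hypothetical rather than the library's own validation functions, and they check only structure; a literal's lexical form is not checked against its datatype.

```typescript
// Type and value shapes copied from the Schema and Instance namespaces above.
type Type =
  | { kind: "reference"; value: string }
  | { kind: "uri" }
  | { kind: "literal"; datatype: string }
  | { kind: "product"; components: { [key: string]: Type } }
  | { kind: "coproduct"; options: { [key: string]: Type } }
type Schema = Record<string, Type>
type Value =
  | { kind: "reference"; index: number }
  | { kind: "uri"; value: string }
  | { kind: "literal"; value: string }
  | { kind: "product"; components: Record<string, Value> }
  | { kind: "coproduct"; option: string; value: Value }
type Instance = Record<string, Value[]>

// Hypothetical validator: does `value` fit `type`? A reference value must be
// a valid index into instance[label] for the label the reference type names.
function validateValue(instance: Instance, type: Type, value: Value): boolean {
  switch (type.kind) {
    case "reference": {
      if (value.kind !== "reference") return false
      const targets = instance[type.value]
      return targets !== undefined && Number.isInteger(value.index) && value.index >= 0 && value.index < targets.length
    }
    case "uri":
      return value.kind === "uri"
    case "literal":
      // Only the shape is checked; the lexical form is not validated against type.datatype.
      return value.kind === "literal"
    case "product": {
      if (value.kind !== "product") return false
      const expected = type.components
      const actual = value.components
      const keys = Object.keys(expected)
      // Same number of keys plus every expected key present means identical key sets.
      if (keys.length !== Object.keys(actual).length) return false
      return keys.every((key) => {
        const t = expected[key]
        const v = actual[key]
        return t !== undefined && v !== undefined && validateValue(instance, t, v)
      })
    }
    case "coproduct": {
      if (value.kind !== "coproduct") return false
      const option = type.options[value.option]
      return option !== undefined && validateValue(instance, option, value.value)
    }
  }
}

// Valid when the instance has exactly the schema's labels and every stored value fits its label's type.
function validateInstance(schema: Schema, instance: Instance): boolean {
  if (Object.keys(instance).length !== Object.keys(schema).length) return false
  return Object.entries(schema).every(([label, type]) => {
    const values = instance[label]
    return values !== undefined && values.every((value) => validateValue(instance, type, value))
  })
}

// Illustrative data: two people who know each other (URIs are made up).
const PERSON = "http://example.com/person"
const NAME = "http://example.com/name"
const KNOWS = "http://example.com/knows"
const exampleSchema: Schema = {
  [PERSON]: {
    kind: "product",
    components: {
      [NAME]: { kind: "literal", datatype: "http://www.w3.org/2001/XMLSchema#string" },
      [KNOWS]: { kind: "reference", value: PERSON },
    },
  },
}
const exampleInstance: Instance = {
  [PERSON]: [
    { kind: "product", components: { [NAME]: { kind: "literal", value: "Ada" }, [KNOWS]: { kind: "reference", index: 1 } } },
    { kind: "product", components: { [NAME]: { kind: "literal", value: "Alan" }, [KNOWS]: { kind: "reference", index: 0 } } },
  ],
}
```

Representing references as indices is what lets the example express a cycle (Ada knows Alan, who knows Ada) without any nested object pointing back into itself.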