EDN-LD is a set of conventions and a library for working with Linked Data (LD) using Extensible Data Notation (EDN) and the Clojure programming language. EDN-LD builds on EDN and JSON-LD, but is not otherwise affiliated with those projects.
This project is in early development!
Linked data is an approach to working with data on the Web:
- instead of tables we have graphs -- networks of data
- instead of rows we have resources -- nodes in the graph
- the values in our cells are also nodes -- either resources or literals: strings, numbers, dates
- and instead of columns we have named relations that link nodes to form the graph
Just think of your tables as big sets of row-column-cell "triples". By switching from rigid tables to flexible graphs, we can easily merge data from across the web.
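The "table as triples" idea can be sketched in plain Clojure. This is only an illustration, not part of EDN-LD: the `table` value, its second row, and the `table->triples` and `merge-graphs` names are made up for this example, and rows are named by their index rather than by an IRI.

```clojure
;; A tiny, made-up table: column names plus rows of cells.
(def table
  {:columns [:title :author]
   :rows    [["The Iliad" "Homer"]
             ["The Odyssey" "Homer"]]})

(defn table->triples
  "Returns one [row column cell] vector per cell, naming each row by its index."
  [{:keys [columns rows]}]
  (for [[i row] (map-indexed vector rows)
        [column cell] (map vector columns row)]
    [i column cell]))

(defn merge-graphs
  "Merging graphs is set union: duplicate triples collapse into one."
  [& triple-seqs]
  (reduce into #{} triple-seqs))

(table->triples table)
;; => ([0 :title "The Iliad"] [0 :author "Homer"]
;;     [1 :title "The Odyssey"] [1 :author "Homer"])
```

Because each triple stands on its own, combining two sources needs no schema alignment: `(merge-graphs triples-a triples-b)` is simply the union of the two sets.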
Linked data is simple. The tools for working with it are powerful: big Java libraries such as Jena, Sesame, OWLAPI, etc. Unfortunately, most of the tools are not simple.
EDN-LD is a simple linked data tool.
EDN-LD is a Clojure library. The easiest way to get started is to use Leiningen and add this to your project.clj dependencies:
[edn-ld "0.2.1"]
Try out EDN-LD with our interactive online tutorial, or by cloning this project and starting a REPL:
$ git clone https://github.com/ontodev/edn-ld.git
$ cd edn-ld
$ lein repl
nREPL server started ...
user=> (use 'edn-ld.core 'edn-ld.common)
nil
user=> (require '[clojure.string :as string])
nil
user=> "Ready!"
Ready!
Say we have a (very small) table of books and their authors called books.tsv:
Title | Author |
---|---|
The Iliad | Homer |
A common way to represent this in Clojure is as a list of maps, with the column names as the keys. We can slurp and split the data until we get what we want:
user=> (defn split-row [row] (string/split row #"\t"))
#'user/split-row
user=> (defn read-tsv [path] (->> path slurp string/split-lines (drop 1) (mapv split-row)))
#'user/read-tsv
user=> (def rows (read-tsv "test-resources/books.tsv"))
#'user/rows
user=> rows
[["The Iliad" "Homer"]]
Now we use zipmap to associate keys with values:
user=> (def data (mapv (partial zipmap [:title :author]) rows))
#'user/data
user=> data
[{:title "The Iliad", :author "Homer"}]
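If you would rather not hard-code the keys, a variant can take them from the header row. The following is a minimal sketch that works on a string instead of a file so it can be pasted into any REPL; `parse-tsv` is a hypothetical helper, not an EDN-LD function.

```clojure
(require '[clojure.string :as string])

(defn parse-tsv
  "Splits tab-separated text into maps, using the lower-cased header row as keys."
  [text]
  (let [[header & rows] (map #(string/split % #"\t") (string/split-lines text))
        ks (map (comp keyword string/lower-case) header)]
    (mapv #(zipmap ks %) rows)))

(parse-tsv "Title\tAuthor\nThe Iliad\tHomer")
;; => [{:title "The Iliad", :author "Homer"}]
```

Note that `string/split` drops trailing empty fields, so a row whose last cell is blank will simply lack that key.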
We have the data in a convenient shape, but what does it mean? Well, there's some resource that has "The Iliad" as its title, and some guy named "Homer" who is the author of that resource. We also know from the context that it's a book.
The first thing to do is give names to our resources. Linked data names are IRIs: globally unique identifiers that generalize the familiar URL you see in your browser's location bar. We can use some standard names for our relations from the Dublin Core metadata standard, and we'll make up some more.
Name | IRI |
---|---|
title | http://purl.org/dc/elements/1.1/title |
author | http://purl.org/dc/elements/1.1/author |
The Iliad | http://example.com/the-iliad |
Homer | http://example.com/Homer |
book | http://example.com/book |
IRIs can be long and cumbersome, so let's define some prefixes that we can use to shorten them:
Prefix | IRI |
---|---|
dc | http://purl.org/dc/elements/1.1/ |
ex | http://example.com/ |
The ex prefix will be our default. We use strings for full IRIs and keywords when we're using some sort of contraction.
IRI | Contraction |
---|---|
http://purl.org/dc/elements/1.1/title | :dc:title |
http://purl.org/dc/elements/1.1/author | :dc:author |
http://example.com/the-iliad | :the-iliad |
http://example.com/Homer | :Homer |
http://example.com/book | :book |
We'll put this naming information in a context map:
user=> (def context {:dc "http://purl.org/dc/elements/1.1/", :ex "http://example.com/", nil :ex, :title :dc:title, :author :dc:author})
#'user/context
The nil key indicates the default prefix :ex. Now we can use the context to expand contractions and to contract IRIs:
user=> (expand context :title)
http://purl.org/dc/elements/1.1/title
user=> (expand context :Homer)
http://example.com/Homer
user=> (contract context "http://purl.org/dc/elements/1.1/title")
:title
user=> (contract context "http://purl.org/dc/elements/1.1/foo")
:dc:foo
user=> (expand-all context data)
[{"http://purl.org/dc/elements/1.1/title" "The Iliad", "http://purl.org/dc/elements/1.1/author" "Homer"}]
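To make the lookup rules concrete, here is a rough sketch of how expansion and contraction against such a context could work. It is not EDN-LD's implementation: `expand-sketch` and `contract-sketch` are hypothetical names, and the sketch assumes a context with exactly the three kinds of entries used above (prefix to IRI string, alias keyword to keyword, and `nil` to the default prefix). Unknown prefixes and alias cycles are not handled.

```clojure
(require '[clojure.string :as string])

(def context
  {:dc "http://purl.org/dc/elements/1.1/"
   :ex "http://example.com/"
   nil :ex
   :title :dc:title
   :author :dc:author})

(defn expand-sketch
  "Hypothetical: expands a keyword contraction to a full IRI string."
  [context x]
  (if-not (keyword? x)
    x                                                   ; strings are already full IRIs
    (let [target (get context x)
          [prefix local] (string/split (name x) #":" 2)]
      (cond
        (keyword? target) (expand-sketch context target) ; alias, e.g. :title
        local (str (get context (keyword prefix)) local) ; prefixed, e.g. :dc:title
        :else (str (get context (get context nil)) (name x)))))) ; default prefix

(defn contract-sketch
  "Hypothetical: contracts an IRI string to a keyword, preferring aliases."
  [context iri]
  (let [prefixes (for [[k v] context :when (and (keyword? k) (string? v))] [k v])
        aliases  (into {} (for [[k v] context :when (and (keyword? k) (keyword? v))] [v k]))
        [prefix base] (first (filter (fn [[_ b]] (string/starts-with? iri b)) prefixes))]
    (if-not prefix
      iri                                               ; no known prefix: leave as-is
      (let [local (subs iri (count base))
            contraction (if (= prefix (get context nil))
                          (keyword local)
                          (keyword (str (name prefix) ":" local)))]
        (get aliases contraction contraction)))))

(expand-sketch context :title)   ;; => "http://purl.org/dc/elements/1.1/title"
(expand-sketch context :Homer)   ;; => "http://example.com/Homer"
(contract-sketch context "http://purl.org/dc/elements/1.1/foo") ;; => :dc:foo
```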
Sometimes we also want to resolve a name to an IRI. We can define a resources map from strings to IRIs or contractions:
user=> (def resources {"Homer" :Homer, "The Iliad" :the-iliad})
#'user/resources
We should include this information in our data by assigning a special :subject-iri to each of our maps. We can do this one at a time with assoc:
user=> (def book (assoc (first data) :subject-iri :the-iliad))
#'user/book
user=> book
{:title "The Iliad", :author "Homer", :subject-iri :the-iliad}
Or we can use a higher-order function to look up each title in the resources map:
user=> (def books (mapv #(assoc % :subject-iri (get resources (:title %))) data))
#'user/books
user=> books
[{:title "The Iliad", :author "Homer", :subject-iri :the-iliad}]
Now it's time to convert our book data to "triples", i.e. statements about things to put in our graph. A triple consists of a subject, a predicate, and an object:
- the subject is the name of a resource: an IRI
- the predicate is the name of a relation: also an IRI
- the object can either be an IRI or literal data.
We represent an IRI with a string, or a contracted IRI with a keyword. We represent literal data as a map with special keys:
- :value is the string value ("lexical value") of the data, e.g. "The Iliad", "100.31"
- :type is the IRI of a data type, with xsd:string as the default
- :lang is an optional language code, e.g. "en", "en-uk"
The literal function is a convenient way to create a literal map:
user=> (literal "The Iliad")
{:value "The Iliad"}
user=> (literal 100.31)
{:value "100.31", :type :xsd:float}
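As a rough picture of what building such a map involves, here is a plain-Clojure sketch. `literal-sketch` is a hypothetical stand-in rather than the library's `literal`: the `:xsd:integer` mapping and the two-argument language form are assumptions made for this example, and only the string and float cases mirror the output shown above.

```clojure
(defn literal-sketch
  "Hypothetical: builds a literal map with :value and, where needed, :type or :lang."
  ([value]
   (cond
     (string? value)  {:value value}                            ; xsd:string is the default
     (integer? value) {:value (str value) :type :xsd:integer}   ; assumed mapping
     (float? value)   {:value (str value) :type :xsd:float}
     :else            {:value (str value)}))
  ([value lang]
   {:value (str value) :lang lang}))                            ; assumed arity

(literal-sketch "The Iliad")       ;; => {:value "The Iliad"}
(literal-sketch 100.31)            ;; => {:value "100.31", :type :xsd:float}
(literal-sketch "The Iliad" "en")  ;; => {:value "The Iliad", :lang "en"}
```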
The objectify function takes a resource map and a value, and determines whether to convert the value to an IRI or a literal:
user=> (objectify resources "Some string")
{:value "Some string"}
user=> (objectify resources "Homer")
:Homer
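The decision shown above amounts to a map lookup with a literal as the fallback. A minimal sketch, assuming the value is either a key of the resources map or plain data (`objectify-sketch` is a hypothetical name, and the real function may apply further rules):

```clojure
(def resources {"Homer" :Homer, "The Iliad" :the-iliad})

(defn objectify-sketch
  "Hypothetical: returns the resource named by value, or else a literal map."
  [resources value]
  (get resources value {:value (str value)}))

(objectify-sketch resources "Some string")  ;; => {:value "Some string"}
(objectify-sketch resources "Homer")        ;; => :Homer
```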
Now we can treat each map as a set of statements about a resource, and triplify it to a lazy sequence of triples. Each triple is a vector with slots for subject, predicate, and object; a literal object is a map with the special keys described above.
The triplify function takes our resource map and a map of data that includes a :subject-iri key. It returns a lazy sequence of triples.
user=> (def triples (triplify resources book))
#'user/triples
user=> (vec triples)
[[:the-iliad :title {:value "The Iliad"}] [:the-iliad :author :Homer]]
You'll notice that the subject :the-iliad is repeated here. With a larger set of triples the redundancy will be greater. Instead we can use a nested data structure:
user=> (def subjects (subjectify triples))
#'user/subjects
user=> subjects
{:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}
From the inside out, it works like this:
- object-set: the set of objects with the same subject and predicate
- predicate-map: a map from predicate IRIs to object sets
- subject-map: a map from subject IRIs to predicate maps
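The two shapes, a sequence of triples and the nested subject map, can be sketched in plain Clojure as follows. The `-sketch` functions are hypothetical and are not EDN-LD's code. One assumption matters: this sketch resolves every value it finds in the lookup map, so it is given a map of author names only (`authors`, made up here) to keep the title a literal.

```clojure
(defn objectify-sketch
  "Hypothetical: a resource if value is a known name, otherwise a literal map."
  [resources value]
  (get resources value {:value (str value)}))

(defn triplify-sketch
  "Hypothetical: one [subject predicate object] vector per key other than :subject-iri."
  [resources m]
  (let [subject (:subject-iri m)]
    (for [[predicate value] (dissoc m :subject-iri)]
      [subject predicate (objectify-sketch resources value)])))

(defn subjectify-sketch
  "Hypothetical: nests triples as subject -> predicate -> set of objects."
  [triples]
  (reduce (fn [subject-map [s p o]]
            (update-in subject-map [s p] (fnil conj #{}) o))
          {}
          triples))

(def authors {"Homer" :Homer})
(def book {:title "The Iliad", :author "Homer", :subject-iri :the-iliad})

(subjectify-sketch (triplify-sketch authors book))
;; => {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}
```

Using `(fnil conj #{})` means the first object for a subject and predicate creates the set, and later ones are added to it, so duplicate statements collapse automatically.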
We work with these data structures like any other Clojure data, using merge, assoc, update, and the rest of the standard Clojure toolkit:
user=> (def context+ (merge default-context context))
#'user/context+
user=> (def subjects+ (assoc-in subjects [:the-iliad :rdf:type] #{:book}))
#'user/subjects+
user=> (def triples+ (conj triples [:the-iliad :rdf:type :book]))
#'user/triples+
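Because a subject map is ordinary nested data, simple queries need nothing beyond core functions. The helpers below are hypothetical examples written for this illustration (`subjects-with` and `subject-map->triples` are not EDN-LD functions):

```clojure
(def subjects
  {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}})

(defn subjects-with
  "Returns the set of subjects that have the given object for the given predicate."
  [subject-map predicate object]
  (set (for [[subject predicate-map] subject-map
             :when (contains? (get predicate-map predicate #{}) object)]
         subject)))

(defn subject-map->triples
  "Walks the nested structure back out into [subject predicate object] vectors."
  [subject-map]
  (for [[s predicate-map] subject-map
        [p objects] predicate-map
        o objects]
    [s p o]))

(subjects-with subjects :author :Homer)        ;; => #{:the-iliad}
(get-in subjects [:the-iliad :title])          ;; => #{{:value "The Iliad"}}
```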
Now, we can write to standard linked data formats, such as Turtle:
user=> (def prefixes (assoc (get-prefixes context) :rdf rdf :xsd xsd))
#'user/prefixes
user=> (def expanded-triples (map #(expand-all context+ %) triples+))
#'user/expanded-triples
user=> (edn-ld.jena/write-triple-string prefixes expanded-triples)
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
ex:the-iliad a ex:book ;
dc:author ex:Homer ;
dc:title "The Iliad"^^xsd:string .
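To see what a serializer has to do with expanded triples, here is a deliberately small sketch that prints them in the line-oriented N-Triples style instead of Turtle. It is not how edn-ld.jena works and is no substitute for it: `nt-term` and `nt-line` are hypothetical names, IRIs and types are assumed to be fully expanded strings, and only backslashes, double quotes, and line breaks are escaped.

```clojure
(require '[clojure.string :as string])

(defn nt-term
  "Hypothetical: formats an expanded IRI string or a literal map as one term."
  [x]
  (if (map? x)
    (let [{:keys [value type lang]} x
          escaped (-> value
                      (string/replace "\\" "\\\\")
                      (string/replace "\"" "\\\"")
                      (string/replace "\n" "\\n")
                      (string/replace "\r" "\\r"))]
      (cond
        lang  (str "\"" escaped "\"@" lang)
        type  (str "\"" escaped "\"^^<" type ">")
        :else (str "\"" escaped "\"")))
    (str "<" x ">")))

(defn nt-line
  "Hypothetical: formats one expanded [subject predicate object] triple as a line."
  [[s p o]]
  (str (nt-term s) " " (nt-term p) " " (nt-term o) " ."))

(nt-line ["http://example.com/the-iliad"
          "http://purl.org/dc/elements/1.1/title"
          {:value "The Iliad"}])
;; => "<http://example.com/the-iliad> <http://purl.org/dc/elements/1.1/title> \"The Iliad\" ."
```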
One more thing before we're done: named graphs. A graph is just a set of triples. When we want to talk about a particular graph, we give it a name: an IRI, of course. Then we can talk about sets of named graphs when we want to compare them, merge them, etc. The official name for a set of graphs is an "RDF dataset". A dataset includes a "default graph" with no name.
By adding the name of a graph, our triples become quads ("quadruples"). We define a quad and some new functions to handle them.
user=> (def library [(assoc book :graph-iri :library)])
#'user/library
user=> library
[{:title "The Iliad", :author "Homer", :subject-iri :the-iliad, :graph-iri :library}]
user=> (def quads (quadruplify-all resources library))
#'user/quads
user=> (vec quads)
[[:library :the-iliad :title {:value "The Iliad"}] [:library :the-iliad :author :Homer]]
user=> (graphify quads)
{:library {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}}
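The nesting for quads follows the same pattern as for triples, with one more level for the graph name. A minimal plain-Clojure sketch (`graphify-sketch` is a hypothetical name, not the library's implementation):

```clojure
(defn graphify-sketch
  "Hypothetical: nests quads as graph -> subject -> predicate -> set of objects."
  [quads]
  (reduce (fn [graph-map [g s p o]]
            (update-in graph-map [g s p] (fnil conj #{}) o))
          {}
          quads))

(def quads
  [[:library :the-iliad :title {:value "The Iliad"}]
   [:library :the-iliad :author :Homer]])

(def graphs (graphify-sketch quads))
;; => {:library {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}}

(get graphs :library)
;; => {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}
```

Looking up a graph name returns an ordinary subject map, so anything that works on triples grouped by subject also works on a single named graph.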
- 0.2.1
  - fix bug in edn-ld.jena/make-node
- 0.2.0
  - use Apache Jena for reading and writing
  - fix triplify functions to use :subject-iri key
  - add quadruplify and graphify functions, using :graph-iri key
  - rename squash functions to flatten
  - fix flatten functions
  - many more unit tests
  - prefer Triples to FlatTriples
- 0.1.0
  - first release
- finish streaming RDFXML reader and writer
- ClojureScript support? Would require different libraries for reading and writing
Copyright © 2015 James A. Overton
Distributed under the BSD 3-Clause License.