Data Rich Documents

rufuspollock · 2023-10-31T22:38:36Z

rufuspollock
Oct 31, 2023
Maintainer

Data Rich Documents is a specification for authoring and rendering content with data.

Data Rich Documents are markdown files with defined extensions for embedding or linking data and for presenting data in tables, graphs and maps.

Demo

TODO ... (create a website using flowershow and sample content - an update of https://github.com/datopian/data-literate)

Motivation

I want to turn a (data-oriented) markdown file into an elegant data-driven web page

I want to publish a dataset to the web quickly, elegantly (and potentially interactively)

A data-oriented markdown file is the kind of thing found in https://github.com/datasets/awesome-data or in one of its issues at https://github.com/datasets/awesome-data/issues

We want things like:

Auto-conversion of links to csv or excel files into nice tables or previews
Create charts using react components (in markdown) a la MDX
Have datasets as markdown files using the frontmatter for metadata
Datasets within markdown files using backtick notation e.g. ```dataset
Embedded csv or json data turned into nice tables
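
To make the frontmatter and backtick-notation ideas a bit more concrete, here is a rough sketch in Python. Nothing in it comes from an existing implementation: the frontmatter keys (`title`, `license`), the CSV payload inside the `dataset` block, and the function names `split_frontmatter` and `extract_datasets` are all assumptions for illustration. The frontmatter parser only handles flat `key: value` lines, not full YAML.

```python
import csv
import io
import re

# Built programmatically so the sample document can live inside a fenced block.
FENCE = "`" * 3

# Hypothetical document: the frontmatter keys and the CSV body are assumptions.
DOC = f"""---
title: Sample dataset
license: CC0
---

Some notes about the data.

{FENCE}dataset
year,value
2020,1.5
2021,2.0
{FENCE}
"""


def split_frontmatter(text):
    """Return (metadata, body). Handles flat 'key: value' lines only, not full YAML."""
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        return {}, text
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, match.group(2)


def extract_datasets(body):
    """Parse every fenced block labelled 'dataset' as CSV into a list of row dicts."""
    pattern = re.compile(
        rf"^{FENCE}dataset\n(.*?)\n{FENCE}\s*$", re.DOTALL | re.MULTILINE
    )
    return [list(csv.DictReader(io.StringIO(block))) for block in pattern.findall(body)]


meta, body = split_frontmatter(DOC)
datasets = extract_datasets(body)
```

Here `meta` holds the dataset metadata and `datasets` holds one list of rows per `dataset` block, which a renderer could then turn into a table or chart.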

We term this kind of markdown document, with additional data-oriented functionality, "data rich".

Our "data rich" document is a markdown (or MDX) file with the following additional features:

Table specification and rendering
Graph specification and rendering
Map specification and rendering
Auto-render linked tables: auto-convert links to CSV or Excel into inline tables. [in progress]
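
As a minimal sketch of the auto-render idea, assuming CSV only (Excel is not handled) and assuming each CSV link ends its paragraph, something like the following would work. The `load` hook, the `FAKE_FILES` stand-in and the function names are hypothetical; a real tool would fetch the URL and would need to escape `|` characters in cells.

```python
import csv
import io
import re

# Matches markdown links whose target ends in .csv, e.g. [label](data.csv)
CSV_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+\.csv)\)")


def csv_to_markdown_table(csv_text):
    """Render CSV text as a markdown pipe table; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *data = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in data]
    return "\n".join(lines)


def render_csv_links(markdown, load):
    """Keep each CSV link and append its data as an inline table.

    `load` is a hypothetical hook mapping a URL to CSV text.
    """
    def replace(match):
        url = match.group(2)
        return match.group(0) + "\n\n" + csv_to_markdown_table(load(url))

    return CSV_LINK.sub(replace, markdown)


# Stand-in for fetching over HTTP, so the sketch runs without network access.
FAKE_FILES = {"gdp.csv": "year,gdp\n2020,1.5\n2021,2.0\n"}
rendered = render_csv_links(
    "Source: [GDP](gdp.csv)\n\nMore notes.", FAKE_FILES.__getitem__
)
```

Passing the loader in as a function keeps the document itself purely declarative: the markdown only names the URL, and how the data is fetched is left to the renderer.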

Use cases (in full)

Researching or writing up a (data-heavy) topic and sharing it

I want to jot down notes and links, preview data, display or graph data, etc. I want to do this in markdown as that's what I work in. I want to be able to preview this and then share it with others.

Curating data

I want to quickly turn some data I've found into a properly curated dataset. I might want to do this iteratively: starting with the minimum, e.g. just a link and a few notes, then moving to caching the data, then previewing the data, etc. This is the kind of thing you find in the issues at github.com/datasets/awesome-data/issues. I want to view this dataset as I'm developing it and publish it when it's done (or as I go along).

Content + Data are naturally co-occurring in many settings

In reality, data and content usually go together. For example, consider two main and common use cases:

Data-driven content: this is the case of research, data journalism, high end visualization etc.
Data with content: most datasets end up with some content in the form of documentation. In fact, many datasets start out as content e.g. a few notes and a link to a possible data source and then evolve into a fully polished dataset.

Desired features of the tooling:

Extensible
Open source
Simple

How is this different from ...

One big difference is that we have no interest in tackling interactivity the way Jupyter or ObservableHQ do. The aim is for the documents and the data themselves to be static. Any processing of them is done elsewhere, and any loading and processing of data is purely declarative (i.e. here's the URL of the data).

Inspiration

Bl.ocks https://bl.ocks.org/
ObservableHQ https://observablehq.com/
Jupyter Notebooks
DataHub.io v2
RMarkdown

History

For me personally, this work goes way back to efforts in the 2000s and early work on open economics, and then to work on ReclineJS in early versions of CKAN (c. 2010-2013).

Its recent form started in 2020/2021 with demo code in https://github.com/datopian/data-literate

davidgasquez · 2023-11-01T11:43:48Z

davidgasquez
Nov 1, 2023

Fully agree with the motivation! Having easy ways of publishing datasets with a nice UX would be awesome.

Sharing two more resources that could also help as inspiration:

They both usually include transformations inside the documents but support loading data declaratively and plotting it.

2 replies

rufuspollock Nov 1, 2023
Maintainer Author

I think those are both great references and see this as providing a common definition of a (hopefully) shared approach.

rufuspollock Nov 1, 2023
Maintainer Author

I'd also emphasize that this is perhaps less about publishing datasets than about publishing data-rich / data-driven analysis and stories.

Data Rich Documents - a definition (draft, in progress) #1047