docs, shorthand, etc.
mbostock committed Nov 1, 2024
1 parent 7704416 commit 1dde616
Showing 10 changed files with 361 additions and 171 deletions.
42 changes: 29 additions & 13 deletions docs/config.md
@@ -303,27 +303,43 @@ export default {

## duckdb <a href="https://github.com/observablehq/framework/pull/1734" class="observablehq-version-badge" data-version="prerelease" title="Added in #1734"></a>

The **duckdb** option allows you to specify the DuckDB [extensions](./sql#extensions) that you want to self-host and make available in the `sql` and `DuckDBClient` instances.
The **duckdb** option configures [self-hosting](./lib/duckdb#self-hosting-of-extensions) and loading of [DuckDB extensions](./lib/duckdb#extensions) for use in [SQL code blocks](./sql) and the `sql` and `DuckDBClient` built-ins. For example, a geospatial data app might enable the [`spatial`](https://duckdb.org/docs/extensions/spatial/overview.html) and [`h3`](https://duckdb.org/community_extensions/extensions/h3.html) extensions like so:

Its **extensions** property is an object where keys are extension names, and values describe the **source** for the extension, and whether to **install** (self-host) it, and **load** it immediately.

The **source** property is the reference of the repo from which to download the extension. It defaults to `core`, which points to `https://extensions.duckdb.org/`. You can use `core`, `community` (which points to `https://community-extensions.duckdb.org/`), or a custom URL, for example if you develop your own extensions.

By default "json" and "parquet" are installed, but not loaded (since they are autoloaded, there is no reason to load them before we actually need them). If you don’t want to self-host an extension, set its **install** property to false. You will still be able to load it from its source by calling `INSTALL` and `LOAD`.
```js run=false
export default {
  duckdb: {
    extensions: ["spatial", "h3"]
  }
};
```

As a shorthand, you can specify `name: true` to install and load the named extension from the "core" repository. (And `name: false` is shorthand for `{install: false, load: false}`.)
The **extensions** option can either be an array of extension names, or an object whose keys are extension names and whose values are configuration options for the given extension, including its **source** repository (defaulting to the keyword _core_ for core extensions, and otherwise _community_; can also be a custom repository URL), whether to **load** it immediately (defaulting to true, except for known extensions that support autoloading), and whether to **install** it (_i.e._ to self-host, defaulting to true). As additional shorthand, you can specify `[name]: true` to install and load the named extension from the default (_core_ or _community_) source repository, or `[name]: string` to install and load the named extension from the given source repository.

For example, a typical configuration for a geospatial data app might install and load “spatial” from `core` and “h3” from `community`:
The configuration above is equivalent to:

```js run=false
duckdb: {
extensions: {
spatial: true,
h3: {source: "community"}
export default {
duckdb: {
extensions: {
spatial: {
source: "https://extensions.duckdb.org/",
install: true,
load: true
},
h3: {
source: "https://community-extensions.duckdb.org/",
install: true,
load: true
}
}
}
}
};
```

The `json` and `parquet` extensions are configured (and therefore self-hosted) by default. To expressly disable self-hosting of an extension, you can set its **install** property to false, or equivalently pass null as the extension configuration object.
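
To make the shorthand rules concrete, here is a minimal sketch of how such a configuration could be expanded into its verbose form. The helper `normalizeExtensions` and the `CORE` and `AUTOLOAD` lists are hypothetical and deliberately incomplete; this is not Framework's implementation, only an illustration of the defaults described above. Treating `false` as "do not load" is likewise an assumption.

```js
// Repository keywords and the URLs they stand for (from the text above).
const REPOS = new Map([
  ["core", "https://extensions.duckdb.org/"],
  ["community", "https://community-extensions.duckdb.org/"]
]);

// Assumed, incomplete lists used only for this illustration.
const CORE = new Set(["json", "parquet", "spatial"]);
const AUTOLOAD = new Set(["json", "parquet"]);

// Resolve a keyword or custom URL; default to core for known core extensions.
function resolveSource(name, source = CORE.has(name) ? "core" : "community") {
  return REPOS.get(source) ?? source;
}

// Hypothetical helper: expand array and shorthand forms into full objects.
function normalizeExtensions(extensions) {
  const entries = Array.isArray(extensions)
    ? extensions.map((name) => [name, true])
    : Object.entries(extensions);
  return Object.fromEntries(
    entries.map(([name, value]) => {
      if (value === true) value = {load: true};
      else if (value === false) value = {load: false}; // assumption
      else if (value === null) value = {install: false};
      else if (typeof value === "string") value = {source: value, load: true};
      const {source, install = true, load = !AUTOLOAD.has(name)} = value;
      return [name, {source: resolveSource(name, source), install, load}];
    })
  );
}

console.log(normalizeExtensions(["spatial", "h3"]));
```

Under these assumptions, the array form `["spatial", "h3"]` expands to the same object as the verbose configuration shown above.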

For more, see [DuckDB extensions](./lib/duckdb#extensions).

## markdownIt <a href="https://github.com/observablehq/framework/releases/tag/v1.1.0" class="observablehq-version-badge" data-version="^1.1.0" title="Added in v1.1.0"></a>

A hook for registering additional [markdown-it](https://github.com/markdown-it/markdown-it) plugins. For example, to use [markdown-it-footnote](https://github.com/markdown-it/markdown-it-footnote), first install the plugin with either `npm add markdown-it-footnote` or `yarn add markdown-it-footnote`, then register it like so:
91 changes: 83 additions & 8 deletions docs/lib/duckdb.md
@@ -65,7 +65,7 @@ const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
db2.queryRow(`SELECT COUNT() FROM base.events`)
```

For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import). DuckDB offers many affordances to make this easier (in many cases it detects the file format and uses the correct loader automatically).
For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import). DuckDB offers many affordances to make this easier. (In many cases it detects the file format and uses the correct loader automatically.)

```js run=false
const db = await DuckDBClient.of();
@@ -106,20 +106,95 @@ const sql = DuckDBClient.sql({quakes: `https://earthquake.usgs.gov/earthquakes/f
SELECT * FROM quakes ORDER BY updated DESC;
```

## Extensions
## Extensions <a href="https://github.com/observablehq/framework/pull/1734" class="observablehq-version-badge" data-version="prerelease" title="Added in #1734"></a>

DuckDB’s [extensions](../sql#extensions)<a href="https://github.com/observablehq/framework/pull/1734" class="observablehq-version-badge" data-version="prerelease" title="Added in #1734"></a> are supported.
[DuckDB extensions](https://duckdb.org/docs/extensions/overview.html) extend DuckDB’s functionality, adding support for additional file formats, new types, and domain-specific functions. For example, the [`json` extension](https://duckdb.org/docs/data/json/overview.html) provides a `read_json` method for reading JSON files:

By default, `DuckDBClient.of` and `DuckDBClient.sql` load the extensions referenced in the [configuration](../config#duckdb). If you want a different environment, you can pass options listing the extensions you want to load.
```sql echo
SELECT bbox FROM read_json('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson');
```

To read a local file (or data loader), use `FileAttachment` and interpolation `${…}`:

```sql echo
SELECT bbox FROM read_json(${FileAttachment("../quakes.json").href});
```

For convenience, Framework configures the `json` and `parquet` extensions by default. Some other [core extensions](https://duckdb.org/docs/extensions/core_extensions.html) also autoload, meaning that you don’t need to explicitly enable them; however, Framework will only [self-host extensions](#self-hosting-of-extensions) if you explicitly configure them, and therefore we recommend that you always use the [**duckdb** config option](../config#duckdb) to configure DuckDB extensions. Any configured extensions will be automatically [installed and loaded](https://duckdb.org/docs/extensions/overview#explicit-install-and-load), making them available in SQL code blocks as well as the `sql` and `DuckDBClient` built-ins.

For example, to configure the [`spatial` extension](https://duckdb.org/docs/extensions/spatial/overview.html):

```js run=false
export default {
  duckdb: {
    extensions: ["spatial"]
  }
};
```

You can then use the `ST_Area` function to compute the area of a polygon:

```sql echo run=false
SELECT ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY) as area;
```

To tell which extensions have been loaded, you can run the following query:

```sql echo
FROM duckdb_extensions() WHERE loaded;
```

<div class="warning">

If the `duckdb_extensions()` function runs before DuckDB autoloads a core extension (such as `json`), it might not be included in the returned set.

For example, pass an empty array to instantiate a DuckDBClient with no loaded extensions (even if your configuration lists several extensions):
</div>

### Self-hosting of extensions

As with [npm imports](../imports#self-hosting-of-npm-imports), configured DuckDB extensions are self-hosted, improving performance, stability, & security, and allowing you to develop offline. Extensions are downloaded to the DuckDB cache folder, which lives in <code>.observablehq/<wbr>cache/<wbr>_duckdb</code> within the source root (typically `src`). You can clear the cache and restart the preview server to re-fetch the latest versions of any DuckDB extensions. If you use an [autoloading core extension](https://duckdb.org/docs/extensions/core_extensions.html#list-of-core-extensions) that is not configured, DuckDB-Wasm [will load it](https://duckdb.org/docs/api/wasm/extensions.html#fetching-duckdb-wasm-extensions) from the default extension repository, `extensions.duckdb.org`, at runtime.

## Configuring

The second argument to `DuckDBClient.of` and `DuckDBClient.sql` is a [`DuckDBConfig`](https://shell.duckdb.org/docs/interfaces/index.DuckDBConfig.html) object which configures the behavior of DuckDB-Wasm. By default, Framework sets the `castBigIntToDouble` and `castTimestampToDate` query options to true. To instead use [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt):

```js run=false
const bigdb = DuckDBClient.of({}, {query: {castBigIntToDouble: false}});
```
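
The trade-off behind `castBigIntToDouble` can be seen in plain JavaScript, independent of DuckDB: a double represents integers exactly only up to 2^53, so larger 64-bit values lose precision when converted to a number. The value below is an arbitrary example chosen to sit just past that limit.

```js
// 2^53 + 1 is the smallest positive integer a double cannot represent.
const big = 2n ** 53n + 1n;

// Converting to a number rounds to the nearest representable double.
const asDouble = Number(big);

// The round trip back to BigInt no longer matches the original value.
const roundTrips = BigInt(asDouble) === big;

console.log(asDouble, roundTrips); // 9007199254740992 false
```

If your data contains identifiers or counts of this magnitude, disabling the cast keeps them exact, at the cost of working with `BigInt` values downstream.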

By default, `DuckDBClient.of` and `DuckDBClient.sql` automatically load all [configured extensions](#extensions). To change the loaded extensions for a particular `DuckDBClient`, use the **extensions** config option. For example, pass an empty array to instantiate a DuckDBClient with no loaded extensions (even if your configuration lists several):

```js echo run=false
const simpledb = DuckDBClient.of({}, {load: []});
const simpledb = DuckDBClient.of({}, {extensions: []});
```

Or, create a geospatial tagged template literal:
Alternatively, you can configure extensions to be self-hosted but not loaded by default using the **duckdb** config option and the `load: false` shorthand:

```js run=false
export default {
  duckdb: {
    extensions: {
      spatial: false,
      h3: false
    }
  }
};
```

You can then selectively load extensions as needed like so:

```js echo run=false
const geosql = DuckDBClient.sql({}, {load: ["spatial", "h3"]});
const geosql = DuckDBClient.sql({}, {extensions: ["spatial", "h3"]});
```
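
The selection rule described in this section can be sketched as a small pure function. The name `extensionsToLoad` and the shape of the `configured` object are hypothetical; the sketch follows the prose (a per-client list replaces the configured defaults) rather than any particular implementation.

```js
// Hypothetical helper: decide which configured extensions a client loads.
// configured: {[name]: {load: boolean}}, as derived from the duckdb config option.
// override: optional array of names passed as the per-client extensions option.
function extensionsToLoad(configured, override) {
  return Object.entries(configured)
    .filter(([name, {load}]) => (override === undefined ? load : override.includes(name)))
    .map(([name]) => name);
}

// Illustrative configuration: two self-hosted extensions that do not load by default.
const configured = {
  inet: {load: true},
  spatial: {load: false},
  h3: {load: false}
};

console.log(extensionsToLoad(configured)); // defaults from the config
console.log(extensionsToLoad(configured, [])); // nothing loaded
console.log(extensionsToLoad(configured, ["spatial", "h3"])); // explicit selection
```

In this sketch, names that are not configured are ignored, since only configured extensions are self-hosted.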

In the future, we’d like to allow DuckDB to be configured globally (beyond just [extensions](#extensions)) via the [**duckdb** config option](../config#duckdb); please upvote [#1791](https://github.com/observablehq/framework/issues/1791) if you are interested in this feature.

## Versioning

Framework currently uses [DuckDB-Wasm 1.29.0](https://github.com/duckdb/duckdb-wasm/releases/tag/v1.29.0), which aligns with [DuckDB 1.1.1](https://github.com/duckdb/duckdb/releases/tag/v1.1.1). You can load a different version of DuckDB-Wasm by importing `npm:@duckdb/duckdb-wasm` directly, for example:

```js run=false
import * as duckdb from "npm:@duckdb/duckdb-wasm@<version>";
```

However, you will not be able to change the version of DuckDB-Wasm used by SQL code blocks or the `sql` or `DuckDBClient` built-ins, nor can you use Framework’s support for self-hosting extensions with a different version of DuckDB-Wasm.
46 changes: 1 addition & 45 deletions docs/sql.md
@@ -29,7 +29,7 @@ sql:

<div class="tip">For performance and reliability, we recommend using local files rather than loading data from external servers at runtime. You can use a <a href="./data-loaders">data loader</a> to take a snapshot of remote data during build if needed.</div>

You can also register tables via code (say to have sources that are defined dynamically via user input) by defining the `sql` symbol with [DuckDBClient.sql](./lib/duckdb).
You can also register tables via code (say to have sources that are defined dynamically via user input) by defining the `sql` symbol with [DuckDBClient.sql](./lib/duckdb). To register [DuckDB extensions](./lib/duckdb#extensions), use the [**duckdb** config option](./config#duckdb).

## SQL code blocks

@@ -206,47 +206,3 @@ Inputs.table(await sql([`SELECT * FROM gaia WHERE source_id IN (${[source_ids]})
When interpolating values into SQL queries, be careful to avoid [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) by properly escaping or sanitizing user input. The example above is safe only because `source_ids` are known to be numeric.

</div>
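
One way to enforce the "known to be numeric" condition is to validate the values before building the query text. The helper below, `numericIdList`, is a hypothetical sketch and not part of Framework; it accepts only values whose string form consists entirely of digits and throws otherwise.

```js
// Hypothetical guard: return a comma-separated list of ids, or throw if any
// value is not purely numeric. String matching is used instead of Number()
// so that 64-bit ids (as strings or BigInts) are not rounded.
function numericIdList(ids) {
  for (const id of ids) {
    if (!/^\d+$/.test(String(id))) {
      throw new TypeError(`not a numeric id: ${id}`);
    }
  }
  return ids.join(", ");
}

console.log(numericIdList([1, "22", 333n])); // 1, 22, 333
```

The returned string can then be interpolated into the `IN (…)` clause of a query; an input such as `"1; DROP TABLE gaia"` is rejected before any SQL is built.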

## Extensions <a href="https://github.com/observablehq/framework/pull/1734" class="observablehq-version-badge" data-version="prerelease" title="Added in #1734"></a>

DuckDB has a flexible extension mechanism that allows for dynamically loading extensions. These may extend DuckDB's functionality by providing support for additional file formats, introducing new types, and domain-specific functionality.

Framework can download and host the extensions of your choice. By default, only "json" and "parquet" are self-hosted, but you can add more by specifying them in the [configuration](./config). The self-hosted extensions are served from the `/_duckdb/` directory with a content-hashed URL, ensuring optimal performance and allowing you to work offline and from a server you control.

The self-hosted extensions are immediately available in all the `sql` code blocks and [DuckDBClient](./lib/duckdb) instances. For example, the query below works instantly since the "json" extension is configured:

```sql echo
SELECT bbox FROM read_json('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson');
```

Likewise, with the “spatial” extension configured, you could directly run:

```sql echo run=false
SELECT ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY) as area;
```

If you use an extension that is not self-hosted, DuckDB falls back to loading it directly from DuckDB’s servers. For example, this documentation does not have the “inet” extension configured for self-hosting.

```sql echo
SELECT '127.0.0.1'::INET AS ipv4, '2001:db8:3c4d::/48'::INET AS ipv6;
```

During development, you can experiment freely with extensions that are not self-hosted. For example to try out the “h3” `community` extension:

```sql echo run=false
INSTALL h3 FROM community;
LOAD h3;
SELECT format('{:x}', h3_latlng_to_cell(37.77, -122.43, 9)) AS cell_id;
```

<small>(this returns the H3 cell [`892830828a3ffff`](https://h3geo.org/#hex=892830828a3ffff))</small>

For performance and ergonomy, we strongly recommend adding all the extensions you actually use to the [configuration](./config#duckdb).

<div class="tip">

To tell which extensions are effectively in use on a page, inspect the network tab in your browser, or run the following query: `FROM duckdb_extensions() WHERE loaded;`.

</div>

These features are tied to DuckDB wasm’s 1.29 version, and strongly dependent on its development cycle.
4 changes: 2 additions & 2 deletions package.json
@@ -26,8 +26,8 @@
"test": "concurrently npm:test:mocha npm:test:tsc npm:test:lint npm:test:prettier",
"test:coverage": "c8 --check-coverage --lines 80 --per-file yarn test:mocha",
"test:build": "rimraf test/build && cross-env npm_package_version=1.0.0-test node build.js --sourcemap --outdir=test/build \"{src,test}/**/*.{ts,js,css}\" --ignore \"test/input/**\" --ignore \"test/output/**\" --ignore \"test/preview/dashboard/**\" --ignore \"**/*.d.ts\" && cp -r templates test/build",
"test:mocha": "yarn test:build && rimraf --glob test/.observablehq/cache test/input/build/*/.observablehq/cache && cross-env OBSERVABLE_TELEMETRY_DISABLE=1 TZ=America/Los_Angeles mocha --timeout 30000 -p \"test/build/test/**/*-test.js\" && yarn test:annotate",
"test:mocha:serial": "yarn test:build && rimraf --glob test/.observablehq/cache test/input/build/*/.observablehq/cache && cross-env OBSERVABLE_TELEMETRY_DISABLE=1 TZ=America/Los_Angeles mocha --timeout 30000 \"test/build/test/**/*-test.js\" && yarn test:annotate",
"test:mocha": "yarn test:build && rimraf --glob test/.observablehq/cache test/input/build/*/.observablehq/cache && cross-env OBSERVABLE_TELEMETRY_DISABLE=1 TZ=America/Los_Angeles mocha --timeout 30000 -p \"test/build/test/**/*-test.js\"",
"test:mocha:serial": "yarn test:build && rimraf --glob test/.observablehq/cache test/input/build/*/.observablehq/cache && cross-env OBSERVABLE_TELEMETRY_DISABLE=1 TZ=America/Los_Angeles mocha --timeout 30000 \"test/build/test/**/*-test.js\"",
"test:annotate": "yarn test:build && cross-env OBSERVABLE_ANNOTATE_FILES=true TZ=America/Los_Angeles mocha --timeout 30000 \"test/build/test/**/annotate.js\"",
"test:lint": "eslint src test --max-warnings=0",
"test:prettier": "prettier --check src test",
10 changes: 4 additions & 6 deletions src/client/stdlib/duckdb.js
@@ -32,7 +32,6 @@ import * as duckdb from "npm:@duckdb/duckdb-wasm";
// Baked-in manifest.
// eslint-disable-next-line no-undef
const manifest = DUCKDB_MANIFEST;

const candidates = {
...(manifest.bundles.includes("mvp") && {
mvp: {
@@ -49,7 +48,6 @@ const candidates = {
};
const bundle = await duckdb.selectBundle(candidates);
const activePlatform = manifest.bundles.find((key) => bundle.mainModule === candidates[key].mainModule);

const logger = new duckdb.ConsoleLogger(duckdb.LogLevel.WARNING);

let db;
@@ -179,7 +177,7 @@ export class DuckDBClient {
config = {...config, query: {...config.query, castBigIntToDouble: true}};
}
await db.open(config);
await registerExtensions(db, config);
await registerExtensions(db, config.extensions);
await Promise.all(Object.entries(sources).map(([name, source]) => insertSource(db, name, source)));
return new DuckDBClient(db);
}
@@ -191,14 +189,14 @@ export class DuckDBClient {

Object.defineProperty(DuckDBClient.prototype, "dialect", {value: "duckdb"});

async function registerExtensions(db, {load}) {
async function registerExtensions(db, extensions = []) {
const connection = await db.connect();
try {
await Promise.all(
manifest.extensions.map(([name, {[activePlatform]: ref, load: l}]) =>
manifest.extensions.map(([name, {[activePlatform]: ref, load}]) =>
connection
.query(`INSTALL "${name}" FROM '${ref.startsWith("https://") ? ref : import.meta.resolve(`../..${ref}`)}'`)
.then(() => (load ? load.includes(name) : l) && connection.query(`LOAD "${name}"`))
.then(() => load && extensions.includes(name) && connection.query(`LOAD "${name}"`))
)
);
} finally {