diff --git a/docs/files.md b/docs/files.md
index 40dcae2b7..900733ea6 100644
--- a/docs/files.md
+++ b/docs/files.md
@@ -60,15 +60,16 @@ For missing files, `file.lastModified` is undefined. The `file.mimeType` is dete
 
 | method | return type
 | - | -
+| [`file.arquero`][arquero] | Arquero [`Table`][arquero-table]
 | [`file.arrayBuffer`][binary] | [`ArrayBuffer`][array-buffer]
-| [`file.arrow`][arrow] | [`Table`][arrow-table]
+| [`file.arrow`][arrow] | Arrow [`Table`][arrow-table]
 | [`file.blob`][binary] | [`Blob`][blob]
 | [`file.csv`][csv] | [`Array`][array]
 | [`file.dsv`][csv] | [`Array`][array]
 | [`file.html`][markup] | [`Document`][document]
 | [`file.image`][media] | [`HTMLImageElement`][image]
 | [`file.json`][json] | [`Array`][array], [`Object`][object], _etc._
-| [`file.parquet`][arrow] | [`Table`][arrow-table]
+| [`file.parquet`][arrow] | Arrow [`Table`][arrow-table]
 | [`file.sqlite`][sqlite] | [`SQLiteDatabaseClient`][sqlite]
 | [`file.stream`][binary] | [`ReadableStream`][stream]
 | [`file.text`][text] | [`string`][string]
@@ -77,6 +78,8 @@ For missing files, `file.lastModified` is undefined. The `file.mimeType` is dete
 | [`file.xml`][markup] | [`Document`][document]
 | [`file.zip`][zip] | [`ZipArchive`][zip]
 
+[arquero]: ./lib/arquero
+[arquero-table]: https://idl.uw.edu/arquero/api/#table
 [array-buffer]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
 [arrow-table]: https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html
 [blob]: https://developer.mozilla.org/en-US/docs/Web/API/Blob
@@ -98,7 +101,7 @@ For missing files, `file.lastModified` is undefined. The `file.mimeType` is dete
 [xlsx]: ./lib/xlsx
 [zip]: ./lib/zip
 
-The contents of a file often dictate the appropriate method — for example, an Apache Arrow file is almost always read with `file.arrow`. When multiple methods are valid, choose based on your needs. For example, you can load a CSV file using `file.text` to implement parsing yourself.
+The contents of a file often dictate the appropriate method — for example, an Excel XLSX file is almost always read with `file.xlsx`. When multiple methods are valid, choose based on your needs. For example, you can read a CSV file with `file.arquero` to load it as an [Arquero](./lib/arquero) table, or with `file.text` to implement parsing yourself.
 
 In addition to the above, you can get the resolved absolute URL of the file using `file.href`:
 
diff --git a/docs/lib/arquero.md b/docs/lib/arquero.md
index 05c003f93..bc29044d8 100644
--- a/docs/lib/arquero.md
+++ b/docs/lib/arquero.md
@@ -1,8 +1,17 @@
+
+
 # Arquero
 
 [Arquero](https://uwdata.github.io/arquero/) is a JavaScript library for “query processing and transformation of array-backed data tables.” Arquero is available by default as `aq` in Markdown, but you can import it explicitly like so:
 
-```js echo
+```js run=false
 import * as aq from "npm:arquero";
 ```
 
@@ -19,14 +28,13 @@ const dt = aq.table({
 Arquero is column-oriented: each column is an array of values of a given type. Here, numbers representing hours of sunshine per month. But an Arquero table is also iterable and as such, its contents can be displayed with [`Inputs.table`](/lib/inputs#table).
 
 ```js echo
-Inputs.table(dt, {maxWidth: 640})
+Inputs.table(dt)
 ```
 
 An Arquero table can also be used to make charts with [Observable Plot](./plot):
 
 ```js echo
 Plot.plot({
-  width: Math.min(width, 640),
   x: {tickFormat: Plot.formatMonth()},
   y: {grid: true, label: "Hours of sunshine ☀️ per month"},
   marks: [
@@ -41,25 +49,25 @@ Plot.plot({
 Arquero supports a range of data transformation tasks, including filter, sample, aggregation, window, join, and reshaping operations. For example, the following operation derives differences between Seattle and Chicago and sorts the months accordingly.
 
 ```js echo
-const diffs = dt.derive({
-    month: (d) => aq.op.row_number(),
-    diff: (d) => d.Seattle - d.Chicago
-  })
-  .select("month", "diff")
-  .orderby(aq.desc("diff"));
-
-display(Inputs.table(diffs, {maxWidth: 640}));
+Inputs.table(
+  dt.derive({
+    month: (d) => aq.op.row_number(),
+    diff: (d) => d.Seattle - d.Chicago
+  })
+  .select("month", "diff")
+  .orderby(aq.desc("diff"))
+)
 ```
 
 Is Seattle more correlated with San Francisco or Chicago?
 
 ```js echo
-const correlations = dt.rollup({
-  corr_sf: aq.op.corr("Seattle", "San Francisco"),
-  corr_chi: aq.op.corr("Seattle", "Chicago")
-});
-
-display(Inputs.table(correlations, {maxWidth: 640}));
+Inputs.table(
+  dt.rollup({
+    corr_sf: aq.op.corr("Seattle", "San Francisco"),
+    corr_chi: aq.op.corr("Seattle", "Chicago")
+  })
+)
 ```
 
 We can aggregate statistics per city. The following code reshapes (or “folds”) the data into two columns _city_ & _sun_ and shows the output as objects:
@@ -68,14 +76,25 @@ We can aggregate statistics per city. The following code reshapes (or “folds
 dt.fold(aq.all(), {as: ["city", "sun"]})
   .groupby("city")
   .rollup({
-    min: (d) => aq.op.min(d.sun), // functional form of op.min('sun')
-    max: (d) => aq.op.max(d.sun),
-    avg: (d) => aq.op.average(d.sun),
-    med: (d) => aq.op.median(d.sun),
-    // functional forms permit flexible table expressions
-    skew: ({sun: s}) => (aq.op.mean(s) - aq.op.median(s)) / aq.op.stdev(s) || 0
+    min: aq.op.min("sun"),
+    max: aq.op.max("sun"),
+    avg: (d) => aq.op.average(d.sun), // equivalent to aq.op.average("sun")
+    med: (d) => aq.op.median(d.sun), // equivalent to aq.op.median("sun")
+    skew: ({sun}) => (aq.op.mean(sun) - aq.op.median(sun)) / aq.op.stdev(sun)
   })
   .objects()
 ```
 
+To load an Arquero table from an Apache Arrow, Apache Parquet, CSV, TSV, or JSON file, use [`file.arquero`](../files#arquero):
+
+```js run=false
+const flights = FileAttachment("flights-200k.arrow").arquero();
+```
+
+This is equivalent to:
+
+```js run=false
+const flights = aq.loadArrow(FileAttachment("flights-200k.arrow").href);
+```
+
 For more, see [Arquero’s official documentation](https://uwdata.github.io/arquero/).
diff --git a/docs/lib/csv.md b/docs/lib/csv.md
index aa9d5e526..efbb2d639 100644
--- a/docs/lib/csv.md
+++ b/docs/lib/csv.md
@@ -18,7 +18,7 @@ The column names are listed in the `columns` property:
 gistemp.columns
 ```
 
-You can also load a tab-separated values (TSV) file using `FileAttachment.tsv`:
+You can also load a tab-separated values (TSV) file using `file.tsv`:
 
 ```js echo
 const capitals = FileAttachment("us-state-capitals.tsv").tsv({typed: true});
@@ -28,7 +28,7 @@ const capitals = FileAttachment("us-state-capitals.tsv").tsv({typed: true});
 Inputs.table(capitals)
 ```
 
-For a different delimiter, use `FileAttachment.dsv`. For example, for semicolon separated values:
+For a different delimiter, use `file.dsv`. For example, for semicolon-separated values:
 
 ```js run=false
 const capitals = FileAttachment("us-state-capitals.csv").dsv({delimiter: ";", typed: true});
diff --git a/docs/loaders.md b/docs/loaders.md
index 86a232c87..aeb7f33e8 100644
--- a/docs/loaders.md
+++ b/docs/loaders.md
@@ -128,7 +128,7 @@ const metadata = FileAttachment("quakes/metadata.json").json();
 const features = FileAttachment("quakes/features.csv").csv({typed: true});
 ```
 
-The ZIP file itself can be also referenced as a whole — for example if the names of the files are not known in advance — with [`FileAttachment.zip`](./lib/zip):
+The ZIP file itself can also be referenced as a whole — for example if the names of the files are not known in advance — with [`file.zip`](./lib/zip):
 
 ```js echo
 const zip = FileAttachment("quakes.zip").zip();
diff --git a/docs/reactivity.md b/docs/reactivity.md
index b9f6a8aa6..388cdea5c 100644
--- a/docs/reactivity.md
+++ b/docs/reactivity.md
@@ -64,7 +64,7 @@ In Framework, when one code block refers to a promise defined in another code bl
await
only applies across code blocks, not within a code block. Within a code block, a promise is just a promise.
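The Arquero examples in `docs/lib/arquero.md` fold the sunshine table into _city_/_sun_ rows, group by city, and roll up min, max, mean, median, and a (mean − median) / stdev skew; the correlation example relies on `aq.op.corr`. As a rough illustration of what those operations compute, here is a plain-JavaScript sketch that does not use Arquero at all. The helper names (`fold`, `groupBy`, `summarize`, `corr`) are hypothetical. It also assumes that `aq.op.stdev` is the sample (n − 1) standard deviation and that `aq.op.corr` is Pearson's correlation coefficient; check Arquero's API reference before relying on either assumption.

```javascript
// Plain-JavaScript sketch of the fold -> groupby -> rollup pipeline and of a
// Pearson correlation like aq.op.corr. Illustrative only; no Arquero involved.

// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Median: middle value of a sorted copy (average of the two middles if even).
function median(xs) {
  const s = [...xs].sort((a, b) => a - b);
  const m = s.length >> 1;
  return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
}

// Sample standard deviation (n - 1 denominator); assumed to match aq.op.stdev.
function stdev(xs) {
  const mu = mean(xs);
  const ss = xs.reduce((sum, x) => sum + (x - mu) ** 2, 0);
  return Math.sqrt(ss / (xs.length - 1));
}

// "Fold" a column-oriented table {name: [values]} into rows {city, sun}.
function fold(columns) {
  const rows = [];
  for (const [city, values] of Object.entries(columns)) {
    for (const sun of values) rows.push({city, sun});
  }
  return rows;
}

// Group rows by the value of `key`, preserving first-seen order.
function groupBy(rows, key) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  return groups;
}

// One summary object per city, mirroring the rollup in the docs.
function summarize(columns) {
  const out = [];
  for (const [city, rows] of groupBy(fold(columns), "city")) {
    const sun = rows.map((d) => d.sun);
    out.push({
      city,
      min: Math.min(...sun),
      max: Math.max(...sun),
      avg: mean(sun),
      med: median(sun),
      skew: (mean(sun) - median(sun)) / stdev(sun)
    });
  }
  return out;
}

// Pearson correlation coefficient of two equal-length arrays.
function corr(xs, ys) {
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < xs.length; ++i) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```

For example, `summarize({Seattle: [...], Chicago: [...]})` returns one `{city, min, max, avg, med, skew}` object per column. In this sketch, a group with a single value, or with all values equal, yields `skew` of `NaN`. The earlier version of the docs guarded against this with `|| 0`, which maps `NaN` to 0.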