From 56756b02b2bbab0085e412120a96ded73cd54e46 Mon Sep 17 00:00:00 2001 From: Jeffrey Heer Date: Thu, 24 Oct 2024 12:56:26 -0700 Subject: [PATCH] Rename data cube indexing to pre-aggregation. (#566) * feat!: Rename DataCubeIndexer to PreAggregator. * filterIndexable -> optimizable, and update some more mentions of indexes * optimizable -> filterStable --------- Co-authored-by: Dominik Moritz --- README.md | 2 +- dev/index.html | 36 +-- docs/api/core/client.md | 6 +- docs/api/core/coordinator.md | 2 +- docs/core/index.md | 2 +- docs/examples/flights-10m.md | 2 +- docs/examples/linear-regression-10m.md | 2 +- docs/public/specs/json/flights-10m.json | 2 +- .../specs/json/linear-regression-10m.json | 2 +- docs/public/specs/yaml/flights-10m.yaml | 2 +- .../specs/yaml/linear-regression-10m.yaml | 2 +- docs/what-is-mosaic/index.md | 2 +- docs/why-mosaic/index.md | 8 +- packages/core/README.md | 2 +- packages/core/src/Coordinator.js | 24 +- packages/core/src/MosaicClient.js | 9 +- .../{DataCubeIndexer.js => PreAggregator.js} | 212 ++++++++++-------- .../{index-columns.js => preagg-columns.js} | 46 ++-- packages/core/src/util/selection-types.ts | 2 +- packages/core/test/client.test.js | 4 +- ...-indexer.test.js => preaggregator.test.js} | 2 +- packages/plot/src/marks/Density1DMark.js | 2 +- packages/plot/src/marks/Grid2DMark.js | 2 +- packages/plot/src/marks/HexbinMark.js | 2 +- .../spec/src/spec/interactors/Interval1D.ts | 3 +- .../spec/src/spec/interactors/Interval2D.ts | 3 +- packages/widget/mosaic_widget/__init__.py | 4 +- packages/widget/src/index.js | 12 +- specs/json/flights-10m.json | 2 +- specs/json/linear-regression-10m.json | 2 +- specs/ts/flights-10m.ts | 2 +- specs/ts/linear-regression-10m.ts | 2 +- specs/yaml/flights-10m.yaml | 2 +- specs/yaml/linear-regression-10m.yaml | 2 +- 34 files changed, 217 insertions(+), 194 deletions(-) rename packages/core/src/{DataCubeIndexer.js => PreAggregator.js} (56%) rename packages/core/src/util/{index-columns.js => 
preagg-columns.js} (93%) rename packages/core/test/{data-cube-indexer.test.js => preaggregator.test.js} (99%) diff --git a/README.md b/README.md index a5605c39..9016bfe2 100644 --- a/README.md +++ b/README.md @@ -38,7 +38,7 @@ _Note_: For convenience, the `vgplot` package re-exports much of the `mosaic-cor ### Core Components -- [`mosaic-core`](https://github.com/uwdata/mosaic/tree/main/packages/core): The core Mosaic components. A central coordinator, parameters and selections for linking scalar values or query predicates (respectively) across Mosaic clients, and filter groups with optimized index management. The Mosaic coordinator can send queries either over the network to a backing server (`socket` and `rest` clients) or to a client-side [DuckDB-WASM](https://github.com/duckdb/duckdb-wasm) instance (`wasm` client). +- [`mosaic-core`](https://github.com/uwdata/mosaic/tree/main/packages/core): The core Mosaic components. A central coordinator, parameters and selections for linking scalar values or query predicates (respectively) across Mosaic clients, and filter groups with materialized views of pre-aggregated data. The Mosaic coordinator can send queries either over the network to a backing server (`socket` and `rest` clients) or to a client-side [DuckDB-WASM](https://github.com/duckdb/duckdb-wasm) instance (`wasm` client). - [`mosaic-sql`](https://github.com/uwdata/mosaic/tree/main/packages/sql): An API for convenient construction and analysis of SQL queries. Query objects then coerce to SQL query strings. - [`mosaic-inputs`](https://github.com/uwdata/mosaic/tree/main/packages/inputs): Standalone data-driven components such as input menus, text search boxes, and sortable, load-on-scroll data tables. - [`mosaic-plot`](https://github.com/uwdata/mosaic/tree/main/packages/plot): An interactive grammar of graphics implemented on top of [Observable Plot](https://github.com/observablehq/plot). Marks (plot layers) serve as individual Mosaic clients. 
These marks can push data processing (binning, hex binning, regression) and optimizations (such as M4 for line/area charts) down to the database. This package also provides interactors for linked selection, filtering, and highlighting using Mosaic Params and Selections. diff --git a/dev/index.html b/dev/index.html index 6a5bb74f..d8c1e62f 100644 --- a/dev/index.html +++ b/dev/index.html @@ -85,20 +85,20 @@
- Query Cache: + Cache Queries:
- Query Consolidation: + Consolidate Queries:
- Data Cube Indexes: - + Pre-aggregate: +
- Active Index State: - + Pre-aggregate State: +
@@ -115,8 +115,8 @@ const qlogToggle = document.querySelector('#query-log'); const cacheToggle = document.querySelector('#cache'); const consolidateToggle = document.querySelector('#consolidate'); - const indexToggle = document.querySelector('#index'); - const indexState = document.querySelector('#index-state'); + const preaggToggle = document.querySelector('#preagg'); + const preaggState = document.querySelector('#preagg-state'); connectorMenu.addEventListener('change', setConnector); exampleMenu.addEventListener('change', reload); @@ -124,23 +124,23 @@ qlogToggle.addEventListener('input', setQueryLog); cacheToggle.addEventListener('input', setCache); consolidateToggle.addEventListener('input', setConsolidate); - indexToggle.addEventListener('input', setIndex); - indexState.addEventListener('click', () => { - const { indexes } = vg.coordinator().dataCubeIndexer || {}; - if (indexes) { + preaggToggle.addEventListener('input', setPreAggregate); + preaggState.addEventListener('click', () => { + const { entries } = vg.coordinator().preaggregator || {}; + if (entries) { console.warn( - 'Data Cube Index Entries', - Array.from(indexes.values()) + 'Pre-aggregate Entries', + Array.from(entries.values()) ); } else { - console.warn('No Active Data Cube Index'); + console.warn('No Pre-aggregate Entries'); } }); setQueryLog(); setCache(); setConsolidate(); - setIndex(); + setPreAggregate(); setConnector(); async function setConnector() { @@ -160,8 +160,8 @@ vg.coordinator().manager.consolidate(consolidateToggle.checked); } - function setIndex() { - vg.coordinator().dataCubeIndexer.enabled = indexToggle.checked; + function setPreAggregate() { + vg.coordinator().preaggregator.enabled = preaggToggle.checked; } function reload() { diff --git a/docs/api/core/client.md b/docs/api/core/client.md index ed775373..0345d78a 100644 --- a/docs/api/core/client.md +++ b/docs/api/core/client.md @@ -19,11 +19,11 @@ Create a new client instance. 
If provided, the [Selection](./selection)-valued _ Property getter for the Selection that should filter this client. The [coordinator](./coordinator) uses this property to provide automatic updates to the client upon selection changes. -## filterIndexable +## filterStable -`client.filterIndexable` +`client.filterStable` -Property getter for a Boolean value indicating if the client query can be safely indexed using a pre-aggregated data cube. +Property getter for a Boolean value indicating if the client query can be safely optimized using a pre-aggregated materialized view. This property should return true if changes to the `filterBy` selection do not change the groupby (e.g., binning) values of the client query. The `MosaicClient` base class will always return `true`. diff --git a/docs/api/core/coordinator.md b/docs/api/core/coordinator.md index 79857944..017959c7 100644 --- a/docs/api/core/coordinator.md +++ b/docs/api/core/coordinator.md @@ -20,7 +20,7 @@ Create a new Mosaic Coordinator to manage all database communication for clients * _logger_: The logger to use, defaults to `console`. * _cache_: Boolean flag to enable/disable query caching (default `true`). * _consolidate_ Boolean flag to enable/disable query consolidation (default `true`). -* _indexes_: Data cube indexer options object. The _enabled_ flag (default `true`) determines if data cube indexes should be used when possible. The _schema_ option (default `'mosaic'`) indicates the database schema in which data cube index tables should be created. +* _preagg_: Pre-aggregation options object. The _enabled_ flag (default `true`) determines if pre-aggregation optimizations should be used when possible. The _schema_ option (default `'mosaic'`) indicates the database schema in which materialized view tables should be created for pre-aggregated data. 
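The `filterStable` contract is easiest to see by comparing two histograms. The sketch below is illustrative only. `ClientSketch`, `FixedBinHistogram`, `AdaptiveBinHistogram`, and `binEdges` are hypothetical names, and the base class only mimics the documented `MosaicClient` default of returning `true`. A client whose bin edges come from the unfiltered data keeps the same groupby values under any filter, so it is a candidate for pre-aggregation. A client that re-derives its bins from the filtered rows is not.

```javascript
// Hypothetical stand-in for MosaicClient; only the filterStable getter is modeled.
class ClientSketch {
  // Mirrors the documented MosaicClient default.
  get filterStable() {
    return true;
  }
}

// Equal-width bin edges: `steps` bins spanning [lo, hi].
function binEdges(lo, hi, steps) {
  const step = (hi - lo) / steps;
  return Array.from({ length: steps + 1 }, (_, i) => lo + i * step);
}

// Edges fixed from the full data extent: filtering changes counts, not edges.
class FixedBinHistogram extends ClientSketch {
  constructor(extent, steps) {
    super();
    this.edges = binEdges(extent[0], extent[1], steps);
  }
  edgesFor(_filteredValues) {
    return this.edges;
  }
}

// Edges recomputed from the filtered rows: the groupby domain moves with the selection.
class AdaptiveBinHistogram extends ClientSketch {
  constructor(steps) {
    super();
    this.steps = steps;
  }
  get filterStable() {
    return false;
  }
  edgesFor(filteredValues) {
    return binEdges(Math.min(...filteredValues), Math.max(...filteredValues), this.steps);
  }
}

const fixed = new FixedBinHistogram([0, 100], 4);
const adaptive = new AdaptiveBinHistogram(2);
const filtered = [20, 40, 60]; // values that survive some brush

console.log(fixed.filterStable, fixed.edgesFor(filtered));       // stable: edges 0, 25, 50, 75, 100
console.log(adaptive.filterStable, adaptive.edgesFor(filtered)); // unstable: edges 20, 40, 60
```

Only the fixed-bin client could be answered from a table pre-aggregated over its bins, because the adaptive client's groupby keys would change on every brush.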
## databaseConnector diff --git a/docs/core/index.md b/docs/core/index.md index 154417d7..9c3c0c41 100644 --- a/docs/core/index.md +++ b/docs/core/index.md @@ -37,7 +37,7 @@ Finally, clients may expose a `filterBy` Selection property. The predicates prov The _coordinator_ is responsible for managing client data needs. Clients are registered via the coordinator `connect(client)` method, and similarly removed using `disconnect()`. Upon registration, the event lifecycle begins. In addition to the `fields` and `query` calls described above, the coordinator checks if a client exposes a `filterBy` property, and if so, adds the client to a _filter group_: a set of clients that share the same `filterBy` selection. Upon changes to this selection (e.g., due to interactions such as brushing or zooming), the coordinator collects updated queries for all corresponding clients, queries the data source, and updates clients in turn. -The Coordinator additionally performs optimizations including caching and data cube indexing. +The Coordinator additionally performs optimizations including caching and pre-aggregation. [Coordinator API Reference](/api/core/coordinator) diff --git a/docs/examples/flights-10m.md b/docs/examples/flights-10m.md index ba29ffbb..e1aaf5d2 100644 --- a/docs/examples/flights-10m.md +++ b/docs/examples/flights-10m.md @@ -6,7 +6,7 @@ # Cross-Filter Flights (10M) Histograms showing arrival delay, departure time, and distance flown for 10 million flights. -Once loaded, automatically-generated indexes enable efficient cross-filtered selections. +Once loaded, automatic pre-aggregation optimizations enable efficient cross-filtered selections. 
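The `request`, `enabled`, and `clear` documentation in `PreAggregator.js` describes three caching rules. Entries are cached per client, a change of the active clause source clears the cache, and disabling the manager clears local state so later requests return `null`. A minimal sketch of just those rules follows. `PreAggCacheSketch` and its `build` callback are hypothetical. The real class also analyzes the active clause, skips clients untouched by cross-filtering, and issues `CREATE TABLE` queries, none of which is modeled here.

```javascript
// Hypothetical, simplified model of the caching rules documented for
// PreAggregator; not Mosaic's implementation.
class PreAggCacheSketch {
  constructor(build) {
    this.build = build;       // creates an entry for a client (stands in for table creation)
    this.entries = new Map(); // client -> cached entry
    this.active = null;       // source of the currently active selection clause
    this._enabled = true;
  }

  set enabled(state) {
    state = !!state;
    if (!state) this.clear(); // disabling drops local state only, not database tables
    this._enabled = state;
  }

  get enabled() {
    return this._enabled;
  }

  clear() {
    this.entries.clear();
    this.active = null;
  }

  request(client, source) {
    if (!this.enabled || source == null) return null;
    // A different clause source invalidates everything cached so far.
    if (this.active !== null && this.active !== source) this.clear();
    this.active = source;
    if (!this.entries.has(client)) this.entries.set(client, this.build(client));
    return this.entries.get(client);
  }
}

let builds = 0;
const mgr = new PreAggCacheSketch(client => ({ client, id: ++builds }));
const brushA = {}, brushB = {}, histogram = {};

mgr.request(histogram, brushA); // builds entry 1
mgr.request(histogram, brushA); // cache hit, nothing rebuilt
mgr.request(histogram, brushB); // new source: cache cleared, entry 2 built
mgr.enabled = false;
const off = mgr.request(histogram, brushB); // null while disabled
```

Keying the cache on the clause source, rather than on the clause value, is what lets every intermediate brush position reuse the same entry.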
_You may need to wait a few seconds for the dataset to load._ diff --git a/docs/examples/linear-regression-10m.md b/docs/examples/linear-regression-10m.md index 614f6463..7bac786b 100644 --- a/docs/examples/linear-regression-10m.md +++ b/docs/examples/linear-regression-10m.md @@ -5,7 +5,7 @@ # Linear Regression 10M -A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized selection updates using data cube indexes. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset. +A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized selection updates using pre-aggregated materialized views. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset. 
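The regression example can use pre-aggregated tables because a least-squares fit depends on the data only through a few sums. The sketch below shows that idea, assuming a simple fit of y = a + b * x. Per-bin sums (n, sum of x, sum of y, sum of x squared, sum of x * y) stand in for a pre-aggregated table, and a brush over bins is answered by adding the selected bins' sums and solving the normal equations. All names are hypothetical, and this is not Mosaic's regression code. It only illustrates why such an aggregate is compatible with pre-aggregation.

```javascript
// Sufficient statistics for a least-squares fit of y = a + b * x.
const emptyStats = () => ({ n: 0, sx: 0, sy: 0, sxx: 0, sxy: 0 });

function addPoint(s, x, y) {
  s.n += 1; s.sx += x; s.sy += y; s.sxx += x * x; s.sxy += x * y;
}

// Sums are additive, so bins can be merged without revisiting raw rows.
function merge(a, b) {
  return { n: a.n + b.n, sx: a.sx + b.sx, sy: a.sy + b.sy, sxx: a.sxx + b.sxx, sxy: a.sxy + b.sxy };
}

// Closed-form solution of the normal equations.
function fit(s) {
  const slope = (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx * s.sx);
  const intercept = (s.sy - slope * s.sx) / s.n;
  return { slope, intercept };
}

// Materialize one stats record per bin of the brushable x axis.
function preaggregate(points, binOf) {
  const bins = new Map();
  for (const p of points) {
    const k = binOf(p);
    if (!bins.has(k)) bins.set(k, emptyStats());
    addPoint(bins.get(k), p.x, p.y);
  }
  return bins;
}

// Answer a brush over bins [b0, b1] using only the pre-aggregated bins.
function fitBrush(bins, b0, b1) {
  let total = emptyStats();
  for (const [k, s] of bins) if (k >= b0 && k <= b1) total = merge(total, s);
  return fit(total);
}

// Synthetic data: y = 2x + 1 plus a small periodic perturbation.
const points = Array.from({ length: 100 }, (_, x) => ({ x, y: 2 * x + 1 + ((x % 3) - 1) }));
const binOf = p => Math.floor(p.x / 10);
const bins = preaggregate(points, binOf);

const fromBins = fitBrush(bins, 2, 5); // brush covers x in [20, 59]

// Reference: fit the same rows directly.
const direct = emptyStats();
for (const p of points) if (binOf(p) >= 2 && binOf(p) <= 5) addPoint(direct, p.x, p.y);
const fromRows = fit(direct);
```

Over very large tables, naive floating-point sums can lose precision, so a production version may need more careful accumulation.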
diff --git a/docs/public/specs/json/flights-10m.json b/docs/public/specs/json/flights-10m.json index 5b6a3b52..9a623efa 100644 --- a/docs/public/specs/json/flights-10m.json +++ b/docs/public/specs/json/flights-10m.json @@ -1,7 +1,7 @@ { "meta": { "title": "Cross-Filter Flights (10M)", - "description": "Histograms showing arrival delay, departure time, and distance flown for 10 million flights.\nOnce loaded, automatically-generated indexes enable efficient cross-filtered selections.\n\n_You may need to wait a few seconds for the dataset to load._\n" + "description": "Histograms showing arrival delay, departure time, and distance flown for 10 million flights.\nOnce loaded, automatic pre-aggregation optimizations enable efficient cross-filtered selections.\n\n_You may need to wait a few seconds for the dataset to load._\n" }, "data": { "flights10m": "SELECT GREATEST(-60, LEAST(ARR_DELAY, 180))::DOUBLE AS delay, DISTANCE AS distance, DEP_TIME AS time FROM 'https://idl.uw.edu/mosaic-datasets/data/flights-10m.parquet'" diff --git a/docs/public/specs/json/linear-regression-10m.json b/docs/public/specs/json/linear-regression-10m.json index f169e18d..f96f6057 100644 --- a/docs/public/specs/json/linear-regression-10m.json +++ b/docs/public/specs/json/linear-regression-10m.json @@ -1,7 +1,7 @@ { "meta": { "title": "Linear Regression 10M", - "description": "A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized selection updates using data cube indexes. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset.\n" + "description": "A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized selection updates using pre-aggregated materialized views. 
The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset.\n" }, "data": { "flights10m": "SELECT GREATEST(-60, LEAST(ARR_DELAY, 180))::DOUBLE AS delay, DISTANCE AS distance, DEP_TIME AS time FROM 'https://idl.uw.edu/mosaic-datasets/data/flights-10m.parquet'" diff --git a/docs/public/specs/yaml/flights-10m.yaml b/docs/public/specs/yaml/flights-10m.yaml index 2ab535df..2b6f088a 100644 --- a/docs/public/specs/yaml/flights-10m.yaml +++ b/docs/public/specs/yaml/flights-10m.yaml @@ -2,7 +2,7 @@ meta: title: Cross-Filter Flights (10M) description: | Histograms showing arrival delay, departure time, and distance flown for 10 million flights. - Once loaded, automatically-generated indexes enable efficient cross-filtered selections. + Once loaded, automatic pre-aggregation optimizations enable efficient cross-filtered selections. _You may need to wait a few seconds for the dataset to load._ data: diff --git a/docs/public/specs/yaml/linear-regression-10m.yaml b/docs/public/specs/yaml/linear-regression-10m.yaml index daee0fd2..294bc829 100644 --- a/docs/public/specs/yaml/linear-regression-10m.yaml +++ b/docs/public/specs/yaml/linear-regression-10m.yaml @@ -4,7 +4,7 @@ meta: A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized - selection updates using data cube indexes. + selection updates using pre-aggregated materialized views. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset. data: diff --git a/docs/what-is-mosaic/index.md b/docs/what-is-mosaic/index.md index 2c93207d..a5c01bf1 100644 --- a/docs/what-is-mosaic/index.md +++ b/docs/what-is-mosaic/index.md @@ -42,7 +42,7 @@ Next let's visualize over 200,000 flight records. 
The first histogram shows flig -When the selection changes we need to filter the data and recount the number of records in each bin. The Mosaic coordinator analyzes these queries and automatically optimizes updates by building indexes of pre-aggregated data ("data cubes") in the database, binned at the level of input pixels for the currently active view. +When the selection changes we need to filter the data and recount the number of records in each bin. The Mosaic coordinator analyzes these queries and automatically optimizes updates by building tables (["materialized views"](https://en.wikipedia.org/wiki/Materialized_view)) of pre-aggregated data in the database, binned at the level of input pixels for the currently active view. While 200,000 points will stress many web-based visualization tools, Mosaic doesn't break a sweat. Now go ahead and try this with [10 million records](/examples/flights-10m)! diff --git a/docs/why-mosaic/index.md b/docs/why-mosaic/index.md index 02faf6fc..68cd5e43 100644 --- a/docs/why-mosaic/index.md +++ b/docs/why-mosaic/index.md @@ -129,8 +129,8 @@ DuckDB-WASM in the browser fares well, though is limited (compared to a DuckDB s
[Figure legend: Vega(-Lite), VegaFusion, Observable Plot, Mosaic WASM, Mosaic Local]
When it comes to interaction, Mosaic really shines! -For many forms of aggregated data, the coordinator will automatically pre-aggregate data into smaller "data cube" indexes to support real-time interaction with billion+ element databases. -The figure below shows benchmark results for index-optimized interactive updates. +For many forms of aggregated data, the coordinator will automatically pre-aggregate data into smaller tables ("materialized views") to support real-time interaction with billion+ element databases. +The figure below shows benchmark results for optimized interactive updates. Even with billions of rows, Mosaic with a server-side DuckDB instance maintains interactive response rates. @@ -173,8 +173,8 @@ Even with billions of rows, Mosaic with a server-side DuckDB instance maintains
[Figure legend: VegaFusion, Mosaic WASM, Mosaic Local, Mosaic Remote]
-If not already present, Mosaic will create data cube index tables when the mouse cursor enters a view. -For very large data sets with longer data cube construction times, precomputation and server-side caching are supported. +If not already present, Mosaic will build pre-aggregated data tables when the mouse cursor enters a view. +For very large data sets with longer pre-aggregation times, precomputation and server-side caching are supported. Other tasks, like changing a color encoding or adjusting a smoothing parameter, can be carried out quickly in the browser alone, including over aggregated data. Mosaic clients have the flexibility of choosing what works best. diff --git a/packages/core/README.md b/packages/core/README.md index cf8efa65..14074981 100644 --- a/packages/core/README.md +++ b/packages/core/README.md @@ -1,5 +1,5 @@ # mosaic-core -The core Mosaic components: a central coordinator, parameters (`Param`) and selections (`Selection`) for linking scalar values or query predicates (respectively) across Mosaic clients, and filter groups with optimized index management. The Mosaic coordinator can send queries either over the network to a backing server (`socket` and `rest` clients) or to a client-side [DuckDB-WASM](https://github.com/duckdb/duckdb-wasm) instance (`wasm` client). +The core Mosaic components: a central coordinator, parameters (`Param`) and selections (`Selection`) for linking scalar values or query predicates (respectively) across Mosaic clients, and filter groups with materialized views of pre-aggregated data. The Mosaic coordinator can send queries either over the network to a backing server (`socket` and `rest` clients) or to a client-side [DuckDB-WASM](https://github.com/duckdb/duckdb-wasm) instance (`wasm` client). The `mosaic-core` facilities are included as part of the [vgplot](https://github.com/uwdata/mosaic/tree/main/packages/vgplot) API. 
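The docs in this patch describe pre-aggregated data binned at the level of input pixels for the active view, then re-summed whenever the selection changes. This can be mimicked in memory, assuming a linear scale and a count aggregate. `pixelBin`, `buildPreagg`, and `queryPreagg` are hypothetical helpers. The `Map` stands in for a materialized view keyed by (active pixel bin, client group), and a brush is answered by summing matching entries. This is the in-memory analog of `SUM(count) ... GROUP BY` over the pre-aggregated table, in place of a scan of the base rows.

```javascript
// Map a value to a pixel under a linear scale from domain [d0, d1] to [0, width],
// then group pixels into bins `pixelSize` pixels wide.
function pixelBin(value, [d0, d1], width, pixelSize = 1) {
  const px = ((value - d0) / (d1 - d0)) * width;
  return Math.floor(px / pixelSize);
}

// Build the stand-in "materialized view": counts keyed by active bin and client group.
function buildPreagg(rows, activeBinOf, groupOf) {
  const view = new Map();
  for (const row of rows) {
    const key = `${activeBinOf(row)}|${groupOf(row)}`;
    view.set(key, (view.get(key) ?? 0) + 1);
  }
  return view;
}

// Answer a brush over active bins [b0, b1] from the view alone.
function queryPreagg(view, b0, b1) {
  const counts = new Map();
  for (const [key, n] of view) {
    const [bin, group] = key.split('|');
    const b = Number(bin);
    if (b >= b0 && b <= b1) counts.set(group, (counts.get(group) ?? 0) + n);
  }
  return counts;
}

// Synthetic rows: departure time in hours and a delay in minutes.
const rows = Array.from({ length: 10000 }, (_, i) => ({
  time: (i % 240) / 10,
  delay: ((i * 37) % 240) - 60,
}));
const activeBinOf = r => pixelBin(r.time, [0, 24], 480, 4); // 120 brushable bins
const groupOf = r => Math.floor(r.delay / 20);              // the client's histogram bins
const view = buildPreagg(rows, activeBinOf, groupOf);

const brushed = queryPreagg(view, 30, 59); // group -> count for the brushed range
```

The view holds at most one entry per (active bin, group) pair, at most 120 × 12 here, regardless of how many rows the base table has. That bounded size is why re-summing it on each brush stays fast.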
diff --git a/packages/core/src/Coordinator.js b/packages/core/src/Coordinator.js index 9a050b31..8cc135dd 100644 --- a/packages/core/src/Coordinator.js +++ b/packages/core/src/Coordinator.js @@ -1,5 +1,5 @@ import { socketConnector } from './connectors/socket.js'; -import { DataCubeIndexer } from './DataCubeIndexer.js'; +import { PreAggregator } from './PreAggregator.js'; import { MosaicClient } from './MosaicClient.js'; import { QueryManager, Priority } from './QueryManager.js'; import { queryFieldInfo } from './util/field-info.js'; @@ -33,15 +33,15 @@ export function coordinator(instance) { /** * A Mosaic Coordinator manages all database communication for clients and * handles selection updates. The Coordinator also performs optimizations - * including query caching, consolidation, and data cube indexing. + * including query caching, consolidation, and pre-aggregation. * @param {*} [db] Database connector. Defaults to a web socket connection. * @param {object} [options] Coordinator options. * @param {*} [options.logger=console] The logger to use, defaults to `console`. * @param {*} [options.manager] The query manager to use. * @param {boolean} [options.cache=true] Boolean flag to enable/disable query caching. * @param {boolean} [options.consolidate=true] Boolean flag to enable/disable query consolidation. - * @param {import('./DataCubeIndexer.js').DataCubeIndexerOptions} [options.indexes] - * Data cube indexer options. + * @param {import('./PreAggregator.js').PreAggregateOptions} [options.preagg] + * Options for the Pre-aggregator. 
*/ export class Coordinator { constructor(db = socketConnector(), { @@ -49,7 +49,7 @@ export class Coordinator { manager = new QueryManager(), cache = true, consolidate = true, - indexes = {} + preagg = {} } = {}) { /** @type {QueryManager} */ this.manager = manager; @@ -58,7 +58,7 @@ export class Coordinator { this.databaseConnector(db); this.logger(logger); this.clear(); - this.dataCubeIndexer = new DataCubeIndexer(this, indexes); + this.preaggregator = new PreAggregator(this, preagg); } /** @@ -208,12 +208,12 @@ export class Coordinator { /** * Issue a query request for a client. If the query is null or undefined, * the client is simply updated. Otherwise `updateClient` is called. As a - * side effect, this method clears the current data cube indexer state. + * side effect, this method clears the current preaggregator state. * @param {MosaicClient} client The client to update. * @param {QueryType | null} [query] The query to issue. */ requestQuery(client, query) { - this.dataCubeIndexer.clear(); + this.preaggregator.clear(); return query ? this.updateClient(client, query) : Promise.resolve(client.update()); @@ -307,10 +307,10 @@ function connectSelection(mc, selection, client) { * selection clause representative of the activation. */ function activateSelection(mc, selection, clause) { - const { dataCubeIndexer, filterGroups } = mc; + const { preaggregator, filterGroups } = mc; const { clients } = filterGroups.get(selection); for (const client of clients) { - dataCubeIndexer.index(client, selection, clause); + preaggregator.request(client, selection, clause); } } @@ -322,11 +322,11 @@ function activateSelection(mc, selection, clause) { * @returns {Promise} A Promise that resolves when the update completes. 
*/ function updateSelection(mc, selection) { - const { dataCubeIndexer, filterGroups } = mc; + const { preaggregator, filterGroups } = mc; const { clients } = filterGroups.get(selection); const { active } = selection; return Promise.allSettled(Array.from(clients, client => { - const info = dataCubeIndexer.index(client, selection, active); + const info = preaggregator.request(client, selection, active); const filter = info ? null : selection.predicate(client); // skip due to cross-filtering diff --git a/packages/core/src/MosaicClient.js b/packages/core/src/MosaicClient.js index 8b71e09e..09f4cdc4 100644 --- a/packages/core/src/MosaicClient.js +++ b/packages/core/src/MosaicClient.js @@ -38,11 +38,12 @@ export class MosaicClient { } /** - * Return a boolean indicating if the client query can be indexed. Should - * return true if changes to the filterBy selection does not change the - * groupby domain of the client query. + * Return a boolean indicating if the client query can be sped up with + * materialized views of pre-aggregated data. Should return true if changes to + * the filterBy selection do not change the groupby domain of the client + * query.
*/ - get filterIndexable() { + get filterStable() { return true; } diff --git a/packages/core/src/DataCubeIndexer.js b/packages/core/src/PreAggregator.js similarity index 56% rename from packages/core/src/DataCubeIndexer.js rename to packages/core/src/PreAggregator.js index 02d610e8..1a2b6651 100644 --- a/packages/core/src/DataCubeIndexer.js +++ b/packages/core/src/PreAggregator.js @@ -1,49 +1,50 @@ import { Query, and, asColumn, createTable, isBetween, scaleTransform, sql } from '@uwdata/mosaic-sql'; -import { indexColumns } from './util/index-columns.js'; +import { preaggColumns } from './util/preagg-columns.js'; import { fnv_hash } from './util/hash.js'; const Skip = { skip: true, result: null }; /** - * @typedef {object} DataCubeIndexerOptions + * @typedef {object} PreAggregateOptions * @property {string} [schema] Database schema (namespace) in which to write - * data cube index tables (default 'mosaic'). + * pre-aggregated materialized views (default 'mosaic'). * @property {boolean} [options.enabled=true] Flag to enable or disable the - * indexer. This setting can later be updated via the `enabled` method. + * pre-aggregation. This flag can be updated later via the `enabled` property. */ /** - * Build and query optimized indices ("data cubes") for fast computation of - * groupby aggregate queries over compatible client queries and selections. - * A data cube contains pre-aggregated data for a Mosaic client, subdivided - * by possible query values from an active selection clause. These cubes are - * realized as as database tables that can be queried for rapid updates. + * Build and query optimized pre-aggregated materialized views for fast + * computation of groupby aggregate queries over compatible client queries + * and selections. The materialized views contain pre-aggregated data for a + * Mosaic client, subdivided by possible query values from an active selection + * clause.
These materialized views are database tables that can be queried + * for rapid updates. * * Compatible client queries must consist of only groupby dimensions and * supported aggregate functions. Compatible selections must contain an active * clause that exposes metadata for an interval or point value predicate. * - * Data cube index tables are written to a dedicated schema (namespace) that + * Materialized views are written to a dedicated schema (namespace) that * can be set using the *schema* constructor option. This schema acts as a - * persistent cache, and index tables may be used across sessions. The - * `dropIndexTables` method issues a query to remove *all* tables within - * this schema. This may be needed if the original tables have updated data, - * but should be used with care. + * persistent cache, and materialized view tables may be used across sessions. + * The `dropSchema` method issues a query to remove *all* tables within this + * schema. This may be needed if the original tables have updated data, but + * should be used with care. */ -export class DataCubeIndexer { +export class PreAggregator { /** - * Create a new data cube index table manager. + * Create a new manager of materialized views of pre-aggregated data. * @param {import('./Coordinator.js').Coordinator} coordinator A Mosaic coordinator. - * @param {DataCubeIndexerOptions} [options] Data cube indexer options. + * @param {PreAggregateOptions} [options] Pre-aggregation options. */ constructor(coordinator, { schema = 'mosaic', enabled = true } = {}) { - /** @type {Map} */ - this.indexes = new Map(); + /** @type {Map} */ + this.entries = new Map(); this.active = null; this.mc = coordinator; this._schema = schema; @@ -51,9 +52,10 @@ export class DataCubeIndexer { } /** - * Set the enabled state of this indexer. If false, any local state is - * cleared and subsequent index calls will return null until re-enabled. - * This method has no effect on any index tables already in the database. 
+ * Set the enabled state of this manager. If false, any local state is + * cleared and subsequent request calls will return null until re-enabled. + * This method has no effect on any pre-aggregated tables already in the + * database. * @param {boolean} [state] The enabled state to set. */ set enabled(state) { @@ -64,7 +66,7 @@ export class DataCubeIndexer { } /** - * Get the enabled state of this indexer. + * Get the enabled state of this manager. * @returns {boolean} The current enabled state. */ get enabled() { @@ -72,10 +74,10 @@ export class DataCubeIndexer { } /** - * Set the database schema used by this indexer. Upon changes, any local - * state is cleared. This method does _not_ drop any existing data cube - * tables, use `dropIndexTables` before changing the schema to also remove - * existing index tables in the database. + * Set the database schema used for pre-aggregated materialized view tables. + * Upon changes, any local state is cleared. This method does _not_ drop any + * existing materialized views; use `dropSchema` before changing the schema + * to also remove existing materialized views in the database. * @param {string} [schema] The schema name to set. */ set schema(schema) { @@ -86,7 +88,7 @@ export class DataCubeIndexer { } /** - * Get the database schema used by this indexer. + * Get the database schema used for pre-aggregated materialized view tables. * @returns {string} The current schema name. */ get schema() { @@ -94,49 +96,49 @@ export class DataCubeIndexer { } /** - * Issues a query through the coordinator to drop the current index table - * schema. *All* tables in the schema will be removed and local state is - * cleared. Call this method if the underlying base tables have been updated, - * causing derived index tables to become stale and inaccurate. Use this - * method with care! Once dropped, the schema will be repopulated by future - * data cube indexer requests.
+ * Issues a query through the coordinator to drop the current schema for + * pre-aggregated materialized views. *All* materialized view tables in the + * schema will be removed and local state is cleared. Call this method if + * the underlying base tables have been updated, causing materialized views + * to become stale and inaccurate. Use this method with care! Once dropped, + * the schema will be repopulated by future pre-aggregation requests. * @returns A query result promise. */ - dropIndexTables() { + dropSchema() { this.clear(); return this.mc.exec(`DROP SCHEMA IF EXISTS "${this.schema}" CASCADE`); } /** - * Clear the cache of data cube index table entries for the current active - * selection clause. This method does _not_ drop any existing data cube - * tables. Use `dropIndexTables` to remove existing index tables from the - * database. + * Clear the cache of pre-aggregation entries for the current active + * selection clause. This method does _not_ drop any existing materialized + * views. Use `dropSchema` to remove existing materialized view tables from + * the database. */ clear() { - this.indexes.clear(); + this.entries.clear(); this.active = null; } /** - * Return data cube index table information for the active state of a - * client-selection pair, or null if the client is not indexable. This - * method has multiple possible side effects, including data cube table - * generation and updating internal caches. + * Return pre-aggregation information for the active state of a + * client-selection pair, or null if the client has unstable filters. + * This method has multiple possible side effects, including materialized + * view creation and updating internal caches. * @param {import('./MosaicClient.js').MosaicClient} client A Mosaic client. * @param {import('./Selection.js').Selection} selection A Mosaic selection * to filter the client by.
* @param {import('./util/selection-types.js').SelectionClause} activeClause * A representative active selection clause for which to (possibly) generate - * data cube index tables. - * @returns {DataCubeInfo | Skip | null} Data cube index table - * information and query generator, or null if the client is not indexable. + * materialized views of pre-aggregates. + * @returns {PreAggregateInfo | Skip | null} Information and query generator + * for pre-aggregated tables, or null if the client has unstable filters. */ - index(client, selection, activeClause) { + request(client, selection, activeClause) { // if not enabled, do nothing if (!this.enabled) return null; - const { indexes, mc, schema } = this; + const { entries, mc, schema } = this; const { source } = activeClause; // if there is no clause source to track, do nothing @@ -144,11 +146,11 @@ export class DataCubeIndexer { // if we have cached active columns, check for updates or exit if (this.active) { - // if the active clause source has changed, clear indexer state - // this cancels outstanding requests and clears the index cache + // if the active clause source has changed, clear the state + // this cancels outstanding requests and clears the local cache // a clear also sets this.active to null if (this.active.source !== source) this.clear(); - // if we've seen this source and it's not indexable, do nothing + // if we've seen this source and it has unstable filters, do nothing if (this.active?.source === null) return null; } @@ -157,32 +159,32 @@ export class DataCubeIndexer { // if cached active columns are unset, analyze the active clause if (!active) { - // generate active data cube dimension columns to select over - // will return an object with null source if not indexable + // generate active dimension columns to select over + // will return an object with null source if it has unstable filters this.active = active = activeColumns(activeClause); - // if the active clause is not indexable, exit now + // 
if the active clause has unstable filters, exit now if (active.source === null) return null; } - // if we have cached data cube index table info, return that - if (indexes.has(client)) { - return indexes.get(client); + // if we have cached pre-aggregate info, return that + if (entries.has(client)) { + return entries.get(client); } - // get non-active data cube index table columns - const indexCols = indexColumns(client); + // get non-active materialized view columns + const preaggCols = preaggColumns(client); let info; - if (!indexCols) { - // if client is not indexable, record null index + if (!preaggCols) { + // if the client cannot be pre-aggregated, record null info info = null; } else if (selection.skip(client, activeClause)) { // skip client if untouched by cross-filtering info = Skip; } else { - // generate data cube index table + // generate materialized view table const filter = selection.remove(source).predicate(client); - info = dataCubeInfo(client.query(filter), active, indexCols, schema); + info = preaggregateInfo(client.query(filter), active, preaggCols, schema); info.result = mc.exec([ `CREATE SCHEMA IF NOT EXISTS ${schema}`, createTable(info.table, info.create, { temp: false }) @@ -190,19 +192,19 @@ export class DataCubeIndexer { info.result.catch(e => mc.logger().error(e)); } - indexes.set(client, info); + entries.set(client, info); return info; } } /** - * Determines the active data cube dimension columns to select over. Returns - * an object with the clause source, column definitions, and a predicate - * generator function for the active dimensions of a data cube index table. If - * the active clause is not indexable or is missing metadata, this method + * Determines the active dimension columns to select over. Returns an object + * with the clause source, column definitions, and a predicate generator + * function for the active dimensions of a pre-aggregated materialized view.
+ * If the active clause cannot be pre-aggregated or is missing metadata, this method + * returns an object with a null source property. - * @param {import('./util/selection-types.js').SelectionClause} clause The - * active selection clause to analyze. + * @param {import('./util/selection-types.js').SelectionClause} clause + * The active selection clause to analyze. */ function activeColumns(clause) { const { source, meta } = clause; @@ -277,17 +279,17 @@ function binInterval(scale, pixelSize, bin) { } /** - * Generate data cube table query information. + * Generate pre-aggregate query information. * @param {Query} clientQuery The original client query. * @param {*} active Active (selected) column definitions. - * @param {*} indexCols Data cube index column definitions. - * @returns {DataCubeInfo} + * @param {*} preaggCols Pre-aggregate column definitions. + * @returns {PreAggregateInfo} */ -function dataCubeInfo(clientQuery, active, indexCols, schema) { - const { dims, aggr, aux } = indexCols; +function preaggregateInfo(clientQuery, active, preaggCols, schema) { + const { dims, aggr, aux } = preaggCols; const { columns } = active; - // build index table construction query + // build materialized view construction query const query = clientQuery .select({ ...columns, ...aux }) .groupby(Object.keys(columns)); @@ -299,23 +301,23 @@ function dataCubeInfo(clientQuery, active, indexCols, schema) { subqueryPushdown(subq, cols); } - // push orderby criteria to later cube queries + // push orderby criteria to later queries const order = query.orderby(); query.query.orderby = []; // generate creation query string and hash id const create = query.toString(); const id = (fnv_hash(create) >>> 0).toString(16); - const table = `${schema}.cube_${id}`; + const table = `${schema}.preagg_${id}`; - // generate data cube select query + // generate pre-aggregate select query const select = Query .select(dims, aggr) .from(table) .groupby(dims) .orderby(order); - return new DataCubeInfo({ id,
table, create, active, select }); + return new PreAggregateInfo({ table, create, active, select }); } /** @@ -335,42 +337,60 @@ function subqueryPushdown(query, cols) { } /** - * Metadata and query generator for a data cube index table. This - * object provides the information needed to generate and query - * a data cube index table for a client-selection pair relative to - * a specific active clause and selection state. + * Metadata and query generator for materialized views of pre-aggregated data. + * This object provides the information needed to generate and query the + * materialized views for a client-selection pair relative to a specific + * active clause and selection state. */ -export class DataCubeInfo { +export class PreAggregateInfo { /** - * Create a new DataCubeInfo instance. - * @param {object} options + * Create a new pre-aggregation information instance. + * @param {object} options Options object. + * @param {string} options.table The materialized view table name. + * @param {string} options.create The table creation query. + * @param {*} options.active Active column information. + * @param {Query} options.select Base query for requesting updates + * using a pre-aggregated materialized view. */ - constructor({ table, create, active, select } = {}) { - /** The name of the data cube index table. */ + constructor({ table, create, active, select }) { + /** + * The name of the materialized view. + * @type {string} + */ this.table = table; - /** The SQL query used to generate the data cube index table. */ + /** + * The SQL query used to generate the materialized view. + * @type {string} + */ this.create = create; - /** A result promise returned for the data cube creation query. */ + /** + * A result promise returned for the materialized view creation query. + * @type {Promise | null} + */ this.result = null; /** * Definitions and predicate function for the active columns, * which are dynamically filtered by the active clause. 
*/ this.active = active; - /** Select query (sans where clause) for data cube tables. */ + /** + * Select query (sans where clause) for materialized views. + * @type {Query} + */ this.select = select; /** * Boolean flag indicating a client that should be skipped. - * This value is always false for completed data cube info. + * This value is always false for a created materialized view. + * @type {boolean} */ this.skip = false; } /** - * Generate a data cube index table query for the given predicate. + * Generate a materialized view query for the given predicate. * @param {import('@uwdata/mosaic-sql').SQLExpression} predicate The current * active clause predicate. - * @returns {Query} A data cube index table query. + * @returns {Query} A materialized view query. */ query(predicate) { return this.select.clone().where(this.active.predicate(predicate)); diff --git a/packages/core/src/util/index-columns.js b/packages/core/src/util/preagg-columns.js similarity index 93% rename from packages/core/src/util/index-columns.js rename to packages/core/src/util/preagg-columns.js index d65b2335..e53331ee 100644 --- a/packages/core/src/util/index-columns.js +++ b/packages/core/src/util/preagg-columns.js @@ -2,14 +2,14 @@ import { Query, agg, sql } from '@uwdata/mosaic-sql'; import { MosaicClient } from '../MosaicClient.js'; /** - * Determine data cube index columns for a given Mosaic client. + * Determine pre-aggregation columns for a given Mosaic client. * @param {MosaicClient} client The Mosaic client. - * @returns An object with necessary column data to generate data - * cube index columns, or null if the client is not indexable or - * the client query contains an invalid or unsupported expression. + * @returns An object with necessary column data to generate pre-aggregated + * columns, or null if the client cannot be optimized or the client query + * contains an invalid or unsupported expression.
*/ -export function indexColumns(client) { - if (!client.filterIndexable) return null; +export function preaggColumns(client) { + if (!client.filterStable) return null; const q = client.query(); const from = getBase(q, q => q.from()?.[0].from.table); @@ -146,7 +146,7 @@ export function indexColumns(client) { /** * Generate an output column name for use as an auxiliary column - * (e.g., for sufficient statistics) within a data cube index. + * (e.g., for sufficient statistics) within a preaggregated table. * @param {string} type The operation type. * @param {...any} args The input column arguments. * @returns {string} A sanitized auxiliary column name. @@ -203,7 +203,7 @@ function getBase(query, get) { * As a side effect, this method adds a column to the input *aux* object * to track the count of non-null values per-partition. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any} arg Source data table column. This value may be a string, * column reference, SQL expression, or other string-coercible value. * @returns An aggregate expression for calculating counts over @@ -220,7 +220,7 @@ function countExpr(aux, arg) { * As a side effect, this method adds a column to the input *aux* object * to track the count of non-null values per-partition. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {string} as The output column for the original aggregate. * @param {any} arg Source data table column. This value may be a string, * column reference, SQL expression, or other string-coercible value. @@ -237,7 +237,7 @@ function avgExpr(aux, as, arg) { * As a side effect, this method adds a column to the input *aux* object * to track a maximum value per-partition. 
* @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {string} as The output column for the original aggregate. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. @@ -255,7 +255,7 @@ function argmaxExpr(aux, as, [, y]) { * As a side effect, this method adds a column to the input *aux* object * to track a minimum value per-partition. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {string} as The output column for the original aggregate. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. @@ -277,7 +277,7 @@ function argminExpr(aux, as, [, y]) { * As a side effect, this method adds columns for these statistics to the * input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {*} x The source data table column. This may be a string, * column reference, SQL expression, or other string-coercible value. * @param {(field: any) => string} avg Global average query generator. @@ -306,7 +306,7 @@ function varianceExpr(aux, x, avg, correction = true) { * (of mean-centered values) for x and y. As a side effect, this method * adds columns for these statistics to the input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. 
The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @param {(field: any) => string} avg Global average query generator. @@ -337,7 +337,7 @@ function covarianceExpr(aux, args, avg, correction = true) { * As a side effect, this method adds columns for these statistics to the * input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @param {(field: any) => string} avg Global average query generator. @@ -361,7 +361,7 @@ function corrExpr(aux, args, avg) { * effect, this method adds columns to the input *aux* object to the * partition-level count of non-null pairs. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @returns An aggregate expression for calculating regression pair counts @@ -380,7 +380,7 @@ function regrCountExpr(aux, [y, x]) { * floating point error. As a side effect, this method adds a column for * partition-level sums to the input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {number} i An index indicating which argument column to sum. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. 
@@ -402,7 +402,7 @@ function regrSumExpr(aux, i, args, avg) { * reduce floating point error. As a side effect, this method adds a column * for partition-level sums to the input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {number} i An index indicating which argument column to sum. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. @@ -424,7 +424,7 @@ function regrSumSqExpr(aux, i, args, avg) { * reduce floating point error. As a side effect, this method adds a column * for partition-level sums to the input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @param {(field: any) => string} avg Global average query generator. @@ -443,7 +443,7 @@ function regrSumXYExpr(aux, args, avg) { * effect, this method adds columns to the input *aux* object to track both * the count of non-null pairs and partition-level averages. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @returns An aggregate expression over pre-aggregated data partitions. 
@@ -462,7 +462,7 @@ function regrAvgXExpr(aux, args) { * effect, this method adds columns to the input *aux* object to track both * the count of non-null pairs and partition-level averages. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @returns An aggregate expression over pre-aggregated data partitions. @@ -482,7 +482,7 @@ function regrAvgYExpr(aux, args) { * reduce floating point error. As a side effect, this method adds columns * for partition-level count and sums to the input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {number} i The index of the argument to compute the variance for. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. @@ -503,7 +503,7 @@ function regrVarExpr(aux, i, args, avg) { * side effect, this method adds columns for sufficient statistics to the * input *aux* object. * @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @param {(field: any) => string} avg Global average query generator. @@ -522,7 +522,7 @@ function regrSlopeExpr(aux, args, avg) { * side effect, this method adds columns for sufficient statistics to the * input *aux* object. 
* @param {object} aux An object for auxiliary columns (such as - * sufficient statistics) to include in the data cube aggregation. + * sufficient statistics) to include in the pre-aggregation. * @param {any[]} args Source data table columns. The entries may be strings, * column references, SQL expressions, or other string-coercible values. * @param {(field: any) => string} avg Global average query generator. diff --git a/packages/core/src/util/selection-types.ts b/packages/core/src/util/selection-types.ts index 9aaca3db..56f6f071 100644 --- a/packages/core/src/util/selection-types.ts +++ b/packages/core/src/util/selection-types.ts @@ -131,7 +131,7 @@ export interface SelectionClause { /** * Optional clause metadata that varies based on the selection type. * The metadata can be used to optimize selection queries, for example - * by creating pre-aggregated data cubes when applicable. + * by creating materialized views of pre-aggregated data when applicable. */ meta?: ClauseMetadata; } diff --git a/packages/core/test/client.test.js b/packages/core/test/client.test.js index 27e16709..9d054c77 100644 --- a/packages/core/test/client.test.js +++ b/packages/core/test/client.test.js @@ -7,10 +7,10 @@ import { QueryResult } from '../src/util/query-result.js'; describe('MosaicClient', () => { it('is filtered by selections', async () => { // instantiate coordinator to use node.js DuckDB - // disable logging and data cube indexes + // disable logging and preaggregation const coord = new Coordinator(nodeConnector(), { logger: null, - indexes: { enabled: false } + preagg: { enabled: false } }); // load test data diff --git a/packages/core/test/data-cube-indexer.test.js b/packages/core/test/preaggregator.test.js similarity index 99% rename from packages/core/test/data-cube-indexer.test.js rename to packages/core/test/preaggregator.test.js index 07646627..e382f188 100644 --- a/packages/core/test/data-cube-indexer.test.js +++ b/packages/core/test/preaggregator.test.js @@ -46,7 +46,7 
@@ async function run(measure) { }); } -describe('DataCubeIndexer', () => { +describe('PreAggregator', () => { it('supports count aggregate', async () => { expect(await run(count())).toBe(3); expect(await run(count('x'))).toBe(2); diff --git a/packages/plot/src/marks/Density1DMark.js b/packages/plot/src/marks/Density1DMark.js index 27b72930..d61153d0 100644 --- a/packages/plot/src/marks/Density1DMark.js +++ b/packages/plot/src/marks/Density1DMark.js @@ -28,7 +28,7 @@ export class Density1DMark extends Mark { }); } - get filterIndexable() { + get filterStable() { const name = this.dim === 'x' ? 'xDomain' : 'yDomain'; const dom = this.plot.getAttribute(name); return dom && !dom[Transient]; diff --git a/packages/plot/src/marks/Grid2DMark.js b/packages/plot/src/marks/Grid2DMark.js index 1a315265..c2d0e0a9 100644 --- a/packages/plot/src/marks/Grid2DMark.js +++ b/packages/plot/src/marks/Grid2DMark.js @@ -70,7 +70,7 @@ export class Grid2DMark extends Mark { super.setPlot(plot, index); } - get filterIndexable() { + get filterStable() { const xdom = this.plot.getAttribute('xDomain'); const ydom = this.plot.getAttribute('yDomain'); return xdom && ydom && !xdom[Transient] && !ydom[Transient]; diff --git a/packages/plot/src/marks/HexbinMark.js b/packages/plot/src/marks/HexbinMark.js index 5c0f9233..75c2b2d0 100644 --- a/packages/plot/src/marks/HexbinMark.js +++ b/packages/plot/src/marks/HexbinMark.js @@ -15,7 +15,7 @@ export class HexbinMark extends Mark { }); } - get filterIndexable() { + get filterStable() { const xdom = this.plot.getAttribute('xDomain'); const ydom = this.plot.getAttribute('yDomain'); return xdom && ydom && !xdom[Transient] && !ydom[Transient]; diff --git a/packages/spec/src/spec/interactors/Interval1D.ts b/packages/spec/src/spec/interactors/Interval1D.ts index f164463f..7aada69c 100644 --- a/packages/spec/src/spec/interactors/Interval1D.ts +++ b/packages/spec/src/spec/interactors/Interval1D.ts @@ -39,7 +39,8 @@ export interface Interval1DOptions { field?: 
string; /** * The size of an interative pixel (default `1`). Larger pixel sizes reduce - * the brush resolution, which can reduce the size of data cube indexes. + * the brush resolution, which can reduce the size of pre-aggregated + * materialized views. */ pixelSize?: number; /** diff --git a/packages/spec/src/spec/interactors/Interval2D.ts b/packages/spec/src/spec/interactors/Interval2D.ts index d4844520..5d7f90c5 100644 --- a/packages/spec/src/spec/interactors/Interval2D.ts +++ b/packages/spec/src/spec/interactors/Interval2D.ts @@ -23,7 +23,8 @@ export interface Interval2DOptions { yfield?: string; /** * The size of an interative pixel (default `1`). Larger pixel sizes reduce - * the brush resolution, which can reduce the size of data cube indexes. + * the brush resolution, which can reduce the size of pre-aggregated + * materialized views. */ pixelSize?: number; /** diff --git a/packages/widget/mosaic_widget/__init__.py b/packages/widget/mosaic_widget/__init__.py index a059ed7d..86552621 100644 --- a/packages/widget/mosaic_widget/__init__.py +++ b/packages/widget/mosaic_widget/__init__.py @@ -25,8 +25,8 @@ class MosaicWidget(anywidget.AnyWidget): # The current params indexed by name params = traitlets.Dict({}).tag(sync=True) - # Where data cube indexes should be created - data_cube_schema = traitlets.Unicode().tag(sync=True) + # Where pre-aggregated materialized views should be created + preagg_schema = traitlets.Unicode().tag(sync=True) def __init__( self, diff --git a/packages/widget/src/index.js b/packages/widget/src/index.js index 0a7b0ba7..904a3c86 100644 --- a/packages/widget/src/index.js +++ b/packages/widget/src/index.js @@ -9,22 +9,22 @@ import { v4 as uuidv4 } from 'uuid'; * @typedef Model * @property {import('@uwdata/mosaic-spec').Spec} spec * The current Mosaic specification. - * @property {string} data_cube_schema The database schema in which to store - * data cube index tables (default 'mosaic'). 
+ * @property {string} preagg_schema The database schema in which to store + * pre-aggregated materialized views (default 'mosaic'). * @property {Params} params The current params. */ export default { /** @type {import('anywidget/types').Initialize} */ initialize(view) { - view.model.set('data_cube_schema', coordinator().dataCubeIndexer.schema); + view.model.set('preagg_schema', coordinator().preaggregator.schema); }, /** @type {import('anywidget/types').Render} */ render(view) { view.el.classList.add('mosaic-widget'); const getSpec = () => view.model.get('spec'); - const getDataCubeSchema = () => view.model.get('data_cube_schema'); + const getPreaggSchema = () => view.model.get('preagg_schema'); const logger = coordinator().logger(); /** @type Map, startTime: number, resolve: (value: any) => void, reject: (reason?: any) => void}> */ @@ -90,10 +90,10 @@ export default { view.model.on('change:spec', () => updateSpec()); function configureCoordinator() { - coordinator().dataCubeIndexer.schema = getDataCubeSchema(); + coordinator().preaggregator.schema = getPreaggSchema(); } - view.model.on('change:data_cube_schema', () => configureCoordinator()); + view.model.on('change:preagg_schema', () => configureCoordinator()); view.model.on('msg:custom', (msg, buffers) => { logger.group(`query ${msg.uuid}`); diff --git a/specs/json/flights-10m.json b/specs/json/flights-10m.json index 60751030..5d582a48 100644 --- a/specs/json/flights-10m.json +++ b/specs/json/flights-10m.json @@ -1,7 +1,7 @@ { "meta": { "title": "Cross-Filter Flights (10M)", - "description": "Histograms showing arrival delay, departure time, and distance flown for 10 million flights.\nOnce loaded, automatically-generated indexes enable efficient cross-filtered selections.\n\n_You may need to wait a few seconds for the dataset to load._\n" + "description": "Histograms showing arrival delay, departure time, and distance flown for 10 million flights.\nOnce loaded, automatic pre-aggregation optimizations enable 
efficient cross-filtered selections.\n\n_You may need to wait a few seconds for the dataset to load._\n" }, "data": { "flights10m": { diff --git a/specs/json/linear-regression-10m.json b/specs/json/linear-regression-10m.json index 97430e05..3fb36c6e 100644 --- a/specs/json/linear-regression-10m.json +++ b/specs/json/linear-regression-10m.json @@ -1,7 +1,7 @@ { "meta": { "title": "Linear Regression 10M", - "description": "A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized selection updates using data cube indexes. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset.\n" + "description": "A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized selection updates using pre-aggregated materialized views. The area around a regression line shows a 95% confidence interval. 
Select a region to view regression results for a data subset.\n" }, "data": { "flights10m": { diff --git a/specs/ts/flights-10m.ts b/specs/ts/flights-10m.ts index 57265aa3..fc4ea869 100644 --- a/specs/ts/flights-10m.ts +++ b/specs/ts/flights-10m.ts @@ -3,7 +3,7 @@ import { Spec } from '@uwdata/mosaic-spec'; export const spec : Spec = { "meta": { "title": "Cross-Filter Flights (10M)", - "description": "Histograms showing arrival delay, departure time, and distance flown for 10 million flights.\nOnce loaded, automatically-generated indexes enable efficient cross-filtered selections.\n\n_You may need to wait a few seconds for the dataset to load._\n" + "description": "Histograms showing arrival delay, departure time, and distance flown for 10 million flights.\nOnce loaded, automatic pre-aggregation optimizations enable efficient cross-filtered selections.\n\n_You may need to wait a few seconds for the dataset to load._\n" }, "data": { "flights10m": "SELECT GREATEST(-60, LEAST(ARR_DELAY, 180))::DOUBLE AS delay, DISTANCE AS distance, DEP_TIME AS time FROM 'https://idl.uw.edu/mosaic-datasets/data/flights-10m.parquet'" diff --git a/specs/ts/linear-regression-10m.ts b/specs/ts/linear-regression-10m.ts index 52ec86c9..5d6b4cc0 100644 --- a/specs/ts/linear-regression-10m.ts +++ b/specs/ts/linear-regression-10m.ts @@ -3,7 +3,7 @@ import { Spec } from '@uwdata/mosaic-spec'; export const spec : Spec = { "meta": { "title": "Linear Regression 10M", - "description": "A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized selection updates using data cube indexes. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset.\n" + "description": "A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. 
Regression computation is performed in the database, with optimized selection updates using pre-aggregated materialized views. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset.\n" }, "data": { "flights10m": "SELECT GREATEST(-60, LEAST(ARR_DELAY, 180))::DOUBLE AS delay, DISTANCE AS distance, DEP_TIME AS time FROM 'https://idl.uw.edu/mosaic-datasets/data/flights-10m.parquet'" diff --git a/specs/yaml/flights-10m.yaml b/specs/yaml/flights-10m.yaml index 2ab535df..2b6f088a 100644 --- a/specs/yaml/flights-10m.yaml +++ b/specs/yaml/flights-10m.yaml @@ -2,7 +2,7 @@ meta: title: Cross-Filter Flights (10M) description: | Histograms showing arrival delay, departure time, and distance flown for 10 million flights. - Once loaded, automatically-generated indexes enable efficient cross-filtered selections. + Once loaded, automatic pre-aggregation optimizations enable efficient cross-filtered selections. _You may need to wait a few seconds for the dataset to load._ data: diff --git a/specs/yaml/linear-regression-10m.yaml b/specs/yaml/linear-regression-10m.yaml index daee0fd2..294bc829 100644 --- a/specs/yaml/linear-regression-10m.yaml +++ b/specs/yaml/linear-regression-10m.yaml @@ -4,7 +4,7 @@ meta: A linear regression plot predicting flight arrival delay based on the time of departure, over 10 million flight records. Regression computation is performed in the database, with optimized - selection updates using data cube indexes. + selection updates using pre-aggregated materialized views. The area around a regression line shows a 95% confidence interval. Select a region to view regression results for a data subset. data:
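The renamed `PreAggregator` answers brush updates from a materialized view grouped by the active (brushed) columns, with auxiliary columns of sufficient statistics so that aggregates can be recombined over any selected range. As a rough illustration only, the JavaScript sketch below mimics that idea in memory for an `avg` aggregate. Each partition keeps a count of non-null values alongside its average, similar to what the `avgExpr` documentation describes. The functions `preaggregate` and `avgForBrush`, the binning by `pixelSize`, and the sample `flights` rows are hypothetical and are not Mosaic APIs. Mosaic performs the equivalent steps in SQL against the database.

```javascript
// Hypothetical in-memory analogue of pre-aggregation; not a Mosaic API.
// Mosaic builds the equivalent partitions as a SQL materialized view.

/**
 * Group rows into partitions keyed by a binned active (brushed) column.
 * Each partition keeps the count of non-null values and their average,
 * which are sufficient statistics for recombining an overall average.
 */
function preaggregate(rows, activeField, valueField, pixelSize = 1) {
  const sums = new Map();
  for (const row of rows) {
    // Coarser bins (larger pixelSize) yield fewer partitions.
    const key = Math.floor(row[activeField] / pixelSize);
    if (!sums.has(key)) sums.set(key, { count: 0, sum: 0 });
    const value = row[valueField];
    if (value != null) {
      const entry = sums.get(key);
      entry.count += 1;
      entry.sum += value;
    }
  }
  return Array.from(sums, ([key, { count, sum }]) => ({
    key,
    count,
    avg: count > 0 ? sum / count : null
  }));
}

/**
 * Answer a brush over bins [lo, hi) using only the partitions:
 * weight each partition average by its non-null count.
 */
function avgForBrush(partitions, lo, hi) {
  let total = 0;
  let weighted = 0;
  for (const { key, count, avg } of partitions) {
    if (key >= lo && key < hi && count > 0) {
      total += count;
      weighted += avg * count;
    }
  }
  return total > 0 ? weighted / total : null;
}

// Example: hypothetical flight rows with a departure time and a delay.
const flights = [
  { time: 0.2, delay: 10 },
  { time: 0.7, delay: 20 },
  { time: 1.5, delay: 30 },
  { time: 2.1, delay: null },
  { time: 2.9, delay: 50 }
];
const partitions = preaggregate(flights, 'time', 'delay');
console.log(avgForBrush(partitions, 0, 2)); // 20
```

In this sketch, a brush update reads one entry per bin rather than every source row. Averaging the partition averages directly would be wrong when partitions differ in size, which is why the count is kept. Coarser bins also mean fewer partitions, consistent with the note above that larger `pixelSize` values can reduce the size of pre-aggregated materialized views.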