Add queryApi provisioning details (How it works) #1599

An in-depth, detailed overview of the QueryAPI components:
- **Database:** a Postgres database where the developer's indexer data is stored, using a logical DB per user, and a logical schema per indexer function.
- **API:** a Hasura server running on Google Cloud Platform exposes a GraphQL endpoint so users can access their data from anywhere.


## Provisioning

In summary, QueryAPI provisioning consists of the following steps (a code sketch of the full flow follows the list):

1. Create `database` in [Postgres](#postgres), named after the account, if it does not already exist
2. Add `database` to Hasura
3. Create `schema` in `database`
4. Run DDL in `schema`
5. Track all tables in `schema` in [Hasura](#hasura)
6. Track all foreign key relationships in `schema` in [Hasura](#hasura)
7. Add all permissions to all tables in `schema` for the account
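
Below is a hypothetical TypeScript sketch of this flow. All helper names (`createDatabase`, `trackTables`, and so on) are placeholders standing in for QueryAPI's internals, not its real API; only the ordering of the steps mirrors the documented behavior.

```typescript
// Hypothetical sketch of the provisioning flow; helper names are placeholders.
declare function isProvisioned(database: string, schema: string): Promise<boolean>;
declare function createDatabase(database: string): Promise<void>;
declare function addDatasourceToHasura(database: string): Promise<void>;
declare function createSchema(database: string, schema: string): Promise<void>;
declare function runDdl(database: string, schema: string, ddl: string): Promise<void>;
declare function trackTables(database: string, schema: string): Promise<void>;
declare function trackForeignKeys(database: string, schema: string): Promise<void>;
declare function addPermissions(database: string, schema: string, account: string): Promise<void>;

async function provision(account: string, functionName: string, ddl: string): Promise<void> {
  const database = account.replace(/[^a-zA-Z0-9]/g, "_"); // e.g. "account_near"
  const schema = `${database}_${functionName}`;           // e.g. "account_near_my_function"

  if (await isProvisioned(database, schema)) return;      // skip on later runs

  await createDatabase(database);           // 1. logical DB per account, if needed
  await addDatasourceToHasura(database);    // 2. expose the DB to Hasura
  await createSchema(database, schema);     // 3. logical schema per indexer function
  await runDdl(database, schema, ddl);      // 4. run the user-provided schema.sql
  await trackTables(database, schema);      // 5. expose tables over GraphQL
  await trackForeignKeys(database, schema); // 6. enable nested queries
  await addPermissions(database, schema, account); // 7. scope access to the account
}
```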

:::note
This workflow only runs on the initial provisioning. On subsequent runs nothing happens, as QueryAPI detects that the indexer has already been provisioned and skips these steps.
:::

:::info
To check if an Indexer has been provisioned, QueryAPI checks if both the `database` and `schema` exist. This check is what prevents the app from attempting to provision an already provisioned indexer.
:::
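
A minimal sketch of that check, assuming the `pg` client library and environment-based connection settings (neither is confirmed to be what QueryAPI actually uses):

```typescript
import { Client } from "pg";

// Sketch of the provisioning check: the indexer counts as provisioned only
// if both the logical database and the schema exist.
async function isProvisioned(database: string, schema: string): Promise<boolean> {
  // 1. Does the logical database exist? (visible from any database)
  const admin = new Client(); // reads PGHOST/PGUSER/PGPASSWORD from the environment
  await admin.connect();
  const db = await admin.query("SELECT 1 FROM pg_database WHERE datname = $1", [database]);
  await admin.end();
  if ((db.rowCount ?? 0) === 0) return false;

  // 2. Does the schema exist? information_schema.schemata is scoped to the
  //    connected database, so this check connects to the account's database.
  const user = new Client({ database });
  await user.connect();
  const sch = await user.query(
    "SELECT 1 FROM information_schema.schemata WHERE schema_name = $1",
    [schema],
  );
  await user.end();
  return (sch.rowCount ?? 0) > 0;
}
```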

## Historical backfill

When an indexer is created, two processes are triggered:

- **real-time** - starts from the block at which the indexer was registered (`block_X`), and executes the indexer function on every matching block from there on.
- **historical** - starts from the configured `start_from_block` height, and executes the indexer function for every matching block up until `block_X`.

The historical backfill process can be broken down into two parts: _indexed_ and _unindexed_ blocks.

**Indexed** blocks come from the `near-delta-lake` bucket in S3. This bucket is populated via a Databricks job which streams blocks from NEAR Lake and, for every account, stores the heights of the blocks containing transactions made against it. This pre-processed data allows QueryAPI to quickly fetch the list of block heights that match the contract ID defined on the Indexer, rather than filtering through all blocks.

NEAR Delta Lake is not updated in real time, so to close the gap between its last indexed block and the starting point of the real-time process, the historical process must also manually process the remaining blocks. This is the **unindexed** portion of the backfill.
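
A hypothetical sketch of how the two portions combine; `fetchIndexedHeights` and `scanBlockRange` are placeholders for reads from the `near-delta-lake` index and from NEAR Lake, respectively:

```typescript
// Hypothetical sketch of assembling the historical block list; both helpers
// are placeholders, not QueryAPI's real internals.
declare function fetchIndexedHeights(contractId: string, fromBlock: number): Promise<number[]>;
declare function scanBlockRange(contractId: string, from: number, to: number): Promise<number[]>;

async function backfillHeights(
  contractId: string,
  startFromBlock: number, // the configured start_from_block
  blockX: number,         // the block at which the real-time process began
): Promise<number[]> {
  // Indexed portion: pre-computed matching heights, cheap to fetch.
  const indexed = await fetchIndexedHeights(contractId, startFromBlock);

  // Unindexed portion: near-delta-lake lags behind the chain tip, so the
  // remaining gap up to block_X is scanned block by block.
  const lastIndexed = indexed.length > 0 ? Math.max(...indexed) : startFromBlock - 1;
  const unindexed = await scanBlockRange(contractId, lastIndexed + 1, blockX - 1);

  return [...indexed, ...unindexed];
}
```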

### Low-level details

There are two main pieces to provisioning: [Postgres](#postgres) and [Hasura](#hasura).

#### Postgres

The dynamic piece in this process is the user-provided `schema.sql` file, which uses Data Definition Language (DDL) to describe the structure of the user's database. The `schema` DDL is executed within a new Postgres schema named after the account plus the function name. For example, for the account `account.near` and the function `my_function`, the schema's name will be `account_near_my_function`. Each schema exists within a separate virtual database for that account, named after the account, i.e. `account_near`.
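
A short sketch of this naming convention; the exact sanitization rules are an assumption, as the examples above only confirm that `.` becomes `_`:

```typescript
// Replace characters that are invalid in Postgres identifiers (assumed rule).
function sanitize(name: string): string {
  return name.replace(/[^a-zA-Z0-9]/g, "_");
}

const database = sanitize("account.near");               // "account_near"
const schema = `${database}_${sanitize("my_function")}`; // "account_near_my_function"
```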

#### Hasura

After creating the new schema, Hasura is configured to expose this schema via a GraphQL endpoint. First, the database is added as a data source to Hasura. By default, all tables are hidden, so to expose them they must be "tracked".
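
A sketch of those two steps against Hasura's metadata API; the endpoint URL, admin-secret variable, and source/table names are illustrative placeholders, not QueryAPI's actual configuration:

```typescript
// Minimal wrapper around Hasura's metadata API (POST /v1/metadata).
async function metadata(type: string, args: object): Promise<unknown> {
  const res = await fetch("https://hasura.example.com/v1/metadata", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET ?? "",
    },
    body: JSON.stringify({ type, args }),
  });
  if (!res.ok) throw new Error(`Hasura metadata call failed: ${await res.text()}`);
  return res.json();
}

// Add the account's database as a data source...
await metadata("pg_add_source", {
  name: "account_near",
  configuration: {
    connection_info: { database_url: { from_env: "ACCOUNT_NEAR_DB_URL" } },
  },
});

// ...then track each table so it appears in the GraphQL schema.
await metadata("pg_track_table", {
  source: "account_near",
  table: { schema: "account_near_my_function", name: "customer" },
});
```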

Foreign keys in Postgres allow for nested queries via GraphQL. For example, a `customer` table may have a foreign-key relationship to an `orders` table, which enables the user to query a customer's orders within a single request. These relationships are not enabled by default, so they must be "tracked" too.
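
Continuing the sketch above (and reusing its `metadata` helper), tracking the `customer` → `orders` relationship and then using it in a nested query; table and column names are illustrative:

```typescript
// Track the foreign key as a GraphQL array relationship named "orders".
await metadata("pg_create_array_relationship", {
  source: "account_near",
  table: { schema: "account_near_my_function", name: "customer" },
  name: "orders",
  using: {
    foreign_key_constraint_on: {
      table: { schema: "account_near_my_function", name: "orders" },
      columns: ["customer_id"],
    },
  },
});

// Once tracked, a single request can traverse the relationship:
const query = `
  query {
    customer {
      id
      orders { id total }
    }
  }`;
```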

Finally, the necessary permissions are added to the tables. These permissions control which GraphQL operations are exposed: for example, the `SELECT` permission enables all queries, while `DELETE` exposes all delete mutations. A role, named after the account, is used to group these permissions. Specifying the role in the GraphQL request applies those permissions to that request, preventing access to other users' data.
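
A sketch of the permission step, plus a request made under the account's role; names are again illustrative, the `metadata` helper comes from the earlier sketch, and authentication details are omitted:

```typescript
// Grant the account's role SELECT access; filter {} adds no row-level restriction.
await metadata("pg_create_select_permission", {
  source: "account_near",
  table: { schema: "account_near_my_function", name: "customer" },
  role: "account_near",
  permission: { columns: "*", filter: {} },
});

// Requests then declare the role; Hasura applies that role's permissions,
// so data in other accounts' schemas stays out of reach.
await fetch("https://hasura.example.com/v1/graphql", {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-hasura-role": "account_near" },
  body: JSON.stringify({ query: "{ customer { id } }" }),
});
```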

## Who hosts QueryAPI

[Pagoda Inc.](https://pagoda.co) runs and manages all the infrastructure of QueryAPI, including NEAR Lake nodes to store the data in JSON format on AWS S3.