Tweaks to quickstarts (#180)
phillipleblanc authored Sep 16, 2024
1 parent e1e1be9 commit 48357f2
Showing 4 changed files with 74 additions and 54 deletions.
111 changes: 62 additions & 49 deletions databricks/README.md
# Spice on Databricks

Spice can read data straight from a Databricks instance. This guide walks through creating a Spice app, configuring the Databricks connection, and loading and querying a dataset. It assumes:

- Spice is installed (see the [Getting Started](https://docs.spiceai.org/getting-started) documentation).
- The Databricks instance is running against AWS S3 storage in `us-east-1`.
- Basic AWS authentication is configured (with the environment variables `AWS_ACCESS_KEY_ID` & `AWS_SECRET_ACCESS_KEY`; see the export example after this list).
- A Databricks personal access token is available (as the environment variable `DATABRICKS_TOKEN`).
- A table already exists in Databricks, called `spice_data.public.awesome_table`.
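
For example, the assumed credentials can be exported in the shell before starting (the values are placeholders):

```shell
export AWS_ACCESS_KEY_ID=<your AWS access key id>
export AWS_SECRET_ACCESS_KEY=<your AWS secret access key>
export DATABRICKS_TOKEN=<your Databricks personal access token>
```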

1. Initialize a Spice app

```shell
spice init databricks_demo
cd databricks_demo
```

1. Configure Spice with the Databricks credentials (from within the `databricks_demo` directory)

```shell
spice login databricks \
--token $DATABRICKS_TOKEN \
--aws-access-key-id $AWS_ACCESS_KEY_ID \
--aws-secret-access-key $AWS_SECRET_ACCESS_KEY \
--aws-region us-east-1
```

Executing `spice login` and successfully authenticating will create a `.env` file in the `databricks_demo` directory with the Databricks credentials.

1. Start the Spice runtime

```shell
>>> spice run
2024-03-27T05:27:52.696536Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-03-27T05:27:52.696543Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-03-27T05:27:52.696606Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
```

1. In a new terminal, working in the `databricks_demo` directory, configure a Databricks dataset in the spicepod. The table provided must be a reference to a table in the Databricks Unity Catalog.

```shell
>>> spice dataset configure
Saved datasets/my_table/dataset.yaml
```

1. Edit the dataset to add `mode: delta_lake` to the `params` section:

```yaml
params:
mode: delta_lake
databricks_endpoint: <existing_endpoint>
```

1. Confirm that the runtime has registered the new table (in the terminal running the Spice runtime)

```shell
2024-03-27T05:27:54.051229Z INFO runtime: Dataset my_table registered (databricks:spice_data.public.awesome_table), results cache enabled.
```

1. Check the table exists from the Spice REPL

```shell
>>> spice sql
Welcome to the Spice.ai SQL REPL! Type 'help' for help.
Time: 0.008540708 seconds
```


```shell
sql> describe spice.public.my_table
+-----------------------+------------------------------+-------------+
| column_name | data_type | is_nullable |
+-----------------------+------------------------------+-------------+
```

1. Query against the Databricks table. Since the table isn't accelerated, the Spice runtime will make a network call to the object storage service.
```shell
>>> spice sql
sql> SELECT avg(total_amount), avg(tip_amount), count(1), passenger_count FROM my_table GROUP BY passenger_count ORDER BY passenger_count ASC;
```

## (Optional): Accelerating Databricks

To improve the query performance, the Databricks dataset can be accelerated.
1. Edit the dataset, `my_table` (the complete resulting file is sketched after these steps):
```shell
echo """acceleration:
enabled: true""" >> datasets/my_table/dataset.yaml
```
2. Confirm that the Spice runtime has picked up the change and the dataset is now accelerated (`ACCELERATION` shows `true`):
```shell
>>> spice datasets
FROM NAME REPLICATION ACCELERATION DEPENDSON STATUS
databricks:spice_data.public.awesome_table my_table false true Ready
```
3. Rerun the query
```shell
>>> spice sql
sql> select avg(total_amount), avg(tip_amount), count(1), passenger_count from my_table group by passenger_count order by passenger_count asc;
+----------------------------+--------------------------+-----------------+-----------------+
| AVG(my_table.total_amount) | AVG(my_table.tip_amount) | COUNT(Int64(1)) | passenger_count |
+----------------------------+--------------------------+-----------------+-----------------+
| 25.32781693945653 | 3.072259971396793 | 31465 | 0 |
| 26.205230445474996 | 3.3712622884680052 | 2188739 | 1 |
| 29.520659930930304 | 3.7171302113290854 | 405103 | 2 |
| 29.138309044290263 | 3.5370455392167615 | 91262 | 3 |
| 30.877266710278306 | 3.466037634201712 | 51974 | 4 |
| 26.269129111203988 | 3.3797078135259317 | 33506 | 5 |
| 25.801183286359798 | 3.344098778687425 | 22353 | 6 |
| 57.735 | 8.37 | 8 | 7 |
| 95.66803921568626 | 11.972156862745097 | 51 | 8 |
| 18.45 | 3.05 | 1 | 9 |
| 25.81173663332435 | 1.545956750046378 | 140162 | |
+----------------------------+--------------------------+-----------------+-----------------+
Time: 0.0227835 seconds
```
Note: A dataset can also be accelerated at configuration time by answering yes (y) to the `locally accelerate (y/n)?` prompt.
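
For reference, after these steps `datasets/my_table/dataset.yaml` from this walkthrough should look roughly like the sketch below. The layout is inferred from the snippets above rather than copied from the repository, and the endpoint value is a placeholder:

```yaml
from: databricks:spice_data.public.awesome_table
name: my_table
params:
  mode: delta_lake
  databricks_endpoint: <databricks endpoint>
acceleration:
  enabled: true
```
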
10 changes: 8 additions & 2 deletions graphql/README.md

Follow these steps to get started with GraphQL as a Data Connector.

- The latest version of Spice. [Install Spice](https://docs.spiceai.org/getting-started/installation)
- A GraphQL endpoint with a query that returns data in JSON format.
- The GitHub GraphQL API (<https://api.github.com/graphql>) is a good example to get started with. [GitHub GraphQL API](https://docs.github.com/en/graphql)

**Step 1.** Edit the `spicepod.yaml` file in this directory and replace the `graphql_quickstart` dataset params with the connection parameters for the GraphQL instance (a sketch follows the list):

- `[local_table_name]`: the desired name for the federated table within Spice
- `[graphql_endpoint]`: the URL of the GraphQL endpoint
- `[graphql_query]`: the query to execute
- `[json_pointer]`: the JSON pointer to the data in the GraphQL response
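
As an illustration only, a filled-in dataset entry might look like the following. The `graphql:` prefix on `from` and the exact parameter key names are assumptions here rather than values taken from this repository, so confirm them against the GraphQL data connector docs linked below:

```yaml
datasets:
  - from: graphql:[graphql_endpoint]
    name: [local_table_name]
    params:
      # Query to execute against the endpoint
      graphql_query: |
        [graphql_query]
      # JSON pointer to the records inside the GraphQL response
      json_pointer: [json_pointer]
```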

See the [GraphQL data connector docs](https://docs.spiceai.org/components/data-connectors/graphql) for more configuration options.
To securely store GraphQL auth params, see [Secret Stores](https://docs.spiceai.org/components/secret-stores).
Add the following environment variable to a `.env` file:

```bash
GH_TOKEN=<your GitHub token>
```

**Step 2.** Run the Spice runtime with `spice run` from the directory with the `spicepod.yaml` file.
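
For example, assuming this quickstart's directory is named `graphql`:

```bash
cd graphql
spice run
```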

2 changes: 1 addition & 1 deletion kubernetes/README.md
# Follow these steps to get started running Spice.ai in Kubernetes

**Step 1.** (Optional) Start a local [`kind`](https://kind.sigs.k8s.io/) cluster:
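
A minimal sketch of that optional step (the cluster name is arbitrary):

```shell
# Create a local Kubernetes cluster using kind
kind create cluster --name spice
```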

5 changes: 3 additions & 2 deletions spiceai/README.md
# Spice Quickstart Tutorial using the Spice.ai Cloud Platform

The Spice.ai Cloud Platform has many datasets that can be used within Spice. A valid login for the Spice.ai Cloud Platform is required to access the datasets. Before beginning this quickstart, [link your GitHub account to Spice.ai](https://spice.ai/login) to get access to the platform.

```bash
spice login
```

A browser window will open displaying a device code, and a matching code will appear in the terminal. Select Approve if the authorization codes match.

![Screenshot](./device_login.png)
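
Based on the registration log below, the dataset entry in `spicepod.yaml` looks approximately like this sketch (the field layout is inferred, not copied from the file):

```yaml
datasets:
  - from: spice.ai/eth.recent_blocks
    name: eth_recent_blocks
    description: ethereum recent blocks
```
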
The Spice runtime terminal will show that the dataset has been loaded:
```console
2024-07-23T01:01:50.403937Z INFO runtime: Dataset eth_recent_blocks registered (spice.ai/eth.recent_blocks), results cache enabled.
```
