Skip to content

Commit

Permalink
Add statistics information in table snapshot
Browse files Browse the repository at this point in the history
  • Loading branch information
findepi committed Jul 25, 2022
1 parent 3c0fd8f commit 01097d8
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions format/spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -665,9 +665,34 @@ Table metadata consists of the following fields:
| _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. |
| _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. |
| | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. |
| _optional_ | _optional_ | **`snapshot-statistics`** | A list (optional) of [table statistics](#table-statistics). |

For serialization details, see Appendix C.

#### Table statistics

Table statistics files are valid [Puffin files](../puffin-spec). Statistics are informational. A reader can choose to
ignore statistics information. Statistics support is not required to read the table correctly. A table can contain
many statistics files associated with different table snapshots.

Statistics files metadata within `snapshot-statistics` table metadata field is a struct with the following fields:

| v1 | v2 | Field name | Type | Description |
|------------|------------|---------------------------------|-----------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| _required_ | _required_ | **`snapshot-id`** | `string` | ID of the Iceberg table's snapshot the statistics were computed from. |
| _required_ | _required_ | **`statistics-path`** | `string` | Path of the statistics file. See [Puffin file format](../puffin-spec). |
| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the statistics file. |
| _required_ | _required_ | **`file-footer-size-in-bytes`** | `long` | Total size of the statistics file's footer (not the footer payload size). See [Puffin file format](../puffin-spec) for footer definition. |
| _required_ | _required_ | **`blob-metadata`** | `list<blob metadata>` (see below) | A list of the blob metadata for statistics contained in the file with structure described below. |

Blob metadata is a struct with the following fields:

| v1 | v2 | Field name | Type | Description |
|------------|------------|------------------|-----------------------|----------------------------------------------------------------------------------------------------|
| _required_ | _required_ | **`type`** | `string` | Type of the blob. Matches Blob type in the Puffin file. |
| _required_ | _required_ | **`fields`** | `list<integer>` | Ordered list of fields, given by field ID, on which the statistic was calculated. |
| _required_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |


#### Commit Conflict Resolution and Retry

Expand Down

0 comments on commit 01097d8

Please sign in to comment.