List API for the state store building block #61

Open
wants to merge 9 commits into
base: main
182 changes: 182 additions & 0 deletions 20240627-BC-listapi.md
@@ -0,0 +1,182 @@
# List API for the Dapr state component

This proposal introduces a List API for Dapr's state component. The List API will enable the retrieval of keys in a state store based on certain criteria, giving users visibility into the stored keys. List API results will **not** include values. SDKs could offer a way to bulk-get the values for all the keys returned in a page.

The requirements for the API are:

- Ability to list all keys in a state store
- Ability to list keys in a state store with a certain prefix
Member

not supported by all state stores

Member

@artursouza artursouza Sep 23, 2024

Yes, @elena-kolevska has a table with her analysis and she will be sharing here. We can modify the proposal to reduce the feature set of the list API and maximize coverage across state stores but that will not be 100%. We already have that today for existing state store features. I don't think that every single state store must support a feature to be added.

Contributor Author

Yes, I probably should have shared that from the beginning. I have now updated the proposal with the table in an addendum.

- The results can be sorted
Member

not supported by all state stores

Member

same.

- The results can be paginated
Member

not supported by all state stores

Member

same


As with the other state store APIs, the List API will also have the difficult job of finding a set of features that are supported across most state store components and filling in the gaps with reasonable behaviour when they aren’t.

## API

### HTTP

Developers can list keys by issuing an HTTP API call to the Dapr sidecar:

```bash
GET /v1.0/state/:storeName/?prefix={prefix}&sorting={sorting}&page_limit={pageLimit}&page_token={pageToken}
```

The `sorting` query parameter can accept one of the following values:
- `default`
- `asc`
- `desc`


The response will be a JSON object with the following structure:
```json
{
"keys": ["key1", "key2", "key3", "...", "keyN"],
"next_page_token": "nextTokenString"
}
```

For example:
Request:
```cURL
GET /v1.0/state/myStateStore?prefix=user&sorting=asc&page_limit=3&page_token=user3
```
Response:
```json
{
"keys": ["user4", "user5", "user6"],
"next_page_token": "user6"
}
```
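A client would typically keep requesting pages until `next_page_token` comes back empty. The Python sketch below illustrates that loop under stated assumptions: `build_list_url`, `list_all_keys` and the injected `fetch_page` callable are hypothetical names (not part of any Dapr SDK), and an empty or missing `next_page_token` is taken to mean that there are no more pages.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def build_list_url(store_name: str, prefix: str = "", sorting: str = "default",
                   page_limit: int = 50, page_token: str = "") -> str:
    """Build the request path using the query parameters named in this proposal."""
    params = []
    if prefix:
        params.append(("prefix", prefix))
    params.append(("sorting", sorting))
    params.append(("page_limit", page_limit))
    if page_token:
        params.append(("page_token", page_token))
    return f"/v1.0/state/{store_name}?{urlencode(params)}"


def list_all_keys(fetch_page: Callable[[str], dict], store_name: str,
                  prefix: str = "", sorting: str = "default",
                  page_limit: int = 50) -> Iterator[str]:
    """Yield every key, following next_page_token until it is empty or absent.

    fetch_page is a hypothetical callable that performs the GET request and
    returns the decoded JSON body as a dict.
    """
    token = ""
    while True:
        url = build_list_url(store_name, prefix, sorting, page_limit, token)
        page = fetch_page(url)
        yield from page.get("keys", [])
        token = page.get("next_page_token") or ""
        if not token:
            break
```

Injecting `fetch_page` keeps the sketch independent of any particular HTTP library; a real client would pass a function that calls the sidecar.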

### gRPC

Developers can also list keys by issuing a unary gRPC call:

```proto
service Dapr {
...
rpc ListState(ListStateRequest) returns (ListStateResponse) {}
...
}

message ListStateRequest {
// The prefix that should be used for listing.
optional string prefix = 1;
Member

@artursouza artursouza Sep 24, 2024

I think prefix matching is important for apps to filter based on customer ID, scenarios like: all orders where key starts with "Customer1035143531|" since keys are composed as "customer Id|Order Id".

So, we should keep it.

How should case-sensitivity be handled, if at all?


// The maximum number of items that should be returned per page
optional uint32 page_limit = 2;

// Specifies if the result should be sorted
optional Sort sort = 3;

// Specifies the next pagination token
// If this is empty, it indicates the start of a new listing.
optional string page_token = 4;

// Sorting order options
enum Sort {
Member

@artursouza artursouza Sep 24, 2024

We might remove sorting from the initial proposal. Some state stores may support a metadata param to handle it.

DEFAULT = 0;
ASCENDING = 1;
DESCENDING = 2;
}
}

message ListStateResponse {
// The items that were listed
repeated string keys = 1;

// The next pagination token that should be sent for the next page
// If this is empty, it indicates that there are no more pages.
optional string next_page_token = 2;
}
```

### Default values

- Prefix: “”
- Sorting: “default”
- Page limit: 50
- Page token: “”
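As a rough illustration of how these defaults could be applied to incoming query parameters, here is a minimal Python sketch. `ListParams` and `parse_list_params` are hypothetical names, and rejecting a non-positive `page_limit` is an assumption that the proposal does not specify.

```python
from dataclasses import dataclass

# Sorting values accepted by the HTTP API in this proposal.
VALID_SORTING = ("default", "asc", "desc")


@dataclass(frozen=True)
class ListParams:
    """Request parameters with the default values listed in the proposal."""
    prefix: str = ""
    sorting: str = "default"
    page_limit: int = 50
    page_token: str = ""


def parse_list_params(query: dict) -> ListParams:
    """Fill in the defaults for any query parameter that is missing."""
    defaults = ListParams()
    sorting = query.get("sorting", defaults.sorting)
    if sorting not in VALID_SORTING:
        raise ValueError(f"unsupported sorting value: {sorting!r}")
    page_limit = int(query.get("page_limit", defaults.page_limit))
    if page_limit <= 0:
        # Assumption: the proposal does not say how to treat this case.
        raise ValueError("page_limit must be positive")
    return ListParams(
        prefix=query.get("prefix", defaults.prefix),
        sorting=sorting,
        page_limit=page_limit,
        page_token=query.get("page_token", defaults.page_token),
    )
```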

## Pagination

The two most common pagination strategies are token and offset-based pagination.

**Offset-based pagination**
Uses a fixed offset and limit to retrieve a subset of results from a larger dataset. This method is very common in relational databases, where it is implemented with the `LIMIT` and `OFFSET` clauses, but it is uncommon in NoSQL databases.

It relies on scanning the table and skipping results until the offset value is reached.

**Token-based pagination**

Relies on a token that is usually equal to, or derived from, the last element of the previously returned page.

This is very common in NoSQL databases that scan across the keyspace.

In relational databases this method relies on an indexed column, such as a timestamp or an ID, to ensure efficient sorting and querying. For example:

```sql
SELECT * FROM items WHERE key > :last_key ORDER BY key LIMIT :page_limit;
```
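To make the query above concrete, the following Python sketch runs token-based pagination against an in-memory SQLite database. The `items` table layout and the `list_keys_page` helper are illustrative assumptions, not the schema of any Dapr component. It fetches one extra row to decide whether another page exists, and it assumes keys are non-empty strings so that an empty token can mean "start of listing".

```python
import sqlite3
from typing import List, Tuple


def list_keys_page(conn: sqlite3.Connection, page_token: str = "",
                   page_limit: int = 50) -> Tuple[List[str], str]:
    """Return one page of keys plus the token for the next page ("" if none)."""
    rows = conn.execute(
        "SELECT key FROM items WHERE key > ? ORDER BY key LIMIT ?",
        (page_token, page_limit + 1),  # one extra row signals a further page
    ).fetchall()
    keys = [row[0] for row in rows[:page_limit]]
    next_token = keys[-1] if len(rows) > page_limit else ""
    return keys, next_token


# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(f"user{i}", "{}") for i in range(1, 8)],
)

keys, token = list_keys_page(conn, page_token="user3", page_limit=3)
```

Because the primary key is indexed, each page is a range scan that starts right after the token instead of skipping over earlier rows.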

---

Most often, offset-based pagination is not possible in NoSQL databases, while it’s easy (even preferable) to implement in relational databases. Token-based pagination, by contrast, can be implemented in both, so this proposal suggests using **token-based pagination** in the List API.

Based on this decision, listing items will only be available forwards, and not backwards. To list previous pages, the application would have to keep track of the page tokens.
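One way an application could keep track of page tokens is to remember the token that produced each page it has visited. The Python sketch below shows the idea; `PageCursor` and its `fetch` callable (which takes a page token and returns a `(keys, next_page_token)` pair) are hypothetical names for illustration only. Going back re-issues the earlier request, so the contents may differ if the store changed in the meantime.

```python
from typing import Callable, List, Tuple


class PageCursor:
    """Remembers the token used for each visited page so earlier pages can be re-fetched."""

    def __init__(self, fetch: Callable[[str], Tuple[List[str], str]]):
        self._fetch = fetch
        self._history: List[str] = []  # tokens used to request pages up to the current one
        self._next_token = ""          # token for the page after the current one

    def next_page(self) -> List[str]:
        if self._history and not self._next_token:
            return []  # already on the last page
        token = self._next_token
        keys, self._next_token = self._fetch(token)
        self._history.append(token)
        return keys

    def previous_page(self) -> List[str]:
        if len(self._history) < 2:
            return []  # there is no earlier page
        self._history.pop()          # drop the current page's token
        token = self._history[-1]    # token that produced the previous page
        keys, self._next_token = self._fetch(token)
        return keys
```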

## Sorting

Sorting is required for token-based pagination in relational databases, so we must have a default sorting order.

Some NoSQL databases (e.g. Azure blob store) don’t support sorting in descending order, and others (e.g. Redis) don’t support any sorting at all. In these cases, we want to return an explicit error instead of failing silently.

This might be restrictive for use cases where the underlying state store needs to be swapped, though. For example, a team could use Redis for local development and Postgres in production, and they wouldn’t be able to use the same application code, because the sorting clause would error on Redis but pass on Postgres. That’s why we’re introducing the `default` sorting option, which will sort in ascending order for all databases that support it and leave results unsorted for those that don’t.
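The rule described above can be sketched in Python as follows. The `Sort` values mirror the enum in the gRPC definition; `resolve_sort`, `UnsupportedSortError` and the `CAPABILITIES` mapping are hypothetical, with the per-store entries taken loosely from the capability table in this proposal rather than from any component's actual implementation.

```python
from enum import Enum
from typing import Optional, Set


class Sort(Enum):
    DEFAULT = 0
    ASCENDING = 1
    DESCENDING = 2


class UnsupportedSortError(Exception):
    """Raised when a store cannot honour an explicitly requested order."""


# Illustrative capabilities only (assumption based on the proposal's table).
CAPABILITIES = {
    "postgresql": {Sort.ASCENDING, Sort.DESCENDING},
    "azure-blob": {Sort.ASCENDING},
    "redis": set(),
}


def resolve_sort(requested: Sort, supported: Set[Sort]) -> Optional[Sort]:
    """Return the order to apply, or None when results are left unsorted."""
    if requested is Sort.DEFAULT:
        # Ascending where the store supports it, otherwise unsorted.
        return Sort.ASCENDING if Sort.ASCENDING in supported else None
    if requested not in supported:
        # Explicit request the store cannot satisfy: fail loudly.
        raise UnsupportedSortError(f"{requested.name} is not supported")
    return requested
```

With this rule, application code that always sends the default option would behave the same way against Redis and Postgres, apart from the ordering guarantee.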

## SDKs

All supported SDKs should be updated to implement the List API. SDKs should offer the option to bulk-fetch the values of the returned keys.

## Default behaviour for state stores with missing features

Some of the state stores Dapr supports don’t provide the necessary capabilities for implementing the List API. For example, Memcached doesn’t provide a way to list keys, Azure table storage can’t sort keys in descending order, and so on. In those cases the List API will make a best effort to provide the closest functionality to the one defined in the API. This behaviour will be specific to the data store and will be implemented at the component level.
Member

Different philosophies - but I no longer support this approach in Dapr building blocks and think we should instead have a new specialty building block.

Member

I like the idea of having specialty building blocks. I think the state store is doing too much. On the other hand, it also misses the List API, which is a basic operation. I think we should consider moving some state store components out of the building block into a new one and rename state store to key-value store. Another route is to deprecate the state store API and have multiple specialized building blocks. For the sake of people's time availability, the most realistic path is to rebrand state store to key/value store and add list API. Non-compliant components will partially implement the API until a specialized building block is created.


List API requests on state stores that don’t support the List API will result in errors.

## Impact of the List API on Dapr state store components
From the moment this proposal is accepted, all state store components will be required to implement the List API in order to get the "Stable" certification level.
Components that are currently stable and for which the underlying state store does not support listing will not lose their stable status.

## Performance and pricing implications
Listing keys in big data sets, especially for partitioned databases, can be expensive in terms of both performance and cost. It often requires creating an index, which will impact write performance, storage cost and sometimes even read performance.
For the databases where this is a concern, we should offer an option to disable the List API on the component level.

## Definitions:

- **Listing**: The ability to retrieve a collection of items.
- **Sorting**: The ability to sort the results based on one or more fields.
- **Prefix Search**: The ability to search for items that start with a given prefix.
- **Pagination**: The ability to paginate through items, typically using skip/limit or similar mechanisms.

## Addendum

Here's a list of the relevant capabilities of all the stable state stores:

| Store | Cursor listing | Offset listing | Sorting | Number of Items per Page | Prefix Search | Comments |
Member

From this list, we can clearly see that with cursor listing, page limit and prefix search, we will have plenty of coverage.

| --- | --- | --- | --- | --- | --- | --- |
| **aws dynamodb** | Yes | No | Yes, with a GSI | Yes | Yes, with an additional sortKey and a GSI | To use prefix search, users will need a Global Secondary Index (GSI) where the partition key is a single fixed string (e.g. the `_` character) and the sort key is the key name. There are some drawbacks to this that can be discussed in detail elsewhere. |
| **azure blob store** | Yes (continuation token) | No | Always sorted in ASC order. Desc, or unsorted is not possible. | Yes | Yes | Results are always sorted by key name in ascending order. |
| **azure cosmos db** | Yes | Yes | Yes | Yes | Yes |   |
| **azure table storage** | Yes | No | Yes, just ASC | Yes, with $top | Yes, with range search | Partition key is the application id. |
| **cassandra** | Yes | No | No | Yes | No | Can’t prefix search and sort across all partitions. We could consider maintaining a new table containing all keys, and mirroring the original key’s ttl. |
Member

Can Cassandra do filtering within the same partition? It might be enough to begin with. It is a common thing in CosmosDb too, for the transaction API, for example.

| **cockroachdb** | Yes, if sorting is required | Yes | Yes | Yes | Yes | Need to create an index on the search column |
| **gcp firestore** | Yes |   |   |   |   |   |
| **in-memory** | No | No | No | No | No | We can implement all the features, but it’s not trivial to aggregate data across multiple instances |
| **memcached** | No | No | No | No | No |   |
Member

This does not belong to state store IMO, so we should not dismiss list API just because of this one.

| **mongodb** | Yes | Yes | Yes | Yes | Yes |   |
| **mysql** | Yes | Yes |   | Yes | Yes | Need to create an index on the id column. MySql supports specialized prefix indices, but you would have to know the exact length of the prefix you’ll be searching on, also sorting will not use the index. |
| **postgresql** | Yes | Yes | Yes | Yes | Yes | Need to create an index on the key column. We can use the varchar\_pattern\_ops operator class, optimised for prefix search. |
| **redis** | Yes | No | No | Yes (Best effort) | Yes | The number of records per page is not guaranteed, only best effort. |
| **sqlite** | Yes, if sorting is required | Yes | Yes | Yes | Yes | Need to create an index on the key column. It’s a standard b-tree index. We could maintain an index of all keys in a hash |
| **sqlserver** | Yes, if sorting is required | Yes | Yes | Yes | Yes | Need to create a non-clustered index on the “key” column |