Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interface for PaginatedKVStore. #22

Merged
merged 1 commit into from
Nov 23, 2024

Conversation

G8XSU
Copy link
Contributor

@G8XSU G8XSU commented Nov 14, 2024

PaginatedKVStore mainly add 2 functionalities in addition to that of KVStore:

  • Paginated list response, useful for exposing paginated apis to user.
  • Descending time-based ordering of listed keys.

Considered extending the KVStore trait but decided against it.
Reasons:

  • Overlapping Method Names with Different Signatures
  • Separation of concern and clarity in Interface: keeping PaginatedKVStore separate makes it clear that it provides different functionality and can be supported by different backend.

@G8XSU G8XSU requested a review from jkczyz November 14, 2024 18:10
Copy link

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the PR to describe why pagination is needed?

/// Will create the given `primary_namespace` and `secondary_namespace` if not already present
/// in the store.
fn write(
&self, primary_namespace: &str, secondary_namespace: &str, key: &str, time: i64, buf: &[u8],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why time can't be an implementation detail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it could be handled by impl internally, this would pose some limitations on impl.

For example, a user of this interface might already have access to a monotonic counter and thus avoid a dependency on std::time; iiuc, std::time has limited support in Wasm.

While it doesn’t matter much for ldk-server right now, when we upstream something similar, we might want to do this in a non-std::time-dependent fashion or atleast not at kv-store interface/impl level.

Additionally, providing time as an input could allow certain types of objects to be written without requiring time, which is mainly useful in cases where the listing doesn’t need time ordering.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it could be handled by impl internally, this would pose some limitations on impl.

For example, a user of this interface might already have access to a monotonic counter and thus avoid a dependency on std::time; iiuc, std::time has limited support in Wasm.

Note that it doesn't need to be directly in the implementation but could be at the database-level, for instance.

While it doesn’t matter much for ldk-server right now, when we upstream something similar, we might want to do this in a non-std::time-dependent fashion or atleast not at kv-store interface/impl level.

Additionally, providing time as an input could allow certain types of objects to be written without requiring time, which is mainly useful in cases where the listing doesn’t need time ordering.

I suppose that's a fair point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it doesn't need to be directly in the implementation but could be at the database-level, for instance.

We could do this at DB level but that is tight coupling and using very db specific features.
Doing this without time dependency in implementation requires the feature "auto-populate creation_time"
With different SQL impls it might be possible but not every DB, hence we don't want to place that limitation.

For example:

  • Can we do this in sqllite?
    Yes, creation_time INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))

  • Can we do this in postgres?
    Yes, creation_time INTEGER NOT NULL DEFAULT (EXTRACT(EPOCH FROM NOW())::INT)

  • Can we do this in mysql?
    Yes, creation_time INT NOT NULL DEFAULT UNIX_TIMESTAMP(),

  • Can we do this in dynamoDB?
    Afaik, No

  • Can we do this in cloudflare-kv?
    Afaik, No

  • Can we do this in mongo-db?
    Yes,

  • Can we do this in couch-db?
    Afaik, No

  • Can we do this in redis?
    Afaik, No

Hence, not providing this at interface level, places expectations/requirements on db-implementations that were not necessarily required. It is only relevant for wasm environments, but if we can easily avoid this, i think we should.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the time parameter in the extension trait method as defined in #22 (comment) and discussed offline.

///
/// [`ListResponse`]: struct.ListResponse.html
fn list(
&self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than needing to wait on a dependency to be released, is there any reason why pagination couldn't be optional in KvStore (i.e., an implementation could chose whether or not to support it)? An extension trait could add a list_paginated method with a naive provided implementation to do the filtering. Then implementations could override it with something more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pagination cannot be optional; we need to expose pagination at the interface level in ldk-server APIs. There could be 100,000 forwarded payments or payments, and we don’t want to load them all into memory.

There is no reason for objects that need to be fetched in a paginated manner to be retrieved using just a list, as it results in an "unbounded" request on both the database and the server, consuming unbounded resources.

While pagination isn’t optional, time-based ordering can be optional depending on the objects we are listing, which we can consider.

We need to override both write and list. I am open to suggestions, but I thought it might be confusing to combine them into a single trait.
It might be better to write/access objects that need pagination through a dedicated paginated interface, while those that don’t can use the normal kv_store interface.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pagination cannot be optional; we need to expose pagination at the interface level in ldk-server APIs. There could be 100,000 forwarded payments or payments, and we don’t want to load them all into memory.

There is no reason for objects that need to be fetched in a paginated manner to be retrieved using just a list, as it results in an "unbounded" request on both the database and the server, consuming unbounded resources.

By optional, I mean you can decided to use an implementation that does not implement the extension trait since the latter could provide a default implementation using the non-paginated version.

That said, we don't need to provide a naive implementation for the extension trait. We can leave it undefined to force any implementation to define it in order to avoid the problems that you mentioned.

While pagination isn’t optional, time-based ordering can be optional depending on the objects we are listing, which we can consider.

We need to override both write and list. I am open to suggestions, but I thought it might be confusing to combine them into a single trait. It might be better to write/access objects that need pagination through a dedicated paginated interface, while those that don’t can use the normal kv_store interface.

I think there's some confusion here then. An extension trait is a separate trait that is a subtrait of another trait. So if you have KvStore you can define PaginatedKvStore as:

pub trait PaginatedKvStore: KvStore {
	fn paginated_list(
		&self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>,
	) -> Result<ListResponse, io::Error>;
}

This seems better than duplicating the entire SqliteStore implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, I agree with Jeff here in that I find going the extension trait route much preferable. At least, if I'm not currently missing something that would prohibit us from going that way.

In particular if the goal is to upstream this trait and the corresponding implementations to LDK Node and eventually to LDK, I imagine it would be unnecessarily hard to reconcile the APIs at that point if we start off with two entirely separate but functionally and terminologically overlapping traits.

Copy link
Contributor Author

@G8XSU G8XSU Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extension option proposed above only works if we write all objects with time all the time.

1.) it would need changes in ldk-node base sql at the minimum.
2.) there is time dependency in database impl. (being discussed in above comment), (if we don't want two write methods in same trait.)

Due to changes in both write and list, i proposed going with separate paginated_kv_store interface.

Also, ldk-node no longer has kvstore trait, any changes to kv_store trait require changes in rust-lightning, although it would be good-to-have, we probably don't need paginated api in rust-lightning right now.

If we want to have time in write,
We can either have an extension trait like:

pub trait PaginatedKvStore: KvStore {
        // this write method doesn't make much sense imo.
        fn paginated_write(
		&self, primary_namespace: &str, secondary_namespace: &str, key: &str, time: i64, buf: &[u8]
	) -> Result<ListResponse, io::Error>;
	
	fn paginated_list(
		&self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>,
	) -> Result<ListResponse, io::Error>;
}

or separate trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another problem with this combined interface.
paginated_writes and normal write operations cannot be combined for the same object type (since normal writes don't use the time field currently).
Similarly, a paginated list operation can't be run on objects written through normal writes. I think that will make for a super confusing API with footguns everywhere.

Moreover, we would want paginated_writes and list to go through a separate table since we will have indexes on the time column.

Essentially, these two are two different data stores. If the implementation chooses to combine them, they are free to do so, but I don't think we should complicate the interface just to avoid duplication of methods in trait.

We can always implement two traits using the same base methods, instead of combining two different functionalities into one.
For ldk-node as well, I think it would make sense to support providing both KvStore and paginated_kv_store implementations separately.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another problem with this combined interface. paginated_writes and normal write operations cannot be combined for the same object type (since normal writes don't use the time field currently). Similarly, a paginated list operation can't be run on objects written through normal writes. I think that will make for a super confusing API with footguns everywhere.

Confusing to whom? It's not the implementors responsibility to get anything right. The caller (i.e., whichever struct is parameterized by the trait) is responsible for using it correctly. And here the caller and the author of the trait are one and the same.

Moreover, we would want paginated_writes and list to go through a separate table since we will have indexes on the time column.

Essentially, these two are two different data stores. If the implementation chooses to combine them, they are free to do so, but I don't think we should complicate the interface just to avoid duplication of methods in trait.

We can always implement two traits using the same base methods, instead of combining two different functionalities into one. For ldk-node as well, I think it would make sense to support providing both KvStore and paginated_kv_store implementations separately.

Hmm... not quite sure how you would support that in LDK Node. If anything, it would probably make more sense to decompose the API into KVRead, KVWrite, PaginatedKVRead, and PaginatedKVWrite. That way each part of the code is explicit about what the interface it needs. But when configuring LDK Node, you'd still need something that implements all the traits. So, in practice, an implementation would need to implement all those traits anyway.

My broader point, though, the onus is really on the user of the interface to do the right thing. How the traits are structured is less important until we talk about upstreaming. See #22 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing to whom? It's not the implementor's responsibility to get anything right.

I think it is confusing inside ldk-node or ldk-server and creates foot-guns, such as not combining list with paginated_write or paginated_list with list/write.

But when configuring LDK Node, you'd still need something that implements all the traits.

I'm not sure if it is correct to assume that users will want to use the same datastore for both normal kvstore and paginated_kvstore. NoSQL/DocumentDB/ObjectStorage are almost a perfect fit for kvstore, whereas SQL-based databases might be a better fit for paginated_kvstore.

So, I think users could be better served if they have the option to configure kvstore and paginated_kvstore separately. Combining them and adding more and more functionality to kvstore pushes them toward using SQL databases for everything.

What about the overhead of indexing on the time column? Are we okay with introducing additional overhead to index the time column for all keys instead of just the paginated ones?

In my opinion, there are bigger trade-offs to consider here, like flexibility in database choice and indexing performance, than just duplication in the interface.

Separate tables

If the implementor of the trait wants to configure separate tables for paginated and normal writes, it isn't easy to do so.
If there are separate tables, we would also need a paginated_read (just reads from the paginated table) and a paginated_remove (just deletes from the paginated table).

So, in practice, an implementation would need to implement all those traits anyway.

The key point is that there can be two different implementations/schemas for the two traits (KvStore and PaginatedKvStore) being used.

Not quite sure how you would support that in LDK Node.

I think, just like we configure KvStore in the builder, we would also have the option to configure paginated_kv_store.

Another option could be to just use paginated_kv_store or change the interface of the current KvStore. But this restricts database, schema and indexing options.

My broader point, though, the onus is really on the user of the interface to do the right thing. How the traits are structured is less important until we talk about upstreaming.

The trait structure adds additional requirements on implementations, database schemas, and indexing.
More requirements for trait-implementations and database not just for user(ldk-node/ldk-server).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this offline with @G8XSU. Seems two independent traits would be ideal if we want to allow parameterizing LDK Node with two different stores, where one needs to be paginated. That would also mean we'd want two separate implementations -- one for each trait -- like SqliteStore and PaginatedSqliteStore. This is because each would need to be initialized differently (e.g., creating an index, using a different db/table name, etc.). If instead we had one type implementing both traits, you could imagine constructing one that is set up to work in a non-paginated manner but passing it for use with the paginated case, causing it to break.

@tnull What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed it offline in dev-sync, concluded with going for separate traits as outlined in this PR.

@G8XSU G8XSU requested a review from jkczyz November 15, 2024 21:05
Copy link

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one small fix then good to squash and merge.

server/src/io/paginated_kv_store.rs Outdated Show resolved Hide resolved
Pagination in list is required because:
1. To bound query size for database.
2. Bound execution time of request on server.
3. Bound in-memory footprint of request on server.
4. Bound response size to client.
@G8XSU
Copy link
Contributor Author

G8XSU commented Nov 22, 2024

Squashed with fix.

@G8XSU G8XSU requested a review from jkczyz November 22, 2024 23:30
@G8XSU G8XSU mentioned this pull request Nov 22, 2024
@G8XSU G8XSU merged commit f2c5baf into lightningdevkit:main Nov 23, 2024
1 check passed
@G8XSU G8XSU mentioned this pull request Dec 4, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants