-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(query): Add custom SQL query generation for pgvector search #478
Conversation
Hey @shamb0, the concern I have with this, is that it only works with Sql and I suspect it's an uphill battle to implement every conceivable query method for every supported provider. I was thinking of a broader solution such that Swiftide makes several big wins:
It also removes the need / pressure to implement every search strategy for every storage, as there's always the manual option. Might be a bit harder, but huge relieve when it works. For instance, say we have a For instance, for what we have now, Q could be: sqlx -> In pseudo code, something like this might work: struct CustomQuery<Q> {
query: Q
}
impl<Q: Send + Sync + Clone> CustomQuery<Q> {
pub fn from_query(
query: impl Fn(&CustomQuery<Q>, &Query<states::Pending>) -> Q + Send + Sync + 'static,
) -> Self {
CustomQuery {
query: Arc::new(query),
}
}
pub fn build_query(&self, query_node: &Query<states::Pending>) -> Q {
(*self.query)(self, query_node)
}
}
impl Retrieve<CustomQuery<QueryPoints>> for PgVector {
async fn retrieve(
&self,
search_strategy: &CustomQuery<sqlx::QueryBuilder>,
query: Query<states::Pending>,
) -> Result<Query<states::Retrieved>> {
...
let builder = search_strategy.build_query(&query);
... // anything else and execute
}
}
// A user can then use it as follows:
let strat = CustomQuery::from_query(|strat, query_node| sqls::QueryBuilder::new ... }) This way, there's no uphill battle in trying to implement every conceivable query method, and users can do whatever they want. Other query strategies are there for sane and quick defaults. We could also let the provider (i.e. PGVector) provide a |
Just to be clear, I'd be happy to merge an implementation that only supports SqlxBuilder at the moment, then I or someone else can expand on it in the future 👍 |
Hi @timonv, Apologies for the delay :), and thank you for sharing the detailed design guidelines! I'll aim to complete the SqlxBuilder implementation this week. Next week, I'll start working on the search strategy for the remaining storage. 👍🏾 |
Hi @timonv, I’ve implemented the custom search strategy for
Please review and let me know if further adjustments are needed. Thank you! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really nice! Why did you opt for wrapping the CustomQuery
with a concrete type? If it's possible to use the CustomQuery
instead (let's call it CustomStrategy, to avoid confusion with the Query pipeline), then it removes one layer of abstraction.
@@ -0,0 +1,49 @@ | |||
//! Custom search strategy implementation for `PostgreSQL` vector similarity search. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have two files with the same name, one with a typo
Hi @timonv, Thank you for your review comments :). I have reworked the code and cleaned it up as suggested.
|
3b396fb
to
39156c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, looks really good. Two more small comments, otherwise this looks very useful and looks ready to merge. Nice work (again) 🚀
/// `CustomQuery` provides a flexible way to generate provider-specific search queries. | ||
/// | ||
/// # Type Parameters | ||
/// * `Q` - The provider-specific query type (e.g., `sqlx::QueryBuilder` for `PostgreSQL`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct name here is retriever
instead of provider
Self { | ||
query: None, | ||
top_k: super::DEFAULT_TOP_K, | ||
vector_field: EmbeddedField::Combined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like top_k and vector_field aren't used in the retrieval implementation. Since it builds from a closure anyway, and any data can be moved there, how about simplifying by removing the fields, having only query and _marker here?
swiftide/tests/pgvector.rs
Outdated
@@ -38,6 +42,7 @@ struct VectorSearchResult { | |||
/// - Performs a similarity-based vector search on the database and validates the retrieved results. | |||
/// | |||
/// Ensures correctness of end-to-end data flow, including table management, vector storage, and query execution. | |||
#[ignore] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two tests here are ignored
Introduce a flexible, type-safe strategy for custom SQL query generation in pgvector-based similarity search. Key features: - Type-safe closures for custom SQL queries - Dynamic row retrieval with user-defined conditions - Configurable vector fields and metadata filters - Enhanced integration with pgvector's similarity search This update empowers users to combine vector similarity with metadata, customize query construction, and control filtering and ordering. Component: search_strategies/dynamic_vector_search.rs Signed-off-by: shamb0 <[email protected]>
* `swiftide`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-agents`: 0.14.3 -> 0.14.4 * `swiftide-core`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-macros`: 0.14.3 -> 0.14.4 * `swiftide-integrations`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-indexing`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-query`: 0.14.3 -> 0.14.4 (✓ API compatible changes) <details><summary><i><b>Changelog</b></i></summary><p> <blockquote> [0.14.4](bosun-ai/swiftide@v0.14.3...v0.14.4) - 2024-12-11 - [7211559](bosun-ai@7211559) *(agents)* **EXPERIMENTAL** Agents in Swiftide (bosun-ai#463) ````text Agents are coming to Swiftide! We are still ironing out all the kinks, while we make it ready for a proper release. You can already experiment with agents, see the rustdocs for documentation, and an example in `/examples`, and feel free to contact us via github or discord. Better documentation, examples, and tutorials are coming soon. Run completions in a loop, define tools with two handy macros, customize the agent by hooking in on lifecycle events, and much more. Besides documentation, expect a big release for what we build this for soon! 🎉 ```` - [3751f49](bosun-ai@3751f49) *(query)* Add support for single embedding retrieval with PGVector (bosun-ai#406) - [5ce4d21](bosun-ai@5ce4d21) Clippy and deps fixes for 1.83 (bosun-ai#467) **Full Changelog**: bosun-ai/swiftide@0.14.3...0.14.4 </blockquote> </p></details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).
Implements flexible query generation for PostgreSQL vector similarity search with: - Introduces PgVecCustomStrategy for custom vector search implementations - Adds builder pattern support for configuring search parameters and filters - Enables dynamic SQL generation with metadata filtering and vector similarity This change allows users to create customized vector similarity searches while maintaining type safety and query optimization capabilities. Signed-off-by: shamb0 <[email protected]>
Implements flexible query generation for PostgreSQL vector similarity search with: - Introduces PgVecCustomStrategy for custom vector search implementations - Adds builder pattern support for configuring search parameters and filters - Enables dynamic SQL generation with metadata filtering and vector similarity This change allows users to create customized vector similarity searches while maintaining type safety and query optimization capabilities. Signed-off-by: shamb0 <[email protected]>
* `swiftide`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-agents`: 0.14.3 -> 0.14.4 * `swiftide-core`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-macros`: 0.14.3 -> 0.14.4 * `swiftide-integrations`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-indexing`: 0.14.3 -> 0.14.4 (✓ API compatible changes) * `swiftide-query`: 0.14.3 -> 0.14.4 (✓ API compatible changes) <details><summary><i><b>Changelog</b></i></summary><p> <blockquote> [0.14.4](bosun-ai/swiftide@v0.14.3...v0.14.4) - 2024-12-11 - [7211559](bosun-ai@7211559) *(agents)* **EXPERIMENTAL** Agents in Swiftide (bosun-ai#463) ````text Agents are coming to Swiftide! We are still ironing out all the kinks, while we make it ready for a proper release. You can already experiment with agents, see the rustdocs for documentation, and an example in `/examples`, and feel free to contact us via github or discord. Better documentation, examples, and tutorials are coming soon. Run completions in a loop, define tools with two handy macros, customize the agent by hooking in on lifecycle events, and much more. Besides documentation, expect a big release for what we build this for soon! 🎉 ```` - [3751f49](bosun-ai@3751f49) *(query)* Add support for single embedding retrieval with PGVector (bosun-ai#406) - [5ce4d21](bosun-ai@5ce4d21) Clippy and deps fixes for 1.83 (bosun-ai#467) **Full Changelog**: bosun-ai/swiftide@0.14.3...0.14.4 </blockquote> </p></details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).
Implements flexible query generation for PostgreSQL vector similarity search with: - Introduces PgVecCustomStrategy for custom vector search implementations - Adds builder pattern support for configuring search parameters and filters - Enables dynamic SQL generation with metadata filtering and vector similarity This change allows users to create customized vector similarity searches while maintaining type safety and query optimization capabilities. Signed-off-by: shamb0 <[email protected]>
4c9b793
to
0741eae
Compare
Thank you @timonv for your thorough and constructive feedback :). I appreciate your patience in guiding me through these improvements. I have addressed both key points from your review:
The documentation has been enhanced with:
I believe these changes have made the code more maintainable and the API more intuitive to use. Please let me know if you'd like to see any further improvements or have additional suggestions. |
Looks good! I want to go over the docs one more time before merging, I'll do the changes. I will merge this asap. |
## 🤖 New release * `swiftide`: 0.15.0 -> 0.16.0 (✓ API compatible changes) * `swiftide-agents`: 0.15.0 -> 0.16.0 (✓ API compatible changes) * `swiftide-core`: 0.15.0 -> 0.16.0 (⚠️ API breaking changes) * `swiftide-macros`: 0.15.0 -> 0.16.0 * `swiftide-integrations`: 0.15.0 -> 0.16.0 (✓ API compatible changes) * `swiftide-indexing`: 0.15.0 -> 0.16.0 (✓ API compatible changes) * `swiftide-query`: 0.15.0 -> 0.16.0 (✓ API compatible changes) ###⚠️ `swiftide-core` breaking changes ``` --- failure enum_missing: pub enum removed or renamed --- Description: A publicly-visible enum cannot be imported by its prior path. A `pub use` may have been removed, or the enum itself may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.38.0/src/lints/enum_missing.ron Failed in: enum swiftide_core::querying::states::AnsweredBuilderError, previously in file /tmp/.tmpfXNg04/swiftide-core/src/query.rs:202 enum swiftide_core::querying::states::RetrievedBuilderError, previously in file /tmp/.tmpfXNg04/swiftide-core/src/query.rs:179 --- failure struct_missing: pub struct removed or renamed --- Description: A publicly-visible struct cannot be imported by its prior path. A `pub use` may have been removed, or the struct itself may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.38.0/src/lints/struct_missing.ron Failed in: struct swiftide_core::querying::states::RetrievedBuilder, previously in file /tmp/.tmpfXNg04/swiftide-core/src/query.rs:179 struct swiftide_core::querying::states::AnsweredBuilder, previously in file /tmp/.tmpfXNg04/swiftide-core/src/query.rs:202 --- failure trait_added_supertrait: non-sealed trait added new supertraits --- Description: A non-sealed trait added one or more supertraits, which breaks downstream implementations of the trait ref: https://doc.rust-lang.org/cargo/reference/semver.html#generic-bounds-tighten impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.38.0/src/lints/trait_added_supertrait.ron Failed in: trait swiftide_core::querying::QueryState gained Default in file /tmp/.tmp5vx3MP/swiftide/swiftide-core/src/query.rs:178 --- failure trait_no_longer_object_safe: trait no longer object safe --- Description: Trait is no longer object safe, which breaks `dyn Trait` usage. ref: https://doc.rust-lang.org/stable/reference/items/traits.html#object-safety impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.38.0/src/lints/trait_no_longer_object_safe.ron Failed in: trait QueryState in file /tmp/.tmp5vx3MP/swiftide/swiftide-core/src/query.rs:178 ``` <details><summary><i><b>Changelog</b></i></summary><p> ## `swiftide` <blockquote> ## [0.16.0](v0.15.0...v0.16.0) - 2025-01-02 ### New features - [52e341e](52e341e) *(lancedb)* Public method for opening table (#514) - [3254bd3](3254bd3) *(query)* Generic templates with document rendering (#520) ````text Reworks `PromptTemplate` to a more generic `Template`, such that they can also be used elsewhere. This deprecates `PromptTemplate`. As an example, an optional `Template` in the `Simple` answer transformer, which can be used to customize the output of retrieved documents. This has excellent synergy with the metadata changes in #504. ```` - [235780b](235780b) *(query)* Documents as first class citizens (#504) ````text For simple RAG, just adding the content of a retrieved document might be enough. However, in more complex use cases, you might want to add metadata as well, as is or for conditional formatting. For instance, when dealing with large amounts of chunked code, providing the path goes a long way. If generated metadata is good enough, could be useful as well. With this retrieved Documents are treated as first class citizens, including any metadata as well. Additionally, this also paves the way for multi retrieval (and multi modal). ```` - [584695e](584695e) *(query)* Add custom SQL query generation for pgvector search (#478) ````text Adds support for custom retrieval queries with the sqlx query builder for PGVector. Puts down the fundamentals for custom query building for any retriever. --------- ```` - [b55bf0b](b55bf0b) *(redb)* Public database and table definition (#510) - [176378f](176378f) Implement traits for all Arc dynamic dispatch (#513) ````text If you use i.e. a `Persist` or a `NodeCache` outside swiftide as well, and you already have it Arc'ed, now it just works. ```` - [dc9881e](dc9881e) Allow opt out of pipeline debug truncation ### Bug fixes - [2831101](2831101) *(lancedb)* Metadata should be nullable in lancedb (#515) - [c35df55](c35df55) *(macros)* Explicit box dyn cast fixing Rust Analyzer troubles (#523) ### Miscellaneous - [1bbbb0e](1bbbb0e) Clippy **Full Changelog**: 0.15.0...0.16.0 </blockquote> </p></details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).
Hi @timonv,
I’d like to align quickly on this.
This PR addresses our recent discussion, particularly the idea:
As part of this PR, I’ve implemented a Dynamic Search Strategy to enable dynamic SQL query generation for pgvector search.
Key benefits:
Looking forward to your feedback on how this can be refined further!