feat: Embed modes and named vectors #123
Looks good already, nice! Maybe there's a way to clean up the qdrant configuration. If you always need an EmbeddableType and sometimes a Vector config, you could also have the function signature like:
pub fn with_vector(
    mut self,
    vector: impl Into<VectorConfig>,
)
And then implement
One requirement I do have is that, if needed, qdrant can be utilized fully. What happens if a vector config is provided, but a custom client is built? What I really like about your implementation is that it also sets swiftide up to infer the schema from the pipeline at a later stage.
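For illustration, the `impl Into<VectorConfig>` idea above could look roughly like the sketch below. Every type here (`EmbeddableType`, `VectorConfig`, `StorageBuilder`) is a simplified stand-in written for this example, and the `size` field is an assumed override; none of this is the actual swiftide or qdrant API.

```rust
/// Hypothetical stand-in for the field that gets embedded.
#[derive(Debug, Clone, PartialEq)]
pub enum EmbeddableType {
    Combined,
    Chunk,
    Metadata(String),
}

/// Hypothetical vector configuration: the embedded field plus an optional override.
#[derive(Debug, Clone, PartialEq)]
pub struct VectorConfig {
    pub embeddable_type: EmbeddableType,
    /// Assumed optional setting; `None` means "use the default".
    pub size: Option<u64>,
}

/// Lets callers pass a bare `EmbeddableType` when no extra config is needed.
impl From<EmbeddableType> for VectorConfig {
    fn from(embeddable_type: EmbeddableType) -> Self {
        Self { embeddable_type, size: None }
    }
}

/// Hypothetical builder collecting the configured vectors.
#[derive(Debug, Default)]
pub struct StorageBuilder {
    pub vectors: Vec<VectorConfig>,
}

impl StorageBuilder {
    /// Accepts either an `EmbeddableType` or a full `VectorConfig`.
    pub fn with_vector(mut self, vector: impl Into<VectorConfig>) -> Self {
        self.vectors.push(vector.into());
        self
    }
}
```

With this shape, `StorageBuilder::default().with_vector(EmbeddableType::Chunk)` and a call passing a full `VectorConfig` both compile, so the simple case stays short while the detailed case remains available.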
…e with_embed_mode function.
I do not understand.
Exactly, I don't see the need to replace or reinvent full APIs from other libraries that already work. Most (if not all) integrations support manually setting whatever they wrap, allowing for more fine-grained configuration. So we provide some basic stuff for sensible defaults, otherwise it's RTM on the integrations' documentation. I thought the current setup might go wrong when overriding the client and accidentally also providing vector configuration.
@timonv I cannot assign a reviewer.
Solid work! I have a couple of comments and some suggestions.
I think EmbeddedType could be more aptly named EmbeddedField or EmbedField, as that seems to be more what it is.
Maybe there's also a way to clean up the API for .with_vector, e.g. .with_vector(EmbeddableType::Metadata(metadata_summary::NAME.into())). Do you have any ideas?
On a more practical note, this looks very close to merging and I would love to get it in for 0.6! Do you have any social handles I can mention on release? 🎉
I do not have any actively updated profiles.
These The quoted line could be cleaned up a bit by providing the following impl for
impl From<&str> for EmbeddedField {
    fn from(value: &str) -> Self {
        Self::Metadata(value.into())
    }
}
The problem is I am not sure how long such Making
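To show how the `From<&str>` impl above shortens call sites, here is a minimal self-contained sketch. The `EmbeddedField` enum is a simplified stand-in with assumed variants, and `describe_field` is a hypothetical helper introduced only for this example.

```rust
/// Simplified stand-in for the field enum discussed above (variants assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum EmbeddedField {
    Combined,
    Chunk,
    Metadata(String),
}

/// Any string slice is treated as the name of a metadata field.
impl From<&str> for EmbeddedField {
    fn from(value: &str) -> Self {
        Self::Metadata(value.into())
    }
}

/// Hypothetical helper that accepts either an `EmbeddedField` or a `&str`.
pub fn describe_field(field: impl Into<EmbeddedField>) -> String {
    match field.into() {
        EmbeddedField::Combined => "combined".to_string(),
        EmbeddedField::Chunk => "chunk".to_string(),
        EmbeddedField::Metadata(name) => format!("metadata:{name}"),
    }
}
```

One consequence of this design is that every string maps to `Metadata`, so `describe_field("Chunk")` yields a metadata field named "Chunk" rather than the `Chunk` variant. The built-in variants still have to be named explicitly.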
Fully agree, I don't see a direct option here either. I have a hunch that when we get further down the line, a nicer model will pop up, e.g. a pipeline could infer a schema. Feel it's ready to merge? 0.6 will probably be released this week. I need to update and rewrite a bunch of documentation on swiftide.rs for it; it was set up a bit in a hurry.
Yes.
Multiple then_store_with don't work? That's a bug, could you open an issue?
On Tue, Jul 9, 2024 at 14:42, pwalski wrote:
> Feel it's ready to merge?
Yes.
There are issues, such as with_embed_mode() needing to be placed before then(impl Transformer) and multiple then_store_with() calls not working, but they are not directly related to this PR.
The issue I remembered was really only a problem with ignoring some of the |
## 🤖 New release
* `swiftide`: 0.5.0 -> 0.6.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.6.0](https://github.com/bosun-ai/swiftide/releases/tag/0.6.0) - 2024-07-12

### Features
- [70ea268](70ea268) *(prompts)* Add prompts as first class citizens ([#145](#145))
- [699cfe4](699cfe4) *(uncategorized)* Embed modes and named vectors ([#123](#123))

### Bug Fixes
- [9334934](9334934) *(chunkcode)* Use correct chunksizes ([#122](#122))
- [7357fea](7357fea) *(deps)* Update rust crate spider to v1.98.6 ([#119](#119))
- [353cd9e](353cd9e) *(qdrant)* Upgrade and better defaults ([#118](#118))
- [b53636c](b53636c) *(uncategorized)* Inability to store only some of `EmbeddedField`s ([#139](#139))

### Documentation
- [5691ac9](5691ac9) *(readme)* Add preproduction warning
- [4c40e27](4c40e27) *(readme)* Add back coverage badge
- [3e447fe](3e447fe) *(readme)* Link to CONTRIBUTING ([#114](#114))
- [37af322](37af322) *(rustdocs)* Rewrite the initial landing page ([#149](#149))
- [7686c2d](7686c2d) *(uncategorized)* Templated prompts are now a major feature

### Performance
- [ea8f823](ea8f823) *(uncategorized)* Improve local build performance and crate cleanup ([#148](#148))

### Miscellaneous Tasks
- [364e13d](364e13d) *(swiftide)* Loosen up dependencies ([#140](#140))
- [d2a9ea1](d2a9ea1) *(uncategorized)* Enable clippy pedantic ([#132](#132))
- [51c114c](51c114c) *(uncategorized)* Various tooling & community improvements ([#131](#131))
- [84dd65d](84dd65d) *(uncategorized)* Rename all mentions of ingest to index ([#130](#130))

**Full Changelog**: 0.1.0...0.6.0
<!-- generated by git-cliff -->
</blockquote>

</p></details>

---
This PR was generated with [release-plz](https://github.com/MarcoIeni/release-plz/).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Timon Vonk <[email protected]>
Added named vector support to qdrant. A pipeline can now have its embed mode configured: per field, chunk and metadata combined (default), or both. Vectors need to be configured on the qdrant client side.
See examples/store_multiple_vectors.rs for an example.
Shoutout to @pwalski for the contribution. Closes #62.
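As a rough mental model of the three embed modes described above, the sketch below maps each mode to the fields that would receive their own vector. The names `EmbedMode`, `EmbeddedField`, and `fields_to_embed`, along with the variant names, are assumptions made for this illustration and are not taken from the swiftide source. See examples/store_multiple_vectors.rs for the real usage.

```rust
/// Assumed names for the three modes: combined (default), per field, or both.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub enum EmbedMode {
    /// Chunk and metadata are embedded together as a single vector.
    #[default]
    SingleWithMetadata,
    /// The chunk and each metadata entry get their own vector.
    PerField,
    /// The combined vector plus all per-field vectors.
    Both,
}

/// Simplified stand-in for the fields that can be embedded.
#[derive(Debug, Clone, PartialEq)]
pub enum EmbeddedField {
    Combined,
    Chunk,
    Metadata(String),
}

/// Hypothetical helper: lists the fields embedded for a given mode.
pub fn fields_to_embed(mode: EmbedMode, metadata_keys: &[&str]) -> Vec<EmbeddedField> {
    // The per-field set: the chunk itself plus one entry per metadata key.
    let mut per_field = vec![EmbeddedField::Chunk];
    per_field.extend(
        metadata_keys
            .iter()
            .map(|key| EmbeddedField::Metadata((*key).to_string())),
    );

    match mode {
        EmbedMode::SingleWithMetadata => vec![EmbeddedField::Combined],
        EmbedMode::PerField => per_field,
        EmbedMode::Both => {
            let mut all = vec![EmbeddedField::Combined];
            all.extend(per_field);
            all
        }
    }
}
```

Under this model, each returned field would correspond to one named vector that has to exist in the qdrant collection configuration.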