Replies: 6 comments
-
hi @drelyea, when uploading a document with the same ID, the resulting operation is equivalent to an Upsert: all the previous information is replaced. For instance, if you upload a PDF with ID "foo" and then upload a Word doc with the same ID "foo", the content of the PDF is replaced with the content of the Word doc. The same applies if you upload multiple files under the same document ID (a document can be composed of multiple files). Perhaps the "Import" name is confusing, but I can assure you it's designed to work this way.
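For example, a minimal sketch of what this looks like with the .NET package (the file names, API key, and `MemoryServerless` setup are placeholders, not something from this thread):

```csharp
using Microsoft.KernelMemory;

// Placeholder setup: any IKernelMemory instance behaves the same way here.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults("<api-key>")
    .Build<MemoryServerless>();

// Importing a second file under the same document ID replaces the first:
// after the second call, only the Word doc's content is associated with "foo".
await memory.ImportDocumentAsync("report.pdf", documentId: "foo");
await memory.ImportDocumentAsync("report.docx", documentId: "foo");
```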
-
Hey @dluc! Thanks for getting back to me. This seems at odds with the behavior I observe, at least with ….

Is this true for both …? As an example, I call …. When I look at my index in Azure Search Service and search for 'frog', I can see 2 distinct entities with matching `__document_id` tags.

Finally, when I call …. And the ….
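A rough sketch of the kind of repro described above (assuming `ImportTextAsync` and `SearchAsync` on an existing `IKernelMemory` instance called `memory`; the text and IDs are placeholders):

```csharp
// Same documentId for both calls: per the upsert semantics described above,
// the second import should replace the first, yet both records remain.
await memory.ImportTextAsync("Frogs are green.", documentId: "frog-facts");
await memory.ImportTextAsync("Frogs are blue.", documentId: "frog-facts");

// Searching the index returns two distinct records, both tagged with
// __document_id = "frog-facts".
var results = await memory.SearchAsync("frog");
Console.WriteLine(results.Results.Count);
```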
-
I believe I may have found the answer after looking into `BaseOrchestrator` - I'm using the …. If the update operation depends on persisted pipeline records between operations, that would absolutely explain the behavior I see. I'll do a little more digging and see if this is the case.
-
thanks for investigating, yes I think you're on the right track. All …. If you need Serverless Memory to be fully persistent: ….

If by any chance you're setting Serverless to use queues, I would avoid using SimpleQueues and opt for Azure Queues or RabbitMQ. Or just don't use queues with Serverless, because it's an odd setup :-)
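For instance, a fully-persistent serverless setup might look roughly like this (a sketch only: the exact builder extension names and config types differ a bit across Kernel Memory versions, and the endpoint/key values are placeholders):

```csharp
using Microsoft.KernelMemory;

var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    // Keep extracted files and pipeline state on disk instead of in volatile
    // memory, so a later import of the same documentId can find the old records.
    .WithSimpleFileStorage(SimpleFileStorageConfig.Persistent)
    // Persistent vector storage (Azure AI Search in this thread's case).
    .WithAzureAISearchMemoryDb("https://<search-name>.search.windows.net", "<api-key>")
    .Build<MemoryServerless>();
```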
-
Appreciate it - I'll look into these options! Looks like I was using a persisted vector storage, but not a persisted content storage. I also verified by an integration test running the two …. Thanks for the help!
-
I am not using serverless. I have an instance of the Kernel Memory service hosted as a container app running the latest Docker container image. When I use the /upload endpoint to store a document, it is 100% duplicating entries in the index instead of upserting. Please advise.
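A minimal way to exercise this against the hosted service from .NET (a sketch; the service URL, file names, and document ID are placeholders, and `MemoryWebClient` should go through the same /upload endpoint):

```csharp
using Microsoft.KernelMemory;

// Points at the hosted Kernel Memory service (container app URL is a placeholder).
var memory = new MemoryWebClient("https://<my-container-app>.azurecontainerapps.io");

// Uploading twice under the same document ID: the expected behavior per this
// thread is an upsert, i.e. the second upload replaces the first.
await memory.ImportDocumentAsync("handbook-v1.pdf", documentId: "handbook");
await memory.ImportDocumentAsync("handbook-v2.pdf", documentId: "handbook");
```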
-
Following off of #85.

I've observed that the `documentId` parameter in the `IKernelMemory.Import<*>Async` methods is actually used as the value for a reserved tag `__document_id` when importing to a vector storage (Azure AI Search in my case). Since it is not used as the primary key for the entity in the index, I can upload multiple pieces of information with the same `documentId`, which count as unique objects in the index grouped together by tag.

I understand the benefit of doing this if you were separately uploading parts of a larger file, but it also means that there is no easy Update mechanism if my intention is to completely replace everything associated with the `documentId` in question. If my source content changes (and potentially has conflicting information with what is already in the index), I would love for deletion of the old information in the index to be part of the SDK. My workaround for this is (in pseudocode):
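(A minimal sketch of that delete-then-reimport idea, assuming the existing `IKernelMemory.DeleteDocumentAsync` and `ImportDocumentAsync` methods and an `IKernelMemory` instance called `memory`; the IDs and file name are placeholders.)

```csharp
// 1. Remove everything currently stored under the document ID, so stale
//    records cannot conflict with the refreshed content.
await memory.DeleteDocumentAsync(documentId: "foo");

// 2. Re-import the updated source under the same document ID.
await memory.ImportDocumentAsync("updated-source.docx", documentId: "foo");
```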
Is there any benefit to adding an `Upsert` or `CreateOrUpdate` operation natively?