[BUG] "Error: End of data reached" returned when using Upsert API to upsert documents #3041

Closed
briansoegaard opened this issue Aug 20, 2024 · 6 comments
Labels
question Further information is requested

Comments

@briansoegaard

Describe the bug
I want to upsert different files (e.g. PDF) via the API. Some Document Loaders return an error when using them via the Upsert API. The "Text File" Document Loader works fine, but, for example, "Pdf File" and "Docx File" return the following error (content of the JSON object returned by the requests.post(...) call)

{'statusCode': 500, 'success': False, 'message': 'Error: vectorsService.upsertVector - Error: End of data reached (data length = 0, asked index = 4). Corrupted zip ?', 'stack': {}}

I can't override the settings in the Pdf File or Docx File Document Loaders node via the API.

To Reproduce

  1. Just strictly follow the Python code example provided in the documentation: Document Loaders with Upload. Use a basic .txt file. That works.
  2. Reset your vector store or just define another Namespace in the code to start over.
  3. Save the text file as a .pdf file or .docx file (for example), and in the 'form_data' replace the code accordingly, e.g.:
form_data = {
    "files": ('text.pdf', open('text.pdf', 'rb'))
}

  4. When running the code, the error message above ('Error: vectorsService.upsertVector - Error: End of data reached (data length = 0, asked index = 4). Corrupted zip ?') is returned.
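For anyone reproducing this, here is a minimal sketch of the upload from step 3. It reads the file into memory first, so an empty file is caught before the request is sent, and it passes an explicit content type. The helper names `build_files` and `upsert` are made up for illustration, the endpoint URL is whatever your Flowise instance exposes, and sending the content type is an assumption, not a confirmed fix for this error.

```python
import mimetypes
from pathlib import Path


def build_files(path):
    """Build the `files` argument for requests.post from a local file.

    The (filename, bytes, content_type) tuple is standard `requests` usage.
    Whether the explicit content type changes Flowise's behaviour is an
    assumption, not something confirmed in this issue.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data:
        # The error in this issue reports "data length = 0", so rule out
        # an empty local file before blaming the server side.
        raise ValueError(f"{p} is empty; nothing to upsert")
    content_type = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    return {"files": (p.name, data, content_type)}


def upsert(api_url, path):
    """POST the file to the upsert endpoint and return the decoded JSON."""
    import requests  # third-party; imported lazily so build_files works without it

    response = requests.post(api_url, files=build_files(path), timeout=120)
    result = response.json()
    # The failing response quoted above carried 'success': False and a 'message'.
    if isinstance(result, dict) and result.get("success") is False:
        raise RuntimeError(result.get("message", "upsert failed"))
    return result
```

Usage would look like `upsert(API_URL, "text.pdf")`, where `API_URL` is the same upsert URL used in the documentation example.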

If the node is 'configured' in the chatflow via the UI (the text.pdf file is uploaded) the error does not occur - but then that file is upserted every time I call the API, no matter which file I send via the API. I can't override the settings in the Pdf File Document Loader node via the API.

Expected behavior
I expect the "Pdf File", "Docx File", etc. Document Loaders to work like the "Text File" Document Loader where the form_data object overrides the settings in my node.

Screenshots
The simple chatflow to reproduce it:
Upsert API Bug demo

The test Python code:
Upsert API Bug demo - code

Setup

  • Flowise Version: 2.0.5
  • OS: Latest macOS
  • Vector store: Pinecone
@HenryHengZJ
Contributor

Question: have you tried other PDF or DOCX files? I'm guessing the file is corrupted because you saved the text file in another format.
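One quick way to rule the file itself in or out is to check its signature locally before uploading. This is a rough sketch using only the Python standard library; the function names are made up for illustration. It relies on two format facts: a PDF normally begins with `%PDF-`, and a .docx file is a ZIP archive that contains `word/document.xml`. The PDF check is strict, so a PDF with stray bytes before the header would be reported as invalid even if some readers accept it.

```python
import zipfile


def looks_like_pdf(path):
    """Return True if the file starts with the PDF header marker."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def looks_like_docx(path):
    """Return True if the file is a ZIP archive containing word/document.xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return "word/document.xml" in zf.namelist()
```

A text file that was only renamed to .pdf or .docx fails both checks, while a file exported from Word or a PDF printer should pass.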

@HenryHengZJ added the "question" (Further information is requested) label on Aug 23, 2024
@briansoegaard
Author

It's definitely not the PDF file - it's a regular PDF. Docx files saved from Word fail too.
The bug has more to do with having to configure the node in Flowise before it's even possible to use the API.

@briansoegaard
Author

Any further thoughts on this bug, @HenryHengZJ? It looks like it's impossible to upsert PDF and DOCX files at all via the API. To me, that's a critical bug for anyone who integrates their chatflows into solutions with a knowledge base that can change over time.

@HenryHengZJ
Contributor

HenryHengZJ commented Aug 29, 2024

That's strange, I tried the following and it works:

1.) Have a chatflow with PDF loader:
image

2.) Execute the POST call:
image

@RobertinaRenzi

@HenryHengZJ Yes, the upsert call works, but if you then ask something about the document through the Prediction API, it says that no document was loaded.
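To check this end to end, one can ask a question right after the upsert and inspect the answer. The sketch below assumes the prediction endpoint lives at `/api/v1/prediction/<chatflow-id>` and accepts a JSON body with a `question` field, as I recall from the Flowise documentation. Treat the path, the helper names and the base URL as assumptions to verify against your own instance.

```python
def prediction_url(base_url, chatflow_id):
    """Build the prediction endpoint URL (path is an assumption, see above)."""
    return f"{base_url.rstrip('/')}/api/v1/prediction/{chatflow_id}"


def ask(base_url, chatflow_id, question):
    """Send one question to the chatflow and return the decoded JSON reply."""
    import requests  # third-party; imported lazily so prediction_url works without it

    response = requests.post(
        prediction_url(base_url, chatflow_id),
        json={"question": question},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()
```

If the reply does not reflect the uploaded file, comparing it with the reply after configuring the file in the UI would show whether the API upload reached the vector store at all.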

@HenryHengZJ
Contributor

We've released 2 new APIs that should work fine: https://docs.flowiseai.com/using-flowise/api#document-upsert-refresh-api
