Can NOT upload docx file with embedded images to dataprep-redis service #407
Commenting to take this issue.
Signed-off-by: lvliang-intel <[email protected]>
Another potentially related issue: opea-project/GenAIExamples#568
Hi @lianhao, I got a similar "permission denied" error when trying to reproduce your error. I downloaded "test.docx" and tried to upload it via the curl command.
files:UploadFile(filename='test.docx', size=77397, headers=Headers({'content-disposition': 'form-data; name="files"; filename="test.docx"', 'content-type': 'application/octet-stream'}))
link_list:None
Parsing document ./uploaded_files/test.docx.
INFO: 172.17.0.1:52896 - "POST /v1/dataprep HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 398, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
await self.app(scope, receive, send_wrapper)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 486, in async_wrapper
raise e
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 472, in async_wrapper
function_result = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 200, in ingest_documents
ingest_data_to_redis(
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 167, in ingest_data_to_redis
content = document_loader(path)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/utils.py", line 337, in document_loader
return load_docx(doc_path)
^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/utils.py", line 192, in load_docx
docx2txt.process(docx_path, save_path)
File "/home/user/.local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 103, in process
with open(dst_fname, "wb") as dst_f:
^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: './imgs/image1.png'
My suspicion is that the current
Using CPU. Note: This module is much faster with a GPU.
files:UploadFile(filename='gaudi-3-ai-accelerator-white-paper.pdf', size=2390860, headers=Headers({'content-disposition': 'form-data; name="files"; filename="gaudi-3-ai-accelerator-white-paper.pdf"', 'content-type': 'application/pdf'}))
link_list:None
Parsing document ./uploaded_files/gaudi-3-ai-accelerator-white-paper.pdf.
Done preprocessing. Created 52 chunks of the original pdf
[ ingest chunks ] file name: gaudi-3-ai-accelerator-white-paper.pdf
[ ingest chunks ] Current batch: 0
INFO: 172.17.0.1:42152 - "POST /v1/dataprep HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/user/.local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://172.25.116.82:6006/
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 398, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
await self.app(scope, receive, send_wrapper)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 486, in async_wrapper
raise e
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 472, in async_wrapper
function_result = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 200, in ingest_documents
ingest_data_to_redis(
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 176, in ingest_data_to_redis
return ingest_chunks_to_redis(file_name, chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 127, in ingest_chunks_to_redis
_, keys = Redis.from_texts_return_keys(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 423, in from_texts_return_keys
keys = instance.add_texts(texts, metadatas, keys=keys)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 694, in add_texts
embeddings = embeddings or self._embeddings.embed_documents(list(texts))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/embeddings/huggingface_hub.py", line 116, in embed_documents
responses = self.client.post(
^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/inference/_client.py", line 304, in post
hf_raise_for_status(response)
File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 367, in hf_raise_for_status
raise HfHubHTTPError(message, response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError:
403 Forbidden: None.
Cannot access content at: http://172.25.116.82:6006/.
If you are trying to create or update content, make sure you have a token with the `write` role.
I would recommend trying the ChatQnA example first, where
@ctao456, your issue could be resolved by passing the HUGGINGFACEHUB_API_TOKEN environment variable to the container. We should still resolve the docx file issue: opea-project/GenAIExamples#568 mentions another problem with a docx file that doesn't contain any pictures.
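A minimal sketch, assuming the service reads the token from the `HUGGINGFACEHUB_API_TOKEN` environment variable as suggested above (e.g. the container is started with `docker run -e HUGGINGFACEHUB_API_TOKEN=...`). The helper `require_hf_token` is hypothetical and not part of GenAIComps; the idea is to fail fast at startup with a clear message instead of surfacing a 403 on the first upload. Whether the 403 from the embedding endpoint is really caused by a missing token is not confirmed in this thread.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from the environment or raise.

    Hypothetical helper: checks HUGGINGFACEHUB_API_TOKEN once at startup so a
    missing token is reported clearly rather than as a 403 during ingestion.
    """
    if env is None:
        env = os.environ
    # Treat an empty or whitespace-only value the same as an unset variable.
    token = env.get("HUGGINGFACEHUB_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HUGGINGFACEHUB_API_TOKEN is not set; pass it to the container, "
            "e.g. docker run -e HUGGINGFACEHUB_API_TOKEN=<token> ..."
        )
    return token
```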
Hi @lianhao, thank you. I already tried passing in
@ctao456, as for the image "permission denied" issue, I guess it is related to the function at https://github.com/opea-project/GenAIComps/blob/main/comps/dataprep/utils.py#L191, which tries to create a temporary directory in a location where it doesn't have write permission. I would suggest creating the temporary directory with Python's tempfile module instead of writing your own mkdir/delete logic.
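A minimal sketch of the tempfile suggestion, assuming `load_docx` only needs `docx2txt.process(docx_path, img_dir)` (the call shown in the traceback) to extract text and write embedded images. The images go into a directory created by `tempfile.TemporaryDirectory`, which lives under the system temp directory rather than `./imgs` relative to the working directory, and is deleted automatically. The injectable `process` argument and the `image_handler` hook (for example, OCR on each image) are hypothetical additions so the sketch can run without `docx2txt`; this is not necessarily the change merged in PR #561.

```python
import os
import tempfile
from typing import Callable, Optional

# Shape of docx2txt.process: (docx_path, img_dir) -> extracted text.
DocxProcessor = Callable[[str, str], str]


def load_docx(
    docx_path: str,
    process: Optional[DocxProcessor] = None,
    image_handler: Optional[Callable[[str], str]] = None,
) -> str:
    """Extract text from a .docx file, staging embedded images in a private temp dir.

    `process` defaults to docx2txt.process; `image_handler` (hypothetical) turns
    one extracted image file into extra text, e.g. via OCR.
    """
    if process is None:
        import docx2txt  # third-party package used by the original load_docx

        process = docx2txt.process

    # TemporaryDirectory creates a uniquely named directory under
    # tempfile.gettempdir() and removes it with its contents on exit, so
    # concurrent uploads never share a directory and no manual cleanup is needed.
    with tempfile.TemporaryDirectory(prefix="docx_imgs_") as img_dir:
        text = process(docx_path, img_dir)
        if image_handler is not None:
            for name in sorted(os.listdir(img_dir)):
                extra = image_handler(os.path.join(img_dir, name))
                if extra:
                    text += "\n" + extra
    return text
```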
Understood. Please feel free to submit a PR. Thanks.
Unfortunately, I don't have bandwidth to resolve this right now.
Please assign this bug to me. I have pending PRs to be submitted.
Fix issue opea-project#407 Signed-off-by: Lianhao Lu <[email protected]>
Fix issue #407 Signed-off-by: Lianhao Lu <[email protected]> Co-authored-by: XuhuiRen <[email protected]>
Completed, as PR #561 has been merged.
When I try to upload a docx file with embedded images (test.docx) to the dataprep-redis service (built and launched from here), I get the following error from curl:
Checking the dataprep-redis service logs, I found the following errors: