[Question]: Knowledge Graph has an error when parsing [expected string or buffer] #1859
Labels: question (Further information is requested)
Comments
Same issue. It occurs when uploading any Word document that includes pictures.
KevinHuSh pushed a commit that referenced this issue on Aug 8, 2024
…ted using Knowledge Graph.#1859 (#1865)

### What problem does this PR solve?

Fix a "TypeError: expected string or buffer bug" in docx files extracted using Knowledge Graph. #1859

```
Traceback (most recent call last):
  File "//Users/XXX/ragflow/rag/svr/task_executor.py", line 149, in build
    cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/ragflow/rag/app/knowledge_graph.py", line 18, in chunk
    chunks = build_knowlege_graph_chunks(tenant_id, sections, callback,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/ragflow/graphrag/index.py", line 87, in build_knowlege_graph_chunks
    tkn_cnt = num_tokens_from_string(chunks[i])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/github/ragflow/rag/utils/__init__.py", line 79, in num_tokens_from_string
    num_tokens = len(encoder.encode(string))
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/tiktoken/core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or buffer
```

This type is `Dict`

<img width="1689" alt="Pasted Graphic 3" src="https://github.com/user-attachments/assets/e5ba5c45-df1d-4697-98c9-14365c839f20">

The correct type should be `Str`

<img width="1725" alt="Pasted Graphic 2" src="https://github.com/user-attachments/assets/e54d5e60-4ce4-4180-b394-24e485013534">

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
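The commit message above says the chunk reaching the token counter was a `Dict` where a `Str` was expected. The actual change in #1865 is not shown in this thread, so the following is only a rough sketch of one way to guard against that mismatch. The helper names `chunk_to_text` and `count_tokens`, the `"text"` key, and the whitespace tokenizer are all assumptions made for illustration, not RAGFlow or tiktoken APIs.

```python
from typing import Any, Callable, List


def chunk_to_text(chunk: Any, text_key: str = "text") -> str:
    """Return a plain string for a chunk, whatever container it arrived in."""
    if isinstance(chunk, str):
        return chunk
    if isinstance(chunk, dict):
        # Hypothetical layout: the chunk text sits under `text_key`.
        return str(chunk.get(text_key, ""))
    return str(chunk)


def count_tokens(chunks: List[Any], encode: Callable[[str], list]) -> List[int]:
    """Count tokens per chunk after normalising each chunk to str."""
    return [len(encode(chunk_to_text(c))) for c in chunks]


# Whitespace splitting stands in for a real tokenizer's encode() here.
chunks = ["first paragraph", {"text": "caption under a picture"}]
token_counts = count_tokens(chunks, str.split)
```

Normalising at the boundary keeps the token counter's contract simple (it only ever sees `str`), at the cost of silently dropping whatever else the dict carried.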
Halfknow pushed a commit to Halfknow/ragflow that referenced this issue on Nov 11, 2024
Describe your problem
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 149, in build
    cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
  File "/ragflow/rag/app/knowledge_graph.py", line 18, in chunk
    chunks = build_knowlege_graph_chunks(tenant_id, sections, callback,
  File "/ragflow/graphrag/index.py", line 87, in build_knowlege_graph_chunks
    tkn_cnt = num_tokens_from_string(chunks[i])
  File "/ragflow/rag/utils/__init__.py", line 79, in num_tokens_from_string
    num_tokens = len(encoder.encode(string))
  File "/usr/local/lib/python3.10/dist-packages/tiktoken/core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
TypeError: expected string or buffer
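The last frame of the traceback is a compiled regex's `.search(text)` call, which fails whenever `text` is not a string. The class of error can be reproduced in isolation with a minimal sketch like the one below. It uses the standard-library `re` module as a stand-in, so the exact wording of the message may differ from the one tiktoken produces; the pattern and the dict's shape are made up for illustration.

```python
import re

# Stand-in for the special-token pattern; the real one is built inside tiktoken.
pattern = re.compile(r"<\|endoftext\|>")


def search_special_tokens(text):
    """Mimic the failing call: run a compiled regex over the chunk."""
    return pattern.search(text)


string_chunk = "plain paragraph text"
dict_chunk = {"text": "paragraph text"}  # hypothetical shape of a bad chunk

# A str chunk is searched normally (no special token here, so no match).
no_match = search_special_tokens(string_chunk)

# A dict chunk makes the regex engine raise TypeError before any matching happens.
try:
    search_special_tokens(dict_chunk)
    raised = False
except TypeError:
    raised = True
```

This suggests the regex and the tokenizer are behaving as designed, and that the thing to inspect is what type each element of `chunks` has before `num_tokens_from_string` is called.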