Fix a "TypeError: expected string or buffer" bug in docx files extracted using Knowledge Graph. infiniflow#1859 (infiniflow#1865)

### What problem does this PR solve?

Fix a "TypeError: expected string or buffer" bug in docx files extracted using Knowledge Graph. infiniflow#1859

```
Traceback (most recent call last):
  File "//Users/XXX/ragflow/rag/svr/task_executor.py", line 149, in build
    cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/ragflow/rag/app/knowledge_graph.py", line 18, in chunk
    chunks = build_knowlege_graph_chunks(tenant_id, sections, callback,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/ragflow/graphrag/index.py", line 87, in build_knowlege_graph_chunks
    tkn_cnt = num_tokens_from_string(chunks[i])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/github/ragflow/rag/utils/__init__.py", line 79, in num_tokens_from_string
    num_tokens = len(encoder.encode(string))
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/XXX/tiktoken/core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or buffer
```

The value being tokenized, `chunks[i]`, is a `Dict`:

<img width="1689" alt="Pasted Graphic 3" src="https://github.com/user-attachments/assets/e5ba5c45-df1d-4697-98c9-14365c839f20">

The correct type is `str`:

<img width="1725" alt="Pasted Graphic 2" src="https://github.com/user-attachments/assets/e54d5e60-4ce4-4180-b394-24e485013534">

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
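The traceback boils down to a `dict` chunk reaching a regex `.search()` call that only accepts strings. The following is a minimal, hypothetical sketch, not the code from this PR. The stdlib `re` module stands in for the regex check inside `tiktoken`. `count_tokens` is a whitespace-based stand-in for `num_tokens_from_string`. The `"text"` key used by `chunk_to_text` is an assumed field name, because the actual chunk schema is not shown here.

```python
import re
from collections.abc import Mapping
from typing import Any

# Stand-in for tiktoken's special-token check: a compiled regex whose
# .search() accepts only str (or bytes-like) input and raises TypeError otherwise.
_SPECIAL = re.compile(r"<\|endoftext\|>")


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for num_tokens_from_string (counts whitespace tokens)."""
    _SPECIAL.search(text)  # a dict here raises TypeError, as in the traceback
    return len(text.split())


def chunk_to_text(chunk: Any, key: str = "text") -> str:
    """Hypothetical guard: extract the text from a dict chunk before tokenizing.

    `key` is an assumption; the real field name depends on the chunk schema.
    """
    if isinstance(chunk, str):
        return chunk
    if isinstance(chunk, Mapping) and isinstance(chunk.get(key), str):
        return chunk[key]
    raise TypeError(f"expected str chunk, got {type(chunk).__name__}")


chunks = ["plain string chunk", {"text": "dict chunk from docx"}]

try:
    count_tokens(chunks[1])  # passing the dict directly reproduces the failure
except TypeError as exc:
    print("TypeError:", exc)

print([count_tokens(chunk_to_text(c)) for c in chunks])  # → [3, 4]
```

The sketch converts each chunk to a string at the call site and leaves `count_tokens` strict. That way any other non-string chunk still fails loudly instead of being silently miscounted.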