[Question]: How to split or separate nodes' Document data? #7569
Comments
🤖 Hello,

Based on your question, it seems you want to split your large documents into smaller chunks before indexing them. Here is a simple example of how you can do this:

from llama_index import VectorStoreIndex
from llama_index.schema import Document
def split_document(document: Document, chunk_size: int):
    text = document.text
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Each chunk keeps the original document's metadata
    return [Document(text=chunk, metadata=document.metadata) for chunk in chunks]

# Assume `documents` is your list of large documents
smaller_documents = []
for document in documents:
    smaller_documents.extend(split_document(document, chunk_size=1000))
index = VectorStoreIndex.from_documents(smaller_documents)

In this example, each document's text is split into chunks of at most 1000 characters, and each chunk becomes its own Document. Please adjust the chunk size to fit your needs. Additionally, you might want to consider LlamaIndex's built-in node parsers, which handle splitting documents into nodes for you.

I hope this helps! If you have any other questions, feel free to ask.

Best,
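For a quick sanity check, the fixed-size chunking used above can be exercised on a plain string without LlamaIndex (`sample` is a made-up input, not from the original thread):

```python
# Standalone check of the fixed-size chunking logic used in split_document.
def chunk_text(text: str, chunk_size: int):
    # Slice the string into consecutive windows of chunk_size characters;
    # the last chunk may be shorter than chunk_size.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

sample = "a" * 2500  # a made-up 2500-character "document"
chunks = chunk_text(sample, chunk_size=1000)
print(len(chunks))            # 3
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

Note that every character is preserved and no chunk exceeds the requested size, which is the property the split_document helper relies on.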
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
You can also just decrease the chunk size in the service context
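The suggestion above, decreasing the chunk size via the service context, looks roughly like this in the LlamaIndex API of that era (the value 512 is an arbitrary example, and `documents` is assumed to be the list loaded earlier):

```python
from llama_index import ServiceContext, VectorStoreIndex

# A smaller chunk_size means each node (and therefore each prompt) carries less text.
service_context = ServiceContext.from_defaults(chunk_size=512)

# `documents` is assumed to be the list of Documents loaded with JSONReader
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```

This is a sketch rather than a tested snippet; check the chunk_size default and ServiceContext options against the LlamaIndex version you have installed.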
Question Validation
Question
JSONReader = download_loader("JSONReader")
loader = JSONReader()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
When I use this code, the prompt built from the documents is very big. How can I split the document text, or how can I separate the nodes?