
Token indices sequence length is longer than the specified maximum sequence length for this model (3793 > 1024). Running this sequence through the model will result in indexing errors #987

Closed
claysauruswrecks opened this issue Mar 30, 2023 · 12 comments

Comments

@claysauruswrecks

Initially I thought the error was due to the loader not splitting chunks, but I'm still getting the mentioned error after adding a splitter. Maybe it's coming from OpenAI's API?

Bugfix branch: https://github.com/claysauruswrecks/llama-hub/tree/bugfix/github-repo-splitter

import pickle
import os
import logging
from llama_index import GPTSimpleVectorIndex

assert (
    os.getenv("OPENAI_API_KEY") is not None
), "Please set the OPENAI_API_KEY environment variable."

from llama_index import download_loader

logging.basicConfig(level=logging.DEBUG)

LLAMA_HUB_CONTENTS_URL = "https://raw.githubusercontent.com/claysauruswrecks/llama-hub/bugfix/github-repo-splitter"
LOADER_HUB_PATH = "/loader_hub"
LOADER_HUB_URL = LLAMA_HUB_CONTENTS_URL + LOADER_HUB_PATH

download_loader(
    "GithubRepositoryReader", loader_hub_url=LOADER_HUB_URL, refresh_cache=True
)

from llama_index.readers.llamahub_modules.github_repo import (
    GithubClient,
    GithubRepositoryReader,
)

docs = None

# Reuse previously fetched documents if a local pickle cache exists
if os.path.exists("docs.pkl"):
    with open("docs.pkl", "rb") as f:
        docs = pickle.load(f)

# Otherwise fetch the repository contents from GitHub and cache them
if docs is None:
    github_client = GithubClient(os.getenv("GITHUB_TOKEN"))
    loader = GithubRepositoryReader(
        github_client,
        owner="jerryjliu",
        repo="llama_index",
        filter_directories=(
            ["gpt_index", "docs"],
            GithubRepositoryReader.FilterType.INCLUDE,
        ),
        filter_file_extensions=([".py"], GithubRepositoryReader.FilterType.INCLUDE),
        verbose=True,
        concurrent_requests=10,
    )

    docs = loader.load_data(commit_sha="1b739e1fcd525f73af4a7131dd52c7750e9ca247")

    with open("docs.pkl", "wb") as f:
        pickle.dump(docs, f)

# Build the vector index and run a test query
index = GPTSimpleVectorIndex.from_documents(docs)

index.query("Explain each LlamaIndex class?")
@claysauruswrecks
Author

It appears I might be able to address this by using the PromptHelper to split after the loader's execution.

From Kapa.ai


Here's an example of how to set up a PromptHelper with custom parameters:

from llama_index import PromptHelper

# Set maximum input size
max_input_size = 1024
# Set number of output tokens
num_output = 256
# Set maximum chunk overlap
max_chunk_overlap = 20

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)


Then, you can create a ServiceContext with the PromptHelper:

from llama_index import ServiceContext, LLMPredictor
from langchain import OpenAI

# Define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)


Finally, you can build your index with the service_context:

from llama_index import GPTSimpleVectorIndex
from your_data_loading_module import documents

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)


By using the PromptHelper with the appropriate parameters, you can ensure that the input text does not exceed the model's maximum token limit and avoid the indexing errors.

For more information, refer to the PromptHelper documentation (https://gpt-index.readthedocs.io/en/latest/reference/service_context/prompt_helper.html).

@jerryjliu
Collaborator

@claysauruswrecks instead of setting the prompt helper, one thing you can try to do is set the chunk_size_limit in the ServiceContext.

Just do

# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

does that work for you?

@claysauruswrecks
Author

@jerryjliu - Excellent, yes. I also now see the notebook examples. I will open a PR to clarify in the docs.

@karottc

karottc commented Apr 4, 2023

@jerryjliu

However, after setting it up like this, the response from response = index.query("query something") has also become shorter and loses information.

@jerryjliu
Collaborator

By default similarity_top_k=1; you can increase similarity_top_k in the index.query call.
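
For reference, a minimal sketch of that suggestion against the older GPTSimpleVectorIndex.query API used earlier in this thread (the query string is just an example; exact keyword support may vary by version):

# Retrieve more chunks per query so the synthesized answer draws on more context
response = index.query(
    "Explain each LlamaIndex class?",
    similarity_top_k=3,  # default is 1
)
print(response)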

@bisonliao

Is it possible to process a document set of 2,000 text files, each around 5,000 words?
I want to use LlamaIndex to process my website docs and then build a smart assistant.
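
For what it's worth, a minimal sketch of that workflow, assuming the text files sit in a local directory (the path here is hypothetical) and reusing the chunk_size_limit approach suggested above; exact APIs may differ between llama_index versions:

from langchain import OpenAI
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)

# Load all text files from a local folder (hypothetical path)
documents = SimpleDirectoryReader("./website_docs").load_data()

llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

# Keep chunks well under the 1024-token limit discussed above
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, chunk_size_limit=512
)

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
response = index.query("What does this website cover?", similarity_top_k=3)
print(response)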

@pramitchoudhary

# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)

Any concern about not exposing the other PromptHelper params via ServiceContext.from_defaults, especially max_chunk_overlap?

@Shane-Khong

Shane-Khong commented May 13, 2023

# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)

Any concern about not exposing the other PromptHelper params via ServiceContext.from_defaults, especially max_chunk_overlap?

I have a similar question, so hopefully I'm not repeating here: does [directly passing the chunk_size_limit=512 parameter into service_context] do the same thing as [setting chunk_size_limit=512 in prompt_helper, and then passing prompt_helper as a parameter into service_context]?
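
For reference, a minimal sketch of the two configurations being compared, built only from the constructs shown earlier in this thread; whether they behave identically may depend on the llama_index version:

from langchain import OpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext

llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

# Route 1: pass chunk_size_limit directly to the service context
service_context_a = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, chunk_size_limit=512
)

# Route 2: configure a PromptHelper explicitly, which also exposes max_chunk_overlap
prompt_helper = PromptHelper(
    max_input_size=1024,
    num_output=256,
    max_chunk_overlap=20,
    chunk_size_limit=512,
)
service_context_b = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper
)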

@Shane-Khong

Shane-Khong commented May 13, 2023

Also, will setting chunk_size_limit = 512 result in a better outcome than chunk_size_limit = 2000 when summarising a 280-page document?

@dxiaosa

dxiaosa commented May 27, 2023

@claysauruswrecks instead of setting the prompt helper, one thing you can try to do is set the chunk_size_limit in the ServiceContext.

Just do

# NOTE: set a chunk size limit to < 1024 tokens 
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

does that work for you?

Hello, the "text-davinci-003" model can accept at most 4,097 tokens, so I wonder why we still get the warning "Token indices sequence length is longer than the specified maximum sequence length for this model (2503 > 1024)"?

@Majidbadal

I believe this issue is about the max output tokens, not the input tokens.

@dosubot

dosubot bot commented Sep 25, 2023

Hi, @claysauruswrecks! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you raised is related to a token indices sequence length being longer than the specified maximum sequence length for a model. You suspect that the error may be coming from OpenAI's API and have provided a bugfix branch for reference. There have been discussions about using PromptHelper or setting the chunk_size_limit in the ServiceContext to address the issue. Some users have also raised questions about the impact on response length and the possibility of processing large documents.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LlamaIndex repository!

dosubot bot added the stale label on Sep 25, 2023
dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Oct 2, 2023
dosubot bot removed the stale label on Oct 2, 2023