DOC: Could load GGUF models from https #14514

elliotthwang · 2023-12-11T02:45:28Z

Issue with current documentation:

No response

Idea or request for content:

No response

dosubot · 2023-12-11T02:45:38Z

🤖

Hello! It seems like you're experiencing an issue with the current documentation for LangChain. I'd be happy to help you with that. Could you please provide more details about the problem you're encountering when trying to load GGUF models from HTTPS? Additionally, if you have any specific ideas or requests for content, feel free to share them so that I can assist you more effectively.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-11T20:58:40Z

@dosu-bot
It is more convenient to load huggingface gguf models, such as
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

dosubot · 2023-12-11T21:04:48Z

🤖

Based on the provided context, the LangChain framework does support loading models from HuggingFace Hub and directly from a URL, including HTTPS URLs. However, the context does not specify whether the models can be loaded via HTTPS from HuggingFace Hub.

To load models from HuggingFace Hub, you can use the HuggingFaceHub class. Here is an example:

from langchain.llms import HuggingFaceHub
hf = HuggingFaceHub(repo_id="gpt2", huggingfacehub_api_token="my-api-key")

In this example, repo_id is the model name to use, and huggingfacehub_api_token is the API token for HuggingFace Hub. If the huggingfacehub_api_token is not provided, it will try to get it from the environment variable HUGGINGFACEHUB_API_TOKEN.

To load models directly from a URL, you can use the UnstructuredURLLoader class. Here is the supporting evidence from the provided context:

from langchain.document_loaders.url import UnstructuredURLLoader

This class can be used to load documents directly from a URL.

However, it's not clear from the context whether you can load a model from HuggingFace Hub via HTTPS using the UnstructuredURLLoader class. The huggingface_hub package handles the underlying details of how the models are loaded, which is not specified in the provided context.

If you're still encountering issues, could you please provide more details about the specific issues you're facing? This could include error messages, the specific steps you're taking, and any relevant code snippets. This will help me provide more targeted assistance.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-11T21:14:15Z

@dosu-bot
huggingface gguf models, such as
"https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
Your HuggingFaceHub class.can not reach to load it.

dosubot · 2023-12-11T21:20:42Z

🤖

Based on the information you've provided, it seems like you're trying to load a HuggingFace GGUF model using the HuggingFaceHub class in the LangChain framework. Here's how you can do it:

repo_id = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run(question))

Please note that the model_kwargs parameters such as temperature and max_length might need to be adjusted based on the specific requirements of your model. You can find more examples in the HuggingFaceHub integration notebook.

If you're encountering an error, it could be due to a few reasons:

The huggingface_hub python package is not installed. You can install it with pip install huggingface_hub.
The task specified is not one of the valid tasks ("text2text-generation", "text-generation", "summarization").
There is an error in the response from the HuggingFace Hub's inference endpoint.
The task is not one of the valid tasks after calling the HuggingFace Hub's inference endpoint.

These errors are raised in the validate_environment and _call methods in the HuggingFaceHub class.

If none of these solutions work, could you please provide the exact error message you're encountering? This will help me better understand the issue and provide a more accurate solution.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-11T22:16:22Z

Thanks!
It can be done accordingly.

elliotthwang · 2023-12-12T03:42:06Z

@dosu-bot
Following the suggestions as folllows：

from langchain.llms import HuggingFaceHub
repo_id = HuggingFaceHub(repo_id="https://huggingface.co/TheBloke/OpenOrca-Zephyr-7B-GGUF/resolve/main/openorca-zephyr-7b.Q5_K_M.gguf", huggingfacehub_api_token="mytoken")
llm = HuggingFaceHub(
repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 128})

Encounter ValidationError as folllows：

ValidationError Traceback (most recent call last)
in <cell line: 2>()
1 from langchain.llms import HuggingFaceHub
----> 2 repo_id = HuggingFaceHub(repo_id="https://huggingface.co/TheBloke/OpenOrca-Zephyr-7B-GGUF/resolve/main/openorca-zephyr-7b.Q5_K_M.gguf", huggingfacehub_api_token="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")
3 llm = HuggingFaceHub(
4 repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 128})

1 frames
/usr/local/lib/python3.10/dist-packages/pydantic/main.cpython-310-x86_64-linux-gnu.so in pydantic.main.BaseModel.init()

ValidationError: 1 validation error for HuggingFaceHub
root
Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'https://huggingface.co/TheBloke/OpenOrca-Zephyr-7B-GGUF/resolve/main/openorca-zephyr-7b.Q5_K_M.gguf'. Use repo_type argument if needed. (type=value_error.hfvalidation)

It seems URLs are invalid to load moldels.

dosubot · 2023-12-12T03:48:45Z

🤖

Based on the error message you've provided, it seems like the repo_id parameter for the HuggingFaceHub class is expecting a repository ID in the form of 'repo_name' or 'namespace/repo_name', not a full URL.

You can try using just the namespace and repository name as the repo_id:

from langchain.llms import HuggingFaceHub
repo_id = "TheBloke/OpenOrca-Zephyr-7B-GGUF"
llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 128}, huggingfacehub_api_token="mytoken"
)

This should correctly load the model from the HuggingFace Hub. If you're still encountering issues, please provide any new error messages or further details about the problem.

The repo_id parameter in the HuggingFaceHub class in the LangChain framework is used to specify the model name to use. It is passed to the InferenceApi from the huggingface_hub package to specify which model to use for inference. You can check if a model exists on the HuggingFace Model Hub by visiting the URL https://huggingface.co/models?search=<model_name>, replacing <model_name> with the repo_id you're trying to use. If the model does not exist, you will need to use a different repo_id.

If you're sure that the repo_id is correct and the model exists on the HuggingFace Model Hub, then the issue might be with the huggingfacehub_api_token. Make sure that the huggingfacehub_api_token is correctly set in your environment or passed as a named parameter to the HuggingFaceHub constructor.

For more details, you can refer to the source code and this similar issue that was resolved in the past.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-12T03:59:03Z

@dosu-bot

It seems the prio one is invalid：
🤖

Based on the information you've provided, it seems like you're trying to load a HuggingFace GGUF model using the HuggingFaceHub class in the LangChain framework. Here's how you can do it:

repo_id = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

llm = HuggingFaceHub(
repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run(question))

dosubot · 2023-12-12T04:05:39Z

🤖

Based on the error message you're receiving, it seems that the repo_id parameter for the HuggingFaceHub class is expecting a repository ID in the form of 'repo_name' or 'namespace/repo_name', not a full URL.

The repo_id should be the identifier of the model on the HuggingFace Model Hub. It's usually in the format 'namespace/repo_name'. For example, if you're trying to load the model at 'https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML', the repo_id would be 'TheBloke/Llama-2-13B-chat-GGML'.

Here's how you can modify your code:

from langchain.llms import HuggingFaceHub
repo_id = "TheBloke/Llama-2-13B-chat-GGML"
llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)

This should correctly load the model from the HuggingFace Hub using the HuggingFaceHub class in the LangChain framework.

Please note that the HuggingFaceHub class in LangChain expects the repo_id to be in the format 'namespace/repo_name'. This is because the repo_id is passed directly to the InferenceApi from the huggingface_hub package, which requires the 'namespace/repo_name' format. If only 'repo_name' is provided, the InferenceApi may not be able to locate the correct model repository.

You can find more details about this in the LangChain repository.

I hope this helps! If you have any other questions, feel free to ask.

Sources

libs/langchain/langchain/llms/huggingface_hub.py

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-12T04:09:27Z

@dosu-bot
Could load GGUF models from https ?
Like llamaindex's
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

dosubot · 2023-12-12T04:16:13Z

🤖

Based on the provided context, it seems that the LangChain framework does support loading models directly from an HTTPS URL using the UnstructuredURLLoader class. However, the HuggingFaceHub class, which you're currently using, expects a repo_id in the form of 'repo_name' or 'namespace/repo_name', not a direct URL.

To load a GGUF model from an HTTPS URL, you might want to use the UnstructuredURLLoader class instead. Unfortunately, the exact usage of this class isn't provided in the context. However, based on the usage of similar classes in the LangChain framework, you might be able to use it like this:

from langchain.document_loaders.url import UnstructuredURLLoader

loader = UnstructuredURLLoader(
    "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
)
model = loader.load()

Please note that this is a hypothetical usage and might not work as expected. I recommend checking the LangChain documentation or source code for more information on how to use the UnstructuredURLLoader class.

If you're still having trouble, could you provide more information about the error message you're receiving and the code you're using to load the model? This could help me provide a more accurate solution.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-12T04:28:48Z

@dosu-bot

from langchain.document_loaders.url import UnstructuredURLLoader

loader = UnstructuredURLLoader(
"https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
)
model = loader.load()

Still encounter ERRORS：

ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing B, exception: Invalid URL 'B': No scheme supplied. Perhaps you meant https://b/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing c, exception: Invalid URL 'c': No scheme supplied. Perhaps you meant https://c/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing h, exception: Invalid URL 'h': No scheme supplied. Perhaps you meant https://h/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing t, exception: Invalid URL 't': No scheme supplied. Perhaps you meant https://t/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing G, exception: Invalid URL 'G': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing G, exception: Invalid URL 'G': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing M, exception: Invalid URL 'M': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing L, exception: Invalid URL 'L': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing /, exception: Invalid URL '/': No scheme supplied. Perhaps you meant https:///?
ERROR:langchain_community.document_loaders.url:Error fetching or processing r, exception: Invalid URL 'r': No scheme supplied. Perhaps you meant https://r/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing e, exception: Invalid URL 'e': No scheme supplied. Perhaps you meant https://e/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing s, exception: Invalid URL 's': No scheme supplied. Perhaps you meant https://s/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing o, exception: Invalid URL 'o': No scheme supplied. Perhaps you meant https://o/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing v, exception: Invalid URL 'v': No scheme supplied. Perhaps you meant https://v/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing e, exception: Invalid URL 'e': No scheme supplied. Perhaps you meant https://e/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing /, exception: Invalid URL '/': No scheme supplied. Perhaps you meant https:///?
ERROR:langchain_community.document_loaders.url:Error fetching or processing m, exception: Invalid URL 'm': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing i, exception: Invalid URL 'i': No scheme supplied. Perhaps you meant https://i/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing n, exception: Invalid URL 'n': No scheme supplied. Perhaps you meant https://n/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing /, exception: Invalid URL '/': No scheme supplied. Perhaps you meant https:///?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing m, exception: Invalid URL 'm': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 2, exception: Invalid URL '2': No scheme supplied. Perhaps you meant https://2?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 1, exception: Invalid URL '1': No scheme supplied. Perhaps you meant https://1?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing b, exception: Invalid URL 'b': No scheme supplied. Perhaps you meant https://b/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing c, exception: Invalid URL 'c': No scheme supplied. Perhaps you meant https://c/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing h, exception: Invalid URL 'h': No scheme supplied. Perhaps you meant https://h/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing t, exception: Invalid URL 't': No scheme supplied. Perhaps you meant https://t/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing ., exception: Invalid URL '.': No scheme supplied. Perhaps you meant https://.?
ERROR:langchain_community.document_loaders.url:Error fetching or processing g, exception: Invalid URL 'g': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing g, exception: Invalid URL 'g': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing m, exception: Invalid URL 'm': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing v, exception: Invalid URL 'v': No scheme supplied. Perhaps you meant https://v/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing ., exception: Invalid URL '.': No scheme supplied. Perhaps you meant https://.?
ERROR:langchain_community.document_loaders.url:Error fetching or processing q, exception: Invalid URL 'q': No scheme supplied. Perhaps you meant https://q/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 4, exception: Invalid URL '4': No scheme supplied. Perhaps you meant https://4?
ERROR:langchain_community.document_loaders.url:Error fetching or processing , exception: Invalid URL '': No scheme supplied. Perhaps you meant https://_/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 0, exception: Invalid URL '0': No scheme supplied. Perhaps you meant https://0?
ERROR:langchain_community.document_loaders.url:Error fetching or processing ., exception: Invalid URL '.': No scheme supplied. Perhaps you meant https://.?
ERROR:langchain_community.document_loaders.url:Error fetching or processing b, exception: Invalid URL 'b': No scheme supplied. Perhaps you meant https://b/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing i, exception: Invalid URL 'i': No scheme supplied. Perhaps you meant https://i/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing n, exception: Invalid URL 'n': No scheme supplied. Perhaps you meant https://n/?

dosubot · 2023-12-12T04:34:18Z

🤖

Based on the error messages you've shared, it seems like the URL is being incorrectly parsed as a sequence of characters, which is why each character is being treated as a separate URL. This issue is similar to a previously solved issue in the LangChain repository, titled WebBaseLoader interprets incorrectly the web_path parameter. The problem was due to a regression in the latest update to WebBaseLoader, where a string passed to the web_path parameter was incorrectly interpreted and parsed as a sequence.

To correctly load a GGUF model from an HTTPS URL using the UnstructuredURLLoader class in the LangChain framework, you can follow the example provided in the class docstring:

from langchain.document_loaders import UnstructuredURLLoader

loader = UnstructuredURLLoader(
    urls=["https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"], 
    mode="elements", 
    strategy="fast",
)
docs = loader.load()

In this example, the mode parameter is set to "elements", which means the unstructured library will split the document into elements such as Title and NarrativeText. The strategy parameter is set to "fast", which is one of the unstructured settings you can apply.

If the problem persists, it might be due to the version of the unstructured library you're using. The UnstructuredURLLoader class checks the version of the unstructured library and logs a warning if you're using an old version that doesn't support the headers parameter. To fix this, you can update the unstructured library to the latest version using pip:

pip install --upgrade unstructured

Please try this solution and let me know if it resolves your issue. If not, please provide more details about the error and I'll be glad to assist further.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-12T04:43:50Z

@dosu-bot
!pip install --upgrade unstructured
Successfully installed emoji-2.9.0 filetype-1.2.0 langdetect-1.0.9 python-iso639-2023.12.11 python-magic-0.4.27 rapidfuzz-3.5.2 unstructured-0.11.2

from langchain.document_loaders.url import UnstructuredURLLoader

loader = UnstructuredURLLoader(
"https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
)
model = loader.load()

Still encounter ERRORS：

ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing B, exception: Invalid URL 'B': No scheme supplied. Perhaps you meant https://b/?

loader = UnstructuredURLLoader(
"https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
)
model = loader.load()

Still encounter ERRORS：

ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing B, exception: Invalid URL 'B': No scheme supplied. Perhaps you meant https://b/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?

dosubot · 2023-12-12T04:49:26Z

🤖

Based on the error messages you've shared, it seems like the URL you're trying to load is being incorrectly parsed as a sequence of characters. This issue has been reported before in the LangChain repository, specifically in the issue titled WebBaseLoader interprets incorrectly the web_path parameter. The problem was identified as a regression in the latest update to WebBaseLoader, where a string passed to the web_path parameter was incorrectly interpreted and parsed as a sequence.

To resolve this issue, you might need to modify the condition in the code to correctly handle strings passed to the web_path parameter. However, before you proceed with this, I would recommend checking a few things:

Ensure the URL you're trying to load is correctly formatted and accessible.
Make sure you're using the latest version of the unstructured package. If the version is less than 0.5.7 for HTML content or less than 0.5.13 for non-HTML content, the headers are ignored, which could potentially lead to issues if the server requires certain headers for the request.
If the server requires certain headers for the request, include them in the unstructured_kwargs when initializing the UnstructuredURLLoader class.

If the issue persists, you might want to catch the exception and print it out to get more details about the error. Here's an example of how you can do this:

for url in urls:
    try:
        # existing code...
    except Exception as e:
        print(e)
        if self.continue_on_failure:
            logger.error(f"Error fetching or processing {url}, exception: {e}")
            continue
        else:
            raise e

This will print out the exception message, which should give you more information about the error. If you could share the exact version of the unstructured package you're using, it would be helpful in further diagnosing the issue.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-12T11:32:47Z

@dosu-bot
The following codes will run without error messages：
from langchain.document_loaders.url import UnstructuredURLLoader

loader = UnstructuredURLLoader(
urls=["https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"],
mode="elements",
strategy="fast",
)
model = loader.load()

However let OOM.

dosubot · 2023-12-12T11:39:00Z

🤖

Hello @dosubot[bot]!

I'm Dosu, a friendly bot here to lend a hand with bugs, answer your queries, and help you become a better contributor to the LangChain project. While we wait for a human maintainer, feel free to poke me with any problems you're encountering. I'm here to make your coding journey smoother. Let's work together on this!

Based on the error messages you're receiving, it seems like the URL you're passing to the UnstructuredURLLoader is being incorrectly parsed as a sequence of characters. This issue has been previously reported in the LangChain repository, specifically in the issue titled WebBaseLoader interprets incorrectly the web_path parameter. The problem was identified as a regression in the latest update to WebBaseLoader, where a string passed to the web_path parameter was incorrectly interpreted and parsed as a sequence.

To resolve this issue, you need to ensure that the URL you're passing to the UnstructuredURLLoader is a properly formatted string. If you're generating these URLs programmatically, consider using a library like urllib to ensure they're correctly formatted. Here's an example:

from urllib.parse import urlunparse

# Define the components of your URL
scheme = "https"
netloc = "huggingface.co"
path = "/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
params = ""
query = ""
fragment = ""

# Use urlunparse to generate a properly formatted URL
url = urlunparse((scheme, netloc, path, params, query, fragment))

# Now you can pass this URL to the UnstructuredURLLoader
loader = UnstructuredURLLoader(url)
model = loader.load()

This should prevent the UnstructuredURLLoader from interpreting the URL as a sequence of characters. If you continue to encounter issues, please provide more details about your environment and the exact steps you're taking that lead to this error.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-12T11:48:52Z

@dosu-bot
Got errors following your instructions：
from urllib.parse import urlunparse
from langchain.document_loaders import UnstructuredURLLoader

Define the components of your URL

scheme = "https"
netloc = "huggingface.co"
path = "/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
params = ""
query = ""
fragment = ""

Use urlunparse to generate a properly formatted URL

url = urlunparse((scheme, netloc, path, params, query, fragment))

Now you can pass this URL to the UnstructuredURLLoader

loader = UnstructuredURLLoader(url)
model = loader.load()

Encounter errors：
ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing B, exception: Invalid URL 'B': No scheme supplied. Perhaps you meant https://b/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing c, exception: Invalid URL 'c': No scheme supplied. Perhaps you meant https://c/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing h, exception: Invalid URL 'h': No scheme supplied. Perhaps you meant https://h/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing t, exception: Invalid URL 't': No scheme supplied. Perhaps you meant https://t/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing G, exception: Invalid URL 'G': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing G, exception: Invalid URL 'G': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing M, exception: Invalid URL 'M': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing L, exception: Invalid URL 'L': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing /, exception: Invalid URL '/': No scheme supplied. Perhaps you meant https:///?
ERROR:langchain_community.document_loaders.url:Error fetching or processing r, exception: Invalid URL 'r': No scheme supplied. Perhaps you meant https://r/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing e, exception: Invalid URL 'e': No scheme supplied. Perhaps you meant https://e/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing s, exception: Invalid URL 's': No scheme supplied. Perhaps you meant https://s/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing o, exception: Invalid URL 'o': No scheme supplied. Perhaps you meant https://o/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing v, exception: Invalid URL 'v': No scheme supplied. Perhaps you meant https://v/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing e, exception: Invalid URL 'e': No scheme supplied. Perhaps you meant https://e/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing /, exception: Invalid URL '/': No scheme supplied. Perhaps you meant https:///?
ERROR:langchain_community.document_loaders.url:Error fetching or processing m, exception: Invalid URL 'm': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing i, exception: Invalid URL 'i': No scheme supplied. Perhaps you meant https://i/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing n, exception: Invalid URL 'n': No scheme supplied. Perhaps you meant https://n/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing /, exception: Invalid URL '/': No scheme supplied. Perhaps you meant https:///?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing m, exception: Invalid URL 'm': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 2, exception: Invalid URL '2': No scheme supplied. Perhaps you meant https://2?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 1, exception: Invalid URL '1': No scheme supplied. Perhaps you meant https://1?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing b, exception: Invalid URL 'b': No scheme supplied. Perhaps you meant https://b/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing -, exception: Invalid URL '-': No scheme supplied. Perhaps you meant https://-/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing c, exception: Invalid URL 'c': No scheme supplied. Perhaps you meant https://c/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing h, exception: Invalid URL 'h': No scheme supplied. Perhaps you meant https://h/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing a, exception: Invalid URL 'a': No scheme supplied. Perhaps you meant https://a/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing t, exception: Invalid URL 't': No scheme supplied. Perhaps you meant https://t/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing ., exception: Invalid URL '.': No scheme supplied. Perhaps you meant https://.?
ERROR:langchain_community.document_loaders.url:Error fetching or processing g, exception: Invalid URL 'g': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing g, exception: Invalid URL 'g': No scheme supplied. Perhaps you meant https://g/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing m, exception: Invalid URL 'm': No scheme supplied. Perhaps you meant https://m/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing l, exception: Invalid URL 'l': No scheme supplied. Perhaps you meant https://l/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing v, exception: Invalid URL 'v': No scheme supplied. Perhaps you meant https://v/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 3, exception: Invalid URL '3': No scheme supplied. Perhaps you meant https://3?
ERROR:langchain_community.document_loaders.url:Error fetching or processing ., exception: Invalid URL '.': No scheme supplied. Perhaps you meant https://.?
ERROR:langchain_community.document_loaders.url:Error fetching or processing q, exception: Invalid URL 'q': No scheme supplied. Perhaps you meant https://q/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 4, exception: Invalid URL '4': No scheme supplied. Perhaps you meant https://4?
ERROR:langchain_community.document_loaders.url:Error fetching or processing , exception: Invalid URL '': No scheme supplied. Perhaps you meant https://_/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing 0, exception: Invalid URL '0': No scheme supplied. Perhaps you meant https://0?
ERROR:langchain_community.document_loaders.url:Error fetching or processing ., exception: Invalid URL '.': No scheme supplied. Perhaps you meant https://.?
ERROR:langchain_community.document_loaders.url:Error fetching or processing b, exception: Invalid URL 'b': No scheme supplied. Perhaps you meant https://b/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing i, exception: Invalid URL 'i': No scheme supplied. Perhaps you meant https://i/?
ERROR:langchain_community.document_loaders.url:Error fetching or processing n, exception: Invalid URL 'n': No scheme supplied. Perhaps you meant https://n/?

dosubot · 2023-12-12T11:55:07Z

🤖

Based on the information you've provided and the context from the LangChain repository, it seems like the issue you're encountering is related to how the UnstructuredURLLoader class is handling the URL you're passing to it.

The UnstructuredURLLoader class in the LangChain framework doesn't parse the URL itself, but rather passes it directly to the unstructured library's partition or partition_html functions. These functions are responsible for fetching and processing the content at the given URL. If you're receiving errors indicating that the URL is being incorrectly parsed as a sequence of characters, it's likely that the unstructured library's partition or partition_html functions are not able to correctly handle the URL you're providing.

Here is the relevant code from the UnstructuredURLLoader class that shows how it passes the URL to the unstructured library:

for url in urls:
    try:
        if self.__is_non_html_available():
            if self.__is_headers_available_for_non_html():
                elements = partition(
                    url=url, headers=self.headers, **self.unstructured_kwargs
                )
            else:
                elements = partition(url=url, **self.unstructured_kwargs)
        else:
            if self.__is_headers_available_for_html():
                elements = partition_html(
                    url=url, headers=self.headers, **self.unstructured_kwargs
                )
            else:
                elements = partition_html(url=url, **self.unstructured_kwargs)
    except Exception as e:
        if self.continue_on_failure:
            logger.error(f"Error fetching or processing {url}, exception: {e}")
            continue
        else:
            raise e

You can find this code in the libs/langchain/langchain/document_loaders/url.py file in the LangChain repository.

I also found a similar issue in the LangChain repository: WebBaseLoader interprets incorrectly the web_path parameter. This issue was caused by a regression in the latest update to WebBaseLoader, where a string passed to the web_path parameter was incorrectly interpreted and parsed as a sequence. The solution was to fix this condition in the code to correctly handle strings passed to the web_path parameter.

I hope this information helps you understand the issue and how to potentially fix it. If you have any further questions or need more clarification, please let me know.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

elliotthwang · 2023-12-12T12:06:06Z

@sbusso @jarib @deepblue

Could load GGUF models from https ?
Like llamaindex's
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

dosubot bot added the 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder label Dec 11, 2023

elliotthwang closed this as completed Dec 11, 2023

elliotthwang reopened this Dec 12, 2023

dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 12, 2024

dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 19, 2024

dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DOC: Could load GGUF models from https #14514

DOC: Could load GGUF models from https #14514

elliotthwang commented Dec 11, 2023

dosubot bot commented Dec 11, 2023 •

edited

Loading

elliotthwang commented Dec 11, 2023 •

edited

Loading

dosubot bot commented Dec 11, 2023

elliotthwang commented Dec 11, 2023

dosubot bot commented Dec 11, 2023

elliotthwang commented Dec 11, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

elliotthwang commented Dec 12, 2023

DOC: Could load GGUF models from https #14514

DOC: Could load GGUF models from https #14514

Comments

elliotthwang commented Dec 11, 2023

Issue with current documentation:

Idea or request for content:

dosubot bot commented Dec 11, 2023 • edited Loading

elliotthwang commented Dec 11, 2023 • edited Loading

dosubot bot commented Dec 11, 2023

Sources

elliotthwang commented Dec 11, 2023

dosubot bot commented Dec 11, 2023

Sources

elliotthwang commented Dec 11, 2023

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

Sources

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

Sources

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

Sources

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

Sources

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

Sources

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 12, 2023

Sources

elliotthwang commented Dec 12, 2023

Define the components of your URL

Use urlunparse to generate a properly formatted URL

Now you can pass this URL to the UnstructuredURLLoader

dosubot bot commented Dec 12, 2023

Sources

elliotthwang commented Dec 12, 2023

dosubot bot commented Dec 11, 2023 •

edited

Loading

elliotthwang commented Dec 11, 2023 •

edited

Loading