Does HTML Doc Loader accept .htm sites? #14545
Comments
🤖 Hello @jimmy2933! 👋 I'm Dosu, a friendly bot who's here to assist while you're waiting for a human maintainer. I can help you solve bugs, answer questions, and guide you to become a contributor. Remember, no issue is too big or small for us to tackle together! 😄

Based on the information you've provided, it seems like you're encountering an issue with the HTML document loader. The loader's _get_elements method is defined as follows:

    def _get_elements(self) -> List:
        from unstructured.partition.html import partition_html

        return partition_html(filename=self.file_path, **self.unstructured_kwargs)

(Source: libs/langchain/langchain/document_loaders/html.py)

This method calls partition_html and passes the loader's file_path as the filename argument. However, without more information, it's hard to say for certain what the exact issue is. Could you please provide more details about the error message you're receiving? Specifically, it would help to know whether the error message says where in the code the error occurs. That could help pinpoint the exact cause of the issue.
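Because _get_elements hands file_path to partition_html as filename, the loader appears to expect a path on the local disk, which would explain the FileNotFoundError reported in this issue when a URL is passed instead. The following is a minimal sketch of one possible workaround, not the library's own mechanism: download the page to a temporary file first, then point the loader at that file. The helper names is_remote, fetch_to_temp_file, and load_html, as well as the User-Agent value, are hypothetical and introduced only for illustration.

```python
import tempfile
import urllib.request
from urllib.parse import urlparse


def is_remote(path: str) -> bool:
    """Return True when `path` is an http(s) URL rather than a local file path."""
    return urlparse(path).scheme in ("http", "https")


def fetch_to_temp_file(url: str, suffix: str = ".htm") -> str:
    """Download `url` and return the path of a local temporary copy."""
    # Some servers reject requests without a User-Agent; this value is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})
    with urllib.request.urlopen(request) as response:
        data = response.read()
    # delete=False keeps the file on disk so the loader can open it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


def load_html(path_or_url: str):
    """Load HTML from a local path, downloading it first when given a URL."""
    # Imported lazily so the helpers above work without langchain installed.
    from langchain.document_loaders import UnstructuredHTMLLoader

    if is_remote(path_or_url):
        local_path = fetch_to_temp_file(path_or_url)
    else:
        local_path = path_or_url
    return UnstructuredHTMLLoader(local_path).load()
```

With this sketch, load_html("report.htm") would read the local file directly, while an http(s) URL would be downloaded first. The temporary file is not cleaned up automatically, so a longer-lived script would need to delete it once loading finishes.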
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
System Info
Langchain 0.0.331, macOS Monterey, Python 3.10.9
Who can help?
No response
Information
Related Components
Reproduction
from langchain.document_loaders import UnstructuredHTMLLoader
loader = UnstructuredHTMLLoader("https://www.sec.gov/ix?doc=/Archives/edgar/data/40987/000004098720000010/gpc-12312019x10k.htm")
documents = loader.load()
FileNotFoundError: [Errno 2] No such file or directory: 'https://www.sec.gov/ix?doc=/Archives/edgar/data/40987/000004098720000010/gpc-12312019x10k.htm'
Expected behavior
The .htm file loads successfully.