feat: Add BSHTMLLoader support and enhance error handling for document loading #1166

LavX · 2025-02-18T13:13:52Z

This PR enhances the DocumentLoader class by adding support for HTML documents and improving error handling. The changes include:

Added BSHTMLLoader support:

Imported BSHTMLLoader from langchain_community.document_loaders
Added handlers for both .html and .htm file extensions
BSHTMLLoader parses HTML with BeautifulSoup, which handles markup better than basic text loading

Improved error handling:

Added a dedicated try/except block around document loading operations
Error messages now distinguish HTML-specific failures from general document loading failures
Failures now report more useful debugging information

These changes make the document loader more robust and let it handle HTML documents directly.

Technical Details:

File modified: gpt_researcher/document/document.py
Added BSHTMLLoader to the imported loaders
Updated loader_dict to include HTML file extensions
Implemented specific error handling for HTML document loading failures
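
The dispatch-and-report pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code in gpt_researcher/document/document.py: StubHTMLLoader and StubTextLoader are hypothetical stand-ins that only mimic the constructor-plus-load() shape of the LangChain loaders (so the sketch runs with the standard library alone), and load_document and LOADER_DICT are assumed names. In the real change, the "html" and "htm" entries would point at BSHTMLLoader.

```python
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects non-empty text nodes and ignores tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


class StubHTMLLoader:
    """Hypothetical stand-in for BSHTMLLoader (assumption, not the real class)."""

    def __init__(self, file_path):
        self.file_path = file_path

    def load(self):
        parser = _TextExtractor()
        with open(self.file_path, encoding="utf-8") as f:
            parser.feed(f.read())
        parser.close()
        return [{"page_content": " ".join(parser.parts), "source": self.file_path}]


class StubTextLoader:
    """Hypothetical stand-in for a plain-text loader."""

    def __init__(self, file_path):
        self.file_path = file_path

    def load(self):
        with open(self.file_path, encoding="utf-8") as f:
            return [{"page_content": f.read(), "source": self.file_path}]


# Extension-to-loader mapping; both HTML extensions share one loader.
LOADER_DICT = {
    "txt": StubTextLoader,
    "html": StubHTMLLoader,
    "htm": StubHTMLLoader,
}


def load_document(file_path, loader_dict=LOADER_DICT):
    """Pick a loader by file extension and wrap failures in a descriptive error."""
    ext = os.path.splitext(file_path)[1].lower().lstrip(".")
    loader_cls = loader_dict.get(ext)
    if loader_cls is None:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    try:
        return loader_cls(file_path).load()
    except Exception as exc:
        # Name HTML failures explicitly so they stand out from other load errors.
        kind = "HTML document" if ext in ("html", "htm") else "document"
        raise RuntimeError(f"Failed to load {kind} {file_path}: {exc}") from exc
```

Chaining the original exception with `from exc` keeps the underlying traceback available, which is one way to get the extra debugging information the description mentions. Whether the actual implementation re-raises or only logs the failure is not stated here.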

LavX and others added 2 commits February 18, 2025 14:07

Add BSHTMLLoader support and improve error handling in DocumentLoader

24b6926

Merge branch 'assafelovic:master' into master

65b9799

ElishaKay approved these changes Feb 19, 2025

ElishaKay merged commit 9a06536 into assafelovic:master Feb 19, 2025
