feat: enhance scraper logging and title handling #1118

Merged 1 commit into assafelovic:master on Feb 8, 2025

Conversation

kga245
Contributor

@kga245 kga245 commented Feb 7, 2025

Title

Enhance Scraper Logging and Title Handling

Description

This PR improves the scraping functionality in two ways:

  1. Adds detailed logging for scraper operations to help with debugging and monitoring
  2. Fixes inconsistent title handling in the scraper return values

Changes

  • Added structured logging in the extract_data_from_url method, including:
    • Scraper type being used
    • Page title
    • Content length
    • Number of images found
    • URL being processed
  • Modified error cases to return the actual scraped title instead of an empty string
  • Added a warning log for short or empty content
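
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `build_scrape_result`, the logger name, the shape of the returned dict, and the 100-character threshold are all assumptions made for the example.

```python
import logging

# Hypothetical logger name; the PR only says it reuses the existing logger.
logger = logging.getLogger("scraper_sketch")

# Assumed cutoff for "too short" content; the PR does not state a value.
MIN_CONTENT_LENGTH = 100


def build_scrape_result(url, scraper_name, title, content, image_urls):
    """Log a summary of one scrape and build the return value.

    The title is kept in every branch, including the short-content case.
    """
    if len(content) < MIN_CONTENT_LENGTH:
        # Warn, but still return the scraped title rather than "".
        logger.warning("Content too short or empty for %s", url)
        return {"url": url, "raw_content": None, "image_urls": [], "title": title}

    # One log line per field named in the change list.
    logger.info("URL: %s", url)
    logger.info("Scraper: %s", scraper_name)
    logger.info("Title: %s", title)
    logger.info("Content length: %d", len(content))
    logger.info("Number of images: %d", len(image_urls))

    return {
        "url": url,
        "raw_content": content,
        "image_urls": image_urls,
        "title": title,
    }
```

Using lazy `%s` formatting keeps the log calls cheap when the INFO level is disabled, and returning the same dict keys in both branches means callers do not need a special case for failed scrapes.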

Testing

The changes can be tested by:

  1. Running a research task with various URLs
  2. Checking the logs in:
    • Console output
    • logs/app.log
    • Task JSON output
    • Web UI agent output
  3. Verifying that titles are preserved even when content is too short
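
Step 3 could also be checked automatically. The sketch below captures log records in memory and asserts that a short page still yields its title plus a warning. `fake_scrape` is a hypothetical stand-in for the scraper's short-content branch, and the 100-character threshold is an assumption; neither is taken from the project.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so a check can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def fake_scrape(url, title, content, logger):
    # Hypothetical stand-in for the scraper; the threshold is assumed.
    if len(content) < 100:
        logger.warning("Content too short or empty for %s", url)
        return {"url": url, "raw_content": None, "image_urls": [], "title": title}
    return {"url": url, "raw_content": content, "image_urls": [], "title": title}


# Dedicated logger so the check does not depend on global logging config.
logger = logging.getLogger("scraper_test_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

result = fake_scrape("https://example.com", "Example Domain", "tiny", logger)
warnings = [r for r in handler.records if r.levelno == logging.WARNING]

# The title survives even though the content was rejected as too short.
assert result["title"] == "Example Domain"
assert len(warnings) == 1
```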

Related Issues

  • Improves debugging capability to help close issue #578
  • Enhances logging visibility for scraper operations

Additional Notes

  • No new dependencies were added
  • All changes are backwards compatible
  • Logging uses existing logger infrastructure

- Add detailed logging for scraper operations
- Fix inconsistent title handling in return values
- Improve error case logging with warnings
- Preserve titles even when content is too short
Owner

@assafelovic assafelovic left a comment


This is awesome, thank you @kga245

@assafelovic assafelovic merged commit 1106233 into assafelovic:master Feb 8, 2025
2 participants