This project is a Python script that uses Selenium to scrape images from websites that load content dynamically with JavaScript. It handles loading more images by scrolling and clicking through multiple pages, and saves high-resolution images locally.
- Web Automation: Utilizes Selenium to automate browser interactions and handle dynamic content loading.
- Image Scraping: Extracts high-resolution images from web pages.
- Pagination Handling: Automatically navigates through multiple pages to scrape more images.
- Image Saving: Saves the scraped images to a local directory with proper naming.
- Selenium
- webdriver-manager
- Requests
- Pillow
- Clone the repository:
    git clone https://github.com/yourusername/selenium-image-scraper.git
    cd selenium-image-scraper
- Install the required packages:
    pip install -r requirements.txt
- Update the script:
  - Set the URL in driver.get("site_link") to the website you want to scrape.
  - Update the XPaths image_row_xpath, image_xpath, and next_page_button_xpath with the correct values from the target website.
- Run the script:
    python scrape_images.py
The script is designed to scrape up to 2000 images from a website by navigating through multiple pages, saving the images to a directory called scripted_images.
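The page-by-page behavior described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: click_next_page is named in this README, but its signature here is an assumption, and scrape_all_pages, collect_page_urls, and MAX_IMAGES are hypothetical names. The driver is assumed to be a Selenium WebDriver.

```python
import time

MAX_IMAGES = 2000  # cap stated in this README; the constant name is hypothetical


def click_next_page(driver, next_page_button_xpath, pause_seconds=2.0):
    """Click the next-page button if present; return False when there is none."""
    # "xpath" is the string value behind Selenium's By.XPATH constant.
    buttons = driver.find_elements("xpath", next_page_button_xpath)
    if not buttons:
        return False
    buttons[0].click()
    time.sleep(pause_seconds)  # give the next page time to render
    return True


def scrape_all_pages(driver, collect_page_urls, next_page_button_xpath,
                     max_images=MAX_IMAGES, pause_seconds=2.0):
    """Gather image URLs page by page until the cap or the last page is reached."""
    urls = []
    seen = set()
    while len(urls) < max_images:
        # collect_page_urls is a caller-supplied function: driver -> list of URLs.
        for url in collect_page_urls(driver):
            if url not in seen and len(urls) < max_images:
                seen.add(url)
                urls.append(url)
        if len(urls) >= max_images:
            break
        if not click_next_page(driver, next_page_button_xpath, pause_seconds):
            break  # no next-page button found, so this was the last page
    return urls
```

The loop stops only when the cap is hit or the button disappears, so a site that keeps its next-page button clickable on the last page would need an extra stop condition.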
- Specify the website: Update the driver.get("site_link") line in the script with the URL of the target website.
- Set the XPaths: Update the XPaths for image rows, images, and the next page button:
    image_row_xpath = "your_image_row_xpath"
    image_xpath = "your_image_xpath"
    next_page_button_xpath = "your_next_page_button_xpath"
- Run the script: Execute the script to start scraping images:
    python scrape_images.py
- Find your images: The images will be saved in a directory named scripted_images with filenames in the format image_1.jpg, image_2.jpg, etc.
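The naming scheme above could be produced by a helper along these lines. It is a sketch under assumptions: save_image and its parameters are hypothetical names, and the bytes are written unchanged, whereas the actual script may decode or convert images with Pillow first.

```python
from pathlib import Path


def save_image(image_bytes, index, output_dir="scripted_images"):
    """Write one image to output_dir as image_<index>.jpg and return its path."""
    folder = Path(output_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the directory on first use
    path = folder / f"image_{index}.jpg"
    path.write_bytes(image_bytes)  # bytes are stored as-is, with no re-encoding
    return path
```

In practice the bytes would likely come from a download step, for example the content of a Requests response for each image URL.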
Here's an example of how to set up and run the script:
- Update the script with the target website and XPaths:
    driver.get("https://example.com")
    image_row_xpath = "//div[@class='image-row']"
    image_xpath = ".//img"
    next_page_button_xpath = "//button[@id='next-page']"
- Run the script:
    python scrape_images.py
- Output: The script will save the images in the scripted_images directory.
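To show how the row and image XPaths in this example fit together, here is a hedged sketch of the lookup they imply. collect_image_urls is a hypothetical helper, not a function confirmed to exist in the script, and the driver is assumed to be a Selenium WebDriver with the page already loaded.

```python
def collect_image_urls(driver, image_row_xpath, image_xpath):
    """Return the unique, non-empty src values of images found inside each row."""
    urls = []
    # "xpath" is the string value behind Selenium's By.XPATH constant.
    for row in driver.find_elements("xpath", image_row_xpath):
        # A leading "." in image_xpath keeps the search relative to this row.
        for img in row.find_elements("xpath", image_xpath):
            src = img.get_attribute("src")
            if src and src not in urls:
                urls.append(src)
    return urls
```

The relative form ".//img" matters here: an XPath starting with "//" would search the whole document from every row and return the same images repeatedly.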
- Element Not Found: Ensure the XPaths are correct and match the structure of the target website.
- Timeouts: Increase the sleep time in the load_more_images and click_next_page functions if the website is slow to load.
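For the timeout advice above, one way to make the wait adjustable is to pass it as a parameter. The sketch below is an assumption about how load_more_images might work, not the script's actual implementation: the pause_seconds and max_scrolls parameters are hypothetical, and it stops scrolling once the page height stops growing.

```python
import time


def load_more_images(driver, pause_seconds=2.0, max_scrolls=20):
    """Scroll to the bottom until the page height stops growing; return scroll count."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # raise this value for slow sites
        scrolls += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded by the last scroll
        last_height = new_height
    return scrolls
```

If the pause is too short, the height check can run before new images arrive and end the loop early, which is why a slow site calls for a larger pause_seconds.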
Feel free to open issues or submit pull requests if you find any bugs or have suggestions for improvements.
This project is licensed under the Apache 2.0 License.
- Selenium for web automation
- WebDriver Manager for managing ChromeDriver
- Pillow for image processing
Happy Scraping! 🚀