A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.
- Install with pip
pip install souperscraper
- Download the appropriate ChromeDriver for your Chrome version using getchromedriver.py (command below) or manually from the ChromeDriver website.
To find your Chrome version, go to chrome://settings/help in your browser.
getchromedriver
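If you download ChromeDriver manually, its major version generally needs to match your browser's major version. The following is a minimal sketch of that comparison, not part of souperscraper: the helper names `major_version` and `versions_match` are hypothetical, and the version strings in the usage lines are made-up samples in the four-part format Chrome reports.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text containing a four-part version number."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"No version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_text: str, driver_text: str) -> bool:
    """Return True when both strings report the same major version."""
    return major_version(chrome_text) == major_version(driver_text)


# Sample strings for illustration only; substitute the values from your machine.
chrome = "Google Chrome 120.0.6099.109"
driver = "ChromeDriver 120.0.6099.71"
print(versions_match(chrome, driver))
```

Only the major version is compared here, on the assumption that the minor and build numbers are allowed to differ.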
- Create a new SouperScraper object using the path to your ChromeDriver
from souperscraper import SouperScraper
scraper = SouperScraper('/path/to/your/chromedriver')
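A wrong or non-executable ChromeDriver path is a common source of startup errors, so it can help to check the path before passing it to SouperScraper. This is a minimal sketch using only the standard library; `resolve_chromedriver` is a hypothetical helper, not part of souperscraper, and it assumes the binary is named `chromedriver` when searching PATH.

```python
import os
import shutil
from typing import Optional


def resolve_chromedriver(path: Optional[str] = None) -> str:
    """Return an absolute path to an executable ChromeDriver, or raise."""
    # Fall back to searching PATH when no explicit path is given.
    candidate = path or shutil.which("chromedriver")
    if candidate is None:
        raise FileNotFoundError("chromedriver was not found on PATH")
    candidate = os.path.abspath(os.path.expanduser(candidate))
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"No file at {candidate}")
    if not os.access(candidate, os.X_OK):
        raise PermissionError(f"{candidate} is not executable")
    return candidate
```

The returned string can then be passed as the path argument, for example `SouperScraper(resolve_chromedriver('/path/to/your/chromedriver'))`.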
- Start scraping using BeautifulSoup and/or Selenium methods
scraper.goto('https://github.com/LucasFaudman')
# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)
# Use Selenium to interact with the page such as clicking buttons
# or filling out forms by accessing the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()
search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()