HTTP, SOCKS4, and SOCKS5 proxy scraper and checker.
- Asynchronous.
- Uses regex to search for proxies (ip:port format) on a web page, so proxies can be extracted even from JSON without changing the code.
- The URL that requests are sent to when checking a proxy is configurable.
- Can sort proxies by speed.
- Supports determining the geolocation of the proxy exit node.
- Can determine if the proxy is anonymous.
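
The regex-based extraction listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the pattern `PROXY_RE` and the function `extract_proxies` are hypothetical names, and the real regex may differ.

```python
import re
from typing import List

# Hypothetical pattern for illustration; the script's actual regex may differ.
PROXY_RE = re.compile(
    r"(?<![\d.])"                 # not preceded by a digit or a dot
    r"((?:\d{1,3}\.){3}\d{1,3})"  # IPv4 address
    r":(\d{1,5})"                 # port
    r"(?!\d)"                     # not followed by another digit
)


def extract_proxies(text: str) -> List[str]:
    """Return unique ip:port strings found in text, in order of appearance."""
    seen = {}
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(int(octet) <= 255 for octet in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        if octets_ok and port_ok:
            seen.setdefault(f"{ip}:{port}", None)  # dict keeps insertion order
    return list(seen)
```

Because the pattern is applied to the raw response text, the same function works on HTML, plain text, or JSON without a format-specific parser.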
You can get proxies obtained using this script in monosans/proxy-list.
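
Checking proxies asynchronously and sorting them by speed, as described in the feature list, might look like the sketch below. It assumes a caller-supplied coroutine `check` that performs the actual request through the proxy to the configured URL; `check`, `timed_check`, and `sort_by_speed` are hypothetical names, not part of the script.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable, List, Optional, Tuple

# `Check` stands in for whatever performs the real request through a proxy.
Check = Callable[[str], Awaitable[None]]


async def timed_check(
    proxy: str, check: Check, timeout: float
) -> Tuple[str, Optional[float]]:
    """Return (proxy, elapsed seconds), or (proxy, None) if the check failed."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(check(proxy), timeout)
    except (asyncio.TimeoutError, OSError):
        # Timeouts and connection errors both mean the proxy is unusable.
        return proxy, None
    return proxy, time.perf_counter() - start


async def sort_by_speed(
    proxies: Iterable[str], check: Check, timeout: float = 5.0
) -> List[str]:
    """Check all proxies concurrently and return working ones, fastest first."""
    results = await asyncio.gather(
        *(timed_check(proxy, check, timeout) for proxy in proxies)
    )
    working = [(proxy, elapsed) for proxy, elapsed in results if elapsed is not None]
    return [proxy for proxy, _ in sorted(working, key=lambda item: item[1])]
```

All checks run concurrently, so the total run time is bounded by the slowest check or the timeout rather than by the sum of all checks.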
- Download and unpack the archive with the program.
- Edit `config.ini` according to your preference.
- Install Python (minimum supported version is 3.7). During installation, be sure to check the box `Add Python to PATH`.
- Install dependencies and run the script. There are 2 ways to do this:
  - Automatic:
    - On Windows run `start.cmd`
    - On Unix-like OS run `start.sh`
  - Manual:
    - `cd` into the unpacked folder
    - Install dependencies with the command `python -m pip install -U --no-cache-dir --disable-pip-version-check pip setuptools wheel; python -m pip install -U --no-cache-dir --disable-pip-version-check -r requirements.txt`
    - Run with the command `python -m proxy_scraper_checker`
When the script finishes running, the following folders will be created (this behavior can be changed in the config):
- `proxies` - proxies with any anonymity level.
- `proxies_anonymous` - anonymous proxies.
- `proxies_geolocation` - same as `proxies`, but includes exit-node's geolocation.
- `proxies_geolocation_anonymous` - same as `proxies_anonymous`, but includes exit-node's geolocation.

Geolocation format is `ip:port|Country|Region|City`.
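
A line in the geolocation format can be read back with a few lines of Python. The sketch below is only an illustration for consumers of the output files: `GeoProxy` and `parse_geo_line` are hypothetical names, and it assumes that none of the fields contains a `|` character.

```python
from typing import NamedTuple


class GeoProxy(NamedTuple):
    host: str
    port: int
    country: str
    region: str
    city: str


def parse_geo_line(line: str) -> GeoProxy:
    """Parse one `ip:port|Country|Region|City` line.

    Raises ValueError if the line does not have exactly four `|`-separated
    fields or if the port is not an integer.
    """
    address, country, region, city = line.strip().split("|")
    host, _, port = address.rpartition(":")  # split on the last colon
    return GeoProxy(host, int(port), country, region, city)
```

For example, `parse_geo_line("1.2.3.4:8080|Germany|Hesse|Frankfurt am Main")` yields a `GeoProxy` whose `port` is the integer `8080` and whose `city` is `"Frankfurt am Main"`.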