Solve Captchas w/ Machine Learning and Selenium
This project aims to demonstrate how a captcha can be bypassed using a bot and machine learning methods. The captcha being fooled is Really Simple CAPTCHA. As noted on the site and implied by the name, this captcha is really simple and doesn't provide a strong level of security. Even so, this is a popular WordPress plugin with 700,000+ active installations 😨.
This project can be easily cloned and executed locally. Before running the Python script, I recommend visiting the website containing the captcha. This way you can get a feel for how the captcha works and what the bot is trying to accomplish.
This project requires Python (I am using Python 3.8.5) to be installed along with several Python modules. These modules can be found in requirements.txt.
To install the modules, simply execute the following command in the base directory.
pip3 install -r requirements.txt
In addition, Google Chrome and ChromeDriver will need to be installed. The instructions provided here apply to Debian-based systems.
# Installing Chrome
# First, install Chrome dependencies
sudo apt-get update
sudo apt-get install -y curl unzip xvfb libxi6 libgconf-2-4
# Next, install Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
# Validate installation
google-chrome --version
# Installing ChromeDriver
# Note, use the ChromeDriver url that matches your Chrome version
wget https://chromedriver.storage.googleapis.com/86.0.4240.22/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver
# Validate installation
chromedriver --version
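As a quick sanity check on the note about matching versions, the output of the two `--version` commands can be compared programmatically. The sketch below is only an illustration and not part of this project: the function names `major_version` and `versions_match` are hypothetical, the sample strings are made-up examples of the usual output format, and it assumes that comparing the leading (major) version number is sufficient.

```python
import re


def major_version(version_output: str) -> int:
    """Return the major number of the first dotted version found in the text."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


if __name__ == "__main__":
    # Illustrative strings; paste the real output of
    # `google-chrome --version` and `chromedriver --version` instead.
    chrome = "Google Chrome 86.0.4240.75"
    driver = "ChromeDriver 86.0.4240.22"
    print("versions match:", versions_match(chrome, driver))
```

If the check reports a mismatch, download the ChromeDriver release that corresponds to your installed Chrome version and repeat the installation steps.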
If you are using WSL2, you must also install an X server. First, download and install VcXsrv. Once installed, run xlaunch.exe. You can use most of the default settings, but make sure to check "Disable access control". Finally, set the DISPLAY environment variable to the correct IP address.
export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2; exit;}'):0.0
To prevent having to export DISPLAY for each session, add this line to your .bashrc file.
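For readers who prefer to see what the shell one-liner does, here is a minimal Python sketch of the same idea: take the first nameserver entry from the contents of /etc/resolv.conf and append `:0.0`. The function name `display_from_resolv_conf` is hypothetical and the sample address is invented. Unlike the `grep` version, this sketch only considers lines whose first word is `nameserver`, so commented-out lines are ignored.

```python
def display_from_resolv_conf(text: str) -> str:
    """Build a DISPLAY value from the first nameserver entry in resolv.conf text."""
    for line in text.splitlines():
        parts = line.split()
        # A nameserver line looks like: "nameserver <address>"
        if len(parts) >= 2 and parts[0] == "nameserver":
            return f"{parts[1]}:0.0"
    raise ValueError("no nameserver entry found")


if __name__ == "__main__":
    # Invented sample content; on a real system, read /etc/resolv.conf instead.
    sample = "# generated file\nnameserver 172.22.0.1\n"
    print(display_from_resolv_conf(sample))
```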
- Clone the repo
git clone https://github.com/reevesba/captcha-bot
cd captcha-bot
- Install Python packages
pip3 install -r requirements.txt
- Extract the images in dat/captchas.zip
Executing this bot is very simple. There are three primary modules included. The first preprocesses images of captchas by extracting each character from the captcha and saving them as new images. The second module creates a convolutional neural network with Keras and trains the model. The final module executes the bot. The bot will open the webpage, enter the captcha guess, and submit.
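To make the preprocessing idea concrete, the sketch below shows one simple way to extract individual characters from a captcha: treat the image as a grid of 0 (background) and 1 (ink) pixels and cut it wherever a column contains no ink. This is an assumption-laden illustration, not the code in this repository. The project's own preprocessing may work differently (for example, with an image processing library), and the names `column_spans` and `split_characters` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels: 0 = background, 1 = ink


def column_spans(image: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(image[0]) if image else 0
    has_ink = [any(row[x] for row in image) for x in range(width)]
    spans = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a new character begins
        elif not ink and start is not None:
            spans.append((start, x))  # an empty column ends the character
            start = None
    if start is not None:
        spans.append((start, width))  # character touches the right edge
    return spans


def split_characters(image: Image) -> List[Image]:
    """Crop the image into one sub-image per run of inked columns."""
    return [[row[a:b] for row in image] for a, b in column_spans(image)]


if __name__ == "__main__":
    tiny = [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 1],
    ]
    for character in split_characters(tiny):
        print(character)
```

Splitting on empty columns only works when characters do not touch or overlap. Each cropped character would then be saved as its own image and used as a training sample for the network.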
The first step is to execute the driver program.
cd src
python3 main.py
You will be met with an interactive command line interface.
If you have already preprocessed the captcha images, you can skip the preprocessing step by entering 'n'. Similarly, if you have already trained the network and saved the weights, you can skip the training step by entering 'n'.
Once the first two steps are complete, the bot will attempt to break the captcha.
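The skip logic described above can be sketched in a few lines. This is a hypothetical illustration of how answers to the two prompts could map to the steps that run, not the actual contents of main.py. The names `parse_yes_no` and `plan_steps`, the accepted spellings of the answers, and the default for an empty answer are all assumptions.

```python
from typing import List


def parse_yes_no(answer: str, default: bool = True) -> bool:
    """Interpret a y/n answer; an empty answer falls back to the default."""
    cleaned = answer.strip().lower()
    if not cleaned:
        return default
    if cleaned in ("y", "yes"):
        return True
    if cleaned in ("n", "no"):
        return False
    raise ValueError(f"expected 'y' or 'n', got {answer!r}")


def plan_steps(preprocess_answer: str, train_answer: str) -> List[str]:
    """List the steps to run, in order, given the two answers."""
    steps = []
    if parse_yes_no(preprocess_answer):
        steps.append("preprocess")
    if parse_yes_no(train_answer):
        steps.append("train")
    steps.append("run bot")  # the bot always runs last
    return steps


if __name__ == "__main__":
    # Both earlier steps already done, so only the bot runs.
    print(plan_steps("n", "n"))
```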
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the GNU GPLv3 License. See LICENSE for more information.
Bradley Reeves - [email protected]
Project Link: https://github.com/reevesba/captcha-bot