CAPTCHA22 is a toolset for building and training CAPTCHA-cracking models using neural networks. These models can then be used to crack CAPTCHAs with a high degree of accuracy. When used in conjunction with other scripts, CAPTCHA22 enables attack automation, subverting the very control that aims to stop it.

Installation

CAPTCHA22 requires tensorflow (see prerequisites). You can then install CAPTCHA22 using pip:

pip install captcha22

Prerequisites

CAPTCHA22 performs best on a GPU-enabled tensorflow build, although setting one up requires numerous steps (as discussed here). AOCR currently requires a tensorflow version below 2, which in turn requires Python 3.7 or earlier. AOCR will be ported to TF2 in the future.

  • To install a less optimal, CPU-based tensorflow build, you can simply issue the following command:

    pip install "tensorflow<2"
    
  • The tensorflow serving addon is required to host trained CAPTCHA models.
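
Before installing, it can help to confirm that the environment matches the constraints above. The following is a minimal sketch, not part of CAPTCHA22: the function names are hypothetical, and it reads "Python 3.7 or less" and "TF less than version 2" literally.

```python
import sys


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter version is 3.7 or lower."""
    return tuple(version_info[:2]) <= (3, 7)


def tensorflow_supported(version_string):
    """Return True if a version string such as '1.15.0' has a major version below 2."""
    return int(version_string.split(".")[0]) < 2


if __name__ == "__main__":
    print("Python OK:", python_supported())
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed")
    else:
        print("tensorflow OK:", tensorflow_supported(tf.__version__))
```

Running the script directly prints one line per check; a False value means that component would need to be downgraded before AOCR can be used.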

Usage: How to crack CAPTCHAs

CAPTCHA22 works by training a neural network against a sample of labelled CAPTCHAs (using a sliding CNN with a LSTM module). Once this model is suitably accurate, it can be applied to unknown CAPTCHAs - automating the CAPTCHA cracking process.

This process is broken down into 3 steps:

Step 1: Creating training sample data (labelling CAPTCHAs)

The first step in this whole process is to create a sample of correctly labelled CAPTCHAs. Ideally, you’ll want to aim for at least 200.

1. Collecting CAPTCHAs

Unfortunately, there is no one-size-fits-all solution for collecting CAPTCHA samples, and you’ll have to be innovative with your approach. In our experience, we’ve had little difficulty automating this process using wget or the python requests library. How you approach this is up to you, but a good starting point is to work out how the target application generates and serves its CAPTCHAs.
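
As a rough illustration of automated collection, the sketch below repeatedly requests a CAPTCHA image and saves each distinct one to disk. It is not part of CAPTCHA22 and rests on several assumptions: the target serves a fresh image on every request to a single URL, no session handling is needed, and the function names and the example URL are hypothetical. It uses the standard library's urllib so that it has no dependencies; the requests library would work equally well.

```python
import hashlib
import pathlib
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download one resource and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def collect_captchas(url, out_dir, count=200, fetch=fetch_bytes, image_type="png"):
    """Request `url` `count` times and save each distinct image into `out_dir`."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    seen = set()
    saved = []
    for _ in range(count):
        data = fetch(url)
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # the same image was served again; skip the duplicate
        seen.add(digest)
        path = out / f"{digest[:16]}.{image_type}"
        path.write_bytes(data)
        saved.append(path)
    return saved


if __name__ == "__main__":
    # Hypothetical endpoint; replace with wherever the target serves its CAPTCHAs.
    files = collect_captchas("http://target.example/captcha.png", "./captchas/")
    print(f"Saved {len(files)} distinct CAPTCHAs")
```

Naming files by content hash keeps duplicates out of the sample set, which matters because every saved image still has to be labelled by hand.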

2. Labelling

Sadly, labelling is manual. This is the most laborious and time-consuming step in the whole process; fortunately, things only get better from here. To make things a little easier, we’ve included functionality to help with labelling:

captcha22 client label --input=<stored captcha folder>

Once complete, CAPTCHA22 will produce a ZIP file (e.g. <api_username>_<test_name>_<version_number>.zip) that you can upload (discussed in step 2).

Step 2: Training a CAPTCHA model

Once you have a sample set of labelled CAPTCHAs, the next step is to begin training the CAPTCHA model.

1. Launch the Server (and API)

To do this, you first need to launch CAPTCHA22’s server engine, which will poll the ./Unsorted/ directory for new ZIPs:

captcha22 server engine

Enable the API for interfacing with the CAPTCHA22 engine (if you’re an advanced user, feel free to skip this step):

captcha22 server api

The default API credentials are admin:admin. You can modify the users.txt file to change these credentials or to add additional users. See the code snippet below for guidance:

python -c "from werkzeug.security import generate_password_hash;print('username_string' + ',' + generate_password_hash('password_string'))"

2. Upload CAPTCHA training samples

To upload training samples, simply drop the ZIP file you created in Step 1 into ./Unsorted/. The zip file name should be <captcha_name>_<captcha_version>.zip. Alternatively, if you opted to enable the API, you can perform this step interactively using the client:

captcha22 client api

In both cases, CAPTCHA22 will automatically begin training a model.
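
If you prefer to script the manual upload, the drop into ./Unsorted/ can be sketched as below. This helper is hypothetical and not part of CAPTCHA22. It assumes the engine only needs the file to appear in the directory, and it interprets the naming rule loosely as "something, an underscore, something, then .zip".

```python
import pathlib
import re
import shutil

# Loose, assumed reading of "<captcha_name>_<captcha_version>.zip".
ZIP_NAME = re.compile(r"^.+_.+\.zip$")


def submit_for_training(zip_path, unsorted_dir="./Unsorted/"):
    """Copy a labelled-sample ZIP into the directory polled by the server engine."""
    source = pathlib.Path(zip_path)
    if not ZIP_NAME.match(source.name):
        raise ValueError(f"unexpected ZIP name: {source.name}")
    target_dir = pathlib.Path(unsorted_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the destination path; wrap it so callers always get a Path.
    return pathlib.Path(shutil.copy2(source, target_dir / source.name))
```

Checking the name before copying gives an early error instead of a ZIP that sits in the directory unprocessed.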

3. Deploy the trained model

Once a model is trained and sufficiently accurate, it can be deployed for automated cracking. The model can either be deployed on the CAPTCHA22 server or downloaded. Both methods can be performed using the interactive API client.

To host the model, extract the ZIP and execute:

tensorflow_model_server --port=9000 --rest_api_port=9001 --model_name=<yourmodelname> --model_base_path=<full path to exported model directory>

The interactive API client can also be used to upload a CAPTCHA to CAPTCHA22 to be solved by the hosted model.

The following cURL request will verify whether the model is working:

curl -X POST \
    http://localhost:9001/v1/models/<yourmodelname>:predict \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/json' \
    -d '{
            "signature_name": "serving_default",
            "inputs": 
            {
                "input": { "b64": "/9j/4AAQ==" }
            }
        }'
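
The same request can be made from Python. The sketch below mirrors the cURL call above using only the standard library; the function names are hypothetical, and the host and port are taken from the tensorflow_model_server command shown earlier. The structure of the JSON response depends on the exported model, so it is returned unparsed beyond JSON decoding.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes, signature_name="serving_default"):
    """Build the JSON body used in the cURL example for one raw image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "signature_name": signature_name,
        "inputs": {"input": {"b64": encoded}},
    }


def predict(image_bytes, model_name, host="http://localhost:9001", timeout=10):
    """POST one image to the hosted model and return the decoded JSON response."""
    url = f"{host}/v1/models/{model_name}:predict"
    body = json.dumps(build_predict_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, with a model hosted under the name used in the serving command, calling predict(open("captcha.png", "rb").read(), "<yourmodelname>") sends a single image (the file name here is only a placeholder).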

Step 3: CAPTCHA Cracking

Once a model is hosted, you’ll be able to pass CAPTCHAs to the model and receive an answer (i.e. automation). You can use the template code below to use CAPTCHA22 in conjunction with your own custom code to execute a variety of automated attacks (e.g. username enumeration, brute-force password guessing, password spraying, etc.).

from captcha22 import Cracker

# Create cracker instance, all arguments are optional
solver = Cracker(
    #  server_url="http://127.0.0.1",
    #  server_path="/captcha22/api/v1.0/",
    #  server_port="5000",
    #  username=None,
    #  password=None,
    #  session_time=1800,
    #  use_hashes=False,
    #  use_filter=False,
    #  use_local=False,
    #  input_dir="./input/",
    #  output="./output/",
    #  image_type="png",
    #  filter_low=130,
    #  filter_high=142,
    #  captcha_id=None
    )

# Retrieve captcha from website
...
# Create b64 image string
...

# Solve with CAPTCHA22
answer = solver.solve_captcha_b64(b64_image_string)

# Submit answer to website and launch attack
...
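
The "Create b64 image string" step in the template can be sketched as follows for CAPTCHAs that have already been saved to disk. The helper is hypothetical and not part of CAPTCHA22; its default directory and image type simply echo the input_dir and image_type defaults in the template, and it assumes solve_captcha_b64 accepts the base64 data as a text string.

```python
import base64
import pathlib


def iter_b64_images(input_dir="./input/", image_type="png"):
    """Yield (file name, base64 string) for each matching image, sorted by name."""
    for path in sorted(pathlib.Path(input_dir).glob(f"*.{image_type}")):
        yield path.name, base64.b64encode(path.read_bytes()).decode("ascii")
```

Each yielded string can then be passed to solver.solve_captcha_b64 from the template, for instance inside a loop such as "for name, b64_image_string in iter_b64_images():".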

As the model exposes a JSON API, you’re not restricted to Python if you prefer to use tools such as cURL, wget, or anything else.

Two example cracker scripts are also provided (baseline and pyppeteer). Both of these scripts are experimental and will not cater for most cases.

  • The baseline script will create a connection to the CAPTCHA22 server, or a locally hosted model, before requesting the file path to a CAPTCHA.
  • The pyppeteer script will use the baseline script and simulate browser requests to find and solve the CAPTCHA, before running a login attack.

To execute one of these scripts:

captcha22 client cracking --script=<script name>

Troubleshooting

CAPTCHA22 was tested on two GPU-enabled Tensorflow rigs with the following specifications:

Component       Rig 1              Rig 2
Graphics Card   GeForce GTX 1650   GeForce GTX 960
OS              Ubuntu 16.06       Ubuntu 16.04
Cuda Lib        Cuda 10.0.130      Cuda 9.1.1
cuDNN Lib       cuDNN 10.0         cuDNN 7.0
Tensorflow      Tensorflow 1.10.1  Tensorflow 1.4.1

For assistance on any issues in CAPTCHA22 itself, please log an issue.

Contributing

See CONTRIBUTING.md for more information.