Hack This Site - Programming - Bypass the image captcha

Welcome back to my walkthrough of hackthissite.org’s CTF missions. I will be going through my thought process of how I solved these missions, and therefore also giving away the solutions. If you came across this to give you hints, watch out for spoilers! Good luck, have fun.

This mission has given me many issues throughout my completion. In the end, I was still not able to get a successful submission rate anywhere near where I would have liked it to be, but I did not want to spend anymore time on it. I had successfully ‘completed’ the mission after 28 submitted entries, with an overall 6.9% success rate within those 28 submissions. The issues that caused this low success rate were that a single 0 would be interpreted as a D or a single B would get interpreted as a 8. This success rate could have been increased if I had more training data for these particular characters. The following explains my process of completing this challenge.

I had decided to use a couple of libraries to help me with this mission. Specifically the following python modules: PIL, math, and pytesseract - with Tesseract as a dependency. Furthermore, I had to train the eng.traineddata file to create my own language file called hts-p6-eng.traineddata.

The following code is what I had come up with to solve this challenge. Originally it had a success rate of 2% for a given set of 15 characters. Once training with 506 annotated character images, it had evolved to a 6.9% success rate amongst a given set of 253 characters.

from PIL import Image, ImageOps, ImageEnhance, ImageFilter
from math import ceil, floor
from pytesseract import image_to_string as its

BOUNDING_BOX= (13, 16)
Y_DIFF = [20, 23, 28, 34, 40, 49, 57]
BLACK = (0,0,0)
FILENAME = 'a.png'

def printCharAtBoundingBox(pxIm, x, y):
    """Prints char to console @ starting (x,y) coordinates.
    Ending inclusively at (x+12, y+14)

    x -- x coordinate
    y -- y coordinate
    """
    print(f'printing ({x}, {y})')
    for ty in range(y, y+BOUNDING_BOX[1]):
        for tx in range(x, x+BOUNDING_BOX[0]):
            if pxIm[tx,ty][:3] == (0,0,0):
                print("-", end="")
            else:
                print("G", end="")
        print()
    print()

def findUpperGreenPixel(pxIm):
    """Finds the first green pixel within the evaluation zone
    of the defined pxIm.

    pxIm -- PIL.Image.load() representation of an image
    Return: (x, y) coordinates of first green pixel from top-down
    """
    # we know:
    #   1. image is 1870 x 1050
    #   2. analyze part starts @ 1 x 138
    #   3. origin is @ top left corner

    for y in range(140, 525):
        for x in range(1, 935):
            if (pxIm[x, y][:3] != BLACK):
                return (x, y)
    return None

def validateBoundingBox(pxIm, x, y):
    """Checks the current bounding box @ (x, y)

    pxIm -- PIL.Image.load() representation of an image
    x -- x coordinate
    y -- y coordinate
    """
    # a valid bounding box is defined as having a full black box border around
    # it, and green pixels within it

    # check top & bottom horizontal outer limits
    for ty in [y - 1, y + BOUNDING_BOX[1] + 1]:
        black = True
        for tx in range(x, x + BOUNDING_BOX[0]):
            black &= (pxIm[tx, ty][:3] == BLACK)
        if not black:
            return {"res": False, "reason": "failed testing horizontal"}
    
    # check left & right vertical outer limits
    for tx in [x - 1, x + BOUNDING_BOX[0] + 1]:
        black = True
        for ty in range(y, y + BOUNDING_BOX[1]):
            black &= (pxIm[tx, ty][:3] == BLACK)
        if not black:
            return {"res": False, "reason": "failed testing vertical"}
    
    # find a single non-black pixel within the bounding box limits
    for ty in range(y, y + BOUNDING_BOX[1]):
        for tx in range(x, x + BOUNDING_BOX[0]):
            if (pxIm[tx, ty][:3] != BLACK):
                return {"res": True}
    
    # no non-black pixel found in bounding box, if we made it here, something
    # is wrong
    return {"res": False, "reason": "failed testing contents"}

def fitToBoundingBox(pxIm, x, y):
    """Given x, y coordinates, tries to fit the char into a bounding box.

    pxIm -- PIL.Image.load() representation of an image
    x -- x coordinate
    y -- y coordinate
    """
    # check current box
    currBoxStatus = validateBoundingBox(pxIm, x, y)
    if not currBoxStatus["res"]:
        # try shifting box left (@ most shift one bounding box width)
        for tx in range(x, x - BOUNDING_BOX[0], -1):
            if validateBoundingBox(pxIm, tx, y)["res"]:
                return (tx, y)
        # try shifting box right (@ most shift one bounding box width)
        for tx in range(x, x + BOUNDING_BOX[0]):
            if validateBoundingBox(pxIm, tx, y)["res"]:
                return (tx, y)
        # try shifting box up (@ most shift one boudning box height)
        for ty in range(y, y + BOUNDING_BOX[1]):
            if validateBoundingBox(pxIm, x, ty)["res"]:
                return (x, ty)
        # try shifting box down (@ most shift one bounding box height)
        for ty in range(y, y - BOUNDING_BOX[1], -1):
            if validateBoundingBox(pxIm, x, ty)["res"]:
                return (x, ty)
        # valid bounding box not found
        return None
    else:
        return (x, y)

def findBestBoundingBox(pxIm, x, y, depth=6):
    """Recursively finds a bounding box that fits all 7 chars in the center
    vertical. Returns None if it doesn't succeed, and (x, y) tuple otherwise.

    pxIm -- PIL.Image.load() representation of an image
    x -- int x coordinate
    y -- int y coordinate
    depth -- int depth level, default = 6
    """
    # top of call stack
    if (depth == 6):
        # test all fitting bounding boxes
        newDepth = depth - 1
        for tx in range(x, x + BOUNDING_BOX[0]):
            for ty in range(y, y + BOUNDING_BOX[1]):
                if validateBoundingBox(pxIm, tx, ty)["res"]:
                    if findBestBoundingBox(pxIm, tx, ty + Y_DIFF[depth], 
                      newDepth):
                        return (tx, ty)
            for ty in range(y, y - BOUNDING_BOX[1], -1):
                if validateBoundingBox(pxIm, tx, ty)["res"]:
                    if findBestBoundingBox(pxIm, tx, ty + Y_DIFF[depth], 
                      newDepth):
                        return (tx, ty)
        for tx in range(x, x - BOUNDING_BOX[0], -1):
            for ty in range(y, y + BOUNDING_BOX[1]):
                if validateBoundingBox(pxIm, tx, ty)["res"]:
                    if findBestBoundingBox(pxIm, tx, ty + Y_DIFF[depth], 
                      newDepth):
                        return (tx, ty)
            for ty in range(y, y - BOUNDING_BOX[1], -1):
                if validateBoundingBox(pxIm, tx, ty)["res"]:
                    if findBestBoundingBox(pxIm, tx, ty + Y_DIFF[depth], 
                      newDepth):
                        return (tx, ty)
    # within call stack
    elif (depth == -1):
        # bottom of call stack
        return validateBoundingBox(pxIm, x, y)["res"]
    else:
        # test and pass down the call stack
        if validateBoundingBox(pxIm, x, y)["res"]:
            return findBestBoundingBox(pxIm, x, y + Y_DIFF[depth], depth - 1)
        return False
    # no valid bounding box for all chars was found
    return None
        
def findStartingChar(pxIm):
    """Finds the bounding box for the first char that we are meant to analyze.

    pxIm -- PIL.Image.load() representation of an image
    """
    # find the uppermost green pizel
    res = findUpperGreenPixel(pxIm)
    if res is None:
        return res
    (x, y) = res
    # print(f'starting window')
    # printCharAtBoundingBox(pxIm, x, y)
    # try to fit it into a bounding box
    res = fitToBoundingBox(pxIm, x, y)
    if res is None:
        return res
    (x, y) = res
    # test our bounding box against other chars going down
    # adjust bounding box as needed
    # print(f'first fit window')
    # printCharAtBoundingBox(pxIm, x, y)
    res = findBestBoundingBox(pxIm, x, y)
    if res is None:
        return res
    (x, y) = res
    # return the starting char
    y += sum(Y_DIFF)
    return (x, y)

def pastePixels(pxIm, x, y):
    """Creates an (R,B,B) image object from the given (x,y) coordinates of an
    pxIm image based on BOUNDING_BOX +/- 3

    pxIm -- PIL.Image.load() representation of an image
    x -- int x coordinate
    y -- int y coordinate
    """
    res = []
    for ty in range(y - 3, y+BOUNDING_BOX[1] + 3):
        for tx in range(x - 3, x+BOUNDING_BOX[0] + 3):
            if pxIm[tx,ty][:3] == (0,0,0):
                res.append((255,255,255))
            else:
                res.append((0,0,0))
    return res 


if __name__ == "__main__":
    res=""
    # get image pixels
    with Image.open(FILENAME) as im:
        pxIm = im.load()
    # get pixel location of starting character (most inside char)
    res = findStartingChar(pxIm)
    if res is None:
        print(f'no starting char was found')
        exit(1)
    (x, y) = res
    # rotate image & shift up by certain factor around the center of spiral
    (cX, cY) = (x + BOUNDING_BOX[0]//2, y + BOUNDING_BOX[1] + 182//2)
    # 37 makes a full 360 deg rotation
    with Image.open(FILENAME) as im:
        # keep sum of upwards shift
        s=0
        # total 253 chars for a given image
        for i in range(0, 253):
            # rotate the image around spiral & blow up letters
            im_rot = im.rotate(i%36*10,
                               center= (cX, cY),
                               resample= Image.BILINEAR)
            tpxim = im_rot.load()
            # figure out upwards shift amount
            if (i % 36 == 0 and i != 0):
                s = sum(Y_DIFF[:i//36])
            s += 0 if (i % 36 == 0) else Y_DIFF[i//36]/3
            # create a temp image of only the bouding box +/- 6 just in case
            with Image.new('RGB', (BOUNDING_BOX[0] + 6,
                                   BOUNDING_BOX[1] + 6)) as tempImage:
                tempImage.putdata(pastePixels(tpxim, x, y - ceil(s)))
                # clean up the image a little bit
                tempImage = tempImage.resize((52,64), resample= Image.BICUBIC)
                tempImage = tempImage.filter(ImageFilter.MedianFilter())
                tempImage = tempImage.filter(ImageFilter.SMOOTH_MORE)
                # call pytesseract to interpret image to text
                ch = its(tempImage,
                         lang='hts-p6-eng',
                         config='--psm 10 --oem 1 -c ' + 
                                'tessedit_char_whitelist=ABCDEF0123456789'
                        ).strip()
                # add to final ouput string
                res += ch
    print(f'{res}')

I had used the following resources to find the most accurate config options prior to data training:

It was after testing that I had realized that I would never get anywhere without training for the specific font that that is generated by this mission. I had then used the following resource, whilst having to edit a lot of the code that was made available, in order to train the eng.traineddata file.

saiashish90.medium.com

This one was really rough, but we move onto the next.