Just another mundane earthling : अशाच एखाद्याची बखर: July 2020

Today I wrote Python code to generate CAPTCHA images. Before we jump into the code, let's recap what a CAPTCHA is. CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a type of challenge–response test used in computing to determine whether the user is a human. It is used wherever we want only humans to proceed and want to stop everyone else (bots and the like). How effective CAPTCHAs are, and whether they can be circumvented, is not a topic I take up here. Right now, let us get to writing Python code for generating CAPTCHA images.

A Python library named captcha is available that you could use to generate audio and image CAPTCHAs. You could install it and call it from your own code. If you want to generate image CAPTCHAs yourself, without using the captcha library, do read on. Wait. Why would someone write their own code when a ready-made library is available? This is an obvious question. In most cases we should use the existing library. We should write our own code only in exceptional cases: when the library is not available or is not allowed to be installed, or when we want some customization in the implementation. What we have here could be one such case, where we want to customize how the image CAPTCHAs are generated.

Here is the Python code I wrote to generate image CAPTCHAs.

from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont
import string

import random

def draw_text(img, x_given, y_given, text_given, font_given, fill_given):
    d = ImageDraw.Draw(img)
    d.text((x_given, y_given), text_given, font=font_given, fill=fill_given)

def char_selecting(size=6):
    # Exclude letters that look similar in capital and small cases.
    # Why? We are randomly choosing a font size, bigger to smaller,
    # so you can't tell whether you are looking at C or c.
    # Exclude zero to avoid confusion between 0 and O and o.
    characters = "abdefghijklmnqrtuy123456789ABDEFGHIJKLMNQRTUY"
    selection = ""
    for x in range(0, size):
        selection = selection + random.choice(characters)
    return selection

image_x_size = 200
image_y_size = 100
img = Image.new('RGB', (image_x_size, image_y_size), color = (250, 250, 250))

word = char_selecting(6)    # We want 6 characters in our CAPTCHA
print(word)

x_pos = 10    # Position of the first character in the image

for char in word:
    print(char)

    font_size = random.randint(16, 50)
    # Font size smaller than 16 is too small
    # Font size bigger than 50 is too big
    font_selected = ImageFont.truetype('/usr/share/fonts/gnu-free/FreeMono.ttf', font_size)

    fill_selected = (random.randint(0, 200), random.randint(0, 200), random.randint(0, 200), random.randint(0, 255))
    # Choose a random colour

    y_random = random.randint(0, 30)
    # All characters should not be in a horizontal line.
    # So shift position of each character randomly

    draw_text(img, x_pos, y_random, char, font_selected, fill_selected)

    x_pos = x_pos + 30
    # Position of the next character
    # We are randomly choosing a font size for each character
    # So position of the next character should not be chosen randomly

# Integer division, so that randint() below receives whole numbers
mid_x = image_x_size // 2
mid_y = image_y_size // 2

first_half_x = random.randint(1, mid_x - 1)
first_half_y = random.randint(1, mid_y - 1)

second_half_x = mid_x + random.randint(1, mid_x - 1)
second_half_y = mid_y + random.randint(1, mid_y - 1)

print("Going to draw a line from (%d, %d) to (%d, %d)" % (first_half_x, first_half_y, second_half_x, second_half_y))
d = ImageDraw.Draw(img)
# The second argument of line() is the fill colour, not the width
d.line((first_half_x, first_half_y, second_half_x, second_half_y), fill=10)

img.save('captcha.png')   # Save the image as a file on the disk

Here are some of the CAPTCHA images that I generated from this code.

Now let us see this code in detail.

from PIL import Image

We are using the Image module from the PIL library.

from PIL import ImageDraw

We are using the ImageDraw module from the PIL library.

from PIL import ImageFont

We are using the ImageFont module from the PIL library.

def draw_text(img, x_given, y_given, text_given, font_given, fill_given):
    d = ImageDraw.Draw(img)
    d.text((x_given, y_given), text_given, font=font_given, fill=fill_given)

I have written this function to draw the given text at the specified place in the given image. The font and color to be used must be specified. So why did I write a separate function for this? Why not simply do this in the main code? Well, you could do that. I wrote a generic function so that it would be useful if I have to extend this code tomorrow.

def char_selecting(size=6):
    # Exclude letters that look similar in capital and small cases.
    # Why? We are randomly choosing a font size, bigger to smaller,
    # so you can't tell whether you are looking at C or c.
    # Exclude zero to avoid confusion between 0 and O and o.
    characters = "abdefghijklmnqrtuy123456789ABDEFGHIJKLMNQRTUY"
    selection = ""
    for x in range(0, size):
        selection = selection + random.choice(characters)
    return selection

This is the function where the magic happens: choosing the set of characters that we show in the CAPTCHA image. The caller may choose the length of the string by passing a numeric value; otherwise a default of 6 is used. We randomly choose the characters and form a string from them. I have not used all of the upper case letters, lower case letters, and digits. Why? Allow me to explain. Looking at the CAPTCHA images that I generated, you must have noticed that the font size of each character varies. I did that deliberately, to make it harder for OCR software to recognize the text. But when we have a mix of small and big font sizes, how could you tell the difference between c and C, or between z and Z? The same goes for P, S, V, W, and X. O is even more difficult, because a 0 (zero) also looks similar. So I chose not to use these characters, so that we don't create confusion for humans.
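The selection rule described above can also be written so that the alphabet is derived from the excluded letters instead of being typed by hand. What follows is a minimal sketch, not the code used for the images in this post. The names AMBIGUOUS, build_alphabet and pick_characters are made up for illustration, and the sketch swaps in secrets.choice from the standard library in place of the random module, on the assumption that a CAPTCHA string should be hard to predict.

```python
import secrets
import string

# Letters whose lower and upper case shapes are hard to tell apart once the
# font size varies (the set named in the text above). Hypothetical name.
AMBIGUOUS = set("copsvwxz")

def build_alphabet():
    """Build the allowed characters: letters minus the ambiguous ones, digits minus zero."""
    lower = [c for c in string.ascii_lowercase if c not in AMBIGUOUS]
    upper = [c.upper() for c in lower]
    digits = [d for d in string.digits if d != "0"]
    return "".join(lower + digits + upper)

def pick_characters(size=6):
    """Return a string of `size` characters drawn from the allowed alphabet."""
    alphabet = build_alphabet()
    return "".join(secrets.choice(alphabet) for _ in range(size))

print(pick_characters(6))
```

With this layout, excluding one more confusing character only means adding it to AMBIGUOUS.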

image_x_size = 200
image_y_size = 100
img = Image.new('RGB', (image_x_size, image_y_size), color = (250, 250, 250))

This is the start of the main code. I have chosen an image size of 200 x 100 pixels and a background color that is almost white, but not exactly white.

word = char_selecting(6) # We want 6 characters in our CAPTCHA
print(word)

A call to function char_selecting gets us a string of length 6, made up of randomly selected characters.

x_pos = 10 # Position of the first character in the image

In the loop that follows, you will observe that I keep on increasing this value, for the subsequent characters in the image.

for char in word:
    print(char)

From the string that we have prepared, here we take one character at a time, and work on it.

    font_size = random.randint(16, 50)
    # Font size smaller than 16 is too small
    # Font size bigger than 50 is too big
    font_selected = ImageFont.truetype('/usr/share/fonts/gnu-free/FreeMono.ttf', font_size)

As mentioned earlier, I have deliberately chosen a random font size for each character, so that the resulting text is harder for OCR software to recognize. Here I am using the same font for all characters. If you want, you could obtain a list of the fonts available on your system and randomly choose a different font for each character. That implementation I leave up to you. Some unusual fonts may be present on the system, and they could render some characters in a way that humans cannot recognize. Considering this possibility, I did not do that. I found a font whose characters are easy for humans to recognize, and I am sticking to that font.
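For anyone who wants to try the multi-font idea, here is one possible sketch of the "obtain a list of fonts" step using only the standard library. The function names find_fonts and pick_font_path are hypothetical, the root folder /usr/share/fonts is an assumption about a typical Linux system, and the fallback is simply the FreeMono path used elsewhere in this post.

```python
import os
import random

# The font used throughout this post; assumed to exist on the system.
FALLBACK_FONT = '/usr/share/fonts/gnu-free/FreeMono.ttf'

def find_fonts(root):
    """Return a sorted list of .ttf files found anywhere below root."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if name.lower().endswith('.ttf'):
                found.append(os.path.join(folder, name))
    return sorted(found)

def pick_font_path(root='/usr/share/fonts'):
    """Pick one font file at random, or the fallback if none were found."""
    fonts = find_fonts(root)
    if not fonts:
        return FALLBACK_FONT
    return random.choice(fonts)

print(pick_font_path())
```

The returned path could then be passed to ImageFont.truetype together with the font size. Given the concern about unreadable fonts, it is probably safer to choose from a short, hand-checked list than from everything the search finds.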

    fill_selected = (random.randint(0, 200), random.randint(0, 200), random.randint(0, 200), random.randint(0, 255))
    # Choose a random colour

We are deliberately choosing a random color for each character, so that the resulting text is harder for OCR software to recognize.

    y_random = random.randint(0, 30)
    # All characters should not be in a horizontal line.
    # So shift position of each character randomly

In my opinion, this is an important trick. If all characters appear in a horizontal line, there is a greater possibility of them being recognized by OCR software. To reduce this possibility, we shift the vertical position of each character by a random amount in the range of 0 to 30 pixels.

    draw_text(img, x_pos, y_random, char, font_selected, fill_selected)

    x_pos = x_pos + 30
    # Position of the next character
    # We are randomly choosing a font size for each character
    # So position of the next character should not be chosen randomly

Function draw_text is where we draw the character in the image. Now let's see what precautions we have taken so that the text is hard for OCR software to recognize.

1. Color of each character is different.

2. Font size of each character is different.

3. Characters are not in a horizontal line. They are offset enough to confuse OCR software, but not so far out of position that humans have difficulty recognizing the sequence.
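The three precautions listed above can be gathered into one helper that decides every drawing parameter before anything is drawn. This is only a sketch under a few assumptions: the name character_layout is made up, the ranges are the ones used in the loop above, and the colour is a plain three-value RGB tuple because the image is created in 'RGB' mode.

```python
import random

def character_layout(word, x_start=10, x_step=30):
    """Return one dict of drawing parameters per character of word."""
    layout = []
    x_pos = x_start
    for char in word:
        layout.append({
            'char': char,
            'x': x_pos,                           # fixed horizontal step
            'y': random.randint(0, 30),           # random vertical offset
            'font_size': random.randint(16, 50),  # random font size
            # random colour, each channel at most 200
            'fill': tuple(random.randint(0, 200) for _ in range(3)),
        })
        x_pos += x_step
    return layout

for params in character_layout("abd123"):
    print(params)
```

Keeping the random choices apart from the drawing makes it easy to print or log exactly what was chosen for a given image.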

Now let's add something more in the image, which will prevent the OCR software from recognizing the text. Let's draw a line.

# Integer division, so that randint() below receives whole numbers
mid_x = image_x_size // 2
mid_y = image_y_size // 2

first_half_x = random.randint(1, mid_x - 1)
first_half_y = random.randint(1, mid_y - 1)

second_half_x = mid_x + random.randint(1, mid_x - 1)
second_half_y = mid_y + random.randint(1, mid_y - 1)

print("Going to draw a line from (%d, %d) to (%d, %d)" % (first_half_x, first_half_y, second_half_x, second_half_y))
d = ImageDraw.Draw(img)
# The second argument of line() is the fill colour, not the width
d.line((first_half_x, first_half_y, second_half_x, second_half_y), fill=10)

First we decide the position of the line in the image. We divide the image at its midpoint. In the top-left quarter we choose a random position as one end point of the line, and in the bottom-right quarter we choose a random position as the other end point. Then we draw the line between the two end points. Well, you could argue that a better method could have been used to draw the line. Yes, I agree. I leave it up to you to implement a better method to draw a line, or maybe some other geometrical object, or maybe two geometrical objects. Basically we want to have something that makes it harder for OCR software to recognize the text.
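As one possible variation on the line described above, the sketch below lets the vertical position of both end points range over the whole image height, so the line can slope either way, and it can produce more than one line. The names random_line and random_lines are hypothetical, and only the end points are computed here.

```python
import random

def random_line(width, height):
    """Return (x1, y1, x2, y2) with one end point in each half of the image."""
    mid_x = width // 2   # integer division: randint() needs whole numbers
    x1 = random.randint(1, mid_x - 1)           # somewhere in the left half
    x2 = mid_x + random.randint(1, mid_x - 1)   # somewhere in the right half
    y1 = random.randint(1, height - 1)          # any height
    y2 = random.randint(1, height - 1)          # any height
    return (x1, y1, x2, y2)

def random_lines(width, height, count=2):
    """Return `count` independent random lines."""
    return [random_line(width, height) for _ in range(count)]

for points in random_lines(200, 100):
    print(points)
```

Each tuple can be handed to the line method of an ImageDraw object. Passing the colour and thickness by keyword, for example fill=(0, 0, 0) and width=2, avoids mixing up the two arguments.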

img.save('captcha.png') # Save the image as a file on the disk

Finally we save the image as a file named captcha.png.

Is this code perfect? I don't claim it to be. Does this code generate CAPTCHAs that are easily recognizable by humans, but could never be recognized by OCR software? I don't claim that either. If you try hard enough, you might find OCR software that recognizes the text. And then what? Well, if I got to know that, maybe I'd change my code so that the flaw is removed. After all, this is a game of cat and mouse that we are all playing, right?

We have not generated an audio CAPTCHA yet. So let's do that.

from gtts import gTTS

word = "TAAaGr"

text = ""
for char in word:
    text = text + char + " "

speech = gTTS(text = text, lang = 'en', slow = True)

speech.save("captcha.mp3")

We are using the gtts library for this purpose. To make each character clearly audible, we separate the characters with spaces. You will notice that lower case and upper case letters are spoken in the same way. So, if we have to implement image and audio CAPTCHAs together, then along with digits we should use only lower case letters or only upper case letters.
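Since the spoken version cannot convey case, the answer check has to ignore case whenever audio is offered. Here is a small sketch of that idea. The function names spoken_text and answer_matches are invented for illustration, and how the expected answer is stored between showing the CAPTCHA and checking it is left out.

```python
def spoken_text(word):
    """Separate the characters with spaces so each one is read out on its own."""
    return " ".join(word)

def answer_matches(expected, given, case_sensitive=True):
    """Compare the user's answer with the expected CAPTCHA string."""
    given = given.strip()
    if case_sensitive:
        return given == expected
    return given.lower() == expected.lower()

print(spoken_text("TAAaGr"))
print(answer_matches("TAAaGr", "taaagr", case_sensitive=False))
```

The result of spoken_text can be given to gTTS as the text argument in the same way as in the script above. An alternative to the case-insensitive check is to generate the string from a single case in the first place.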

What if we could make the challenge–response test a little more difficult for a non-human to solve, while keeping it easy for humans? One way to do this is to show two numbers, ask for their sum, and check the response. Let's see Python code for this.

from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont
import random

def draw_text(img, x_given, y_given, text_given, font_given, fill_given):
    d = ImageDraw.Draw(img)
    d.text((x_given, y_given), text_given, font=font_given, fill=fill_given)

image_x_size = 150
image_y_size = 70
img = Image.new('RGB', (image_x_size, image_y_size), color = (250, 250, 250))

num1 = random.randint(1,9)
num2 = random.randint(1,9)
total = num1 + num2
print("%d + %d = %d" % (num1, num2, total))

font_size = random.randint(16, 50)
# Font size smaller than 16 is too small
# Font size bigger than 50 is too big
font_selected = ImageFont.truetype('/usr/share/fonts/gnu-free/FreeMono.ttf', font_size)

fill_selected = (random.randint(0, 200), random.randint(0, 200), random.randint(0, 200), random.randint(0, 255))
# Choose a random colour

x_pos = 10    # Position of the first character in the image

y_random = random.randint(0, 20)
# All characters should not be in a horizontal line.
# So shift position of each character randomly

draw_text(img, x_pos, y_random, str(num1), font_selected, fill_selected)

fill_selected = (random.randint(0, 200), random.randint(0, 200), random.randint(0, 200), random.randint(0, 255))
x_pos = x_pos + 30
y_random = random.randint(0, 20)
draw_text(img, x_pos, y_random, "+", font_selected, fill_selected)

fill_selected = (random.randint(0, 200), random.randint(0, 200), random.randint(0, 200), random.randint(0, 255))
x_pos = x_pos + 30
y_random = random.randint(0, 20)
draw_text(img, x_pos, y_random, str(num2), font_selected, fill_selected)

fill_selected = (random.randint(0, 200), random.randint(0, 200), random.randint(0, 200), random.randint(0, 255))
x_pos = x_pos + 30
y_random = random.randint(0, 20)
draw_text(img, x_pos, y_random, "=", font_selected, fill_selected)

img.save('captcha_num.png')   # Save the image as a file on the disk

Here are some of the CAPTCHA images that I generated from this.
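The script above draws the question, but the step of checking the response is not shown. A minimal sketch of that step could look like the following. The names make_sum_challenge and check_sum_response are hypothetical, and the sketch assumes that the expected total is kept somewhere the user cannot see until the answer arrives.

```python
import random

def make_sum_challenge():
    """Return the question text to draw and the total to remember."""
    num1 = random.randint(1, 9)
    num2 = random.randint(1, 9)
    question = "%d + %d =" % (num1, num2)
    return question, num1 + num2

def check_sum_response(expected_total, response):
    """Return True only if the response is a whole number equal to the total."""
    try:
        return int(response.strip()) == expected_total
    except ValueError:
        # Anything that is not a whole number counts as a wrong answer
        return False

question, total = make_sum_challenge()
print(question)
print(check_sum_response(total, str(total)))
```

Treating unparsable input as a wrong answer, instead of raising an error, keeps the check safe to call on whatever the user typed.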