Study On Security Offered By Captcha Computer Science Essay



CAPTCHA has been a well-known term in the security field for the last decade. It gained importance by providing a way to protect web resources from bots and other automated attacks by computer programs, which was a challenging problem before CAPTCHAs were introduced. Even now, however, the solution remains incomplete. This paper discusses what a CAPTCHA is, the different types of CAPTCHA, an architecture showing how a CAPTCHA is used, and the usability and robustness issues of CAPTCHAs. The robustness issues reveal why the solution is incomplete.

CAPTCHA is short for 'Completely Automated Public Turing test to tell Computers and Humans Apart' [1]. This expansion largely explains what a CAPTCHA is. The only part that needs explanation for a reader unfamiliar with computing is the 'Turing test'. A Turing test is a test that decides, from responses alone, whether the party being questioned is a human or a machine. It can be used in two ways: to test a machine for intelligence, or to decide which party is a machine and which is a human. In the former case, a person who does not know whether he is communicating with a human or a machine takes part in a conversation. If, at the end of the conversation, the person concludes that he was talking to a human, the machine is said to have passed the intelligence test. A CAPTCHA uses the second case, with some differences. In a CAPTCHA, the judge conducting the test is a computer, and the other side can be either a human or a machine. The result of the test is a decision on whether the other side is a human or a computer. The tests used in CAPTCHAs are therefore puzzles that humans can solve easily but current computer programs cannot. Having seen what a CAPTCHA is, let us now see what makes it so important.

In today's Internet, there are many threats, and security is a main concern. We must take steps to protect our resources. Large companies such as Microsoft, Google and Yahoo, which have a large number of users, need to ensure that their resources are not wasted. For example, consider a commercial website that offers Internet TV and can handle 'n' users at a time. Suppose a malicious user writes a script that automatically connects to this website and requests to watch TV. When such automated requests reach the maximum number of users the site can handle, requests made by genuine users cannot be served. Similar abuse occurs with automated creation of email accounts, automated logins to hacked accounts and so on, all of which consume valuable resources. These attacks can be handled by CAPTCHAs. Such attacks are not carried out manually by a human; they are the work of automated scripts. The problem can therefore be solved if we can decide whether a request was generated by a machine or by a human user. Asking the requester to solve a CAPTCHA denies the malicious requests and allows the genuine human users. In other words, CAPTCHAs protect valuable Internet resources and prevent automated attacks. A CAPTCHA is a simple way to answer the question "Are you a human?"

People who are unaware of attacks on the Internet, of the losses they cause and of the role CAPTCHAs play in security may see no sense in solving a CAPTCHA puzzle that seems trivial for a human. In fact, many such people may think of it as a useless step that wastes their time. This paper helps the reader understand the importance of CAPTCHAs, their architecture, the different types of CAPTCHA, the issues regarding CAPTCHAs and some attacks on them. Finally, we conclude after describing a recent type of CAPTCHA.

Types of CAPTCHA

CAPTCHAs are mainly of two kinds, based on the way they are designed [1, 2]: visual CAPTCHAs and audio-based CAPTCHAs. As the names suggest, visual CAPTCHAs are solved by looking at and observing them, and audio-based ones are solved by listening to and analyzing the audio provided. In both cases, the result of the observation or analysis is the answer to the question asked. Visual CAPTCHAs are the more widely used, and they come in two kinds:

Text-based and

Image-based


These two differ only in their content. Pure text-based CAPTCHAs include Gimpy and Baffle Text; pure image-based ones include PIX and Bongo. Some CAPTCHAs, such as Pessimal Print, combine images and text. Text-based CAPTCHAs are the most popular and have been adopted by many commercial companies as well as non-commercial organizations. Their popularity is due to their simplicity. Let us see a brief description of each type of CAPTCHA [1, 2].


Gimpy

This type of CAPTCHA is a simple text-based one. GIMP stands for General Image Manipulation Program (now the GNU Image Manipulation Program). A Gimpy CAPTCHA shows distorted (twisted or deformed) words on an image, and the image itself may also be distorted. It is based on the idea that machines find it extremely difficult to read such text while humans find it easy. Care is taken in designing such CAPTCHAs to prevent attacks using Optical Character Recognition (OCR): the design must be such that present-day OCR cannot solve it. The drawback is that Gimpy uses dictionary words. This allows an attacker to guess from a word list, that is, to mount a dictionary attack.

Baffle Text

Baffle Text typically uses a pronounceable word and renders it so that small portions of the characters are missing. Because portions are missing, it is extremely difficult for OCR to recognize the characters: OCR tries to recognize the text character by character from its sequence of curves and lines, matching this sequence against its database. Missing parts of the characters are therefore very likely to mislead the OCR. On the other hand, because the words are pronounceable, it is easy for humans to make out the word. For the same reason, however, there is a significant chance of a dictionary attack in this case as well.


PIX

This is an image-based CAPTCHA. It has a database of pictures of simple objects in different forms, together with a list of those objects. To form a puzzle, it randomly selects an object, picks some number 'x' of pictures from the database that match the object and presents them to the user. As a safety step against attacks, the images can be distorted before they are presented. The user must then name the object, and that name is the solution to the puzzle. For example, if the object 'fruit' is selected, pictures of an apple, a banana, a pineapple and so on can be presented. The user can easily tell that they are fruits, and the puzzle is solved. Computers, in contrast, cannot easily match the pictures to the object, especially when the images are distorted. Usability is the major issue with this type of CAPTCHA, and hence it is less used.
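The puzzle-forming step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the database contents, the file names and the function names `make_pix_puzzle` and `check_answer` are hypothetical, and the distortion step is left out.

```python
import random

# Hypothetical labelled database: object name -> picture file names.
# The names below are placeholders, not part of any real PIX data set.
IMAGE_DB = {
    "fruit": ["apple.png", "banana.png", "pineapple.png", "mango.png"],
    "vehicle": ["car.png", "bus.png", "bicycle.png", "truck.png"],
}

def make_pix_puzzle(db, count, rng=random):
    """Pick a random object and `count` distinct pictures that match it."""
    label = rng.choice(sorted(db))           # the expected answer
    pictures = rng.sample(db[label], count)  # pictures shown to the user
    return label, pictures

def check_answer(label, answer):
    """Compare the user's answer with the object name, ignoring case and spaces."""
    return answer.strip().lower() == label

label, pictures = make_pix_puzzle(IMAGE_DB, 3)
print(pictures)  # the user would be asked what these pictures have in common
```

The usability problem mentioned above shows up even in this sketch: a user who types 'fruits' instead of 'fruit' would be rejected unless the check is made more tolerant.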


Bongo

This is also an image-based CAPTCHA. Here the user is shown two groups of images [1, 2]. The two groups may or may not have images in common, but in either case the images in the two sets differ in some property such as color, transparency or boldness. The user is then given an image and asked to decide which set it belongs to. The number of possible answers to this type of CAPTCHA equals the number of sets presented. Because this number is small, a random guess has a high probability of being correct, and the scheme is easily broken by a brute-force attack. Therefore, this type is not secure and hence not used.
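To see how weak a two-set puzzle is against guessing, the probability can be worked out directly. The sketch below assumes that each attempt is an independent, uniformly random guess; the function name `guess_success` is introduced here only for illustration.

```python
def guess_success(num_sets, attempts):
    """Probability that at least one of `attempts` independent random guesses is right."""
    return 1 - (1 - 1 / num_sets) ** attempts

print(guess_success(2, 1))  # one guess between two sets
print(guess_success(2, 3))  # three independent tries
```

With two sets, a single guess succeeds half of the time, so a bot that simply retries breaks the scheme almost at once.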

Pessimal Print

Pessimal Print is a mixed type of CAPTCHA that uses both text and images. It takes a word, places it on a degraded background image, sometimes in a confusing font, and merges them into a single image. Such images are called pessimal prints. The user is asked for the word embedded in the pessimal print image. The degraded background makes it more difficult for OCR to recognize the characters.

Audio-based CAPTCHAs are different from visual ones. Their main aim is to make the test accessible to users with visual impairments. Despite this advantage, they are not as popular as text-based CAPTCHAs. An audio-based CAPTCHA presents a short audio clip to the user. The clip contains words, numbers or a mixture of them, presented with some added noise. The noise level is kept low enough that it does not affect intelligibility, i.e., humans can still easily recognize the words or numbers in spite of the noise. Users can listen to the clip any number of times and are asked to type what they hear. Just as OCR is the threat to Gimpy, speech recognition software is the threat to this kind of CAPTCHA. The noise is added to the clip to make speech recognition fail.

Architecture showing CAPTCHA Usage

The following diagram shows the architecture of using CAPTCHA:

Figure 1: CAPTCHA Architecture [1]

The above architecture [1] shows how a CAPTCHA can be used in a client-server setting. Whenever a client requests a resource or service from the server (this may be an ordinary URL request), the server generates a CAPTCHA puzzle to make sure that it is handling the request of a human user. To generate the CAPTCHA, it uses the resources associated with it, such as a dictionary database, an image database or image manipulation programs. The server then sends the CAPTCHA to the client, where it is presented to the user. A CAPTCHA provider such as reCAPTCHA can also be used, which removes the need for the server to hold the resources for CAPTCHA generation. The user solves the CAPTCHA and submits the solution to the server, which validates it. If the solution is right, the request is granted. If it is wrong, either the client's request is rejected or the client is given another CAPTCHA puzzle to solve. This mechanism adds protection against automated attacks.

From this architecture, we see that there is an increased load on the server side. To reduce it, some of the work can be moved off the server. Validation performed on the client side, however, is under the control of the client, so its result must still be confirmed by the server or by a trusted provider. We also need a unique identity for each CAPTCHA, so that if the server sends CAPTCHAs to two or more clients and receives all the solutions at the same time, it can tell which solution came from which client.
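The idea of giving each CAPTCHA a unique identity can be sketched as follows. This is a minimal illustration, not the design from [1]: the in-memory dictionary `_pending` and the functions `issue_challenge` and `validate` are hypothetical names, and rendering the text as a distorted image is left out.

```python
import secrets
import string

# Hypothetical in-memory store: challenge id -> expected solution.
# A real server would also expire entries after a timeout.
_pending = {}

def issue_challenge(length=6):
    """Create a challenge and return (challenge_id, text_to_render)."""
    text = "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))
    challenge_id = secrets.token_hex(16)  # unique identity for this CAPTCHA
    _pending[challenge_id] = text
    return challenge_id, text

def validate(challenge_id, answer):
    """Check an answer on the server side; each challenge can be used only once."""
    expected = _pending.pop(challenge_id, None)  # entry is removed either way
    return expected is not None and expected == answer.strip().upper()
```

Only the identity and the rendered image would be sent to the client; the text stays on the server. Removing the entry on the first attempt means that the identity of an already solved CAPTCHA cannot be replayed.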

Characteristics of CAPTCHA's

The characteristics of CAPTCHA are as follows [1]:

CAPTCHAs of any type should be generated automatically, as the name itself suggests. The validation task should also be automatic. The reason is that each server must generate and validate an unknown number of CAPTCHAs; if this cannot be automated, doing it with human resources becomes impractical. So, a CAPTCHA should possess this characteristic.

The methods used to generate CAPTCHAs should be made available to the public.

The server should not be able to solve the CAPTCHAs it generates. The solutions must also be kept separate from the resources used for generating CAPTCHAs. This characteristic removes the chance of an attacker using the server itself as a weapon to defeat CAPTCHAs.

Fundamental Issues with CAPTCHA's

There are two fundamental issues with CAPTCHAs: usability and robustness [3]. Usability covers the aspects that concern the ordinary user. Robustness covers the security of the CAPTCHA. Let us discuss each of these issues in detail.

Usability Issues

As already discussed, text-based CAPTCHAs are the most widely used, so usability is discussed here with respect to them. Usability can be defined in terms of learnability, efficiency, memorability, errors and satisfaction [3]. 'Learnability' means how simple and easy it is for a first-time user to solve the CAPTCHA. 'Efficiency' describes how quickly users solve the CAPTCHA once they are familiar with its design. 'Memorability' describes how easily a user who has not used a CAPTCHA design for a long time can recall how to use it and solve it without having to think like a novice. 'Errors' describes how error-prone the CAPTCHA design is, and can also be used to assess the severity of the common errors users make. 'Satisfaction' assesses user satisfaction, which includes the pleasantness of the design.

CAPTCHAs are very simple problems for humans to understand and solve. They are therefore learnable and memorable by default, and these two issues need not concern us. We are left with the remaining three. Made specific to CAPTCHAs, they correspond to the accuracy of the user in solving the CAPTCHA, the time the user takes to solve it (the response time) and the way the CAPTCHA is presented, which affects user satisfaction. Accuracy addresses the efficiency and errors issues. The other two address the user satisfaction issue.

At this level, however, these factors do not tell us how to improve the usability of a CAPTCHA design. We therefore relate them to the features of text-based CAPTCHAs, so that we can see how usability can be improved. The features are distortion, content and presentation [3]. Let us see each of them in detail.

Distortion: As already discussed, distortion means twisting or deforming the characters in a text-based CAPTCHA, and it has a great effect on how readable the characters are to humans. Distortion can be done in four common ways: translation, rotation, scaling and warp. These are geometric terms that deal with the position, orientation and size of objects. 'Translation' means placing the characters above or below the baseline, or moving them along the X-axis (the baseline) so that the gap between characters increases or decreases. This may also make the characters overlap. 'Rotation' means turning the characters clockwise or anti-clockwise. 'Scaling' means changing the size of the characters from their original size. This can be done along either axis, so that the characters appear stretched or compressed. 'Warp' is different from the others because it is applied to the image rather than to individual characters: it is an elastic distortion of the images used in text-based CAPTCHAs. Any of these techniques, or a mix of them, can be used in designing CAPTCHAs, but readability depends on the level of distortion used. The level should be chosen so that no CAPTCHA is produced that is impossible for a human to read.
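The four distortions can be illustrated on the outline points of a character. The sketch below is only a geometric illustration and is not taken from [3]: it treats a character as a list of (x, y) points in a coordinate system with the Y-axis pointing up, and the sine-wave `warp` is just one simple choice of elastic distortion.

```python
import math

def translate(points, dx, dy):
    """Move every point by (dx, dy), e.g. above or below the baseline."""
    return [(x + dx, y + dy) for x, y in points]

def rotate(points, degrees):
    """Rotate anti-clockwise about the origin; use a negative angle for clockwise."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def scale(points, sx, sy):
    """Stretch or compress along each axis independently."""
    return [(x * sx, y * sy) for x, y in points]

def warp(points, amplitude, wavelength):
    """Shift each point vertically along a sine wave (an assumed, simple warp)."""
    return [(x, y + amplitude * math.sin(2 * math.pi * x / wavelength))
            for x, y in points]

# A vertical stroke, distorted by a mix of the techniques.
stroke = [(0, 0), (0, 1), (0, 2)]
print(warp(scale(rotate(translate(stroke, 1, 0), 15), 1.2, 0.9), 0.3, 5))
```

The parameters (angle, scale factors, amplitude) play the role of the distortion level: larger values make the text harder for a program to read, but also harder for a human.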

Moreover, distortion can introduce confusing characters into a CAPTCHA. This happens when certain characters occur consecutively and distortion is applied. For example, if the letters 'l' and 'o' occur consecutively and the 'o' is translated up and moved left close to the 'l', the pair may look like a 'p' to the user [3]. This confuses the user and may lead to a wrong solution. Confusing characters can arise from combinations of letters and digits, digits and digits, letters and letters, and also characters (letters or digits) and clutter. 'Clutter' consists of random arcs (which may sometimes appear as lines) that some text-based schemes add to improve security. Clutter can itself create confusing characters: if a vertical line happens to appear at a suitable position in a randomly generated text, the user may be unsure whether it is the digit '1' or the letter 'l', although it is not part of the text at all. Care should therefore be taken that confusing characters do not occur in the CAPTCHA. This can be achieved by maintaining a list of characters that should not appear as a pair, by controlling the distortion level and by paying attention to where clutter is placed.
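Maintaining a list of pairs that must not appear together can be sketched as below. Only the pair ('l', 'o') comes from the example above; the other entries of `CONFUSING_PAIRS` are assumed for illustration, and a real list would come from usability testing. The default `random` module is used for simplicity; a production generator would use a cryptographically secure source.

```python
import random

# Pairs (previous character, next character) that must not be adjacent.
# ('l', 'o') is the example from the text; the rest are assumptions.
CONFUSING_PAIRS = {("l", "o"), ("r", "n"), ("v", "v"), ("c", "l")}

def generate_text(length, alphabet="abcdefghijklmnopqrstuvwxyz", rng=random):
    """Generate random text, redrawing any character that forms a confusing pair."""
    chars = []
    while len(chars) < length:
        candidate = rng.choice(alphabet)
        if chars and (chars[-1], candidate) in CONFUSING_PAIRS:
            continue  # this pair could be misread after distortion; draw again
        chars.append(candidate)
    return "".join(chars)

print(generate_text(6))
```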

Content: This is the character set used to generate text-based CAPTCHAs. It is also a security factor: if the character set is too small, a random guess is more likely to succeed and a brute-force attack has a higher probability of breaking the CAPTCHA. From this point of view, a large character set is better. But with a large character set there is more chance for confusing characters to appear. For example, if only letters are used, confusing digit-digit and letter-digit combinations are eliminated. The character set should therefore be selected depending on the type of use.

Whether randomly generated text or dictionary words are used is another issue. Dictionary words open the door to a dictionary attack, whereas randomly generated text is harder to read. The string length must also be considered: it is difficult for a user to interpret a long string of random text. Randomly generated text of moderate length therefore balances these usability and security issues. Besides this, care should be taken that offensive words do not appear in CAPTCHAs, as this may cause a usability problem for some users.
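The effect of character set size, string length and dictionary use on a blind guess can be made concrete with a little arithmetic. The numbers below are illustrative assumptions only (a 26-letter set, 6 characters, a 50,000-word dictionary), not measurements of any real scheme.

```python
def random_guess_probability(charset_size, length):
    """Chance that one uniformly random guess matches a random string."""
    return 1 / (charset_size ** length)

def dictionary_guess_probability(dictionary_size):
    """Chance that one random dictionary word matches a dictionary-based CAPTCHA."""
    return 1 / dictionary_size

print(random_guess_probability(26, 6))       # 26 letters, 6 characters
print(dictionary_guess_probability(50_000))  # assumed 50,000-word dictionary
```

Under these assumptions a single guess against six random letters succeeds about once in 309 million tries, whereas a guess against a dictionary word succeeds once in 50,000, which is why dictionary words make the guessing attack so much easier.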

Presentation: This describes the way the CAPTCHA is presented to the user, including color, font and size. The use of color affects both usability and security. In terms of usability, using multiple colors may have a negative effect on users with color blindness, and colors that are not used appropriately may make the text harder to read rather than easier. In terms of security, color gives scope to the segmentation attack (discussed under robustness issues). With respect to both usability and security, it is therefore better either to use colors appropriately or not to use them at all.

Care should also be taken that CAPTCHAs are integrated into web pages securely. For example, if the solution box is not enabled when a CAPTCHA is presented on a web page, an attacker may get the chance to supply his own text box to capture the solution, which may result in the CAPTCHA being broken. CAPTCHAs should therefore be properly integrated into web pages.


Robustness Issues

'Robustness' is the issue related to the security of the CAPTCHA, that is, how secure a CAPTCHA is against being broken by a computer program [4, 5]. It does not cover security with respect to the way CAPTCHAs are used, such as attacks that reuse the session ID of an old session in which a CAPTCHA was solved, or that redirect the CAPTCHA to innocent users and make them solve it. The attack in which the attacker redirects CAPTCHA challenges to unsuspecting users and makes them solve them is termed 'CAPTCHA smuggling' [6]. Robustness addresses only the security of the CAPTCHA design. As with usability, we discuss robustness with respect to text-based CAPTCHAs.

As discussed in the section on the types of CAPTCHA, text-based CAPTCHAs are designed mainly to resist OCR. Even so, there are many other simple techniques that can identify the characters in a given CAPTCHA. It is important to understand the difficulties these techniques face and then improve the design so that it defeats them. For these techniques, the challenge is to identify the locations of the characters in the CAPTCHA. Once the locations are identified, identifying each character is not very difficult.

Attacks that try to identify the locations of the characters in text-based CAPTCHAs are called segmentation attacks. Let us look at different segmentation attacks depending on the text-based scheme used. Consider first CAPTCHAs generated by random-shearing distortion. There are several such schemes, which differ in the character set and string length they use. Whatever the scheme, the common flaw found in their design after careful observation and study is that each character is drawn with a characteristic number of pixels. Some characters share the same count, but most have distinct counts. Using this simple statistic, a character can be identified from its pixel count with a simple look-up table. This solves the simpler problem of identifying the characters.

Now for the more challenging segmentation problem. This scheme uses two colors, viz. one background color and one foreground color. A simple algorithm proposed in [5] gives a step-by-step procedure for identifying the characters in this type of CAPTCHA one by one. It can be described as:

a. Identify the foreground and background colors.

b. Identify a foreground pixel and find its neighboring foreground pixels one by one until all neighboring pixels are found. This group will be one character.

c. If the group is noise (i.e., the pixel count of this group << the minimum pixel count in the look-up table), leave it. Else perform look-up table matching:

   If only one match occurs, the corresponding character is retrieved.

   Else if multiple matches occur, a random decision is taken (similar to brute force with high probability).

d. Consider all the foreground pixels except the ones already found as one group, and use this group as input for step 'b'.

e. Loop steps b, c and d until no more foreground pixels are left.

Combining these two solutions breaks this type of CAPTCHA. As the algorithm proceeds by filling in one character at a time, it is termed the 'Color Filling Segmentation' (CFS) algorithm [5].
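The procedure can be sketched as a flood fill followed by a pixel-count look-up. This is a simplified illustration, not the implementation from [5]: it assumes the image is already a grid of 0 (background) and 1 (foreground), treats diagonal pixels as neighbors, and returns all candidate characters for a count instead of taking the random decision itself. The look-up table in the usage lines is made up.

```python
from collections import deque

def color_fill_segments(image, foreground=1):
    """Return groups of connected foreground pixels, ordered left to right."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    segments = []
    for x in range(width):            # scan by column so groups come out left to right
        for y in range(height):
            if image[y][x] != foreground or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:              # fill outwards from the starting pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == foreground
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            segments.append(pixels)
    return segments

def recognize(image, lookup, min_pixels):
    """Map each segment to the candidate characters with the same pixel count."""
    result = []
    for pixels in color_fill_segments(image):
        if len(pixels) < min_pixels:
            continue                  # too small to be a character: treat as noise
        result.append(lookup.get(len(pixels), []))
    return result

image = [
    [1, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
]
print(recognize(image, {4: ["o"], 3: ["l", "i"]}, min_pixels=2))
```

Where a count maps to more than one candidate, as with 'l' and 'i' in this made-up table, the attacker falls back to a random choice among them, as in step c.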

To close this security hole, another scheme, introduced by Microsoft, added 'clutter'. As already discussed, introducing clutter increases the possibility of confusing characters. But it defeated the previous algorithm, which was unable to distinguish between clutter and characters and so could break this scheme only with a poor true positive rate. The challenging problem here, in addition to applying CFS, is to identify what is clutter and what is a character. This is achieved by using a vertical segmentation method and relative positioning criteria in addition to CFS [5]. Let us see this technique briefly:

1. First choose a suitable threshold value and map every pixel value to black or white, making the image a two-color image.

2. Identify the foreground and background pixels.

3. Consider a matrix with the dimensions of the image whose values represent the presence or absence of foreground pixels.

4. Divide the image into vertical segments based on the matrix values (i.e., divide the image at the columns that have no foreground pixels). This results in several image segments called 'chunks' [4].

5. For each chunk, apply CFS.

6. For each object identified, check its position relative to the others.

7. Objects with similar relative positions are considered characters, and the remaining ones are left as arcs or clutter.

The last step is based on conclusions drawn from the observation and study of many such CAPTCHAs. It was observed that characters are more likely to align near the baseline, whereas arcs are not.
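The thresholding, vertical segmentation and baseline test can be sketched as below. The sketch makes several assumptions that are not in [4, 5]: dark text on a light background, a baseline row and tolerance supplied by the caller, and an object's lowest pixel used as its position. The function names are hypothetical.

```python
def binarize(gray, threshold):
    """Map grayscale values to 1 (foreground, darker than threshold) or 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]

def vertical_chunks(image, foreground=1):
    """Split at columns without foreground pixels; returns (start, end) ranges."""
    width = len(image[0])
    has_ink = [any(row[x] == foreground for row in image) for x in range(width)]
    chunks, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a chunk begins at the first inked column
        elif not ink and start is not None:
            chunks.append((start, x))    # end column is exclusive
            start = None
    if start is not None:
        chunks.append((start, width))
    return chunks

def split_by_baseline(objects, baseline_row, tolerance):
    """Objects whose lowest pixel lies near the baseline count as characters."""
    characters, clutter = [], []
    for pixels in objects:               # each object is a list of (row, col) pixels
        bottom = max(row for row, _ in pixels)
        if abs(bottom - baseline_row) <= tolerance:
            characters.append(pixels)
        else:
            clutter.append(pixels)
    return characters, clutter

image = binarize([[0, 0, 255, 255, 0], [0, 0, 255, 255, 0]], 128)
print(vertical_chunks(image))
```

Each chunk would then be passed to CFS, and the objects CFS finds would be sorted into characters and clutter by the baseline test.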

Using this distinction increased the accuracy of the algorithm. Similarly, by identifying the vulnerable characteristic of a particular scheme, a corresponding attack can be designed. Each time an attack succeeds, the design has to be altered to overcome it. Thus the robustness of a CAPTCHA design depends on the attacks identified up to the time of its proposal.


Conclusion

We have seen what a CAPTCHA is and discussed the different types of CAPTCHA and their drawbacks. As text-based CAPTCHAs are the most widely used, we discussed the usability and robustness issues with respect to this kind. The discussion shows that, to achieve the goal of a CAPTCHA, namely that humans should solve it with a probability of at least 90% and machines with a probability of no more than 0.01%, proper usability and robustness levels must be maintained in the design. Zhang's CAPTCHA [2] is a newer kind that is ahead of the attacks discussed here. It is an image-based CAPTCHA in which a set of images is presented to the user, who is asked to drag and drop the image of a particular object into a specific area. This is the basic idea behind it. We must wait and see how long this CAPTCHA holds up without a successful attack on it.