Handwritten character recognition

Chapter 2

Literature Review

Handwritten Character Recognition:

Since writing began as a form of communication, paper has been the dominant medium for it. Over time, electronic media have been replacing paper; because they save space and are fast to access, they continue to gain acceptance. Yet the convenience of paper, its pervasive use for communication and archiving, and the sheer quantity of information already on paper create a demand for fast and accurate methods that automatically read that information and convert it into electronic form [Albadr95].

The potential application areas of automatic reading machines are numerous. One of the earliest and most successful is sorting checks in banks, since the volume of checks circulating daily has proven too large for manual entry. Other applications are described in a later section [Govindan90, Mantas86].

Machine simulation of human reading (i.e., handwritten character recognition, HCR) has been the subject of extensive research for more than five decades. Character recognition is a pattern recognition application whose central aim is to simulate the human ability to read both machine-printed text and handwritten cursive text. Currently available systems may read faster than humans, but they cannot reliably read as wide a variety of text, nor can they take context into account. A great deal of further effort is therefore required to at least narrow the gap between machine and human reading capabilities. The practical importance of HCR applications, together with the intrinsic interest of the HCR problem, has led to strong research interest and considerable advances in this field. Commercial HCR systems for Latin characters are now widely available on personal computers, with reported recognition rates above 99% [McClelland91, Welch93]. Furthermore, systems on the market can now read a variety of writing styles (e.g., handwritten and omni-font printed text) and character sets including Chinese, Japanese, Korean, Cyrillic, and Arabic.

Since the 1950s, researchers have carried out extensive work and published many papers on character recognition. Nearly all of the published work on HCR has addressed Latin, Japanese, or Chinese characters: work on Latin characters began in the mid-1940s, and work on Chinese and Japanese in the mid-1960s. Several surveys and reviews of character recognition are available. Reference may be made to [Mori92] for a historical review of OCR research and development. The survey of [Govindan90] also covers other languages; [Mantas86] gives an overview of character recognition methodologies, [Impedovo91] reviews commercial OCR systems, [Tian91] covers machine-printed OCR, and [Tappert90, Wakahara92] cover on-line handwriting recognition. [Suen80] surveys the automatic recognition of hand-printed characters (viz. numerals, alphanumerics, FORTRAN, and Katakana), while [Nouboud90] reviews the recognition of hand-printed (non-cursive) characters and reports tests on a commercial system. [Bozinovic89, Simon92] survey off-line cursive word recognition, Jain et al. [Jain2000] review statistical pattern recognition methods, and [Plamondon2000] give a comprehensive survey of on-line and off-line handwriting recognition. Two bibliographies of the fields of HCR and document analysis appeared in [Jenkins93, Kasturi92]. [Stallings76, Mori84] surveyed the recognition of machine-printed and hand-printed Chinese characters, respectively, and Liu et al. [Liu2004] reviewed the state of the art in on-line recognition of Chinese characters.

Arabic Character Recognition:

Although almost one billion people worldwide, speaking several different languages, use Arabic characters for writing (Arabic, Persian, and Urdu being the most notable examples), Arabic character recognition has not been researched as thoroughly as Latin, Japanese, or Chinese. The first published work on Arabic character recognition can be traced back to Nazif's 1975 master's thesis [Nazif75]. In it, he developed a system for recognizing printed Arabic characters based on extracting strokes, which he called radicals (20 radicals are used), and their positions. He used correlation between the radical templates and the character image, and included a segmentation phase to segment the cursive text. A few years later, Badi and Shimura [Badi78, Badi80] and Nouh [Nouh80] worked on printed Arabic characters, and Amin [Amin80] on handwritten Arabic characters. Surveys of Arabic optical text recognition (AOTR) can be found in [Amin85a, Amin98, Shoukry89, Jambi91, Albadr95, Nabawi2000, Ahmed94].
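Nazif's system is described above as correlating templates with the character image. The following is a minimal sketch of correlation-based template matching in general, not a reconstruction of Nazif's radical-based method. It assumes characters have already been segmented, binarized, and size-normalized to the same grid as the templates; the names `Image`, `normalized_correlation`, and `match_template` are hypothetical.

```python
from math import sqrt

# Hypothetical representation: a binary image as rows of 0/1 pixels (1 = ink).
Image = list[list[int]]


def normalized_correlation(a: Image, b: Image) -> float:
    """Normalized cross-correlation of two same-size images, in [-1, 1]."""
    if not a or len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must be non-empty and have the same dimensions")
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A blank (or fully inked) image has no shape to correlate against.
        return 0.0
    return cov / sqrt(var_x * var_y)


def match_template(image: Image, templates: dict[str, Image]) -> tuple[str, float]:
    """Return (label, score) of the template that correlates best with the image."""
    scores = {label: normalized_correlation(image, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because the score is normalized, a template does not win merely by containing more ink. Plain template matching of this kind remains sensitive to shifts, slant, and stroke-width variation, which is consistent with the later observation in this chapter that early template-based techniques were not sophisticated enough for unconstrained handwriting.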

On-line systems are restricted to recognizing handwritten text. Some systems recognize isolated characters [Ali89, Amin80, Amin85b, Amin87, ElSheikh89, ElSheikh90b, ElWakil87, ElWakil89, Saadallah85] and handwritten mathematical formulas [ElSheikh90c, Amin91b], while others recognize cursive words [Badi78, Badi80, Badi82, Amin82a, Amin82b, Shaheen90, AlEmami90]. Since segmentation in Arabic is non-trivial, the latter systems deal with a much harder problem.

While several off-line systems use video cameras to digitize pages of text (e.g., [Abbas86, Goraine92, Amin86, HajHassan85, HajHassan90, Nouh80, Nouh87, Nouh89, Sarfraz2003, Sarfraz2004]), the trend now is to use scanners with resolutions of 200 to 400 dots per inch (e.g., [AbdelAzim89c, AbdelAzim90a, AlYousefi88, Amin91a, Bouhlila89, ElDabi90, ElSheikh88a, Ramsis88, Sarfraz2003a, Sarfraz2003b, Zidouri2002, Zidouri2005]). Scanners introduce less noise into an image, are less expensive, and are more convenient to use for character recognition, especially when coupled with automatic document feeders, automatic binarization, and image enhancement.

Among the off-line systems that recognize handwritten isolated characters are [Abuhaiba90, AlYousefi90, AlTikriti85, ElDesouky92, Hyder88]. [Abbas86, AbdelAzim89b, Goneid92] recognize handwritten Arabic (Hindi) numerals, and [Badi80, Badi82, Goraine92, Jambi92, Zahour91] recognize handwritten words. Most off-line systems recognize typewritten cursive words [AbdelAzim89c, AbdelAzim90a, Bouhlila89, ElDabi90, Amin86, ElKhaly90, ElSheikh88b, Goraine89, Khella92, Margner92, Nazif75, Nouh87, Ramsis88, Tolba89, Tolba90, ElRamly89c, HajHassan90, HajHassan91], while [ElSheikh88a, Mahdi89, Mahmoud94, Nouh80, Nouh89, NurulUla88, Fayek92, Sarfraz2005d, Zidouri2005] recognize only typewritten isolated characters. The systems of [Abdelazim90b, AlBadr92, ElGowely90, Kurdy92, Fakir93] are intended to recognize typeset words. One system [Abdelazim89a] recognizes bilingual (Arabic/Latin) typewritten words. Examples of systems for other languages that use Arabic script are [Parhami81, Yalabik88, Hyder88], which are designed to recognize Persian, Ottoman (Old Turkish), and Urdu, respectively.

Uses Of Optical Character Recognition:

Optical character recognition technology has many practical applications that are independent of the language being processed. Some of these applications are the following:

In Banking:

For sorting bank checks, since the number of checks per day has long been far too large for manual sorting.

Data Processing:

For entering data into commercial data-processing files, for example entering the names and addresses of mail-order customers into a database. It can also be used as a worksheet reader for payroll accounting.

In Postal Department:

For reading postal addresses, sorting mail, and reading handwritten and printed postal codes.

In Publishing:

High-quality typescript can be read by recognition equipment directly into a computer typesetting system, avoiding the typing errors that would be introduced by re-keying the text on computer peripheral equipment.

Use by the Blind:

It is used as a reading aid that combines photosensors with tactile stimulators, and as a sensory aid with sound output. It can also be used to read text sheets and to reproduce Braille originals.

In Facsimile Transmission:

Facsimile involves transmitting pictorial data over communication channels. In practice, the pictorial data are mainly text. Instead of transmitting characters in their pictorial representation, a character recognition system could be used to recognize each character and then transmit its text code. Finally, it is worth noting that the major potential application of automatic character recognition is general data entry, automating the work of an ordinary office typist.

Development Of New HCR Techniques:

As HCR and OCR research and development advanced, demand for handwriting recognition also grew, because much data (such as addresses written on envelopes; amounts written on checks; and names, addresses, identity numbers, and dollar values written on invoices and forms) was written by hand and had to be entered into the computer for processing. Early HCR techniques, however, were based mainly on template matching, simple line and geometric features, stroke detection, and their derivatives.
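To make "simple geometric features" concrete, the following is a minimal sketch of one such feature, zoning (the ink density of each cell in a coarse grid), paired with nearest-prototype classification. Zoning is chosen only as an illustration; the text does not say which features early systems used. The sketch assumes binarized, size-normalized character images, and the names `zoning_features` and `nearest_prototype` are hypothetical.

```python
from math import dist

# Hypothetical representation: a binary image as rows of 0/1 pixels (1 = ink).
Image = list[list[int]]


def zoning_features(image: Image, zones: int = 2) -> list[float]:
    """Ink density (fraction of ink pixels) in each cell of a zones x zones grid."""
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * height // zones, (zr + 1) * height // zones
        for zc in range(zones):
            c0, c1 = zc * width // zones, (zc + 1) * width // zones
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            # Cells are ordered row by row: top-left, top-right, bottom-left, ...
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


def nearest_prototype(image: Image, prototypes: dict[str, Image], zones: int = 2) -> str:
    """Label of the prototype whose zoning features are closest in Euclidean distance."""
    query = zoning_features(image, zones)
    return min(
        prototypes,
        key=lambda label: dist(query, zoning_features(prototypes[label], zones)),
    )
```

Such coarse features tolerate small pixel-level differences better than raw template comparison, but they still assume writers produce fairly standard shapes, which is the limitation the handprint models described next tried to work around.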

These early techniques were not sophisticated enough for practical recognition of data handwritten on forms or documents. To cope with this, standards committees in the United States, Canada, Japan, and some European countries designed handprint models in the 1970s and 1980s for people to follow when writing in boxes [7]. Characters written in these prescribed shapes did not vary much in style, so OCR machines could recognize them more easily, especially when the data were entered by controlled groups of people; for example, employees of the same company were asked to write their data following the recommended models. Writers were sometimes asked to follow additional instructions to improve the quality of their samples, for example: write large, close loops, use simple shapes, and do not connect characters. With such constraints, OCR of handprinted characters flourished for a number of years.

Recent Trends And Movements:

Over years of intensive research and development, and with the establishment of several new conferences and workshops such as IWFHR (International Workshop on Frontiers in Handwriting Recognition), ICDAR (International Conference on Document Analysis and Recognition), and others [13], recognition techniques advanced rapidly. Moreover, computers became much more powerful than before. People could write the way they normally did, without having to follow prescribed models, and the topic of unconstrained handwriting recognition gained considerable momentum and grew quickly. Many new algorithms and techniques for preprocessing, feature extraction, and classification have since been developed [8, 9].