Gujarati Handwritten Numeral Optical Character Recognition


In the OCR system proposed for Gujarati numeral recognition, profiles are used for feature extraction and an artificial neural network (ANN) is used for classification. Numerals of different sizes are rescaled to 16 x 16 pixels using the nearest neighbor interpolation (NNI) algorithm. The features of each Gujarati digit are derived from four profiles of the digit: this simple but effective feature extraction technique uses the horizontal, vertical, and two diagonal profiles as templates to identify the various Gujarati numerals. A multilayered feed-forward back-propagation neural network is used for classification. The proposed network consists of three layers with 94, 50, and 10 neurons, respectively. The input layer accepts the profile vector, which is of size 1 x 94, and the output layer has 10 neurons. The proposed network achieved a recognition rate of 82%.
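
The profile features described above can be sketched as follows. This is only an illustrative sketch, assuming that each "profile" is a projection sum of ink pixels along rows, columns, and the two diagonal directions; the function names `resize_nearest` and `profile_vector` are hypothetical and not taken from the original work. Under that assumption a 16 x 16 image yields 16 + 16 + 31 + 31 = 94 values, which matches the stated input size.

```python
import numpy as np

def resize_nearest(img, size=16):
    """Nearest-neighbour resize of a 2-D array to size x size."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]

def profile_vector(img, size=16):
    """Concatenate horizontal, vertical and two diagonal projection profiles.

    Assumption: a profile is the count of ink pixels (value 1) along a line.
    """
    g = resize_nearest(np.asarray(img, dtype=np.uint8), size)
    horizontal = g.sum(axis=1)                 # one value per row (16)
    vertical = g.sum(axis=0)                   # one value per column (16)
    offsets = range(-(size - 1), size)         # 31 diagonals per direction
    diag = np.array([np.trace(g, offset=k) for k in offsets])
    anti = np.array([np.trace(np.fliplr(g), offset=k) for k in offsets])
    return np.concatenate([horizontal, vertical, diag, anti])
```

The resulting 94-element vector would then be passed to the classifier; the network itself is not sketched here.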

2010- Word level Script Identification from Bangla and Devanagri Handwritten Texts mixed with Roman Script


A script separation technique was proposed by [ ]. Text lines and words are extracted from the document pages using a script-independent Neighboring Component Analysis technique. A Multi Layer Perceptron (MLP) based classifier is designed for script separation and trained with 8 different word-level holistic features. On independent text samples, word-level script identification accuracies of 99.29% and 98.43% are achieved.

2010- Segmentation of Handwritten Hindi Text

International Journal of Computer Applications (0975 - 8887), Volume 1 - No. 4

In text documents with variable skew, header line detection is one of the most challenging tasks. The line segmentation approach proposed by [ ] is based on the structure of handwritten Hindi text documents. A two-stripe projection approach is used for detecting the header line and base line in the document. The stripe height is computed from statistical information about the average line height. Character segmentation is carried out by the vertical projection method. After the header position in the text line has been extracted, horizontal profile information is used to identify the presence of upper and lower modifiers. A vertical projection is then computed, and any column with zero black pixels is treated as a delimiter between characters. Segmentation accuracies of 91.5%, 98.1%, 79.12%, 95.5% and 82.6% are reported for lines, words, consonants, ascenders and lower modifiers, respectively.
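
The vertical projection step can be illustrated with a short sketch. It assumes a binary image in which 1 marks a black (ink) pixel and assumes the header line rows have already been removed, since otherwise the header would join all characters; the function name `split_on_empty_columns` is hypothetical.

```python
import numpy as np

def split_on_empty_columns(line_img):
    """Return (start, end) column ranges separated by columns with no black pixels.

    line_img is assumed to be a 2-D array with 1 for black (ink) pixels.
    """
    projection = np.asarray(line_img).sum(axis=0)   # black pixels per column
    segments, start = [], None
    for col, count in enumerate(projection):
        if count > 0 and start is None:
            start = col                              # a segment begins
        elif count == 0 and start is not None:
            segments.append((start, col))            # an empty column ends it
            start = None
    if start is not None:                            # segment touching the right edge
        segments.append((start, len(projection)))
    return segments
```

Each returned range is half-open, so `line_img[:, start:end]` would give one candidate character.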

2010- Preferred Computational Approaches for the Recognition of different Classes of Printed Malayalam Characters using Hierarchical SVM Classifiers

International Journal of Computer Applications (0975 - 8887) Volume 1 - No. 16

A novel technique is proposed [ ] to segment printed Malayalam Characters. Malayalam characters can be represented with a maximum of three segments. The first segment could have either none or possibly one or two left vowel signs (a unique case). The second segment would be the core character which could be either a vowel or a consonant or a conjunct while the third segment could again have either none or one of the seven right vowel signs. The subcharacters within the word are extracted using the smaller valley of vertical projections. The logic used behind the choice of search space for the classifier is based on the sequence of arrangement of the segments of the character. This logic facilitates accurate segmentation.

The authors use a feature extraction technique based on the distinctive structural features of machine-printed text lines in the script. The final recognition is achieved through Support Vector Machine (SVM) classifiers. The proposed algorithms have been tested on a variety of printed Malayalam documents, and recognition rates between 97.72% and 98.78% are reported.

2009- Kannada Character Recognition System: A Review

Inter JRI Science and Technology, Vol. 1, Issue 2, pp 31-42, July 2009

K. Indira, S. Sethu Selvi

Kumar and Ramakrishnan [ ] described a three-stage character segmentation for separating Kannada characters from a segmented word. The three-line segmentation divides each character into three segments: the top zone consists of top matras, the middle zone consists of base and compound characters, and the bottom zone consists of consonant conjuncts. Head line and base line information is extracted from the Horizontal Projection Profile. The head line is the index of the maximum in the top half of the profile, and the base line is the index of the maximum in the bottom half of the profile. Using the base line information, the text region in the middle and top zones of a word is extracted and its Vertical Projection Profile (VPP) is obtained. The consonant conjuncts are segmented separately based on Connected Component Analysis (CCA).

The CCA approach fails when the conjunct is connected to the character in the middle zone, and in consonant segmentation. In such cases a segmentation technique based on CCA would falsely recognize the consonant conjunct for both characters. K. Indira and S. Sethu Selvi described [ ] a method of segmenting the characters that addresses this problem: the bottom matra is isolated from the remaining character by removing the middle zone characters before applying CCA.

Kannada is a non-cursive script and the individual characters in a word are isolated, so the spacing between characters can be used for segmentation. However, the script contains conjunct-consonant (subscript) characters whose positions overlap, in the vertical direction, with the two adjacent main characters, and this causes character segmentation using VPP to fail.
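
The head line and base line rule described above can be sketched directly. This is a minimal illustration, assuming a binary word image with 1 for ink pixels; the function name `head_and_base_line` is hypothetical.

```python
import numpy as np

def head_and_base_line(word_img):
    """Locate head line and base line rows from the horizontal projection profile.

    Head line: row of the profile maximum in the top half of the image.
    Base line: row of the profile maximum in the bottom half of the image.
    """
    hpp = np.asarray(word_img).sum(axis=1)      # ink pixels per row
    mid = len(hpp) // 2
    head = int(np.argmax(hpp[:mid]))
    base = mid + int(np.argmax(hpp[mid:]))
    return head, base
```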

Kunte and Samuel proposed [ ] a two stage method for segmentation of Kannada characters where the subscripts are separated in the first step using CCA from the word and the remaining characters in the word are extracted by using VPP.
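
The first stage of such a method relies on connected component analysis. The following is a generic sketch of 8-connected component labelling by breadth-first search, not the authors' implementation; the function name `connected_components` is hypothetical, and the rule for deciding which components are subscripts is not given in the text and is left out.

```python
from collections import deque

def connected_components(img):
    """Return a list of 8-connected foreground components as lists of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):            # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components
```

Once the subscript components have been removed from the word, the remaining characters could be separated with a vertical projection profile.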

2010- Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script

Galaxy Bansal, Dharamveer Sharma

International Journal of Computer Applications (0975 - 8887) Volume 1 - No. 24

Segmentation of handwritten words is a challenging task, primarily because of the structural features of the script and varied writing styles. Handwritten words are also prone to the problem of overlapped, connected, merged and broken characters. Based on the structural properties of Gurmukhi script, different zones, viz. upper, middle and lower, are detected across the height of the word. Upper zone information is used to extract vowels. The middle zone is used to extract consonants and some subparts of vowels. A vowel has a single connection with the headline, whereas a consonant can be connected to the headline at one or two locations. Lower zone information is used to extract some vowels and certain half characters that lie at the foot of a full character.

A new approach is proposed [ ] to identify the headline in handwritten Gurmukhi text that has uneven headlines and touching and overlapping characters in all three zones (upper, middle and lower). The method is based on statistical analysis of the pixel distribution of the script in all three zones.

Segmentation accuracy of 72.6% has been achieved with the use of the algorithms for segmenting all types of words. Segmentation accuracy of 88.1% has been achieved for segmenting all types of handwritten words in Gurmukhi script.

2010- Invariant Moments Based Feature Extraction for Handwritten Devanagari Vowels Recognition, R. J. Ramteke

International Journal of Computer Applications (0975 - 8887) Volume 1 - No. 18

A Handwritten Devanagari Character Recognition (HDCR) system is proposed [ ] to recognize handwritten Devanagari vowels using invariant moments. The technique is independent of size, slant, orientation, translation and other variations in handwritten vowels. The header line (Shirorekha) plays a vital role in segmenting Devanagari words. Vertical and horizontal projections are adapted to isolate the vowels from different groups. An attempt is made to enhance the performance of the system by computing invariant moments under small perturbations of the image and extracting information from the perturbation. Each image is normalized to 40 x 40 pixels. The fuzzy Gaussian membership function is adopted for classification. The success rate of the method is reported as 94.56%.
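
To make the idea of invariant moments concrete, the sketch below computes the first two of Hu's moment invariants and a Gaussian membership value. It is an illustration only: the text does not say which set of invariant moments was used, so the choice of Hu's invariants is an assumption, and the names `hu_first_two` and `gaussian_membership` are hypothetical.

```python
import numpy as np

def hu_first_two(img):
    """First two Hu moment invariants of a 2-D binary or grey image."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00   # centroid

    def eta(p, q):
        # Central moment, normalised so that it is scale invariant.
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

def gaussian_membership(x, mean, sigma):
    """Gaussian fuzzy membership of feature value x in a class (mean, sigma)."""
    return np.exp(-((x - mean) ** 2) / (2 * sigma ** 2))
```

In a classifier of this kind, each class would hold a mean and spread per feature estimated from training samples, and a test sample would be assigned to the class with the highest combined membership; how the memberships are combined is not specified in the text.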

2010- Handwritten Bangla Basic and Compound character recognition using MLP and SVM classifier, Nibaran Das et al.

Journal of Computing, Vol. 2, Issue 2, Feb 2010

A novel approach is presented [ ] for recognition of handwritten compound Bangla characters, along with the basic characters of the Bangla alphabet, using MLP and SVM classifiers. The approach identifies the compound character classes in order from most to least frequently occurring, i.e., in order of importance in present-day Bangla literature. Only the first 55 of the 160 compound characters, which together cover 90% of occurrences by frequency, are considered. Average recognition rates of 79.25% using MLP and 80.51% using SVM are reported after three-fold cross validation of the data.

2010- A Novel Approach for Handwritten Devanagari Character Recognition

Sandhya Arora et al.


A method for recognition of handwritten Devanagari characters is described [ ]. It is based on a feature vector constituted by accumulated directional gradient changes in different segments, the number of intersection points of the character, the type of spine present and the type of shirorekha present in the character. A Multi-layer Perceptron with conjugate-gradient training is used to classify these feature vectors. A recognition rate of 88.12% is reported for this method.

2009- Transliteration Based Text Input Methods for Telugu

V.B. Sowmya and Vasudeva Varma

W. Li and D. Mollá-Aliod (Eds.): ICCPOL 2009, LNAI 5459, pp. 122-132, 2009.

Springer-Verlag Berlin Heidelberg 2009

Sowmya et al. proposed [ ] a transliteration-based text input method in which users type Telugu using Roman script. The method uses an edit-distance based approach, giving a lightweight system with good efficiency for a text input method. The approach was tested with three datasets (general data, countries and places, and person names) and was found to work considerably well.
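
The edit-distance idea can be illustrated with the standard Levenshtein distance. This is a generic sketch, not the authors' system: the lexicon of Romanized spellings and the helper `closest_words` are hypothetical, and the mapping from a matched Romanized form to Telugu script is omitted.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]

def closest_words(typed, lexicon, k=3):
    """Return the k lexicon entries nearest to the typed Roman string."""
    return sorted(lexicon, key=lambda w: edit_distance(typed, w))[:k]
```

A user typing an approximate Roman spelling would then be offered the nearest lexicon entries as candidates.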

2009- Segmentation of Touching Characters in Upper Zone in Printed Gurmukhi Script

M. K. Jindal, R. K. Sharma and G. S. Lehal

Compute 2009, Jan 9, 10, Bangalore, Karnataka, India. ACM ISBN978-1-60558-476-8

A new technique is presented [ ] by Jindal et al. for segmenting touching characters in the upper zone of printed Gurmukhi script. The technique is based on the structural properties of Gurmukhi characters. Top profile projection information is used to identify the concavity and convexity of the characters, which is in turn used to segment the touching characters in the upper zone. A recognition rate of 91% is reported.

2009- Rule based segmentation of lower modifiers in complex Bangla scripts

Md. Abul Hasnat, Mumit Khan

Abul Hasnat and Mumit Khan presented [ ] a dissection-based method that segments the lower modifiers in Bangla script. The approach avoids over-segmenting units that do not actually contain any lower modifier, which would otherwise lead to unacceptably high error rates during segmentation. The methodology is divided into four tasks. First, the lower modifier separator line is identified using character height information, and the primary lower modifiers are identified. The lower modifier units are then extracted using the features of the core units and the lower modifiers. Finally, empirical rules such as aspect ratio, together with dictionary information, are used to eliminate segmentation errors. A segmentation accuracy of 99.6% is reported.

2009- Offline Handwritten Character Recognition of Gujrati Script using Pattern Matching

Jayashree R. Prasad, Dr. U.V.Kulkarni and Rajesh S. Prasad

Jayashree et al. proposed [ ] a pattern matching method for recognizing characters in Gujarati script. In this method the shape and features of a character are used to distinguish it from other characters, and a neural network is used for pattern matching. The handwritten character image document, in PNG format, is passed through a median filter to remove salt-and-pepper noise. The image is then inverted and thinned, and segmented using an 8 x 8 grid over the character. The resulting 8 x 8 matrix, i.e. 64 bits, is fed to the neural network. The neural network model is then fed forward to the template matching algorithm.

2009- Identification of Telugu Devanagari and English scripts using discriminating features

M C Padma and P A Vijaya

International Journal of Com. Science & Infor. Tech. (IJCSIT), Vol 1, No 2, November 2009

Padma et al. proposed [ ] a method to discriminate printed multi-script lines in a tri-lingual document using projection profile features. The distinct visual appearance of each script is due to the presence of segments such as horizontal lines, vertical lines, upward curves, downward curves, descendants and so on. The presence of such segments in a particular script serves as a visual clue by which a human can identify even an unfamiliar script. The method uses distinct features extracted from the top and bottom profiles of each printed text line. The performance of the method is reported as 99.67%.
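
Top and bottom profiles of a text line can be computed as in the sketch below. It assumes the common definition in which the top (bottom) profile records, for each column, the row of the first (last) ink pixel; the function name `top_bottom_profiles` and the use of -1 for empty columns are choices made for this illustration. The discriminating features derived from these profiles are not detailed in the text and are not reproduced.

```python
import numpy as np

def top_bottom_profiles(line_img):
    """For each column, row index of the first and last ink pixel (-1 if empty)."""
    img = np.asarray(line_img) > 0
    has_ink = img.any(axis=0)
    top = np.where(has_ink, img.argmax(axis=0), -1)
    # Scan the vertically flipped image to find the last ink row per column.
    bottom = np.where(has_ink,
                      img.shape[0] - 1 - img[::-1].argmax(axis=0), -1)
    return top, bottom
```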

2009- Elimination of splitting errors in printed Bangla scripts

Md. Abul Hasnat, Mumit Khan

Abul Hasnat and Mumit Khan proposed [ ] a dissection-based segmentation method that eliminates over-segmentation errors across a wide range of document images. During segmentation, the pixel information of the basic units is kept intact because these units are sensitive to splitting errors. Feature information of the units in a word of Bangla script is used, and segmentation is carried out according to several rules. A success rate of 99.93% in eliminating splitting errors is reported.

2009- Development of a Multi-User Recognition Engine for Handwritten Bangla Basic Characters and Digits

Sandip Rakshit et al.

Proc. (CD) Int. Conf. on Information Technology and Business Intelligence (2009)

Sandip Rakshit et al. describe [ ] a process to develop a multi-user recognition engine for handwritten Bangla basic characters and digits using the Tesseract OCR engine. Handwritten data samples of isolated Bangla basic characters and digits are collected from three users. Tesseract is trained with user-specific data samples of document pages to generate separate user-models, each representing a unique language-set. Each such language-set recognizes isolated basic Bangla handwritten test samples collected from the designated user. The user-specific character/digit recognition accuracies are reported as 90.66%, 91.66% and 96.87%, respectively. The overall basic character-level and digit-level accuracies of the system are reported as 92.15% and 97.37%.

The system fails to segment 12.33% of characters and 15.96% of digits, and erroneously classifies 7.85% of characters and 2.63% of digits on the overall dataset.

2009- Comparative analysis of Radon and Fan-beam based feature extraction techniques for Bangla character recognition

M. A. Naser, Adnan Mahmud, T. M. Arefin, Golam Sarowar, M. M. Naushad Ali

International Journal of Computer Science and Network Security, VOL.9 No.9, Sep 2009

Naser et al. present [ ] a comparative analysis of two projection-based feature extraction techniques, namely Radon and fan-beam. Radon and fan-beam projections are used to compute feature vectors of Bangla characters. The extracted features are simulated for recognition and the recognition rates of the two methods are compared. The feature vectors were generated from 4 of the most commonly used fonts of Bangla printed text. These features are classified using two classifiers, namely an Artificial Neural Network (ANN) Multi-layer Perceptron and a k-nearest neighbor (KNN) classifier. KNN is reported to be the best classifier, with 99% efficiency for both Radon and fan-beam projections, against 98% and 67% efficiency with ANN for Radon and fan-beam projections, respectively. KNN is also reported to require less time than ANN.
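
For reference, a k-nearest neighbor classifier over feature vectors can be sketched in a few lines. This is a generic illustration assuming Euclidean distance and majority voting; the value of k and the distance measure used in the study are not stated in the text, and the function name `knn_predict` is hypothetical. The Radon and fan-beam feature extraction itself is not shown.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]                 # indices of the k closest samples
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

KNN needs no training phase, which is consistent with the lower time requirement reported relative to the ANN.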

2009- An Efficient OCR for Printed Malayalam Text using Novel Segmentation Algorithm and SVM Classifiers

Bindu Philip and R. D. Sudhaker Samuel

International Journal of Recent Trends in Engineering, Issue. 1, Vol. 1, May 2009

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Malayalam, a South Indian language. Indian scripts are rich in patterns, and the combinations of such patterns make the problem even more complex; these complex patterns are exploited to arrive at the solution. The system segments the scanned document image into text lines, words, and further into characters and sub-characters. The proposed segmentation algorithm is motivated by the structure of the script. A novel set of features that are computationally simple to extract is proposed.

Bindu Philip and R. D. Sudhaker Samuel proposed [ ] a segmentation approach based on the distinctive structural features of machine-printed text lines in Malayalam script. A lateral cross-sectional analysis is performed along each row of the normalized binary image matrix, resulting in distinct features. A Support Vector Machine (SVM) classifier is used in the recognition process, and recognition rates between 90.22% and 95.31% are reported.

2009- A Novel Zone Based Feature Extraction Algorithm for Handwritten Numeral Recognition of Four Indian Scripts

S.V. Rajashekararadhya, P. Vanaja Ranjan

Digital Technology Journal 2009, Vol. 2, pp. 41 ff., Technical University of Ostrava, FEECS, 2009.

Rajashekararadhya and Vanaja Ranjan proposed [ ] a zone centroid and image centroid based angle feature extraction method for handwritten numeral recognition. The character centroid is computed and the image (character/numeral) is divided into n equal zones. The average angle from the character centroid to the pixels present in each zone is computed (one feature). Similarly, the zone centroid is computed and the average angle from it to the pixels in the zone is taken, giving two features per zone. Nearest neighbor, feed-forward back-propagation neural network and support vector machine classifiers are used for subsequent classification and recognition. Recognition rates of 97.3%, 96.2%, 93.5% and 93.6% are reported for Kannada, Telugu, Tamil and Malayalam numerals, respectively.
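
A rough sketch of such zone-based angle features is given below. It assumes a square grid of zones, measures angles with `arctan2`, and takes a plain arithmetic mean of the angles, which ignores wrap-around at plus or minus pi; these are simplifications for illustration, and the function name `zone_angle_features` is hypothetical. Empty zones are given zero features here, which is also an assumption.

```python
import numpy as np

def zone_angle_features(img, grid=2):
    """Two features per zone: mean angle of zone pixels seen from the image
    centroid, and mean angle of zone pixels seen from the zone centroid."""
    img = np.asarray(img) > 0
    ys, xs = np.nonzero(img)
    cy, cx = ys.mean(), xs.mean()                  # image (character) centroid
    h, w = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            r0, r1 = i * h // grid, (i + 1) * h // grid
            c0, c1 = j * w // grid, (j + 1) * w // grid
            zy, zx = np.nonzero(img[r0:r1, c0:c1])
            if zy.size == 0:                       # empty zone: assumed zero features
                feats += [0.0, 0.0]
                continue
            zy, zx = zy + r0, zx + c0              # back to image coordinates
            from_image = np.arctan2(zy - cy, zx - cx).mean()
            from_zone = np.arctan2(zy - zy.mean(), zx - zx.mean()).mean()
            feats += [from_image, from_zone]
    return np.array(feats)
```

With a grid of g x g zones the vector has 2 * g * g entries, which would then be passed to one of the classifiers mentioned above.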

2009- A hierarchical approach to recognition of handwritten Bangla characters

Subhadip Basu et al.


Because consecutive characters of Bangla words may appear at overlapping character positions, segmentation of Bangla word images is not easy. For successful OCR of handwritten Bangla text, not only recognition but also segmentation of word images is important. The present hierarchical approach therefore deals with both segmentation and recognition of handwritten Bangla word images, aiming at a complete solution to the handwritten word recognition problem, an essential area of OCR of handwritten Bangla text. For a certain category of word segments, created on the Matra hierarchy, a sophisticated recognition technique, viz. a two-pass approach, is employed.

Subhadip Basu et al. presented [ ] a novel hierarchical approach for optical character recognition of handwritten Bangla words. A two-pass approach is employed for recognizing middle zone character segments: the approach segments a word image on the Matra hierarchy, then recognizes the individual word segments. The constituent characters of the word image are identified through an intelligent combination of the recognition decisions of the associated word segments. A powerful feature set is also proposed for recognition of complex character patterns using three types of topological features, viz. longest-run features, modified shadow features and octant-centroid features.

2008- Segmentation of Handwritten Text in Gurmukhi Script

Rajiv K. Sharma, Amardeep Singh

International Journal of Computer Science and Security, volume (2) issue (3) pp.12-17.

Character segmentation is an important preprocessing step for text recognition. The size and shape of characters generally play an important role in the process of segmentation. For any optical character recognition (OCR) system, however, the presence of touching characters in textual as well as handwritten documents drastically decreases both the correct segmentation rate and the recognition rate. Because one cannot control the size and shape of characters in handwritten documents, segmentation of handwritten documents is especially difficult.

Rajiv K Sharma and Amardeep Singh proposed [ ] a method to segment the hand written text in Gurmukhi script. This method uses the profile information to segment the lines, words and characters in the script.

2008- Handwritten Character Recognition of Popular South Indian Scripts

Umapada Pal, Nabin Sharma, Tetsushi Wakabayashi, and Fumitaka Kimura

Springer-LNCS Conf.

In [ ] Umapada Pal et al. proposed a quadratic classifier based scheme for the recognition of off-line handwritten characters of three popular south Indian scripts: Kannada, Telugu, and Tamil. Directional information is used to obtain the features of the characters. Features are obtained by segmenting the character into blocks and down-sampling each block into sub-blocks. The features of the down-sampled blocks are fed to a modified quadratic classifier for recognition. Two sets of features are used, one for high recognition speed and one for high accuracy: 64-dimensional features are used for high-speed recognition and 400-dimensional features are used for high-accuracy recognition. Accuracy rates of 90.34%, 90.90%, and 96.73% are reported for Kannada, Telugu, and Tamil characters, respectively, using the 400-dimensional features.

2008- Document Image Segmentation as a Spectral Partitioning Problem

Praveen Dasigi, Raman Jain and C V Jawahar

Sixth Indian Conference on Computer Vision, Graphics & Image Processing

In [ ] Praveen et al. describe a segmentation approach based on the spectral properties of the pairwise similarity matrix. The approach is based on the global properties of the document. Euclidean distance, co-occurrence probability, whitespace area, gutter area and global geometry boosts are the parameters used for segmentation of the document image.
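
The general idea of spectral partitioning can be shown with a small sketch that splits a set of components in two using the second eigenvector (the Fiedler vector) of the graph Laplacian. It uses the unnormalized Laplacian and assumes a symmetric, non-negative similarity matrix is already available; the particular spectral formulation and the way the listed parameters are combined into similarities are not given in the text, so this is not the authors' algorithm. The function name `spectral_bipartition` is hypothetical.

```python
import numpy as np

def spectral_bipartition(similarity):
    """Split items into two groups by the sign of the Fiedler vector.

    similarity: symmetric matrix of non-negative pairwise similarities.
    Returns a boolean array; items sharing a value are in the same group.
    """
    w = np.asarray(similarity, dtype=float)
    degree = w.sum(axis=1)
    laplacian = np.diag(degree) - w             # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(laplacian)         # eigenvalues in ascending order
    fiedler = vecs[:, 1]                        # second-smallest eigenvector
    return fiedler >= 0
```

Applying the split recursively to each group would give a hierarchy of document regions.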

2008- Combining Spatial and Transform Features for the Recognition of Middle Zone Components of Telugu

ASCS Sastry, L Satyaprasad, P.Paul Clee, L.Pratap Reddy

TENCON 2008-2008 IEEE Region 10 Conference

Sastry et al. propose [ ] a method to extract middle zone components by combining a component model and a zone separation model on Telugu document images. Recognition of middle zone components is achieved by combining spatial features, for understanding the topological characteristics, with transform features, for effective classification. Euler number, compact ratio and Zernike moments are used as features for a tree classifier. Middle zone components are identified by an unsupervised training strategy.

2008- Canonical Syllable Segmentation of Telugu Document Images

L.Pratap Reddy, ASCS Sastry, A.V.Srinivasa Rao, N.Venkata Rao

TENCON 2008-2008 IEEE Region 10 Conference

In [ ] Pratap et al. propose a classical approach to the segmentation of canonical syllables in Telugu document images. The model presented is based on the canonical structure of the Telugu script. The relation between zones and components is established in the segmentation process of the canonical syllable. The components in the script are classified into six different classes. An individual component associated with only the top zone, only the middle zone or only the bottom zone is classified as Top Zone (TZ class), Middle Zone (MZ class) or Bottom Zone (BZ class), respectively. A component associated with both the TZ and MZ classes is treated as TMZ class, one associated with both the MZ and BZ classes is treated as MBZ class, and a component associated with all three of the TZ, MZ and BZ classes is treated as TMBZ class. The components associated with the TZ and BZ classes independently are referred to as other components. Components falling in the MZ class and TMZ class are referred to as core (essential) components. For the MBZ and TMBZ classes, the question of whether a component is a core or other component is resolved by computing the pixel density of the component in the respective zones.
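
The six-way classification can be sketched as a lookup on which zones a component's rows fall in. The sketch assumes the zones are delimited by a head line row and a base line row, with rows above the head line in the top zone and rows below the base line in the bottom zone; this boundary convention and the function name `zone_class` are assumptions for illustration, and the pixel density test for MBZ and TMBZ components is not shown.

```python
def zone_class(top_row, bottom_row, head_line, base_line):
    """Classify a connected component by the zones its row span touches.

    Assumed convention: rows < head_line are the top zone, rows from
    head_line to base_line are the middle zone, rows > base_line are the
    bottom zone. Requires top_row <= bottom_row and head_line <= base_line.
    """
    in_top = top_row < head_line
    in_bottom = bottom_row > base_line
    in_middle = top_row <= base_line and bottom_row >= head_line
    return {
        (True, False, False): "TZ",
        (False, True, False): "MZ",
        (False, False, True): "BZ",
        (True, True, False): "TMZ",
        (False, True, True): "MBZ",
        (True, True, True): "TMBZ",
    }[(in_top, in_middle, in_bottom)]
```

Because a connected component occupies a contiguous range of rows, it cannot touch the top and bottom zones without also touching the middle zone, which is why only six combinations arise.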

2008- Bangla Handwritten Pin Code String Recognition for Indian Postal Automation

Umapada Pal, Kaushik Roy, Fumitaka Kimura


Umapada Pal et al. proposed [ ] a pin code recognition system based on a lexicon-free word recognition approach. In this method the water reservoir concept is applied to pre-segment a pin code string into possible primitive components (individual digits or their parts). The pre-segmented components of the pin code are then merged into possible digits to obtain the best pin code. To merge these primitive components into digits and find the optimum segmentation, dynamic programming (DP) is applied using the total likelihood of the digits as the objective function. The likelihood of a digit is computed with the modified quadratic discriminant function (MQDF). The features used in the MQDF are based on the directional information of the components. The proposed system is reported to give 99.08% reliability on handwritten Bangla pin codes when the rejection and error rates are 19.28% and 0.74%, respectively.
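
The dynamic programming step can be sketched as follows. This is an illustration under stated assumptions: `log_likelihood(i, j)` is a hypothetical callback standing in for the MQDF score of primitives i to j-1 merged into one digit, `max_merge` is an assumed cap on how many primitives form a digit, and the constraint on the number of digits in a pin code is left out for brevity.

```python
import math

def best_segmentation(n_primitives, log_likelihood, max_merge=3):
    """Group consecutive primitives into digits maximising total log-likelihood.

    Returns (list of (start, end) groups, best total score).
    """
    best = [-math.inf] * (n_primitives + 1)   # best[j]: best score for first j primitives
    back = [0] * (n_primitives + 1)           # back[j]: start of the last group ending at j
    best[0] = 0.0
    for j in range(1, n_primitives + 1):
        for i in range(max(0, j - max_merge), j):
            score = best[i] + log_likelihood(i, j)
            if score > best[j]:
                best[j], back[j] = score, i
    groups, j = [], n_primitives
    while j > 0:                              # trace the chosen cuts backwards
        groups.append((back[j], j))
        j = back[j]
    return groups[::-1], best[n_primitives]
```

Summing log-likelihoods corresponds to multiplying digit likelihoods, so the grouping with the highest total is the one the objective function prefers.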