Multitouch Gesture Generation and Recognition Techniques
4252 words (17 pages) Essay
18th Aug 2017 Computer Science
Abstract: A huge number of users rely on smartphones to communicate with each other. A smartphone user is exposed to various threats when using the phone for communication. These threats can disrupt the operation of the smartphone and transmit or modify user data [1]. Applications must therefore guarantee the privacy and integrity of the information. Single-touch mobile security does not protect confidential data well enough, so we are moving towards multitouch mobile security for stronger protection. In computing, multitouch is a technology that enables a surface to recognize the presence of more than one point of contact with the touch screen [2]. Multiple touch points can be used to authenticate a user before granting access to confidential data on a mobile phone. We present our study of biometric gestures that authenticate a user through multitouch finger points for greater security [1].
Keywords: Multitouch, biometric gesture, authentication, security, smartphone, finger tracking, Android operating system.
 Introduction
Today's IT admins face the troublesome task of managing the countless mobile devices that connect to enterprise networks every day. Securing mobile devices has become increasingly important as the number of devices in operation, and the uses to which they are put, have expanded worldwide. The problem is compounded within the enterprise, where the ongoing trend among IT users and organizations is resulting in more and more employee-owned devices connecting to the corporate network. Authentication is the process in which the credentials provided are compared to those on file in a database of valid users' information on an operating system. If the credentials match, the process is completed and the user is granted authorization for access to the system. The permissions and folders returned define both the environment the user sees and the way he can interact with it, including the amount of access and other rights such as the amount of allocated storage space and other services [1].
The most common computer authentication method is to use alphanumerical (text-based) usernames and passwords. This method has been shown to have some disadvantages. For example, users tend to pick passwords that can be easily guessed, while passwords that are hard to guess are often hard to remember. To address this problem, some researchers have developed authentication techniques that use multitouch biometric gestures as passwords.
Multitouch, in a computing environment, is an interface technology that enables input gestures on multiple points on the surface of a device. Although most commonly used with touch screens on handheld devices such as smartphones and tablets, multitouch has been used for other surfaces as well, including touch pads, whiteboards, tables and walls [2].
In other words, multitouch refers to the capability of a touch screen (or a touchpad) to recognize two or more points of contact on the surface simultaneously. The constant tracking of the multiple points allows the interface to recognize gestures, which enable advanced functionality such as pinch-to-zoom. Gesture recognition is the practice of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion but commonly originate from the face or hand; the identification and recognition of posture and human behaviours is also a subject of gesture recognition techniques.
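We used the Equal Error Rate (EER) to measure accuracy. This is the rate at which the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) are equal. To find out whether using multiple gestures would improve the system's performance, we combined the scores of 2 different gestures from the same user in the same order and evaluated the EER of the combined gestures, where:
$$\mathrm{FAR} = \frac{\text{number of impostor attempts accepted}}{\text{total number of impostor attempts}}$$
$$\mathrm{FRR} = \frac{\text{number of genuine attempts rejected}}{\text{total number of genuine attempts}}$$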
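As a rough illustration of how these rates could be computed, the sketch below assumes that every authentication attempt produces a similarity score (higher means a closer match) and that an attempt is accepted when its score reaches a threshold. The function names, the threshold scan and the sum used to fuse two gestures are assumptions made for this example, not the procedure used in [2].

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for one threshold; scores >= threshold are accepted."""
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold
    and keeping the one where FAR and FRR are closest to each other."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


def combine_scores(scores_a, scores_b):
    """Fuse two gestures by adding their per-attempt scores (one simple choice)."""
    return [a + b for a, b in zip(scores_a, scores_b)]


# Hypothetical scores for four genuine and four impostor attempts.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.3, 0.5, 0.2]
eer, threshold = equal_error_rate(genuine, impostor)
```

With real data the scan would normally be run over many more attempts, and the fused scores from `combine_scores` would be passed to `equal_error_rate` in the same way as single-gesture scores.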
 Developing a Gesture Authentication Technique
Biometric systems are an effective way to authenticate valid users in mobile authentication, generally based on the “something they are” property [2]. The goal of biometric identification is the automatic verification of the identity of a living person by means of distinctive gestures that only he possesses.
Figure1: Multitouch behavior
The biometric authentication system has two phases: an enrollment phase and an authentication phase. A new user must first record his secret hand signs in the enrollment phase. The user performs hand signs of his own choosing, with sufficient space for hand movement, during this registration phase.
 Gesture Taxonomy [1]
1. Parallel: All fingertips move in the same direction during the gesture. For example, a swipe during which all 5 fingers move from left to right across the screen.
2. Closed: All fingertips move inward toward the center of the hand. For example, a pinch gesture.
3. Opened: All fingertips move outward from the center of the hand. For example, a reverse pinch gesture.
4. Circular: All fingertips rotate around the center of the hand. For example, a clockwise or counterclockwise rotation [1].
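The four categories above can be told apart from the start and end position of each fingertip. The following is a minimal sketch under stated assumptions: the center of the hand is approximated by the mean of the starting touch points, and the 0.9 cut-off used for the parallel test is an arbitrary illustrative value. The function name and the rules are hypothetical and are not taken from [1].

```python
import math


def classify_gesture(starts, ends):
    """Label a multitouch movement as parallel, closed, opened or circular.

    starts, ends: lists of (x, y) fingertip positions, in the same finger order.
    """
    n = len(starts)
    # Approximate the center of the hand by the mean starting position.
    cx = sum(x for x, _ in starts) / n
    cy = sum(y for _, y in starts) / n
    moves = [(ex - sx, ey - sy) for (sx, sy), (ex, ey) in zip(starts, ends)]

    mean_dx = sum(dx for dx, _ in moves) / n
    mean_dy = sum(dy for _, dy in moves) / n
    mean_len = sum(math.hypot(dx, dy) for dx, dy in moves) / n
    if mean_len == 0:
        return "none"
    # Parallel: the displacement vectors add up with almost no cancellation.
    if math.hypot(mean_dx, mean_dy) > 0.9 * mean_len:
        return "parallel"

    radial = tangential = 0.0
    for (sx, sy), (dx, dy) in zip(starts, moves):
        rx, ry = sx - cx, sy - cy
        r = math.hypot(rx, ry) or 1.0
        radial += (dx * rx + dy * ry) / r      # movement away from the center
        tangential += (rx * dy - ry * dx) / r  # movement around the center
    if abs(radial) >= abs(tangential):
        return "opened" if radial > 0 else "closed"
    return "circular"


touch_down = [(100, 0), (0, 100), (-100, 0), (0, -100)]
pinch = classify_gesture(touch_down, [(50, 0), (0, 50), (-50, 0), (0, -50)])
```

A production classifier would look at the whole trajectory rather than only the end points, but the radial and tangential split shown here is enough to separate the four classes in simple cases.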
Figure1: Single touch
 Matching Touch Sequences to Specific Fingers:
 Hidden Markov Models [3]
Hidden Markov Models (HMMs) are statistical models and the simplest versions of dynamic Bayesian Networks, where the system being modelled is a Markov process with an unobserved state. An HMM is a collection of finite states connected by transitions, much like a Bayesian Network. Each state has two probabilities: a transition probability and an output probability distribution. The parameters of the model are determined from training data [4][5].
Figure2: Hidden Markov Models
hidden states, as well as N dimensional observable symbols.
Figure3: Multitouch Movement
The conventional HMM is expressed as follows [4]. An HMM is a mathematical tool for modelling signals, objects, and so on that have a temporal structure and follow a Markov process. An HMM can be described compactly as $\lambda = (A, B, \pi)$ (Figure 4b), where:
Figure 4: Conventional Hidden Markov Model
A = {a_{ij}}: the state transition matrix
$$a_{ij} = P[q_{t+1} = s_j \mid q_t = s_i], \quad 1 \le i, j \le N$$
B = {b_{j}(k)}: the observation symbol probability distribution
$$b_j(k) = P[O_t = v_k \mid q_t = s_j], \quad 1 \le j \le N, \; 1 \le k \le M$$
π = {π_{i}}: the initial state distribution
$$\pi_i = P[q_1 = s_i]$$
 Set of states: S = {s_{1}, s_{2}, …, s_{N}}
 State at time t: q_{t}
 Set of symbols: V = {v_{1}, v_{2}, …, v_{M}}
Given the observation sequence $O_1^T = O_1 O_2 \ldots O_T$ and a model $\lambda = (A, B, \pi)$, how do we efficiently compute $P(O \mid \lambda)$, i.e., the probability of the observation sequence given the model?
Now let us consider the following two stages:
 Training: based on the input data sequences {O}, we calculate and adjust $\lambda = \bar{\lambda}$ to maximize the likelihood $P(O \mid \lambda)$.
 Recognizing: based on $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ for each class, we assign the class for which the likelihood $P(O \mid \bar{\lambda})$ is maximized.
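The likelihood P(O | λ) used in both stages is usually computed with the forward algorithm. The sketch below is a minimal version for discrete observations, assuming A, B and π are given as plain Python lists and observations are symbol indices. It omits the scaling normally needed for long sequences and does not cover training; the function names and the `models` dictionary layout are choices made for this example.

```python
def forward_likelihood(A, B, pi, observations):
    """Return P(O | lambda) for a discrete HMM lambda = (A, B, pi)."""
    n_states = len(pi)
    # alpha[i] = P(O_1 .. O_t, q_t = s_i | lambda)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


def recognize(models, observations):
    """Return the class name whose model gives the highest likelihood.

    models: dict mapping a class name to its (A, B, pi) tuple.
    """
    return max(models, key=lambda name: forward_likelihood(*models[name], observations))


# Hypothetical two-state, two-symbol model.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
likelihood = forward_likelihood(A, B, pi, [0, 1])
```

The recursion costs on the order of N² operations per observation, which is what makes the computation efficient compared with summing over every possible state sequence.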
The observations in the distribution $P[O_t = v_k \mid q_t = s_j]$ can be discrete symbols or continuous variables. If the observations are discrete symbols, B can be represented as a matrix:
$$B(i,k) = P(O_t = k \mid q_t = s_i)$$
If the observations are vectors in R^{L}, it is common to represent $P[O_t \mid q_t]$ as a Gaussian:
$$P[O_t = y \mid q_t = s_i] = \mathcal{N}(y; \mu_i, \Sigma_i)$$
$$\mathcal{N}(y; \mu, \Sigma) = \frac{1}{(2\pi)^{L/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(y-\mu)^T \Sigma^{-1} (y-\mu)\right]$$
A more flexible representation is a mixture of M Gaussians:
$$P[O_t = y \mid q_t = s_i] = \sum_{m=1}^{M} P(M_t = m \mid q_t = s_i) \times \mathcal{N}(y; \mu_{m,i}, \Sigma_{m,i})$$
where M_{t} is a hidden variable that specifies which mixture component to use and $P(M_t = m \mid q_t = s_i) = C(i,m)$ is the conditional prior weight of each mixture component. In our approach, we implement both continuous and discrete output distributions, for the 1^{st} and 2^{nd} HMM stages respectively [3][6].
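To make the Gaussian and mixture formulas concrete, the sketch below evaluates them for one state. It assumes a diagonal covariance matrix (given as a list of variances) so that no matrix library is needed; the general formula above allows a full Σ. The function names and the example parameters are hypothetical.

```python
import math


def gaussian_diag(y, mean, variances):
    """N(y; mu, Sigma) for a diagonal Sigma whose diagonal is `variances`."""
    dim = len(y)
    det = math.prod(variances)  # determinant of a diagonal matrix
    quad = sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mean, variances))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (dim / 2) * math.sqrt(det))


def mixture_emission(y, weights, means, variances):
    """P[O_t = y | q_t = s_i] as a weighted sum of Gaussian components.

    weights[m] plays the role of C(i, m) for the state under consideration.
    """
    return sum(
        c * gaussian_diag(y, mu, var)
        for c, mu, var in zip(weights, means, variances)
    )


# Hypothetical two-component mixture over 2-D feature vectors.
density = mixture_emission(
    [0.0, 0.0],
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [0.0, 0.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

Because the two components in the example are identical, the mixture density equals that of a single 2-D standard Gaussian at its mean, 1/(2π).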
 Dynamic Time Warping
Dynamic Time Warping (DTW), introduced by Sakoe and Chiba in 1978, is an algorithm that compares two sequences that may vary in time or speed. For example, if two video clips of different people walking a particular path were compared, the DTW algorithm would detect the similarities in the walking pattern despite differences in walking speed, accelerations or decelerations [3][7].
Figure 4: Dynamic time warping
The algorithm begins with a set of template streams describing each gesture available in the system database. An incoming stream has to be compared against every one of these templates, which results in high computation time and hence limits recognition speed. Additionally, storing many templates for each gesture results in costly space usage on a resource-constrained device.
Consider a training set of N sequences $\{S_1, S_2, \ldots, S_N\}$, where each $S_g$ represents a sample of the same gesture class. Each sequence $S_g$ is composed of a set of feature vectors, one at each time t, $S_g = \{s^g_1, \ldots, s^g_{L_g}\}$, where $L_g$ is the length in frames of sequence $S_g$. Let us assume that the sequences are ordered according to their length, so that $L_{g-1} \le L_g \le L_{g+1}, \forall g \in [2, .., N-1]$; the median length sequence is then $\bar{S} = S_{\lceil N/2 \rceil}$ [4].
This sequence $\bar{S}$ is used as a reference, and the remaining sequences are aligned with it using classical Dynamic Time Warping with Euclidean distance [4], in order to remove the temporal deformations of different samples from the same gesture category. Hence, after the alignment process, all sequences have length $L_{\lceil N/2 \rceil}$. We define the set of warped sequences as $\tilde{S} = \{\tilde{S}_1, \tilde{S}_2, \ldots, \tilde{S}_N\}$ [3].
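The two ingredients described above, choosing the median-length sequence as the reference and comparing sequences with classical DTW under Euclidean distance, can be sketched as follows. Frames are assumed to be tuples of numbers of equal dimension; the function names are choices made for this example, and the step that actually resamples each sequence to the reference length is not shown.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classical DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of seq_a repeated against seq_b[j-1]
                cost[i][j - 1],      # frame of seq_b repeated against seq_a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def median_length_reference(sequences):
    """Return S_ceil(N/2) after ordering the sequences by length."""
    ordered = sorted(sequences, key=len)
    return ordered[math.ceil(len(ordered) / 2) - 1]


slow = [(0, 0), (0, 0), (1, 1), (1, 1), (2, 2)]
fast = [(0, 0), (1, 1), (2, 2)]
same_gesture_cost = dtw_distance(slow, fast)
```

The example pair traces the same path at different speeds, so the warping absorbs the timing difference and the cost is zero, whereas a frame-by-frame comparison would not even be defined for sequences of different length.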
Input: A gesture C = {c_1, .., c_m} with corresponding GMM model λ = {λ_1, .., λ_m}, its similarity threshold value μ, and the testing sequence Q = {q_1, .., q_∞}. The cost matrix M is defined, where N(x), x = (i, t), is the set of the three upper-left neighbouring locations of x in M.
Output: Warping path W of the detected gesture, if any.
// Initialization
for i = 1 : m do
    for j = 1 : ∞ do
        M(i, j) = ∞
    end
end
for j = 1 : ∞ do
    M(0, j) = 0
end
// Process the test sequence frame by frame
for t = 0 : ∞ do
    for i = 1 : m do
        x = (i, t)
        M(x) = D(q_t, λ_i) + min_{x' ∈ N(x)} M(x')
    end
    if M(m, t) < μ then
        W = {argmin_{x' ∈ N(x)} M(x')}
        return
    end
end [4]
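A runnable approximation of this detection loop is sketched below. Two simplifications are assumed and should be kept in mind: the probabilistic distance D(q_t, λ_i) is replaced by a plain Euclidean distance between the frame and the i-th model frame, and instead of the full warping path W the function only reports the first and last frame of the detected span. The function name and parameters are hypothetical.

```python
import math


def detect_gesture(model, stream, threshold, dist=math.dist):
    """Return (start, end) frame indices of the first span of `stream` whose
    DTW cost against `model` drops below `threshold`, or None."""
    m = len(model)
    # prev[i] holds M(i, t-1); row 0 is always 0 so a match may begin anywhere.
    prev = [0.0] + [math.inf] * m
    prev_start = [None] * (m + 1)
    for t, frame in enumerate(stream):
        cur = [0.0] + [math.inf] * m
        cur_start = [None] * (m + 1)
        for i in range(1, m + 1):
            # The three upper-left neighbours of x = (i, t). A path leaving
            # row 0 starts at the current frame t.
            candidates = [
                (cur[i - 1], t if i == 1 else cur_start[i - 1]),    # (i-1, t)
                (prev[i - 1], t if i == 1 else prev_start[i - 1]),  # (i-1, t-1)
                (prev[i], prev_start[i]),                           # (i, t-1)
            ]
            best_cost, best_start = min(candidates, key=lambda c: c[0])
            cur[i] = dist(model[i - 1], frame) + best_cost
            cur_start[i] = best_start
        if cur[m] < threshold:
            return cur_start[m], t
        prev, prev_start = cur, cur_start
    return None


model = [(0,), (1,), (2,)]
stream = [(5,), (5,), (0,), (1,), (1,), (2,), (5,)]
span = detect_gesture(model, stream, threshold=0.5)
```

Only two columns of the cost matrix are kept in memory at a time, which suits the unbounded test sequence in the pseudocode and the limited memory of a phone.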
 Artificial Neural Networks
Artificial Neural Networks (ANNs) are networks of weighted, directed graphs where the nodes are artificial neurons and the directed edges are connections between them. The most common ANN structure is the feed-forward Multi-Layer Perceptron. Feed-forward means that the signals only travel one way through the net [4][8].
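For input pattern p, the ith input layer node holds x_{p,i}. In the standard formulation, with w denoting the connection weights, f the activation function and t_{p,k} the target value of output node k:
Net input to the jth node in the hidden layer: $net_{p,j} = \sum_i w_{j,i} \, x_{p,i}$
Output of the jth node in the hidden layer: $o_{p,j} = f(net_{p,j})$
Net input to the kth node in the output layer: $net_{p,k} = \sum_j w_{k,j} \, o_{p,j}$
Output of the kth node in the output layer: $o_{p,k} = f(net_{p,k})$
Network error for p: $E_p = \frac{1}{2} \sum_k (t_{p,k} - o_{p,k})^2$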
Neurons are arranged in layers, with the outputs of each neuron in one layer being connected to the inputs of the neurons in the next layer. Finally, the output layer neurons are assigned a value. Each output layer neuron represents a particular class of gesture, and the record is assigned to whichever class's neuron has the highest value. During training, the gesture class for each neuron in the output layer is known, and the nodes can be assigned the “correct” value.
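The forward pass and class assignment described above can be sketched in a few lines. The example assumes a single hidden layer, a sigmoid activation, no bias terms and a half squared error; these are common textbook choices made for illustration rather than details confirmed by [3], and all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_output):
    """Propagate input pattern x through one hidden layer and the output layer.

    w_hidden[j][i] is the weight from input i to hidden node j;
    w_output[k][j] is the weight from hidden node j to output node k.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    return hidden, output


def pattern_error(target, output):
    """Half squared error between the target and actual output values."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))


def predicted_class(output):
    """Index of the output neuron with the highest value."""
    return max(range(len(output)), key=output.__getitem__)


# Hypothetical network: 3 inputs, 2 hidden nodes, 2 gesture classes.
hidden, output = forward(
    [1.0, 0.5, -0.5],
    w_hidden=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    w_output=[[0.0, 0.0], [0.0, 0.0]],
)
```

With all weights at zero every neuron outputs 0.5, which is a convenient sanity check before training adjusts the weights to reduce `pattern_error`.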
Critical Analysis
This section presents a critical analysis based on the results reported in [3]. ANN, HMM and DTW algorithms were implemented on a mobile phone and their performance was measured in terms of recognition speed, accuracy and the time needed to train [3]. Since Bayesian Networks are a superclass of HMMs, which have been tuned towards gesture classification, they are not considered. In terms of recognition speed, accuracy and training time, DTW gives the best overall performance: it matches the accuracy of HMMs, is faster than both HMMs and ANNs, and needs no training. These results are summarized below:
Table 1: Comparison between different algorithms [3]
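| No. | Algorithm | Recognition Speed | Accuracy | Training Time |
|-----|-----------|-------------------|----------|---------------|
| 1 | HMMs | 10.5 ms | 95.25% | Long |
| 2 | ANNs | 23 ms | 90% | Medium |
| 3 | DTW | 8 ms | 95.25% | No training |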
 Finger Tracking:
First, the finger tracking parameters need to be adjusted, which is why calibration must be activated in the corresponding tab of the on-screen display [5][9].
a. Projection Signatures:
Projection signatures are performed directly on the resulting thresholded binary image of the hand [5]. The core of this algorithm consists of adding the binary pixels row by row along a given direction (the vertical in this case). Previous knowledge of the hand angle is therefore required. A low-pass filter is applied to the signature (row sums) in order to reduce high-frequency variations that create many local maxima and cause the problem of multiple positives (more than one detection per fingertip). The five maxima thereby obtained correspond to the positions of the five fingers.
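The effect of the low-pass filter on multiple positives can be seen in a small sketch. It assumes the binary image is a list of rows of 0/1 values with the fingers pointing vertically, so that counting hand pixels in each column gives one bump per finger; the orientation, the moving-average filter and all function names are assumptions made for this example and are not taken from [5].

```python
def projection_signature(image):
    """Count the hand pixels in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def smooth(signal, width=3):
    """Moving average, used here as a very simple low-pass filter."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def local_maxima(signal):
    """Indices of interior samples that rise above their left neighbour
    and are not exceeded by their right neighbour."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
    ]


# Hypothetical noisy signature of a two-finger image.
raw = [0, 3, 2, 3, 0, 0, 4, 3, 4, 0]
raw_peaks = local_maxima(raw)               # two detections per finger
filtered_peaks = local_maxima(smooth(raw))  # one detection per finger
```

In the raw signature each finger produces two local maxima, while after smoothing a single maximum per finger remains.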
b. Geometric Properties:
The second algorithm is based on the geometric properties and, as shown on line 3 of figure 5, uses a contour image of the hand on which a reference point is set. This point can be determined either by finding the centre of mass of the contour (barycenter or centroid) or by fixing a point on the wrist [6].
Figure 5: Hand Movement
Euclidean distances from that point to every contour point are then computed, with the five resulting maxima assumed to correspond to the finger ends [5]. The minima can be used to determine the intersections between fingers (finger valleys). The geometric algorithm also requires filtering in order to reduce the problem of multiple positives.
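A minimal sketch of this distance-profile idea follows. It assumes the contour is available as an ordered, closed list of (x, y) points and uses the mean of those points as the reference point; no filtering is applied, and the function names are hypothetical.

```python
import math


def centroid(contour):
    """Mean of the contour points, used as the reference point."""
    n = len(contour)
    return (sum(x for x, _ in contour) / n, sum(y for _, y in contour) / n)


def distance_profile(contour, reference):
    """Euclidean distance from the reference point to every contour point."""
    return [math.dist(point, reference) for point in contour]


def circular_extrema(profile):
    """Indices of local maxima and minima, treating the contour as closed."""
    n = len(profile)
    maxima, minima = [], []
    for i in range(n):
        before, here, after = profile[i - 1], profile[i], profile[(i + 1) % n]
        if here > before and here >= after:
            maxima.append(i)   # candidate finger end
        elif here < before and here <= after:
            minima.append(i)   # candidate finger valley
    return maxima, minima


# Hypothetical star-shaped contour with four "fingers".
contour = [(2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1), (0, -2), (1, -1)]
ends, valleys = circular_extrema(distance_profile(contour, centroid(contour)))
```

On a real hand contour the profile would be smoothed first, for the same multiple-positives reason that applies to projection signatures.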
c. Circular Hough Transform:
The circular Hough transform is applied on the contour image of the hand, but could as well be performed on an edge image with a complex background if no other elements of the image exhibit a circular shape with the fingertip radius. Finger ends can be identified efficiently by eliminating points that are found outside the contour image. The drawback is that the set of discarded points contains a mix of finger valleys and false positives that cannot be sorted easily [5].
d. Color Markers:
While the three previous algorithms rely only on the hand characteristics to find and track the fingers, the marker algorithm tracks color markers attached to the main joints of the fingers. Each color is tracked individually using colour segmentation and filtering [5].
This permits the identification of the different hand segments. The marker colors should therefore be easy to track and should not affect the threshold, edge or contour image of the hand. Respecting these constraints makes it possible to apply all algorithms to the same video images and therefore to compare each algorithm's degree of accuracy and precision with respect to the markers [5].
Comparisons:
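| Properties | Projection Signature | Geometric Properties | Circular Hough Transform | Color Markers |
|------------|----------------------|----------------------|--------------------------|---------------|
| Locates fingers | Good | Good | Good | Good |
| Locates fingertips | Poor | Normal | Normal | Good |
| Locates finger ends and valleys | Poor | Good | Good | Good |
| Works with complex background | Poor | Good | Normal | Good |
| Precision | Good | Good | Good | Good |
| Accuracy | Poor | Good | Good | Good |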
Table 2: Comparison between different techniques [5]
All the presented algorithms have succeeded, to various degrees, in detecting each finger. The projection signatures algorithm can only roughly identify a finger, whereas the circular Hough transform and geometric properties algorithms can find both finger intersections and finger end points. It is important to note that when fingers are folded, the end points do not correspond to the fingertips [5].
Conclusion:
We have outlined three prominent techniques for gesture recognition that can run on smartphones. Artificial Neural Networks, Dynamic Time Warping and Hidden Markov Models were optimized and tested on resource-constrained devices (in this instance, cellular phones), and compared against each other in terms of accuracy and computational performance. ANNs proved to have the slowest computational performance due to the large size of the neural network. HMMs performed better, but the DTW algorithm proved to be the fastest, with comparable recognition accuracy. DTW also did not require training, unlike HMMs and ANNs.
References
[1] Kalyani Devidas Deshmane, “Android Software based Multi-touch Gestures Recognition for Secure Biometric Modality”
[2] N. Sae-Bae, N. Memon, K. Isbister, and K. Ahmed, “Multitouch gesture-based authentication,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 4, pp. 568-582, Apr. 2014
[3] Daniel Wood, “Methods for Multi-touch Gesture Recognition”
[4] http://journals.sagepub.com/doi/full/10.5772/50204
[5] Anne-Marie Burns and Barbara Mazzarino, “Finger Tracking Methods Using EyesWeb”
[6] https://www.cse.buffalo.edu/~jcorso/t/CSE555/files/lecture_hmm.pdf
[7] DWT: Probability-based Dynamic Time Warping and Bag-of-Visual-and-Depth-Words for Human Gesture Recognition
[8] https://en.wikipedia.org/wiki/Artificial_neural_network
[9] http://whatis.techtarget.com/definition/gesturerecognition
Prof. Ramdas Pandurang Bagawade, 

Miss Pournima Akash Chavan, BE Computer Pursuing degree in PES’s College of Engineering Phaltan. 

Miss Kajal Kantilal Jadhav, BE Computer Pursuing degree in PES’s College of Engineering Phaltan 