Facial expressions are important in facilitating human communication and interactions. Also, they are used as an important tool in behavioural studies and in medical rehabilitation. Facial image based mood detection techniques may provide a fast and practical approach for non-invasive mood detection. This paper deals with developing an intelligent system for facial image based expression classification using committee neural networks.
Several facial parameters were extracted from a facial image and were used to train several generalized and specialized neural networks. Based on initial testing, the best performing generalized and specialized neural networks were recruited into decision making committees which formed an integrated committee neural network system. The integrated committee neural network system was then evaluated using data obtained from subjects not used in training or in initial testing.
Results and conclusion
The system identified the correct facial expression in 255 of the 282 images (90.43% of the cases), taken from 62 subjects not used in training or in initial testing. Committee neural networks offer a potential tool for image based mood detection.
Keywords: facial expression, integrated committee neural network system.
Facial expressions and related changes in facial patterns give us information about the emotional state of the person and help to regulate conversations with the person. Moreover, these expressions help in understanding the overall mood of the person in a better way. Facial expressions play an important role in human interactions and non-verbal communication. Classification of facial expressions could be used as an effective tool in behavioural studies and in medical rehabilitation. Facial expression analysis deals with visually recognizing and analyzing different facial motions and facial feature changes. The FACS (facial action coding system) codes different facial movements into Action Units (AU) based on the underlying muscular activity that produces momentary changes in the facial expression. An expression is further recognized by correctly identifying the action unit or combination of action units related to a particular expression.
Numerous investigators have used neural networks for facial expression classification. The performance of a neural network depends on several factors including the initial random weights, the training data, the activation function used, and the structure of the network including the number of hidden layer neurons. Rather than relying on a single network, the committee approach trains many networks and, based on initial testing with data obtained from subjects not used in training, recruits a few of them into a committee.
The database used in the study consisted of facial expression images from the Cohn-Kanade database. Two types of parameters were extracted from the facial image: real valued and binary. A total of 15 parameters consisting of eight real-valued parameters and seven binary parameters were extracted from each facial image. The real valued parameters were normalized. Generalized neural networks were trained with all fifteen parameters as inputs. There were seven output nodes corresponding to the seven facial expressions (neutral, angry, disgust, fear, happy, sad and surprised).
Based on initial testing, the best performing neural networks were recruited to form a generalized committee for expression classification. Due to a number of ambiguous and no-classification cases during the initial testing, specialized neural networks were trained for angry, disgust, fear and sad expression. Then, the best performing neural networks were recruited into a specialized committee to perform specialized classification. A final integrated committee neural network classification system was built utilizing both generalized committee networks and specialized committee networks. Then, the integrated committee neural network classification system was evaluated with an independent expression dataset not used in training or in initial testing. A generalized block diagram of the entire system is shown in Figure 1.
Figure 1. An overall block diagram of the methodology.
Facial Image Database:
Facial expression images were obtained from the Cohn-Kanade database. The database contained facial images taken from 97 subjects with age ranging from 18 to 30 years. The database had 65 percent female subjects. Fifteen percent of the subjects were African-American and three percent were Asian or Latino. The database images were taken with a Panasonic camera (model WV 3230). The camera was located directly in front of the subject. The subjects performed different facial displays (single action units and combinations of action units) starting and ending with a neutral face. The displays were based on descriptions of prototypic emotions (i.e., neutral, happy, surprise, anger, fear, disgust, and sad). The image sequences were digitized into 640 by 480 pixel arrays with 8-bit precision for gray scale values.
Although the database contained 2000 images, many were repetitions (frames of the same subjects in the same moods); hence, the entire dataset was not used for the study. Using repetitions would increase the accuracy, but would essentially amount to analyzing similar expressions of the same subject. The aim of this study was not to test the response of the classification engine on repetitive images, but to test it on a variety of images. Thus, to study the robustness of the system across subject-mood variations, images were selected so that each represented a unique subject-mood combination. The present study utilized 467 images from 97 subjects.
Image Processing and Feature Extraction:
Two types of parameters were extracted from the facial images of 97 subjects: (1) real valued parameters and (2) binary parameters. The real valued parameters have a definite value depending upon the distance measured. This definite value was measured in number of pixels. The binary measures gave either a present (= 1) or an absent (= 0) value. In all, eight real valued measures and seven binary measures were obtained.
A number of parameters, both real-valued and binary, were extracted and analyzed to decide their effectiveness in identifying a certain facial expression. Features that did not provide effective information about the facial expression portrayed in the image were eliminated and were not used in the final study. The selection of real valued and binary features was inspired by the FACS. The following real valued and binary parameters were finally used in the study.
Real valued parameters
1. Eyebrow raise distance - The distance between the junction point of the upper and the lower eyelid and the lower central tip of the eyebrow.
2. Upper eyelid to eyebrow distance - The distance between the upper eyelid and eyebrow surface.
3. Inter-eyebrow distance - The distance between the lower central tips of both the eyebrows.
4. Upper eyelid - lower eyelid distance - The distance between the upper eyelid and lower eyelid.
5. Top lip thickness - The measure of the thickness of the top lip.
6. Lower lip thickness - The measure of the thickness of the lower lip.
7. Mouth width - The distance between the tips of the lip corner.
8. Mouth opening - The distance between the lower surface of top lip and upper surface of lower lip.
The real valued parameters are depicted in Figure 2.
Figure 2. Real-valued measures from a sample neutral expression image.
Binary parameters
1. Upper teeth visible - Presence or absence of visibility of upper teeth.
2. Lower teeth visible - Presence or absence of visibility of lower teeth.
3. Forehead lines - Presence or absence of wrinkles in the upper part of the forehead.
4. Eyebrow lines - Presence or absence of wrinkles in the region above the eyebrows.
5. Nose lines - Presence or absence of wrinkles in the region between the eyebrows extending over the nose.
6. Chin lines - Presence or absence of wrinkles or lines on the chin region just below the lower lip.
7. Nasolabial lines - Presence or absence of thick lines on both sides of the nose extending down to the upper lip.
These binary parameters are depicted in Figure 3.
Figure 3. Binary measures from sample expression images.
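To make the fifteen-element network input concrete, the following is a minimal sketch of how the eight real valued and seven binary parameters listed above could be held and flattened into one vector. The class names, field names, and the example pixel distances are hypothetical and chosen only for illustration; the study does not describe its data structures, and the real valued measures would still need to be normalized before being fed to a network.

```python
from dataclasses import dataclass, astuple


@dataclass
class RealValuedFeatures:
    """Distances in pixels, in the order listed in the text (names are illustrative)."""
    eyebrow_raise: float
    eyelid_to_eyebrow: float
    inter_eyebrow: float
    eyelid_opening: float
    top_lip_thickness: float
    lower_lip_thickness: float
    mouth_width: float
    mouth_opening: float


@dataclass
class BinaryFeatures:
    """Presence (True) or absence (False) of each pattern (names are illustrative)."""
    upper_teeth_visible: bool
    lower_teeth_visible: bool
    forehead_lines: bool
    eyebrow_lines: bool
    nose_lines: bool
    chin_lines: bool
    nasolabial_lines: bool


def to_input_vector(real, binary):
    """Concatenate 8 real valued and 7 binary features into a 15-element list."""
    real_part = [float(v) for v in astuple(real)]
    binary_part = [1.0 if v else 0.0 for v in astuple(binary)]
    return real_part + binary_part


# Hypothetical measurements for one image (not taken from the study).
example_real = RealValuedFeatures(30.0, 18.0, 40.0, 9.0, 6.0, 8.0, 52.0, 0.0)
example_binary = BinaryFeatures(False, False, True, False, False, False, True)
example_vector = to_input_vector(example_real, example_binary)
print(len(example_vector))
```

Keeping the two groups in separate types makes it harder to mix up a pixel distance with a presence flag before the vector is assembled.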
The real valued parameters were the distances (in number of pixels) measured between specified facial features. In case of parameters involving features which were symmetrically present on both sides of the face, an average of both the measurements was obtained. Real-valued measures were obtained for expressions including the neutral image. The real valued parameters were then normalized in the following manner:
All the parameters were extracted by manual and/or semi-automatic techniques. This study evaluates the efficacy of committee neural networks. Therefore, no effort was made to develop automated techniques for feature extraction.
The binary parameters were characterized by the presence or absence of the facial muscle contractions or the facial patterns formed due to these contractions. An edge detection algorithm was applied to the image to determine whether the pattern was present or absent. A simple Canny edge detector (MATLAB based) was used to determine whether a pattern of lines existed, which in turn decided whether the binary feature was true (1) or false (0).
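The idea of turning an edge map into a present/absent flag can be sketched as follows. This is not the Canny detector used in the study: to stay dependency-free it substitutes a plain Sobel gradient magnitude, and the function names, the edge threshold, and the minimum edge fraction are all assumptions made for illustration.

```python
def sobel_magnitude(region):
    """Gradient magnitude of a grey-scale region given as a list of rows (0-255).

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(region), len(region[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (region[y - 1][x + 1] + 2 * region[y][x + 1] + region[y + 1][x + 1]
                  - region[y - 1][x - 1] - 2 * region[y][x - 1] - region[y + 1][x - 1])
            gy = (region[y + 1][x - 1] + 2 * region[y + 1][x] + region[y + 1][x + 1]
                  - region[y - 1][x - 1] - 2 * region[y - 1][x] - region[y - 1][x + 1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def lines_present(region, edge_threshold=100.0, min_edge_fraction=0.05):
    """Return 1 if enough interior pixels are edge pixels, else 0.

    Both thresholds are hypothetical; the study does not report its settings.
    """
    h, w = len(region), len(region[0])
    interior = (h - 2) * (w - 2)
    if interior <= 0:
        return 0
    mag = sobel_magnitude(region)
    edges = sum(1 for y in range(1, h - 1) for x in range(1, w - 1)
                if mag[y][x] >= edge_threshold)
    return 1 if edges / interior >= min_edge_fraction else 0


# Synthetic 8x8 regions: a uniform patch and a patch with one dark horizontal line.
flat_region = [[120] * 8 for _ in range(8)]
lined_region = [[20 if y == 4 else 200 for _ in range(8)] for y in range(8)]
print(lines_present(flat_region), lines_present(lined_region))
```

In practice the region passed in would be a crop of the face (for example the forehead), and a Canny detector would add smoothing, non-maximum suppression and hysteresis on top of the gradient step shown here.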
The eight normalized real valued parameters together with the seven binary parameters were fed to neural networks. The entire dataset from 97 subjects (467 images) was divided into three groups: 25 subjects (139 images) for training, 10 subjects (46 images) for initial testing, and 62 subjects (282 images) for final evaluation.
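Because the three groups are defined by subject rather than by image, no subject contributes images to more than one group. A minimal sketch of such a subject-level split is given below. The study does not say how subjects were assigned to groups, so the random shuffle, the function name, and the toy image list are assumptions.

```python
import random


def split_by_subject(image_subjects, n_train, n_initial_test, seed=0):
    """Split image indices into three groups so that each subject is in exactly one.

    image_subjects: one subject id per image.
    Subjects left over after the first two groups go to final evaluation.
    """
    subjects = sorted(set(image_subjects))
    random.Random(seed).shuffle(subjects)  # assumed; the assignment rule is not reported
    train_ids = set(subjects[:n_train])
    test_ids = set(subjects[n_train:n_train + n_initial_test])

    groups = {"train": [], "initial_test": [], "final_eval": []}
    for index, subject in enumerate(image_subjects):
        if subject in train_ids:
            groups["train"].append(index)
        elif subject in test_ids:
            groups["initial_test"].append(index)
        else:
            groups["final_eval"].append(index)
    return groups


# Toy data: 97 subjects with a made-up number of images each (not the real counts).
image_subjects = [s for s in range(97) for _ in range(1 + s % 5)]
groups = split_by_subject(image_subjects, n_train=25, n_initial_test=10)
print({name: len(indices) for name, indices in groups.items()})
```

Splitting on subjects first and only then collecting images is what keeps the final evaluation free of faces the networks have already seen.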
Training of generalized neural networks:
Several multilayer, fully connected, feed forward neural networks were trained to classify different expressions. A total of 105 networks were trained using different numbers of hidden layers (2, 3, 4, 5), different initial weights, different numbers of neurons in the hidden layers (7, 14, 15, 28, 45, 60), and different transfer functions.
Each network had fifteen input nodes, one for each of the fifteen input parameters, and seven output nodes, one for each of the seven expressions (neutral, angry, disgust, fear, happy, sad and surprised). Since the normalized input data was in the range of -1 to 1, the "tansig" function was used for the hidden layer neurons. The output of the neural network had to be in the 0 to 1 range, so the "logsig" function was used as the transfer function for the output layer neurons. The output of each node was then converted to a binary number: an output of 0.6 or more was forced to 1 and an output of less than 0.6 was forced to 0. An output of 1 indicated that the particular expression was present and an output of 0 indicated that it was absent. The threshold was varied from 0.55 to 0.9, and a threshold of 0.6 gave better results than the other values tested.
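The forward pass and output binarization described above can be sketched as follows. The "tansig" and "logsig" definitions follow the standard formulas for those MATLAB transfer functions; the helper names and the tiny two-input network with made-up weights are assumptions for illustration and are far smaller than the networks in the study.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2x)) - 1, with range (-1, 1)."""
    return math.tanh(x)


def logsig(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)), with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, transfer):
    """One fully connected layer: transfer(W . inputs + b) for each neuron."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_layers, output_layer):
    """Feed forward through tansig hidden layers and a logsig output layer."""
    activations = inputs
    for weights, biases in hidden_layers:
        activations = layer(activations, weights, biases, tansig)
    weights, biases = output_layer
    return layer(activations, weights, biases, logsig)


def binarize(outputs, threshold=0.6):
    """Force each output to 1 if it is at or above the threshold, else 0."""
    return [1 if o >= threshold else 0 for o in outputs]


# Toy network with hypothetical weights: 2 inputs, one hidden layer of 2, 2 outputs.
hidden = [([[0.5, -0.3], [0.1, 0.8]], [0.0, 0.1])]
output = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
raw = forward([0.2, -0.4], hidden, output)
print(raw, binarize(raw))
```

Because each output node is thresholded independently, a network can report no expression at all or several at once, which is what the committee stages later have to resolve.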
Recruitment of the generalized committee neural networks:
Figure 4. Five-network committee neural network architecture.
Training of Specialized neural networks:
The initial evaluation of the committee classification system presented some all-zero or no-classification cases. These no-classification cases resulted when the input data was from the angry, disgust, fear or sad expressions. Twenty specialized networks were trained to perform classification of these four (angry, disgust, fear and sad) expressions with an aim to reduce the number of no-classification cases. These networks also had binary outputs at each output node. Training data for the specialized networks were extracted from the same 25 subjects used for training the generalized networks.
Evaluation of the integrated committee neural network system:
An integrated committee neural network system was formed by incorporating the eleven-member generalized committee and the three-member specialized committee.
Figure 5 shows the flowchart of the integrated committee neural network system classification process.
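Since the flowchart itself is not reproduced here, the following is only a hedged sketch of one way the two committees could be combined. It assumes majority voting per output node, assumes the generalized committee's answer is accepted when exactly one expression wins, and assumes the specialized committee (with four output nodes, one per specialized expression) is consulted for no-classification and ambiguous cases. None of these rules is confirmed by the text, and all names are hypothetical.

```python
EXPRESSIONS = ["neutral", "angry", "disgust", "fear", "happy", "sad", "surprised"]
SPECIALIZED = ["angry", "disgust", "fear", "sad"]


def committee_vote(member_outputs):
    """Return the output-node indices set to 1 by a strict majority of members.

    member_outputs: one binary output vector per committee member.
    """
    n_members = len(member_outputs)
    n_nodes = len(member_outputs[0])
    votes = [sum(member[i] for member in member_outputs) for i in range(n_nodes)]
    return [i for i, count in enumerate(votes) if count > n_members / 2]


def classify(generalized_outputs, specialized_outputs):
    """Assumed decision flow: generalized committee first, specialized as fallback."""
    winners = committee_vote(generalized_outputs)
    if len(winners) == 1:
        return EXPRESSIONS[winners[0]]
    # No-classification (no winner) or ambiguous (several winners): fall back.
    winners = committee_vote(specialized_outputs)
    if len(winners) == 1:
        return SPECIALIZED[winners[0]]
    return None  # still unresolved


# Eleven generalized members that all report nothing, three specialized members.
no_output = [0, 0, 0, 0, 0, 0, 0]
fear_vote = [0, 0, 1, 0]
print(classify([no_output] * 11, [fear_vote, fear_vote, [0, 0, 0, 0]]))
```

Odd committee sizes such as eleven and three avoid tied votes on any single output node under a strict-majority rule.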
This study demonstrates the development and the application of committee neural networks to classify seven basic emotion types from facial images. The integrated committee neural network system, consisting of generalized and specialized networks, can classify the emotion depicted in a facial image into one of the following emotions: neutral, angry, disgust, fear, sad, surprised or happy.
In the database used for the expression analysis, each subject performs a series of different expressions. The variability and reliability of these performances introduced different levels of the same expression, and this variation runs through the overall dataset. In addition, the database consists mostly of expressions of a deliberate nature. In reality, an expression is often a combination of two or more of the prototypic expressions. Also, expressions are assumed to be singular and to begin and end with a neutral position.
The performance of a neural network depends on the type of parameters extracted from the facial image. The trend of variation of the different parameters with respect to their neutral values for different expressions helps in the effective training of neural networks to recognize specific expressions. Together, the real valued and binary parameters characterize each expression.
Each neural network had a single output node for each expression, and the output of each output node is binary (present or absent). For the individual member network classification, one approach is "winner takes all", in which each member of the committee produces only one output. This approach can produce good results. However, in numerous biomedical applications, significant biological variability means that it can produce misclassifications when the network is presented with data from entirely new subjects with extreme features. Therefore, our approach is to let a network produce more than one classification; by analogy, a patient can simultaneously have disease A and disease B. Our technique is to compare the output of each output node of a network with a threshold: if the output meets or exceeds the threshold, it is set to one; otherwise it is set to zero. We found that this technique yields better results.
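The difference between the two output rules can be seen on a couple of made-up output vectors. In this sketch the function names and the example values are hypothetical; the 0.6 threshold is the value reported for the generalized networks.

```python
def winner_takes_all(outputs):
    """Set only the largest output to 1, so exactly one class is always reported."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1 if i == best else 0 for i in range(len(outputs))]


def threshold_labels(outputs, threshold=0.6):
    """Set every output at or above the threshold to 1; zero or several may fire."""
    return [1 if o >= threshold else 0 for o in outputs]


# Two outputs are both strong: thresholding keeps both, winner-takes-all keeps one.
mixed = [0.1, 0.7, 0.65, 0.2]
print(winner_takes_all(mixed), threshold_labels(mixed))

# All outputs are weak: winner-takes-all still commits, thresholding abstains.
weak = [0.3, 0.2, 0.1]
print(winner_takes_all(weak), threshold_labels(weak))
```

With thresholding, a member network that is unsure stays silent instead of casting a forced vote, which leaves the committee stage to decide what to do with multiple or missing classifications.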
Thus, an integrated committee neural network system incorporating a generalized and a specialized neural network committee was developed for facial image based expression classification.