Face Tracking Using Real-Time Video Processing



The aim of this project is to implement face tracking using video processing. Such a system has many applications in fields such as car safety systems and gaze estimation, and with further processing it may be extended to facial recognition as well.

Chapter 1

1. Introduction:

Face processing has become one of the most successful applications of image analysis and understanding, as is evident from the large number of conferences devoted to it around the world every year. Owing to improved feasibility and a wide range of commercial and law-enforcement applications, many products that perform face processing are now available on the market. Its relevance to machine perception is also attracting researchers from various disciplines, including image processing, neural networks and training, computer graphics and computer vision. [1]

Face processing systems can be broadly classified into two categories: still image processing and video processing. There are significant differences between the two in image quality, type of background clutter, ambient conditions, image variability and application. The use of video has become prevalent in everyday life, with applications in motion detection, security, traffic monitoring, etc. The biggest advantage of video imaging is that while a single image provides only a snapshot of a scene, the frames of a video taken over time capture the changes in the scene, making it possible to record motion and other details of the sequence that would otherwise be unavailable. This additional information can then be used for expression analysis, gesture analysis, etc.

Although many other biometric techniques are available, such as iris scanning and fingerprint analysis, these require the cooperation of the participant, whereas face processing can be performed effectively without the subject's cooperation or even knowledge.

1.1 System Components:

There are three main components in this project:

  • Image Acquisition Setup

  • Processor

  • Image Analysis

A) Image Acquisition Setup:

Since the project involves video, a digital camera will be used. Two types of camera are commonly available: digital cameras (based on CCD, Charge-Coupled Device, or CMOS sensors) and analogue cameras. A digital camera can be plugged directly into a computer, for example over a USB interface, whereas an analogue camera requires a tuner card or other computer peripherals. While CCD cameras give higher picture quality with lower noise than CMOS sensor cameras, they also consume more power.

The image resolution of a camera determines the amount of detail contained in a picture, so resolution plays a crucial role. For example, a 640x480 image has 307,200 pixels, or approximately 0.3 megapixels, while a 3872x2592 image has 10,036,224 pixels (about 10 megapixels).
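These pixel counts follow directly from width times height, which is easy to verify in Matlab:

```matlab
% Pixel counts for the two example resolutions above.
res = [640 480; 3872 2592];        % one [width height] pair per row
pixels = res(:,1) .* res(:,2);     % 307200 and 10036224
megapixels = pixels / 1e6;         % roughly 0.31 MP and 10.04 MP
```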

Due to its availability, image quality, easy PC interfacing and low image noise, the built-in digital camera with a resolution of 640x480 will be used.

B) Processor:

After image acquisition using the camera, comes the problem of image analysis and processing.

Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly on a set of data, and the architecture of a digital signal processor is optimized specifically for digital signal processing work. [3]

Some of the dedicated digital signal processors available in the market are:

  • MSC8144 DSP

  • ARM Cortex-A8

  • Texas Instruments TMS320C541

For all processing operations, a standard PC with an on-board GPU will be used.

C) Image Analysis:

Image analysis consists of extracting useful information from the captured images. To identify and locate a particular object, we must first define the features and characteristics of the object we are looking for. Generally, properties such as color, intensity, edges and structure are used to track or identify the object. In certain applications we can also count the number of pixels, or compute the center of gravity of the pixels, to locate the object.

Image analysis may be performed using a variety of applications, such as Matlab, or open-source libraries such as OpenCV from Intel Corp.

Matlab provides a very convenient platform for image acquisition and processing, with a powerful built-in library of useful functions, whereas OpenCV requires extensive coding. Hence, for ease of use, Matlab will be used in this project.
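As a minimal sketch of how frame acquisition might look in Matlab: the snippet below assumes the Image Acquisition Toolbox with a 'winvideo' adaptor; the adaptor name, device ID and video format are system-dependent and are illustrative here, not the exact configuration used in this project.

```matlab
% Hypothetical sketch: acquire frames from the built-in 640x480 camera.
% Adaptor name, device ID and format vary between systems.
vid = videoinput('winvideo', 1, 'YUY2_640x480');
set(vid, 'ReturnedColorSpace', 'rgb');   % return frames as RGB
preview(vid);                            % live preview window
frame = getsnapshot(vid);                % one 480x640x3 uint8 RGB frame
imshow(frame);
delete(vid);                             % release the camera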

Chapter 2

2. Digital Image Processing:

An image may be defined as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates and the amplitude of the function at any point is called the intensity or gray level of the image at that point. When the values of x, y and the amplitude of the function are finite, discrete quantities, the image is called a digital image. Processing of such an image falls in the field of digital image processing. [4]

Digital image processing is mainly used for classification, feature extraction, pattern recognition, projection and multi-scale signal analysis. Some techniques used in digital image processing include: [5]

  • Principal components analysis

  • Independent component analysis

  • Hidden Markov models

  • Partial differential equations

  • Self-organizing maps

  • Neural networks

  • Wavelets

  • Linear Filtering

In Matlab, for image processing we can use the Image Processing Toolbox.

2.1 Types of Images in Matlab: [4]

The Image Processing Toolbox supports four basic types of images:

  • Indexed Images

  • Intensity Images

  • RGB Images

  • Binary Images

2.1.1 Indexed Image: [4]

An indexed image consists of a data matrix, X, and a colormap matrix, map. The colormap is an m-by-3 array of floating-point values in the range [0, 1]. Each row of the colormap has three columns, corresponding to the red, green and blue components. The value of each pixel in the data matrix points to a particular row of the colormap: for example, the value 3 in the data matrix corresponds to the third row of map, the value 2 to the second row, and so on.
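The relationship between the data matrix and the colormap can be shown with a tiny constructed example (the 2-by-2 image and three-color map below are purely illustrative):

```matlab
% A tiny indexed image: X holds row indices into the colormap `map`.
X = [1 2; 3 1];                 % pixel values index rows of map
map = [1 0 0;                   % row 1: red   (R G B values in [0,1])
       0 1 0;                   % row 2: green
       0 0 1];                  % row 3: blue
rgb = ind2rgb(X, map);          % expand to an m-by-n-by-3 truecolor array
imshow(X, map);                 % or display directly using the colormap
```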

2.1.2 Intensity Image: [4]

An intensity image has a data matrix, I, whose values represent intensities within a defined range. Intensity images are stored as a single matrix, with each element of the matrix corresponding to one pixel. Each element represents the gray-scale value of a pixel, where 0 corresponds to black and 1, 255 or 65535 (depending on the data class: double, uint8 or uint16) corresponds to white.

MATLAB treats intensity images as indexed images.

2.1.3 RGB Image: [4]

An RGB image, or truecolor image, is stored as an m-by-n-by-3 data array holding the red, green and blue color components of each pixel. The color of each pixel is determined by the combination of these three intensities. Each color component is 8 bits in size, so each pixel occupies 24 bits in total, allowing about 16.7 million possible colors. For example, the red, green and blue components of the pixel (2,5) are stored in RGB(2,5,1), RGB(2,5,2) and RGB(2,5,3), respectively.

2.1.4 Binary Image: [4]

In a binary image, each pixel assumes one of only two discrete values, where 1 corresponds to on (white) and 0 to off (black). A binary image can be considered a special kind of intensity image containing only black and white, with a data matrix consisting solely of 1s and 0s.
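The Toolbox provides direct conversions between these image types; a short sketch, using the sample image peppers.png that ships with Matlab:

```matlab
% Converting between the image types described above.
rgb  = imread('peppers.png');   % RGB image: m-by-n-by-3 uint8
gray = rgb2gray(rgb);           % intensity image: m-by-n uint8 in [0,255]
bw   = im2bw(gray, 0.5);        % binary image via a 0.5 threshold
% im2bw is the classic Toolbox call; newer releases prefer imbinarize.
imshow(bw);
```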

Chapter 3

3. Face Detection and Tracking:

The first step in face processing is face detection, which precedes feature extraction and face recognition. Face detection may be defined as a combination of object-class detection and object localization, i.e. determining the sizes and locations of all objects belonging to the face class in a complex environment or background. [2]

Popular techniques and algorithms used for face detection are[2]:

  • Skin color segmentation and face segmentation Method

  • Object Tracking Method

  • Viola and Jones Algorithm

  • Neural Network based Method

  • AdaBoost Algorithm

  • Face Models and Multimodal Features

  • Eigenfaces Method

Face tracking is the problem of generating an inference about the motion of an object given a sequence of images. In this project, the focus is on face tracking that can be used for other facial image analysis tasks, such as face recognition, face expression analysis, gaze tracking and lip-reading. Face tracking is different from face detection in that face tracking uses spatio-temporal continuity to locate human faces in a video sequence, instead of detecting them in each frame independently. Face tracking is part of object localization and involves object tracking.

For this project, I have used skin color segmentation based on RGB and YCbCr color cues for the detection and tracking of the face.

3.1 Skin Segmentation Using YCbCr Color Space:

Color is usually represented directly in the RGB format or indirectly in the indexed image format, but other models besides RGB also exist for representing color, the main ones being YCbCr, HSV and HSI.

In the skin segmentation method of face detection and tracking, we first have to segment the facial and non-facial regions of the image. The RGB model is not preferred for this because it encodes not only color information but also luminance. To reduce the effect of luminance in image analysis, we work with the YCbCr and HSV color models instead. [6]

The YCbCr format is widely used for digital video. In this format, luminance information is stored as a single component (Y), and chrominance information is stored as two color difference components (Cb and Cr). Cb represents the difference between the blue component and a reference value, and Cr represents the difference between the red component and a reference value. [4]

The RGB components can be converted to the YCbCr components using the following formula. [8]

Y = 0.299R + 0.587G + 0.114B

Cb = -0.169R - 0.331G + 0.500B

Cr = 0.500R - 0.419G - 0.081B

The values of the constants in the above formulas are defined in Recommendation BT.601, published by the International Telecommunication Union, Radiocommunication Sector (ITU-R), for converting analogue video signals to digital form.
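Applied directly to normalised RGB values, these formulas give Cb and Cr centred on zero; digital video then adds the standard +128 offset to the chroma channels. A small worked sketch (the example pixel values are arbitrary):

```matlab
% Manual BT.601 conversion of one normalised RGB pixel (values in [0,1]).
R = 0.8; G = 0.4; B = 0.2;                  % example pixel, chosen arbitrarily
Y  =  0.299*R + 0.587*G + 0.114*B;          % luminance
Cb = -0.169*R - 0.331*G + 0.500*B;          % blue-difference chroma (zero-centred)
Cr =  0.500*R - 0.419*G - 0.081*B;          % red-difference chroma (zero-centred)
% In practice the Toolbox call handles offsets and uint8 scaling:
% ycbcr = rgb2ycbcr(rgb);   % rgb is an m-by-n-by-3 image array
```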

The input RGB image on conversion to YCbCr will be as shown in Figure 7.

In the skin color detection process, each pixel is classified as skin or non-skin based on its color components. Once this has been done, the resulting image is in binary format, as shown below.
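The pixel-wise classification can be sketched as a pair of range tests on the chroma channels. The threshold ranges below are typical values from the skin-detection literature, not the exact ones used in this project, and the input filename is hypothetical; the thresholds usually need tuning for a given camera and lighting setup.

```matlab
% Illustrative pixel-wise skin classification in YCbCr.
rgb   = imread('face.jpg');            % hypothetical input frame
ycbcr = rgb2ycbcr(rgb);                % convert to YCbCr (uint8)
Cb = ycbcr(:,:,2);
Cr = ycbcr(:,:,3);
% Typical literature ranges for skin chroma; tune per setup.
skin = (Cb >= 77) & (Cb <= 127) & (Cr >= 133) & (Cr <= 173);
imshow(skin);                          % binary mask: 1 = skin, 0 = non-skin
```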

As can be seen in Figure 8, the binary image thus obtained contains a lot of noise in it. So, the next step in the process is to remove all the noise and to extract the face region from the image.

For the process of noise removal there are many different inbuilt functions that are available in MATLAB. The functions that were used are:

1) bwfill [4]: This function is used to fill background regions in a binary image.

Syntax: bw2 = bwfill(bw1, 'holes');

where bw2 is the final image and bw1 the initial image; holes are defined as background pixels surrounded by foreground pixels.

2) bwareaopen [4]: Binary area open; removes small objects.

Syntax: bw2 = bwareaopen(bw, p)

It removes from the binary image bw all connected components that have fewer than p pixels, producing another binary image, bw2.

3) imerode [4]: Erodes an image.

Syntax: im2 = imerode(im, se)

This function erodes the grayscale, binary or packed binary image im, returning the eroded image im2; se is a structuring element.
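Chained together on the binary skin mask, the clean-up stage might look as follows; the minimum blob size and structuring-element radius are illustrative values, not the exact parameters used in this project.

```matlab
% Noise clean-up on the binary skin mask `skin` from the previous step.
bw = bwfill(skin, 'holes');      % fill holes (imfill in newer releases)
bw = bwareaopen(bw, 500);        % drop blobs smaller than 500 px (tunable)
se = strel('disk', 3);           % disk structuring element of radius 3
bw = imerode(bw, se);            % erode to detach thin noisy connections
imshow(bw);
```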

The image after noise filtering will be as shown in Figure 9.

After isolating the face region in the binary image by removing noise, the next step is to mark the area of the face by drawing a box around the face region. To find the boundary points of the isolated face region, we can use the built-in function regionprops with the 'BoundingBox' property; using the information it returns, I wrote a function, boxit.m (see source code), which draws a box around the face in the initial RGB image.
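A sketch of what a helper like boxit.m might do is shown below, assuming bw is the cleaned binary mask and rgb the original frame; this keeps only the largest region, which is one plausible design choice, not necessarily the one used in the project code.

```matlab
% Draw a box around the largest connected skin region.
stats = regionprops(bw, 'BoundingBox', 'Area');
[~, k] = max([stats.Area]);        % index of the largest region
bb = stats(k).BoundingBox;         % [x y width height]
imshow(rgb); hold on;
rectangle('Position', bb, 'EdgeColor', 'r', 'LineWidth', 2);
hold off;
```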

The final output will be as shown in Figure 10 :

3.2 Advantages and Drawbacks of Skin Segmentation: [10]


Advantages:

  • Easy to implement and use: Skin is one of the most visible features of the face, so using it for face detection is natural. However, because perceived color depends on the origin and intensity of the light, use of the RGB color space is not recommended; with YCbCr and HSV color cues these drawbacks can be overcome.

  • Effective and efficient in a constrained environment: Even when the available data and surroundings are limited, skin segmentation remains feasible.

  • Insensitive to pose, rotation and expression: Unlike other algorithms, the pose and orientation of the face do not affect the detection rate of this algorithm, mainly because it does not rely on facial features or face models.


Drawbacks:

  • Highly sensitive to the environment: Ambient lighting conditions, the light source and its wavelength play a huge role in the perceived color of skin, so these factors directly affect the detection rate and its success.

  • Noisy detection results: Because skin segmentation looks for regions of skin color, other body parts and similarly colored objects in the surroundings interfere with the detection of the face.

Chapter 4

4. Recommendations:

Due to time constraints on the project, and limitations in hardware and software, the use of complex algorithms is not feasible. Hence skin segmentation is the most practical method for our purpose, even though it is not considered very reliable and its false detection (FD) and false acceptance (FA) rates are high.

The color segmentation implemented here uses only one color space, YCbCr. To improve efficiency, we could include the RGB and HSV color spaces and use a combination of them to decide the face region. However, using multiple color spaces also calls for a more powerful processor, as the memory available on a laptop sometimes proves insufficient.

Higher-resolution images would also improve the accuracy and efficiency of the color segmentation method.

4.1 Conclusion:

The design and implementation of this project proved to be a challenging and rewarding experience. The aim of the project, face detection using color segmentation, was completed successfully.

After completion, various changes were made to improve the efficiency of the code and thus reduce the false acceptance and false detection rates, keeping in mind the limitations of this technique and algorithm.


[1] Zhao, Wenyi; Rama Chellappa Eds, “Face Processing: Advanced Modeling and Methods”, Academic Press, 2006.

[2] http://en.wikipedia.org/wiki/Face_detection

[3] http://en.wikipedia.org/wiki/Digital_signal_processor

[4] Matlab Image Processing Toolbox User Guide, Mathworks Inc.

[5] http://en.wikipedia.org/wiki/Digital_image_processing

[6] Zhang, J.; Liu, Y., “A Novel Approach of Face Detection Based on Skin Segmentation”, 9th International Conference for Young Computer Scientists, 2008.

[7] Gonzalez, R.; Woods, R., “Digital Image Processing”, Prentice Hall, 2nd Edition, 2006.

[8] http://scien.stanford.edu/2003projects/ee368/Project/reports/ee368group01.pdf

[9] http://scien.stanford.edu/class/psych221/projects/07/face%20detection/colorspace.html

[10] http://www.cs.cmu.edu/~cil/vision.html

[11] http://www.ph.tn.tudelft.nl/Courses/FIP/noframes/fip-Contents.html