The Density Based Multifeature Background Subtraction Computer Science Essay



Background modeling and subtraction is a natural technique for object detection in videos captured by a static camera. We present a pixel-wise background modeling and subtraction technique that uses multiple features for classification. For each feature, a pixel-wise generative background model is obtained efficiently and effectively by Kernel Density Approximation (KDA). Background subtraction and classification are then performed in a discriminative manner based on Relevance Vector Machines (RVMs), which makes the algorithm robust to shadow, illumination changes, and spatial variations of the background. RVM-based classification attains approximately the same accuracy as Support Vector Machine (SVM)-based classification, with a significantly smaller relevance-vector rate and, therefore, much faster testing time. This property makes the RVM-based background modeling and subtraction approach more suitable for applications that require low complexity and possibly real-time classification.

Index Terms: Background modeling and subtraction, Haar-like features, relevance vector machine (RVM), kernel density approximation.


The identification of regions of interest is typically the first step in many computer vision applications, including event detection, visual surveillance, and robotics. A general object detection algorithm may be desirable, but it is extremely difficult to properly handle unknown objects or objects with significant variations in color, shape, and texture. Therefore, many practical computer vision systems assume a fixed camera environment, which makes the object detection process much more straightforward; a background model is trained with data obtained from empty scenes, and foreground regions are identified using the dissimilarity between the trained model and new observations. This procedure is called background subtraction.

Various background modeling and subtraction algorithms have been proposed [1], [2], [3], [4], [5] which are mostly focused on modeling methodologies, but potential visual features for effective modeling have received relatively little attention. The study of new features for background modeling may overcome or reduce the limitations of typically used features, and the combination of several heterogeneous features can improve performance, especially when they are complementary and uncorrelated. There have been several studies for using texture for background modeling to handle spatial variations in the scenes; they employ filter responses, whose computation is typically very costly. Instead of complex filters, we select efficient Haar-like features [6] and gradient features to alleviate potential errors in background subtraction caused by shadow, illumination changes, and spatial and structural variations.

Model-based approaches involving probability density functions are common in background modeling and subtraction, and we employ Kernel Density Approximation (KDA) [3], [7], where a density function is represented by a compact weighted sum of Gaussians whose number, weights, means, and covariances are determined automatically by a mean-shift mode-finding algorithm. In our framework, each visual feature is modeled by KDA independently, and every density function is two- or three-dimensional (2D or 3D). By utilizing the properties of the 2D and 3D mean-shift mode-finding procedure, the KDA can be implemented efficiently because convergence locations need to be computed for only a small subset of the data.

When the background is modeled with probability density functions, the probabilities of foreground and background pixels should be discriminative, but this is not always the case. Specifically, the background probabilities across features may be inconsistent due to illumination changes, shadow, and foreground objects whose features resemble the background. Also, some features are highly correlated, e.g., RGB color channels. We therefore employ a Relevance Vector Machine (RVM) for nonlinear kernel-based classification, which mitigates the inconsistency and correlation problems among features. The final classification between foreground and background is based on the output of the RVM.

There are three important aspects of our algorithm: integration of multiple features, efficient 2D and 3D density estimation by KDA, and foreground/background classification by RVM. These components are tightly coordinated to improve background subtraction performance.


The main objective of background subtraction is to obtain an effective and efficient background model for foreground object detection. In the early years, simple statistics, such as frame differences and median filtering, were used to detect foreground objects. More advanced background modeling methods are density-based, where the background model for each pixel is defined by a probability density function based on the visual features observed at the pixel during a training period.

A mixture of Gaussians is another popular density-based method, designed to deal with multiple backgrounds. Recently, more elaborate and recursive update techniques have been discussed. However, Gaussian mixture models have no principled way to determine the number of Gaussians, so most real-time applications rely on models with a fixed number of components or apply ad hoc strategies to adapt the number of mixtures over time.
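To make the fixed-component mixture idea concrete, here is a minimal per-pixel sketch in the spirit of the classic adaptive-mixture formulation; the class name, parameter values, and matching rule are illustrative assumptions, not the method proposed in this paper:

```python
import numpy as np

class PixelGMM:
    """Per-pixel Gaussian mixture with a fixed number of components K,
    updated online (illustrative sketch, grayscale values)."""
    def __init__(self, k=3, lr=0.05, var0=30.0, match_thresh=2.5):
        self.k = k                          # fixed number of Gaussians
        self.lr = lr                        # learning rate
        self.var0 = var0                    # variance for new components
        self.match_thresh = match_thresh    # match distance in std. devs.
        self.w = np.ones(k) / k             # component weights
        self.mu = np.linspace(0.0, 255.0, k)  # component means
        self.var = np.full(k, var0)         # component variances

    def update(self, x):
        """Update the mixture with pixel value x; return True if x
        matched an existing component (i.e., looks like background)."""
        d = np.abs(x - self.mu) / np.sqrt(self.var)
        m = int(np.argmin(d))
        matched = bool(d[m] < self.match_thresh)
        if matched:
            rho = self.lr
            self.mu[m] += rho * (x - self.mu[m])
            self.var[m] += rho * ((x - self.mu[m]) ** 2 - self.var[m])
            self.w = (1.0 - self.lr) * self.w
            self.w[m] += self.lr
        else:
            # ad hoc rule: replace the least probable component
            worst = int(np.argmin(self.w))
            self.mu[worst], self.var[worst] = x, self.var0
            self.w[worst] = self.lr
        self.w /= self.w.sum()
        return matched
```

Note the ad hoc replacement rule for unmatched samples: this is exactly the kind of heuristic the text refers to when no principled component count is available.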

Kernel density estimation is a nonparametric density estimation technique that has been successfully applied to background subtraction. Although it is a powerful representation for general density functions, it requires many samples for accurate estimation of the underlying density functions and is computationally expensive, so it is not appropriate for real-time applications, especially when high-dimensional features are involved.
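The cost issue is easy to see in a minimal 1-D sketch (bandwidth and values are illustrative): every sample contributes one kernel, so each query touches all stored samples.

```python
import numpy as np

def kde_prob(x, samples, bandwidth=5.0):
    """Density of value x under a 1-D Gaussian kernel density estimate.
    Accurate with many samples, but O(n) work per query."""
    z = (x - samples) / bandwidth
    k = np.exp(-0.5 * z * z) / (bandwidth * np.sqrt(2.0 * np.pi))
    return k.mean()

rng = np.random.default_rng(0)
samples = rng.normal(120.0, 4.0, size=500)   # pixel values from empty scenes
p_bg = kde_prob(120.0, samples)              # high density near background
p_fg = kde_prob(20.0, samples)               # low density for foreground
```

Per-pixel and per-frame, this linear cost is what makes plain KDE hard to run in real time, motivating the compact mixture produced by KDA.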

Most background subtraction algorithms are based on pixel-wise processing, but multilayer approaches are also introduced, where background models are constructed at the pixel, region, and frame levels and information from each layer is combined for discriminating foreground and background.

[Fig. 1 block diagram: Video clip → Video frame extraction → Feature extraction → Background classification → RVM classifier]

Some research on background subtraction has focused more on features than the algorithm itself. Various visual features may be used to model backgrounds, including intensity, color, gradient, motion, texture, and other general filter responses. Color and intensity are probably the most popular features for background modeling, but several attempts have been made to integrate other features to overcome their limitations.

Fig. 1. System Architecture

Figure 1 shows the system architecture of background subtraction with the Relevance Vector Machine. First, the camera captures the video and stores it in the system database. The user retrieves the video from the database and extracts video frames, which are stored back into the database. A particular image is then chosen from the database, and objects are identified by means of optimization, modeling, etc. Finally, the classifier distinguishes moving from stationary objects using only a few samples.


3.1 Video Frame Extraction

The camera captures the video and stores it in a system database. The user extracts the video clip and converts it into .avi format. Video frames (i.e., still images) of the required size are extracted from the video by frame grabbing.

3.2 Feature Analysis

The most popular features for background modeling and subtraction are probably pixel-wise color (or intensity) values, since they are directly available from images and reasonably discriminative. Although it is natural to monitor color variations at each pixel for background modeling, we integrate color, gradient, and Haar-like features together to alleviate the disadvantages of pixel-wise color modeling.

Gradient features are more robust to illumination variations than color or intensity features and are able to model local statistics effectively, so they are occasionally used in background modeling problems. The strength of Haar-like features lies in their simplicity and their ability to capture neighborhood information. The integration of these features is expected to improve the accuracy of background subtraction.
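As an illustration of why Haar-like features are cheap, the following sketch computes a two-rectangle feature from an integral image; the specific feature layout is an assumption for illustration, not necessarily the configuration used in this work:

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[0:r+1, 0:c+1]; computed once per frame."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in four lookups from the integral image."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect_horizontal(ii, r, c, h, w):
    """Left-minus-right two-rectangle feature of size h x (2w) at (r, c):
    responds to vertical edges, zero on flat regions."""
    left = rect_sum(ii, r, c, r + h, c + w)
    right = rect_sum(ii, r, c + w, r + h, c + 2 * w)
    return left - right
```

Once the integral image is built, any rectangle sum costs four array lookups regardless of its size, which is what makes dense Haar-like feature evaluation feasible.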

The AdaBoost algorithm is used for the extraction of color, gradient, and Haar-like features.

3.3 Background Modeling by KDA

The background probability of each pixel for each feature is modeled with a Gaussian mixture density function. There are various ways to implement this idea, and we adopt KDA, where the density function for each pixel is represented by a compact and flexible mixture of Gaussians.

The KDA is a density approximation technique based on mixture models, where mode locations (local maxima) are detected automatically by the mean shift algorithm and a single Gaussian component is assigned to each detected mode. The covariance for each Gaussian is computed by curvature fitting around the associated mode.

The KDA finds local maxima in the underlying density function, and a mode-based representation of the density is obtained by estimating all the parameters for a compact Gaussian mixture.
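A 1-D sketch of the two ingredients just described, mean-shift mode finding on a kernel density and curvature fitting around the detected mode (the bandwidth, tolerances, and finite-difference step are illustrative choices):

```python
import numpy as np

def mean_shift_mode(x0, samples, h=2.0, tol=1e-6, max_iter=200):
    """Gradient ascent (mean shift) from x0 on a 1-D Gaussian KDE;
    the fixed point is a local maximum (mode) of the density."""
    x = x0
    for _ in range(max_iter):
        w = np.exp(-0.5 * ((x - samples) / h) ** 2)
        x_new = (w * samples).sum() / w.sum()
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

def curvature_sigma(mode, samples, h=2.0, eps=1e-3):
    """Curvature fitting: for a Gaussian peak g, sigma^2 = -g(m)/g''(m),
    so estimate the second derivative of the (unnormalized) KDE at the
    mode by central differences and read off the variance."""
    f = lambda x: np.exp(-0.5 * ((x - samples) / h) ** 2).sum()
    d2 = (f(mode + eps) - 2.0 * f(mode) + f(mode - eps)) / eps ** 2
    return -f(mode) / d2
```

Assigning one Gaussian per detected mode with this curvature-based variance yields the compact mixture that KDA uses in place of the full sample set.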

Although KDA handles multimodal density functions for each feature, it is still not sufficient to handle long-term background variations. The background models are therefore updated periodically or incrementally, which is done by Sequential Kernel Density Approximation (SKDA).

3.4 Optimization in 1D, 2D & 3D

We find all the convergence points with a single linear scan of the samples over the density function created by KDA. The sample points are sorted in ascending order, and mean-shift mode finding starts from the smallest sample. When the current sample moves in the gradient-ascent direction and passes another sample's location during the iterative procedure, the convergence point of the current sample must be the same as the convergence location of the sample just passed; we therefore terminate the current sample's mean-shift process and move on to the next smallest sample, where the mean-shift process begins again. If a mode is found during the mean-shift iterations, its location is stored and the next sample is considered.
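The procedure above can be sketched as follows for 1-D samples; the bookkeeping (a link table resolved after the scan) is one possible way to implement "inherit the convergence point of the sample just passed":

```python
import numpy as np

def all_modes_linear_scan(samples, h=2.0, tol=1e-6, max_iter=500):
    """Convergence (mode) location for every sample, terminating a
    trajectory early when it passes another sample's location."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    conv = np.full(n, np.nan)      # resolved convergence points
    link = {}                      # i -> index of the sample that was passed
    for i in range(n):
        x = s[i]
        for _ in range(max_iter):
            w = np.exp(-0.5 * ((x - s) / h) ** 2)
            x_new = (w * s).sum() / w.sum()
            lo, hi = min(x, x_new), max(x, x_new)
            passed = [j for j in range(n) if j != i and lo <= s[j] <= hi]
            if passed:
                link[i] = passed[0]     # same mode as the passed sample
                break
            if abs(x_new - x) < tol:
                conv[i] = x_new         # converged to a mode directly
                break
            x = x_new
        else:
            conv[i] = x
    def resolve(i):
        if np.isnan(conv[i]):
            conv[i] = resolve(link[i])
        return conv[i]
    for i in range(n):
        resolve(i)
    return conv
```

The saving comes from the `passed` check: any trajectory segment that crosses another sample ends immediately, so in the typical case only a few samples per mode run mean shift to full convergence.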

3.5 Foreground and Background Classification

After background modeling, each pixel is associated with 2D Gaussian mixtures. In most density-based background subtraction algorithms, the probabilities associated with each pixel are combined in a straightforward way, either by computing the average probability or by voting for the classification. However, such simple methods may not work well in many real-world situations due to feature dependency and nonlinearity. For example, pixels in shadow may have a low background probability in color modeling unless shadows are explicitly modeled as transformations of color variables, but a high background probability in texture modeling.

Also, the foreground color of a pixel can look similar to the corresponding background model, which makes the background probability high although the texture probability is probably low. Such inconsistency among features is aggravated when many features are integrated and the data are high dimensional, so a classifier is trained over the background probability vectors for the feature set.

Another advantage of integrating a classifier for foreground/background segmentation is that it can select discriminative features and reduce the feature dependency problem; otherwise, highly correlated, nondiscriminative features may dominate the classification process regardless of the states of other features.
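For concreteness, a pixel's per-feature background probabilities can be stacked into the vector consumed by the classifier; the feature names and mixture parameters below are invented for illustration:

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Density of scalar x under a 1-D Gaussian mixture."""
    w, m, v = map(np.asarray, (weights, means, variances))
    return float((w * np.exp(-0.5 * (x - m) ** 2 / v)
                  / np.sqrt(2.0 * np.pi * v)).sum())

# one background mixture per feature for one pixel (weights, means, vars);
# values are hypothetical
pixel_models = {
    "color":    ([0.7, 0.3], [120.0, 60.0], [25.0, 40.0]),
    "gradient": ([1.0],      [5.0],         [4.0]),
    "haar":     ([1.0],      [0.0],         [100.0]),
}

def probability_vector(observation, models):
    """Stack each feature's background probability into one vector,
    which is what the trained classifier receives per pixel."""
    return np.array([gmm_density(observation[name], *params)
                     for name, params in models.items()])

v = probability_vector({"color": 118.0, "gradient": 4.0, "haar": 3.0},
                       pixel_models)
```

Feeding this vector to a trained classifier, rather than averaging or voting over its entries, is what lets the method learn cases such as "low color probability but high texture probability means shadow, not foreground."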

3.6 RVM Classifier

Relevance vector machines (RVMs) are based on a Bayesian formulation of a linear model with an appropriate prior that results in a sparse representation. As a consequence, they generalize well and provide inferences at low computational cost. The RVM is mainly used for regression and classification, and has subsequently been applied to object detection and classification.
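A toy sparse Bayesian learning loop in the spirit of the RVM (regression form for brevity; the classification variant replaces the Gaussian likelihood with a sigmoid link). The updates are the standard evidence re-estimates; the data, hyperparameters, and pruning threshold are illustrative assumptions:

```python
import numpy as np

def rvm_fit(Phi, t, n_iter=100, beta=1.0e4, prune=1.0e4):
    """Evidence maximization for t ~ N(Phi @ w, 1/beta) with per-weight
    precisions alpha: alpha diverges for irrelevant basis functions,
    which are pruned, leaving a sparse model."""
    n, m = Phi.shape
    alpha = np.ones(m)                   # per-weight prior precisions
    keep = np.arange(m)                  # indices of surviving weights
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
        mu = beta * Sigma @ P.T @ t      # posterior mean of the weights
        gamma = 1.0 - alpha[keep] * np.diag(Sigma)   # "well-determinedness"
        alpha[keep] = gamma / (mu ** 2 + 1e-12)      # re-estimate precisions
        keep = keep[alpha[keep] < prune]             # drop diverged weights
    P = Phi[:, keep]
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
    mu = beta * Sigma @ P.T @ t
    return keep, mu

# illustrative data: targets generated by just two of ten basis columns
rng = np.random.default_rng(1)
Phi = rng.normal(size=(200, 10))
t = 2.0 * Phi[:, 3] - 1.5 * Phi[:, 7] + 0.01 * rng.normal(size=200)
relevant, w = rvm_fit(Phi, t)
```

The surviving columns play the role of relevance vectors: only they need to be evaluated at test time, which is the source of the fast testing claimed in the abstract.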


4.1 Kernel Density Approximation

Kernel density estimation is a popular method for estimating a probability density function, and SKDA maintains its compact Gaussian mixture approximation over time. When a new sample x_new^(t+1) arrives at time t+1, the density is updated as follows:

S ← {x_new^(t+1)},  κ ← α

f^(t+1)(x) ← (1 − α) f^t(x) + N(α, x_new^(t+1), P_new^(t+1))

c_new ← MeanShiftModeFinding(f^(t+1)(x), x_new^(t+1))

f^(t+1)(x) ← f^(t+1)(x) − N(α, x_new^(t+1), P_new^(t+1))

while true do

    x_i^t ← the remaining mixture component nearest to c_new

    c ← MeanShiftModeFinding(f^(t+1)(x), x_i^t)

    if c_new ≠ c then break

    S ← S ∪ {x_i^t},  κ ← κ + κ_i^t

    f^(t+1)(x) ← f^(t+1)(x) − N(κ_i^t, x_i^t, P_i^t)

end while

Merge all the modes in the set S and create N(κ, c_new, P_c), where P_c is derived by curvature fitting around the merged mode, as described above.

4.2 AdaBoost Algorithm

Given example images (x_1, y_1), …, (x_n, y_n), where y_i = 0, 1 for negative and positive examples, respectively.

Initialize weights w_{1,i} = 1/(2m) for y_i = 0 and w_{1,i} = 1/(2l) for y_i = 1, where m and l are the number of negative and positive examples, respectively.

For t = 1, …, T:

a. Normalize the weights, w_{t,i} ← w_{t,i} / Σ_{j=1}^{n} w_{t,j}, so that w_t is a probability distribution.

b. For each feature j, train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to w_t: ε_j = Σ_i w_{t,i} |h_j(x_i) − y_i|.

c. Choose the classifier h_t with the lowest error ε_t.

d. Update the weights: w_{t+1,i} = w_{t,i} β_t^(1−e_i), where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 − ε_t).

The final strong classifier is:

h(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and h(x) = 0 otherwise, where α_t = log(1/β_t).
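The steps above can be sketched with decision stumps over scalar features standing in for Haar-like responses (the data, stump form, and round count are illustrative):

```python
import numpy as np

def train_adaboost(X, y, T=10):
    """AdaBoost in the form above: y in {0,1}, weight update
    w <- w * beta^(1-e), weak learners are single-feature stumps."""
    n, d = X.shape
    m, l = np.sum(y == 0), np.sum(y == 1)
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))   # initial weights
    stumps = []                     # (feature, threshold, polarity, beta)
    for _ in range(T):
        w = w / w.sum()             # normalize to a distribution
        best = None
        for j in range(d):          # best single-feature stump under w
            for thr in np.unique(X[:, j]):
                for p in (1, -1):
                    h = (p * X[:, j] < p * thr).astype(int)
                    err = np.sum(w * np.abs(h - y))
                    if best is None or err < best[0]:
                        best = (err, j, thr, p)
        err, j, thr, p = best
        beta = max(err, 1e-10) / max(1.0 - err, 1e-10)
        h = (p * X[:, j] < p * thr).astype(int)
        e = np.abs(h - y)           # 0 if correct, 1 otherwise
        w = w * beta ** (1 - e)     # down-weight correctly classified examples
        stumps.append((j, thr, p, beta))
    return stumps

def strong_classify(x, stumps):
    """Final strong classifier: weighted vote against half the total alpha."""
    alphas = [np.log(1.0 / b) for (_, _, _, b) in stumps]
    score = sum(a * int(p * x[j] < p * thr)
                for (j, thr, p, b), a in zip(stumps, alphas))
    return int(score >= 0.5 * sum(alphas))
```

In the Viola-Jones setting each stump corresponds to one Haar-like feature, so the chosen stumps double as the selected feature subset.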


We have introduced a multiple-feature integration algorithm for background modeling and subtraction, where the background is modeled with a generative method and background and foreground are classified by a discriminative technique. KDA is used to represent a probability density function of the background for RGB, gradient, and Haar-like features at each pixel, where independent 2D and 3D density functions are used for simplicity. For classification, an RVM trained on the probability vectors for the given feature set is employed. Our algorithm demonstrates better performance than other density-based techniques, such as the Gaussian mixture model (GMM) and kernel density estimation (KDE).