Scope of the MPEG-7 Visual Standard


We have seen a rapid increase in the volume of image and video collections. Huge amounts of visual information are generated, stored, and transmitted every day. However, this visual information is very hard to access unless it is organized to allow effective browsing, searching, and retrieval. Image retrieval has been a very active research and development domain since the early 1970s.

During the early 1990s, research on video retrieval became equally important. A popular means of image or video retrieval is to annotate images or videos with text and to use text-based database management systems to perform the retrieval. However, text-based annotation has significant drawbacks when confronted with large volumes of images: annotation can become prohibitively labor-intensive, and since images are rich in content, text may in many applications not be expressive enough to describe them.

To overcome these difficulties, content-based image retrieval emerged in the early 1990s as a promising means of describing and retrieving images. Content-based image retrieval systems describe images by their own visual content, such as color, texture, and shape information, rather than by text.

MPEG-7 standardizes description schemes that allow users to search, identify, filter, and browse audiovisual content. The purpose of this report is to provide an overview of the MPEG-7 content-based description and retrieval specifications.


The ultimate goal of the MPEG-7 Visual standard is to provide standardized descriptions of streamed and stored images and videos: standardized header bits that help users or applications to identify, categorize, or filter images or video. These low-level descriptors can be used to compare, filter, or browse images or video purely on the basis of nontextual visual descriptions of the content. They will be used differently in different user domains and application environments.

Selected applications include digital libraries, broadcast media selection, and multimedia editing. Among this diversity of possible applications, the MPEG-7 visual feature descriptors allow users or agents to perform the following tasks, taken as examples:

Graphics: Draw a few lines on a screen and get in return a set of images containing similar graphics or logos.

Images: Define objects, including color patches or textures.

Video: On a given set of video objects, describe object movements, camera motion or relations between objects and get in return a list of videos with similar or dissimilar temporal and spatial relations.

Video Activity: On a given video content, describe actions and get a list of videos where similar actions happen.

The MPEG-7 Visual Descriptors describe the basic audiovisual content of media based on visual information. For images and video, the content may be described, for example, by the shape and size of objects, texture, color, movement of objects, and camera motion.


Fig. 2.1 Scope of MPEG-7

The descriptors need to be extracted from the image or video content. It should be noted that MPEG-7 descriptor data may be physically associated with the AV material in the same data stream. Once MPEG-7 descriptors are available, suitable search engines can be employed to search, filter, or browse visual material based on suitable similarity measures. Note that practical search engine implementations may also include common text-based queries.

Fig. 2.1 depicts the MPEG-7 processing chain to show the scope of the MPEG-7 standard in a simple way. In a typical application scenario, MPEG-7 descriptors are first produced from the content. It is important to understand that, for most visual descriptors, the standard does not prescribe how to extract the features: extraction is non-normative for most parts of the MPEG-7 Visual standard.

How MPEG-7 descriptors are used for further processing, such as search and filtering of content, is again not specified by MPEG-7, leaving maximum flexibility to applications. In particular, how similarity between images or videos is defined is left to the specific application requirements.
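As a hedged illustration of this flexibility, the following Python sketch ranks images by a distance between descriptor vectors. The descriptor values, image names, and the choice of metric are invented for illustration; MPEG-7 does not prescribe any of them.

```python
# Hypothetical sketch: MPEG-7 leaves the similarity measure to the
# application, so different metrics can be plugged in over the same
# descriptor vectors. All names and values here are illustrative.

def l1_distance(d1, d2):
    """City-block distance, often used for histogram-like descriptors."""
    return sum(abs(a - b) for a, b in zip(d1, d2))

def l2_distance(d1, d2):
    """Euclidean distance, an alternative similarity measure."""
    return sum((a - b) ** 2 for a, b in zip(d1, d2)) ** 0.5

query = [0.2, 0.5, 0.3]
candidates = {"imageA": [0.25, 0.45, 0.30], "imageB": [0.7, 0.1, 0.2]}

# Rank candidate images by ascending distance to the query descriptor.
ranking = sorted(candidates, key=lambda k: l1_distance(query, candidates[k]))
print(ranking)  # imageA is closer to the query than imageB
```

Swapping `l1_distance` for `l2_distance` in the ranking changes only the application-side similarity definition, not the descriptors themselves, which is exactly the separation the standard intends.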


MPEG develops specifications based on a well-defined standard development framework. Once a standardization activity such as MPEG-7 is agreed upon within MPEG, the standard is developed in a combined effort through the definition of an eXperimentation Model (XM) and a series of Core Experiments. This procedure already proved successful in the course of developing MPEG-1, MPEG-2, and MPEG-4.

The purpose of an experimental model within MPEG-7 is to specify and implement feature extraction, encoding, and decoding algorithms as well as search engines. An experimental model specifies the input and output formats for the uncoded data, the extraction method used to obtain a descriptor, and the format of the bitstream containing the descriptor data.

Various proposals for color, texture, shape/contour, and motion descriptors were evaluated in performance tests in 1998, and the most promising proposals were adopted for the first visual experimental model. In subsequent meetings, these first tools were improved in the Core Experiment process by the introduction of refinements and new promising algorithms. This refinement process took until early 2001, when the successful descriptors were finally adopted for the MPEG-7 Visual standard.

A Core Experiment aims to improve on the current technology in the experimental model. It is defined with respect to the experimental model, which includes the common core algorithm. A Core Experiment is established by the MPEG Video group if two independent parties commit to performing the experiment. If a Core Experiment succeeds in improving on a technique described in the experimental model in terms of retrieval efficiency, provision of functionalities not supported by the experimental model, or implementation complexity, the successful technique is incorporated into the newest version of the experimental model. The technique may either replace an existing technique or supplement the algorithms already supported by the experimental model.

Core Experiments are performed between two MPEG Video group meetings, and at each meeting the results of the Core Experiments are reviewed.


The MPEG-7 visual descriptors can be broadly classified into general visual descriptors and domain-specific visual descriptors. The former include color, texture, and shape descriptors, while the latter are application dependent and include, for example, face recognition. Since the domain-specific descriptors are still under development, this report focuses on the general descriptors, which will be used in most applications.

Fig 4.1 Three color images and their MPEG-7 color distributions, depicted using a simplified color histogram. Based on the color distribution, the two left images would be recognized as more similar to each other than to the one on the right.

4.1 Visual Color Descriptor

Color is one of the most widely used visual features in image and video retrieval. Color features are relatively robust to changes in background colors and independent of image size. Color descriptors may be used to describe the content of images and video.

Careful design, practical work, and rigorous testing went into MPEG-7 to arrive at capable color descriptors for similarity matching.

A brief overview of each descriptor is provided.

4.1.1 Color Spaces:

To allow interoperability between the various color descriptors, the color spaces are constrained to hue-saturation-value (HSV) and hue-min-max-diff (HMMD). HSV is a well-known color space widely used in image applications. HMMD is a new color space defined by MPEG and is only used in the Color Structure Descriptor (CSD).
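Since HSV is so widely used, a minimal sketch of converting 8-bit RGB values to HSV with Python's standard library may be helpful. The `rgb_to_hsv255` helper name and the hue-in-degrees convention are this report's own, not part of the standard.

```python
# Minimal sketch of RGB-to-HSV conversion using the standard library.
# MPEG-7 constrains most color descriptors to HSV (HMMD is used only
# by the Color Structure Descriptor).
import colorsys

def rgb_to_hsv255(r, g, b):
    """Convert 8-bit RGB to HSV, with hue in degrees and s, v in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

print(rgb_to_hsv255(255, 0, 0))  # pure red   -> hue 0
print(rgb_to_hsv255(0, 255, 0))  # pure green -> hue 120
```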

4.1.2 Scalable Color Descriptor:

One of the most basic descriptions of color features is the color distribution in an image. If this is measured over a whole image, a global color feature is obtained. Fig. 4.1 shows examples of color images and their corresponding color distributions as color histograms.
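The histogram idea can be sketched in a few lines of Python. This is an illustration of the concept only, not the normative Scalable Color extraction (which uses a quantized HSV histogram with a Haar-transform encoding); the hue values and bin count below are made up.

```python
# Illustrative sketch: build a coarse global color histogram by
# quantizing each pixel's hue into a few bins, the idea underlying
# the Scalable Color Descriptor's HSV histogram.

def hue_histogram(pixels, bins=8):
    """pixels: hue values in [0, 360). Returns a normalized histogram."""
    hist = [0] * bins
    for h in pixels:
        hist[int(h / 360.0 * bins) % bins] += 1
    total = float(len(pixels)) or 1.0
    return [c / total for c in hist]

# Two mostly-red images and one mostly-blue image (hues are made up).
img1 = hue_histogram([5, 10, 350, 355, 20])
img2 = hue_histogram([0, 15, 340, 8, 12])
img3 = hue_histogram([230, 240, 220, 235, 245])

def l1(a, b):
    """City-block distance between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

# The two red images are closer to each other than to the blue one.
print(l1(img1, img2) < l1(img1, img3))  # True
```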

4.1.3 Dominant Color Descriptor:

This color descriptor aims to describe the global as well as local spatial color distribution in images for high-speed retrieval and browsing. In contrast to the color histogram approach, it arrives at a much more compact representation, at the expense of lower performance in some applications. Colors in a given region are clustered into a small number of representative colors.
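A hedged sketch of this clustering idea follows, using a toy one-dimensional k-means over grayscale intensities rather than the full 3-D color-space clustering the descriptor actually performs; the values and initial centers are invented.

```python
# Toy sketch of the idea behind the Dominant Color Descriptor:
# cluster the pixels of a region into a small number of
# representative colors (here, 1-D intensities stand in for colors).

def kmeans_1d(values, centers, iters=10):
    """Cluster scalar values around the given initial centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            # assign each value to its nearest center
            i = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[i].append(v)
        # move each center to the mean of its group (keep it if empty)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

# A region with two obvious dominant intensities, near 30 and near 200.
region = [28, 30, 33, 29, 198, 203, 200, 202]
print(kmeans_1d(region, centers=[0, 255]))  # roughly [30.0, 200.75]
```

The two resulting centers, plus their pixel fractions, would form a far more compact description than a full histogram, which is exactly the trade-off the paragraph above describes.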

4.1.4 Color Layout Descriptor:

This descriptor is designed to describe the spatial distribution of color in an arbitrarily shaped region. The color distribution in each region can be described using the Dominant Color Descriptor above. The spatial distribution of color is an effective description for sketch-based retrieval, content filtering using image indexing, and visualization.

4.2 Visual Texture Descriptor

Texture refers to visual patterns, homogeneous or not, that result from the presence of multiple colors or intensities in the image. It is a property of almost any surface, including clouds, trees, bricks, hair, and fabric. It contains important information about the structure of surfaces and their relation to the environment. Describing textures in images by suitable texture descriptors provides a strong means for similarity matching and retrieval.


Fig 4.2: Examples of grayscale images with different textures. Using the MPEG-7 visual texture descriptors, the two images on the bottom would be rated as similar in texture to each other, and less similar in texture to the two images on the top.

To provide an example of texture properties, a collection of images with different textures is shown in Fig. 4.2. MPEG-7 defines suitable texture descriptors that can be used for a range of applications and tasks.

4.2.1 Homogeneous Texture Descriptor

Fig 4.3: Frequency layout for MPEG-7 Homogeneous Texture Descriptor feature extraction. Energy and energy-deviation values are extracted from this frequency division into 30 channels.

To describe the image texture, energy and energy-deviation values are extracted from a frequency layout. To arrive at a scalable description and efficient matching of texture, the frequency space is partitioned into 30 channels, with equal divisions in the angular direction and octave divisions in the radial direction.
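The 30-channel layout can be sketched as six 30-degree angular bands crossed with five octave radial bands. The function below maps a frequency-plane point to a channel index; the particular numbering scheme and the normalized-frequency range are simplifying assumptions of this report.

```python
# Illustrative sketch of the 30-channel frequency layout: 6 equal
# angular divisions (30 degrees each) times 5 octave radial divisions.
# Channel numbering here is an assumption, not the normative layout.
import math

def channel_index(fx, fy, max_radius=0.5):
    """Map a normalized frequency-plane point to one of 30 channels."""
    angle = math.degrees(math.atan2(fy, fx)) % 180.0   # fold to 0..180
    radius = math.hypot(fx, fy)
    angular = int(angle // 30) % 6                     # 6 angular bands
    # 5 octave bands: the outermost band is [max_radius/2, max_radius).
    radial = 0
    bound = max_radius
    while radius < bound / 2 and radial < 4:
        bound /= 2
        radial += 1
    return radial * 6 + angular                        # channel 0..29

print(channel_index(0.4, 0.0))    # high-frequency, horizontal direction
print(channel_index(0.02, 0.02))  # low-frequency, diagonal direction
```

In a full extractor, the energy and energy deviation of the Fourier coefficients falling into each channel would form the 30-channel feature vector.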

4.2.2 Non-Homogeneous Texture Descriptor:

To also provide descriptions for non-homogeneous texture images, MPEG-7 defines an Edge Histogram Descriptor. This descriptor captures the spatial distribution of edges, somewhat in the spirit of the Color Layout Descriptor.

The extraction of this descriptor involves partitioning the image into 16 non-overlapping blocks of equal size. Edge information is then calculated for each block in five edge categories: vertical, horizontal, 45°, 135°, and nondirectional. It is expressed as a 5-bin histogram for each image block.
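A simplified sketch of the per-block edge decision follows. It classifies a single 2x2 pixel block with the five directional filters and picks the strongest response; the filter coefficients follow commonly cited 2x2 operators and should be treated as an assumption here, and the threshold is invented.

```python
# Simplified sketch of Edge Histogram extraction for one 2x2 block.
# The real descriptor aggregates such decisions over 16 sub-images
# into an 80-bin histogram (5 edge categories per sub-image).
import math

FILTERS = {
    "vertical":        (1, -1, 1, -1),
    "horizontal":      (1, 1, -1, -1),
    "diag_45":         (math.sqrt(2), 0, 0, -math.sqrt(2)),
    "diag_135":        (0, math.sqrt(2), -math.sqrt(2), 0),
    "non_directional": (2, -2, -2, 2),
}

def edge_type(block, threshold=10.0):
    """block: 4 mean intensities (a b / c d). Returns edge label or None."""
    responses = {name: abs(sum(f * p for f, p in zip(coef, block)))
                 for name, coef in FILTERS.items()}
    name = max(responses, key=responses.get)
    return name if responses[name] >= threshold else None

print(edge_type((200, 50, 200, 50)))    # strong vertical edge
print(edge_type((100, 100, 100, 100)))  # flat block -> None
```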

4.3 Visual Shape Descriptor

In many image applications, the shape of image objects provides a strong visual clue for similarity matching. Examples of such applications include binary images with written characters.

It is usually required that a shape descriptor be invariant to scaling, rotation, and translation. Shape information can be 2-D or 3-D in nature, depending on the application. In general, 2-D shape descriptions can be divided into two categories: contour-based and region-based.
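The invariance requirement can be illustrated with a small Python sketch that normalizes a 2-D point set for translation (subtract the centroid) and scale (divide by the mean radius), so that a shifted, rescaled copy of a shape yields the same normalized points. This is a generic illustration, not an MPEG-7 extraction step.

```python
# Hedged sketch of translation/scale normalization of a 2-D shape.
import math

def normalize(points):
    """points: list of (x, y). Returns a translation/scale normalized copy."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = sum(math.hypot(x, y) for x, y in centered) / n
    return [(x / scale, y / scale) for x, y in centered]

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
moved = [(10 + 2 * x, -5 + 2 * y) for x, y in square]  # shifted, doubled

a, b = normalize(square), normalize(moved)
same = all(math.hypot(p[0] - q[0], p[1] - q[1]) < 1e-9
           for p, q in zip(a, b))
print(same)  # True: both copies normalize to the same point set
```

Rotation invariance needs an extra step (e.g., working with rotation-invariant magnitudes, as the ART descriptor below effectively does), which is why the standard descriptors go beyond this simple normalization.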

4.3.1 Region-based Descriptor

The MPEG-7 region-based descriptor, based on the Angular Radial Transformation (ART), belongs to the class of moment-based methods for shape description. This descriptor addresses shapes that are best described by shape regions rather than contours.

The MPEG-7 ART descriptor employs a complex Angular Radial Transformation defined on a unit disk in polar coordinates to achieve this goal. Coefficients of the ART basis functions are computed and used for similarity matching. The descriptor is very compact and also very robust to noise. Examples of similarity matching between various shapes using the ART descriptor are shown in Fig. 4.4.


Fig 4.4: Examples of various shapes that can be indexed using the MPEG-7 Region-Based Shape Descriptor. Images contained in any one of the sets (a)-(d) would be rated similar to each other and dissimilar to the ones in the remaining sets. For example, images in set (a) would be identified as similar to each other and dissimilar to the ones in sets (b), (c), or (d).

4.3.2 Contour-Based shape Descriptor

Objects for which shape features are best expressed by contour information can be described using the MPEG-7 Contour-Based Descriptor. This descriptor is based on the curvature scale-space representation of contours.


Fig. 4.5. Examples of shapes that can be indexed using MPEG-7 Contour-Based Shape Descriptor

The size of the descriptor is 122 bits. Figs. 4.5(b)-(d) show similarity matching results using the MPEG-7 Contour-Based Shape Descriptor.

Fig. 4.5(a) shows examples of shapes that have similar regions but different contour properties. Such objects would be classified as very different by the contour-based shape descriptor.

4.3.3 2D/3D Shape Descriptor

The shape of a 3-D object can be described approximately by a limited number of 2-D shapes taken as 2-D snapshots from different viewing angles.

The MPEG-7 2-D shape descriptors can be used to describe each of the 2-D shapes captured from the 3-D object.

A similarity matching between two 3-D objects involves matching multiple pairs of 2-D views, one taken from each of the objects. In general, better performance for 3-D shapes has been shown using the MPEG-7 2-D Contour-Based Shape Descriptor.

4.4 Motion Descriptors

There are four motion descriptors: camera motion, object motion trajectory, parametric object motion, and motion activity.

4.4.1 The Camera Motion Descriptor

It characterizes 3-D camera motion parameters and supports the following basic camera operations: fixed, tracking (horizontal transverse movement, also called traveling in the film industry), booming (vertical transverse movement), panning (horizontal rotation), tilting (vertical rotation), rolling (rotation around the optical axis), and zooming (change of the focal length). The descriptor is based on time intervals characterized by their start time and duration.

The descriptor can describe a combination of different types of camera motion. The mixture mode captures global information about the camera motion parameters while disregarding detailed temporal information.

4.4.2 The Motion Trajectory Descriptor

It characterizes the temporal evolution of key points. It is composed of a list of key points along with a set of interpolation functions that describe the trajectory between key points. The velocity is implicitly known from the key-point specification, and the acceleration between two key points can be estimated if a second-order interpolation function is used. The key points are specified by their time instant and their coordinates, depending on the application.


Fig. 4.6 Camera model for MPEG-7 Camera Motion Descriptor. Perspective projection to image plane p and camera motion parameters. The (virtual) camera is located in O.

The interpolation functions are defined independently for each component x, y, and z. The granularity of the descriptor is selected through the number of key points used for each time interval.
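The second-order interpolation between key points can be sketched as a constant-acceleration model applied to each coordinate independently. The function name and the sample values below are illustrative, not part of the standard's syntax.

```python
# Illustrative sketch of second-order trajectory interpolation for one
# coordinate: x(t) = x0 + v0*(t - t0) + 0.5*a*(t - t0)^2, where x0 and
# v0 come from the key point at time t0 and a is the acceleration.
# Each of the x, y, z components would be interpolated independently.

def interpolate(x0, v0, a, t0, t):
    """Second-order interpolation of one coordinate from key point t0."""
    dt = t - t0
    return x0 + v0 * dt + 0.5 * a * dt * dt

# Key point at t0 = 0: position 0, velocity 2 units/s, acceleration 1.
positions = [interpolate(0.0, 2.0, 1.0, 0.0, t) for t in (0.0, 1.0, 2.0)]
print(positions)  # [0.0, 2.5, 6.0]
```

With a first-order (constant-velocity) function the `a` term would simply be dropped, which is the coarser-granularity alternative the descriptor allows.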

4.4.3 The Parametric Motion Descriptor

Parametric motion models are used in various image processing and analysis applications. The Parametric Motion Descriptor defines the motion of regions in video sequences as a 2-D parametric model.

In particular, the models include translations, rotations, scaling, and combinations of them; quadratic models make it possible to describe more complex movements.

The parametric model is associated with arbitrary regions over a specified time interval. The motion is captured in a compact manner as a reduced set of parameters.


4.4.4 The Motion Activity Descriptor

A human watching a video perceives it as being a slow sequence, a "fast-paced" sequence, an "action" sequence, etc. Examples of high activity include scenes such as "scoring in a basketball game" or a high-speed car chase. On the other hand, shots such as a "news reader shot" or "an interview scene" are perceived as low-activity. The Motion Activity Descriptor is based on five main features: the intensity of the motion activity (a value between 1 and 5), the direction of the activity (optional), the spatial localization, and the spatial and temporal distribution of the activity.
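The intensity feature can be sketched as follows. The standard deviation of motion-vector magnitudes is a common proxy for activity, quantized to an integer in 1..5; the threshold values below are made-up illustration values, not the normative quantization.

```python
# Hedged sketch of the intensity feature of the Motion Activity
# Descriptor: quantize the deviation of motion-vector magnitudes
# into an activity level from 1 to 5. Thresholds are illustrative.
import math

def activity_intensity(magnitudes, thresholds=(2.0, 5.0, 10.0, 20.0)):
    """Quantize motion-vector magnitude deviation into activity 1..5."""
    n = len(magnitudes)
    mean = sum(magnitudes) / n
    std = math.sqrt(sum((m - mean) ** 2 for m in magnitudes) / n)
    level = 1
    for t in thresholds:
        if std >= t:
            level += 1
    return level

print(activity_intensity([0.5, 0.6, 0.4, 0.5]))    # near-static shot: 1
print(activity_intensity([1.0, 25.0, 3.0, 40.0]))  # fast action: higher
```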

4.5 Face Descriptor

The Face Descriptor can be used to retrieve face images that match a query face image. The descriptor is based on the classical eigenfaces approach. It represents the projection of a face region onto a set of 49 basis vectors that span the space of possible face vectors.


5.1 MPEG-7 Audio

MPEG-7 Audio specifies a set of standardized audio descriptors. They address four classes of audio signals: pure music, pure speech, pure sound effects, and arbitrary soundtracks. Audio descriptors may address audio features such as silence, spoken content, and sound effects.

Audio descriptors may rely on other low-level constructs such as the Scalable Series and the Audio Description Framework.

Examples of standardized descriptors for various audio features are as follows:

Silence descriptors, such as the silence type.

Spoken content descriptors, such as the spoken content speaker type.

Sound effects descriptors, such as the audio spectrum basis type and the sound effect feature type.

A number of description schemes, such as those for spoken content and sound effects, which utilize these descriptors, have also been defined.
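To make the silence feature concrete, the following sketch marks audio frames whose short-time energy falls below a threshold. The frame size and threshold are illustrative assumptions, not MPEG-7 normative values.

```python
# Minimal sketch of the idea behind a silence descriptor: mark frames
# whose short-time energy is below a threshold. Parameters are made up.

def silent_frames(samples, frame_size=4, threshold=0.01):
    """Return per-frame booleans: True where mean energy < threshold."""
    flags = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        flags.append(energy < threshold)
    return flags

signal = [0.0, 0.01, -0.01, 0.0,   # quiet frame
          0.5, -0.6, 0.4, -0.5]    # loud frame
print(silent_frames(signal))  # [True, False]
```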

5.2 MPEG-7 Multimedia Description Schemes:

The Multimedia Description Schemes (MDS) specify a high-level framework that allows generic description of all kinds of multimedia, including audio, visual, and textual data.

Fig. 5.1 shows an overview of the levels and the relationships between levels in the MDS hierarchy. The lowest level, called the basic elements, consists of data types, mathematical structures, linking and media localization tools, and elementary DSs.


Fig 5.1 Overview of the MPEG-7 MDS

The next level, called content management and content description, builds on the lowest level. It describes the content from several viewpoints: creation and production, media, usage, structural aspects, and conceptual aspects.

The first three elements primarily address information related to content management, while the last two are used for the description of the content itself.

Beyond the direct description of the content provided by these five sets of tools, further tools are defined for navigation and access. Variation and decomposition elements allow adapting multimedia presentations to the capabilities of the client terminals, network conditions, and user preferences.

Some tools are defined for describing user preferences and usage history to enhance the user interaction experience. The last set of tools addresses the organization of content by collections and classification and by the use of models.

5.3 MPEG-7 Reference Software

MPEG-7 Reference Software aims to provide a reference implementation of the relevant parts of the MPEG-7 standard and is known as the eXperimentation Model (XM) software.

Although some software for extracting descriptors is also included, the focus is on creating bitstreams of descriptors and DSs with normative syntax rather than on the performance of the tools.

Currently it includes components in four categories: DDL parser and DDL validation parser, visual Descriptors, audio Descriptors, and multimedia DSs.

5.4 MPEG-7 Conformance

It aims to provide guidelines and procedures for testing the conformance of MPEG-7 implementations and has only recently been started.

5.5 MPEG-7 Systems

It specifies system level functionalities such as preparation of MPEG-7 descriptions for efficient transport/storage, synchronization of content and descriptions, and development of conformant decoders.

Fig. 5.2 shows a high-level architecture of a terminal that uses MPEG-7 descriptions; it is referred to as an MPEG-7 terminal. The MPEG-7 data is obtained from transport or storage and handed over to the delivery layer, which allows extraction of elementary streams by undoing the transport/storage-specific framing and multiplexing, and retains the timing information needed for synchronization.


Figure 5.2 MPEG-7 Terminal

The elementary streams, consisting of individually accessible chunks called access units, are forwarded to the compression layer, where the streams describing the structure of the MPEG-7 data as well as the streams describing the content are decoded.

5.6 MPEG-7 DDL (Description Definition Language)

It is a standardized language for defining new DSs and Descriptors, as well as extending or modifying existing DSs and Descriptors.

The MPEG-7 DDL is derived by extending XML Schema. While XML Schema has many of the capabilities needed by MPEG-7, it had to be extended to address other requirements specific to MPEG-7.

The resulting language satisfies the following requirements necessary for MPEG-7:

datatype definition

D and description scheme declaration

attribute declaration

typed reference

content model

inheritance/subclassing mechanism

abstract D and description scheme

description scheme inclusion.


This report has presented the MPEG-7 standard for visual content description. The MPEG-7 Visual standard specifies content-based descriptors that can be used to efficiently identify, filter, or browse images or video based on visual content rather than text. MPEG-7 descriptors are extracted from images or video sequences using suitable extraction methods and can be stored or transmitted entirely separately from the media content.