1. INTRODUCTION
We have seen a rapid increase in the volume of image and video collections. A huge amount of new visual information is generated, stored, and transmitted every day. However, it is very hard to access this visual information unless it is organized in a well-structured manner that allows effective browsing, searching, and retrieval. Image retrieval has been a very active research and development domain since the early 1970s.
During the early 1990s, research on video retrieval became of equal importance. A very popular means for image or video retrieval is to annotate images or videos with text and to use text-based database management systems to perform retrieval. However, text-based annotation has important drawbacks when applied to large volumes of images: annotation can become prohibitively labor-intensive. Furthermore, since images are rich in content, text may in many applications not be expressive enough to describe them.
To overcome these difficulties, content-based image retrieval emerged in the early 1990s as a promising means for describing and retrieving images. Content-based image retrieval systems describe images by their own visual content, such as color, texture, and shape information, rather than by text.
The MPEG-7 standard specifies description systems that allow users to search, identify, filter, and browse audiovisual content. The purpose of this report is to provide an overview of the MPEG-7 content-based description and retrieval specifications.
2. SCOPE OF MPEG-7 VISUAL STANDARD
The ultimate goal of the MPEG-7 visual standard is to provide standardized descriptions of streamed and stored images and videos: standardized header bits that help users or applications to identify, categorize, or filter images or video. These low-level descriptors can be used to compare, filter, or browse images or video purely on the basis of nontextual visual descriptions of the content. They will be used differently in different user domains and different application environments.
Selected applications include digital libraries, broadcast media selection, and multimedia editing. Among this diversity of possible applications, the MPEG-7 visual feature descriptors allow users or agents to perform the following tasks, taken as examples.
Graphics: Draw a few lines on a screen and get in return a set of images containing similar graphics or logos.
Images: Define objects, including color patches or textures
Video: On a given set of video objects, describe object movements, camera motion or relations between objects and get in return a list of videos with similar or dissimilar temporal and spatial relations.
Video Activity: On a given video content describe actions and get a list of videos where similar action happens.
The MPEG-7 visual descriptors describe the basic visual content of audiovisual media. For images and video, the content may be described, for example, by the shape of objects, object size, texture, color, movement of objects, and camera motion.
Fig. 2.1 Scope of MPEG-7
The descriptors first need to be extracted from the image or video content. It should be noted that MPEG-7 descriptor data may be physically associated with the AV material in the same data stream. Once MPEG-7 descriptors are available, suitable search engines can be employed to search, filter, or browse visual material based on suitable similarity measures. It must be noted that practical search engine implementations may also include common text-based queries.
Fig. 2.1 depicts the MPEG-7 processing chain and illustrates the scope of the MPEG-7 standard in a simple way. In a typical application scenario, MPEG-7 descriptors are first extracted from the content. It is important to understand that, for most visual descriptors, MPEG-7 specifies only the descriptor itself, not how to extract the underlying features: feature extraction is nonnormative for most parts of the MPEG-7 visual standard.
How MPEG-7 descriptors are used in further processing, such as search and filtering of content, is again not specified by MPEG-7, in order to leave maximum flexibility to applications. In particular, how similarity between images or videos is defined is left to specific application requirements.
3. DEVELOPMENT OF THE STANDARD
MPEG develops specifications based on a well-defined standard development framework. Once a standardization activity such as MPEG-7 is agreed upon within MPEG, the standard is developed in a combined effort through the definition of an experimentation model and a series of core experiments. This procedure already proved successful in the course of development of MPEG-1, MPEG-2, and MPEG-4.
The purpose of an experimentation model within MPEG-7 is to specify and implement feature extraction, encoding, and decoding algorithms as well as search engines. An experimentation model specifies the input and output formats for the uncoded data, the extraction method used to obtain the descriptor, and the format of the bitstream containing the descriptor data.
Various proposals for color, texture, shape/contour, and motion descriptors were evaluated in performance tests in 1998, and the most promising proposals were adopted for the first visual experimentation model. In subsequent meetings, these first tools were improved in the Core Experiment process by the introduction of refinements and new promising algorithms. This refinement process took until early 2001, when the successful descriptors were finally adopted for the MPEG-7 Visual Standard.
A Core Experiment aims to improve on the current technology in the experimentation model. It is defined with respect to the experimentation model, which includes the common core algorithms. A Core Experiment is established by the MPEG Video group if two independent parties are committed to performing the experiment. If a Core Experiment succeeds in improving on a technique described in the experimentation model, in terms of retrieval efficiency, provision of functionalities not supported by the experimentation model, or implementation complexity, the successful technique is incorporated into the newest version of the experimentation model. The technique may either replace an existing technique or supplement the algorithms already supported by the experimentation model.
Core Experiments are performed between two MPEG Video group meetings, and at each meeting their results are reviewed.
4. VISUAL DESCRIPTOR FOR IMAGES AND VIDEO
The MPEG-7 descriptors that were developed can be broadly classified into general visual descriptors and domain-specific visual descriptors. The former include color, texture, and shape, while the latter are application dependent and include the identification and recognition of human faces. Since the standardization of the domain-specific descriptors is still under development, this report concentrates on the general descriptors, which can be used in most applications.
Fig 4.1 Three color images and their MPEG-7 color distributions, depicted using a simplified color histogram. Based on the color distribution, the two left images would be recognized as more similar to each other than to the one on the right.
4.1 Visual Color Descriptor
Color is one of the most widely used visual features in image and video retrieval. Color features are relatively robust to changes in background colors and independent of image size. Color descriptors can be used for describing content in still images and video.
Considerable design and experimental work, and rigorous testing, have been performed in MPEG-7 to arrive at efficient color descriptors for similarity matching.
A brief overview of each descriptor is provided.
4.1.1 Color Spaces:
To allow interoperability between various color descriptors, spaces are constrained to hue-saturation-value (HSV) and hue-min-max-diff (HMMD). HSV is a well-known color space widely used in image applications. HMMD is a new color space defined by MPEG and is only used in the color structure descriptor (CSD).
4.1.2 Scalable Color Descriptor:
One of the most basic descriptions of color features is the color distribution in an image. If such a distribution is measured over an entire image, global color features can be described. Fig. 4.1 depicts examples of color images and their respective color distributions in a color histogram.
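To make the idea concrete, here is a minimal sketch (not the normative MPEG-7 extraction; the bin counts and the L1 distance are assumptions chosen for clarity) that builds a coarse global color histogram from a list of RGB pixels and compares two images by histogram distance:

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize each 8-bit RGB channel into a few bins and count pixels.
    The histogram is normalized, so image size does not matter."""
    hist = [0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel  # 64 intensity levels per bin for 4 bins
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = float(len(pixels))
    return [h / total for h in hist]

def l1_distance(h1, h2):
    """Smaller distance means more similar global color distribution."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Two mostly red "images" and one mostly blue one, as flat pixel lists
red_a = [(200, 10, 10)] * 90 + [(10, 10, 200)] * 10
red_b = [(210, 30, 20)] * 80 + [(10, 10, 200)] * 20
blue  = [(20, 10, 210)] * 95 + [(200, 10, 10)] * 5

d_red  = l1_distance(color_histogram(red_a), color_histogram(red_b))
d_blue = l1_distance(color_histogram(red_a), color_histogram(blue))
```

With this measure the two red images come out closer to each other than to the blue image, mirroring the similarity ranking suggested by Fig. 4.1.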
4.1.3 Dominant Color Descriptor:
This color descriptor aims to describe global as well as local spatial color distribution in images for high-speed retrieval and browsing. In contrast to the Color Histogram approach, this descriptor arrives at a much more compact representation at the expense of lower performance in some applications. Colors in a given region are clustered into a small number of representative colors.
4.1.4 Color Layout Descriptor:
This descriptor is designed to describe the spatial distribution of color in an arbitrarily shaped region. The color distribution in each region can be described using the Dominant Color Descriptor above. The spatial distribution of color is an effective description for sketch-based retrieval, content filtering using image indexing, and visualization.
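To illustrate the idea of a spatial color layout, the sketch below (a deliberate simplification, not the normative Color Layout Descriptor) partitions an image into a grid, stores the average color of each cell, and compares two layouts cell by cell, so that images with the same colors in different places are still told apart:

```python
def color_layout(image, grid=2):
    """Average color of each cell of a grid x grid partition (row-major).
    image: 2-D list of (r, g, b) tuples."""
    h, w = len(image), len(image[0])
    layout = []
    for gy in range(grid):
        for gx in range(grid):
            rs = gs = bs = n = 0
            for y in range(gy * h // grid, (gy + 1) * h // grid):
                for x in range(gx * w // grid, (gx + 1) * w // grid):
                    r, g, b = image[y][x]
                    rs += r; gs += g; bs += b; n += 1
            layout.append((rs // n, gs // n, bs // n))
    return layout

def layout_distance(la, lb):
    """Cell-by-cell L1 distance between two layouts."""
    return sum(abs(a - b) for ca, cb in zip(la, lb) for a, b in zip(ca, cb))

# 4x4 images: red over blue, and the vertically flipped version
red_over_blue = [[(255, 0, 0)] * 4] * 2 + [[(0, 0, 255)] * 4] * 2
blue_over_red = [[(0, 0, 255)] * 4] * 2 + [[(255, 0, 0)] * 4] * 2
```

A global histogram would call these two images identical; the layout distance separates them, which is exactly the property sketch-based retrieval relies on.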
4.2 Visual Texture Descriptor
Texture refers to visual patterns, homogeneous or not, that result from the presence of multiple colors or intensities in the image. It is a property of virtually any surface, including clouds, trees, bricks, hair, and fabric. It contains important structural information about surfaces and their relationship to the surrounding environment. Describing textures in images by appropriate texture descriptors provides a powerful means for similarity matching and retrieval.
Fig 4.2: Examples of grayscale images with different textures. Using the MPEG-7 visual texture descriptors, the two images on the bottom would be rated as similar in texture, and as less similar in texture to the two images on the top.
To illustrate texture properties, a collection of images with different textures is depicted in Fig. 4.2. MPEG-7 has defined appropriate texture descriptors that can be employed for a variety of applications and tasks.
4.2.1 Homogeneous Texture Descriptor
Fig 4.3: Frequency layout for MPEG-7 Homogeneous Texture Descriptor frequency extraction. Energy and energy-deviation values are extracted from this frequency division into 30 channels.
In order to describe the image texture, energy and energy-deviation values are extracted from a frequency layout. The descriptor is based on a filter-bank approach employing scale- and orientation-sensitive filters.
To arrive at a scale- and rotation-invariant description and matching of texture, the frequency space is partitioned into 30 channels, with equal division in the angular direction and octave division in the radial direction (see Fig. 4.3).
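The channel layout can be illustrated with a small sketch that maps a 2-D spatial frequency to one of the 30 channels (6 equal angular sectors times 5 radial octave bands). Folding the angle into a half plane and the exact band edges are illustrative assumptions, not the normative definition:

```python
import math

def channel_index(fx, fy, angular_divisions=6, radial_octaves=5, f_max=0.5):
    """Map a 2-D spatial frequency (cycles/pixel) to one of 30 channels:
    equal division in angle (over 180 degrees, since the spectrum of a real
    image is symmetric) and octave division in radius."""
    radius = math.hypot(fx, fy)
    theta = math.atan2(fy, fx) % math.pi  # fold into the symmetric half plane
    ang = min(int(theta / (math.pi / angular_divisions)), angular_divisions - 1)
    # Octave bands from the outside in: [f_max/2, f_max], [f_max/4, f_max/2], ...
    band = 0
    hi = f_max
    while radius < hi / 2 and band < radial_octaves - 1:
        hi /= 2
        band += 1
    return band * angular_divisions + ang
```

Energies summed per channel over the filtered spectrum would then form the 30-dimensional texture feature described above.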
4.2.2 Non-Homogeneous Texture Descriptor:
In order to also provide descriptions for nonhomogeneous texture images, MPEG-7 defined an Edge Histogram Descriptor. This descriptor captures spatial distribution of edges, somewhat in the same spirit as the Color Layout Descriptor.
The extraction of this descriptor involves division of the image into 16 nonoverlapping blocks of equal size. Edge information is then calculated for each block in five edge categories: vertical, horizontal, 45°, 135°, and nondirectional. It is expressed as a 5-bin histogram, one for each image block.
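The per-block computation can be sketched as follows; the 2x2 filter coefficients and the threshold here are simplified assumptions in the spirit of the Edge Histogram Descriptor, not its normative definition:

```python
import math

def edge_strengths(a, b, c, d):
    """Five filter responses for the 2x2 sub-block [a b; c d]:
    vertical, horizontal, 45-degree, 135-degree, nondirectional."""
    return [
        abs(a - b + c - d),          # vertical edge
        abs(a + b - c - d),          # horizontal edge
        math.sqrt(2) * abs(a - d),   # 45-degree edge
        math.sqrt(2) * abs(b - c),   # 135-degree edge
        2 * abs(a - b - c + d),      # nondirectional edge
    ]

def block_edge_histogram(block, threshold=10):
    """5-bin histogram of dominant edge types for one image block
    (block: 2-D list of grayscale values with even dimensions)."""
    hist = [0] * 5
    for y in range(0, len(block) - 1, 2):
        for x in range(0, len(block[0]) - 1, 2):
            s = edge_strengths(block[y][x], block[y][x + 1],
                               block[y + 1][x], block[y + 1][x + 1])
            m = max(s)
            if m >= threshold:  # weak responses count as "no edge"
                hist[s.index(m)] += 1
    return hist
```

Concatenating one such 5-bin histogram per block for all 16 blocks yields the 80-bin structure the descriptor is built around.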
4.3 Visual Shape Descriptor
In many image database applications, the shape of image objects provides a powerful visual clue for similarity matching. Typical examples of such applications include binary images of written characters and trademarks.
It is usually required that the shape descriptor is invariant to scaling, rotation, and translation. Shape information can be 2-D or 3-D in nature, depending on the application. In general, 2-D shape description can be divided into two categories, contour based and region-based.
4.3.1 Region-based Descriptor
The MPEG-7 Region-Based Descriptor, the Angular Radial Transformation (ART), belongs to the class of moment-invariant methods for shape description. This descriptor is suitable for shapes that are best described by shape regions rather than contours. The main idea behind moment invariants is to use region-based moments that are invariant to transformations as the shape feature.
The MPEG-7 ART descriptor employs a complex Angular Radial Transformation defined on a unit disk in polar coordinates to achieve this goal. Coefficients of the ART basis functions are quantized and used for matching. The descriptor is very compact and also very robust to segmentation noise. Examples of similarity matching between various shapes using the ART descriptor are shown in Fig. 4.4.
Fig 4.4: Examples of various shapes that can be indexed using the MPEG-7 Region-Based Shape Descriptor. Images contained in each of the sets (a)-(d) would be rated similar to one another and dissimilar to the ones in the remaining sets. For example, images in set (a) would be identified as similar to each other and dissimilar to the ones in sets (b), (c), or (d).
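The moment-invariant idea behind this class of methods can be illustrated with a toy sketch. It uses plain central moments rather than the actual ART transform; the normalization shown gives exact invariance to translation (for real binary regions, the area term in the denominator also normalizes for scale):

```python
def normalized_moment(points, p, q):
    """Central moment mu_pq of a point set, normalized by mu_00.
    Centering on the centroid makes the value invariant to translation;
    for a filled binary region, mu_00 equals the area and the exponent
    1 + (p+q)/2 additionally normalizes for uniform scaling."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu = sum((x - cx) ** p * (y - cy) ** q for x, y in points)
    mu00 = float(n)  # point count stands in for region area here
    return mu / mu00 ** (1 + (p + q) / 2.0)

shape = [(0, 0), (1, 0), (0, 1), (2, 1)]
shifted = [(x + 5, y + 3) for x, y in shape]
```

Because the moments are computed relative to the centroid, `shape` and `shifted` produce identical feature values, which is the property that makes such moments usable as shape features.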
4.3.2 Contour-Based Shape Descriptor
Objects for which shape features are best expressed by contour information can be described using the MPEG-7 Contour-Based Descriptor. This descriptor is based on curvature scale-space representations of contours.
Fig. 4.5. Examples of shapes that can be indexed using MPEG-7 Contour-Based Shape Descriptor
A CSS index is used for matching; it indicates the height of the most prominent peak and the horizontal and vertical positions of the remaining peaks in the so-called CSS image. The average size of the descriptor is 122 bits/contour. Fig. 4.5 depicts examples of shapes suited to the MPEG-7 Contour-Based Shape Descriptor. Shapes which have similar regions but different contour properties would be considered very different by the contour-based shape descriptor.
4.3.3 2D/3D Shape Descriptor
The shape of a 3-D object can be described approximately by a limited number of 2-D shapes which are taken as 2-D snapshots from different viewing angles.
The MPEG-7 2-D shape descriptors can be used to describe each of the 2-D shapes taken as snapshots from the 3-D object.
A similarity matching between 3-D objects thus involves matching multiple pairs of 2-D views taken one from each of the objects. In general, good performance for 3-D shapes has been demonstrated using the MPEG-7 2-D Contour-Based Descriptor.
4.4 Motion Descriptors
There are four motion Descriptors: camera motion, object motion trajectory, parametric object motion, and motion activity.
4.4.1 The Camera Motion Descriptor
It characterizes 3-D camera motion parameters. It supports the following basic camera operations: fixed, tracking (horizontal transverse movement, also called traveling in the film industry), booming (vertical transverse movement), horizontal rotation (panning), tilting (vertical rotation), rolling (rotation around the optical axis), and zooming (change of the focal length). The descriptor is based on time intervals characterized by their start time and duration.
The descriptor can describe a combination of different types of camera motion. The mixture mode captures global information about the camera motion parameters, disregarding detailed temporal information.
4.4.2 The Motion Trajectory Descriptor
It characterizes the temporal evolution of keypoints. It is composed of a list of keypoints along with a set of interpolating functions that describe the trajectory between keypoints. The velocity is implicitly known from the keypoint specification, and the acceleration between two keypoints can be estimated if a second-order interpolating function is used. The keypoints are specified by their time instant and their coordinates, depending on the application.
Fig: 4.6 Camera model for MPEG-7 Camera Motion Descriptor. Perspective projection to image plane p and camera motion parameters. The (virtual) camera is located in O.
The interpolating functions are defined independently for each of the components x, y, and z. The granularity of the descriptor is selected through the number of keypoints used for each time interval.
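The keypoint-plus-interpolation idea can be sketched as follows. For brevity the sketch uses first-order (linear) interpolation between keypoints, whereas a second-order function would add an acceleration term per segment:

```python
def interpolate(keypoints, t):
    """Piecewise-linear interpolation of a 2-D trajectory.
    keypoints: list of (time, x, y) sorted by time. Each coordinate is
    interpolated independently, as in the descriptor. A second-order
    scheme would add an a*(t - t0)**2 / 2 term per segment."""
    for (t0, x0, y0), (t1, x1, y1) in zip(keypoints, keypoints[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)  # normalized position in the segment
            return (x0 + u * (x1 - x0), y0 + u * (y1 - y0))
    raise ValueError("t lies outside the described trajectory")

# Object moves right, then up: three keypoints describe the whole path
kps = [(0, 0.0, 0.0), (2, 4.0, 2.0), (4, 4.0, 6.0)]
```

Storing only the keypoints and reconstructing intermediate positions on demand is what makes the descriptor compact while still supporting queries about where an object was at any time.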
4.4.3 The Parametric Motion Descriptor
Parametric motion models have been extensively used in various image processing and analysis applications. The Parametric Motion Descriptor defines the motion of regions in video sequences as a 2-D parametric model. In particular, the models include translation, rotation, scaling, and combinations of them; quadratic models make it possible to describe more complex movements.
The parametric model is associated with arbitrary regions over a specified time interval. The motion is captured in a compact manner as a reduced set of parameters.
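A minimal sketch of a 2-D affine parametric motion model, with parameter names chosen purely for illustration; the same six numbers can express translation, rotation, scaling, and their combinations:

```python
def affine_motion(params, x, y):
    """2-D affine parametric motion model: maps pixel (x, y) to its new
    position. params = (a1, ..., a6). Special cases: pure translation is
    (a1, 1, 0, a4, 0, 1); a quadratic model would add x*y, x**2, y**2
    terms to describe more complex movements."""
    a1, a2, a3, a4, a5, a6 = params
    return (a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y)

translate = (2, 1, 0, -1, 0, 1)   # shift every pixel by (+2, -1)
rotate90  = (0, 0, -1, 0, 1, 0)   # rotate 90 degrees about the origin
```

The whole motion of a region over a time interval is thus captured by six numbers, which is what makes the description so compact.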
4.4.4 The Motion Activity Descriptor
A human watching a video perceives it as being a "slow" sequence, a "fast-paced" sequence, an "action" sequence, etc. Examples of high activity include scenes such as "scoring in a basketball game" or "a high-speed car chase". On the other hand, shots such as "a news reader" or "an interview scene" are perceived as low-activity. The Motion Activity Descriptor is based on five main features: the intensity of the motion activity (a value between 1 and 5), the direction of the activity (optional), the spatial localization, and the spatial and temporal distribution of the activity.
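The intensity feature can be illustrated with a sketch that quantizes the spread of motion-vector magnitudes in a shot into a value from 1 to 5; the thresholds below are illustrative assumptions, not the normative MPEG-7 quantization:

```python
import math

def motion_activity_intensity(motion_vectors, thresholds=(1.0, 3.0, 6.0, 10.0)):
    """Map the standard deviation of motion-vector magnitudes of a shot
    to an intensity level from 1 (low activity) to 5 (high activity).
    motion_vectors: list of (dx, dy) displacements."""
    mags = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    mean = sum(mags) / len(mags)
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    level = 1
    for t in thresholds:  # each crossed threshold raises the level by one
        if std > t:
            level += 1
    return level

calm  = [(0, 0)] * 10 + [(1, 0)] * 2          # e.g. an interview shot
chase = [(10, 0), (-8, 5), (0, -12), (15, 2)] # e.g. a car-chase shot
```

The calm shot lands at the bottom of the scale while the chase scores higher, matching the intuition the paragraph above describes.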
4.5 Face Descriptor
The Face Descriptor can be used to retrieve face images that match a query face image. The descriptor is based on the classical eigenfaces approach. It represents the projection of a face region onto a set of 49 basis vectors that span the space of possible face vectors.
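The projection idea can be sketched as follows, using a toy basis of two vectors instead of the 49 used by the descriptor; the basis and face vectors here are made-up illustrations:

```python
def project(face, basis):
    """Project a face vector onto a set of basis vectors (dot products).
    The resulting coefficients form the compact face descriptor."""
    return [sum(f * b for f, b in zip(face, vec)) for vec in basis]

def descriptor_distance(c1, c2):
    """Euclidean distance between two coefficient vectors: small distance
    means the two face regions look alike in the basis space."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

# Toy 4-pixel "faces" and a 2-vector basis (49 vectors in the standard)
basis  = [[1, 0, 0, 0], [0, 1, 0, 0]]
face_a = [5, 2, 9, 9]
face_b = [5, 2, 1, 1]  # differs only in components the basis ignores
```

Matching a query face then reduces to comparing short coefficient vectors rather than full images.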
5. OTHER COMPONENTS of MPEG-7
5.1 MPEG-7 Audio
MPEG-7 Audio specifies a set of standardized audio descriptors. MPEG-7 Audio descriptors address four classes of audio signals: pure music, pure speech, pure sound effects, and arbitrary soundtracks. Audio descriptors may address audio features such as silence, spoken content, and sound effects.
Audio descriptors may build on other low-level constructs such as the scalable series and the audio description framework.
Examples of standardized descriptors for various audio features are as follows:
Silence descriptors, such as the silence type.
Spoken content descriptors, such as the spoken content speaker type.
Sound effects descriptors, such as the audio spectrum basis type and the sound effect feature type.
A number of description schemes, such as those for spoken content and sound effects, which make use of these descriptors, have also been defined.
5.2 MPEG-7 Multimedia Description Schemes:
MDS (Multimedia Description Schemes) specifies a high-level framework that allows generic description of all kinds of multimedia, including audio, visual, and textual data.
Fig. 5.1 shows an overview of the levels and the relationships between levels in the MDS hierarchy. The lowest level, called the basic elements, consists of data types, mathematical structures, linking and media localization tools, and elementary DSs.
Fig 5.1 Overview of MPEG-7 MDs
The next level, called content management and content description, builds on the lowest level. It describes the content from several viewpoints: creation and production, media, usage, structural aspects, and conceptual aspects.
The first three elements address primarily information related to the management of the content (content management), while the last two are devoted to the description of perceivable information (content description).
Beyond the direct description of the content provided by these five sets of elements, tools are also defined for navigation and access. Variation and decomposition elements allow adapting multimedia presentations to the capabilities of client terminals, network conditions, and user preferences.
Some tools are defined for describing user preferences and usage history, enhancing the user interaction experience. The last set of tools addresses the organization of content by collections and classifications and by the use of models.
5.3 MPEG-7 Reference Software
The MPEG-7 Reference Software aims to provide a reference implementation of the relevant parts of the MPEG-7 standard and is known as the eXperimentation Model (XM) software.
Some software for extracting descriptors is also included; the focus, however, is on creating bitstreams of descriptors and DSs with normative syntax rather than on the performance of the tools.
Currently it includes components in four categories: DDL parser and DDL validation parser, visual Descriptors, audio Descriptors, and multimedia DSs.
5.4 MPEG-7 Conformance
It aims to provide guidelines and procedures for testing the conformance of MPEG-7 implementations; this work has only recently been started.
5.5 MPEG-7 Systems
It specifies system level functionalities such as preparation of MPEG-7 descriptions for efficient transport/storage, synchronization of content and descriptions, and development of conformant decoders.
Fig. 5.2 shows a high-level architecture of a terminal that uses MPEG-7 descriptions, referred to as an MPEG-7 terminal. The MPEG-7 data is obtained from transport or storage and handed over to the delivery layer, which allows extraction of elementary streams by undoing the transport/storage-specific framing and multiplexing, and retains the timing information needed for synchronization.
Figure 5.2 MPEG-7 Terminal
The elementary streams, consisting of individually accessible chunks called access units, are forwarded to the compression layer, where the streams describing the structure of the MPEG-7 data as well as the streams describing the content are decoded.
5.6 MPEG-7 DDL (Description Definition Language)
It is a standardized language for defining new DSs and Descriptors, as well as extending or modifying existing DSs and Descriptors.
The MPEG-7 DDL is derived by extending XML Schema. While XML Schema has many of the capabilities needed by MPEG-7, it had to be extended to address other requirements specific to MPEG-7.
The resulting language satisfies the following requirements necessary for MPEG-7:
descriptor and description scheme declaration
abstract descriptors and description schemes
description scheme inclusion
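To illustrate how a DDL-defined description might be consumed, the sketch below parses a hypothetical MPEG-7-style XML fragment and filters images by a descriptor value. The element names are illustrative stand-ins, not the normative MPEG-7 schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical MPEG-7-style description fragment (illustrative element
# names, not the actual normative schema defined by the DDL)
description = """
<Mpeg7>
  <Image id="img1">
    <DominantColor r="200" g="30" b="30"/>
  </Image>
  <Image id="img2">
    <DominantColor r="20" g="40" b="210"/>
  </Image>
</Mpeg7>
"""

def images_with_reddish_dominant_color(xml_text):
    """Return ids of images whose dominant-color descriptor is reddish,
    i.e. the red channel dominates both green and blue."""
    root = ET.fromstring(xml_text)
    hits = []
    for img in root.findall("Image"):
        dc = img.find("DominantColor")
        r, g, b = (int(dc.get(c)) for c in "rgb")
        if r > g and r > b:  # crude filter applied to the descriptor alone
            hits.append(img.get("id"))
    return hits
```

This is the pattern the standard enables: applications read the standardized description, never the media itself, to identify or filter content.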
6. CONCLUSION
This report explained the MPEG-7 standard for visual content description. The MPEG-7 Visual standard specifies content-based descriptors that can be used to efficiently identify, filter, or browse images or video based on visual content rather than text. MPEG-7 descriptors are extracted from images or video sequences using suitable extraction methods and can be stored or transmitted entirely separately from the media content.