
Chapter I


1.1 What is meant by Multimedia Data?

A number of data types can be characterized as multimedia data types. These data types are the basic building blocks of multimedia environments, platforms and integration tools. The basic types are text, images, audio, video and graphic objects. Each is explained below.


Text

Text can be stored in a variety of forms. In addition to American Standard Code for Information Interchange (ASCII) based files, text is commonly stored in spreadsheets, annotations, word processor files, databases and common multimedia objects. Storing text is becoming more complex because of the wide availability of Graphical User Interfaces (GUIs) and text fonts, which permit effects such as text color and text shade.


Images

A digitized image is a sequence of pixels that represents a region of the user's graphical display. Still images vary enormously in quality and in the storage they require. The space overhead of a still image depends on its complexity, size and resolution, and on the compression format used to store it. Frequently used and widely accepted image formats (file extensions) include bmp, jpeg, tiff and png.


Audio

Audio, another frequently used data type, is relatively space intensive: a minute of sound can take up to 3 megabytes (MB) of space. Numerous methods can be used to compress audio into suitable formats.


Video

Digitized video is the data type that consumes the most space. Video is normally stored as a series of frames, and the size of each frame depends on its resolution. A single video frame can take up to 1 MB of space. Reasonable playback requires a sustained transfer rate together with proper transmission, compression, and decompression.

Graphic Objects

This data type consists of data structures that define 2D and 3D shapes, which in turn help to define multimedia objects. Today, different formats are used for image applications and video-editing applications. Examples of graphic objects include those produced by Computer Aided Design (CAD) and Computer Aided Manufacturing (CAM) applications.

1.2 How is Multimedia Data Different?

In theory, multimedia data can be treated like any regular data built from basic data types such as numbers, dates and characters. However, multimedia raises a few challenges, as described in [2]:

• Multimedia data is usually captured using various techniques, such as image processing, that may be unreliable. Multimedia processing therefore requires the ability to handle these different methods of capturing content, both automated and manual.

• In a multimedia database, user queries rarely return a textual answer. Rather, the answer to a user query is a compound multimedia presentation that the user can browse at leisure.

• The large size of multimedia data affects not only its storage and retrieval but also its transmission.

• The time taken to retrieve information may be critical when accessing video and audio databases, for example in Video on Demand.

• Automatic feature extraction and indexing: In conventional databases, the user explicitly submits the attribute values of the objects inserted into the database. In contrast, multimedia databases need advanced tools, such as image processing and pattern recognition tools for images, to extract the features and content of multimedia objects. Special data structures for storage and indexing are also needed because of the large size of the data.

1.3 Basic Approaches for Data Retrieval

Data management has a long history, and many approaches have been devised to manage and query various types of data in computer systems. The commonly used approaches comprise conventional database systems, information retrieval systems, content-based retrieval systems and graph/tree pattern matching. Each is described below:

Conventional database system

This is the most widely used approach to managing and querying structured data. Data in a database system must conform to predefined structures and constraints (schemas). To formulate a database query, the user must specify the data objects to be retrieved, the tables from which the data is to be extracted, and the predicate on which the retrieval is based. SQL, the query language used for such databases, has a restricted syntax and vocabulary.
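The three parts of such a query can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, columns and rows below are hypothetical and serve only to show the roles of SELECT (objects to retrieve), FROM (table) and WHERE (predicate).

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, title TEXT, width INTEGER, height INTEGER)"
)
conn.executemany(
    "INSERT INTO images (title, width, height) VALUES (?, ?, ?)",
    [("sunset", 800, 600), ("portrait", 600, 800), ("banner", 1200, 300)],
)

# SELECT names the data objects, FROM names the table, WHERE states the predicate.
rows = conn.execute(
    "SELECT title, width FROM images WHERE width >= ? ORDER BY title", (800,)
).fetchall()
print(rows)
```

Every row must fit the declared schema, which is what makes this approach suitable for structured data and awkward for free-form multimedia content.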

Information retrieval (IR) system

This kind of system is mainly used to search enormous text collections, in which the content of the data (text) is described by an indexer using keywords or a textual summary. Queries are expressed as keywords or in natural language. For instance, to search for an image or video, the user must describe it in words, and the system needs a means of storing a large amount of metadata in textual form.

Content based retrieval (CBR) system

This approach supports the retrieval of multimedia objects from an enormous collection. Retrieval is based on features such as color, texture and shape, which can be extracted automatically from the objects. Although a keyword can be regarded as a feature of textual data, conventional information retrieval performs better than content-based retrieval.

This is because keywords have a demonstrated ability to characterize semantics, whereas no other feature has shown a convincing ability to describe semantics. A key disadvantage of this approach is its lack of accuracy.

Graph or tree pattern matching

This approach retrieves object sub-graphs from an object graph according to designated patterns.

Chapter II

Data Structures for Multimedia Storage

Many modern database applications deal with large amounts of multidimensional data. Multimedia content-based retrieval is one of the examples. Access Methods are essential in order to deal with multidimensional data efficiently. They are used to access selective data from a big collection.

2.1 Importance of Access Methods

The key purpose of access methods is to support efficient spatial selection, such as range queries or nearest neighbour queries on spatial objects. Peter Van Oosterom [3] describes the significance of these access methods and how they take into account both clustering techniques and spatial indexing. In the absence of a spatial index, every object in the database must be checked to see whether it meets the selection criteria. Clustering is required to group together the objects that are often requested together. Otherwise, many different disk pages have to be fetched, resulting in a very slow response.

For spatial selection, clustering means that objects that are close together in reality are also stored close together in computer memory, instead of being scattered all over the memory.

In conventional database systems, sorting the data is the basis for efficient searching. Higher dimensional data cannot be sorted in an obvious manner, as is possible for text strings, numbers, or dates. In principle, computer memory is one-dimensional. Spatial data, however, is 2D, 3D or even higher dimensional and must somehow be organized in memory. An intuitive solution is to organize the data using a regular grid, just as on a paper map. Each grid cell has a unique name, e.g. 'A1', 'C2', or 'E5'. The cells are stored in some order in memory, and each can contain a fixed number of object references. A reference to an object is stored in a grid cell whenever the object overlaps the cell. However, this is not very efficient: because spatial data is irregularly distributed, many cells will be empty while many others will be overfull. Therefore, more advanced techniques have been developed.
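The regular grid described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `GridIndex`, the box format `(xmin, ymin, xmax, ymax)` and the use of unbounded per-cell lists (rather than the fixed number of references mentioned above) are choices made for this sketch only.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over 2D space; each cell lists the objects overlapping it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (col, row) -> list of object ids

    def _cells_for(self, xmin, ymin, xmax, ymax):
        # All grid cells touched by the bounding box.
        c0, c1 = int(xmin // self.cell_size), int(xmax // self.cell_size)
        r0, r1 = int(ymin // self.cell_size), int(ymax // self.cell_size)
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                yield (col, row)

    def insert(self, obj_id, box):
        # Store a reference in every cell the object overlaps.
        for cell in self._cells_for(*box):
            self.cells[cell].append(obj_id)

    def query(self, box):
        # Candidate objects: everything referenced from the cells the query touches.
        found = set()
        for cell in self._cells_for(*box):
            found.update(self.cells.get(cell, ()))
        return found

grid = GridIndex(cell_size=10)
grid.insert("road", (2, 3, 27, 4))      # long, thin object spanning three cells
grid.insert("lake", (41, 41, 48, 47))   # small object inside one cell
print(grid.query((20, 0, 29, 9)))
```

The query returns candidates only; an exact geometric test would still be needed afterwards. With skewed data most cells stay empty while a few become crowded, which is the weakness noted above.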

2.2 kd Trees

A kd-tree, or k-dimensional tree, is a space-partitioning data structure for organizing points in a k-dimensional space. kd-trees are useful for several applications, such as searches involving a multidimensional search key, for example range searches and nearest neighbour searches. kd-trees are a special case of Binary Space Partitioning (BSP) trees.

A kd-tree uses only splitting planes that are perpendicular to one of the coordinate axes. This differs from BSP trees, in which arbitrary splitting planes can be used. In addition, every node of a kd-tree, from the root to the leaves, stores a point, whereas in BSP trees the leaves are typically the only nodes that contain points. As a consequence, each splitting plane must go through one of the points in the kd-tree. [4]

2.2.1 Addition of elements to kd trees

A new point is added to a kd tree in the same way as one adds an element to any other tree. At first, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the left or right side of the splitting plane. Once you get to a leaf node, add the new point as either the left or right child of the leaf node, again depending on which side of the node’s splitting plane contains the new point.
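The insertion procedure can be sketched as below. Two details are assumptions of this sketch rather than statements from the text: the splitting axis cycles with the depth of the node (a common convention), and points that tie on the splitting coordinate go to the right child.

```python
class KDNode:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def kd_insert(root, point, depth=0):
    """Insert point and return the (possibly new) root of the subtree."""
    if root is None:
        return KDNode(point)
    axis = depth % len(point)            # assumed: axis cycles x, y, x, y, ...
    if point[axis] < root.point[axis]:
        root.left = kd_insert(root.left, point, depth + 1)
    else:                                # assumed: ties go to the right
        root.right = kd_insert(root.right, point, depth + 1)
    return root

root = None
for p in [(7, 2), (5, 4), (9, 6), (2, 3), (4, 7), (8, 1)]:
    root = kd_insert(root, p)
print(root.left.point, root.right.left.point)
```

Because nothing rebalances the tree, the resulting shape depends on the insertion order.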

2.2.2 Deleting from kd trees

Deletion is similar to deletion in a Binary Search Tree (BST), but slightly harder.

Step 1: Find the node to be deleted.

Step 2: Two cases must be handled:

(a) No children - replace the pointer to the node by NULL.

(b) Has children - replace the node by the minimum node in its right subtree, where the minimum is taken with respect to the splitting dimension of the node being deleted. If no right subtree exists, first move the left subtree to become the right subtree. [1]

2.3 Quad-trees

Each node of a quad-tree is associated with a rectangular region of space. The top node is associated with the entire target space. Each non-leaf node divides its region into four equal-sized quadrants and accordingly has four child nodes, one per quadrant, each of which may be divided again in the same way. Leaf nodes hold between zero and some fixed maximum number of points.

2.3.1 Simple definition of node structure of a point quad-tree

qtnodetype = record
    INFO: infotype;
    XVAL: real;
    YVAL: real;
    NW, SW, NE, SE: *qtnodetype
end


Here, INFO is some additional information regarding that point.

XVAL and YVAL are the coordinates of that point.

NW, SW, NE and SE are pointers to the child nodes for the four regions obtained by dividing the given region. [1]
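A Python counterpart of this record, together with a simple insertion routine, might look like the sketch below. The insertion routine and the tie-breaking rule (points on a dividing line go to the east or north side, with y increasing upward) are assumptions added for illustration and are not taken from [1].

```python
class QTNode:
    """Python counterpart of the qtnodetype record above."""
    def __init__(self, xval, yval, info=None):
        self.info = info
        self.xval, self.yval = xval, yval
        self.nw = self.sw = self.ne = self.se = None

def quadrant(node, x, y):
    """Name of the child quadrant of node that contains (x, y)."""
    if x < node.xval:
        return "nw" if y >= node.yval else "sw"
    return "ne" if y >= node.yval else "se"

def qt_insert(root, x, y, info=None):
    """Insert a point and return the root of the tree."""
    if root is None:
        return QTNode(x, y, info)
    node = root
    while True:
        q = quadrant(node, x, y)
        child = getattr(node, q)
        if child is None:                 # free slot found: attach the new point
            setattr(node, q, QTNode(x, y, info))
            return root
        node = child                      # otherwise descend into that quadrant

root = qt_insert(None, 50, 50, "A")
qt_insert(root, 20, 80, "B")
qt_insert(root, 70, 10, "C")
qt_insert(root, 10, 90, "D")
print(root.nw.info, root.se.info, root.nw.nw.info)
```

Each step compares both coordinates of the new point with the point stored in the current node, which is why a search or insertion needs two comparisons per level.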

2.3.2 Common uses of Quad-trees

1. Image Representation

2. Spatial Indexing

3. Efficient collision detection in two dimensions

4. Storing sparse data, such as formatting information for a spreadsheet or for some matrix calculations.

2.3.3 Representing Image Using Quad-tree: [7]

Suppose we divide the picture area into 4 sections. Those 4 sections are then further divided into 4 subsections each. We continue this process, repeatedly dividing a square region into 4. We must impose a limit on the number of levels of division; otherwise we could go on dividing the picture forever. Generally, this limit is imposed by storage considerations, by the need to limit processing time, or by the resolution of the output device. A pixel is the smallest subsection of the quad-tree.

To summarize, a square or quadrant in the picture is either:

1. entirely one color

2. composed of 4 smaller sub-squares

To represent a picture using a quad-tree, each leaf must represent a uniform area of the picture. If the picture is black and white, we need only one bit to represent the colour in each leaf; for example, 0 could mean black and 1 could mean white. A picture is defined here as a two-dimensional array whose elements are colored points. Now consider the following image:

Figure 2.3: First three levels of quad-tree

Figure 2.4: Given Image

This is how the above image could be stored in a quad-tree.

Figure 2.5: 8x8 pixel picture represented in a quad-tree

Figure 2.6: The quad tree of the above example picture. The quadrants are shown in counterclockwise order from the top-right quadrant. The root is the top node. (The 2nd and 3rd quadrants are not shown.)
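The recursive subdivision can be sketched as follows. The 4x4 bitmap is a made-up example (not the image from the figures), and the child order NW, NE, SW, SE is an arbitrary choice for this sketch that differs from the counterclockwise order used in Figure 2.6.

```python
def build_quadtree(img, x=0, y=0, size=None):
    """Return 0 or 1 for a uniform square, else a list of four subtrees.

    img is a square list of rows of 0/1 pixels whose side is a power of two.
    Children are ordered NW, NE, SW, SE (assumed order for this sketch).
    """
    if size is None:
        size = len(img)
    first = img[y][x]
    if all(img[r][c] == first
           for r in range(y, y + size) for c in range(x, x + size)):
        return first                       # leaf: entirely one colour
    half = size // 2                       # otherwise split into 4 sub-squares
    return [build_quadtree(img, x, y, half),
            build_quadtree(img, x + half, y, half),
            build_quadtree(img, x, y + half, half),
            build_quadtree(img, x + half, y + half, half)]

def count_leaves(tree):
    return 1 if not isinstance(tree, list) else sum(count_leaves(t) for t in tree)

image = [[1, 1, 0, 0],
         [1, 1, 0, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
tree = build_quadtree(image)
print(tree, count_leaves(tree))
```

Here three of the four top-level quadrants are uniform and become single leaves, so the 16 pixels are represented by 7 leaves. The recursion always stops, because a 1x1 square is uniform by definition.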

2.3.4 Advantages of Quad-trees:

1. They can be manipulated and accessed much quicker than other models.

2. Erasing an image takes only one step. All that is required is to set the root node to neutral.

3. Zooming to a particular quadrant in the tree is also a one step operation.

4. To reduce the complexity of the image, it suffices to remove the final level of nodes.

5. Accessing particular regions of the image is a very fast operation. This is useful for updating certain regions of an image, perhaps for an environment with multiple windows.

The main disadvantage of quad-trees is that they take up a lot of space.

2.4 R-trees

R-trees are an N-dimensional extension of B-trees, used as spatial access methods, i.e., for indexing multi-dimensional information. They are supported in many modern database systems, along with variants such as R+-trees and R*-trees. The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles. [4]

A rectangular bounding box is associated with each tree node. [5]

• The bounding box of a leaf node is the minimum-sized rectangle that contains all the rectangles/polygons associated with the leaf node.

• The bounding box associated with a non-leaf node contains the bounding boxes associated with all its children.

• The bounding box of a node serves as its key in its parent node (if any).

• The bounding boxes of the children of a node are allowed to overlap.

2.4.1 Structure of an R-tree node

rtnodetype = record
    Rec1, ..., Reck : rectangle
    P1, ..., Pk : *rtnodetype
end


A polygon is stored in one node, and the bounding box of the node must contain the polygon. Since a polygon is stored only once, the storage efficiency of R-trees is better than that of k-d trees or quad-trees.

The insertion and deletion algorithms use the bounding boxes of the nodes to ensure that nearby elements are placed in the same leaf node. Each entry within a leaf node stores two pieces of information: a way of identifying the actual data element, and the bounding box of the data element.

2.4.2 Inserting a node

1. Find a leaf to store it, and add it to the leaf.

• To find the leaf, follow a child (if any) whose bounding box contains the bounding box of the data item; otherwise follow the child whose overlap with the bounding box of the data item is maximum.

2. Handle overflows by splits. We may need to divide entries of an overfull node into two sets such that the bounding boxes have minimum total area.

2.4.3 Deleting a node

1. Find the leaf and delete object; determine new MBR.

2. If the node is too empty:

• Delete the node recursively at its parent

• Insert all entries of the deleted node into the R-tree

2.4.4 Searching R-trees

Searching also relies on bounding boxes: the minimal bounding rectangle of each child is compared with the query to decide whether or not to search inside that child node. In this way, most of the nodes in the tree are never touched during a search.

1. If the node is a leaf node, output the data items whose keys intersect the given query point/region

2. Else, for each child of the current node whose bounding box overlaps the query point/region, recursively search the child.
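The two search steps can be sketched as below. The node layout (`RTNode` with a flag and a list of `(box, item)` entries), the box format `(xmin, ymin, xmax, ymax)` and the hand-built two-leaf tree are assumptions made for this illustration; the optional `visited` list exists only to show which nodes were touched.

```python
class RTNode:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        # leaf: list of (box, data_item); non-leaf: list of (box, child RTNode)
        self.entries = entries

def overlaps(a, b):
    """Boxes are (xmin, ymin, xmax, ymax); touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def rtree_search(node, query, visited=None):
    if visited is not None:
        visited.append(node)               # record touched nodes (illustration only)
    results = []
    for box, item in node.entries:
        if not overlaps(box, query):
            continue                       # prune: this entry cannot match
        if node.is_leaf:
            results.append(item)           # step 1: output matching data items
        else:
            results.extend(rtree_search(item, query, visited))  # step 2: recurse
    return results

leaf_a = RTNode(True, [((0, 0, 2, 2), "a1"), ((1, 1, 4, 3), "a2")])
leaf_b = RTNode(True, [((10, 10, 12, 12), "b1"), ((11, 8, 14, 11), "b2")])
root = RTNode(False, [((0, 0, 4, 3), leaf_a), ((10, 8, 14, 12), leaf_b)])

visited = []
hits = rtree_search(root, (3, 2, 5, 5), visited)
print(hits, len(visited))
```

In this example the second leaf is never visited because its bounding box does not overlap the query. If the bounding boxes of sibling nodes overlapped in the query region, both subtrees would have to be searched.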

2.5 Comparison of Different Data Structures [1]

• k-d trees are very easy to implement. However, in general a k-d tree containing k nodes may have a height of k, which makes both insertion and search costly. In practice, path lengths (root to leaf) in k-d trees tend to be longer than those in point quad-trees because k-d trees are binary.

• R-trees can store a large number of rectangles in each node. This makes them appropriate for disk-based storage: the height of the tree is reduced, leading to fewer disk accesses.

• The disadvantage of R-trees is that the bounding rectangles associated with different nodes may overlap. Thus, when searching an R-tree, instead of following one path (as in the case of a quad-tree), we might follow multiple paths down the tree. This difference grows even more acute when range searches and neighbour searches are considered.

• In point quad-trees, each step of a search or insertion requires comparisons on two coordinates. Deletion in point quad-trees is difficult because finding a candidate replacement node for the node being deleted is not easy.

Chapter III


Metadata is data about data. Any data that is used to describe the content, condition, quality and other aspects of data for humans or machines to locate, access and understand the data is known as Metadata. Metadata helps the users to get an overview of the data.

3.1 Need of Metadata

The main functions of metadata can be listed as follows: [8]


• To describe and identify data sources. These descriptions help to create catalogs, indexes, etc., thereby improving access to the data sources.

• To support the formulation of queries.

• To provide information that helps to manage and administer a data source, such as when and how it was created and who can legally access it.

• To facilitate data archival and preservation, such as data refreshing and migration.

• To indicate how a system functions or how metadata behaves, such as data formats, compression ratios, scaling routines, encryption keys, and security.

• To indicate the level and type of use of data sources, such as multiversioning and user tracking.

3.2 Metadata in the Life Cycle of Multimedia Objects

A multimedia object undergoes a life cycle consisting of production, organization, searching, utilization, preservation, and disposition. Metadata passes through similar stages as an integral part of these multimedia objects [8]:


Production

Objects of different media types are created, often generating data about how they were produced (e.g., the EXIF data produced by digital cameras), and are stored in an information retrieval system. Associated metadata is generated accordingly for administering and describing the objects.


Organization

Multimedia objects may be composed of several components. Metadata is created to specify how these compound objects are put together.

Searching and retrieval

Created and stored multimedia objects are subject to search and retrieval by users. Metadata supports this through catalogs and indexes, which enable efficient query formulation and resource localization.


Utilization

Retrieved multimedia objects can be further utilized, reproduced, and modified. Metadata related to digital rights management, version control, etc. may be created.

Preservation and disposition

Multimedia objects may undergo modification, refreshing, and migration to ensure their availability. Objects that are out-of-date or corrupted may be discarded. Such preservation and disposition activities can be documented by the associated metadata.

3.3 Classification of Metadata

Metadata directly affects the way in which objects of different media types are used. Classifying metadata can facilitate the handling of different media types in a multimedia information retrieval system. Based on its (in)dependence on media contents, metadata can be classified into two kinds, namely content-independent and content-dependent metadata [8]:

• Content-independent metadata provides information that is derived independently of the content of the original data. Examples of content-independent metadata are the date of creation and location of a text document, the type of camera used to record a video fragment, and so on. These metadata are called descriptive data.

• Content-dependent metadata depends on the content of the original data. A special case of content-dependent metadata is content-dependent descriptive metadata, which cannot be extracted automatically from the content but is created manually: annotation is a well-known example. In contrast, content-dependent non-descriptive metadata is based directly on the contents of the data.

3.4 Image metadata

Image file formats that can carry metadata include Exchangeable image file format (EXIF) and Tagged Image File Format (TIFF).

Having metadata about images embedded in TIFF or EXIF files is one way of acquiring additional data about an image. Image metadata are attained through tags. Tagging pictures with subjects, related emotions, and other descriptive phrases helps Internet users find pictures easily rather than having to search through entire image collections.

A prime example of an image tagging service is Flickr, where users upload images and then describe the contents. Other patrons of the site can then search for those tags. Flickr uses a folksonomy: a free-text keyword system in which the community defines the vocabulary through use rather than through a controlled vocabulary.

Digital photography is increasingly making use of metadata tags. Photographers shooting Camera RAW file formats can use applications such as Adobe Bridge or Apple Computer's Aperture to work with camera metadata for post-processing. Users can also tag photos for organization purposes using Adobe's Extensible Metadata Platform (XMP) language, for example. [4]

3.5 Document metadata

Most programs that create documents, including Microsoft PowerPoint, Microsoft Word and other Microsoft Office products, save metadata with the document files. These metadata can contain the name of the person who created the file, the name of the person who last edited the file, how many times the file has been printed, and even how many revisions have been made to the file. Other saved material, such as document comments, is also referred to as metadata.

Document metadata is particularly important in legal environments, where litigation can request this sensitive information, which may include many elements of private, detrimental data. Such data has been linked to multiple lawsuits that have caused legal complications for corporations. [4]

3.6 Digital library metadata

There are three variants of metadata that are commonly used to describe objects in a digital library:

  • descriptive - Information describing the intellectual content of the object, such as cataloguing records, finding aids or similar schemes. It is typically used for bibliographic purposes and for search and retrieval.
  • structural - Information that ties each object to others to make up logical units e.g., information that relates individual images of pages from a book to the others that make up the book.
  • administrative - Information used to manage the object or control access to it. This may include information on how it was scanned, its storage format, copyright and licensing information, and information necessary for the long-term preservation of the digital objects. [4]

Chapter IV

Text Databases

Basic text comprises alphanumeric characters. Optical character recognition (OCR) techniques are used to convert analog text into digital text. The most common digital representation of characters is the ASCII code, which requires seven bits per character (eight bits may be used, with the eighth bit reserved for a special purpose). The storage space required for a text document, in bytes, is therefore equal to its number of characters. For instance, a 15-page text document with about 4000 characters per page consumes about 60 kilobytes.

Nowadays, structured text documents have become extremely popular. They comprise titles, chapters, sections, paragraphs, and so forth. A title can be presented to the user in a different format from a paragraph or a sentence. Standards such as HTML (HyperText Markup Language) and XML (Extensible Markup Language) are used to encode structured information.

There are different approaches like Huffman and Arithmetic Coding, which can be used for text compression, but as the storage requirements are not too high, these approaches are not as important for text as they are for multimedia data. [10]

4.1 Text Documents

A text document has an identification and is treated as a list of words. A book is considered to be a document, and so is a paper in the proceedings of a conference or a Web page. The identification may be an ISBN for a book, the title of the paper together with the ISBN of the conference proceedings for a paper, or a URL for a Web page.

Retrieval of text documents does not normally entail presenting the entire document, as this would consume a large amount of space as well as time. Instead, the system presents the identifications of the selected documents, usually along with a brief description and/or a ranking of each document.

4.2 Indexing

Indexing refers to the derivation of metadata from documents and its storage in an index. In a way, the index describes the content of the documents. For text documents, the content can be described by terms such as social or political. The system uses the index to determine the output during retrieval.

The index can be filled in two ways: manually or automatically. Professional users such as librarians can add assigned terms to documents as a kind of annotation. These terms are often selected from a prescribed set of terms, the catalog. A catalog describes a certain scientific field and is composed by specialists. One of the main advantages of this technique is that professional users know which terms can be used in query formulation. A major drawback is the amount of work that manual indexing requires.

Document content can also be described automatically, resulting in what are termed derived terms. One of the steps involved is to identify the words in English text with an algorithm and convert them to lower case. Other steps use basic tools such as stop word removal and stemming. Stop words are words that carry little meaning, such as the and it; they are removed from the document. Stemming conflates the words in the document to their stem. For example, a stemmer can conflate the words computer, compute and computation to the stem comput.
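The derivation of terms can be sketched as below. The stop word list is a tiny illustrative sample, and `naive_stem` is a deliberately crude suffix stripper written for this sketch; it is not the Porter stemmer or any other published stemming algorithm.

```python
import re

STOP_WORDS = {"the", "it", "a", "an", "of", "and", "is", "to", "in"}  # tiny sample list
SUFFIXES = ("ation", "er", "ing", "es", "e", "s")   # naive rules, NOT the Porter rules

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def derive_terms(text):
    words = re.findall(r"[a-z]+", text.lower())      # identify words, lower-case them
    return [naive_stem(w) for w in words if w not in STOP_WORDS]

print(derive_terms("The computer and the computation of it compute."))
```

All three content words reduce to the stem comput, so a query containing any one of them can match documents containing the others.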

4.3 Query Formulation

Query formulation refers to the method of representing the information need. The resulting formal representation of the information need is the query. In a wider sense, query formulation denotes the complete interactive dialogue between the system and the user, which leads both to a suitable query and to a better understanding by the user of the information need. More narrowly, it denotes the formulation of the initial query, when there are no previously retrieved documents to guide the search.

It is essential to distinguish between the expert searcher and the casual end user. The expert searcher knows the document collection and the assigned terms. He or she uses Boolean operators to create the query and is able to rephrase it adequately in response to the output of the system. If the result is too small, the expert searcher must expand the query; if the result is too large, he or she must make the query more restrictive.

The end user prefers to communicate the information need to the system in natural language. Such a statement of the information need is termed a request. Automatic query formulation consists of receiving the request and generating an initial query by applying the same algorithms that were used for the derivation of terms. In general, the query consists of a list of query terms. The system accepts this list and composes a result set. The system can then formulate a successive query based on relevance feedback from the user.

4.4 Matching

The matching algorithm is arguably the most important part of an information retrieval system. This algorithm compares the query against the document representations in the index. In exact matching, a Boolean query formulated by an expert searcher defines precisely the set of documents that satisfy the query. The system generates a yes or no decision for each document.
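Exact matching against an index can be sketched with Python sets. The three documents and the inverted index layout (term to set of document identifiers) are hypothetical and chosen only to show how Boolean operators map to set operations.

```python
from collections import defaultdict

# Hypothetical mini-collection: document id -> text.
docs = {
    1: "social policy and political reform",
    2: "political history of europe",
    3: "social networks and media",
}

# Inverted index: term -> set of ids of the documents containing the term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

all_ids = set(docs)
both = index["social"] & index["political"]      # social AND political
either = index["social"] | index["political"]    # social OR political
not_social = all_ids - index["social"]           # NOT social
print(both, either, not_social)
```

Each document is either in the result set or not, which is the yes or no decision described above; no ordering among the matching documents is produced.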

With an inexact matching algorithm, the system delivers a ranked list of documents. Users can traverse this list to search for the information they need. Ranked retrieval puts the relevant documents at the top of the ranked list, thus saving the time the user has to spend reading documents. Simple but effective ranking algorithms make use of the frequency distribution of terms over documents. Ranking algorithms based on statistical approaches can halve the time the user has to spend reading documents.
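One common way to use term frequencies for ranking is tf-idf weighting, sketched below. The choice of tf-idf, the three sample documents and the function names are assumptions of this sketch; the text above does not name a specific ranking formula.

```python
import math
from collections import Counter

# Hypothetical mini-collection.
docs = {
    "d1": "image retrieval with color and texture features",
    "d2": "text retrieval and text indexing for text documents",
    "d3": "video compression",
}
tokens = {d: text.split() for d, text in docs.items()}
# Document frequency: in how many documents does each term occur?
doc_freq = Counter(t for words in tokens.values() for t in set(words))

def score(query, words):
    """Sum of tf * idf over the query terms that occur in the document."""
    tf = Counter(words)
    n = len(tokens)
    return sum(tf[t] * math.log(n / doc_freq[t]) for t in query.split() if t in tf)

def rank(query):
    scored = [(score(query, words), d) for d, words in tokens.items()]
    return [d for s, d in sorted(scored, reverse=True) if s > 0]

print(rank("text retrieval"))
```

Document d2 is ranked first because it contains the rarer term text several times, while d1 matches only the more widespread term retrieval.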

Chapter V

Image Databases

A digital image is an electronic snapshot taken of a scene or scanned from documents such as printed texts, photographs, manuscripts, and artworks.

A digital image is sampled and mapped as a grid of dots, known as pixels or picture elements. Each pixel is assigned a tonal value, which can be black, white, a shade of gray or a color, and which is represented in binary code (zeros and ones). The computer stores the binary digits (bits) for each pixel in a sequence, and they are often reduced to a more compact mathematical representation by compression. The bits are then interpreted and read by the computer to generate an analog output for display or printing.

Figure 5.1: As shown in this bitonal image, each pixel is assigned a tonal value, in this example 0 for black and 1 for white.

A grayscale pixel is typically stored in one byte, i.e. eight bits. A color pixel needs three color components of one byte each: red, green and blue. For a rectangular screen, the amount of data required for the image can therefore be computed using the formula:

A = xyb

Where A is the number of bytes needed,

x is the number of pixels per horizontal line,

y is the number of horizontal lines, and

b is the number of bytes per pixel.

Using this formula for a screen with x = 800, y = 600 and b = 3 gives A = 800 × 600 × 3 = 1,440,000 bytes, i.e. A = 1.44 Mbyte.
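The same arithmetic can be checked with a few lines of Python. The function name `image_bytes` is made up for this sketch, and 1 Mbyte is taken to mean 1,000,000 bytes, which is the convention that yields the figure of 1.44 above.

```python
def image_bytes(x, y, b):
    """A = x * y * b: pixels per line, number of lines, bytes per pixel."""
    return x * y * b

color = image_bytes(800, 600, 3)   # 24-bit colour: 3 bytes per pixel
gray = image_bytes(800, 600, 1)    # 8-bit grayscale: 1 byte per pixel
print(color, color / 1_000_000, gray)
```

The grayscale version of the same screen needs one third of the space, 480,000 bytes.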

Compression is required for this significant amount of data. Image compression is based on exploiting redundancy in images and the properties of human perception. Pixels in a given area tend to be similar; this similarity is called spatial redundancy. Human perception of images tolerates some error or loss of information, which means that the compressed image does not need to represent the original image exactly. A compressed image with some error may still allow effective communication. [8]

5.1 Image Compression Algorithms [14]

Lossless and lossy are the two major types of image compression algorithm in use.

Lossless compression algorithms reduce the size of a file with no loss of image quality. However, they usually do not compress an image into as small a file as a lossy method does. Lossless algorithms are used when the quality of an image matters more than its size.

Lossy compression algorithms, on the other hand, take advantage of the natural limitations of the human eye and discard information that cannot be seen. Most lossy compression algorithms allow variable levels of quality: as the level of compression increases, the file size is reduced. At the highest levels of compression, the deterioration in image quality becomes quite noticeable. This deterioration is known as compression artifacting.

Listed below are some of the most commonly used compression algorithms for image data:

5.1.1 Run Length Encoding (RLE)

RLE is the simplest of the compression techniques in use. RLE algorithms are lossless and generally work by searching for runs of bits, bytes, or pixels of the same value and encoding the length and the value of each run. RLE achieves its best results with images that include large areas of contiguous colour, particularly monochrome images. For complex color images such as photographs, RLE algorithms do not compress well, and in some cases RLE can even increase the size of the image file.

For instance, consider a screen consisting of plain black text on a solid white background. Its representation will contain several long runs of white pixels in the blank space, and several short runs of black pixels within the text. To elaborate, take a hypothetical single scan line, where B represents a black pixel and W represents a white pixel:

WWWWWWWWWWWWBWWWWWWWWWWWWBBB...

If one applies the RLE data compression algorithm to this hypothetical scan line, the result will be as follows:

12W1B12W3B...

Interpret this as twelve W's, one B, twelve W's, three B's, etc.
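
The run-length idea described above can be sketched in a few lines of Python. This is a minimal illustration rather than the encoder used by any particular file format; the function names `rle_encode` and `rle_decode` are hypothetical, and the sample scan line simply follows the run lengths quoted above (twelve W's, one B, twelve W's, three B's).

```python
from itertools import groupby


def rle_encode(line: str) -> str:
    """Encode a string as a sequence of <run length><symbol> pairs."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(line))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode; assumes the symbols themselves are not digits."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # accumulate multi-digit run lengths
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


scan_line = "W" * 12 + "B" + "W" * 12 + "BBB"
encoded = rle_encode(scan_line)
print(encoded)                          # 12W1B12W3B
print(rle_decode(encoded) == scan_line)  # True

# Data without long runs grows instead of shrinking.
print(rle_encode("WBWB"))               # 1W1B1W1B
```

The last line illustrates why RLE can increase the file size for complex images: when runs are short, each pixel costs a count as well as a value.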

There are a number of RLE variants commonly used which are encountered in the Tagged Image File Format (TIFF), PC Paintbrush Exchange (PCX) and Bitmap (BMP) graphic formats.

5.1.2 Lempel-Ziv-Welch (LZW)

Terry Welch developed the LZW compression algorithm in 1984 as a modification to the LZ78 compressor. It is a lossless technique that can be applied to any data type, but is most commonly used for image compression. LZW compression is useful for images that consist of color depths from 1-bit (monochrome) to 24-bit (True Colour).

LZW compression is used in various common graphics file formats including Tagged Image File Format (TIFF) and Graphics Interchange Format (GIF).

5.1.3 Huffman Encoding

David Huffman developed Huffman encoding in 1952. It is one of the oldest and most recognized compression algorithms. It is a lossless algorithm and is used to provide a final compression stage in many modern compression schemes, such as JPEG.

Huffman coding provides a useful way to compress data by determining the frequency of occurrence for each character. The idea behind the method is to assign bit codes of varying lengths to characters where more common characters receive a short code and less common characters receive a longer one. It is best used on images which have large amounts of data repetition. [15]
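
As a rough sketch of the idea, the following Python code builds a Huffman code table from symbol frequencies using a priority queue. The function name `huffman_codes` and the sample string are assumptions made for illustration; real image codecs add details such as code-length limits and table serialization that are omitted here.

```python
import heapq
from collections import Counter


def huffman_codes(data: str) -> dict[str, str]:
    """Return a prefix-free bit code per symbol; frequent symbols get shorter codes."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "aaaaaaaabbbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(encoded), "bits instead of", 8 * len(text))
```

In this sample the most common character receives a one-bit code and the two rarer characters receive two-bit codes, so the twelve characters fit into 16 bits.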

5.1.4 JPEG

The JPEG compression algorithm was introduced for the compression and transmission of color and grayscale images. It was developed in 1990 by the Joint Photographic Experts Group of the International Organization for Standardization (ISO) and the International Telegraph and Telephone Consultative Committee (CCITT). JPEG is a lossy technique that provides its best compression rates with complex 24-bit (True Colour) images. It functions by discarding image data that is unnoticeable to the human eye, using the Discrete Cosine Transform (DCT). It then applies Huffman encoding to achieve further compression.

JPEG compression is used in the JPEG File Interchange Format (JFIF), Still Picture Interchange File Format (SPIFF) and TIFF.

5.1.5 Fractal Compression

Fractal compression uses the mathematical principles of fractal geometry to identify redundant repeating patterns within images. These matching patterns may be identified through performing geometrical transformations, such as scaling and rotating, on elements of the image. Once identified, a repeating pattern need only be stored once, together with the information on its locations within the image and the required transformations in each case.

Fractal compression is extremely computationally intensive, although decompression is much faster. It is a lossy technique, which can achieve large compression rates. Unlike other lossy methods, higher compression does not result in pixelation of the image and, although information is still lost, this tends to be less noticeable. Fractal compression works best with complex images and high colour depths.

5.2 Common File Types [11],[19],[4]

  • JPEG (Joint Photographic Experts Group) files are a lossy format. The DOS filename extension is JPG, although other operating systems may use JPEG. Nearly all digital cameras have the option to save images in JPEG format. The JPEG format supports 8 bits per color – red, green, and blue, for 24-bit total – and produces relatively small file sizes.
  • TIFF (Tagged Image File Format) is a flexible image format that normally saves 8 or 16 bits per color – red, green and blue – for a total of 24 or 48 bits, and uses a filename extension of TIFF or TIF. TIFF can be lossy or lossless.
  • RAW refers to a family of raw image formats that are options available on some digital cameras. These formats usually use a lossless or nearly lossless compression, and produce file sizes much smaller than the TIFF formats of full-size processed images from the same cameras.
  • PNG (Portable Network Graphics) was created as the free, open-source successor to the GIF file format. The PNG file format supports true color (16 million colors), whereas the GIF file format only allows 256 colors.
  • GIF (Graphics Interchange Format) is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors such as simple diagrams, shapes, logos and cartoon style images. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed images or dithered images.
  • BMP file format (Windows bitmap) is used internally in the Microsoft Windows operating system to handle graphics images. These files are typically not compressed, resulting in large files. The main advantage of BMP files is their wide acceptance, simplicity, and use in Windows programs.

5.3 Advantages of Digital Images

There are a number of advantages of storing two-dimensional materials in digital formats. [13]

  • Digital images do not deteriorate physically over time whereas the originals can deteriorate.
  • Digital images allow identical reproduction quality from copy to copy.
  • Digital images may be manipulated far more easily than by photographic means.
  • Digital images can easily be linked to textual descriptions and catalog records.
  • Access is greatly improved, using standard Internet technologies.

5.4 Content based Image Retrieval [16],[17],[18],[20]

Content based image retrieval (CBIR) is the application of computer vision to the image retrieval problem, i.e., the problem of searching for digital images in large databases.

"Content-based" means that the search will analyze the actual contents of the image. The term 'content' in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself. Without the ability to examine image content, searches must rely on metadata such as captions or keywords, which may be laborious or expensive to produce.

5.4.1 Query Techniques

Different implementations of CBIR make use of different types of user queries.

Query by example

Query by example is a query technique that involves providing the CBIR system with an example image that it will then base its search upon. The underlying search algorithms may vary depending on the application, but result images should all share common elements with the provided example.

Ways for providing sample images to the system include:

  • The user may choose from a random set or a pre-existing image may be supplied.
  • The user may draw a rough approximation of the image he/she is looking for, for example with blobs of color or general shapes.

This query technique removes the difficulties that arise when trying to describe images with words.

Other query methods

Other methods include specifying the proportions of colors desired (e.g. "80% red, 20% blue") and searching for images that contain an object given in a query image.

CBIR systems can also make use of relevance feedback, where the user progressively refines the search results by marking images in the results as "relevant", "not relevant", or "neutral" to the search query, then repeating the search with the new information.

5.4.2 Content Comparison Techniques

Described below are some common methods for extracting content from images so that they can be easily compared. The methods outlined are not specific to any particular application domain.

Color

Retrieving images based on color similarity is achieved by computing a color histogram for each image that identifies the proportion of pixels within an image holding specific values (that humans express as colors). Current research is attempting to segment color proportion by region and by spatial relationship among several color regions. Examining images based on the colors they contain is one of the most widely used techniques because it does not depend on image size or orientation. Color searches will usually involve comparing color histograms, though this is not the only technique in practice.

Texture

Texture measures look for visual patterns in images and how they are spatially defined. Textures are represented by texels (texture pixels), which are then placed into a number of sets, depending on how many textures are detected in the image. These sets not only define the texture, but also where in the image the texture is located. Texture is a difficult concept to represent.

The identification of specific textures in an image is achieved primarily by modeling texture as a two-dimensional gray level variation. The relative brightness of pairs of pixels is computed such that degree of contrast, regularity, coarseness and directionality may be estimated. However, the problem is in identifying patterns of co-pixel variation and associating them with particular classes of textures such as "silky" or "rough".

Shape

Shape does not refer to the shape of an image but to the shape of a particular region that is being sought out. Shapes will often be determined by first applying segmentation or edge detection to an image. In some cases accurate shape detection will require human intervention because methods like segmentation are very difficult to automate completely.
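
Returning to the color comparison described earlier in this section, the sketch below computes a normalized color histogram and compares two histograms by histogram intersection. Histogram intersection is one common similarity measure and is chosen here as an assumption; the function names, the four bins per channel and the tiny synthetic "images" (lists of RGB tuples) are all hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Proportion of pixels falling into each quantized (r, g, b) bin."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color proportions."""
    return sum(min(p, h2.get(bin_, 0.0)) for bin_, p in h1.items())


RED, BLUE, GREEN = (255, 0, 0), (0, 0, 255), (0, 255, 0)

query = [RED] * 80 + [BLUE] * 20          # "80% red, 20% blue"
same_mix = [BLUE] * 40 + [RED] * 160      # larger image, different pixel order
mostly_green = [GREEN] * 90 + [RED] * 10

hq = color_histogram(query)
print(histogram_intersection(hq, color_histogram(same_mix)))
print(histogram_intersection(hq, color_histogram(mostly_green)))
```

Because the histogram stores proportions rather than positions, the second image matches the query fully even though it is larger and its pixels are arranged differently, which is the size and orientation independence noted above.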

5.4.3 Potential uses of CBIR

  • Art collections
  • Photograph archives
  • Retail catalogs
  • Medical diagnosis
  • Crime prevention
  • The military
  • Intellectual property
  • Architectural and engineering design
  • Geographical information and remote sensing systems

Chapter VI

Audio Databases

Audio is caused by air pressure waves that have a frequency and an amplitude. When the frequency of the waves is between 20 and 20,000 Hertz, a human hears a sound. A low amplitude causes the sound to be soft.

6.1 How to digitize these pressure waveforms?

First, the air wave is transformed into an electrical signal (by a microphone). This signal is converted into discrete values by processes called sampling and quantization. Sampling divides the continuous time axis into small, fixed intervals (see Fig 6.1(b)). The number of intervals per second is called the sampling rate. Determining the amplitude of the audio signal at the beginning of each time interval, and representing it by one of a fixed set of discrete values, is called quantization.

So the continuous audio signal is approximated by a sequence of values, see Fig 6.1(c). If the sampling rate is high enough and the quantization is precise enough, the human ear will not notice any difference between the analog and digital audio signal. The process just described is called analog-to-digital conversion (ADC); the other way around is called digital-to-analog conversion (DAC). [8]

Figure 6.1: Analog-to-digital conversion. (a) Original Analog signal; (b) Sampling pulses; (c) quantization; (d) digitized values.
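
The sampling and quantization steps can be illustrated with a short Python sketch. It assumes the analog signal is available as a function of time with amplitude in the range -1.0 to 1.0; the 440 Hz tone, the 8,000 Hz sampling rate and the 8-bit resolution are illustrative choices, not values taken from the text.

```python
import math


def sample_and_quantize(signal, duration_s, sampling_rate, bits):
    """Sample signal(t) at fixed intervals and map each amplitude to an integer level."""
    levels = 2 ** bits
    n_samples = round(duration_s * sampling_rate)
    digitized = []
    for n in range(n_samples):
        t = n / sampling_rate                 # sampling: fixed time intervals
        amplitude = signal(t)                 # assumed to lie in [-1.0, 1.0]
        level = round((amplitude + 1.0) / 2.0 * (levels - 1))  # quantization
        digitized.append(level)
    return digitized


def tone(t):
    """Hypothetical analog input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)


digitized = sample_and_quantize(tone, duration_s=0.01, sampling_rate=8000, bits=8)
print(len(digitized), min(digitized), max(digitized))
```

Raising `sampling_rate` shortens the time intervals and raising `bits` narrows the gap between adjacent levels, which corresponds to the two conditions under which the ear no longer notices the difference.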

6.2 Compression

Since audio data occupies a lot of space, there has long been a driving force to compress it. Compression techniques are of two basic types: lossless and lossy. A lossless compression technique is one that yields a compressed signal from which the original signal can be reconstructed perfectly. No information is lost as a result of the compression. A lossy compression technique is one that discards information. The original signal cannot be reconstructed perfectly from a signal compressed by a lossy method. Some of the compression methods are listed below. [9],[23],[25]

6.2.1 VOC File Compression

This is the simplest compression technique: it removes silence from the sample. This form of compression was introduced by Creative Labs. The method analyzes the whole sample and then codes the silent stretches using byte codes. It is similar to run-length coding.

6.2.2 Linear Predictive Coding (LPC) and Code Excited Linear Predictor (CELP)

This was an early development in audio compression that was used primarily for speech. A Linear Predictive Coding (LPC) encoder compares speech to an analytical model of the vocal tract, then discards the speech and stores the parameters of the best-fit model. The output quality was poor and was often compared to computer speech and thus is not used much today.

A later development, Code Excited Linear Predictor (CELP), increased the complexity of the speech model further, while allowing for greater compression due to faster computers, and produced much better results. Sound quality improved, while the compression ratio increased. The algorithm compares speech with an analytical model of the vocal tract and computes the errors between the original speech and the model. It transmits both model parameters and a very compressed representation of the errors.

6.2.3 Adaptive Differential Pulse Code Modulation (ADPCM)

This process is a simple conversion based on the notion that the changes between successive samples will not be very large. The first sample value is stored in full, and each successive value then records only the change in the wave, within a range of +/- 8 levels, which takes only 4 bits instead of 16. Hence, a 4:1 compression ratio is achieved, with less loss as the sampling frequency increases. Due to its simplicity, wide acceptance, and high level of compression, this method is widely used.
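
The difference-coding idea can be sketched as follows. This is a deliberately simplified, non-adaptive variant (plain DPCM with a fixed step size); real ADPCM codecs also adapt the step size from sample to sample, which is omitted here. The function names and sample values are hypothetical.

```python
def dpcm_encode(samples, step=1):
    """Store the first sample whole, then 4-bit differences clamped to [-8, 7]."""
    first = samples[0]
    deltas = []
    predicted = first
    for s in samples[1:]:
        d = max(-8, min(7, round((s - predicted) / step)))
        deltas.append(d)
        predicted += d * step      # track what the decoder will reconstruct
    return first, deltas


def dpcm_decode(first, deltas, step=1):
    """Rebuild the sample sequence by accumulating the stored differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d * step)
    return out


smooth = [100, 103, 105, 104, 100, 97]
first, deltas = dpcm_encode(smooth)
print(deltas)                                # [3, 2, -1, -4, -3]
print(dpcm_decode(first, deltas) == smooth)  # True

# A jump larger than the 4-bit range is clipped, so information is lost.
print(dpcm_decode(*dpcm_encode([0, 20])))    # [0, 7]
```

The clipped example shows why the loss shrinks as the sampling frequency increases: samples taken closer together in time differ less, so fewer differences fall outside the 4-bit range.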

6.2.4 MPEG for Audio [21],[22]

The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high fidelity audio compression. It is one part of a three-part compression standard, the other two being video and system. MPEG compression is lossy, but it can nevertheless achieve perceptually lossless (transparent) compression.

MPEG compression is based on psychoacoustic theory. The principle behind this is: if the listener cannot hear the sound, then it need not be coded.  Human hearing is quite sensitive, but making out differences in a large collection of sounds is difficult. The phenomenon where a strong signal covers the sound of the softer signal so that the human ear cannot hear the softer one is known as masking. MPEG compression uses masking as the basis for compressing the audio data.

In addition to encoding a single signal, the MPEG compression supports one or two audio channels in one of four modes:

1) Monophonic

2) Dual Monophonic -- two independent channels

3) Stereo -- for stereo channels that share bits, but not using joint-stereo coding

4) Joint stereo -- takes advantage of the correlations between stereo channels

The MPEG method allows for a compression ratio of up to 6:1. Under optimal listening conditions, expert listeners could not distinguish the coded and original audio clips. Thus, although this technique is lossy, it still produces accurate representations of the original audio signal.
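
To put the 6:1 ratio in perspective, the arithmetic below works out the bit rate of uncompressed audio and of the same audio after 6:1 compression. It assumes CD-quality parameters (44,100 samples per second, 16 bits per sample, two channels), which are not stated in this section; the function name is hypothetical.

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(44_100, 16, 2)      # assumed CD-quality parameters
compressed_rate = cd_rate / 6              # the 6:1 ratio quoted above
bytes_per_minute = cd_rate * 60 // 8

print(cd_rate)            # 1411200 bits per second
print(compressed_rate)    # 235200.0 bits per second
print(bytes_per_minute)   # 10584000 bytes per minute, uncompressed
```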

6.3 Common File Types [4]

  • Wav - standard audio file container format used mainly in Windows PCs. Commonly used for storing uncompressed, CD quality sound files, which means that they can be large in size. Wave files can also contain data encoded with a variety of codecs to reduce the file size (for example the GSM or mp3 codecs).
  • Ogg - a free, open source container format supporting a variety of codecs, the most popular of which is the audio codec Vorbis. Vorbis offers better compression than MP3 but is less popular.
  • Raw - a raw file can contain audio in any codec but is usually used with PCM audio data. It is rarely used except for technical tests.
  • Au - the standard audio file format used by Sun, Unix and Java. The audio in au files can be PCM or compressed with the μ-law, A-law or G.729 codecs.
  • Aac - the Advanced Audio Coding format is based on the MPEG-2 and MPEG-4 standards.
  • Mp4/M4a - MPEG-4 audio; most often AAC but sometimes MP3
  • Mp3 - the MPEG Layer-3 format is the most popular format for downloading and storing music. By eliminating portions of the audio file that are essentially inaudible, mp3 files are compressed to roughly one-tenth the size of an equivalent PCM file while maintaining good audio quality.
  • Wma - the popular Windows Media Audio format owned by Microsoft. Designed with Digital Rights Management (DRM) abilities for copy protection.
  • Ra - a Real Audio format designed for streaming audio over the Internet. The .ra format allows files to be stored in a self-contained fashion on a computer, with all of the audio data contained inside the file itself.

6.4 Content based Audio Retrieval

As compared with the content-based image and video retrieval, content-based audio retrieval provides a special challenge because raw digital audio data is a featureless collection of bytes with the most elementary fields attached such as name, file format, sampling rate, which does not readily allow content-based retrieval.

Current content-based audio-retrieval methods are based on content-based image retrieval methods. Major procedures are: [1]

  • A feature vector is constructed by extracting acoustic and subjective features from the audio in the database.
  • The same features are extracted from the queries.
  • The relevant audio in the database is ranked according to the feature match between the query and the database.
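
The three steps above can be sketched as a nearest-neighbour ranking over feature vectors. The feature values, file names and the use of Euclidean distance as the matching measure are assumptions made purely for illustration; a real system would extract the vectors from the audio itself.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query_features, database):
    """Return database keys ordered from best to worst feature match."""
    return sorted(database, key=lambda name: euclidean(query_features, database[name]))


# Hypothetical feature vectors: [loudness, brightness, pitch], each scaled to 0..1.
database = {
    "speech.wav": [0.2, 0.3, 0.1],
    "violin.wav": [0.6, 0.8, 0.7],
    "drums.wav": [0.9, 0.4, 0.2],
}
query = [0.55, 0.75, 0.65]   # features extracted from the query sound

ranked = rank_by_similarity(query, database)
print(ranked)   # ['violin.wav', 'drums.wav', 'speech.wav']
```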

6.4.1 Audio Feature Extraction

There are two categories used to characterize the audio signal. [23],[26]

  • Acoustic Features
  • Subjective/Semantic Features

Acoustic Features

Acoustic features describe audio in terms of commonly understood acoustical characteristics, and can be computed directly from the audio file. Major acoustic features include:

  • Loudness
  • Spectrum Powers
  • Brightness
  • Bandwidth
  • Pitch

Subjective/Semantic Features

Subjective features describe sounds using personal descriptive language. The system must be trained to understand the meaning of these descriptive terms.

Semantic features are high-level features that are summarized from the low-level features. Compared with low-level features, they reflect the characteristics of audio content more accurately.

Major Subjective/Semantic Features

  • Timbre
  • Rhythm
  • Events
  • Instruments

6.4.2 Content based Audio Segmentation

  • It is important to segment an audio stream into different semantic parts, such as speech, music, silence, and environment sounds.
  • Segmentation is achieved by extracting features from each segment of the audio stream and applying classification methods to obtain the audio scene.

Chapter VII

Video Databases

A digital video consists of a sequence of frames or images that have to be presented at a fixed rate. Digital videos can be obtained by digitizing analog videos or directly by digital cameras. Playing a video at a rate of 25 frames per second gives the user the illusion of a continuous view. It takes a huge amount of data to represent a video. So compression is a must in the case of videos.
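
The amount of data involved can be estimated with simple arithmetic. The sketch below assumes an uncompressed 640 x 480 frame with 3 bytes per pixel (24-bit color); these dimensions are illustrative and not taken from the text, while the 25 frames per second rate is the one mentioned above.

```python
def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size in bytes of uncompressed video with the given parameters."""
    return width * height * bytes_per_pixel * fps * seconds


frame_bytes = raw_video_bytes(640, 480, 3, fps=1, seconds=1)    # a single frame
minute_bytes = raw_video_bytes(640, 480, 3, fps=25, seconds=60)

print(frame_bytes)    # 921600 bytes, a little under 1 MB per frame
print(minute_bytes)   # 1382400000 bytes, roughly 1.4 GB per minute
```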

7.1 Need for Digital Video [23]

  • Ease of manipulation - The difference between analog and digital is like comparing a typewriter with a word processor. Just like the cut and paste function is much easier and faster with a word processor, editing is easier and faster with a digital video. Also, many effects that were exclusive for specialized postproduction houses are now easily achieved by bringing in files from Photoshop, Flash, and Sound Edit as components in a video mix. In addition, the ability to separate sound from image enables editing one without affecting the other.
  • Preservation of data - It is not true that digital video is better simply because it is digital. Big screen films are not digital and are still highly esteemed as quality images. However, it is easier to maintain the quality of a digital video. Traditional tapes are subject to wear and tear more so than DVD or hard drive disks. Also, once done, a digital video can be copied over and over without losing its original information. Analog signals can be easily distorted and will lose much of the original data after a few transfers.
  • Internet - A digital video can be sent via the Internet to countless end users without having to make a copy for every viewer. It is easy to store, retrieve, and publish.

7.2 Digital Video Compression Algorithms

There are two types of compression, “lossless” and “lossy”. Lossless compression retains the original data so that the individual image sequences remain the same. It saves space by removing image areas that use the same color. The compression rate is usually no better than 3:1, and this low rate makes most lossless compression less desirable. Lossy compression methods remove image and sound information that is unlikely to be noticed by the viewer. Some information is lost, but since the loss is not perceptible to the viewer, the perceived quality remains the same while the volume of data is dramatically decreased.

At its most basic level, compression is performed when an input video stream is analyzed and information that is indiscernible to the viewer is discarded. Each event is then assigned a code: commonly occurring events are assigned few bits and rare events are assigned more bits. These steps are commonly called signal analysis, quantization and variable length encoding respectively. There are four methods for compression: discrete cosine transform, vector quantization, fractal compression, and discrete wavelet transform. [1], [24]

7.2.1 Discrete Cosine Transform (DCT)

Discrete cosine transform is a lossy compression algorithm that samples an image at regular intervals, analyzes the frequency components present in the sample, and discards those frequencies which do not affect the image as the human eye perceives it. DCT is the basis of standards such as JPEG, MPEG, H.261, and H.263.
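
A one-dimensional sketch of this idea is given below: a row of eight pixel values is transformed with an orthonormal DCT, the four highest-frequency coefficients are discarded, and the row is reconstructed. Real codecs work on two-dimensional blocks and quantize coefficients rather than simply zeroing them, so this is only an approximation of the principle; the function names and pixel values are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


row = [52, 55, 61, 66, 70, 61, 64, 73]      # hypothetical gray levels
coefficients = dct(row)
kept = coefficients[:4] + [0.0] * 4         # discard the high-frequency half
approx = idct(kept)

print([round(c, 1) for c in coefficients])
print([round(v) for v in approx])
```

Most of the signal energy sits in the first few coefficients, so the reconstruction from half of them stays close to the original row while needing only half as many values to be stored.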

7.2.2 Vector Quantization (VQ)

Vector quantization is a lossy compression that looks at an array of data, instead of individual values. It can then generalize what it sees, compressing redundant data, while at the same time retaining the desired object or data stream's original intent.
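
As a minimal sketch, vector quantization can be pictured as replacing each small block of pixel values with the index of the closest entry in a shared codebook. The codebook, the 4-pixel blocks and the function names below are hypothetical; in practice the codebook is learned from training data, which is not shown here.

```python
def nearest(codebook, vector):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vector)),
    )


def vq_encode(blocks, codebook):
    """Replace each block with the index of its nearest codeword."""
    return [nearest(codebook, block) for block in blocks]


def vq_decode(indices, codebook):
    """Look each index up in the codebook to get an approximate block back."""
    return [codebook[i] for i in indices]


# Hypothetical codebook of four 4-pixel patterns (gray levels 0..255).
codebook = [
    (0, 0, 0, 0),
    (128, 128, 128, 128),
    (255, 255, 255, 255),
    (0, 255, 0, 255),
]
blocks = [(3, 1, 0, 2), (250, 255, 251, 254), (120, 130, 125, 131), (5, 250, 2, 249)]

indices = vq_encode(blocks, codebook)
print(indices)                       # [0, 2, 1, 3]
print(vq_decode(indices, codebook))
```

Each block of four values is stored as a single two-bit index, and the decoded blocks are approximations of the originals, which is why the method is lossy.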

7.2.3 Fractal Compression

Fractal compression is a form of VQ and is also a lossy compression. Compression is performed by locating self-similar sections of an image, then using a fractal algorithm to generate the sections.

7.2.4 Discrete Wavelet Transform (DWT)

Like DCT, discrete wavelet transform mathematically transforms an image into frequency components. The process is performed on the entire image, which differs from the other methods (DCT) that work on smaller pieces of the desired data. The result is a hierarchical representation of an image, where each layer represents a frequency band.
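
The hierarchical, layer-per-frequency-band structure can be illustrated with the Haar wavelet, the simplest wavelet. The sketch below works on a one-dimensional signal whose length is a power of two and uses plain averages and differences (an unnormalized Haar variant); both simplifications, and the function names, are assumptions made for illustration.

```python
def haar_step(signal):
    """Split a signal into pairwise averages (low band) and differences (high band)."""
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details


def haar_decompose(signal):
    """Return the overall average plus detail layers ordered from coarse to fine."""
    layers = []
    current = list(signal)
    while len(current) > 1:
        current, details = haar_step(current)
        layers.append(details)
    return current, layers[::-1]


def haar_inverse_step(averages, details):
    """Undo one haar_step."""
    out = []
    for a, d in zip(averages, details):
        out.extend([a + d, a - d])
    return out


average, layers = haar_decompose([9, 7, 3, 5])
print(average)   # [6.0]
print(layers)    # [[2.0], [1.0, -1.0]]

# Reconstruct by applying the inverse step layer by layer.
rebuilt = average
for details in layers:
    rebuilt = haar_inverse_step(rebuilt, details)
print(rebuilt)   # [9.0, 7.0, 3.0, 5.0]
```

Each entry in `layers` corresponds to one frequency band, from the coarsest detail to the finest; compression would come from discarding or coarsely storing the small detail values.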

7.3 Compression Standards [21], [24]

7.3.1 MPEG

Moving Picture Experts Group or MPEG is an ISO/IEC working group whose job is to develop audio and video encoding standards. As of now, four MPEG standards are being used and one is under development. Every standard has been designed for a specific bit rate and application. See Appendix A for details.

7.3.2 AVI

AVI stands for Audio Video Interleave. It is one of the oldest formats. It was created by Microsoft to go with Windows 3.1 and its “Video for Windows” application. Even though it is widely used due to the number of editing systems and software that use AVI by default, this format has many restrictions, especially regarding compatibility with operating systems and other interface boards.

7.3.3 MOV

The MOV format, created by Apple for the Macintosh, is the proprietary format of the QuickTime application. It can also run on PCs. Being able to store both video and sound simultaneously, the format was once superior to AVI. The latest version of QuickTime also has streaming capabilities for Internet video. However, with the new MPEG-2 format, the MOV format started to lose its popularity, until it was decided that MPEG-4 would use the QuickTime format as the basis of its standards.

7.3.4 DivX

DivX is software that uses the MPEG-4 standard to compress digital video, so that it can be downloaded over a DSL/cable modem connection in a relatively short time without noticeably reduced visual quality. The latest version of the codec, DivX 4.0, is being developed jointly by DivXNetworks and the open source community. DivX works on Windows 98, ME, 2000, CE, Mac and Linux.

7.4 Content based video indexing and retrieval

There are four main processes involved in content-based video indexing and retrieval: video content analysis, video structure parsing, summarization or abstraction, and indexing. Each process poses many challenges. [8],[27],[28]

7.4.1 Video Content Analysis

The main problem in video content analysis is that we cannot easily map extractable visual features (such as color, texture, shape, structure, layout, and motion) into semantic concepts (such as indoor and outdoor, people, or car-racing scenes). Although visual content is a major source of information in a video, valuable information is also carried in other media components, such as text (superimposed on the images, or included as closed captions), audio, and speech that accompany the pictorial component. A combined and cooperative analysis of these components would be far more effective in characterizing the video for both consumer and professional applications.

7.4.2 Video Structure Parsing

An important step in the process of video structure parsing is that of segmenting the video into individual scenes. From a narrative point of view, a scene consists of a series of consecutive shots grouped together because they were filmed in the same location or because they share some thematic content. The process of detecting these video scenes is analogous to paragraphing in text document parsing, but it requires a higher level of content analysis. In contrast, shots are actual physical basic layers in video, whose boundaries are determined by editing points or where the camera switches on or off.

Fortunately, analogous to words or sentences in text documents, shots are a good choice as the basic unit for video content indexing, and they provide the basis for constructing a table of contents for video. Shot boundary detection algorithms that rely only on visual information contained in the video frames can segment the video into frames with similar visual contents. Grouping the shots into semantically meaningful segments such as stories, however, usually is not possible without incorporating information from the other components of the video. Multimodal processing algorithms involving the processing of not only the video frames, but also the text, audio, and speech components that accompany them have proven effective in achieving this goal.
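
A visual-only shot boundary detector of the kind mentioned above can be sketched by comparing gray-level histograms of consecutive frames and declaring a cut when they differ sharply. The frames here are tiny hypothetical lists of gray levels, and the bin count and threshold are arbitrary illustrative values; practical detectors must also cope with gradual transitions, which this sketch ignores.

```python
def gray_histogram(frame, bins=8):
    """Normalized histogram of gray levels (0..255) in a frame."""
    hist = [0] * bins
    for value in frame:
        hist[value * bins // 256] += 1
    return [h / len(frame) for h in hist]


def shot_boundaries(frames, threshold=0.5):
    """Indices of frames whose histogram differs sharply from the previous frame."""
    cuts = []
    previous = gray_histogram(frames[0])
    for index in range(1, len(frames)):
        current = gray_histogram(frames[index])
        difference = sum(abs(a - b) for a, b in zip(previous, current))
        if difference > threshold:
            cuts.append(index)      # a new shot starts at this frame
        previous = current
    return cuts


dark_shot = [[10, 20, 15, 12], [12, 22, 14, 11], [11, 19, 16, 13]]
bright_shot = [[240, 250, 245, 230], [238, 251, 244, 229]]

cuts = shot_boundaries(dark_shot + bright_shot)
print(cuts)   # [3]
```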

7.4.3 Video Summarization

Video summarization is the process of creating a presentation of visual information about the structure of video, which should be much shorter than the original video. This abstraction process is similar to extraction of keywords or summaries in text document processing. That is, we need to extract a subset of video data from the original video such as key frames or highlights as entries for shots, scenes, or stories. Abstraction is especially important given the vast amount of data even for a video of a few minutes’ duration. The result forms the basis not only for video content representation but also for content-based video browsing. Combining the structure information extracted from video parsing and the key frames extracted in video abstraction, we can build a visual table of contents for a video.

7.4.4 Video Indexing

The structural and content attributes found in content analysis, video parsing, and abstraction processes, or the attributes that are entered manually, are often referred to as metadata. Based on these attributes, we can build video indices and the table of contents through, for instance, a clustering process that classifies sequences or shots into different visual categories or an indexing structure. As in many other information systems, we need schemes and tools to use the indices and content metadata to query, search, and browse large video databases. Researchers have developed numerous schemes and tools for video indexing and query. However, robust and effective tools tested by thorough experimental evaluation with large data sets are still lacking. Therefore, in the majority of cases, retrieving or searching video databases by keywords or phrases will be the mode of operation.

Chapter VIII

Multimedia Databases

8.1 Introduction

Multimedia data basically means digital audio, video, images, animations and graphics together with text data. In the recent past, the acquisition, generation, storage and processing of multimedia data in computers and its transmission over networks have grown tremendously.

This astounding growth has been made possible by three factors. First, personal computer usage is becoming widespread and computational power is increasing; in addition, technological advancements have produced high-resolution devices that can capture and display multimedia data, as well as high-density storage devices. Second, high-speed data communication networks are now available; the Web has proliferated widely and software for manipulating multimedia data is now available. Lastly, some specific existing and future applications need to work with multimedia data. This trend is expected to continue in the days to come.

Multimedia data has a number of attractive features. It can provide very effective dissemination of information in science, engineering, medicine, modern biology, and social sciences. It also facilitates the development of new paradigms in distance learning, and in interactive personal and group entertainment.

Multimedia Databases (MMDBs) have to cope with the large and growing volume of multimedia data used in various software applications. The applications include digital libraries, manufacturing and retailing, art and entertainment, journalism and so on. Some inherent qualities of multimedia data have both direct and indirect influence on the design and development of a multimedia database. MMDBs are supposed to provide almost all the functionalities that a traditional database provides.

Apart from those, an MMDB has to provide some new and enhanced functionalities and features. MMDBs are required to provide unified frameworks for storing, processing, retrieving, transmitting and presenting a variety of media data types in a wide variety of formats. At the same time, they must adhere to numerical constraints that are normally not found in traditional databases.

8.2 Why do we need Multimedia Databases?

The following points will justify the need of multimedia databases: [31]

  • A multimedia database is capable of handling large volumes of multimedia objects, which a general database fails to do effectively;
  • A multimedia database helps to create virtual museums;
  • It helps to develop multimedia applications in various fields such as teaching, medical sciences and libraries;
  • It helps to preserve decaying photographs, maps and films of historical or national importance;
  • Using a multimedia database, we can develop excellent teaching packages;
  • It supports multi-user operations.

8.3 Types of Multimedia Databases

There are basically two types of multimedia databases: Linked Multimedia Databases and Embedded Multimedia Databases. [31]

8.3.1 Linked Multimedia Databases

A multimedia database can be organized as a database of metadata. This metadata links to the actual data such as graphics, images, animation, audio, sound etc. This data may be stored on hard disc, CD-ROM, DVD or online. In this database, multimedia elements are organized as image, audio/MP3, video etc.

In this multimedia database system, all data may be stored either offline (on CD-ROM, hard disc, DVD etc.) or online. One great advantage of this type of database is that its size remains small, because the multimedia elements are not embedded in the database but only linked to it.

Figure 8.1: Multimedia Linked Meta Database

8.3.2 Embedded Multimedia Databases

In an Embedded Multimedia Database, the database itself contains the multimedia objects, stored in binary form. The main advantage of this kind of database is that retrieval of data will be faster because of the reduced data access time. However, the size of the database will be very large.
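The difference between the two types can be illustrated with a minimal sketch. It assumes an in-memory SQLite database; the table names, column names, file path and the 100 kB stand-in image are all hypothetical and chosen only for illustration. The linked table stores a pointer to the media file, whereas the embedded table stores the media bytes in a BLOB column.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one linked and one embedded table."""
    conn = sqlite3.connect(":memory:")
    # Linked: only metadata plus a pointer (path or URL) to the media file.
    conn.execute(
        "CREATE TABLE linked_media ("
        "id INTEGER PRIMARY KEY, title TEXT, media_type TEXT, location TEXT)"
    )
    # Embedded: the media bytes themselves are stored in a BLOB column.
    conn.execute(
        "CREATE TABLE embedded_media ("
        "id INTEGER PRIMARY KEY, title TEXT, media_type TEXT, content BLOB)"
    )
    return conn


def stored_length(conn, table, column):
    """Sum of LENGTH() over one column: characters for TEXT, bytes for BLOB."""
    query = f"SELECT COALESCE(SUM(LENGTH({column})), 0) FROM {table}"
    return conn.execute(query).fetchone()[0]


media_db = build_demo_db()
fake_image = bytes(100_000)  # stand-in for a 100 kB image file (assumption)

media_db.execute(
    "INSERT INTO linked_media (title, media_type, location) VALUES (?, ?, ?)",
    ("Old map", "image", "/media/maps/old_map.png"),  # hypothetical path
)
media_db.execute(
    "INSERT INTO embedded_media (title, media_type, content) VALUES (?, ?, ?)",
    ("Old map", "image", fake_image),
)

linked_size = stored_length(media_db, "linked_media", "location")
embedded_size = stored_length(media_db, "embedded_media", "content")
```

In this sketch the linked row holds only a short path string, while the embedded row holds the full 100,000 bytes, which mirrors the size trade-off described above. The sketch says nothing about retrieval speed, which depends on the storage medium and the database engine.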

8.4 Multimedia Database Content

As described earlier, MMDBs generally hold the following multimedia components.

8.4.1 Text

Text is used in multimedia applications to describe multimedia data. When a piece of information cannot be conveyed using the other multimedia elements, text is mandatory. Text should only be used where it eliminates potential ambiguity.

8.4.2 Speech

Speech is a continuous concept. Speech can introduce, give survey, stimulate and tell. Speech is ideal as an additional explanation of text.

8.4.3 Graphics

Graphics are a powerful multimedia component. The real strength of graphics is to maintain context. Graphics are discrete concepts: the user determines the viewing moments and duration. In this way, graphics are very suitable for individual study and for analyzing connections. Graphics provide more interpretation than an image and can be used to support a mental model.

8.4.4 Image

An image is closely related to its contents through its photorealistic representation. The user's mood can be influenced by images. In such a case, the combination of image and sound is very effective.

8.4.5 Animation

Animation is also a multimedia component. It can be defined as the change in the characteristics of an object over a period of time. Animation files require more storage space than graphic files involving single image.

8.4.6 Sound

Sound, as music or speech, has the power to evoke emotions. Music can positively stimulate moods and relax the mind and body, whereas sound as noise tends to irritate people. The combination of sound with animation has a realistic effect on users.

8.4.7 Video

The most powerful of all the multimedia components is video. It helps to represent real-world events. It also helps viewers to grasp more delicate and complicated ideas.

8.5 Structure of a Multimedia Database

Multimedia database structure can best be explained with the following components: [31]

  • Data Analysis
  • Data Modeling
  • Data Storage
  • Data Retrieval
  • Query Language
  • Multimedia Communication

8.5.1 Data Analysis

Data can be stored in the database in either unstructured or structured form. Unstructured data are represented as a unit whose content cannot be retrieved by accessing any structured details. Structured data are stored in variables, fields or attributes with corresponding values. Multimedia data can be stored in the database as raw, registering and descriptive data types. Raw data are generally represented by pixels in the form of bytes and bits.

8.5.2 Data Modeling

The data model concentrates on the conceptual design of the multimedia database in order to carry out operations such as media object selection, insertion, querying and retrieval. Time-based multimedia like video, audio and animation involve notions of data flow, timing, temporal composition and synchronization. These notions are quite different from those of conventional data such as text. One of the biggest problems of a multimedia database is describing the structure of time-constrained media for querying, updating, retrieval and presentation.

8.5.3 Data Storage

Multimedia objects are stored in the database. They are of two types: non-continuous (static) media such as text and images, and continuous (dynamic) media such as audio and video. Continuous media data have the real-time property while non-continuous data do not. Hence, the storage mechanism will be different for these two types of data. Most continuous media data are stored on a separate storage server to meet the real-time constraints. Non-continuous data are stored in the database with meta-information about the files. In general, data can be stored on hard disc, CD-ROM, DVD or online.

8.5.4 Data Retrieval

The main objective of a multimedia database is to give effective access to multimedia information. With respect to access, multimedia objects can be classified into two kinds: active and passive objects. The objects that participate in the retrieval process are called active objects. Likewise, the objects that do not participate in the retrieval process are called passive objects. In a real multimedia database environment all objects should be active objects.

8.5.5 Query Language

A query language is provided in order to retrieve multimedia data from a database system. User queries are processed through a query language defined as part of the DBMS; it is an inseparable part of the DBMS. A multimedia query language must be able to handle complex, spatial and temporal relationships. A powerful query language has to deal with keywords, indexes on keywords and the contents of multimedia objects. A traditional DBMS, by contrast, deals with exact-match queries.

Generally, there are two types of queries used in databases: well-defined queries and fuzzy queries. In a well-defined query, the user must know what they intend to search for. In a fuzzy query, the properties of the query objects are ambiguous. In such a situation, multimedia data queries can be divided into sub-groups such as keyword querying, semantic querying and visual querying. Keyword querying is still popular because of its simplicity. Semantic querying is the most difficult query method in terms of indexing and pattern matching. Visual querying is used in QBIC (Query By Image Content), where icons lead to a content search in the image domain.
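The contrast between a well-defined keyword query and a fuzzy visual query can be sketched as follows. This is only an illustration under stated assumptions: the catalog, the keywords and the three-bin colour histograms are invented, and ranking by Euclidean distance is a simple stand-in for content-based matching, not a description of how QBIC or any particular system works.

```python
import math

# Hypothetical catalog: each image has keywords and a tiny 3-bin colour histogram.
CATALOG = [
    {"id": 1, "keywords": {"sunset", "beach"}, "histogram": [0.7, 0.2, 0.1]},
    {"id": 2, "keywords": {"forest", "river"}, "histogram": [0.1, 0.8, 0.1]},
    {"id": 3, "keywords": {"sunset", "city"}, "histogram": [0.6, 0.1, 0.3]},
]


def keyword_query(catalog, keyword):
    """Well-defined query: ids of items whose keyword set contains the exact term."""
    return [item["id"] for item in catalog if keyword in item["keywords"]]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def visual_query(catalog, example_histogram, top_k=2):
    """Fuzzy query: ids of the top_k items closest to an example histogram."""
    ranked = sorted(
        catalog, key=lambda item: euclidean(item["histogram"], example_histogram)
    )
    return [item["id"] for item in ranked[:top_k]]


exact_hits = keyword_query(CATALOG, "sunset")
similar_hits = visual_query(CATALOG, [0.75, 0.2, 0.05])
```

The keyword query either matches or does not, whereas the visual query always returns a ranked list, so the user does not need to know in advance exactly what they are looking for.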

8.5.6 Multimedia Communication

Communication is the sole objective of any information system. Distributed multimedia systems with sophisticated features can support a multi-user environment in which several users communicate with each other simultaneously.

8.6 How to create a Multimedia Database?

A few steps are required to create a Multimedia Database, as described below:

Step-1: Take various multimedia elements as described in section 8.4.

Step-2: Digitize multimedia materials that are not yet in a digital format acceptable for storing in the computer. Different file formats have to be maintained for multimedia documents. In the case of music the common file formats are: WAV, MIDI, MP3, AIFF, RA etc. Movie file formats may be: AVI, MPEG, ASF, QT etc. Similarly, for sound the common formats are: WAV, MIDI, MP3, RA, ASF, WMA, CDA etc.

Step-3: Classify, catalog and index the digitized multimedia elements. These steps are the same as in a general database, but in the case of animation or images the classification is difficult. For example, multimedia data categorized as a graphic image could be an image containing 256 colors or an image containing millions of colors.

Step-4: The final step is to input descriptive text pertaining to the multimedia data into the RDBMS (Relational Database Management System). A standard query language like SQL is then used to retrieve information from the database.
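Steps 3 and 4 can be sketched as follows, assuming SQLite as the RDBMS. The extension-to-type mapping is an assumption based on the formats named in Step-2 (ASF, for instance, is listed there for both movies and sound and is treated as a movie format here), and the file names and descriptions are hypothetical.

```python
import sqlite3
from pathlib import PurePosixPath

# Assumed mapping from file extension to media type (simplified from Step-2).
EXTENSION_TO_TYPE = {
    ".wav": "sound", ".mp3": "sound", ".aiff": "sound", ".wma": "sound",
    ".avi": "movie", ".mpeg": "movie", ".asf": "movie", ".qt": "movie",
}


def classify(filename):
    """Step-3: classify a digitized file by its extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix, "unknown")


def build_catalog(entries):
    """Step-4: store descriptive text for each file in a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog (filename TEXT, media_type TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?, ?)",
        [(name, classify(name), text) for name, text in entries],
    )
    return conn


catalog_db = build_catalog([
    ("lecture01.AVI", "Introductory lecture on databases"),
    ("anthem.mp3", "National anthem recording"),
    ("notes.xyz", "File in an unrecognised format"),
])

# Retrieval with standard SQL, here restricted to movies.
movies = [
    row[0]
    for row in catalog_db.execute(
        "SELECT filename FROM catalog WHERE media_type = ? ORDER BY filename",
        ("movie",),
    )
]
```

Classifying by extension only covers the easy part of Step-3; distinguishing, say, a 256-color graphic from a true-color one would require inspecting the file contents.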

Chapter IX

Digital Libraries

9.1 Introduction

A digital library (DL) is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers. It is an integrated set of services for capturing, cataloging, storing, searching, protecting, and retrieving information, which provide coherent organization and convenient access to typically large amounts of digital information.

9.2 Why use digital libraries?

Some of the potential benefits of digital libraries are:

  • The digital library brings the library to the user

A digital library brings the information to the user's desk, either at work or at home, making it easier to use and hence increasing its usage. With a digital library on the desktop, a user does not need to visit a library building. The library is wherever there is a personal computer and a network connection.

  • Computing power is used for browsing and searching

Computer power can be used to find information. Paper documents are convenient to read, but finding information that is stored on paper can be difficult. In most aspects, computer systems are already better than manual methods for finding information. Computers are particularly useful for reference work that involves repeated leaps from one source of information to another.

  • Information can be shared

Libraries and archives contain lots of information that is unique. Placing digital information on a network makes it available to everybody. Many digital libraries or electronic publications are maintained at a single central site, perhaps with a few mirror images or duplicate copies strategically placed around the world. This is a vast improvement over expensive physical duplication of little used material.

  • Information is easier to update

Much important information needs to be updated continuously. Printed materials are difficult to update since the entire document must be reprinted and all copies of the old version must be tracked down and replaced. Keeping information current is much less of a problem when the definitive version is in digital format and stored on a central computer.

Many libraries provide the text of reference works, such as directories or encyclopedias online. Whenever revisions are received from the publisher, they are installed on the library's computer. The new versions are available immediately.

  • The information is always available

The doors of the digital library never close. Materials are never checked out to other readers, mis-shelved or stolen; they are never in an off-campus warehouse. The scope of the collections expands beyond the walls of the library.

Digital libraries are not perfect. Computer systems can fail and networks may be slow or unreliable, but, compared with a traditional library, information is much more likely to be available when and where the user wants it.

9.3 Principles for Digital Library Design

The following principles guide the development of the architecture. [35],[36]

a. Service driven

The architecture for DL must be driven by the services it provides and tools required for delivering the service.

b. Open architecture

The architecture must be open, extensible and support interoperability among heterogeneous, distributed systems.

c. Scalability

The architecture must be robust, scalable and reliable in a production setting with a high transaction rate, serving thousands of patrons with a wide variety of backgrounds and information needs.

d. Preservation

The architecture must ensure persistent access to collection of the DL, addressing such issues as naming, digital archiving and digital preservation.

e. Privacy

The architecture must be sensitive to privacy issues and support both anonymous and customized access to resources.

f. Practicality

The architecture should represent a flexible and practical approach to standards, recognizing the need to balance the level of information collection with economic constraints.

g. Modularity

The architecture should represent a mix of new technology and legacy pieces, all of which must interoperate while evolving at different rates.

h. Time frame

Planning should cover system migrations in the coming year as well as a technology generation framework spanning approximately 3 to 5 years.

i. Client support

The architecture should support a baseline level of services, which can be accessed with a common desktop configuration and software. Certain higher-level services may require proprietary clients, but the DL tools and services group should determine whether these clients are supported.

9.4 Components of a Digital Library

The digital library framework permits many different computer systems to coexist. The key components are shown in the figure below. They run on a variety of computer systems connected by a computer network, such as the Internet. [32]

Figure 9.1: Major System Components

9.4.1. User Interfaces

Two user interfaces are needed: one for the end-users of the digital library, the other for the digital librarians and system administrators who manage the collections. Each user interface is in two parts. A standard Internet browser, such as Netscape Navigator or Microsoft's Internet Explorer, is used for the actual interactions with the user.

The browser connects to client services, which provide intermediary functions between the browser and the other parts of the system. The client services allow the user to decide where to search and what to retrieve; they interpret information structured as digital objects; they negotiate terms and conditions, manage relationships between digital objects, remember the state of the interaction, and convert among the protocols used by the various parts of the system.

9.4.2. Repository

Repositories store and manage digital objects and other information. A large digital library may have many repositories of various types, including modern repositories, legacy databases, and Web servers. The interface to a repository is called the repository access protocol (RAP). Features of RAP are explicit recognition of the rights and permissions that need to be satisfied before a client can access a digital object, support for a very general range of dissemination of digital objects, and an open architecture with well-defined interfaces.

9.4.3. Handle System

Handles are general-purpose identifiers that can be used to identify Internet resources, such as digital objects, over long periods of time and to manage materials stored in any repository or database. When used with the repository, the handle system receives as input a handle for a digital object and returns the identifier of the repository where the object is stored.
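The resolution step described above can be sketched as a two-stage lookup. This is a toy model and not the actual Handle System protocol: the class names, the repository identifiers and the handle strings (such as "demo/thesis-42") are hypothetical, and rights checking under RAP is left out.

```python
class HandleSystem:
    """Toy registry that maps a handle to the identifier of a repository."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, repository_id):
        self._locations[handle] = repository_id

    def resolve(self, handle):
        """Return the repository identifier for a handle, or raise LookupError."""
        if handle not in self._locations:
            raise LookupError(f"unknown handle: {handle}")
        return self._locations[handle]


class Repository:
    """Toy repository that stores digital objects under their handles."""

    def __init__(self, repository_id):
        self.repository_id = repository_id
        self._objects = {}

    def deposit(self, handle, digital_object, handle_system):
        # Store the object and record its location in the handle system.
        self._objects[handle] = digital_object
        handle_system.register(handle, self.repository_id)

    def access(self, handle):
        return self._objects[handle]


def fetch(handle, handle_system, repositories):
    """Resolve the handle first, then ask the named repository for the object."""
    repository_id = handle_system.resolve(handle)
    return repositories[repository_id].access(handle)


handles = HandleSystem()
theses = Repository("theses-repo")
maps = Repository("maps-repo")
repositories = {repo.repository_id: repo for repo in (theses, maps)}

theses.deposit("demo/thesis-42", "Full text of a thesis", handles)
maps.deposit("demo/map-7", "Scanned historical map", handles)

thesis_text = fetch("demo/thesis-42", handles, repositories)
```

Because clients hold only the handle, an object could in principle be moved to another repository by updating the registry entry, without changing the identifier that users cite.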

9.4.4. Search System

The design of the digital library system assumes that there will be many indexes and catalogs that can be searched to discover information before retrieving it from a repository. These indexes may be independently managed and support a wide range of protocols.

9.5 Digital Library Architecture

An architectural approach to digital libraries can be discussed under the following points. [33],[36]

9.5.1 Notional Architecture

At the notional level, data, metadata and meta-objects are considered. In a traditional library, data are the library materials, whereas a digital library deals with digital information. Metadata is data about an object in the digital library; the traditional card record is an example of metadata for a traditional library. A meta-object is an object that provides references to a set of digital objects.

In its simplest form, a meta-object is a list of handles of other digital objects. For example, a meta-object for an anthology is a digital object that lists all the poems. An important example of a meta-object is a digital object that lists all converted versions of a specific physical item. Digital objects are also kept for defining the metadata (data about data). The design of metadata is important for searching and retrieval of information.

Most integrated library automation software takes care of the process of defining metadata. The metadata are entered in fields. This software indexes all the fields according to the requirements of users and system administrators.
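The anthology example can be written down as a small data structure. The sketch below assumes that a meta-object is nothing more than a digital object carrying a list of handles; the class names, field names, handles and titles are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str
    title: str


@dataclass
class MetaObject:
    handle: str
    title: str
    # Handles of the digital objects this meta-object refers to.
    members: list = field(default_factory=list)


def expand(meta_object, store):
    """Return the titles of the objects listed by a meta-object, in order."""
    return [store[handle].title for handle in meta_object.members]


# Hypothetical store keyed by handle.
store = {
    "demo/poem-1": DigitalObject("demo/poem-1", "First Poem"),
    "demo/poem-2": DigitalObject("demo/poem-2", "Second Poem"),
}
anthology = MetaObject(
    "demo/anthology", "Collected Poems", ["demo/poem-1", "demo/poem-2"]
)
titles = expand(anthology, store)
```

The meta-object holds no poem text itself; it only records which digital objects belong together, so the same poem could be listed by several anthologies.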

Figure 9.2: Notional level

9.5.2 Operational System

At the operational architecture level, what matters is how information flow is managed through the system's components. A digital library will be a collection of disparate systems and resources connected through a network and integrated within one interface, most likely a Web interface or one of its descendants. Although these resources may reside on different systems and in different databases, they should appear to the users of a particular community as one single system. So, for both contemporaneous and retrospective search and retrieval of information, the digital library service must provide information interoperability in middleware. For this, some common standards will be needed to facilitate cross-domain search and retrieval.
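The idea of several disparate resources appearing as one system can be sketched with a small middleware layer. Everything here is assumed for illustration: the two sources, their differing field names and the common record layout (title, creator, source) standing in for an agreed standard. A real deployment would use network protocols rather than in-memory lists.

```python
# Two hypothetical sources with different record layouts.
LIBRARY_CATALOG = [
    {"Title": "Digital Libraries", "Author": "Arms"},
    {"Title": "Multimedia Retrieval", "Author": "Blanken"},
]
THESIS_REPOSITORY = [
    {"name": "A Digital Library for Theses", "student": "Rao"},
]


def search_catalog(term):
    """Adapter: search the catalog and map hits to the common record layout."""
    return [
        {"title": rec["Title"], "creator": rec["Author"], "source": "catalog"}
        for rec in LIBRARY_CATALOG
        if term.lower() in rec["Title"].lower()
    ]


def search_theses(term):
    """Adapter: search the thesis repository and map hits to the common layout."""
    return [
        {"title": rec["name"], "creator": rec["student"], "source": "theses"}
        for rec in THESIS_REPOSITORY
        if term.lower() in rec["name"].lower()
    ]


def federated_search(term, adapters):
    """Middleware: query every source and merge the results into one list."""
    results = []
    for adapter in adapters:
        results.extend(adapter(term))
    return sorted(results, key=lambda rec: rec["title"])


hits = federated_search("digital", [search_catalog, search_theses])
hit_titles = [rec["title"] for rec in hits]
```

The user issues one query and receives one merged list; the adapters hide the fact that each source names its fields differently.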

Figure 9.3: Operational level

9.5.3 Technical System

At the technical level we have to think about the functional components. Metadata is added to the digital library to provide information about the content. Metadata and data must therefore be bound together logically, and there must be a robust underlying technology to manage this logical connection through time, across platforms, and over geographical separations, all on a networked, distributed system. The technical architecture describes the major functional areas that, taken together, provide the components necessary to build robust, scalable and interoperable digital library applications and services with the resulting digital objects.

Functional Components:

  • Hardware (servers, client PCs, modems, storage devices, book scanner, CD/DVD writers, digital camera, video digitizer, UPS backup etc.)
  • Software (OCR, Linux/Solaris, MS Windows (Windows NT, Windows 95, Windows 98 etc.), ORACLE, publishing software, search engines etc.)
  • Digital Resources (CDs, e-journals, scientific and technical journals such as those of IEEE, ACM, ACS etc.)
  • Conversion of Materials to digital format with proper licensing agreements
  • High Speed Internet connectivity to broadband backbone
  • Miscellaneous expenditure

Figure 9.4: Technical Architecture

9.5.4 System Architecture

The system architecture is rationalized relative to the operational and technical architectures. It is desirable that system properties such as scalability and extensibility be taken into account at the system architecture level. At this level the whole digital library system is kept in mind. It can be said that the DL is a centralized subsystem that interacts with a variety of data producers and customers within a complex distributed system.

Figure 9.5: System Architecture

9.6 Design Considerations of a Digital Library in a University Environment

Libraries form a vital part of the world's system of education and information storage and retrieval. Through books, films, recordings and other media, they make available knowledge that has been accumulated through the ages. People in all walks of life, including students, teachers, business executives, government officials, scholars and scientists, use library resources for their research. Large numbers of people also turn to libraries to satisfy a desire for knowledge or to obtain material for some kind of leisure-time activity. In addition, many people enjoy book discussions and other activities provided by their libraries. [41]

9.6.1 Goal

  • To let both students and faculty access the digital library from anywhere on campus, as long as they are connected to the network.
  • To provide a very user-friendly interface, giving clickable access to its resources.
  • To give easy access to theses and dissertations.
  • To make it easy to locate both digitized and physical versions of scholarly articles and books.
  • To optimize searches by simultaneously searching the Internet, commercial databases and library collections.
  • To give efficient access to worldwide information directly from the desktops of faculty and research scholars.

9.6.2 Methods of digitization

Methods to be adopted depend on the demand the library faces from the users. This is a critical area where a number of decisions need to be taken involving several issues. The methods include:

i. Selection of material i.e. which documents should be digitized

ii. Choice of Technology i.e. which medium and what kind of equipment should be used

iii. How should the information be represented and how will it be stored

iv. What will be the access mechanism to make a document available

The first step is concerned with the printed material, which is already available in the library. The transformation of a traditional document into an electronic document by means of digitizing must always take account of storage and accessibility. Before a title is selected for digitizing it must be determined what advantages the process will bring to the library users.

Digitizing by scanning can mean a reduction of information, so the high quality of the image must be assured. Various coding systems can be employed for the texts, mainly ASCII. The technical requirements for making electronic documents available have an impact on investment in equipment and training of personnel. Even though electronic access means greater efficiency and wider geographic range, costs should not be underestimated. The most critical problem is copyright. It may be a good idea to involve the publishers in the project planning at an early stage. [34],[38]

By archiving printed publications and making them available in electronic format, libraries can start building a digital library, in addition to acquiring material that is already in digital format. This will help in building a technical infrastructure for the information society, which still depends largely on print materials. Accessibility and ease of use can be improved substantially through the regulation of storage and access processes.

9.6.3 Key Components of the proposed Digital Library

  • Initial conversion of content from physical to digital form e.g. Electronic Theses and Dissertations (ETDs) - Digital conversion of Dissertations and Theses and providing access to full text (PDF as well as HTML formats) as well as browsing facility for the same.
  • Storage of digital content in an appropriate repository system.
  • 2 Mbps high-speed leased line from an Internet Service Provider.
  • Client services for the browser including repository querying.
  • Delivery of digital content through file transfer or streaming media.
  • Private/Public network.
  • User’s access through a browser.

Figure 9.6: Proposed Digital Library Design

Chapter X


We saw how multimedia data can be stored using advanced data structures and how metadata can make the search and retrieval process easier. This report presented an approach for storing images using quad-trees so that they can subsequently be searched. These data structures help us to extract features from data such as images and video so that we can perform content-based queries.

We also discussed the advantages and disadvantages of these structures. This report also conveyed the need for metadata, types of metadata and metadata in the lifecycle of multimedia objects. It presented an in-depth study of image, video and audio databases discussing the compression techniques and standards and the common file formats for these multimedia data types.

This thesis throws light on the need, the types and the structure of multimedia databases and discusses their architecture. Digital Libraries have also been discussed. The basic principles involved in their design have been studied. In the end, a design for a digital library in a University environment has been proposed.

Future Scope

In the present day, the use of multimedia data in various fields has been growing phenomenally, be it image or audio databases for the police to catch criminals, image databases for the medical world, or video database applications where people can stream videos. Multimedia databases are vital for the effective use and efficient management of huge amounts of data. The variety of applications using multimedia data, the rapidly changing technology, and the intrinsic complexities in semantic representation, comparison and interpretation for similarity pose many challenges. Multimedia databases are still in their formative years. Today's multimedia databases are closely bound to narrow application areas. The experience gained from developing and using new multimedia applications will help advance multimedia database technology.

Digital Libraries are the future. The digital libraries of the future will no longer be merely the digital equivalent of physical libraries. They will give access to a huge variety of multimedia documents created by integrating content from sources that range from text, images and audio-video repositories to scientific data archives, databases and program repositories.

The digital library will provide a faultless environment in which cooperative access to, and filtering, manipulation, generation and preservation of, these documents will be supported. Either by themselves or in collaboration with other users, the users of the library will be both consumers and producers of information. Policy-enforcing mechanisms will ensure that the information produced is visible only to the people who have the appropriate rights to see it. The realization of this new vision requires new technologies, and research on them is going on at a large scale.



Moving Picture Experts Group or MPEG is an ISO/IEC working group whose job is to develop audio and video encoding standards. As of now, four MPEG standards are being used and one is under development. Every standard has been designed for a specific bit rate and application. [21], [22], [24]


MPEG-1 is the standard for compression of audio and moving pictures, designed for bit rates of up to 1.5 Mbit/sec. Transmitted as .mpg files, this standard is very popular over the internet for videos. Furthermore, MP3 (MPEG-1 Audio Layer III) is an immensely popular standard for audio compression. MPEG-1 is also the compression standard for Video CD, the most popular video distribution format across Asia.


DVD Compression and Digital Television set top boxes are based on MPEG-2 standard. It has been designed to handle between 1.5 and 15 Mbit/sec. Though based on MPEG-1, its design is for the compression and transmission of digital broadcast television. The most important improvement from MPEG-1 is its ability to proficiently compress interlaced video. MPEG-2 scales well to HDTV bit rates and resolution, bypassing the need for an MPEG-3.


MPEG-4 is the standard for multimedia and Web compression. It is based on object-based compression, analogous in nature to VRML (Virtual Reality Modeling Language). In order to create an MPEG-4 file, individual objects in a scene are tracked and compressed together. This results in compression that is very efficient and scalable from low bit rates to very high ones. It also gives developers independent control of objects in a scene, and hence introduces interactivity.


MPEG-7 is known as Multimedia Content Description Interface. It makes available a framework for multimedia content that includes information on filtering, content manipulation and personalization, as well as the security and integrity of the content. Differing from the previous MPEG standards, which describe actual content, MPEG-7 gives information about the content.


MPEG-21 is also known as the Multimedia Framework. Work on it is still going on. MPEG-21 will describe the elements required to build an infrastructure for the consumption and delivery of multimedia content, and how they will relate to each other.


[1] V.S. Subrahmanian, Principles of Multimedia Database System, Morgan Kaufmann Publishers, 1998.

[2] Sherry Marcus, V.S. Subrahmanian, Foundation of Multimedia Database System, May 1996, Vol. 43, Journal of ACM.

[3] Peter van Oosterom, Geographical Information Systems: Principles, Techniques, Applications and Management, Wiley 1999, Vol. 1, pp. 385-400.

[4] Encyclopedia, [Online]. Available:

[5] Antonin Guttman, R-trees: A Dynamic Index Structure for Spatial Searching, Proceedings, ACM SIGMOD Conference, pp. 47-57, Boston, MA, 1984.

[6] Jane Greenberg, Metadata Extraction and Harvesting: A Comparison of Two Automatic Metadata Generation Applications, Journal of Internet Cataloging, 6(4): 59-82.

[7] “Picture Representation Using Quad-Trees”, [Online]. Available:

[8] Henk M. Blanken, Arjen P. de Vries, Henk Earnst Blok and Ling Feng, Multimedia Retrieval, Springer 2007.

[9] “Audio Data”, [Online]. Available:

[10] Alistair Moffat, Justin Zobel, Neil Sharman, Text Compression for Dynamic Databases, March 1994.

[11] “Common Image File Formats”, [Online]. Available:

[12] “Digital Imaging Tutorial”, [Online]. Available:

[13] Ester, Michael. Digital Image Collections: Issues and Practice, Washington D.C. Commission on Preservation and Access, 1996.

[14] Adrian Brown, Image Compression, The National Archives, July 2003, DPGN-05.

[15] “Huffman Encoding – Image Compression”, [Online]. Available:

[16] Michael S. Lew, Nicu Sebe, Chabane Djeraba, Ramesh Jain, Content-based Multimedia Information Retrieval: State of the Art and Challenges, ACM Transactions on Multimedia Computing, Communications, and Applications, Feb. 2006.

[17] Abby A. Goodrum, Image Information Retrieval: An Overview Of Current Research, Drexel University, Volume 3 No. 2, 2000.

[18] Yong Rui, Thomas S. Huang, Shih-Fu Chang, Image Retrieval: Current Techniques, Promising Directions, and Open Issues, Journal of Visual Communication and Image Representation 10, 39–62, 1999.

[19] “Digital Imaging Best Practices”, Western States Digital Standards Group, January 2003.

[20] Chia-Hung Wei, Chang-Tsun Li, Design of Content-based Multimedia Retrieval, Department of Computer Science, University of Warwick.

[21] “Moving Picture Experts Group”, [Online]. Available:

[22] “The MPEG Homepage”, [Online]. Available:

[23] Meinard Muller, Information Retrieval for Music and Motion, Springer, 2007.

[24] “Video Compression Tutorial”, [Online]. Available:

[25] “Audio Compression Techniques” by Rusty Nejdl, [Online]. Available:

[26] Colin R. Buchanan, Semantic-based Audio Recognition and Retrieval, 2005.

[27] “Video Retrieval and Summarization”, [Online]. Available:

[28] Volker Roth, Content-based Retrieval from Digital Video, Image and Vision Computing Journal, 1999.

[29] “Multimedia Databases: Issues and Advances”, [Online]. Available:

[30] G.W.Hiddink, Educational Multimedia Databases, Twente University Press, 2001.

[31] Samir Kumar Jalal, Multimedia Database: Content and Structure, DRTC Bangalore, 2001.

[32] “An Architecture for Information in Digital Libraries”, [Online]. Available:

[33] “Chapter 4: Digital Library Architecture”, [Online]. Available:

[34] William Arms, Digital Libraries, MIT Press, 2000.

[35] “Key Concepts in the Architecture of the Digital Library”, [Online]. Available:

[36] Richa Pandey, Digital Library Architecture, DRTC Workshop on Digital Libraries, March 2003.

[37] Vamshi Ambati, N.Balakrishnan, Raj Reddy, Lakshmi Pratha, and C.V. Jawahar, The Digital Library Of India Project: Process, Policies and architecture.

[38] “University of California Digital Library”, [Online]. Available:

[39] “The Stony Brook Algorithm Repository”, [Online]. Available:

[40] “Fast Hierarchical Methods for the N-body Problem”, [Online]. Available:

[41] M.M.Koganuramath, Mallikarjun Angadi, Design and Development of Digital Library: an initiative at TISS.