Distributed Multimedia Database Systems



The success of distributed multimedia database systems and their related applications relies heavily on the integration of different technologies. One of the most important issues is the network infrastructure. The current Internet was not designed for real-time multimedia information retrieval and communication. It offers only best-effort delivery: applications running on the current Internet take whatever resources happen to be available.

Priority is not differentiated, so this type of infrastructure cannot guarantee quality of service (QoS) for multimedia communication. Another problem of the current Internet is the IP address space and its limited capacity for embedding extra information, for example for security purposes. The newly proposed IPv6 is expected to replace IPv4. Hopefully, with the broader bandwidth and the new protocols of Internet2, multimedia communication will become more realistic in the near future. Even so, distributed multimedia databases need other enhancements to current technology, and some of these issues are potential research topics.

Multimedia database techniques

Content-based multimedia information retrieval is one of the research areas contributing to the success of multimedia databases. Unlike traditional database systems, which focus on numerical and keyword search, multimedia information needs visual query methods and tools to retrieve useful information from video clips, pictures, and audio clips. We summarize some of the challenges in this section.

Video database

The challenges of video databases include data compression, user-video interaction, segmentation, object extraction, and the clustering and indexing of video data. Most of these subjects have been investigated (Flickner et al., 1995), but problems remain to be solved. A video clip consists of a sequence of frames, and the frames serve different purposes: some store the entire screen layout, while others store only the differences between frames. Video clips can be compressed with standards such as MPEG. The general compression strategy is to estimate how much space can be saved in either the spatial domain or the temporal domain. Compression not only saves storage and increases transmission efficiency, but also affects the efficiency and accuracy of video information retrieval.
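The distinction between frames that store the entire layout and frames that store only differences can be illustrated with a toy sketch. This is not the MPEG algorithm; it is a minimal illustration of temporal redundancy, assuming each frame is a flat list of pixel values of equal length. The names `encode` and `decode` are hypothetical.

```python
def encode(frames):
    """Store the first frame whole and each later frame as its changed pixels only."""
    if not frames:
        return []
    encoded = [("key", list(frames[0]))]
    for previous, current in zip(frames, frames[1:]):
        # Keep only the positions whose value differs from the previous frame.
        delta = {i: c for i, (p, c) in enumerate(zip(previous, current)) if p != c}
        encoded.append(("delta", delta))
    return encoded


def decode(encoded):
    """Rebuild every frame by applying each delta to the frame before it."""
    frames = []
    for kind, payload in encoded:
        if kind == "key":
            frames.append(list(payload))
        else:
            frame = list(frames[-1])
            for index, value in payload.items():
                frame[index] = value
            frames.append(frame)
    return frames


clip = [[0, 0, 0, 0], [0, 5, 0, 0], [0, 5, 0, 7]]
packed = encode(clip)
print(packed[1])               # ('delta', {1: 5})
print(decode(packed) == clip)  # True
```

Because a difference frame can only be decoded after the frames it depends on, random access into such a stream has to start from a full frame, which is one reason compression affects retrieval efficiency.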

One of the most interesting recent subjects is the technology for dividing a video clip into a sequence of shots, each of which is composed of a number of video frames. A sequence of shots may in turn compose a scene. Shot and scene detection are useful because they allow a video clip to be summarized, so that a videotape can be shortened to a sequence of representative scenes. Viewers can then retrieve a portion of the clip by browsing the video summary efficiently. Video summarization is difficult because shot and scene detection are not easy. Several solutions have nevertheless separated video shots successfully. Detecting the boundary of a video scene sometimes involves semantic processing as well as human perception. In spite of this difficulty, scene detection solutions are also available, although their accuracy varies. Another issue is the level and granularity of summarization. A videotape can be summarized into one minute, five minutes, or even twenty minutes. Some researchers propose a hierarchical organization strategy that allows a video summary to be extracted according to different length requirements. A common video summary hierarchy includes the following levels: frame, shot, scene, and episode. The higher the level of abstraction, the more difficult precise summarization becomes, because of the degree of semantics involved. Video segmentation and summarization therefore remain challenging research topics.
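One common way to approach shot detection is to compare the gray-level histograms of consecutive frames and to report a boundary wherever the difference exceeds a threshold. The sketch below illustrates that idea only; it is not the method of any system cited here. It assumes each frame is a non-empty list of gray values in the range 0 to 255, and the function names, bin count, and threshold are hypothetical choices.

```python
def gray_histogram(frame, bins=4, max_value=256):
    """Normalized histogram of gray values; assumes a non-empty frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [count / len(frame) for count in counts]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, threshold=0.5):
    """Return the indices of frames that start a new shot."""
    histograms = [gray_histogram(frame) for frame in frames]
    boundaries = []
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    return boundaries


dark = [10, 20, 30, 40]
bright = [200, 210, 220, 250]
print(detect_shot_boundaries([dark, dark, bright, bright, dark]))  # [2, 4]
```

A fixed threshold handles abrupt cuts like the ones in this example; gradual transitions such as fades and dissolves, and scene boundaries that depend on semantics, need more than a frame-to-frame comparison.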

The purpose of video summarization and shot detection is to provide the user with a convenient interface for information browsing. However, the mechanism still relies on a human to look through each summary. One of the most interesting research directions is to extract objects from video. For instance, MPEG-4 is able to separate objects from the video background. However, object extraction is very difficult because the process requires sophisticated image processing techniques. Detecting the boundary of objects is hard, especially when the video background is complicated or the noise level is high. If objects can be extracted precisely from video, it becomes possible to search video clips for a particular object (e.g., a tree, a card, or a person). This automatic search involves another level of difficulty; image content-based retrieval is discussed later.

Another interesting but tough issue is video interaction. Current technology promises users the ability to select movies over the network (so-called video-on-demand, or VoD). VoD solutions are available, with some limitations on the number of simultaneous users or on the underlying network infrastructure. The difficulties of VoD technology include buffering techniques and progressive transmission methods that keep congestion on the video network low, as well as mechanisms that allow user interaction (such as slow motion and fast forward). This type of interaction is limited compared to interactive TV, which aims to give users some degree of control over the content. For instance, users can select the actors, the panorama, or even the scenario. Interactive TV is still challenging, and intelligent mechanisms are required to realize such a system.

A video database contains a large amount of data. Traditional database techniques, such as B-tree-like indexing and data clustering, may not be suitable for efficient clustering and indexing of video storage. New clustering and indexing methods should take compression standards and interaction requirements into account. Moreover, it is possible to develop video operating systems based on the requirements of random access to continuous data. Traditional operating systems are designed for programs and data accessed in relatively small volumes (e.g., paging in and out of disk memory). Continuous media, especially video, should be treated differently, and new operating system techniques should be developed.

Picture and image database

A multimedia database is different from a traditional database, which mainly stores numerical or alphabetical records. A traditional database query compares numerical values or matches strings, operations that current computer architectures compute efficiently. An image database, however, has no equally effective feature, such as a numerical value, on which to base a query. Imagine that a user wants to select the ten most suitable pictures to represent a birthday party. How does the user express the query? Content-based image retrieval is an interesting but challenging research topic. Some of the existing solutions (Bach et al., 1996) use lower-level features to compare images against the user query. For instance, a color histogram of a picture is calculated, and the dominant colors of the picture are compared with those in the user query. Others compare texture features, such as the signature of textures in the frequency domain, including granularity and directional distribution. Color histograms and texture features are the initial approach to content-based image retrieval.
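The color histogram comparison can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited systems: images are taken to be non-empty lists of (r, g, b) tuples with channel values from 0 to 255, each channel is quantized into a small number of levels, and histogram intersection is used as the similarity measure. All function names are hypothetical.

```python
def color_histogram(pixels, levels=2):
    """Normalized histogram over levels**3 quantized RGB bins."""
    bins = [0] * (levels ** 3)
    for r, g, b in pixels:
        # Quantize each channel to 0..levels-1 and combine into one bin index.
        index = ((r * levels // 256) * levels + (g * levels // 256)) * levels + (b * levels // 256)
        bins[index] += 1
    return [count / len(pixels) for count in bins]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_images(query_pixels, database):
    """Return image names ordered from most to least similar to the query."""
    query = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query, color_histogram(pixels)), name)
        for name, pixels in database.items()
    ]
    return [name for score, name in sorted(scored, reverse=True)]


red, blue = (255, 0, 0), (0, 0, 255)
database = {
    "sunset": [red, red, red, blue],
    "ocean": [blue, blue, blue, red],
}
print(rank_images([red, red, red, red], database))  # ['sunset', 'ocean']
```

The histogram discards where the colors appear in the picture, so two images with the same color proportions but entirely different content receive the same score.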

However, a picture of a birthday party cannot be specified effectively using color and texture properties alone. Perhaps a birthday cake would be the most suitable object to search for in this case. The effective search for a birthday party picture then depends on the search for a cake, which can be represented as an object of a certain shape (e.g., a cake with candles). Comparing shape features between two objects is difficult, and no definitively effective solution has yet been found in the literature.

Searching on objects also involves another level of challenge. If the query involves more than one object, the spatial features of the objects should be considered. Some solutions (Keh, Shih, and Chang, 1999) analyze the spatial-temporal relations among objects. The relation of objects in a picture can be computed as a value (or a compound value), and the similarity between two relations can then be calculated.
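To make the idea of computing and comparing spatial relations concrete, here is a simplified sketch. It is not the representation used by Keh, Shih, and Chang; it is a stand-in that assumes the objects have already been identified, labeled, and given bounding boxes of the form (x_min, y_min, x_max, y_max). Each pair of objects receives a compound value (one relation per axis), and two pictures are compared on the pairs of labels they share. All names are hypothetical.

```python
from itertools import combinations


def axis_relation(a_min, a_max, b_min, b_max):
    """Relation of interval a to interval b along one axis."""
    if a_max < b_min:
        return "before"
    if b_max < a_min:
        return "after"
    return "overlaps"


def spatial_relation(box_a, box_b):
    """Compound value: (horizontal relation, vertical relation) of a to b."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    return (axis_relation(ax0, ax1, bx0, bx1), axis_relation(ay0, ay1, by0, by1))


def relation_table(objects):
    """Relations for every pair of labeled objects, in a fixed label order."""
    return {
        (a, b): spatial_relation(objects[a], objects[b])
        for a, b in combinations(sorted(objects), 2)
    }


def spatial_similarity(objects_1, objects_2):
    """Fraction of per-axis relations that agree over the shared object pairs."""
    table_1, table_2 = relation_table(objects_1), relation_table(objects_2)
    shared = set(table_1) & set(table_2)
    if not shared:
        return 0.0
    agreement = sum(
        sum(r1 == r2 for r1, r2 in zip(table_1[pair], table_2[pair])) / 2
        for pair in shared
    )
    return agreement / len(shared)


party_a = {"cake": (40, 60, 60, 80), "person": (0, 0, 30, 90), "balloon": (70, 0, 90, 20)}
party_b = {"cake": (50, 50, 70, 70), "person": (5, 5, 35, 95), "balloon": (80, 5, 95, 25)}
mirrored = {"cake": (40, 60, 60, 80), "person": (70, 0, 100, 90)}
print(spatial_similarity(party_a, party_b))   # 1.0
print(spatial_similarity(party_a, mirrored))  # 0.5
```

The sketch depends entirely on the object labels and boxes being supplied; obtaining them automatically is a separate and harder problem.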

Therefore, the similarity between two pictures can be assessed. The computation of spatial relations thus offers a feasible solution. However, there are difficulties to be solved. For instance, if a fully automatic system is the goal, identifying the main objects in the two pictures, so that corresponding objects can be compared using their spatial relations, becomes very difficult, especially if the comparison of object shapes is not precise.

In summary, current research on image content-based retrieval involves the comparison of the following lower-level features: color histogram, texture, shape, and spatial relations. Some systems allow users to decide the percentage of importance given to each feature. The factors of the lower-level features are combined using some sort of information fusion technique, and a final factor is used to make the comparison among objects. However, information fusion is hard to handle, as most approaches do not tell users what to select (e.g., the percentage of the color feature against the shape feature). The fundamental difficulty is that there is a huge gap between human perception and the representation of these lower-level features. Identifying this gap and finding methods to narrow it will be the next step of research.

Another direction of research in content-based information retrieval is to compute relevance feedback from the users. Basically, the user trains the system via automatic or semi-automatic feedback bookkeeping methods. These relevance factors are analyzed, using either statistical methods or a neural network, to provide a reference when the same user wants to select a similar picture the next time. Relevance feedback seems to solve part of the gap problem. However, the methods work only to some extent. For instance, different users may have different perceptions. Also, relevance feedback is hard to quantify; that is, the impact of the lower-level features in the fusion process is hard to decide.

In conclusion, the following issues still remain to be solved: an effective visual query language; a precise definition of image features that meets the needs of human perception; reasonable feature fusion methods; and relevance feedback techniques that do not depend on the individual user.
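The fusion of lower-level features and the role of relevance feedback can be sketched together. The example below assumes that each candidate image already has a similarity score in [0, 1] for every feature, that fusion is a weighted average, and that feedback shifts the weights toward features that scored high on images the user marked as relevant. This is one simple statistical scheme chosen for illustration; the function names and the `rate` parameter are hypothetical.

```python
def fuse(similarities, weights):
    """Weighted average of per-feature similarity scores."""
    total = sum(weights.values())
    return sum(weights[f] * similarities[f] for f in weights) / total


def update_weights(weights, relevant, rate=0.5):
    """Move each weight toward the feature's mean score on relevant images."""
    if not relevant:
        return dict(weights)
    updated = {}
    for feature, weight in weights.items():
        mean_score = sum(scores[feature] for scores in relevant) / len(relevant)
        updated[feature] = (1 - rate) * weight + rate * mean_score
    total = sum(updated.values())
    # Renormalize so the weights again sum to 1.
    return {feature: weight / total for feature, weight in updated.items()}


weights = {"color": 0.25, "texture": 0.25, "shape": 0.25, "spatial": 0.25}
candidate = {"color": 0.8, "texture": 0.4, "shape": 0.2, "spatial": 0.6}
print(fuse(candidate, weights))  # approximately 0.5

# The user marks one image as relevant; it matched mostly on shape.
feedback = [{"color": 0.1, "texture": 0.1, "shape": 0.9, "spatial": 0.5}]
weights = update_weights(weights, feedback)
print(max(weights, key=weights.get))  # shape
```

The weights learned this way belong to one user's feedback, which is the user dependence noted above, and the choice of `rate` is itself an unquantified decision.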

Image content-based retrieval can be extended to search for objects or scenes in video. In addition, image database search is useful in its own right; applications include photo album databases, digital art museums, medical image databases, and others.

Audio database

An audio database may contain speech, music recordings (including digitized music recordings and MIDI records), or a combination of these. Current technology is able to separate music from speech efficiently. However, there are quite a lot of challenging issues to be solved. Some research projects provide an interface that lets users search for a song by humming it, and the system is able to find the song with very high precision. This is one of the most successful examples of automatic audio retrieval techniques. The techniques involve methods to convert the humming either to some internal wave representation or to the standard MIDI form, and a fuzzy comparison of these representations against the records in the audio database. Research issues also include performance, especially if the number of music records is large. Some of the successful applications are used in karaoke. However, there are other challenging issues. Speech and voiceprint recognition are difficult. The challenge includes not only advanced signal processing methods but also the understanding of speech semantics. Telephone companies have built automated systems that help users; however, a precise method for identifying voiceprints is still not known.
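The fuzzy comparison between a hummed query and stored music records can be sketched as follows. The example assumes that pitch tracking has already converted both the humming and the stored melodies into sequences of MIDI note numbers. It reduces each sequence to a melodic contour (up, down, repeat), so that the key in which the user hums does not matter, and uses edit distance as the fuzzy match. This is one simple technique chosen for illustration, and all function names are hypothetical.

```python
def contour(pitches, tolerance=0):
    """Reduce a pitch sequence to a string of U (up), D (down), R (repeat)."""
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        if current - previous > tolerance:
            symbols.append("U")
        elif previous - current > tolerance:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)


def edit_distance(a, b):
    """Levenshtein distance computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def find_song(hummed_pitches, songs):
    """Return the title whose contour is closest to the hummed contour."""
    query = contour(hummed_pitches)
    return min(songs, key=lambda title: edit_distance(query, contour(songs[title])))


songs = {
    "scale_up": [60, 62, 64, 65, 67],
    "scale_down": [67, 65, 64, 62, 60],
    "repeat": [60, 60, 60, 60, 60],
}
# Hummed an octave lower, with one note held instead of rising.
print(find_song([48, 50, 52, 52, 55], songs))  # scale_up
```

This linear scan compares the query against every record, so its cost grows with the size of the database; a large collection would need an index over the contours.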