Sound can be distinguished



Digital audio is the digitisation of sound. Sound can be described as a continuous wave that travels through air. The wave is formed by the pressure differences that exist at each point along it. Thus, a continuous sound wave can be measured by analysing the pressure at different points.
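As a rough illustration of digitisation, the sketch below measures a continuous wave at regular intervals and rounds each measurement to an integer. It assumes the wave is a pure sine tone and that measurements are taken over time at a fixed rate; the function name `sample_wave` and its parameters are hypothetical and chosen only for this example.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, duration_s, bit_depth=8):
    """Sample a sine-wave 'pressure' signal and quantise each sample to an integer."""
    levels = 2 ** (bit_depth - 1) - 1          # largest positive integer level
    count = int(sample_rate_hz * duration_s)   # number of samples to take
    samples = []
    for n in range(count):
        t = n / sample_rate_hz                 # time of the n-th sample
        pressure = math.sin(2 * math.pi * frequency_hz * t)  # continuous value in [-1, 1]
        samples.append(round(pressure * levels))             # quantised integer value
    return samples

# One cycle of a 1 Hz tone measured 8 times per second (assumed values).
digital = sample_wave(frequency_hz=1, sample_rate_hz=8, duration_s=1)
```

A higher sample rate or bit depth would follow the original wave more closely, at the cost of storing more data.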


An image is a two-dimensional picture that resembles, or has the same appearance as, the original object. Images can be captured by optical devices such as lenses, mirrors and cameras, or even by the human eye. Furthermore, other figures such as paintings, maps, graphs and charts can also be referred to as images.


Video is the construction of a sequence of still images to represent a scene in motion. Thus, it gives the perception of an object moving, turning and rotating. Furthermore, a video can integrate audio and image elements into a single file.


Text is the most basic representation object. 'abcABC12345' is an example of text. Plain text and rich text are the two distinct types of text. The former is free from any formatting. Therefore, plain text is accepted by almost all text editors. Compared to plain text, rich text is text that has been formatted. 'abcABC12345', when displayed with formatting such as bold or italics, is an example of rich text.


Animation is a simulated movement created by a series of still images or frames. Animation and video may seem similar; however, they can be distinguished. A video takes a continuous motion and breaks it into individual frames, whereas an animation combines independent pictures to form a continuous motion.


2.2 Layered Multimedia Data Model (LMDM)

A data modelling approach must encompass the spectrum of audio, video, text, graphics and animation. In addition, multimedia data models are based on the conceptual separation of multimedia data from their presentation. The Layered Multimedia Data Model is an approach that draws clear separations between user interface components, data management, Quality of Service (QoS), presentation management and other concerns. There are four essential features of the Layered Multimedia Data Model:-

  1. Hardware and OS-independent - Hardware-specific and operating-system-specific concerns are isolated in all of the layers. Therefore, the data model can be regarded as a general data model.
  2. Independent implementation - By breaking the model into different layers with specialised functionality, each layer can be implemented individually.
  3. Inter-usability - Each layer can be reused by combining a specific component in one layer with components from another layer. Therefore, reusability is promoted.
  4. Micro - Dividing a large model into smaller parts makes problems easier to track and solve.

The Layered Multimedia Data Model consists of four layers:-

  1. Data Definition Layer - The Data Definition Layer (DDL) provides the abstraction of multimedia objects required by the higher layers. A multimedia object can be actual multimedia data, a reference to multimedia data, or a collection of metadata describing specific multimedia data. Furthermore, this layer serves as an essential interface to any multimedia data or data-generating devices.
  2. Data Manipulation Layer - The Data Manipulation Layer (DML) provides services to bind multimedia objects together into an abstract temporal structure, namely, multimedia events. A multimedia event represents a collection of objects bound to an abstract clock. Moreover, the DML defines the synchronisation structure that will be interpreted by the Data Presentation Layer.
  3. Data Presentation Layer - The Data Presentation Layer (DPL) is responsible for describing how a multimedia event is to be communicated to the user. The multimedia presentation object may include information such as playback rate, spatial layout and direction.
  4. Control Layer - The Control Layer (CL) describes how a multimedia composition is constructed from several multimedia presentations. A multimedia composition is a complete package of multimedia information that is ready to use. In addition, the Control Layer describes the signals that a multimedia composition can accept from a user or from input/output devices, and the actions that a user can take.
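The four layers above can be loosely sketched as one small class per layer. This is only an illustration under assumptions: the class names, fields and the three accepted signals are hypothetical and are not taken from any published LMDM implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MultimediaObject:
    """Data Definition Layer: a multimedia object and its metadata."""
    name: str
    media_type: str
    metadata: dict = field(default_factory=dict)

@dataclass
class MultimediaEvent:
    """Data Manipulation Layer: objects bound to an abstract clock."""
    timeline: list = field(default_factory=list)  # (start tick, object) pairs

    def bind(self, start, obj):
        self.timeline.append((start, obj))
        self.timeline.sort(key=lambda pair: pair[0])  # keep clock order

@dataclass
class Presentation:
    """Data Presentation Layer: how an event is shown to the user."""
    event: MultimediaEvent
    playback_rate: float = 1.0

    def schedule(self):
        # Convert abstract clock ticks into playback times.
        return [(start / self.playback_rate, obj.name)
                for start, obj in self.event.timeline]

@dataclass
class Composition:
    """Control Layer: presentations plus the signals a user may send."""
    presentations: list = field(default_factory=list)
    accepted_signals: tuple = ("play", "pause", "stop")  # assumed signal set

    def handle(self, signal):
        return signal in self.accepted_signals

event = MultimediaEvent()
event.bind(4, MultimediaObject("clip", "video"))
event.bind(0, MultimediaObject("title", "text"))
show = Presentation(event, playback_rate=2.0)
package = Composition(presentations=[show])
```

Each class only refers to the layer directly beneath it, which mirrors the independent-implementation feature listed earlier.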

Chapter 3 Metadata: Data about Data

3.2 Introduction to metadata


Metadata is data that describes data in any form. The purpose of metadata is to clarify data and to assist in data searching so that accurate results can be achieved. Furthermore, metadata can be in the form of text, images or voice. Generally, metadata is considered to be descriptive information about the context, characteristics and attributes of data. Below are some examples of metadata attributes for different objects.

Object    Metadata attributes
Video     Title, Director, Actors, Length, Rating
Book      Title, Author, Date of Publication, Unique Identifier (ISBN no), Subject
Image     Date, Time, Resolution, Camera Settings, Format
Audio     Title, Artist, Format, Copyright, Composer, Genre, Length, Rating

3.3 Classification of metadata

Metadata can be classified into three different categories:-

Category Description


The content of the data itself can be adopted as metadata. The most important step is to determine whether the metadata describes an object or the content of an object. Name and title are examples of metadata attributes that describe an object. However, a sentence such as 'Cristiano Ronaldo scored a goal' describes the content of an object.


Metadata can be static (immutable) or dynamic (mutable). For example, a video title is immutable because it is fixed. However, the description of each scene in the video may change, which makes it mutable.

Logical Function

There are three layers of logical function: the subsymbolic, symbolic and logical layers. The bottom layer is the subsymbolic layer, which consists of raw data. The symbolic layer holds the metadata that is used to describe the raw data. Finally, the logical layer deals with logical reasoning about the use of the symbolic layer.

3.4 Advantage of Metadata

The advantage derived from applying metadata is that it facilitates the management, retrieval and usage of data by humans and computers. Furthermore, metadata is able to present the concept of a multimedia object to users. Thus, by understanding the concept of each object, the data management process can be simplified.

Besides that, metadata is able to improve data retrieval speed. Because metadata contains the keywords of every object, results can be produced quickly by comparing user inputs against the attributes of each object. This is an advantage compared to the traditional recurring search method, which requires a lot of time and resources to retrieve data and therefore hampers overall system performance.
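The idea of matching user input against metadata keywords can be sketched with a simple keyword index. The records, field names and function names below are hypothetical and exist only for illustration; a real system would likely rely on the indexing facilities of its database.

```python
from collections import defaultdict

# Hypothetical metadata records keyed by object identifier.
records = {
    1: {"title": "Annual Report", "author": "J. Tan", "format": "video"},
    2: {"title": "Safety Training", "author": "M. Lee", "format": "video"},
    3: {"title": "Safety Poster", "author": "J. Tan", "format": "image"},
}

def build_index(records):
    """Map every lower-cased metadata word to the ids of objects that carry it."""
    index = defaultdict(set)
    for object_id, metadata in records.items():
        for value in metadata.values():
            for word in str(value).lower().split():
                index[word].add(object_id)
    return index

def search(index, query):
    """Return ids of objects whose metadata contains every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(records)
matches = search(index, "safety video")
```

Once the index is built, a query only touches the entries for its own keywords instead of scanning every object.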

Other than that, object associations can be created by using metadata. For instance, the Skyline GTR is a car model offered by Nissan. If we search for all the car models offered by Nissan, the Skyline GTR will be included in the search result. Besides that, if we look further, the search result can link us to other manufacturers that produce Skyline GTR car parts. Therefore, navigation between objects is possible.

Chapter 4 Data Indexing Approaches

4.1 Introduction

A multimedia database can be very complex and sophisticated. Besides that, a multimedia database has to deal with many complicated issues, which include handling a large volume of data and maintaining data retrieval performance. Therefore, indexing approaches are proposed to facilitate fast access to the data stored in a multimedia database.

4.2 R-Tree


The R-tree is a data structure that is designed to index multi-dimensional information. It uses solid Minimum Bounding Rectangles (MBRs) as its page regions. An MBR is a multidimensional interval of the data space. Furthermore, MBRs are nested, and may overlap one another, to form the R-tree index structure. However, overlapping MBRs can deteriorate the overall search performance of a database.

From the figure, there are leaf nodes and non-leaf nodes. A leaf node is a node that is farthest from the root. In the figure shown above, the leaf nodes are the nodes on the third row, whereas the non-leaf nodes are the nodes on the second row. Leaf nodes and non-leaf nodes store different data. Each leaf node stores a way to identify the actual data element and the bounding box of that data element. On the other hand, each non-leaf node stores a way to identify a child node and a bounding box that encloses all entries of that child node.

Besides that, the R-tree structure allows the determination of the MINDIST, MAXDIST and MINMAXDIST measures, which will be discussed in a later chapter. Once these measures have been determined, the bounding box of each node can be used to decide whether to search inside that node.
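The role of the bounding box in deciding whether to search inside a node can be sketched in two dimensions. This is a simplified, single-level illustration rather than a full R-tree: each list of points stands in for one leaf node, and the function names are hypothetical.

```python
def mbr(points):
    """Minimum Bounding Rectangle of 2-D points as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True when rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def window_search(leaves, window):
    """Return the points inside the window and the number of leaves opened."""
    hits, visited = [], 0
    for points in leaves:
        if not intersects(mbr(points), window):
            continue  # the bounding box rules this leaf out without opening it
        visited += 1
        hits.extend(p for p in points
                    if window[0] <= p[0] <= window[2]
                    and window[1] <= p[1] <= window[3])
    return hits, visited

leaves = [[(1, 1), (2, 3)], [(8, 8), (9, 7)]]
hits, visited = window_search(leaves, (0, 0, 3, 3))
```

In this run only the first leaf is opened, because the bounding box of the second leaf lies entirely outside the search window.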

4.3 X-Tree


The X-tree is an index tree structure that is based on the R-tree. It is also used for storing multi-dimensional data. However, there is a significant difference between the X-tree and the R-tree. The X-tree emphasises the choice of split axis in order to prevent overlap between Minimum Bounding Rectangles (MBRs). As noted earlier, overlap can deteriorate the overall search performance. Thus, the X-tree delivers better performance than the R-tree when dimensionality is high.

Referring to the figure, the index structure starts with a single data page A that contains all items. If the page overflows, page A will be split into new pages A' and B. As we travel away from the root, each page will be split into new pages. As each data page splits, the split history is recorded. Therefore, a binary tree will be produced with split dimensions as non-leaf nodes and data pages as leaf nodes.

However, there is a special case, which is the creation of a supernode. Referring to the figure, let us assume that dimension 5 is chosen to be the split axis. A'' and E must then be inserted into one of the partitions. However, by inserting these two into the partition, the MBR of the partition will span almost the entire data space. Consequently, overlap is inevitable. As a result, dimension 2 is more suitable to be selected as the split axis. However, another circumstance must be considered. If dimension 2 is selected, an unbalanced directory would be created, as one node will be underfilled whereas another node will overflow. In the figure, the underfilled node is on the left, whereas the overflowing node is on the right. As a result, a supernode will be created to address this issue.

4.4 VP-Tree


The VP-tree is an index structure that utilises the ball-partitioning method. Ball partitioning refers to the division of data points based on a selected pivot (vantage point). There are several ways of selecting a pivot.

The simplest way of selecting a pivot is to select one at random. However, a more thorough search can yield better system performance. The figure above displays three methods of selecting a pivot. The difference among these three methods is the boundary of the ball (circle). It is noticeable that the ball in method (a) has the largest boundary, whereas the ball in method (c) has the smallest boundary. Furthermore, a larger boundary can improve search performance.

After the pivot is selected, a median must be determined. The VP-tree algorithm computes the distance from every data object to the pivot and takes the median of these distances as the ball radius. Then, the objects are divided equally into two subgroups. Of the two subgroups, one will be closer to the vantage point whereas the other will be farther from the vantage point. By applying these rules repeatedly, smaller sets can be obtained and eventually a binary tree will be created. For the VP-tree, the left and right subtrees are constructed from the subsets inside and outside the corresponding ball respectively.
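The construction steps described above can be sketched as a short recursive function. For the sake of a repeatable example, this sketch simply takes the first point as the pivot instead of choosing one at random, and it stores the tree as nested dictionaries; both choices, along with the function name, are assumptions made for illustration.

```python
import statistics

def build_vp_tree(points, distance):
    """Recursively build a VP-tree as nested dictionaries."""
    if not points:
        return None
    pivot, rest = points[0], points[1:]  # assumed pivot choice: first point
    if not rest:
        return {"pivot": pivot, "radius": 0.0, "inside": None, "outside": None}
    # The ball radius is the median distance from the pivot to the other points.
    radius = statistics.median([distance(pivot, p) for p in rest])
    inside = [p for p in rest if distance(pivot, p) <= radius]
    outside = [p for p in rest if distance(pivot, p) > radius]
    return {
        "pivot": pivot,
        "radius": radius,
        "inside": build_vp_tree(inside, distance),    # left subtree
        "outside": build_vp_tree(outside, distance),  # right subtree
    }

# One-dimensional example with absolute difference as the distance.
tree = build_vp_tree([0, 1, 2, 5, 9], lambda a, b: abs(a - b))
```

When several points lie exactly at the median distance, the two subsets may not be exactly equal in size, since this sketch places all such points inside the ball.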

Chapter 5 Queries and Search Algorithms

5.2 Range Query


A range query returns the set of values that lie between a lowest boundary and a highest boundary. Equivalently, it retrieves a result from the database S given a query object A and a distance E larger than 0. The result contains every database object B whose distance from A is no greater than E.

Therefore, let us assume that

Query object = A

Default distance = E (where E > 0)

And, Result = Find all database objects B where Distance(A, B) ≤ E

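An example of a range query scenario is to search for employees who have 3 to 5 years of working experience. In this scenario, the lowest boundary is 3 and the highest boundary is 5. In terms of a query object and a distance, this corresponds to A = 4 and E = 1, since every value from 3 to 5 lies within a distance of 1 from 4.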


However, it is very difficult to predict the size of the result. Sometimes, the result can be the whole database itself. In order to address this issue, a metric with a weighted dimension has to be included in the query. Consequently, the result returned will contain only a partial range instead of the full range. Thus, data retrieval can be performed faster.
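The range query defined in this section can be sketched as a simple filter. The employee names and years of experience are hypothetical, and the linear scan is used only for clarity; an index structure from Chapter 4 would normally avoid checking every object.

```python
def range_query(database, query, radius):
    """Return the names of all objects whose distance to the query is at most radius."""
    return [name for name, years in database.items()
            if abs(years - query) <= radius]

# Hypothetical employees and their years of working experience.
employees = {"Ann": 2, "Ben": 3, "Cal": 4.5, "Dee": 7}

# A query object of 4 with a distance of 1 covers 3 to 5 years.
result = range_query(employees, query=4, radius=1)
```

With a sufficiently large distance the filter returns every employee, which illustrates why the size of the result is hard to predict in advance.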

5.3 Nearest Neighbour Query


The nearest-neighbour query returns the database object B in database S that is closest to a given query object A, provided that S contains at least one object. Based on the scenario before, let us assume that the query object A is 3 and that there are 4 employees, each with their own years of working experience.

a) RKV Algorithm

- MINDIST and MAXDIST are the two essential distance functions defined. A page region is a container that contains objects. MINDIST refers to the smallest possible distance from the query object A to a point in a page region, whereas MAXDIST refers to the largest possible distance. In addition, a third term, MINMAXDIST, was introduced. MINMAXDIST is obtained by taking the smallest of the largest distances between the query object and each edge of a page region. Once MINMAXDIST has been retrieved, objects that have a MINDIST smaller than or equal to MINMAXDIST are added to the nearest-neighbour query result.

b) HS Algorithm

- The HS algorithm does not access page regions in an order induced by an index structure. Instead, the algorithm accesses pages in order of increasing distance from the query point of the query object. The HS algorithm starts by selecting the page region P that has the minimum distance from the query point. Then, the distances between the query point and the child pages contained in P are calculated and stored. The aim of the process is to find the closest point candidate (cpc). The processing is repeated until the closest point candidate is closer to the query point than the nearest remaining page region, or until all page regions have been examined, at which point the process stops.
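The distance functions and the distance-ordered page access described above can be sketched as follows. This is a simplified illustration under assumptions: pages form a flat list of two-dimensional rectangles rather than a hierarchy, MINMAXDIST is omitted, and all function names are hypothetical.

```python
import heapq
import math

def mindist(point, region):
    """Smallest distance from point to rectangle (xmin, ymin, xmax, ymax)."""
    dx = max(region[0] - point[0], 0, point[0] - region[2])
    dy = max(region[1] - point[1], 0, point[1] - region[3])
    return math.hypot(dx, dy)

def maxdist(point, region):
    """Largest distance from point to any point of the rectangle."""
    dx = max(abs(point[0] - region[0]), abs(point[0] - region[2]))
    dy = max(abs(point[1] - region[1]), abs(point[1] - region[3]))
    return math.hypot(dx, dy)

def nearest_neighbour(pages, query):
    """Visit pages in increasing MINDIST order; stop once no page can do better."""
    queue = [(mindist(query, region), i) for i, (region, _) in enumerate(pages)]
    heapq.heapify(queue)
    best, best_dist, visited = None, math.inf, 0
    while queue and queue[0][0] < best_dist:
        _, i = heapq.heappop(queue)
        visited += 1
        for p in pages[i][1]:
            d = math.dist(query, p)
            if d < best_dist:  # new closest point candidate
                best, best_dist = p, d
    return best, best_dist, visited

# Each page is (region, points stored in that region); the values are made up.
pages = [
    ((0, 0, 2, 2), [(1, 1), (2, 2)]),
    ((5, 5, 7, 7), [(5, 5), (6, 7)]),
    ((3, 0, 4, 1), [(3, 1)]),
]
best, best_dist, visited = nearest_neighbour(pages, (2.5, 1))
```

In this run the far page is never opened, because its MINDIST already exceeds the distance to the closest point candidate found in the two nearer pages.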

5.4 Ranking Query


The basic idea of a ranking query is to retrieve a list of objects in database S ordered by their distance from the query object A. The ranking query adopts the same algorithm as the nearest-neighbour query. In this case, the minimum distance between the query object and a database object is the metric that determines the position of that database object in the returned result.

Based on the scenario discussed earlier, let's assume that we are going to search for employees that possess 3 to 5 years of working experience.

The result will be ordered in this priority queue: {(B1, 1), (B4, 1.5), (B3, 2), (B2, 3)}

Therefore, database object B1 is considered to be the first nearest neighbour and ordered as the first object in ranking query result.

Similar to the problem faced by the range query, there is no way to determine the size of the result in advance. This is because the algorithm may rank all the objects of a database into the priority queue. Thus, a maximum distance value D can be introduced to terminate the query process when the distance between query object A and database object B exceeds D.
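A ranking query with the maximum distance D can be sketched with a priority queue. The years of experience assigned to B1 to B4 are hypothetical values chosen only so that, with a query object of 3, the distances match the priority queue listed above; the function name is also an assumption.

```python
import heapq

def ranking_query(database, query, max_distance=float("inf")):
    """Return (name, distance) pairs in increasing distance, stopping beyond max_distance."""
    queue = [(abs(years - query), name) for name, years in database.items()]
    heapq.heapify(queue)
    ranked = []
    while queue:
        dist, name = heapq.heappop(queue)
        if dist > max_distance:
            break  # every remaining object is at least this far away
        ranked.append((name, dist))
    return ranked

# Hypothetical years of working experience for the four employees.
employees = {"B1": 4, "B2": 6, "B3": 5, "B4": 4.5}

full_ranking = ranking_query(employees, query=3)
limited_ranking = ranking_query(employees, query=3, max_distance=2)
```

Setting D to 2 drops B2 from the result, since its distance of 3 exceeds the limit.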

Chapter 6 Critical Evaluation

6.1 Multimedia Impact on Workflow

A multimedia database is different from a conventional relational database. Words and text are the components that make up a conventional database. However, in addition to text, a multimedia database consists of audio, images, videos and animations. Thus, the data storage mechanism for a multimedia database will be different. Other than the usual characters and strings, the Binary Large Object (BLOB) will be introduced to store images, audio and other multimedia objects.
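Storing a multimedia object as a BLOB alongside ordinary text can be sketched with Python's built-in `sqlite3` module. The table name, column names and the placeholder bytes are hypothetical; SQLite is used here only because it needs no setup and is not necessarily the database that MMHRIS would use.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE worker (id INTEGER PRIMARY KEY, name TEXT, photo BLOB)"
)

# Placeholder bytes standing in for the contents of a real image file.
photo_bytes = bytes([0x89, 0x50, 0x4E, 0x47])

conn.execute(
    "INSERT INTO worker (name, photo) VALUES (?, ?)",
    ("Example Worker", photo_bytes),
)

# Text and binary data come back together in a single row.
name, photo = conn.execute(
    "SELECT name, photo FROM worker WHERE id = 1"
).fetchone()
conn.close()
```

The text column and the binary column are retrieved in one query, which hints at why each record becomes more expensive to fetch once large media are attached.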

Furthermore, the data storage mechanism can greatly influence the workflow of a system. This is because data retrieval will be burdened with more processes. For instance, for a single object such as a worker, the system must retrieve its textual information, pictures, audio and video elements. Compared to a conventional database, the cost of retrieving a single record is higher in a multimedia database. Thus, indexing and search techniques must be implemented on the database in order to reduce data retrieval cost whilst improving system performance.

Another impact is the size of each multimedia object. There must be a restriction on object size to ensure that database storage is optimised. Therefore, all multimedia elements, especially video, must be standardised to prevent them from consuming too much space. The correlation between bandwidth and multimedia objects is another issue that must be considered. The main goal of developing the MMHRIS is workforce management. However, the inclusion of multimedia objects in the MMHRIS may burden the system. This is because loading large multimedia objects over the Internet requires a good connection and a significant amount of time. Thus, fine-tuning the system, database and multimedia objects is essential in order to promote a smooth workflow.

6.2 Data Optimization for Optimum Performance

There is a plethora of index structures and search algorithms. However, not every available method is viable for our current system, the Multimedia Human Resource Information System (MMHRIS). A system will operate in the most efficient and cost-effective manner if its index structure adheres to its data requirements and vice versa. Therefore, in order to ensure that a multimedia system runs optimally, we must customise the index structure and search algorithms to meet its requirements.

In this scenario, we will be adopting the idea of the R-tree in our system design. As discussed, the R-tree allows the implementation of the MINDIST and MAXDIST measures, which are essential for introducing the nearest-neighbour query to our system. Even though the nearest-neighbour query can yield accurate results, there is another issue that we must be aware of: the size of the data space. There must be a restriction on result size to prevent the search query from becoming a bottleneck for system performance. Therefore, the range query has to be combined with the nearest-neighbour query to ensure fast and accurate data retrieval. After the range query is implemented, the cost of data retrieval will be reduced because we will not be retrieving a huge chunk of data. Thus, a lower cost translates into increased performance.

Other than that, to improve the effectiveness of search results, we will be implementing a word filter. The word filter will be applied to the description of every object. This means that redundant words such as articles ('a', 'an', 'the') and pronouns ('he', 'she', 'it') will be trimmed from each description.
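A word filter of this kind can be sketched in a few lines. The list of redundant words contains only the examples given above, and the function name and sample description are hypothetical; a production filter would likely use a longer word list.

```python
# Redundant words taken from the examples in the text.
REDUNDANT_WORDS = {"a", "an", "the", "he", "she", "it"}

def filter_words(description):
    """Lower-case the description, strip punctuation and drop redundant words."""
    cleaned = (word.strip(".,;:!?") for word in description.lower().split())
    return [word for word in cleaned if word and word not in REDUNDANT_WORDS]

keywords = filter_words("She is a senior engineer in the Kuala Lumpur office.")
```

The remaining words can then be stored as the searchable keywords for the object.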

In addition, object clustering will be introduced to the system. Clustering allows a single large group to be broken into smaller distinct groups. For our MMHRIS, the main object will be the worker. By dividing workers into different groups based on their traits and performance, we are able to search for data more quickly and more accurately.
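As a very simple stand-in for clustering, the sketch below groups workers by a single shared trait so that a search only has to look inside the relevant group. The worker records, the `department` field and the function name are hypothetical, and a real clustering method could group workers on several traits at once.

```python
from collections import defaultdict

def cluster_by(workers, key):
    """Group worker records by the value of one attribute."""
    groups = defaultdict(list)
    for worker in workers:
        groups[worker[key]].append(worker)
    return dict(groups)

# Hypothetical worker records.
workers = [
    {"name": "Ann", "department": "Sales", "years": 2},
    {"name": "Ben", "department": "IT", "years": 3},
    {"name": "Cal", "department": "Sales", "years": 5},
]

clusters = cluster_by(workers, "department")
sales_names = [worker["name"] for worker in clusters["Sales"]]
```

A query about sales staff now examines two records instead of the whole worker list.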