Efficient Invisible Watermarking In Audio Files Computer Science Essay

Abstract-This paper presents various watermarking techniques for audio signal authentication and copyright protection. It covers research reported by various authors in order to examine the robustness of watermarking systems. Digital watermarking is the process of embedding into a host signal a perceptually transparent digital signature that carries a message about the host signal in order to "mark" its ownership. The watermark is embedded with the use of a key by imposing unnoticeable changes on the original host multimedia signal.

Keywords- Steganography, Robust, MDCT, RIFF, .WAV files.


Audio is classified into two types: speech signals, used in telephony and video telephony, and music-quality audio, used in applications such as CD-on-demand and broadcast TV. Audio is stored in digital form in computer memory. The bandwidth of a speech signal is from 50 Hz to 10 kHz, whereas that of a music signal is from 15 Hz to 20 kHz. The sampling rate is therefore 20 ksps for speech and 40 ksps for music (2fmax, the Nyquist rate). The number of bits per sample chosen is 12 for speech and 16 for music.
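The sampling figures quoted here follow from the Nyquist criterion (a sampling rate of at least twice the highest frequency). As a quick check, the sketch below recomputes them together with the resulting uncompressed bit rate for a single channel. The function names are illustrative only and do not come from any library.

```python
def nyquist_rate(f_max_hz):
    """Minimum sampling rate (samples/s) for a signal band-limited to f_max_hz."""
    return 2 * f_max_hz


def pcm_bit_rate(sample_rate, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate * bits_per_sample * channels


speech_rate = nyquist_rate(10_000)            # 20 ksps for 10 kHz speech
music_rate = nyquist_rate(20_000)             # 40 ksps for 20 kHz music
speech_bps = pcm_bit_rate(speech_rate, 12)    # 12-bit speech samples
music_bps = pcm_bit_rate(music_rate, 16)      # 16-bit music samples, one channel
print(speech_rate, music_rate, speech_bps, music_bps)
```

Under these figures, uncompressed speech needs 240 kbit/s and one channel of music needs 640 kbit/s, which is why compression matters for storage.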

In practice, the sampling rate used is often lower than the figures above in order to reduce memory requirements. MPEG-1 Layer 3 (MP3) is a widely used standard for audio compression. Compression ratios of about 10:1 to 12:1 can be achieved, allowing more audio to be stored in a given space. Because of the popularity of MP3, many people search for music on the internet. Unauthorized persons may take advantage of this and copy others' work without paying for it. To avoid this problem, copyright protection is needed.

Digital watermarking has been proposed as a new, alternative method of copyright protection. Paper watermarks appeared in the craft of handmade papermaking hundreds of years ago. Watermarks were mainly used to identify the mill producing the paper and the paper's format, quality and strength. A paper watermark was an ideal way to eliminate confusion about which mill a paper came from and what its parameters were. Digital watermarking is the process of embedding into a host signal a perceptually transparent digital signature that carries a message about the host signal in order to "mark" its ownership. The digital signature is called the digital watermark. Digital watermarking is intended to prevent illegal trade and to protect the rights in digital media data. Although the watermark is perceptually transparent, its existence is indicated when the watermarked media is passed through an appropriate watermark detector.

A watermark, which usually consists of a binary data sequence, is inserted into the host signal in the watermark embedder. Thus, a watermark embedder has two inputs; one is the watermark message (usually accompanied by a secret key) and the other is the host signal (e.g. image, video clip, audio sequence etc.). The output of the watermark embedder is the watermarked signal, which cannot be perceptually discriminated from the host signal.
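To make the two inputs of the embedder concrete, here is a minimal sketch of an additive embedder in which the secret key seeds a pseudo-random ±1 sequence and each message bit sets the sign of that sequence over one segment of the host. This is a generic spread-spectrum style illustration, not a method taken from any of the cited papers; the names `pn_sequence` and `embed`, the `strength` value, and the synthetic 440 Hz host are all assumptions. A practical system would shape the added signal with a psychoacoustic model rather than use a fixed strength.

```python
import math
import random


def pn_sequence(key, length):
    """Key-seeded pseudo-random sequence of +1/-1 chips."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, key, strength=0.02):
    """Add strength * (+/-1) * PN chip to each host sample.

    The host is cut into len(bits) equal segments; each segment carries
    one bit as the sign of the PN sequence added to it.
    """
    if not bits:
        raise ValueError("bits must not be empty")
    chips_per_bit = len(host) // len(bits)
    pn = pn_sequence(key, chips_per_bit * len(bits))
    marked = list(host)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * chips_per_bit, (i + 1) * chips_per_bit):
            marked[j] += strength * sign * pn[j]
    return marked


# Synthetic host: one second of a 440 Hz tone at 8 kHz (stand-in for real audio).
host = [0.3 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
marked = embed(host, bits=[1, 0, 1, 1], key=1234)
max_change = max(abs(m - h) for m, h in zip(marked, host))
print(f"largest sample change: {max_change:.3f}")
```

Every sample moves by exactly `strength`, so the output has the same length as the host and differs from it only by a low-level noise-like signal.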

Because the human auditory system is more sensitive than the visual system, developing a well-performing audio watermarking technique is a difficult task. Embedding a watermark in an audio signal corresponds to adding noise to the host signal that is impossible to hear. Watermarking algorithms for MP3-compressed signals use a psychoacoustic model, which exploits a number of limitations of the human ear. The audio file is partitioned into frames of 90 milliseconds in duration. This frame size is chosen so that the embedded watermark is not audible in the audio file.
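The 90-millisecond framing step can be sketched as follows. The helper name `split_into_frames`, the 44.1 kHz sample rate, and the choice to drop a trailing partial frame are assumptions made for illustration; the text specifies only the frame duration.

```python
def split_into_frames(samples, sample_rate, frame_ms=90):
    """Cut samples into consecutive non-overlapping frames of frame_ms milliseconds.

    A trailing partial frame is dropped (an assumption; padding is another option).
    """
    frame_len = round(sample_rate * frame_ms / 1000)
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


one_second = [0.0] * 44100                 # silent placeholder for real audio
frames = split_into_frames(one_second, 44100)
print(len(frames), len(frames[0]))         # prints: 11 3969
```

At 44.1 kHz a 90 ms frame holds 3969 samples, so one second of audio yields 11 complete frames.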

According to this model, when multiple signals are present, a strong signal may reduce the ear's sensitivity to other signals that are close to it in frequency, an effect known as frequency masking. In addition, when the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound, an effect known as temporal masking. The watermarked signal is then usually recorded or broadcast and later presented to the watermark detector. The detector determines whether the watermark is present in the tested multimedia signal and, if so, what message is encoded in it. The research area of watermarking is closely related to the fields of information hiding and steganography.

Cryptography hides the content of a message from an attacker but not the existence of the message. Steganography and watermarking go further and hide the very existence of the message in the communication data. Consequently, the concept of breaking the system is different for cryptosystems and watermarking systems.

The watermark is embedded with the use of a key by imposing unnoticeable changes on the original host multimedia signal. The principal design challenge in embedding the watermark is ensuring that it reliably fulfills its intended task. For copy protection applications, the watermark must be recoverable even when the signal undergoes a reasonable level of distortion. The security of the system comes from the secrecy of the key. Without access to this information, the watermark cannot be extracted, effectively removed or forged. The key is used to extract the watermark from the possibly distorted watermarked signal. Unlike standard secret message systems used for encryption and authentication, digital watermarking does not restrict access to the information to prevent unlawful acts. Instead, it provides evidence of wrongdoing after it has taken place. This is similar to law enforcement authorities who investigate crimes only after unlawful events occur.
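The role of the key in extraction can be illustrated with a minimal correlation detector. This is a generic spread-spectrum style sketch under stated assumptions, not the scheme of any cited paper: the key seeds a ±1 pseudo-random sequence, each bit occupies one segment of the host, and the detector correlates each segment with the same key-derived sequence. The helper names, the synthetic 440 Hz host, and the `strength` value (deliberately large so the toy example decodes reliably without access to the host) are all hypothetical.

```python
import math
import random


def pn_sequence(key, length):
    """Key-seeded pseudo-random sequence of +1/-1 chips."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, key, strength):
    """Add a signed, key-dependent PN segment to the host for each bit."""
    n = len(host) // len(bits)
    pn = pn_sequence(key, n * len(bits))
    marked = list(host)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * n, (i + 1) * n):
            marked[j] += strength * sign * pn[j]
    return marked


def detect_blind(received, num_bits, key):
    """Recover bits by correlating each segment with the key's PN sequence."""
    n = len(received) // num_bits
    pn = pn_sequence(key, n * num_bits)
    bits = []
    for i in range(num_bits):
        corr = sum(received[j] * pn[j] for j in range(i * n, (i + 1) * n))
        bits.append(1 if corr > 0 else 0)
    return bits


message = [1, 0, 1, 1]
host = [0.3 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(32000)]
marked = embed(host, message, key=1234, strength=0.02)

recovered = detect_blind(marked, len(message), key=1234)   # correct key
guess = detect_blind(marked, len(message), key=9999)       # wrong key
print("correct key:", recovered)
print("wrong key:  ", guess)
```

With the correct key the correlation is dominated by the embedded sequence and the message is recovered. With a different key the detector correlates against an unrelated sequence, so the decoded bits are essentially arbitrary.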

Audio watermarking is of two types: blind and non-blind. If detection of the digital watermark can be done without the original data, the technique is called blind. Here, the source document is scanned and the watermark information is extracted. Non-blind techniques, on the other hand, use the original source to extract the watermark by simple comparison and correlation procedures. However, it turns out that blind techniques are less secure than non-blind methods.

In the audio industry today there is much interest in copyright management and protection. Embedding some form of "hidden signal" or watermark in the audio stream is seen as a potential method for managing the use of the material. To ensure that only those with the right to access it can do so, methods have been proposed to include "gatekeepers" in audio equipment. That way, unauthorized reproduction and especially unauthorized copying, could be prevented. However, there are many serious problems with this concept. It is not at all clear that they can be overcome sufficiently to provide a reliable and effective control system.

Properties Of Audio Watermark

A watermarking algorithm can be characterized by a number of properties. The relative importance of each property however depends on the demands of the application. The six important properties are as follows [1].

Perceptual Transparency

In almost every application, the watermark-embedding algorithm has to insert watermark data without changing the perceptual quality of the host audio signal. The fidelity of a watermarking algorithm is usually defined as the perceptual similarity between the original and watermarked audio sequences. The watermark should be invisible in video and inaudible in audio. However, the quality of the watermarked audio may be degraded, either intentionally by an adversary or unintentionally during transmission, before a person perceives it. In such a case, it is more sensible to redefine the fidelity of a watermarking algorithm as the perceptual similarity between the watermarked audio and the original host audio at the point at which they are presented to a consumer.

Watermark Bit Rate

The bit rate of an embedded watermark is defined as the number of watermark bits embedded in one second of the host audio signal and is given in bits per second (bps). The bit rate required of a watermark depends on the application. For example, certain applications, such as copy control, require the insertion of a serial number or author ID, with an average bit rate of 0.5 bps. In some envisioned applications, such as hiding speech in audio, algorithms have to be able to embed watermarks at a bit rate that is a significant fraction of the host audio bit rate, i.e. up to 150 kbps.
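A short calculation shows what these bit rates mean in practice. The 64-bit identifier and the one-bit-per-frame scheme below are hypothetical examples chosen for illustration; only the 0.5 bps figure and the 90 ms frame duration come from the text.

```python
def watermark_bit_rate(payload_bits, duration_s):
    """Watermark bit rate in bits per second."""
    return payload_bits / duration_s


def time_to_embed(payload_bits, bit_rate_bps):
    """Seconds of host audio needed to carry payload_bits at bit_rate_bps."""
    return payload_bits / bit_rate_bps


# Hypothetical 64-bit author ID embedded at the 0.5 bps copy-control rate.
id_seconds = time_to_embed(64, 0.5)

# Assumed scheme: one watermark bit per 90 ms frame.
frame_rate = watermark_bit_rate(1, 0.090)

print(id_seconds, round(frame_rate, 1))
```

At 0.5 bps, a 64-bit identifier needs 128 seconds of audio, whereas one bit per 90 ms frame would give roughly 11 bps.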

Robustness

The robustness of a watermarking algorithm is defined as its ability to detect or extract the watermark after common signal processing manipulations. The set of signal processing modifications to which a watermarking algorithm needs to be robust is completely application dependent. For example, in radio broadcast monitoring, the embedded watermark needs only to survive distortions caused by the transmission process, including dynamic compression and low-pass filtering, because watermark detection is done directly from the broadcast signal. On the other hand, in some algorithms robustness is completely undesirable, and those algorithms are labeled fragile audio watermarking algorithms.

Blind or informed watermark detection

In some applications, a detection algorithm may use the original host audio to extract the watermark from the watermarked audio sequence (informed detection). This often significantly improves detector performance, in that the original audio can be subtracted from the watermarked copy, leaving the watermark sequence alone. However, if the detection algorithm does not have access to the original audio (blind detection), this inability substantially decreases the amount of data that can be hidden in the host signal. The complete process of embedding and extracting the watermark is modeled as a communications channel in which the watermark is distorted by strong interference and channel effects. The strong interference is caused by the presence of the host audio, and the channel effects correspond to signal processing operations.
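The difference between informed and blind detection can be seen in a small experiment. The sketch below is a generic illustration under stated assumptions (key-seeded ±1 sequence, one segment per bit, synthetic 440 Hz host, hypothetical helper names), not the method of any cited work. The watermark is made very weak and the segments short, so that host interference swamps a blind detector while subtracting the original still recovers every bit.

```python
import math
import random


def pn_sequence(key, length):
    """Key-seeded pseudo-random sequence of +1/-1 chips."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, key, strength):
    """Add a signed, key-dependent PN segment to the host for each bit."""
    n = len(host) // len(bits)
    pn = pn_sequence(key, n * len(bits))
    marked = list(host)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * n, (i + 1) * n):
            marked[j] += strength * sign * pn[j]
    return marked


def detect(signal, num_bits, key):
    """Decide each bit from the sign of the segment's correlation with the PN."""
    n = len(signal) // num_bits
    pn = pn_sequence(key, n * num_bits)
    return [
        1 if sum(signal[j] * pn[j] for j in range(i * n, (i + 1) * n)) > 0 else 0
        for i in range(num_bits)
    ]


def detect_informed(received, original, num_bits, key):
    """Subtract the original host first, leaving only the watermark sequence."""
    residual = [r - o for r, o in zip(received, original)]
    return detect(residual, num_bits, key)


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
host = [0.3 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
marked = embed(host, message, key=1234, strength=0.001)

informed = detect_informed(marked, host, len(message), key=1234)
blind = detect(marked, len(message), key=1234)
blind_errors = sum(a != b for a, b in zip(message, blind))
print("informed detection correct:", informed == message)
print("blind detection bit errors:", blind_errors, "of", len(message))
```

The informed detector sees only the watermark after subtraction, so it decodes correctly even at this low strength. The blind detector must treat the host as interference, which is why it needs a stronger watermark or longer segments per bit, reducing the payload.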

Security

A watermarking algorithm must be secure in the sense that an adversary must not be able to detect the presence of embedded data, let alone remove it. The security of the watermarking process is interpreted in the same way as the security of encryption techniques: it cannot be broken without access to the secret key that controls watermark embedding, which only authorized users hold. An unauthorized user should be unable to extract the data in a reasonable amount of time even if he knows that the host signal contains a watermark and is familiar with the exact watermark embedding algorithm. Security requirements vary with the application. The most stringent are in covert communications applications, and in some cases data is encrypted prior to embedding into the host audio.

Computational complexity and cost

The implementation of an audio watermarking system is a demanding task, and it depends on the business application involved. The principal issue from the technical point of view is the computational complexity of the embedding and detection algorithms and the number of embedders and detectors used in the system. For example, in broadcast monitoring, embedding and detection must be done in real time, while in copyright protection applications time is not a crucial factor for a practical implementation. One of the economic issues is the design of embedders and detectors, which can be implemented as hardware or as software plug-ins; another is the difference in processing power of different devices (laptop, PDA, mobile phone, etc.).

Problems Of Audio Watermarking

Audio Quality

It is obvious that the system must not excessively spoil the audio quality. The acceptability of the impairments depends on the intended audience. Some people argue that any alteration of the original sound is unacceptable. They probably argue from an idealistic position, that the best current audio quality is only barely acceptable and still needs further improvement before it could be described as 'perfect'.

Others, working from a more practical position, recognize that most of the potential audience is either not that critical or is not in a position to judge audio quality to that extent, because of their limited hardware facilities or aural capabilities. To those, a degree of impairment that is non-zero but still imperceptible would, by definition, be acceptable.

Yet others, listening under very poor conditions, as in a car or via a low bit rate channel, might even tolerate some clearly perceptible impairments, especially if they could not distinguish them from the other impairments. The wide range of these different conditions presents a very difficult problem for watermarking system application. It is unlikely that a single watermark would survive translation between those different environments.

Reliability

Any system intended to control access by a legitimate customer to legally acquired material will have to be reliable. It takes only a tiny amount of reported difficulty to give a system a bad reputation commercially. One serious potential problem would be the lack of acceptance of liability by vendors. In both the pre-recorded audio and the broadcasting industries, the hardware and the content are generally provided by different suppliers. The potential for each supplier to blame the other's product, leaving the customer without options, is self-evident. In fact, at the time of writing, just such a conflict is taking place over other sorts of copy restriction techniques. It has become an exceptionally contentious issue, one that may be responsible for some of the recent decline in retail CD sales, at least according to some industry opinions.

Any watermarking system used for access control would have to detect the watermark quickly and reliably and default to not preventing access when it could not do so. It is somewhat less important that access by non-authorized persons is properly banned. It is doubtful whether any protection system with adequate performance in that respect could be developed.

Identification applications need not be so reliable. At least in the short term, a longer detection period provides extra detection performance, allowing a reduction in watermark amplitude and audibility. This means that the required degree of reliability can be attained over longer extracts, perhaps most of a 'single' record track of, say, three to five minutes.

Robustness

To prevent unauthorized access, a watermarking system has to be resistant to significant degradation of the material. Many of the potential customers for illegally copied audio are quite tolerant of impairments to the material. Any included watermark would also be impaired, so the watermarking system has to be resistant to those impairments. The general view is that the watermark should survive until the content becomes "of no commercial value".

Audio coding for bit rate reduction

The inherent conflict between bit rate reduction systems, which try to remove inaudible components, and watermarking systems, which try to hide additional data inaudibly, is self-evident.

Boney et al. [2] generated watermarks by filtering a pseudo noise (PN) sequence with a filter that approximates the frequency masking characteristics (a combination of local masking and absolute hearing thresholds) of the human auditory system (HAS). The watermarks thus created will differ for different audio signals. (That is, they are signal-dependent).

Their experiments revealed that the watermark noise was almost inaudible and that the scheme had high detection and low false alarm rates under coding, re-sampling, and multiple watermarking manipulations. However, the decision threshold used in the watermark detection process must be tuned in advance for various types of attacks and host signals. Their detection scheme also assumed that the user can access the original signal and the PN sequence that was used to watermark the signal.

According to Chen, Zhoal, and Wang [3], digital watermarking is a technology that allows users to embed a watermark into digital content to identify the copyright holder, to prevent illegal copying, and to verify modifications to the original content. Their paper proposes a novel adaptive watermarking algorithm for MP3-compressed audio signals, based on the human auditory system. In the proposed algorithm, the watermark is embedded adaptively and transparently after the Modified Discrete Cosine Transform (MDCT) and before quantization. Gaussian distribution statistical analysis is introduced to make the watermarking algorithm adaptive. The paper tests the adaptive watermarking algorithm on various types of audio signals. The experimental results show that the new algorithm can survive most common signal manipulations, including MP3 compression. During the MP3 encoding process, compression happens in two stages. One is the hybrid filter bank with psychoacoustic model II masking thresholds, and the other is the quantization.

Quantization in MP3 is a non-uniform quantization that maps amplitude values into a finite number of bits. There are two nested loops in the quantization. One is the inner iteration loop, which controls the quantization step size, and the other is the outer iteration loop, which controls the noise shaping factors for each scale factor band. Frames form the MP3 audio stream after the MDCT. According to the experimental results, if two (or more) original frames (32 x 18 matrices) are merged into a macro frame (a 32 x 36 matrix), the MDCT values in the macro frame follow a Gaussian distribution. One original frame alone can also be used to carry the watermark, but the performance is not as good as when two frames are merged. After many experiments, it was decided that merging two frames gives good results without taking too much watermarking space.
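The frame-merging step can be sketched in a few lines. The sketch assumes that the two 32 x 18 frames are placed side by side, which is what the 32 x 36 shape suggests; the text does not spell out the layout. The coefficients here are random stand-ins rather than real MDCT output, and the mean and standard deviation are computed only as an example of the kind of Gaussian statistics an adaptive scheme might use.

```python
import random
import statistics

ROWS, COLS = 32, 18   # frame dimensions as given in the text


def merge_frames(first, second):
    """Join two 32 x 18 frames side by side into one 32 x 36 macro frame."""
    return [row_a + row_b for row_a, row_b in zip(first, second)]


def synthetic_frame(rng):
    """Random stand-in for a frame of MDCT coefficients (not real MP3 data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(COLS)] for _ in range(ROWS)]


rng = random.Random(0)
macro = merge_frames(synthetic_frame(rng), synthetic_frame(rng))

# Summary statistics of the macro frame (illustrative Gaussian parameters).
values = [v for row in macro for v in row]
mu = statistics.fmean(values)
sigma = statistics.pstdev(values)
print(len(macro), len(macro[0]), round(mu, 3), round(sigma, 3))
```

The macro frame holds 1152 coefficients, twice as many as a single frame, which gives a larger sample for estimating the distribution.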

Xiaohong Ma, Xin Li, and Wenlong Liu [4] proposed a new multipurpose audio watermarking method. The region of interest (ROI) audio and one binary watermark image are two robust watermark signals, which together protect both the crucial part of the host signal and the copyright. In the extraction procedure, the fast fixed-point independent component analysis (FastICA) algorithm is adopted to extract the two robust watermarks and to detect tampered areas automatically without embedding a fragile watermark. Experimental results show the effectiveness and reliability of the proposed method.

Ki-Young Kim [5] proposed digital watermarking algorithms for high-quality audio that improve the strength of the embedded watermark by using spread spectrum. The watermark is embedded in each audio frame by adding a perceptually shaped pseudo-noise sequence. By inserting the watermark according to the psychoacoustic model, the proposed method achieves digital audio watermarking that the audience cannot perceive as noise.

Larbi et al. (2005) [6] viewed watermarking as a preprocessing step for further audio processing systems. The watermark signal conveys no information; rather, it is used to modify the statistical characteristics of an audio signal, in particular its non-stationarity. The embedded watermark is added in order to stationarize the host signal.

Li Zhi et al. (2005) [7] presented a scalable (i.e. lossy-to-lossless) watermarking scheme based on a recently standardized scalable audio coder. The proposed framework enables recovery of the original lossless audio after watermark embedding and, at the same time, makes the watermark adaptive so that the watermark distortion of the lossy host audio is minimized. An encryption mechanism is further employed to restrict unauthorized access to the lossless audio and unauthorized watermark removal.

Zhang Li, Gong-bin et al. (2006) [8] proposed a self-synchronizing blind audio watermarking scheme based on the wavelet transform. A way to estimate the parameters of attacks that the watermarked audio may encounter is proposed, using statistics of the original audio; these statistical characteristics can be used as a private key for the watermark detector. The detection scheme can therefore re-synchronize the embedding and detection processes without any additional synchronization code, which would introduce additional noise.

Zhi Li et al. (2006) [9] presented a scalable approach to lossless watermarking for audio signals. The proposed watermarking framework is built on a recently standardized two-layer scalable audio coder, Advanced Audio Zip. By embedding watermarks in both the core layer and enhancement layer bit streams in a special way, the watermark distortion in either layer is compensated by the watermark in the opposite layer.

Wei Li et al. (2006) [10] proposed a novel content-dependent localized robust audio watermarking scheme. The basic idea is first to select steady high-energy local regions that represent music edges, such as note attacks, transitions or drum sounds, by using different methods, and then to embed the watermark in these regions. Such regions are of great importance to the understanding of music and will not be changed much if high auditory quality is to be maintained.

Ji-Xin Liu et al. (2006) [11] proposed a novel robust audio watermarking algorithm. It takes full advantage of the multi-resolution property of the Discrete Wavelet Transform (DWT) and the energy compression property of the Discrete Cosine Transform (DCT), and embeds a pseudorandomly permuted binary image in the original audio signal by Vector Quantization (VQ). The original audio is first transformed into the DWT domain using properly selected wavelet bases. The approximation coefficients are then segmented into frames and transformed to the DCT domain and, for each frame, several middle-frequency DCT coefficients are composed into a vector to be substituted.

B. Charmchamras et al. (2008) [12] presented a technique for embedding image data into an audio signal, together with an additive audio watermarking algorithm that uses the SNR to determine a scaling parameter. The scheme operates on the audio in the DWT domain. The intensity of the embedded watermark in the original audio signal is modified by adaptive modulation of the scaling parameter.
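One generic way to derive a scaling parameter from a target SNR is to solve the SNR definition for the watermark gain. If the watermarked signal is host + alpha * watermark, then SNR = 10 log10(P_host / (alpha^2 * P_watermark)), which gives alpha = sqrt(P_host / (P_watermark * 10^(SNR/10))). The sketch below implements that relation. It is an assumption-based illustration and not necessarily the formula used in [12], whose details are not given here; the function names and sample values are hypothetical.

```python
import math


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def scale_for_target_snr(host, watermark, target_snr_db):
    """Return alpha so that host vs. alpha * watermark has the requested SNR."""
    return math.sqrt(power(host) / (power(watermark) * 10 ** (target_snr_db / 10)))


host = [1.0, -1.0, 1.0, -1.0]          # toy host with unit power
watermark = [1.0, 1.0, -1.0, -1.0]     # toy watermark with unit power
alpha = scale_for_target_snr(host, watermark, target_snr_db=20.0)
print(alpha)
```

With equal host and watermark power, a 20 dB target gives alpha of about 0.1. Raising the target SNR lowers alpha, trading robustness for transparency.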

Chen, Zhoal, and Wang [3] evaluated performance on five different types of 16-bit signed mono audio signals sampled at 44.1 kHz, with the watermark embedded into them. Table I gives the experimental results in terms of SNR (Signal-to-Noise Ratio), PSNR (Peak Signal-to-Noise Ratio), and BER (Bit Error Rate).

Table I. 44.1-kHz 16-bit mono audio: Music 1, Music 2, Music 3
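The three measures named for Table I can be computed as in the following sketch, which uses the standard textbook definitions. The function names and the toy sample values are illustrative, and the `peak` argument of the PSNR (for example, the largest representable amplitude of the sample format) is an assumption, since the text does not say which peak value was used.

```python
import math


def snr_db(original, watermarked):
    """Signal-to-noise ratio in dB, treating the embedding change as noise."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, watermarked))
    return 10 * math.log10(signal / noise)


def psnr_db(original, watermarked, peak):
    """Peak signal-to-noise ratio in dB for a given peak amplitude."""
    mse = sum((x - y) ** 2 for x, y in zip(original, watermarked)) / len(original)
    return 10 * math.log10(peak * peak / mse)


def bit_error_rate(sent_bits, received_bits):
    """Fraction of watermark bits that were decoded incorrectly."""
    errors = sum(a != b for a, b in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


original = [10, -10, 10, -10]
watermarked = [11, -9, 11, -9]
print(snr_db(original, watermarked))                  # 20.0
print(psnr_db(original, watermarked, 10))             # 20.0
print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))     # 0.5
```

Both ratio functions divide by the embedding error, so they are undefined when the two signals are identical; a caller would need to handle that case separately.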





In this paper, various methods of hiding text in an audio signal have been presented. Watermark hiding is based on a psychoacoustic model. The watermark is robust against audio compression. The security of the watermarking process cannot be broken without access to the secret key that controls watermark embedding. Different audio signals have been compared on the basis of Peak Signal-to-Noise Ratio (PSNR) and Bit Error Rate (BER). A higher PSNR and a lower BER indicate that the proposed audio watermarking technique performs well.