Digital watermarking today is one of the main concerns for professionals who deal with audio recording, image creation and other multimedia work. Its applications in the fields of copyright, data hiding, encryption and intellectual property protection are of great importance to producers everywhere. Other applications include the addition of metadata such as lyrics, composer information and album information to an audio file without any loss in quality or change in perceptibility to the human ear.
This area is under active research in industry and research institutes. Every producer of intellectual property feels the need for digital watermark protection worldwide. Also, with the widespread use of the Internet and ubiquity of file sharing, the need for digital watermarking has gained paramount importance.
Online music stores such as Apple's iTunes use complicated content management systems that often restrict users' freedom to use their files on different platforms. With digital watermarking, such restrictions could be removed, leaving users free to keep their files on any device of their choosing.
Due to the high-bandwidth connections now available at home, the movie industry is also interested in online distribution. However, the major challenge it faces is the protection of intellectual property rights. This is another subject that can be addressed by extending digital audio watermarking into the visual domain.
Most watermarking schemes have had limited success as they are always under constant attack, decryption and removal. Lossy compression of digital audio often results in the corruption of watermarks and embedded signatures. There is a very small margin where frequencies can pass through a perceptual encoder and yet be inaudible to the human ear.
The premise is to come up with an algorithm such that MP3 or WMA codecs will not discard the watermark as humanly imperceptible audio, yet the watermark signal itself remains inaudible.
In this chapter appropriate background literature is surveyed and the concept of hiding information in audio sequences is explained. There are a number of scientific publications listed that will help in the formulation of the problem listed in Chapter 1 and the solving of the sub-problems it presents.
Watermarks have previously been hidden in images by modifying the least significant bit (LSB) of each pixel. The pixel values change slightly in the process, but the difference is not perceptible to the human eye.
Interleaving patient data with medical images has also shown promise in biomedical applications. The change in the image is not noticeable to the human eye, as LSB replacement changes a pixel's brightness by at most 1 of 256 levels and hence does not affect the visual perception of the image.
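The LSB idea above can be sketched in a few lines. This is a minimal Python illustration (the function names are my own, not from any library), assuming 8-bit grayscale pixels stored as a flat byte sequence:

```python
# A minimal sketch of LSB embedding in 8-bit grayscale pixel data, assuming
# pixels are stored as a flat byte sequence; the function names are illustrative.

def embed_bits_lsb(pixels, bits):
    """Replace the least significant bit of each pixel with a message bit."""
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit     # clear the LSB, then set the message bit
    return bytes(out)

def extract_bits_lsb(pixels, n):
    """Read back the first n embedded bits."""
    return [p & 1 for p in pixels[:n]]

pixels = bytes([200, 135, 64, 17, 90, 255, 3, 128])
message = [1, 0, 1, 1, 0, 1, 0, 0]
stego = embed_bits_lsb(pixels, message)
assert extract_bits_lsb(stego, 8) == message
# Each pixel changes by at most 1 of its 256 brightness levels:
assert all(abs(a - b) <= 1 for a, b in zip(pixels, stego))
```

The final assertion demonstrates the claim above: every pixel moves by at most one brightness level, which is why the change is imperceptible.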
Most present watermarking techniques are not reversible, because watermarking introduces a small amount of distortion into the image or audio signal, whereas in most applications the distortion remaining after watermark extraction should be zero. Data-hiding techniques such as spread spectrum are not reversible because of truncation and round-off errors, and the LSB insertion method is not reversible either.
Many reversible watermarking techniques have been published recently; the first reversible technique operated in the spatial domain.
Human auditory perception is based on critical-band analysis in the inner ear, where a frequency-to-location transformation takes place along the basilar membrane. The power spectra of received sounds are not represented on a linear frequency scale but on limited frequency bands called critical bands. The auditory system can be modelled as a bank of strongly overlapping bandpass filters, with bandwidths around 100 Hz for bands at low frequencies and up to 5000 Hz for bands at high frequencies.
As with image data (and other media types [20, 21, 22]), the earliest approach to audio watermarking used LSB coding [23, 24]. A basic technique is to embed watermark data by altering individual samples of a digital audio stream with an amplitude resolution of 16 bits per sample.
In a number of published algorithms [31, 32], watermark insertion and extraction are carried out using the spread-spectrum (SS) technique in combination with other methods. The SS sequence can be added to the host audio samples in the time domain, or to the FFT coefficients.
When the embedding takes place in a transform domain it is usually located in an area that is not affected by common attacks such as dynamic range compression, re-sampling, low-pass filtering or other processing techniques.
In the first section the basic techniques of digital audio encoding are reviewed. The technique of LSB replacement watermarking for digital audio is then looked into with practical implementations. Once the limitations for this kind of approach are identified it becomes important to look at watermarking techniques in the transform domain. Thus follows an algorithm for watermarking in the frequency domain.
Initially, audio watermarking was explored as a branch of digital signal processing, with the focus mainly on embedding auxiliary information into the host audio signal. Only recently has watermarking developed a much stronger theoretical foundation and become a more mature discipline in itself.
Digital Audio Formats:
Microsoft and IBM developed the WAV (Waveform Audio File) format in 1991 for storing an audio stream. WAV files commonly hold audio in the Linear Pulse Code Modulation (LPCM) format, which represents an audio wave by recording its amplitude at fixed intervals of time.
The most common WAV files use 44.1 kHz, 16-bit, 2-channel audio. WAV audio is lossless and is therefore widely used for high-quality music storage and archival. However, given the large size of WAV files and the rising popularity of Internet file sharing, its uncompressed nature has led to its decline.
WAV files usually have a 44-byte header and store data in Intel's Little-Endian byte order.
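The 44-byte header and little-endian layout can be illustrated concretely. The following Python sketch builds and parses a canonical PCM WAV header in memory (real files may carry extra chunks before "data", so robust code should walk the chunk list rather than assume fixed offsets):

```python
import struct

# A sketch of the canonical 44-byte PCM WAV header, built and parsed in memory
# with Python's struct module so the example is self-contained. All multi-byte
# fields are little-endian ("<"), matching the on-disk byte order noted above.

def make_canonical_header(num_channels, sample_rate, bits, data_size):
    block_align = num_channels * bits // 8
    byte_rate = sample_rate * block_align
    return (b"RIFF" + struct.pack("<I", 36 + data_size) + b"WAVE" +
            b"fmt " + struct.pack("<IHHIIHH", 16, 1, num_channels,
                                  sample_rate, byte_rate, block_align, bits) +
            b"data" + struct.pack("<I", data_size))

header = make_canonical_header(2, 44100, 16, 1000)
assert len(header) == 44
# Bytes 16-35 hold the "fmt " chunk body: size, format tag, channels, rates.
size, fmt_tag, channels, rate, byte_rate, align, bits = struct.unpack(
    "<IHHIIHH", header[16:36])
assert (channels, rate, bits) == (2, 44100, 16)
assert byte_rate == rate * align == 176400
```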
MP3 or MPEG-1 Audio Layer 3 is one of the most popular file formats in the world today. With a plethora of MP3 players it is definitely the king as far as file formats are concerned.
MP3 being a compressed file format was mostly popularized by file sharing on the Internet. This led to rampant piracy and hence the need for audio watermarking in copyright applications.
MP3 works on the principle of compression by discarding data that it considers aurally imperceptible. Perceptual encoding comes into play: the codec determines whether a sample of audio is within the auditory resolution of human beings and discards the data if it is not. Some people can hence tell that an MP3 sounds 'fuzzy' at times due to the compression of dynamic range. However, compression can shrink a file to as little as one-twelfth of the original WAV file size, which has made MP3 popular as a universal audio file format.
3.3 Other Formats:
There are a number of other audio formats that have been developed by various agencies. Some of them are:
Monkey's Audio (APE)
Windows Media Audio (WMA)
And many more..
However, for the sake of our research we will focus on the most popular formats (WAV and MP3) as these will broadly cover the entire subject of audio watermarking.
4.1 Types of Watermarking
Digital watermarking is the process of adding data (visual, audio or hidden) to a signal in a way that is difficult for a user to remove. If the user copies the signal, the watermark is copied with it; the watermark effectively becomes part of the original signal and is subject to digital replication.
There are two types of watermarking:
4.1.1 Visible Watermarking
In visible watermarking (usually used for copyright protection) the intention is to make the user aware of the producer of the work. Visible watermarking also applies to audio watermarking where a message may be relayed prior to the actual broadcast of the audio signal.
Typically, visible watermarking is used for brand identification and protection. An example of this are the logos that appear at the bottom of the screen during a television broadcast identifying the channel.
4.1.2 Invisible (Hidden) Watermarking
Invisible Watermarking shall be the main subject of our research. In invisible watermarking, data may be added to an audio, video or image file and it should remain imperceptible. For a good digital audio watermark, the data should be completely undetectable via traditional means.
The main application of this is to track down copyright violations and prevent piracy of digital media. Some digital watermarks may also be used in Steganography, i.e. the transmission of a secret message hidden within another medium.
Digital watermarks have been used to track down sources of movie piracy. A watermark is embedded into the movie at every point of distribution; for example, a movie theater receives a copy with a customized watermark. If piracy does take place, the copyright owners can trace the point of origin and prosecute the people responsible.
The main distinction between annotation and digital watermarking is that in the latter case, the information is carried in the signal itself.
Basic applications for invisible digital watermarking are:
Copyright Infringement Prevention
Origin of infringement tracking
4.2 Watermarking Process
Fig. 1 The Watermarking Cycle
There are basically three phases to a watermarking cycle.
Embedding is the process of inserting the watermark into a Host Signal. The original unmodified signal is usually referred to as the Host Signal.
An algorithm is used to insert data (text or audio) into the Host Signal. Once the watermarked file is created it is ready for distribution.
Attack is the process in which pirates will try to copy the signal without the digital watermark. They will use a number of processes from lossy encoding to watermark detection and removal.
A good watermark will survive many phases of attack and still be reproducible at the end. A weaker watermark will no longer be detectable after one or more phases of attack.
After the attack phase, copyright agencies can try and detect the source of piracy by trying to detect the watermark. Usually this is done using a detection algorithm that will extract the watermark from the file. Robust watermarking can survive many phases of attack and still reproduce an accurate watermark in the detection phase.
4.3 Classification of Watermarks
Robust digital watermarks that are imperceptible have been quite challenging to create. In fact, watermarks are classified according to the following properties:
Strength or robustness is a hallmark of watermark creation. Fragile watermarks can only survive one or two transformations before becoming undetectable. Semi-fragile watermarks can survive digital transformations but cannot enforce any error compensation. Robust watermarks will be able to survive a number of transformations and still be extractable in the end.
A watermark is called imperceptible if it cannot be detected in the original signal while the signal is being consumed. For example, a video watermark that cannot be seen by the naked eye while viewing the watermarked video is imperceptible.
A perceptible watermark is one which, while being non-intrusive, is still visible to the viewer. For example, an image watermark that sits in the background of an image without taking away from its content.
4.3.3 Watermark Length
The watermark length determines the two main classes of schemes:
The watermark message is zero bits long; the system is designed only to detect whether the watermark is present or absent.
The watermark message is n bits long and is modulated into the watermarked signal.
4.3.4 Embedding Technique
There are 3 main embedding techniques that are usually employed:
Spread Spectrum - In this technique a watermark is inserted by additive modification. This has a modest amount of strength but has the disadvantage of causing interference.
Quantization - In this technique the watermarked signal is obtained using Quantization. Basically, watermarks are inserted by modifying the LSB or least significant bits in a binary file.
Amplitude Modulation - Similar to spread spectrum, but concerned mostly with where in the signal the watermark is placed: low-amplitude watermark data is added in high-amplitude regions of the host, and so on.
Digital audio is based on using binary data for audio reproduction. The techniques employed in digital audio include A-D conversion, D-A conversion, storage and transmission.
Digital systems are both discrete time and discrete-amplitude systems. Hence error correction can be performed on these signals and in case of signal degradation re-constitution can occur. This is in direct contrast to analog signals where signals after degradation cannot be recovered.
Digital audio encoding starts when an analog audio signal is sampled and then (in the case of pulse-code modulation) converted into binary signals: 'on/off' pulses that are stored as binary electronic, magnetic or optical signals rather than as continuous analog signals.
The signal might then be encoded further to combat any errors that might occur in the storage or transmission of the signal.
The discrete time and level of the wave allow decoding software to recreate the analog signal upon playback. An example of a channel code is Eight-to-Fourteen Modulation (EFM), commonly used on audio CDs.
5.1 Pulse Code Modulation (PCM)
Pulse Code Modulation (PCM) is a digital encoding of an analog wave in which the amplitude of the wave is sampled regularly at discrete time intervals and then converted into binary code.
PCM is the most common form of digital audio and has been used in digital telephone systems since the 1960s. PCM samples have also been used in electronic keyboards (such as Casio's models) that were a favorite among children in the 1980s and 1990s.
In electrical communications, the earliest recorded use of signal sampling was to interleave several telegraph messages onto a single cable. Telegraph time-division multiplexing (TDM) was achieved as early as 1853. In 1903 an electro-mechanical commutator was used for TDM of multiple telegraph signals, and the same technology was applied to telephony. Intelligible speech was obtained from channels sampled at rates above 3800-4400 Hz; below this, speech was not retrievable. This was TDM, but in the form of pulse-amplitude modulation.
In 1926 an opto-mechanical analog converter was used to transmit 5-bit PCM. Then, in 1937 a British engineer created PCM with no knowledge of prior art while working in France.
The first known transmission of a digital vocal signal was done using the SIGSALY platform in 1943. This signal was encoded using PCM.
5.1.2 Modulation - PCM
Fig. 2 PCM Audio Encoding (Analog to Digital)
As shown in the figure, a sine wave is sampled and then assigned discrete quantities for PCM. The sine wave is sampled at regular time intervals, shown on the x-axis. For each time interval the corresponding amplitude is chosen on the y-axis using the floor function (in the case of PCM).
For the sine signal representation above, we can verify that the quantized values at the sampling instants are 7, 11, 13, 14, 15, 15, etc. Encoding these values as binary numbers gives the following 4-bit nibbles: 0111, 1011, etc.
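The quantization step can be sketched in Python for illustration. The mapping of the amplitude range [-1, 1] onto 16 levels with the floor function follows the description above; the sampling grid here is an assumption, so the exact level values differ from those in the figure:

```python
import math

# A sketch of 4-bit PCM quantization of a sine wave: the amplitude range
# [-1, 1] is mapped onto 16 levels (0-15) with the floor function, and each
# level is then written as a 4-bit binary nibble.

def quantize_4bit(x):
    level = math.floor((x + 1.0) / 2.0 * 16)   # map [-1, 1) onto levels 0..15
    return min(level, 15)                      # x = +1.0 exactly clips to 15

fs = 16                                        # illustrative: 16 samples per cycle
samples = [math.sin(2 * math.pi * n / fs) for n in range(fs)]
nibbles = [quantize_4bit(s) for s in samples]
binary = [format(v, "04b") for v in nibbles]   # e.g. level 11 -> '1011'

assert all(0 <= v <= 15 for v in nibbles)
assert quantize_4bit(0.0) == 8 and format(11, "04b") == "1011"
```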
Once this is done, a DSP or CPU can process these nibbles and apply certain corrections or introduce enhancements. Several signals may also be mixed into a larger stream. This enables transfer of multiple signals over a single link. Usually Time Division Multiplexing is the most common technique used since this requires lower bandwidth. However, another method called Frequency Division Multiplexing is also popular.
5.1.3 Demodulation - PCM
Demodulation is the process of recovering the original wave from the encoded signal, performed by running the modulation process in reverse.
When each sampling period has finished, the next section of data is read and the signal takes a new value. Since the decoded wave has edges and corners (depending on the resolution of the modulation), the output signal must be anti-aliased for smoothing. This is done to suppress energy outside the expected frequency range. For this, we calculate the Nyquist frequency using:
Nyquist Frequency = ½ f
Where f is the sampling frequency.
Fig. 3 Nyquist Sampling Theorem
This is because during the process of sampling, the Sampling Theorem applies which states that:
If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.
This means that if a wave of limited bandwidth is sampled at a frequency at least twice that of the highest frequency it contains, it can be accurately reconstructed from its sequence of samples.
For example WAV files generally have a sampling frequency of 44100 Hz. The Nyquist frequency is therefore 22050 Hz, which is an upper limit on the greatest frequency the data can have. If the chosen anti-aliasing filter (a low-pass filter in this case) has a transition band of 1000 Hz, then the cut-off frequency should be no higher than 21050 Hz to deliver a signal with negligible power at frequencies of 22050 Hz and greater.
A digital filter may be used to remove some aliasing. The higher the sample rate used, the less aliasing occurs and hence the reproduced signal is far more accurate.
The sampling theorem implies that real-world PCM devices, given a sample rate sufficiently greater than that required by the incoming signal (greater than 2B), can operate without any distortion due to aliasing.
DACs are used to recover an accurate analog signal from the digitally encoded wave. They operate similarly to ADCs; their output is usually a voltage or current that is then routed to an analog audio device for filtering and amplification.
PCM with linear quantization is commonly known as LPCM. This is the kind of PCM used in the WAV file format.
Audio Watermarking Techniques:
Audio watermarking techniques are under constant research and evolution. The music industry today wants to protect artists and its own interests, and is therefore investing heavily in audio watermark research.
Audio watermarking is the addition of a signal to an audio file without a perceptible change in the audio output.
The entire premise is to hide a signature signal within the original audio file that is protected from attack, detection and subsequent removal.
There are a number of techniques that can be used to watermark audio files today.
6.1 LSB Replacement
LSB replacement was one of the first techniques explored in watermarking of digital audio or digital image files.
This is achieved by altering the least significant bits of samples in the original audio signal. The LSB replacement technique does not use any psycho-acoustic model to ensure that only imperceptible components are modified, so the replacement introduces white noise. The perceptibility of this white noise depends on the bit rate of the added watermark signal.
The advantage of this technique is that the watermark can have an extremely high channel capacity. The addition of a single watermarking bit in each sample of the signal gives a channel capacity of 44.1kbps.
The disadvantage of this method is that it is not robust against signal modification and doesn't survive digital to analog to digital conversions easily. Random changes in the LSB can easily destroy or corrupt the watermark.
There have been some recent advances in LSB Replacement encoding that have resulted in psycho-acoustic shaping so as to make the watermark signal noise even less perceptible to human beings.
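The embed/extract round trip for LSB replacement can be sketched briefly. This Python version is only illustrative (the implementation later in this thesis uses MATLAB); one watermark bit per sample yields the 44.1 kbps channel capacity mentioned above:

```python
# A sketch of LSB replacement on 16-bit PCM samples. One watermark bit is
# written into the least significant bit of each successive sample.

def embed(samples, bits):
    """Overwrite the LSB of the first len(bits) samples with watermark bits."""
    marked = [(s & ~1) | b for s, b in zip(samples, bits)]
    return marked + samples[len(bits):]

def extract(samples, n):
    """Read the watermark back from the LSBs of the first n samples."""
    return [s & 1 for s in samples[:n]]

host = [1203, -882, 15, 0, -32768, 32767, 401, -7]
mark = [1, 1, 0, 1, 0, 0, 1, 0]
stego = embed(host, mark)
assert extract(stego, len(mark)) == mark
# Each sample moves by at most one step out of 65,536 quantization levels:
assert all(abs(a - b) <= 1 for a, b in zip(host, stego))
```

The second assertion shows why the change is inaudible, and also why the scheme is fragile: any process that perturbs sample values by even one step destroys the watermark.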
6.2 Phase Modulation & Coding
Phase coding is a more advanced watermarking technique. It relies on the fact that humans cannot easily detect changes in audio phase. Hence watermark data can be stored in phase changes in the audio file.
The basic phase coding technique involves splitting the audio signal into blocks and then insertion of the watermark into the phase spectrum of the first block. The obvious limitation of this type of approach is that since the watermark is only present in the first block of the audio signal, it can be easily removed by simply cutting off the watermarked portion of the file. Another major drawback is that the size of the watermark signal has to be limited so as to fit within the first block of the audio file.
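A toy version of phase coding can be written with a direct DFT. This Python sketch is illustrative only: it encodes each bit in the phase sign of one low-frequency bin of the first block, whereas a real system would use an FFT and constrain phase differences for inaudibility. All names are my own, not from the literature:

```python
import cmath
import random

# A minimal sketch of phase coding on a single block, using a direct DFT
# (adequate for a short illustration). Conjugate symmetry is maintained so
# the modified signal stays real-valued.

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def embed_phase(block, bits, first_bin=1):
    """Encode each bit in the phase sign of one low-frequency DFT bin."""
    X = dft(block)
    N = len(X)
    for i, b in enumerate(bits):
        k = first_bin + i
        X[k] = cmath.rect(abs(X[k]), cmath.pi / 2 if b else -cmath.pi / 2)
        X[N - k] = X[k].conjugate()   # mirror bin keeps the signal real-valued
    return idft(X)

def extract_phase(block, n_bits, first_bin=1):
    X = dft(block)
    return [1 if cmath.phase(X[first_bin + i]) > 0 else 0 for i in range(n_bits)]

random.seed(0)
host = [random.uniform(-1, 1) for _ in range(64)]
bits = [1, 0, 0, 1, 1, 0]
assert extract_phase(embed_phase(host, bits), 6) == bits
```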
6.3 Echo Addition
Echo addition is an advanced technique for watermarking. It relies on the principle that the human auditory system cannot perceive a low-amplitude signal immediately after a high-amplitude signal. In simple words, the human ear cannot hear a soft sound just after a loud one, and attributes it instead to extra resonance.
Fig. 4 Echo Encoding
The important parameters are the signal amplitude, the decay rate, and the offset of the watermark signal with respect to the parent signal. Strengthening this kind of scheme against attacks requires the addition of many echoes in different parts of the audio signal, which sometimes reduces the SNR of the final signal.
There have been many modifications to this scheme to make it more robust against third-party attacks and this technique is under constant R&D.
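The embedding side of echo hiding is simple enough to sketch. In this illustrative Python version, a bit is encoded by adding a faint, decayed copy of the signal, with the delay (offset) chosen by the bit value; the delays and decay factor below are assumptions, and decoding (normally cepstral analysis) is omitted:

```python
import math

# A sketch of echo addition (echo hiding): a decayed copy of the host is
# added at one of two offsets, and the choice of offset encodes the bit.

def add_echo(samples, bit, delay0=50, delay1=100, alpha=0.3):
    delay = delay1 if bit else delay0
    out = list(samples)
    for n in range(delay, len(samples)):
        out[n] += alpha * samples[n - delay]   # decayed echo at the chosen offset
    return out

host = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(2000)]
marked = add_echo(host, bit=1)
assert marked[:100] == host[:100]    # unchanged before the echo offset
assert marked[150] != host[150]      # echo present afterwards
```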
6.4 Spread Spectrum
The spread spectrum modulation technique relies on the masking effects of the human ear. Watermark data is added to the signal in the time domain or to the FFT coefficients. The benefit of addition in the transform domain is that the watermark can be made robust against multiple attacks such as compression, format conversion, low-pass filtering and re-sampling. This is achieved by modifying only those Fourier coefficients that do not change under these processes.
This is the most common scheme employed in modern watermarking algorithms. The change in each coefficient can be kept small by spreading the changes over a large number of coefficients.
Fig. 5 Model for Spread Spectrum Watermarking
There are many other techniques that use improvements over the original spread spectrum technique. This scheme of watermarking is also under constant research to find better ways to make it more robust against attack and removal.
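The core spread-spectrum idea can be sketched in the time domain. In this illustrative Python version, a key-seeded pseudo-random ±1 chip sequence, scaled well below the host amplitude, is added to the samples, and the detector correlates the received signal with the same sequence; the key and scaling factor are assumptions:

```python
import random

# A sketch of time-domain spread-spectrum watermarking: embed by adding a
# scaled pseudo-noise (PN) sequence, detect by correlating with that sequence.

def pn_sequence(length, key):
    rng = random.Random(key)           # the secret key seeds the PN generator
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed(host, key, alpha=0.01):
    pn = pn_sequence(len(host), key)
    return [h + alpha * c for h, c in zip(host, pn)]

def detect(signal, key):
    pn = pn_sequence(len(signal), key)
    return sum(s * c for s, c in zip(signal, pn)) / len(signal)

random.seed(1)
host = [random.gauss(0, 0.2) for _ in range(20000)]
marked = embed(host, key=42)
# The correlation rises by exactly alpha when the watermark is present:
assert abs((detect(marked, key=42) - detect(host, key=42)) - 0.01) < 1e-9
```

Because the chips average to zero against the host, the correlator output concentrates the watermark energy while the host contributes only noise, which is what makes the small per-sample change detectable.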
In the LSB replacement technique of audio watermarking, the least significant bit of each audio sample is replaced. The embedded data thus occupies amplitude detail that lies below the auditory resolution of the human ear.
LSB coding provides a high channel capacity and thus a lot of data can be embedded into the signal. This technique can be used for audio, video and image files as well. Hence the applications of LSB replacement are many.
LSB replacement has to be performed on the file in binary mode, so any programming language that supports binary file modification can be used. The implementation can be carried out successfully on WAV files in C, C++, PHP, Java, MATLAB and others.
The benefit of using MATLAB for this procedure is that MATLAB has ready-made functions for reading, writing and modifying WAV files. This means that header areas for the file are automatically identified and write operations take place in the DATA track of the file. The structure of the PCM WAVE file is important in this regard if another programming language is to be used.
7.2 PCM WAV File Structure
The headers for WAVE files follow the RIFF structure. The first twelve bytes of the file constitute the RIFF header: a four-byte chunk ID "RIFF", a four-byte chunk data size equal to the file size minus 8 bytes, and the four-byte resource type "WAVE".

Chunk ID: "RIFF"
Chunk Data Size: (file size) - 8
Resource Type: "WAVE"

Table 1 The format and values for different chunks
Wave files usually contain only two chunks: the format chunk and the data chunk. The format chunk carries information on the type of WAVE file, the encoding used and the samples, and sometimes information on the source. The official WAVE specification does not require the format chunk to be placed before the data chunk; however, since most programs look at the beginning of the file for format information, this ordering is generally a good idea. Another reason to conform to this layout is that a player needs information about the file before it can play it: in the case of streaming audio, if the format chunk were placed after the data chunk, the player would need the entire file before starting playback.
Chunk ID: (four-character identifier)
Chunk Data Size: (size of the chunk data in bytes)
Chunk Data Bytes: (the chunk's payload)

Table 2 Basic Chunk Outline ("fmt" chunk)
Chunk ID & Data Size
The chunk ID is always "fmt " (0x666D7420); the ID string ends with a space character, 0x20.

Compression Code
A code based on the WAVE compression code specification, contained in the first word of the chunk data.
No. of Channels
1 for a mono signal and 2 for a stereo signal.
Sample Rate
The number of sample slices per second. This value is unaffected by the number of channels.
Average Bytes Per Second
This is the byte-rate of the file. Useful in determining how to play-back the file and whether streaming conditions can be satisfied.
AvgBytesPerSec = SampleRate * BlockAlign
Block Align
The number of bytes per sample slice.
BlockAlign = SignificantBitsPerSample / 8 * NumChannels
Significant Bits Per Sample
Specifies the number of significant bits for each sample.
Extra Format Bytes
Set only if extra format bytes follow; their meaning depends on the compression code specified. Zero for uncompressed PCM.
Chunk Data Size: 16 + extra format bytes
Compression Code: 1 - 65,535
Number of Channels: 1 - 65,535
Sample Rate: 1 - 0xFFFFFFFF
Average Bytes Per Second: 1 - 0xFFFFFFFF
Block Align: 1 - 65,535
Significant Bits Per Sample: 2 - 65,535
Extra Format Bytes: 0 - 65,535
Extra format bytes: (as specified by the compression code)

Table 3 The "fmt " chunk fields and their valid ranges
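The two derived header fields follow directly from the formulas given above. A quick Python check for the common case of 16-bit stereo CD-quality audio:

```python
# A quick check of the Block Align and Average Bytes Per Second formulas,
# for 16-bit stereo CD-quality audio.

def block_align(bits_per_sample, num_channels):
    return bits_per_sample // 8 * num_channels

def avg_bytes_per_sec(sample_rate, bits_per_sample, num_channels):
    return sample_rate * block_align(bits_per_sample, num_channels)

assert block_align(16, 2) == 4                     # 4 bytes per sample slice
assert avg_bytes_per_sec(44100, 16, 2) == 176400   # the CD-audio byte rate
```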
7.3 Why MATLAB?
After a detailed look at the WAV file format structure it becomes obvious that the easiest way to follow through with LSB replacement would be to use a programming language that directly supports the WAV format through inbuilt functions.
MATLAB has in-built functions to support these operations.
MATLAB is software developed by The MathWorks, Inc. and is used for various scientific computation applications around the world. Among other purposes, it has a number of WAV file functions that make it well suited for direct audio watermarking applications. The WAV functions available in MATLAB are:
y = wavread(filename)
Reads the contents of a WAV file into the variable y.

wavwrite(y, Fs, filename)
Writes the contents of the variable y to a WAV file with sample rate Fs.

[m d] = wavfinfo(filename)
Displays information about a WAV file.
7.4 MATLAB on Mac OS X
MATLAB on Mac OS X runs through the X Window System. It is therefore important to install the X Window System from the OS X install CD before installing MATLAB. Once X11 is installed, MATLAB can be run from OS X like any other application.
Fig. 6 Starting Matlab for Mac OS X
Fig. 7 Select Desktop as the option for starting.
Fig. 8 The MATLAB Window
7.5 Practical Implementation
The first watermarking method being explored is audio interleaving. This works on the principle that amplitude changes confined to the least significant bit (LSB) of a WAV sample are not audible to the human ear. Hence, due to the limited auditory resolution of the human ear, data may be hidden in these bits.
This process can be done in MATLAB and a program has been developed for the same purpose.
The first step is to load the program into MATLAB.
Fig. 9 Program Loaded into MATLAB
Once the program has been loaded and run, it will display the initial WAV file with only 1 in every 50,000 samples displayed.
Fig. 10 Original Wave File (1:50,000 samples)
After displaying the initial WAV file, it converts all the sample values to integers by multiplying them by 10,000 and then displays the resulting waveform. This is done to make the addition calculations easier.
Fig. 11 WAV file with sample amplitude multiplied 10,000x
Then, the program adds the minimum absolute value of the WAV signal to itself to move it above the origin.
Fig. 12 WAV file with samples moved above origin with min. amplitude added.
Finally, after adding the hidden watermark into the WAV, the program displays the resulting output WAV-form.
Fig. 13 WAV file with hidden data.
The Input and Extracted data is shown in the MATLAB window.
Fig. 14 MATLAB Output (Watermark Extraction)
7.6 Result of LSB Replacement:
The result of running this program is that a message of text data has been successfully hidden within the WAV file using MATLAB. The changes made to the WAV file are imperceptible to the human ear and hence this is a successful audio watermark.
The watermark is done in the time domain as the LSB replacement takes place in consecutive samples. This poses obvious limitations on the robustness of this watermark.
Since the input source chosen is a text file, the data is extracted as text. However, even if a binary file (image, video, another audio file) is chosen it can be inserted into the wave file for watermarking.
After insertion there is some evident white noise. Spreading the watermark over a larger number of samples can reduce this.
7.7 Limitations of LSB Replacement:
The key limitations of this watermarking technique are:
Cannot be applied to compressed files
Is not in the frequency domain and hence is subject to degradation by compression.
Extraction depends on already knowing the watermark length and identifying sequence.
We've covered LSB replacement watermarking in the previous section and one of the major limitations of this scheme is the lack of robustness it offers against attacks.
Since intellectual property protection is the main motivation behind research into digital watermarking techniques, a method with stronger resilience needs to be developed.
Frequency Domain watermarking is a more effective technique for watermarking audio files. This is because the watermark is stored in frequencies that are inaudible to the human ear but are not affected by the perceptual shaping of audio frequencies in various compression formats.
This can be done more easily by applying the watermark to an already compressed file, shaping it along frequencies that have not been rejected by the compression codec. For example, applying this technique to an MP3 file would produce a watermark that is robust to re-compression with the MP3 codec.
The software used to access a watermarked file will be equipped with Digital Rights Management technology. When the file is loaded, it will check the watermark against the already loaded license file present on the system. Most modern operating systems readily support this functionality.
A possible video application of watermarking is to have subtitles streamed along-with the video file.
8.1 DC Watermark Insertion
The DC technique hides data in the lower frequency components of the audio signal. These are below the human auditory threshold and are thus imperceptible.
Fig. 15 Watermark Insertion Process
Under this scheme the audio file is divided into frames, each 90 ms in length. This size is chosen to reduce audible distortion in the file. With 90 ms frames, the watermark bit-rate achieved will be:
1/0.09 = 11.1 bps
After framing, a spectral analysis is performed on the signal using a Fast Fourier Transform, which lets us calculate the low-frequency components of each frame. The total frame power is also calculated.
The FFT can be done in MATLAB using the following equation:

F(k) = Σ f(n) · e^(−j2π(k−1)(n−1)/N),  summed over n = 1 … N

where N is the number of samples in each frame.
Most standard audio CDs have a sample rate of Fs = 44,100 samples/second.
Hence a frame consists of 44,100 × 0.09 = 3969 samples. The frequency resolution for a frame with this value of N is:

Δf = Fs / N = 44,100 / 3969 ≈ 11.11 Hz
We can now find the low-frequency DC component of the frame, F(1), as well as the spectral power of the frame. The frame power is the sum of the squared amplitude spectrum.
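The framing arithmetic and the per-frame quantities above can be sketched in Python (an illustrative translation of the MATLAB procedure; the function names are hypothetical):

```python
import math

FS = 44100     # CD-quality sample rate (samples/second)
FRAME_MS = 90  # frame length in milliseconds, as in the text


def frame_parameters(fs=FS, frame_ms=FRAME_MS):
    """Return samples per frame, watermark bit-rate, and frequency resolution."""
    n = int(fs * frame_ms / 1000)  # 44100 * 0.09 = 3969 samples
    bit_rate = 1000.0 / frame_ms   # one watermark bit per frame -> ~11.1 bps
    resolution = fs / n            # Hz per FFT bin
    return n, bit_rate, resolution


def dc_and_power(frame):
    """DC component F(1) (the k=1 DFT bin) and the frame power.

    F(1) of the DFT is simply the sum of the samples.  By Parseval's theorem
    the sum of the squared amplitude spectrum is proportional to the sum of
    squared time-domain samples, so the latter is used here.
    """
    dc = sum(frame)                    # F(1) = sum of the samples
    power = sum(x * x for x in frame)  # proportional to the spectral power
    return dc, power
```

For a 90 ms frame at 44,100 samples/second this gives 3969 samples per frame and a frequency resolution of about 11.11 Hz, matching the values derived above.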
Fig. 16 Spectral Analysis of first eight frames
After the DC component F(1) of the frame is calculated, we can remove it by subtracting the frame mean from every sample:

f′(n) = f(n) − F(1)/N,  n = 1 … N
Watermark Signal Addition:
Now that the spectral power has been calculated, we use it to embed the watermark.
The power of each frame determines the amplitude of the watermark in that frame: the watermark magnitude is derived from the frame power, scaled by a factor Ks that ensures the watermark is added below the human auditory threshold. W(n) represents the watermark signal data, taking a value of either 1 or −1.
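A minimal Python sketch of the embedding step follows. The exact magnitude formula is not reproduced in the source, so it is assumed here that the offset is Ks times the frame's RMS level (derived from the frame power); the function name and this rule are assumptions:

```python
import math


def embed_dc_watermark(frames, bits, ks=0.01):
    """Hypothetical sketch: embed one watermark bit per frame as a DC offset.

    ks is the scaling factor from the text.  The offset magnitude is assumed
    to be ks times the frame RMS, keeping it small relative to the signal for
    small ks.  Each bit is encoded as W(n) = +1 or -1.
    """
    out = []
    for frame, bit in zip(frames, bits):
        n = len(frame)
        dc = sum(frame) / n                    # remove the existing DC first
        power = sum(x * x for x in frame)      # frame power
        amplitude = ks * math.sqrt(power / n)  # assumed magnitude rule
        w = 1 if bit else -1                   # watermark data W(n)
        out.append([x - dc + w * amplitude for x in frame])
    return out
```

After embedding, the mean of each frame carries the sign of the corresponding watermark bit, which is what the extraction stage later recovers.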
8.2 DC Watermark Extraction
The extraction process mirrors the insertion process: the signal is framed, then analyzed, and finally the watermark carrier is identified and processed so that the embedded watermark data can be extracted.
Fig. 17 Extraction of the Watermark
Similar to the previous process of insertion, we partition the signal into frames of 90ms each. With this frame resolution, we expect a watermark bit-rate of 11.1bps.
We now perform an FFT for spectral analysis and calculate the low frequencies as well as the overall frame power, using the same MATLAB equations as in the previous section. As in the insertion process, each frame consists of 3969 samples [for 16-bit CD-quality sound].
Watermark Signal Extraction:
Once the spectral analysis is complete, we know the spectral power of each frame, which lets us examine the power of the low frequencies in each frame. We can now extract the watermark from the sign of each frame's DC component:

w(n) = +1 if the DC component of frame n is positive, −1 otherwise,  n = 1 … N

N = last frame in the audio file
The watermark signal w(n) will be an exact replica of the original inserted signal under the following conditions:
The audio file should have sufficient power/frame to embed information below the human audible threshold.
The audio file should have enough power to embed information above the quantization floor.
Even if these conditions are not met, it is possible to perform some mathematical manipulation and adjust the quantization floor; however, this will introduce some Gaussian noise into the sound signal.
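The extraction stage can be sketched in Python as well (an illustrative translation; the sign-of-DC decision rule is the assumption used here):

```python
def extract_dc_watermark(frames):
    """Hypothetical sketch: recover one watermark bit per frame.

    Since the watermark was embedded as a small positive or negative DC
    offset, the sign of each frame's DC component (the sum of its samples)
    yields the embedded +1/-1 value.
    """
    return [1 if sum(frame) > 0 else -1 for frame in frames]
```

If the two power conditions above hold, this recovers an exact replica of the inserted watermark sequence.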
8.3 Practical Implementation
MATLAB provides the following functions for a practical implementation of watermarking for MP3 files.
y = mp3read(yourfile)
This function, courtesy of Alfredo Fernandez and the LAME MP3 encoder, allows MATLAB to read an mp3 file (with headers) directly into a variable.
y = mp3write(variable, parameters)
This function, again courtesy of Alfredo Fernandez and the LAME MP3 encoder, allows MATLAB to write a variable directly out to an mp3 file.
The practical implementation of mp3 watermarking needs to be done in the frequency domain as explained above since it must be robust against compression from the mp3 codec.
First, the program opens a .WAV file and checks its SNR. Once the file is watermarked, it is converted to an MP3 file.
Fig. 18 The WAV file without a watermark
The WAV file is opened by the program and then read. Once it is loaded the watermark is inserted into the file.
Fig. 19 After watermarking
After watermarking, the WAV file looks like this. A comparison shows that the dynamic range of the file is reduced after watermarking.
Fig. 20 After re-sampling, corruption and compression
The program accepts a number of parameters to re-sample, re-quantize, re-compress and otherwise attack the file in a number of ways. After these operations, the watermark is still detectable, as shown in the figure above.
The parameters that are set are:
Fig. 21 Parameters for watermark corruption
Hence, the signal after watermarking undergoes re-sampling, noise addition, low pass filtering, re-quantization and mp3 compression. After all these operations are performed, the watermark is still detectable in the modified sound file.
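Two of these attacks, noise addition and re-quantization, can be simulated in a few lines of Python to show why the DC watermark survives (the function names and parameter values are illustrative, not the program's actual ones):

```python
import random


def requantize(frame, bits=8, full_scale=1.0):
    """Re-quantize samples to a coarser bit depth (one of the attacks above)."""
    step = 2 * full_scale / (2 ** bits)
    return [round(x / step) * step for x in frame]


def attack_then_extract(frames, noise=0.005, seed=1):
    """Sketch: add uniform noise, re-quantize, then extract via the DC sign.

    The DC offset tends to survive because both attacks are roughly
    zero-mean, so the frame mean keeps its sign.
    """
    rng = random.Random(seed)
    bits_out = []
    for frame in frames:
        attacked = [x + rng.uniform(-noise, noise) for x in frame]
        attacked = requantize(attacked)
        bits_out.append(1 if sum(attacked) > 0 else -1)
    return bits_out
```

With a DC offset comfortably larger than the noise amplitude and the quantization step, the extracted bits match the embedded ones, which is the behaviour the figure above demonstrates for the full attack chain.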
Fig. 22 Program Output
8.4 Result of DC Watermarking
The initial signal-to-noise ratio after watermark insertion is calculated to be 25.44.
After the file is re-sampled, re-quantized, filtered and compressed, the final SNR comes out to be 3.68.
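The SNR figures above can be reproduced with a simple power-ratio computation; this sketch returns a linear ratio (the source does not state whether its values are linear or in dB, so both are shown; the function name is hypothetical):

```python
import math


def snr(signal, processed):
    """Signal-to-noise ratio of a processed signal against the original.

    Returns the linear power ratio; 10 * log10 of it gives the dB value.
    """
    p_signal = sum(x * x for x in signal)
    p_noise = sum((x - y) ** 2 for x, y in zip(signal, processed))
    return p_signal / p_noise
```

A falling SNR after the attack chain means the watermarked file has been degraded, yet the DC watermark remains detectable, which is the point of the comparison above.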
In spite of this, the algorithm is able to detect the watermark in the mp3 file. This shows the relative robustness of the DC watermarking scheme as compared to the LSB scheme.
8.5 Limitations of DC Watermarking
Frequency domain watermarking with the DC watermarking scheme has some major limitations, the main ones being challenges in terms of robustness and data density.
Robustness can be improved to a great extent by using longer audio files and inserting the watermark signal multiple times. This also enables error correction if the signal is manipulated, since the extracted watermark copies can be compared with each other.
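The redundancy-based error correction just described amounts to a per-bit majority vote over the repeated copies, which can be sketched as follows (an illustrative helper, not part of the program above):

```python
def majority_vote(copies):
    """Error-correct repeated watermark copies by per-bit majority vote.

    copies: a list of equal-length bit sequences (+1/-1) extracted from the
    same file.  An odd number of copies avoids ties.
    """
    return [1 if sum(column) > 0 else -1 for column in zip(*copies)]
```

Even if an attack corrupts a bit in one copy, the other copies outvote it, so the recovered watermark is correct as long as fewer than half the copies disagree at any position.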
Data density can be improved by using more advanced techniques such as spread spectrum, phase encoding or echo hiding.
The best watermarking scheme will use a combination of all of these techniques to insert a strong watermark, which is resilient to multiple attacks.
It was found that the robustness of watermarking in the frequency domain was far greater than time domain watermarking. The watermarking scheme employed in the time domain by LSB modification could only sustain some noise addition before corruption of the watermark signal. However, the frequency domain watermark was able to withstand noise addition, compression, re-sampling, and re-quantization.
Audio watermarking in the digital domain is a field of great interest today. The main applications of this technology lie in intellectual property protection and prevention of copyright infringement. With high-speed Internet connections and file sharing applications, it's now easier than ever before to illegally share music and videos over the Internet.
Another application for audio watermarking is the inclusion of meta-data for audio files. Lyrics, notes, comments and other data can be included as part of the audio stream.
This technology is also of great significance in data hiding, cryptography, and steganography. Organizations worldwide are looking into more ways of hiding their important data to protect it. Embedding critical data as a watermark in an audio file to keep it safe is one option for this.