Steganalysis has gained popularity in national security, forensic science and cybercrime investigation, since detecting hidden information, whether ciphertext or plaintext, can help avert serious security incidents. Steganography and steganalysis are challenging fields within information technology because little is known about the specific requirements and characteristics a cover medium must have in order to hide secret messages or to reveal them. The approaches and techniques adopted for analyzing hidden messages also depend on the steganography algorithm used to hide them. In this report, available steganalysis algorithms are reviewed and discussed for the three commonly used cover media: audio, video and image. The audio steganalysis algorithms discussed are based on general characteristics of audio such as high-order statistics and distortion measures of the audio signal (X-M. RU, H-J Zhang and X. Huang, 2005). Video steganalysis algorithms exploit the temporal and spatial redundancies available in video signals, both between frames and within individual frames (MUKKAMALA, K. Kancherla and S., 2009). Image steganalysis algorithms analyze general characteristics of natural images such as inter-pixel dependencies (K. SULLIVAN, U. Madhow, S. Chandrasekaran and B. S. Manjunath, June 2006). The various algorithms proposed for audio, video and image steganalysis are explored in this chapter in terms of 'method', 'algorithm', 'technique' and 'approach', and the algorithms are analyzed. For the purposes of discussion, the term 'stego' refers to a medium that contains hidden secret information, and 'cover' refers to a medium that is used to hide secret information.
Steganalysis methods and algorithms can be classified in various ways, depending on the steganography strategy they target, their objectives, whether they are active or passive, the carrier formats they support and the steganographic methods they are aimed at.
CLASSIFICATION DEPENDING ON THE KNOWN INFORMATION: Steganalysis techniques can be classified by the same criterion that cryptanalysis uses, namely the information known or available to the analyst; in cryptanalysis this is expressed in terms of plain messages and ciphered messages. Steganalysis approaches can likewise be classified according to the information available to the steganalyst (VICO, JESUS DIAZ, SEPTEMBER,2010). There are certain differences between the steganalysis classification and the cryptanalysis one:
Stego only attack: The steganalyst knows only the stego-object.
Known cover attack: The steganalyst knows only the final stego-object and the original carrier.
Known message attack: The steganalyst has the stego-object and the hidden message at his disposal. The complexity of this attack is similar to that of a stego-only attack, apart from the knowledge of the hidden message.
Chosen stego attack: The steganalyst knows the details of the steganographic algorithm and has the final stego-object.
Chosen message attack: The steganalyst creates a stego-object using a message chosen by himself (K. SULLIVAN, U. Madhow, S. Chandrasekaran and B. S. Manjunath, June 2006).
Known stego attack: The steganographic algorithm, the final stego-object and the original carrier are all known to the attacker.
CLASSIFICATION DEPENDING UPON THE DESIRED GOAL: The main goal of steganography is to transfer information so that it passes completely unperceived, and an attacker may have various intentions regarding the hidden information. The attacker may wish to analyze the stego-object in order to determine whether or not it carries secret hidden information. Sometimes he wishes to recover both the stego-key and the secret information, and sometimes he wishes to modify the hidden secret information. The first two cases are passive steganalysis attacks, and the last case is known as an active steganalysis attack.
UNIVERSAL STEGANALYSIS METHODS:
There is no single steganalysis method that works against every steganographic method without false negatives and false positives. Some methods can be applied directly to a specific situation, while others need to be modified to achieve a specific goal. Steganalysis methods (SUBBALAKSHMI, Rajarathnam Chandramouli and K. P., 2004) have been classified into the following groups according to the strategy they use:
Supervised Learning Based Steganalysis: The main objective of this method is to differentiate objects that carry a hidden message from objects that do not. It is generally based on supervised automatic learning methods such as decision trees and neural networks. With an adequate training set, good universal classifiers can be obtained that apply across steganographic methods. The main advantage of this method is that it makes no assumption about the statistical properties of the stego-objects. The disadvantage is that it requires large training sets, which differ depending on the type of steganographic algorithm to be detected. In addition, it is difficult to extract the most discriminative characteristics for classifying possible stego-objects, because for many learning methods this depends on the steganalyst and his experience in the field. False negative and false positive rates are not directly controlled by the steganalyst.
Blind Identification Steganalysis
This method is based on the statistical properties of the carrier and makes no assumption about the steganography algorithm used. This type of analysis may give more precise results because it does not rely on measures obtained from analyzing a subset of specific objects. It aims to extract the hidden information in addition to detecting it. In the paper (R. CHANDRAMOULI, 2003), the author proposes and analyzes a framework for linear hiding algorithms for images, which reduces the problem to estimating the inverse of the transformation matrix used during the hiding process. The framework relies on several conditions: the same stego-key must have been used for two different stego-objects, and further conditions concern the maximum range, the transformed coefficients, the subliminal bits and the Gaussian distribution of the carrier image. If all the conditions are satisfied, an estimate of the subliminal message is produced.
Parametric Statistical Detection Steganalysis
In this type of steganalysis, it is assumed that certain information is available to the attacker and that the attacker has probabilistic methods at hand to estimate what is unknown; some parameters are estimated in order to analyze the steganographic algorithm. This method is not limited to specific steganographic algorithms, and it is able to determine a maximum error rate. It can be used to detect the presence of subliminal information and to recover the stego-key or the hidden message. One important point is that the quality of the parameter estimates determines the effectiveness of the technique.
The name of this method itself suggests its functionality. This approach is a combination of the approaches explained above. Depending on the specific situation and the steganalysis goal, a suitable combination can be chosen. The advantages and disadvantages of this approach depend on the combination made.
This method tries to identify the suspicious parts of an image; its equivalent for audio would be auditory steganalysis. The analysis can be carried out without prior modification of the audio or image under analysis, but in that case the attacker has very little chance of success against any specific steganography technique. Processing the stego-object first and then performing visual steganalysis on the generated image gives surprising improvements in results. In the paper (Andreas Westfeld and Andreas Pfitzmann, 200), various applications are analyzed using this method. The authors show that the common assumption that the least significant bits of image pixels follow a random distribution is false, and they develop a visual steganalysis method on that basis. If the assumption were true, there would be no perceivable difference after substituting the LSBs with random bits, even when attention is focused on the LSB level.
SPECIFIC STEGANALYSIS METHODS
Steganalysis methods can also be targeted. Some apply to a large family of steganography applications, for example all methods that use the same principle to hide information. Others are suitable only for one specific steganography algorithm.
This attack applies to images processed by LSB-type steganographic algorithms. The pairs of values (PoVs) are the pixel values that differ only in their LSB, and each such group of two values forms one pair. When the LSB is modified according to the subliminal message, a pixel value belonging to pair pi remains in the same pair after flipping. If the subliminal message bits follow a uniform distribution, then after hiding information one can expect equal frequencies for the two values of each pair. If the message bits follow another distribution, the expected frequencies change according to that distribution. Using the chi-square test, one can obtain statistical evidence for the existence of subliminal data. If the message is embedded sequentially, the p-value drops drastically once the tested sample extends beyond the embedded part. Chi-square steganalysis is also known as PoV steganalysis. Some methods avoid the chi-square attack by preserving the first-order statistics of the carrier image: the color histogram of the stego image keeps a probability distribution similar to that of the carrier, so the stego image looks like the original and gives no indication of the hidden secret message. This type of method is known as the Preserved Statistical Properties method. It extends classic LSB methods in two ways. First, it processes the image areas according to their statistical properties. Second, it alters the subliminal message bits so that they have the same probability distribution as the LSB bits they replace. The authors (Rainer Bohme and Andreas WESTFELD, 2005) have studied these methods and developed an attack on them that can be seen as an evolution of the first-order statistics technique explained above.
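The pairs-of-values idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `pov_chi_square` is hypothetical, pixel values are assumed to be 8-bit integers, and only the chi-square statistic and its degrees of freedom are computed (turning the statistic into a p-value would need a chi-square distribution function, which is left out here).

```python
from collections import Counter

def pov_chi_square(pixels):
    """Chi-square statistic over pairs of values (2k, 2k+1).

    For each pair, the expected count of the even value under full
    LSB embedding of uniform bits is the mean of the two observed counts.
    Returns (statistic, degrees_of_freedom).
    """
    counts = Counter(pixels)
    statistic = 0.0
    categories = 0
    for k in range(128):  # 128 pairs for 8-bit values (assumption)
        even = counts.get(2 * k, 0)
        odd = counts.get(2 * k + 1, 0)
        expected = (even + odd) / 2.0
        if expected == 0:
            continue  # pair does not occur in the sample, skip it
        statistic += (even - expected) ** 2 / expected
        categories += 1
    return statistic, max(categories - 1, 0)

# Toy data: unbalanced pair (cover-like) versus balanced pair (stego-like).
cover_like = [10] * 50 + [11] * 10
stego_like = [10] * 30 + [11] * 30
cover_stat, _ = pov_chi_square(cover_like)
stego_stat, _ = pov_chi_square(stego_like)
```

A small statistic means the two values of each pair occur about equally often, which is what full LSB embedding of uniformly distributed bits would produce; a large statistic is more consistent with an unmodified image.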
Methods based on fingerprint detection
These methods are based on identifying recognizable patterns that specific steganographic methods are known to produce. For example, a tool may create GIF images whose 256-entry color index contains only 128 unique colors, with two palette entries per color, which is a detectable pattern. This kind of steganalytic attack requires in-depth knowledge and understanding of each steganography method. Some methods are more forensic than steganalytic: they look for traces of known steganographic applications installed on the system. A popular approach is to use stored hash values of the executables and to search memory, the hard drive and the Windows registry for evidence that a given application was installed.
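The GIF palette pattern described above can be checked mechanically. The following is a minimal sketch, assuming the palette has already been read into a list of (R, G, B) tuples; the function name `has_duplicated_palette_fingerprint` is hypothetical and no particular steganography tool is implied.

```python
from collections import Counter

def has_duplicated_palette_fingerprint(palette):
    """Return True if a 256-entry palette holds exactly 128 unique
    colors, each appearing in exactly two entries."""
    counts = Counter(palette)
    return (
        len(palette) == 256
        and len(counts) == 128
        and all(c == 2 for c in counts.values())
    )

# Toy palettes: one with every color duplicated, one with 256 unique colors.
suspicious = [(i, i, i) for i in range(128)] * 2
ordinary = [(i, i, i) for i in range(256)]
```

Here `has_duplicated_palette_fingerprint(suspicious)` is true and the same call on `ordinary` is false. A real detector would also need to tolerate palettes that are only partly duplicated.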
Methods based on transformation function properties
This type of method explores the possibility of identifying the existence of subliminal data by analyzing whether the coefficient or pixel values of the image are compatible with the space-frequency transformation used.
Image steganalysis algorithms are basically of two types: generic and specific. The generic approach covers the class of image steganalysis methods that are independent of the steganography algorithm used to hide information; it produces better results when secret information has been hidden with unconventional or new steganography algorithms. The specific approach covers the class of image steganalysis methods that depend heavily on the steganography algorithm used; their detection success rate is high only if the message was hidden with the algorithm for which the method was designed. Techniques in both categories are designed to detect the existence of secret information; decoding that information is considered a complementary service, not a mandatory one.
Specific Image Steganalysis Algorithms
Image steganography algorithms are mostly based on an embedding mechanism widely known as LSB (Least Significant Bit) embedding. In a 24-bit bitmap image, every pixel is represented by three bytes, one for each of the component colors red (R), green (G) and blue (B). The higher the value of a component, the greater its intensity. A pixel p with the hexadecimal value FF FF FF combines all three primary colors at their maximum intensity, so the color represented is white. LSB embedding relies on the fact that changing the LSB of each of the three bytes of a pixel produces only a small change in the intensity of the color, a difference that is not perceptible to the human eye. For example, changing the color values of pixel p to FE FE FE (hexadecimal) makes the color darker by a factor of only 1/256. LSB-based steganographic algorithms differ in how the pixels to modify are chosen: the modification may affect randomly chosen pixels or be restricted to pixels located in particular areas of the image. Various formats are used to represent an image. The three most commonly used are Joint Photographic Experts Group (JPEG), Bitmap (BMP) and Graphics Interchange Format (GIF). Images in each of these formats behave differently when a message is embedded in them, and different image steganalysis algorithms exist for each of the three formats.
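As a minimal sketch of LSB embedding, the Python below writes one message bit into the least significant bit of each cover byte and reads it back. It assumes sequential embedding into raw color bytes with no key and no pixel selection; the helper names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(cover: bytes, bits):
    """Replace the LSB of the first len(bits) cover bytes with message bits."""
    if len(bits) > len(cover):
        raise ValueError("message is longer than the cover capacity")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | (bit & 1)
    return bytes(stego)

def extract_lsb(stego: bytes, n_bits: int):
    """Read back the first n_bits least significant bits."""
    return [b & 1 for b in stego[:n_bits]]

# A white pixel (FF FF FF) carrying three zero bits becomes FE FE FE.
white = bytes([0xFF, 0xFF, 0xFF])
darker = embed_lsb(white, [0, 0, 0])

message = [1, 0, 1, 1]
stego = embed_lsb(bytes([0x10, 0x11, 0x12, 0x13]), message)
recovered = extract_lsb(stego, len(message))
```

Each byte changes by at most one intensity level, which is why the modification is hard to see but still leaves statistical traces.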
Palette Image Steganalysis
This type of steganalysis is generally aimed at GIF images. The GIF format supports up to eight bits per pixel, and the color of each pixel is referenced from up to 256 distinct colors stored in a palette table, which map to the 24-bit RGB color space. LSB embedding in a GIF image changes the 24-bit RGB color value of a pixel, which may slightly change the palette color of the pixel among the 256 distinct colors stored in the palette table. The strength of a steganographic algorithm lies in reducing the probability of a color value change and minimizing the distortion caused by embedding the secret message. Statistical steganalysis of a GIF stego image can be performed by analyzing the palette table of the image and detecting a visible increase in entropy. The change in entropy is greatest when the embedded message has maximum length (J. FRIDRICH, M. Goljan, D. Hogea and D. Soukal, 2003).
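One way to make the entropy argument concrete is to compute the Shannon entropy of the color indices used by an image before and after embedding. This is a sketch under assumptions: the source does not say exactly which entropy is measured, so the entropy of the pixel index histogram is used here as a stand-in, and `shannon_entropy` is a hypothetical helper.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy index data: embedding evens out the two neighboring palette indices.
before = [4] * 6 + [5] * 2
after = [4] * 4 + [5] * 4
entropy_before = shannon_entropy(before)
entropy_after = shannon_entropy(after)
```

In this toy case the entropy rises from about 0.81 bits to 1 bit, the kind of increase the palette analysis looks for.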
Raw Image Steganalysis
This technique is generally applied to BMP images, which are characterized by a lossless LSB plane. LSB embedding in BMP images causes the two gray-scale values of a pair to be swapped into one another, so embedding a message tends to average the frequencies of the two gray-scale values. The authors (A. PFITZMANN, A. Westfeld, 1999) explain this with a practical example: if a raw image has 40 pixels of one gray-scale value and 20 pixels of the other, then after LSB embedding the expected count of each of the two gray-scale values is around 30. The approach assumes that the length of the embedded message is comparable to the pixel count of the cover image; for smaller messages, the location of the embedded message must be known. In the statistical analysis of an image, when the membership of pixels in the stego image is changed, the detected size of the embedded message for a given message increases.
The authors (J. FRIDRICH, M.Long, 2000) have proposed a steganalysis technique for color bitmap images with LSB embedding that provides fair detection rates for small hidden messages. The technique uses the property that, for a large high-quality BMP image, the number of unique colors is roughly half the number of pixels in the image.
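The 40/20 example can be written out as a short calculation. The symbols below are introduced here for illustration and are not taken from the cited paper: n_0 and n_1 are the counts of the two gray-scale values of a pair before embedding, and the primed symbols are the counts afterwards. The assumption is that every pixel of the pair carries one message bit and that the bits are uniformly distributed, so each pixel ends up with either value with probability 1/2.

```latex
\[
E[n'_0] = E[n'_1] = \frac{n_0 + n_1}{2} = \frac{40 + 20}{2} = 30
\]
```

If only a fraction of the pixels carries message bits, the counts move only part of the way towards this common value, which is why short messages are harder to detect.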
JPEG Image Steganalysis
JPEG is one of the most popular cover image formats used in steganography and steganalysis. Two well-known steganography algorithms hide secret messages in JPEG images: the Outguess algorithm and the F5 algorithm. F5 embeds the message bits in the DCT coefficients using matrix embedding, which minimizes the number of changes needed to embed the message; in doing so it modifies the DCT coefficients. The authors (J. FRIDRICH, M. Goljan, D. Hogea and D. Soukal, 2003) have proposed a method that estimates the unaltered histogram of DCT coefficients in order to find the length of the secret message and the number of changes. The estimate is obtained by decompressing the stego image, cropping it by four columns and recompressing it with the same quantization table; the resulting DCT coefficient histogram is close to that of the original. The same authors (J. FRIDRICH, M. Goljan, D. Hogea and D. Soukal, 2003) have also proposed a technique against the Outguess embedding algorithm. Outguess makes a random choice of DCT coefficients and hides message bits in their LSBs; the remaining unused DCT coefficients are adjusted to keep the histogram intact. The estimate of the original image used against F5 is also helpful against Outguess. Concealing a message in a cover medium introduces noise into the DCT coefficients and generates spatial discontinuities along the boundaries of the 8*8 JPEG blocks. Some of the changes made to the LSBs of DCT coefficients partially cancel one another. When another secret message is concealed in an image that is already a stego image, the increase in discontinuities is smaller. The increase or decrease in discontinuities is used to gauge the size of the secret message.
Generic Image Steganalysis Algorithms
Generic image steganalysis algorithms, widely known as blind or universal steganalysis algorithms, are intended to work on both known and unknown steganography algorithms. They exploit changes in certain innate features of an image, changes that are monotonic and statistical and that result from message concealment. The main aim in developing generic steganalysis algorithms is to identify and distinguish these changes precisely. The accuracy of predicting whether a message is embedded in an image depends heavily on selecting the right image features, which must not vary across different kinds of images.
The authors (I. AVCIBAS, N. Memon and B. Sankur, 2003) have proposed a set of IQMs (Image Quality Metrics) to build a discriminator that differentiates stego images from cover images. They use these IQMs as a steganalysis tool rather than as an indicator of algorithm performance or image quality. The Analysis of Variance (ANOVA) statistical test ranks the IQMs by their F-scores in order to identify the concealment of a message. Success depends on identifying the image quality metrics that are sensitive to steganography, so that the effect of embedding a message can be measured well. Different IQMs measure distortions with different levels of sensitivity, and steganography algorithms differ in the changes they bring to the various IQMs. The authors (I. AVCIBAS, N. Memon and B. Sankur, 2002) have proposed another steganalysis technique, which analyzes the seventh and eighth bit planes of an image and measures the binary similarity between them. The technique measures how embedding a message affects the correlation between these planes. It has been shown that the correlation between contiguous bit planes of an image decreases when a message is embedded.
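The bit-plane idea can be illustrated with a deliberately simple similarity measure. This is a sketch under assumptions: the cited work uses its own binary similarity measures, whereas the code below only computes the fraction of positions at which two bit planes agree, and the two lowest planes (bits 1 and 0) are taken to stand for the seventh and eighth planes. The names `bit_plane` and `agreement` are hypothetical.

```python
def bit_plane(pixels, bit):
    """Extract one bit plane (bit 0 is the least significant)."""
    return [(p >> bit) & 1 for p in pixels]

def agreement(plane_a, plane_b):
    """Fraction of positions where two equally long bit planes match."""
    if len(plane_a) != len(plane_b) or not plane_a:
        raise ValueError("planes must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(plane_a, plane_b))
    return matches / len(plane_a)

# Toy cover in which the two lowest bit planes are perfectly correlated.
cover = [0b11, 0b00, 0b11, 0b00]
# Same pixels after the LSBs were overwritten with the bits 0, 1, 1, 0.
stego = [0b10, 0b01, 0b11, 0b00]

cover_similarity = agreement(bit_plane(cover, 1), bit_plane(cover, 0))
stego_similarity = agreement(bit_plane(stego, 1), bit_plane(stego, 0))
```

In this toy case the similarity falls from 1.0 to 0.5 after embedding, mirroring the reported drop in correlation between contiguous bit planes.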
Very few video steganalysis techniques are available, owing to their low performance. Image steganography methods can be applied to video sequences on a frame-by-frame basis.
Video Steganalysis via measuring temporal correlation between two frames
The authors (U. BUDIA, D. Kundur and T. Zourntos, 2006) have proposed a method for video steganalysis that uses the redundant information present in the temporal domain of a video as a deterrent against hidden messages concealed by spread spectrum steganography. Using linear collusion approaches, a hidden watermark of low energy can be identified with good precision. The experimental results also show the superiority of temporal correlation based methods over spatial methods in detecting secret messages.
The figure below shows the video steganography and steganalysis system used in (U. BUDIA, D. Kundur and T. Zourntos, 2006). To start the process, the sender conceals a secret message vector in the cover video sequence in order to produce a stego sequence that appears identical to the original cover video sequence. The secret message bits are hidden by modulating them into a signal known as the watermark and embedding it in the cover video sequence.
The stego video sequence is then communicated over the Internet to the receiver. Using the secret key and the stego video, the receiver can extract the hidden message. While the message is in transit, it may be intercepted by a vigilant steganalyst. The presence of hidden secret information in a cover medium can be identified by detecting the watermark. Even if the sender inserts the watermark in a non-spatial domain such as the DCT domain, it is treated as defined over the same spatial domain as the cover medium.
The authors (U. BUDIA, D. Kundur and T. Zourntos, 2006) have measured the importance of temporal correlations in video steganalysis. They have created a framework based on Gaussian spread spectrum embedding.
There are basically two essential blocks in video steganalysis: (i) a pattern recognition stage, which is used to detect steganographic activity, and (ii) a watermark attack stage, which is used to estimate the cover medium from the possibly watermarked stego media. Various algorithms can be substituted for each of these blocks to produce better steganalysis methods for different kinds of applications. This block-based approach allows current advanced methods and algorithms for pattern recognition and watermark attacks to be used.
The authors (U. BUDIA, D. Kundur and T. Zourntos, 2006) have developed a steganalysis algorithm that takes advantage of the temporal redundancy available in video. The proposed method improves overall performance compared with spatial methods that operate on a frame-by-frame basis. Simple linear collusion has the advantage of low complexity, which makes it suitable for real-time video. The statistical framework also shows how the redundancy available in the cover video can be exploited to detect hidden watermarks. Large inter-frame correlation gives high collusion performance. It is also shown that the detection rate of the steganalytic process increases as the embedding strength of the watermark increases, so a more robust watermark is also more likely to be detected. Conversely, a low embedding strength leaves the watermark vulnerable to attack and easy to remove. Hence, a moderate watermark strength is suggested as a trade-off between the two.
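A minimal sketch of simple linear collusion, assuming it amounts to averaging temporally neighboring frames to estimate the cover frame and then inspecting the residual. Frames are represented as flat lists of pixel values, and the names `collude` and `residual` and the `radius` parameter are hypothetical; the pattern recognition stage that would follow is not shown.

```python
def collude(frames, index, radius=1):
    """Estimate the cover frame at `index` by averaging the frames in a
    temporal window of `radius` frames on each side."""
    lo = max(0, index - radius)
    hi = min(len(frames), index + radius + 1)
    window = frames[lo:hi]
    n = len(window)
    return [sum(f[i] for f in window) / n for i in range(len(frames[index]))]

def residual(frame, estimate):
    """Difference between a frame and its cover estimate; for highly
    correlated frames this approximates the embedded watermark."""
    return [a - b for a, b in zip(frame, estimate)]

# Toy static scene [10, 20, 30] with a different watermark in each frame.
frames = [
    [11, 19, 31],  # cover + [ 1, -1,  1]
    [9, 21, 29],   # cover + [-1,  1, -1]
    [10, 20, 30],  # cover + [ 0,  0,  0]
]
estimate = collude(frames, 1)
watermark_estimate = residual(frames[1], estimate)
```

Because the toy watermarks cancel in the average, the estimate recovers the cover exactly. In real video the quality of the estimate depends on how strongly consecutive frames are correlated, which is why large inter-frame correlation gives better collusion performance.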
Video Steganalysis based on ARE (Asymptotic Relative Efficiency)
A video steganalysis algorithm based on asymptotic relative efficiency has been proposed by (J. S. JAINSKY, D. Kundur and D. R. Halverson, 2007). The detection technique is better suited to applications in which only a subset of the video frames is watermarked with the hidden message. The video signal is assumed to obey a Gauss-Markov temporal correlation model, which is consistent with a sequence of correlated image frames. The steganalysis is a two-step process: a signal processing step followed by a detection phase. The signal processing step emphasizes the presence of a hidden message in the sequence of frames with the help of a motion estimation scheme. The detection phase is based on ARE (Asymptotic Relative Efficiency), in which both the watermarked hidden message and the cover video are treated as random variables. ARE-based detection is memoryless and uses an adaptive threshold that depends on the video characteristics, which is useful in differentiating a stego-video from a cover-video. Video characteristics such as standard deviation, size and correlation coefficient can vary from one sequence of frames to another. The number of frames to be analyzed at each pass is a parameter of the detector.
Video Steganalysis based on Mode Detection
The Moscow State University stego video software is one of the very few existing video steganographic tools. It can conceal messages in any Audio Video Interleave (AVI) file, and the concealed messages can be retrieved correctly even after the stego-videos are compressed. The steganalysis algorithm uses the similarities and correlation between consecutive frames to detect a mode of distribution across frames. The 32*32 pixel block is the embedding unit, and the 16*16 blocks within a unit are used as a distribution pattern. After correlated frames have been analyzed, if the ratio of 32*32 pixel blocks showing the specific mode to the total number of 32*32 pixel blocks is above a threshold value, the video signal is assumed to carry a hidden secret message.
Video Steganalysis based on Temporal and Spatial Prediction
The authors have proposed a video steganalysis scheme (V. Pankajakshan and A. T. S., 2007) for the MPEG video coding standard. In this coding standard, a frame is predicted from its neighboring frames with the help of motion compensation. The MPEG video coding standard supports two types of predicted frames: B-frames and P-frames. P-frames use a single past frame as the reference frame, and B-frames use both a past frame and a future frame as reference frames.
With the rapid development of communication protocols and media, audio services such as Voice over Internet Protocol (VoIP) and Peer-to-Peer (P2P) offer plenty of opportunities for voice communication. With steganography tools, small modifications to the binary audio signal can easily make covert communication a reality. Audio sequences have an inherently unpredictable nature and characteristic redundancy, which make them popular as a cover medium for hiding information in secret communication.
Audio steganalysis is very hard compared with image and video steganalysis, owing to the nature of audio signals and the existence of advanced audio steganography schemes. Statistical analysis of audio signals is very challenging because it has to deal with high-capacity data streams (C. Kraetzer and J.Dittmann, 2008). The audio steganalysis algorithms proposed by different researchers are described below.
Phase and Echo Steganalysis
The authors have proposed audio steganalysis algorithms that identify echo steganography based on the statistical values of peak frequencies (W. Zeng, H. Ai and R. Hu, 2008) and that detect phase coding steganography based on an analysis of phase discontinuities (W. Zeng, H. Ai and R. Hu, 2008, 2007). The phase steganalysis algorithm exploits the fact that phase coding corrupts the continuity of the unwrapped phase in each audio segment and causes major changes in the phase difference. A statistical analysis of the phase difference can therefore be used to detect changes in the audio signal and to train classifiers that differentiate embedded audio signals from clean audio signals.
Universal Steganalysis based on Recorded Speech
The authors (M. K. JOHNSON, S. Lyu, H. Farid, 2005) have proposed steganalysis algorithms based on the statistical regularities of recorded speech. Their statistical model decomposes an audio signal using basis functions localized in both the frequency and time domains, in the form of the STFT (Short Time Fourier Transform). Statistics collected from these spectrogram decompositions are analyzed using non-linear support vector machines to differentiate between stego and cover audio signals. This approach works only for high bit-rate embedding and is not effective at detecting low bit-rate embedding.
Use of Statistical Distance Measures for Audio Steganalysis
This technique uses audio quality metrics to capture the anomalies that embedded data introduces into a signal. The authors have introduced a method for calculating the distribution of different statistical distance measures for stego audio signals and cover audio signals. The distributions of these measures were found to be statistically different for signals with and without embedded data in various situations. The steganalyzer is designed around these audio quality measures (H. OZER, I. Avcibas, B. Sankur and N. D. Memon, 2003), which were also tested according to their perceptual or non-perceptual nature. The quality measures and features are selected using the ANOVA test, which determines whether statistically significant differences exist between conditions, and the Sequential Floating Search (SFS) algorithm, which takes the inter-feature correlation between test features into account in the ensemble (P. PUDIL, J. Novovicova and J. Kittler, 1994). Two classifiers, one a linear classifier and the other an SVM (support vector machine), were evaluated in order to detect stego messages embedded in the audio signals.
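To illustrate how an ANOVA test can rank a candidate feature, the sketch below computes the one-way ANOVA F-statistic for one quality measure observed on cover and stego signals. The measurements are made-up toy numbers, the function name `anova_f` is hypothetical, and the sequential floating search step is not shown.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F-statistic: between-group mean square divided by
    within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf")  # groups have no internal spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy values of one quality measure on cover and stego signals.
cover_scores = [1.0, 2.0, 3.0]
stego_scores = [4.0, 5.0, 6.0]
f_score = anova_f([cover_scores, stego_scores])
```

A feature with a larger F-score separates the two groups more clearly relative to its spread within each group, so it is a better candidate for the classifier.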
Audio Steganalysis based on Hausdorff Distance
The authors (Y. LIU, K. Chiang, C. Corbett, R. Archibald, B. Mukherjee and D. Ghosal, 2008) have proposed an algorithm that uses the Hausdorff distance (P. HUTTENLOCHER, G. A. Klanderman and W. J. Rucklidge, 1993) as a measure of the distortion between a stego audio signal and a cover audio signal. The algorithm takes a potentially stego audio signal x as input and uses its de-noised version as an estimate of the cover audio signal. The signal x and its de-noised version are then divided into segments. Wavelet decomposition is applied to each segment in order to generate wavelet coefficients (T. HOLOTYAK, J. Fridrich and S. Voloshynovskiy, 2005) at different levels of resolution. Finally, the Hausdorff distance between the wavelet coefficients of the audio signal and those of its de-noised version is measured.
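The distance computation itself can be sketched as follows. This is an illustration under assumptions: the two inputs are treated as finite sets of scalar coefficients, the wavelet decomposition and de-noising steps are not shown, and the function names are hypothetical.

```python
def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(abs(x - y) for y in b) for x in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty sets of scalars."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy coefficient sets for a signal segment and its de-noised version.
coefficients = [0.0, 1.0, 2.0]
denoised = [0.0, 1.5, 5.0]
distance = hausdorff(coefficients, denoised)
```

For these toy sets the directed distances are 0.5 and 3.0, so the Hausdorff distance is 3.0. The brute-force comparison is quadratic in the set sizes, which is acceptable for short segments.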
Audio Steganalysis for High Complexity Audio Signals
Recently, the author (I.AVCIBAS, 2006) has proposed the use of data mining techniques in the steganalysis of high complexity audio signals. This approach extracts second-order Markov transition probabilities and statistics of the high frequency spectrum as the main features of audio signal streams. Features derived from the variations in the second order derivative are then explored in order to distinguish between stego audio signals and cover signals. The approach makes wide use of Mel-frequency cepstral coefficients, which are useful in speech recognition and in audio steganalysis. More recently, two methods for the steganalysis of spread spectrum information hiding in audio have been proposed (WEI ZENG, Ruimin Hu and Haojun Ai, 2009).