Acoustic Cryptanalysis Side Channels Computer Science Essay



Acoustic cryptanalysis is a side-channel attack that exploits sounds, audible or not, produced during a computation or an input-output operation. Side-channel attacks are one class of methods for extracting information from supposedly secure systems: cryptanalytic techniques that rely on information unintentionally leaked by computing devices. Cryptanalysis is the study of methods for obtaining the meaning of encrypted information without access to the secret information that is normally required to do so. Typically, this involves knowing how the system works and finding a secret key. In non-technical language, it is the practice of code breaking. Most side-channel attack research has focused on electromagnetic emanations (TEMPEST), power consumption and, more recently, diffuse visible light from CRT displays. The oldest eavesdropping channel, acoustic emanations, has received little attention. Our preliminary analysis of acoustic emanations from personal computers shows them to be a surprisingly rich source of information on CPU activity.


Keyboards are the most common input devices. Specialized keyboards, such as keypads, are widely used for specialized data input, and keypads are often used to protect a security system. In Hong Kong, keypads are widely used in the banking industry and in housing estates. Side-channel leakage of keypad data is therefore a potential threat to these security systems. According to research conducted by Doug Tygar, a researcher at the University of California, Berkeley, the clicks and clacks of a computer keyboard can be turned into a startlingly accurate transcript.

Imagine that you are typing your password at an automated teller machine (ATM). The ATM produces similar electronic sounds each time you press a key. You think nobody knows what you are typing, since the sounds are not distinguishable to human ears. That is a mistake. An attacker can use the mechanical sounds that the keys emit to guess your password. With a few microphones, a computer and sound-processing software, the keys pressed can be recovered. The fact that an attacker can use acoustic emanations from keypads to collect confidential information is a serious concern in the security and privacy communities. In this project, we simulate the keyboard acoustic triangulation attack. Earlier work distinguishes keys by their frequency components. We introduce a new attack based on time differences. This approach works even when the mechanical sound is veiled by electronic sound.

Dheeraj Basant Shukla, Shri Satya Sai Institute of Science and Technology, Sehore (M.P.)

Side Channel Attack

A side channel attack is an emerging security issue in cryptography. It refers to any attack that gains information from the physical implementation of a cryptosystem rather than from theoretical weaknesses in its algorithms. Usually, an attacker does not need a thorough technical understanding of the internal operation of a system in order to perform a side channel attack.

Acoustic Cryptanalysis

Acoustic cryptanalysis is a type of side channel attack that extracts information unintentionally leaked through sounds produced during a computation or input-output operation. It is a research area of applied cryptography that has gained more and more interest since the mid-nineties. According to the book "Spycatcher", written by former MI5 operative Peter Wright, a similar attack technique was used as early as 1956. At that time, an attack named "ENGULF" was used against Egyptian Hagelin cipher machines.

Today, attackers collect sounds produced by a computer system during computations or input-output operations and analyze them by applying mathematical algorithms to the acoustic signals. Experiments show that valuable and distinguishable information can be extracted from those sounds. An example of acoustic cryptanalysis is the experiment conducted by Adi Shamir and Eran Tromer in 2004. They demonstrated that it may be possible to conduct a timing attack against a CPU by analyzing the variations in its humming noise.

Acoustic Triangulation Attack

Triangulation is a process of finding the distance to a point using the geometry of a triangle. The distance is treated as one side of a triangle and is calculated from measured angles and sides. Triangulation is a common technique for locating an object and is often used in surveying, navigation, metrology and astrometry. An acoustic triangulation attack finds the location of an object, here a key, from measurements of the acoustic waves generated by a keystroke. By detecting and measuring the difference in arrival times of the sound wave at two microphones, the impact location can be found uniquely.

A Keyboard

Fig. 1. An acoustic triangulation attack

A mechanical keyboard consists of a number of keys and a circuit board (Figure 1.2). On the keyboard there are many rubber buttons. Each key corresponds to a button on the board. Each key in a keyboard (Figure 1.3) generally consists of three parts:

a) A head -- This is the part of the key that the finger touches.

b) A bottom rubber part -- The dome-shaped rubber is used to make contact with an electrical switch corresponding to the key.

c) An intermediate plastic part in between the head and the rubber.

When a key is pressed, the dome-shaped rubber is squeezed. It then pushes the electrical switch and closes the circuit. Upon strike, the keyboard plate vibrates and produces a sound.


This chapter is divided into four parts. Firstly, the generalized approach for an acoustic attack is presented. Secondly, we discuss the details of the new attack, the acoustic triangulation attack, including the computation methodologies and experimental procedures. Thirdly, we evaluate the results and recognition rates from a set of experiments. Lastly, we suggest some ways to defend against the new attack.

Generalized Approach

Summarizing different acoustic approaches, an acoustic attack can be generalized in three main steps:

Sound Collection

The first step of an acoustic attack is to collect sound waves. When a sound is recorded into the computer, analogue sound waves are converted into digital signals. During this process, the recording parameters should be set carefully so that the digital waveform is very close to the original acoustic one. Preprocessing is also required to make the signal more distinguishable.

Signal Analysis

After the signal is recorded, a characteristic feature must be extracted to distinguish between different signals. In our proposed attack, the feature is the difference in the time a sound takes to reach two microphones. The analysis technique also matters, since it determines how accurately and how completely information is recovered from a signal. In our experiment, we propose two analysis methods: the maximum peak position approach and the correlation approach.

Statistical Classification

Before classification, the system must be trained on the features of each key, usually by feeding it a large set of sample data. An efficient decision algorithm is then needed to determine which key a signal belongs to. Neural networks and hidden Markov models are commonly used for this purpose.

Acoustic Triangulation Attack

Since sound travels at a finite speed, its travel time increases with distance. By measuring the difference in arrival times of the sound wave at two microphones, we are able to determine the difference between the distances from the source to each microphone. In this experiment, we use two microphones to locate a key. The time difference is defined as the arrival time at the first microphone minus the arrival time at the second. By showing that the time delay is unique for every key, we can find the location of a particular key. In this section, the computation methodologies and experimental procedures are discussed in depth. We also predict the results and evaluate the errors that may occur.


The two microphones X and Y are placed according to Fig 2.



Fig 2. An experimental setup

Let Dix be the distance between key i and microphone X, and Diy the distance between key i and microphone Y. The keystroke emits a sound that travels with velocity V.

When key i is pressed, the sound wave it produces is received by both microphones. Let x(t) be the time required for the sound wave to reach microphone X and y(t) the time required for it to reach microphone Y.

Therefore, x(t) = Dix / V and y(t) = Diy / V.

The time delay of the received signal between microphones X and Y is

ti = x(t) - y(t)

= Dix / V - Diy / V

= (Dix - Diy) / V

For a key to be distinguishable, we assume that the delay ti is unique for each key. Under this assumption, we can tell which key is pressed just by measuring the delay ti.
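The relation ti = (Dix - Diy) / V can be checked numerically. The following is a minimal sketch, not the essay's own program: the microphone and key coordinates are hypothetical values chosen only for illustration, and the speed of sound is the 344 m/s figure quoted later in the essay.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, the value the essay quotes for about 20 degrees C


def arrival_delay(key_pos, mic_x, mic_y, v=SPEED_OF_SOUND):
    """Return ti = (Dix - Diy) / V in seconds for a key located at key_pos."""
    d_ix = math.dist(key_pos, mic_x)  # distance from key i to microphone X
    d_iy = math.dist(key_pos, mic_y)  # distance from key i to microphone Y
    return (d_ix - d_iy) / v


# Hypothetical layout in metres: two microphones on a horizontal line,
# 20 cm away from a row of three keys spaced 1.5 cm apart.
MIC_X = (-0.10, 0.20)
MIC_Y = (0.10, 0.20)
KEYS = {"1": (-0.015, 0.0), "2": (0.0, 0.0), "3": (0.015, 0.0)}

delays = {name: arrival_delay(pos, MIC_X, MIC_Y) for name, pos in KEYS.items()}
for name, delay in delays.items():
    print(f"key {name}: ti = {delay * 1e6:+.1f} microseconds")
```

In this layout the middle key is equidistant from both microphones, so its delay is zero, while the keys on either side give delays of opposite sign. Each key therefore has a different ti, which is the uniqueness assumption the attack relies on.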

Keyboard Acoustic Characteristics

A mechanical keyboard consists of a number of keys and a circuit board. Each key corresponds to a button on the board. A key generally consists of three parts: a head, a bottom rubber part and an intermediate plastic part between the head and the rubber. When a key is pressed, the dome-shaped rubber is squeezed. It then pushes the electrical switch (Fig 3) and closes the circuit. Upon strike, the keyboard plate vibrates and produces a sound.

Although the sounds emitted by the numeric keys seem similar to human ears, they are in fact different and can be differentiated by computational analysis. The sounds differ because the keyboard plate is struck at different positions. Consider the acoustic mechanism of a drum: when it is struck at different positions on the plastic plate, different timbres are produced. By a similar principle, striking the keyboard plate at different positions causes the plate to make different sounds. Thus, the frequency components of different keys are different.

When a key is pressed, it actually produces two sounds. Fig 3 shows the acoustic signal of one click, which lasts approximately 100 ms. The acoustic signal has two distinct peaks, a press peak and a release peak, corresponding to the pushing and releasing of the key, with relative silence between them. Since the first peak is more significant than the second one, we consider only the first one.

Fig 3. The acoustic signal of one click, with the press peak and the release peak marked

Computation Methodologies

The goal of our experiment is to develop an easy way to find the difference in arrival times of the sound wave at two microphones. We therefore process the input signals with simple methods. The two methods used in our experiments are:

-The Maximum Peak Position Approach

From some simple experiments, we found that a keystroke contains some very sharp peaks. Assuming that noise is not large enough to interfere with the signal significantly, the positions of the sharp peaks are largely unaffected. Hence, we can find the time difference at the receivers by comparing the positions of the sharp peaks in the two waves.

Fig 4. Explanation of using peak values as a reference point

A typical example is shown in Fig 4. We choose a few sharp peaks as reference points and compare the differences between them. In this case, the difference in received time is (T2-T1).

This comparison assumes that noise has little influence on the sharp peaks. In a real situation, however, noise shifts the positions of the maximum peaks randomly. The received time differences are therefore expected to vary within a range.
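The peak comparison can be sketched in a few lines. This is an illustrative sketch rather than the essay's Matlab code: it assumes a single reference peak per signal, taken as the sample with the largest absolute amplitude, and the toy signals are invented for the example.

```python
def max_peak_index(signal):
    """Index of the sample with the largest absolute amplitude."""
    return max(range(len(signal)), key=lambda i: abs(signal[i]))


def peak_delay_samples(x, y):
    """Received time difference in samples: peak position in x minus peak position in y."""
    return max_peak_index(x) - max_peak_index(y)


def samples_to_seconds(n_samples, sample_rate=96_000):
    """Convert a sample count to seconds; 96 kHz is the essay's recording rate."""
    return n_samples / sample_rate


# Toy signals: the same click arrives three samples later at microphone Y.
x = [0.0, 0.1, 0.9, -0.3, 0.05, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 0.0, 0.1, 0.9, -0.3, 0.05]

delay = peak_delay_samples(x, y)
print(delay, samples_to_seconds(delay))
```

Because only one sample per signal decides the result, a noise spike that moves either peak changes the estimate directly, which is why the measured differences scatter within a range.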

-The Correlation Approach

The correlation between two signals, cross correlation, is a standard approach to feature detection and pattern recognition. A cross correlation function measures the similarity of two signals. It is commonly used to find features in an unknown signal by comparing it to a known one.

Let x(i) and y(i) represent the digital signals received by microphone X and microphone Y respectively, where i = 0, 1, 2, …, N-1. The cross correlation r at delay d is defined, in its basic unnormalized form, as

r(d) = Σ x(i) y(i-d), summed over i = 0, 1, …, N-1

r(d) reaches its greatest value when x(i) overlaps with y(i-d) so that their large amplitudes coincide (Fig 5). In this way, we obtain the delay d, which is equal to the time difference between the sounds received by the microphones. We can also use Matlab to find d by plotting a graph of r(d) against d.

Fig 5. A graph of r(d) against d
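The search for the delay that maximizes r(d) can be sketched directly from the description above. This sketch assumes the plain unnormalized sum of x(i) times y(i-d), treats samples outside the recorded range as zero, and limits the search to a hypothetical window of delays. The function names and toy signals are illustrative only.

```python
def cross_correlation(x, y, d):
    """r(d) = sum over i of x[i] * y[i - d]; out-of-range samples count as zero."""
    total = 0.0
    for i in range(len(x)):
        j = i - d
        if 0 <= j < len(y):
            total += x[i] * y[j]
    return total


def best_delay(x, y, max_delay):
    """Delay d in [-max_delay, max_delay] at which r(d) is largest."""
    return max(range(-max_delay, max_delay + 1),
               key=lambda d: cross_correlation(x, y, d))


# Toy signals: y is x shifted three samples later.
x = [0.0, 0.1, 0.9, -0.3, 0.05, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 0.0, 0.1, 0.9, -0.3, 0.05]

print(best_delay(x, y, max_delay=5))
```

Unlike the single-peak comparison, every sample contributes to r(d), so an isolated noise spike has less influence on the estimated delay.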

-Advantages of the New Approach

The previous approach relies on the different sounds produced by keystrokes. However, such an acoustic emanation attack can be prevented by making the keys sound alike or by masking them with an electronic sound of the same frequency. With these defenses, it is harder to analyze the keystrokes using the old approach. Moreover, analyzing the frequency domain requires deep knowledge of speech recognition and classification techniques, and considerable research must be done beforehand in order to perform this attack.

In the new approach, by contrast, we do not need difficult acoustic theories and techniques. We only need to examine the press peak and release peak of a keystroke. These time-domain properties can be found with many sound-processing programs, e.g. GoldWave. A simple Matlab program can then distinguish the signals of different keys. Both programs are common and easy to use.

-Experimental Setup

Keyboards. We used a Dell PS/2 keyboard, P/N 7N242.

Microphones. Multimedia condenser type microphone. Sensitivity: -42dB, 0dB + 1Pa, 1kHz; frequency response: 20 Hz - 20 kHz; impedance: 2000 ohm; 3.5 mm stereo jack. (HKD $20)

Computers. We used Dell computers in IE Computing Lab ERB 1008. They are equipped with 2.8 GHz Pentium(R) 4 CPUs and SoundMAX Integrated Digital Audio sound cards.

Software. The input was digitized using a standard PC sound card. GoldWave (free trial version) was used to record the sound in mono at a 96 kHz sampling rate. Matlab version 7.0.1 was used to compare the waveforms and analyze the recognition rate.


Firstly we placed the microphones on a horizontal line. The microphones were placed approximately 20 cm away from the keypad. (Fig 6) Then we recorded the sounds emanated using two computers.

Fig 6. The actual experimental setup

For each set of samples, we pressed the same key a number of times with similar strength. We then processed the signals in GoldWave before analysis and fed them into a program written in Matlab. The signals were analyzed with two approaches, the peak difference approach and the correlation approach. Finally, we compared the recognition rates obtained from the two approaches.

--Pre-processing the Recorded Sounds

When the keystrokes were recorded, background noise was recorded as well. Because keystroke sounds are comparatively weak, noise affects the signal strongly. Since the signals can barely be distinguished from random noise, it is better to reduce the noise before processing. Due to time limitations, we did not develop our own noise-reduction function and instead applied the "noise reduction" filter in GoldWave, which performs quite well. This filter removes noise using frequency analysis. After processing, the signal is visibly much clearer.

--Extracting the Keystrokes

A sound wave file contains many keystrokes, which must be cut out one by one for comparison. First, we read the two processed sound waves into Matlab. Each sound wave is treated as a single array that stores the wave amplitudes as values. Two functions, "chopping" and "compare", are then written to extract single keystrokes.

The objective of the function "chopping" is to find two points representing the beginning and the end of a keystroke signal in a sound wave. First, the function takes a sound wave matrix and an initial search point. It then checks the values sequentially from the initial point. If the absolute amplitude of a sample is greater than a preset threshold, that sample is regarded as the beginning of a keystroke. According to our observation, a key pressed with normal strength can usually be detected with a threshold of 0.1. The first point that meets this requirement is regarded as Point A. The start point is then set to Point A minus 2000 (Fig 7), which ensures that the complete beginning of the keystroke is included.

After the start point is found, the function searches for the end point using a second threshold. We found that, after noise reduction, samples with absolute amplitudes below 0.02 are generally not part of a keystroke signal, so the second threshold is set to 0.02. When there are 2000 successive points with amplitude lower than 0.02, the last of them is taken as Point B. The end point is set to Point B plus 2000 to capture the complete keystroke (Fig 8). Finally, the function outputs the start and end points to the function "compare".

Fig 7. The start point and end point of a typical keystroke
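The threshold search performed by "chopping" can be sketched as follows. This is a Python approximation of the described behaviour, not the original Matlab function. The clamping at the ends of the array and the fallback when no quiet run is found are assumptions added so the sketch runs on short inputs.

```python
def chopping(wave, search_from, loud=0.1, quiet=0.02, pad=2000):
    """Return (start, end) sample indices of the next keystroke, or None.

    loud, quiet and pad default to the essay's values (0.1, 0.02, 2000).
    """
    n = len(wave)
    # Point A: first sample whose absolute amplitude exceeds the loud threshold.
    point_a = next((i for i in range(search_from, n) if abs(wave[i]) > loud), None)
    if point_a is None:
        return None  # no keystroke left in the recording
    start = max(point_a - pad, 0)  # assumption: clamp to the array start

    # Point B: last sample of the first run of `pad` successive quiet samples.
    run = 0
    point_b = n - 1  # assumption: fall back to the last sample if no run is found
    for i in range(point_a, n):
        run = run + 1 if abs(wave[i]) < quiet else 0
        if run == pad:
            point_b = i
            break
    end = min(point_b + pad, n - 1)  # assumption: clamp to the array end
    return start, end


# Toy recording with one click, using a short pad so the example stays small.
wave = [0.0] * 10 + [0.5, -0.3, 0.2] + [0.0] * 20
print(chopping(wave, 0, pad=5))
```

Calling the function again with `search_from` set to the previous end point walks through the remaining keystrokes one at a time.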


The objective of the function "compare" is to extract slices of the chopped signals such that they are synchronized in time. First, the function reads in two wave matrices and passes each one to the function "chopping". The two resulting pairs of start and end points are then compared to obtain a common pair: the earlier of the two start points becomes the common start point, and the later of the two end points becomes the common end point.

Fig 8. The selection of the common start and end point

Secondly, the data within the selected range is copied from both sound matrices and saved in two independent matrices for later comparison. The initial point for the next comparison is set to the common end point. This process is repeated until all the keystrokes are extracted. Finally, the two processed signal matrices are output to the analysis functions.
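The range merging and slicing done by "compare" might look like the sketch below. It assumes the start and end points for each recording have already been found, and the helper names `common_range` and `extract_pair` are hypothetical, introduced only for this illustration.

```python
def common_range(range_x, range_y):
    """Earlier of the two start points and later of the two end points."""
    (start_x, end_x), (start_y, end_y) = range_x, range_y
    return min(start_x, start_y), max(end_x, end_y)


def extract_pair(wave_x, wave_y, range_x, range_y):
    """Copy the same index range from both recordings.

    Returns the two slices and the common end point, which serves as the
    initial search point for the next keystroke.
    """
    start, end = common_range(range_x, range_y)
    return wave_x[start:end + 1], wave_y[start:end + 1], end


# Toy data: index values stand in for amplitudes so the slices are easy to read.
wave_x = list(range(40))
wave_y = list(range(100, 140))
slice_x, slice_y, next_start = extract_pair(wave_x, wave_y, (5, 22), (8, 25))
print(len(slice_x), len(slice_y), next_start)
```

Cutting both recordings at identical indices keeps the two slices aligned in time, so any remaining offset between them reflects the arrival-time difference rather than the cutting.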

--Computing the Received Time Difference

As mentioned, there are two methods of computing the received time difference: the maximum peak position approach and the correlation approach. In the experiment, we process the signals with each method separately and evaluate their respective performance.

One function is written for each method. Both functions are divided into two parts: the first part is the main computation, and the second part is the recognition calculation, which is the same for both functions. In this section, we discuss only the first part of these two functions.


From the function "compare", we get two matrices of chopped signals. For each pair of signals, we find the position of the maximum peak in each one. The time difference is then simply the difference between the positions of the two maximum peaks. The values are stored in an array for later recognition.

compute_receive_time_difference_by_correlation.m

To perform cross correlation between two signals, we apply the xcorr function in Matlab. This function returns the cross-correlation sequence in a vector of length 2*N-1, where x and y are vectors of length N (N>1). If x and y are not the same length, the shorter vector is zero-padded to the length of the longer vector. By default, xcorr computes raw correlations with no normalization.

After xcorr is applied, we find the size and the maximum peak position of the output vector. The time difference is calculated by the following equation.

Time difference = Maximum peak position - (Size of output matrix / 2)

Then the values are stored in an array for later recognition.
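The conversion from peak position to time difference can be illustrated with 0-based indexing. This sketch assumes the correlation sequence is ordered from lag -(N-1) to lag N-1, as with the 2*N-1 output described above, so the zero-lag entry sits exactly in the middle. In Matlab's 1-based indexing that middle entry is at index N, i.e. (size + 1) / 2, which differs from size / 2 by a constant half sample. A constant offset shifts every key's value equally and so should not change which mean a keystroke is closest to. The function name is hypothetical.

```python
def lag_from_peak(r):
    """Lag in samples of the largest value in a 2N-1 correlation sequence.

    Assumes r is ordered from lag -(N-1) to lag +(N-1), using 0-based indices.
    """
    if len(r) % 2 == 0:
        raise ValueError("expected an odd-length (2N-1) correlation sequence")
    peak = max(range(len(r)), key=lambda k: r[k])
    zero_lag = (len(r) - 1) // 2  # middle entry corresponds to lag 0
    return peak - zero_lag


# N = 4 gives 7 correlation values; the peak one place left of centre is lag -1.
print(lag_from_peak([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]))
```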

--Computing the Recognition Rate

For signal classification and recognition, there are many advanced algorithms and statistical models, e.g. hidden Markov models. Due to limited time, we chose only the simplest computation, the distance approach.

The recognition phase can be further divided into two parts, training and classification. In the training phase, the mean received time difference of each key is calculated from a set of training data and used as the reference for classification. After that, we classify the keystrokes in a recognition data set. The received time difference of a sample keystroke is compared with the means. The distance between the keystroke and a specific key is the absolute difference between the keystroke's time difference and that key's mean. The key with the minimum distance is chosen as the classification of the sample keystroke.

Fig 9. Classification of a sample key
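The distance approach amounts to a nearest-mean classifier over a single feature. The sketch below assumes time differences measured in samples, and the training values are invented purely for illustration.

```python
from statistics import mean


def train(samples_by_key):
    """Mean received time difference for each key, from labelled training data."""
    return {key: mean(values) for key, values in samples_by_key.items()}


def classify(delay, means):
    """Key whose mean time difference is closest to the measured delay."""
    return min(means, key=lambda key: abs(delay - means[key]))


# Invented training data: time differences in samples for three keys.
training = {"1": [-9, -8, -10], "2": [0, 1, -1], "3": [8, 9, 10]}
means = train(training)

print(classify(7, means), classify(-3, means))
```

The recognition rate is then the fraction of keystrokes in the recognition data set for which the classified key matches the key actually pressed.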


Since we are measuring time-domain properties using two microphones, the received signals need a common starting point. Synchronization can be done in hardware with a mixer, but the output of a mixer is a single sound wave. Since we need two separate signals to compute the correlation, we chose a software approach for the analysis.

Suppose microphone X starts recording earlier than microphone Y, and let n be the time difference between the starting times of the two recordings. For a particular key, let Tx be the time required for the sound wave to reach microphone X and Ty the time required for it to reach microphone Y.

The measured delay is then the sum of the delay due to the difference between Tx and Ty and the recording offset n. The measured delay d is

d = (Tx - Ty) + n

The time difference between the sounds received is unique for each key. Since the offset n is constant within a recording, the time difference measured under unsynchronized conditions is still unique. Synchronization is therefore not a serious problem, although in practice the microphones are usually synchronized for easier comparison.

-Expected Results

The speed of sound in air varies with temperature. In IE Computing Lab ERB 1008, the room temperature is approximately 20 °C (68 °F), so the speed of sound is approximately 344 meters/second.

The least value of (Dix - Diy) = 2 * separation of two adjacent keys

= 2 * 1.5 cm

= 0.03 m

The minimum value of the delay = the least value of (Dix - Diy) / speed of sound in air

= 0.03 / 344

= 87.2 microseconds
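The arithmetic can be checked quickly. The quotient 0.03 / 344 is about 8.72e-5 seconds, i.e. roughly 87.2 microseconds. The conversion to samples at the 96 kHz recording rate is an added step, included only to show that this delay spans several samples.

```python
SPEED_OF_SOUND = 344.0   # m/s, as quoted in the essay
SAMPLE_RATE = 96_000     # Hz, the recording rate used in the experiment
KEY_SEPARATION = 0.015   # m, separation of two adjacent keys

path_difference = 2 * KEY_SEPARATION            # least value of (Dix - Diy), in metres
min_delay = path_difference / SPEED_OF_SOUND    # seconds
min_delay_samples = min_delay * SAMPLE_RATE     # samples at 96 kHz

print(f"{min_delay * 1e6:.1f} microseconds, about {min_delay_samples:.1f} samples")
```

A minimum delay of a little over eight samples suggests that a 96 kHz recording has enough time resolution to separate adjacent keys, provided noise does not shift the measured positions by a comparable amount.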


Previous keyboard acoustic attacks examine the frequency domain of a keystroke and can be defended against by making the keys sound similar. Our new approach, the keyboard acoustic triangulation attack, can detect keys as long as they make a sound, which makes it more difficult to defend against. From our experiments, we found that the recognition rate drops when the surrounding noise level is high. The acoustic triangulation attack may therefore be countered by typing in a very noisy environment. Another suggested method is to use a virtual keyboard (Fig 10). A projector projects the image of a keyboard, and an infrared sensor scans the plane of the image to detect the intrusion of a finger into the relevant portion of the image. Since no sound is produced during typing, the acoustic triangulation attack is defeated.

Fig 10. A virtual keyboard


A. Advantage

-Sensitive sound reception (microphones) and pre-processing devices (filters) give significant pattern recognition results.

-The results can also reveal the time each computation takes to complete, which can be used as a measurement for a timing attack.

-The spectral pattern could help an eavesdropper determine which algorithms and instructions are used to encrypt or decrypt information.

-An acoustic attack is efficient and simple. No deep mathematics is needed to reveal important information.

-An acoustic attack can be performed by ordinary people. Experimental tools and software can be obtained easily from shops and the Internet.

B. Disadvantage

-The attack is difficult to conduct in a very noisy environment.

-A touch screen, a touch stream or a special keyboard prevents acoustic eavesdropping on a keyboard or keypad.

-It is difficult to determine the exact characteristics of the spectrogram of a CPU's sound, since the acoustic emission corresponding to CPU activity is very hardware dependent.

-Sound-dampening equipment (e.g. "sound-proof" boxes or rooms) blocks the attack.


Previous keyboard acoustic attacks require deep knowledge of the frequency-domain characteristics of a keystroke sound and use complex computation algorithms. In this project, we have shown that a key can be identified by applying the Acoustic Triangulation Attack, which computes the difference in arrival times of a sound wave at two microphones. The recognition rate of 3 to 5 keys from a set of 104 keys can be up to 80%.

This finding is striking because ordinary people can also perform this attack. Experimental tools and software can be obtained easily from shops and the Internet. An attacker may be able to learn what you are typing simply because your keyboard makes a sound. Through this project, we hope to increase public awareness of keyboard security. By knowing the attack methods, people can protect themselves against attackers. In the near future, attackers will not only steal information from your computer or hijack messages over the Internet; they will also mount side channel attacks.

To be immune to a large number of cryptanalytic attacks, implementations must now incorporate a very high level of expertise. Countermeasures are always possible and available, but they must be well thought out. It is easy to believe that one side channel has been closed while in fact becoming weaker against another. It is an eternal game between robbers and policemen; for the moment cryptanalysts seem to be ahead of designers, but this will undoubtedly evolve quickly in the future.


[1] D. Asonov and R. Agrawal, "Keyboard Acoustic Emanations", in Proceedings of the IEEE Symposium on Security and Privacy, 2004.

[2] L. Zhuang, F. Zhou and J. D. Tygar, "Keyboard Acoustic Emanations Revisited", in Proceedings of the 12th ACM Conference on Computer and Communications Security, 2005.

[3] F. J. Owens, "Signal Processing of Speech", New York: McGraw-Hill, 1993, pp.25.

[4] J. Harrington and S. Cassidy, "Techniques in Speech Acoustics", The Netherlands: Kluwer Academic Publishers, 1999.

[5] S. Goronzy, "Robust Adaptation to Non-Native Accents in Automatic Speech Recognition", Berlin Heidelberg: Springer-Verlag, 2002.

[6] H. Bar-El, "Introduction to Side Channel Attacks", Discretix Technology Ltd.