Ambisonics, a technique for the capture and playback of 3D sound, has seen many applications in various fields. However, little attention has thus far been paid to the characteristics of distributed Ambisonic microphone arrays, or to the benefits and drawbacks of increased distance between individual Ambisonic microphones. We explore the background and history of the technology, and several recent applications of distributed arrays of various sizes.
Keywords: Ambisonic, Distributed, Microphone, Array, Localisation
Three-dimensional sound technologies have seen enormous advancement over the past decade, especially with the rise of Virtual Reality and Augmented Reality technologies. From the methods of human binaural hearing to the applications of large-scale 3D microphone arrays, this review aims to point out a lack of evidence in several key areas with regard to specific types of microphone array. A goal of the author is to evaluate the effects of the distribution of Ambisonic microphone arrays on human sound localisation, especially with respect to journey preparation, navigation and wayfinding for vision-impaired people.
Human Sound Localisation
Sound localisation is the ability to recognise the direction or distance of an incoming sound. In humans and other mammals, it provides relevant information for orientation in the world and spatial awareness. Additionally, it is an effective way of noticing and recognising threats, especially from areas outside the field of vision.
In humans, the direction of an incoming sound is detected primarily through the interaural time difference (ITD, the time difference between the ears, often less than 0.5 milliseconds) and the interaural level difference (ILD, the volume difference between the ears). Time differences dominate at frequencies below about 2000 Hz, while level differences dominate at higher frequencies. This is because at higher frequencies the wavelength of sound becomes smaller than the distance between a listener's ears, making differences in phase much harder to detect.
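The sub-millisecond interaural time differences described above can be illustrated with Woodworth's classic spherical-head approximation, a simplified model (the head radius and speed of sound below are typical assumed values, not measured data):

```python
import math

def itd_woodworth(azimuth_deg, head_radius=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a source at a
    given azimuth, using Woodworth's spherical-head formula:
    ITD = (a / c) * (theta + sin(theta)), valid for 0-90 degrees.
    head_radius: assumed average head radius in metres; c: speed of sound (m/s)."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

# A source directly to one side (90 degrees) gives roughly 0.66 ms,
# consistent with the "often less than 0.5 ms" range for typical angles.
print(round(itd_woodworth(90) * 1000, 2))  # → 0.66
```

A source straight ahead (0 degrees) gives an ITD of zero, which is exactly the ambiguity that head movements help resolve.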
The cues for distance of a sound source include volume, high-frequency attenuation and reverberation. More distant sounds are of course softer, but higher frequencies drop off faster than lower frequencies as distance increases. In addition, the ratio of the loudness of a sound to the loudness of its reverberations and echoes can inform listeners of the distance of the source.
For any given interaural time difference, there is a range of directions (a "cone of confusion") from which a sound could have come. To resolve this ambiguity, humans and other mammals will often turn their heads in order to localise sound more accurately. This allows multiple direction estimates of the same sound, since ITDs and ILDs change with the angle of the head.
A further method of sound localisation is based on the shape of the upper body, head and ears, which affects which frequencies and directions are amplified or dampened. This pattern is unique to each person and is known as a Head-Related Transfer Function (HRTF). HRTFs are measured by a variety of methods, such as recording a known stimulus sound from multiple directions with an in-ear microphone. Another method takes advantage of the reciprocity of sound, using an in-ear speaker to play a known stimulus to an array of microphones surrounding the subject. Other methods create 3D models of a person's upper body and ears in order to compute their HRTF numerically.
History of Binaural Audio
Almost since the invention of recorded audio, there have been attempts to mimic the immersive nature of 3D sound. One of the first binaural audio systems was built by Clement Ader in 1881, transmitting audio from two microphones at the Paris Opera via two phone lines, one for each ear. In later decades, some radio stations broadcast stereo sound using two different frequencies, requiring listeners to have one radio tuned to each. Binaural recordings were also made by constructing mannequins with microphones in the ears; modern equivalents, such as those made by 3DIO, are still in use today.
While the above methods enjoyed some popularity, they remained a niche interest until the second half of the 20th century, when audio technologies began to gain the sophistication required to support 3D sound. Disney's animated film Fantasia (1940) was the first film to use multiple speaker positions to create 3D sound in cinemas, and in the following decades a number of competing methods for 3D sound were developed: Quadraphonics, Tetraphonics, and Dolby Surround Sound all aimed to produce an immersive, realistic field of 3D audio (known as a soundfield). In recent years, with the rise of Virtual Reality and Augmented Reality technologies, many tech companies are producing immersive 3D sound for VR. The Oculus Rift (owned by Facebook), Samsung Gear VR, HTC Vive, and many others use 3D visual techniques to create immersive Virtual and Augmented Reality experiences, and the most commonly used audio technique for these VR and AR applications is known as Ambisonics.
Ambisonics is a technique for the recording and playback of 3-dimensional soundfields. First invented by Michael Gerzon in the 1970s, the technique is achieved through the decomposition of a soundfield into spherical harmonics. The order of an Ambisonic system indicates the order of spherical harmonics that the soundfield has been decomposed into; first-order Ambisonics uses the zeroth- and first-order spherical harmonics.
The spherical harmonics are a set of mathematical functions with the useful properties of being orthogonal and complete on the sphere. In practice, this means that any function over the surface of a sphere (such as the sound waves approaching a point from all directions) can be expressed as a weighted sum of spherical harmonics. For instance, a first-order Ambisonic microphone such as the Sennheiser Ambeo (Figure 4) records audio with four microphone capsules in a tetrahedral arrangement and produces one audio track for each of the first four harmonics, each one a simple linear combination of the microphone feeds. This process is known as the conversion from A-format (the direct microphone feeds) to B-format (the normalised harmonic tracks).
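The A-format to B-format conversion for a tetrahedral microphone can be sketched as the classic sum-and-difference matrix below. This is a simplified illustration: it omits the capsule equalisation and gain normalisation (e.g. the conventional 1/√2 scaling of the W channel) that a real converter applies, and the capsule labels are the common front-left-up / front-right-down / back-left-down / back-right-up arrangement:

```python
import numpy as np

def a_to_b(flu, frd, bld, bru):
    """Convert tetrahedral A-format capsule feeds (one 1-D sample array per
    capsule) to first-order B-format. Each B-format channel is a simple
    linear combination of the four capsule signals."""
    w = flu + frd + bld + bru   # omnidirectional pressure component
    x = flu + frd - bld - bru   # front-back figure-of-eight
    y = flu - frd + bld - bru   # left-right figure-of-eight
    z = flu - frd - bld + bru   # up-down figure-of-eight
    return np.stack([w, x, y, z])

# A signal hitting only the two front capsules produces a strong
# positive X (front) component and no Y (side) or Z (height) component.
b = a_to_b(np.array([1.0]), np.array([1.0]), np.array([0.0]), np.array([0.0]))
print(b[:, 0])  # → [2. 2. 0. 0.]
```

The key point of the B-format representation is that these four channels no longer depend on the particular microphone geometry, so they can be rotated, decoded to any speaker layout, or rendered binaurally.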
Higher Order Ambisonics (HOA) uses more of the spherical harmonics to create a more detailed soundfield, at the cost of more complicated microphone arrays and greater computational complexity. In general, an Ambisonic array needs as many microphone capsules as there are harmonics to capture a complete soundfield, so a third-order array requires a minimum of 16 capsules. As the order of an Ambisonic system increases, the localisation ability of listeners improves accordingly, but practical issues accumulate as well. All microphone capsules must be as close as possible to the central point; otherwise microphones may record differences due to their positions rather than their orientations. Additionally, capsules should be perfectly evenly spaced over a sphere, which becomes impossible for microphones with more than 20 capsules: only the Platonic solids (tetrahedron, cube, octahedron, dodecahedron, and icosahedron) give exact, perfect spacing, so larger arrays can only be approximately spaced. Despite this, larger arrays have seen significant development, with attention paid to mitigating these issues.
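The relationship between Ambisonic order and capsule count follows directly from counting spherical harmonics: there are 2n + 1 harmonics of order n, so orders 0 through N give (N + 1)² channels in total. A minimal sketch:

```python
def num_channels(order):
    """Number of spherical-harmonic channels up to a given Ambisonic order,
    and hence the minimum number of microphone capsules required:
    sum of (2n + 1) for n = 0..order, which simplifies to (order + 1)^2."""
    return (order + 1) ** 2

# First order needs 4 capsules (zeroth + first harmonics);
# third order needs 16, as noted above.
print([num_channels(n) for n in range(4)])  # → [1, 4, 9, 16]
```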
Once a recording has been produced, there are several methods of playing it back through speakers or headphones to recreate the sense of 3-dimensional sound. Gerzon's original design for Ambisonics required an array of loudspeakers surrounding the listener, including above and below them, in order to recreate the full soundfield. Later innovations allowed for greater flexibility in the placement of loudspeakers, requiring a small number of circular loudspeaker arrays instead of a full sphere. Alternatively, a soundfield can be translated directly into audio for headphones, through the process of binaural synthesis.
Figure 6. An Ambisonic loudspeaker array. https://grayarea.org/event/sound-research-meetup-ambisonics/
Binaural synthesis is the process of transforming a soundfield into binaural sound for headphones, and can be performed by various methods. One version of the process uses a Fourier transform to separate the individual frequencies, then identifies the most likely direction for each frequency and calculates the corresponding time and level differences for those directions. In any form of binaural synthesis, a head-related transfer function is used to recreate the effect of the pinnae and head on the sounds heard. Since every person's HRTF is different, most binaural synthesis techniques use a general HRTF compiled as a mean of many recorded HRTFs. In addition, head tracking can significantly improve immersion, allowing the soundfield to rotate as the listener turns their head.
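At its core, applying an HRTF to render a single source binaurally is a pair of convolutions with the left- and right-ear head-related impulse responses (HRIRs, the time-domain form of the HRTF). The sketch below assumes the HRIRs for the source's direction are already available, e.g. from a measured database such as CIPIC; real systems additionally interpolate HRIRs between measured directions and mix many sources:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Minimal binaural synthesis sketch: convolve a mono source signal with
    the left- and right-ear HRIRs for its direction, producing a
    two-channel headphone signal. Assumes both HRIRs have equal length."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy HRIRs: the right ear simply hears the sound one sample later,
# imitating an interaural time difference for a source on the left.
mono = np.array([1.0, 0.5, 0.25])
out = binaural_render(mono, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(out)
```

Measured HRIRs are of course far richer than this one-sample delay, encoding the frequency-dependent filtering of the pinnae and head that listeners use for elevation and front-back discrimination.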
The microphones and other hardware used for Ambisonics are also a source of interest, ranging from first-order microphones with four capsules such as the Sennheiser Ambeo to larger, 32-capsule designs like the MHAcoustics Eigenmike. A direct comparison of several common designs across a range of criteria found that the Sennheiser Ambeo supported better localisation than other first-order microphones, but worse than the Eigenmike.
Ambisonics has been used in a variety of applications, from entertainment to training to healthcare. Virtual Reality technologies such as Oculus and Microsoft's HoloLens use Ambisonics, and while of course their most common uses thus far have been in the field of entertainment, other fields such as medicine have begun to take notice. VR has seen some use in several different mental health treatments and in training for healthcare workers, and though the results so far have been inconclusive, procedures and methods are likely to improve along with wider adoption of the technology. There has also been some attention in defence and security within our own research group, with the design of Ambisonic microphones on small drones to support infantry in reconnaissance. There are also applications in telepresence, music, and education [32, 33].
While there has been a lot of attention on techniques using single Ambisonic microphones, the concept of distributed Ambisonic arrays holds significant promise. A distributed array has several Ambisonic microphones, each one capable of recording an omnidirectional soundfield. This allows for the localisation of sound sources by triangulation, and potentially for listeners to gain a greater understanding of the space. Researchers have used arrays of Ambisonic microphones spaced anywhere from less than a metre to hundreds of metres apart in various applications.
Philipp Hack's Master's thesis revolved around computer localisation in distributed Ambisonic arrays. Using an array of 8 tetrahedral microphones arranged in a circle of diameter 3 metres, Hack designed two different methods of estimating the positions of sound sources in three dimensions. Beginning with a detailed analysis of the tetrahedral microphone used for the array (Oktava 4D-Ambient), he then introduced a method based on creating three-dimensional "acoustic maps" of likely source positions, and estimating source locations as the maximum points. The other method involved taking the direction estimates from each individual microphone and extending those lines to see where they intersected or came closest together.
The mean absolute error for both methods ranged from 13 cm to 20 cm depending on the number of sound sources and the number of microphones listening, with the acoustic map method performing better in noisy environments (sound source 10 dB louder than background noise) and the linear intersection method performing better in quieter conditions (sound source 25 dB louder than background noise). Further direction-of-arrival (DOA) filtering was done on the individual microphones, which reduced the overall mean absolute error to 1-7 cm.
The total amount of processing involved meant that the system did not work in real time: it required prerecorded sound data and could not keep up with a live feed. In addition, little attention was given to justifying the distance between microphones; in fact, no explanation at all was given for why the array was distributed in the way it was. Also neglected were the effects of sound sources at a distance from the array: all the sound sources considered were either inside the array or within 1 metre of it. In a smaller study, researchers at Swinburne University recreated a simpler version of the acoustic-map method, using an array of three tetrahedral microphones and localising stimulus sounds played from several metres outside the array. Their localisation algorithm was capable of running in real time, though its accuracy was significantly worse.
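The linear-intersection idea described above (extending each microphone's direction-of-arrival estimate and finding where the lines come closest together) can be sketched as a small least-squares problem. This is an illustrative helper, not Hack's actual implementation: each microphone contributes a ray from its position along its DOA estimate, and we solve for the point minimising the total squared distance to all rays:

```python
import numpy as np

def intersect_rays(origins, directions):
    """Least-squares point closest to a set of 3D rays: for each ray
    (origin p, unit direction d), the projector (I - d d^T) measures the
    perpendicular offset from the ray, and summing these projectors gives
    a 3x3 linear system for the best-fit point. Needs at least two
    non-parallel rays, otherwise the system is singular."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(origins, directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)  # projector onto plane normal to d
        A += M
        b += M @ np.asarray(p, dtype=float)
    return np.linalg.solve(A, b)

# Two microphones whose DOA rays cross at (1, 0, 0):
# one at the origin looking along +x, one at (1, 1, 0) looking along -y.
est = intersect_rays([[0, 0, 0], [1, 1, 0]], [[1, 0, 0], [0, -1, 0]])
print(est)  # → [1. 0. 0.]
```

With noisy DOA estimates the rays no longer intersect exactly, and the least-squares point is the natural compromise; this is also where increasing the spacing between microphones plausibly matters, since longer baselines change how DOA errors translate into position errors.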
In a project as part of the Swinburne Summer Research Internship program, a series of Ambisonic and 3D video recordings were made throughout Flinders Street Station in Melbourne. These were used to create an immersive soundscape of the station, with the aim of producing a journey-preparation tool for the vision-impaired. As vision-impaired people often rely more on audio cues for localisation, familiarity with the sound of an area can aid in navigation. In this project, the recordings allowed vision-impaired people to listen to the area and prepare for a journey by familiarising themselves with the acoustics and background noise of various parts of the station.
Stationary recordings using a Sennheiser Ambeo microphone and a Garmin Virb 360 camera were made in 17 different locations around the station, along with one moving recording. The distributed recordings allowed for some spatial understanding of the area as a whole, but since the recordings were not simultaneous there was no way to hear one sound from two different positions in order to triangulate the source position. The distance between recording positions was chosen purely for practicality: there was only a limited time to make the recordings, and a large area to cover.
Figure 11. Recording at Flinders Street station
Another, even larger design was used to allow people to evaluate various locations with respect to views, background noise and traffic sounds across a large area of Naples, Italy. 3D video and audio recordings were taken at ten different locations spread hundreds of metres apart, and listeners were asked to rate various aspects of the recordings (such as video and sound quality) and the locations recorded. The primary goal of the project was to aid municipal councils in urban planning, but other applications of wide-scale microphone arrays can be considered. Background noise has become a recognised health issue in many parts of the world [39, 40], and it could potentially be measured across very large areas with a distributed Ambisonic array. In addition, large-scale arrays can be beneficial for smart cities [41, 42], allowing for the automatic routing of resources or services to locations based on audio cues: recognising the sounds of alarms, car crashes, or traffic noise, and reacting accordingly with emergency services or other responses.
Figure 12. Stretch of Naples waterfront, with 3 microphone locations used by Puyana-Romero (2017)
The studies above are among the relatively few that have used distributed Ambisonic microphone arrays in various contexts. Notably, though, none has given any justification for its choice of distance between microphones. This is despite the fact that little is understood about how the distance between Ambisonic microphones affects human sound localisation. Do larger arrays triangulate sound more effectively? Does the distance from a sound source affect localisation accuracy in distributed arrays? Is there an optimal distance between Ambisonic microphones in these kinds of arrays, and if so, what is it?
The answers to these questions are not currently known, and research is needed in these areas in order to answer them.
The techniques and theory behind human sound localisation and the use of Ambisonic microphone arrays are far more complex and detailed than can be outlined in this review, but it is hoped that the benefits of further research are clear. Greater understanding of the effects of microphone distribution in Ambisonic arrays can be of great benefit to the varied applications of this technology, from entertainment to urban planning to navigation for vision-impaired people.
- Blauert, J., Spatial hearing: the psychophysics of human sound localization. 1997: MIT press.
- Middlebrooks, J.C. and D.M. Green, Sound localization by human listeners. Annu Rev Psychol, 1991. 42(1): p. 135-59.
- Makous, J.C. and J.C. Middlebrooks, Two‐dimensional sound localization by human listeners. The journal of the Acoustical Society of America, 1990. 87(5): p. 2188-2200.
- Carlile, S., S. Delaney, and A. Corderoy, The localisation of spectrally restricted sounds by human listeners. Hearing Research, 1999. 128(1): p. 175-189.
- Zhong, X., W. Yost, and L. Sun, Dynamic binaural sound source localization with ITD cues: Human listeners. Vol. 137. 2015. 2376-2376.
- Zahorik, P., D. Brungart, and A. Bronkhorst, Auditory distance perception in humans: A summary of past and present research. Vol. 91. 2005. 409-420.
- Rumsey, F., Spatial audio. 2012: Focal press.
- Muller, B.S. and P. Bovet, Role of pinnae and head movements in localizing pure tones. Swiss Journal of Psychology / Schweizerische Zeitschrift für Psychologie / Revue Suisse de Psychologie, 1999. 58(3): p. 170-179.
- Gelfand, S.A., Hearing: An introduction to psychological and physiological acoustics. 2017: CRC Press.
- Xie, B., Head-related transfer function and virtual auditory display. 2013: J. Ross Publishing.
- Møller, H., et al., Head-related transfer functions of human subjects. Journal of the Audio Engineering Society, 1995. 43(5): p. 300-321.
- Zotkin, D.N., et al., Fast head-related transfer function measurement via reciprocity. The Journal of the Acoustical Society of America, 2006. 120(4): p. 2202-2215.
- Paul, S., Binaural recording technology: A historical review and possible future developments. Acta acustica united with Acustica, 2009. 95(5): p. 767-788.
- Gerzon, M.A., Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 1973. 21(1): p. 2-10.
- Marinucci, D. and G. Peccati, Random fields on the sphere: representation, limit theorems and cosmological applications. Vol. 389. 2011: Cambridge University Press.
- Frank, M., F. Zotter, and A. Sontacchi, Producing 3D Audio in Ambisonics. Vol. 2015. 2015, Proceedings of the AES International Conference.
- Bertet, S., et al., Investigation on Localisation Accuracy for First and Higher Order Ambisonics Reproduced Sound Sources. Acta Acustica United with Acustica, 2013. 99(4): p. 642-657.
- Moreau, S., J. Daniel, and S. Bertet. 3D sound field recording with higher order ambisonics–Objective measurements and validation of a 4th order spherical microphone. in 120th Convention of the AES. 2006.
- Zhang, W. and T.D. Abhayapala, Three dimensional sound field reproduction using multiple circular loudspeaker arrays: functional analysis guided approach. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014. 22(7): p. 1184-1194.
- Ahrens, J. and S. Spors, An Analytical Approach to Sound Field Reproduction Using Circular and Spherical Loudspeaker Distributions. Acta Acustica United with Acustica, 2008. 94(6): p. 988-999.
- Hammershøi, D. and H. Møller, Binaural technique—Basic methods for recording, synthesis, and reproduction, in Communication acoustics. 2005, Springer. p. 223-254.
- Berge, S. and N. Barrett. High angular resolution planewave expansion. in Proc. of the 2nd International Symposium on Ambisonics and Spherical Acoustics May. 2010.
- Algazi, V.R., et al. The cipic hrtf database. in Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575). 2001. IEEE.
- Hendrickx, E., et al., Influence of head tracking on the externalization of speech stimuli for non-individualized binaural synthesis. J Acoust Soc Am, 2017. 141(3): p. 2011.
- Bates, E., et al., Comparing Ambisonic Microphones—Part 2. 2017.
- Hong, J.Y., et al., Quality assessment of acoustic environment reproduction methods for cinematic virtual reality in soundscape applications. Building and Environment, 2019. 149: p. 1-14.
- Riva, G., Applications of Virtual Environments in Medicine. Methods Inf Med, 2003. 42(05): p. 524-534.
- Gregg, L. and N. Tarrier, Virtual reality in mental health. Social psychiatry and psychiatric epidemiology, 2007. 42(5): p. 343-354.
- Mantovani, F., et al., Virtual reality training for health-care professionals. CyberPsychology & Behavior, 2003. 6(4): p. 389-395.
- Bennett, H., et al., Soldier Supportive Acoustics. 2018.
- Keyrouz, F. and K. Diepold, Binaural Source Localization and Spatial Audio Reproduction for Telepresence Applications. Presence: Teleoperators and Virtual Environments, 2007. 16(5): p. 509-522.
- Psotka, J., Immersive training systems: Virtual reality and education and training. Instructional science, 1995. 23(5-6): p. 405-431.
- Merchant, Z., et al., Effectiveness of virtual reality-based instruction on students’ learning outcomes in K-12 and higher education: A meta-analysis. Computers & Education, 2014. 70: p. 29-40.
- Hack, P., Multiple Source Localization with Distributed Tetrahedral Microphone Arrays, in Institute of Electronic Music and Acoustics. 2015, University of Graz: Graz.
- Favilla, S., et al. Acoustic sound localisation: Visualisations of a 1st order ambisonic microphone array. 2018. Association for Computing Machinery.
- Lai, T.D., et al., An immersive journey preparation tool for people with vision impairment. 2018.
- Loomis, J.M., R.G. Golledge, and R.L. Klatzky, Navigation system for the blind: Auditory display modes and guidance. Presence, 1998. 7(2): p. 193-203.
- Puyana-Romero, V., et al., Interactive Soundscapes: 360°-Video Based Immersive Virtual Reality in a Tool for the Participatory Acoustic Environment Evaluation of Urban Areas. Acta Acustica united with Acustica, 2017. 103(4): p. 574-588.
- Hammer, M.S., T.K. Swinburn, and R.L. Neitzel, Environmental noise pollution in the United States: developing an effective public health response. Environ Health Perspect, 2014. 122(2): p. 115-9.
- Basner, M., et al., Auditory and non-auditory effects of noise on health. The Lancet, 2014. 383(9925): p. 1325-1332.
- Almeida, C., J.P. Paulo, and M. Félix, Sound Localization in Urban Areas using the Ambisonic Concept.
- Socoró, J.C., F. Alías, and R.M. Alsina-Pagès, An Anomalous Noise Events Detector for Dynamic Road Traffic Noise Mapping in Real-Life Urban and Suburban Environments. Sensors, 2017. 17(10): p. 2323.
- Meiling, S., et al. MONICA in Hamburg: Towards Large-Scale IoT Deployments in a Smart City. in 2018 European Conference on Networks and Communications (EuCNC). 2018. IEEE.