About the text

This treatise is not intended to be an introduction to hearing science. It assumes that the reader is well-versed in basic hearing phenomena and has at least some acquaintance with wave physics, Fourier analysis, and linear signal processing. Readers who are already familiar with Fourier and geometrical imaging optics, as well as photography, astronomy, or microscopy enthusiasts, are likely to find certain optics-inspired passages relatively easy to follow, bordering on trivial. The same goes for readers with a background in communication and radar engineering, who are going to find some sections relatively straightforward. Several chapters contain mathematical derivations that may be outside the comfort zone of less mathematically inclined readers, who are encouraged to gloss over them and focus on the qualitative descriptions, which may be sufficient to develop the necessary insight. Nevertheless, a few topics should undoubtedly benefit from mathematical understanding, mostly those that introduce the basic space-time duality equations, the analytic signal, modulation and demodulation, coherence, and the various modulation transfer functions.

Some specific conclusions and derivations may raise interest among non-specialists as well. The derivation of the modulation transfer functions from the temporal imaging equations has not appeared in the optics literature previously, which has focused primarily on time-domain solutions.

There are several implications of the ideas expressed in this work that may also be of interest to perception, vision, and neuroscience specialists. If proven correct, auditory imaging as presented here suggests that certain imaging principles are biologically common to both hearing and vision. This raises the question of whether additional sensory inputs are processed in a similar fashion, only with less obvious dimensional substitutions.

For the neuroscientist, the idea that the brainstem performs neural processing in hearing in part to achieve a function that is performed analogically in the eye may be curious as well. This suggests that biological computation is both analog and digital and that the segregation between the mechanical and neural domains may be at least somewhat contrived. It also underlines the significance of sampling considerations, which are usually taken for granted in the discussion of neural coding.

The scope of this work has been limited on purpose and largely excludes an in-depth treatment of some topics that have already received much attention in hearing science. Major topics that are only mentioned in passing are binaural processing, intensity and dynamic range compression, and lateral inhibition, as well as across-channel frequency weighting that is achieved by different segments of the auditory processing chain, which possibly contains the spectrotemporally modulated class of signals. The theory also formally deals with the auditory system up to the inferior colliculus, so higher-level effects (like attention or speech perception) are mostly avoided. Finally, in order to limit the scope of the literature reviewed, I have tended to ignore most of the mathematical models of the associated ear parts, such as the cochlea and the auditory nerve, or of the complete system. These omissions notwithstanding, there was still much left to be explored in this work.

In the text, I have striven to remain agnostic about the particular cochlear mechanics that transduces the signal, as numerous publications and models have been exclusively dedicated to this problem and, for all I can tell, the jury is still out as to which one is (the most) correct. Experimental data regarding cochlear mechanics are still being reported, and it is not unusual for them to contradict predictions based on different cochlear models or on classical observations made with outdated methods. However, I have posited two new functions of the organ of Corti (a phase-locked loop and a time lens), which has made it almost impossible to retain this agnostic approach throughout.

Chapter overview

The heart of this work—the temporal imaging theory—is contained in chapters §§ 10 to 14. Confident readers are encouraged to skim them before committing to the various introductory chapters. Results and implications are presented in the final chapters §§ 15 to 18, which are more qualitative in nature and likely have relevance to a wider audience within the auditory research community.

The introductory chapters survey a range of topics that are not necessarily new, but they attempt to tackle several acoustical issues in a fresh manner that is especially pertinent to hearing as a communication system that is embedded in a realistic world of arbitrary stimuli. Notable among them are chapters §6 about physical signals, and §7 and §8 about synchronization and coherence. Chapter §9 may be considered a standalone text that is introductory in spirit, but presents a novel hypothesis regarding the auditory phase-locked loop (PLL). This chapter was required for the assumption of coherence conservation between the external world and the auditory brain, but I believe that it may have far-reaching consequences beyond it, which are only superficially explored in the present work.

Appendices §A, §E, and §F feature results of small-scale measurements, which were necessary to corroborate some of the claims in the text and may be interesting in their own right, although only the first two may be understood without reference to the main text.

Some of the sections refer to audio demos, which are printed in small caps and can be found in the supplementary directories at https://zenodo.org/record/5656125.

Below is an overview of the individual chapters.

Chapter §1 motivates the treatise and provides a brief review of current and historical hearing theories with emphasis on visual analogies. It dwells on existing attempts to define the acoustic object, the auditory image and object, and the inconsistencies and shortcomings they bring about. Using various physiological, functional, and physical considerations, it makes the case that a correct analogy between the ear and the eye has it that the cochlea of the inner ear is at an analogous level to the lens, whereas the inferior colliculus of the auditory midbrain is at an analogous level to the retina. A temporal imaging theory is then motivated using four additional perspectives: the prominence of direct versus reflected sound in hearing (unlike light in vision), imaging mathematics analogies between spatial and temporal equations, insights from communication about the physical transfer of information, and signal coherence propagation from the acoustic environment into the listener's brain.

Chapter §2 reviews the anatomical structure and physiology of the mammalian ear, with emphasis on humans, from the external ear to the auditory cortex. The review is deliberately high level in that it tends to neglect low-level details (e.g., cellular, biochemical) in order to crystallize a systemic perspective, where attainable. It is intended mainly for reference and for highlighting possible roles that have been attributed to the different components of the auditory system. The chapter concludes with a comparative section about some of the major differences between the auditory systems of humans and other mammals.

Chapter §3 presents a novel point of view on known aspects of real acoustic sources and environments. The idea behind this chapter is to highlight how the acoustics of realistic sounds and environments diverge from classical linear descriptions. For this purpose, a general formalism is adopted for the representation of waves, which allows for straightforward incorporation of the concepts of dispersion, instantaneous envelope and phase, and group delay. The overarching difference can be boiled down to that between constant Fourier frequency representation and time-dependent complex envelope representation, which facilitates amplitude and frequency modulation. The effects of dispersion and other acoustic signal degradations in realistic environments, both outdoors and indoors, are emphasized.

A very short introduction to physical optics is provided in Chapter §4, which revolves around spatial imaging. Several basic concepts in geometrical, wave, and Fourier optics are presented, as they provide the basis for the analogy with hearing in later chapters. The optics of the eye and the main elements in its peripheral physiology are presented. Finally, notable links and differences between imaging and Fourier analysis in acoustics and optics are mentioned.

Chapter §5 introduces a few basic information- and communication-theoretic concepts in a qualitative manner. Information theory is not applied directly in the work, but the physical propagation of information is taken to be the unifying element across the different stages of auditory processing. Several historical connections between information and hearing are briefly mentioned and it is argued that conservation of information—over the various signal transformations—has been taken as an implicit assumption of hearing theory. Actual communication systems can be described using generalized receivers and transmitters that deal with modulated signals. It is argued that hearing can be viewed as a communication system by assigning the appropriate roles of transmitter, channel, and receiver to the acoustic source, environment, and ear, respectively, and by recognizing that the intentional transfer of information is optional. The communication approach to information transfer is very similar to that used in simple spatial imaging, but there are some important differences between them that are highlighted as well.

Chapter §6 deals with the mathematical basis of physical and communication signals, which are used in all the theories that are relevant to this work. It begins with the analytic signal and the narrowband approximation, which gives rise to the important concept of instantaneous frequency. It then explores the roles of the temporal envelope and amplitude modulation in hearing and briefly reviews the role of phase in hearing, with emphasis on linear frequency modulation. Auditory phase perception has been a contentious topic, which gave rise to the concept of temporal fine structure as a proxy of auditory phase locking. However, several authors have indicated that the common way of applying these concepts in hearing has been inconsistent with the mathematics of broadband signals and with certain psychoacoustic observations. It is shown that the emphasis that has been put on the (mathematically) real envelope has led to an auditory theory that treats hearing as a baseband system (i.e., with low-pass characteristics, as though the system were capable of detecting sound down to 0 Hz) rather than a bandpass system. It is argued that a correct treatment of the system as bandpass is critical for embracing modulation and demodulation phenomena in hearing as a reality, rather than a metaphor. The existence of auditory demodulation, along with that of a two-dimensional (carrier and modulation) spectrum, is considered.
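
As a minimal sketch of the analytic signal mentioned above, the following Python snippet builds the analytic signal of an amplitude-modulated tone and reads off its envelope and instantaneous frequency. It is not taken from the treatise: the sample rate, carrier and modulation frequencies, modulation depth, and the helper name `analytic_signal` are all illustrative assumptions, and a slow direct DFT is used only to keep the example free of external libraries.

```python
import cmath
import math

def analytic_signal(x):
    """Return the analytic signal of a real sequence using a direct DFT.

    Positive-frequency bins are doubled and negative-frequency bins are
    zeroed (the usual discrete construction). O(N^2), for clarity only.
    """
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2   # positive frequencies
        elif k > n / 2:
            spectrum[k] = 0    # negative frequencies
        # The DC bin and the Nyquist bin (even n) are left unchanged.
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# All values below are illustrative assumptions.
fs = 256.0         # sample rate in Hz
n = 256            # one second of signal
f_carrier = 32.0   # carrier frequency in Hz
f_mod = 4.0        # modulation frequency in Hz
depth = 0.5        # modulation depth

x = [(1 + depth * math.cos(2 * math.pi * f_mod * t / fs))
     * math.cos(2 * math.pi * f_carrier * t / fs) for t in range(n)]

z = analytic_signal(x)
envelope = [abs(v) for v in z]
# Instantaneous frequency from the phase increment between adjacent samples.
inst_freq = [fs / (2 * math.pi) * cmath.phase(z[t + 1] * z[t].conjugate())
             for t in range(n - 1)]

print(f"envelope range: {min(envelope):.3f} to {max(envelope):.3f}")
print(f"mean instantaneous frequency: {sum(inst_freq) / len(inst_freq):.3f} Hz")
```

Because the test tone is narrowband and periodic within the analysis window, the magnitude of the analytic signal should follow the imposed modulator and the instantaneous frequency should stay at the carrier frequency. For broadband signals the same decomposition is far less straightforward to interpret, which is the difficulty the chapter addresses.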

Chapter §7 is an exposition of the concepts of coherence and synchronization that are found in six different scientific fields that have some bearing on hearing: acoustics, optics, communication, neuroscience, auditory neuroscience and physiology, and psychoacoustics. While the essence of coherence as a concept may be shared between all six fields, it is obfuscated by the use of different jargons, sometimes for narrowly defined purposes. A standardized jargon is then proposed, which is used throughout the work. It largely adheres to the jargon used in optics of coherent and incoherent illumination and imaging, which overlaps with coherent and noncoherent detection in communication.

Chapter §8 draws heavily on optical coherence theory and summarizes its most important concepts that include interference, the mutual coherence function, partial coherence, temporal and spatial coherence, coherence time, coherence propagation according to the wave function, spectral coherence, the effect of narrowband filtering, and nonstationary coherence. These ideas are then linked to data gathered about known sound sources and to the theory of room acoustics, which has used coherence rather sporadically. Other topics in hearing that relate to coherence are briefly mentioned as well, such as binaural hearing and coincidence detection. It is argued that partial coherence and coherence time have central roles in auditory perception. Appendix §A provides several quantitative figures from typical acoustical sources to substantiate some of the main claims of the chapter.

Chapter §9 introduces the concept of synchronization from the point of view of nonlinear dynamical systems. It focuses on the phase-locked loop (PLL), one of the most important circuits in communication engineering, control theory, and general electronics. It is shown how a PLL can conserve the degree of coherence of an input signal at the output. It is then hypothesized and demonstrated how the phase locking that characterizes mammalian low-frequency hearing can be the result of an auditory PLL, which may be assembled from known functions of the organ of Corti and the outer hair cells: a phase detector from the distorting mechanoelectrical transduction channels, a loop filter from the outer hair cell membrane, and the self-oscillating hair bundle as the voltage-controlled oscillator. This hypothetical feedback process may be additionally amplified by the somatic motility of the outer hair cells and feed into the inner hair cell transduction path. Available evidence that supports this idea, as well as known gaps in the model, are discussed at length. The usefulness and likelihood of having dual coherent and noncoherent detection within hearing are discussed as well.
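
The three PLL building blocks named above (phase detector, loop filter, controlled oscillator) can be sketched generically in discrete time. The Python example below is not a model of the hypothesized auditory PLL; it is a textbook-style second-order loop under stated assumptions: an idealized wrapped-phase detector operating on a complex tone (rather than a distorting, multiplier-type detector), a proportional-plus-integral loop filter, and arbitrary illustrative frequencies and gains.

```python
import cmath
import math

# All numerical values are illustrative assumptions.
fs = 8000.0          # sample rate in Hz
f_in = 440.0         # frequency of the input tone in Hz
f_free = 400.0       # free-running oscillator frequency in Hz
kp, ki = 0.1, 0.005  # proportional and integral loop-filter gains

def run_pll(num_samples):
    """Track a complex tone; return (phase errors, locked frequency in Hz)."""
    w_free = 2 * math.pi * f_free / fs
    osc_phase = 0.0    # oscillator phase in radians
    integrator = 0.0   # integral branch of the loop filter
    errors = []
    for n in range(num_samples):
        # Input tone with a 1 rad initial phase offset.
        tone = cmath.exp(1j * (2 * math.pi * f_in * n / fs + 1.0))
        # Phase detector: wrapped phase difference between input and oscillator.
        err = cmath.phase(tone * cmath.exp(-1j * osc_phase))
        # Loop filter: proportional-plus-integral control.
        integrator += ki * err
        control = kp * err + integrator
        # Controlled oscillator: advance the phase at the corrected rate.
        osc_phase += w_free + control
        errors.append(err)
    f_est = (w_free + integrator) * fs / (2 * math.pi)
    return errors, f_est

errors, f_est = run_pll(4000)
print(f"final phase error: {errors[-1]:.2e} rad, locked frequency: {f_est:.3f} Hz")
```

With these gains the loop is stable and slightly underdamped, so the phase error decays to zero and the integrator settles at the frequency offset between the input and the free-running oscillator. Once locked, the oscillator output follows the input phase, which is the property that the chapter relates to coherence conservation.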

Chapter §10 derives the paratonal (originally, the “paraxial”) dispersion equation that was first introduced in nonlinear optics and has been applied to scalar electromagnetic plane waves, but can just as well apply to pressure waves. This equation employs the space-time duality principle, which analogizes the spatial envelope to the temporal envelope of the wave field. The general solution of the wave equation requires the narrowband approximation, that is, the decomposition of the wave into a fast-moving carrier and a slow-moving complex envelope. Only the complex envelope is considered in the solution that has a fixed carrier. The solution requires the group-delay dispersion (or group-velocity dispersion), which is a fundamental property of the medium. It is expressed using the derivative of the standard (phase-velocity) dispersion and has not appeared under this name in acoustics before. The reciprocal operation to the group-velocity dispersion of the medium (that of a time lens, or a phase modulator) is also presented. Both group-delay dispersion and time lensing rely on quadratic phase transformations that can produce linear frequency modulation.
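
The link between a quadratic phase and linear frequency modulation can be checked numerically. The Python sketch below applies a time-domain quadratic phase (the phase-modulator, or time-lens, case) to a Gaussian complex envelope and estimates the resulting instantaneous frequency. The chirp rate, sample rate, and envelope width are illustrative assumptions and do not correspond to any parameter estimated in the treatise.

```python
import cmath
import math

# All numerical values are illustrative assumptions.
fs = 1000.0        # sample rate in Hz
chirp_rate = 50.0  # rate of frequency change in Hz per second
n = 1000           # one second of signal

t = [i / fs for i in range(n)]
# Slowly varying Gaussian envelope centred at 0.5 s.
envelope = [math.exp(-((ti - 0.5) / 0.1) ** 2) for ti in t]
# Quadratic phase modulation: phi(t) = pi * chirp_rate * t^2.
modulated = [a * cmath.exp(1j * math.pi * chirp_rate * ti ** 2)
             for a, ti in zip(envelope, t)]

# Instantaneous frequency from the phase increment between adjacent samples.
inst_freq = [fs / (2 * math.pi) * cmath.phase(modulated[i + 1] * modulated[i].conjugate())
             for i in range(n - 1)]

# The estimate at sample i applies to the midpoint between samples i and i + 1.
expected = [chirp_rate * (t[i] + 0.5 / fs) for i in range(n - 1)]
max_dev = max(abs(f - e) for f, e in zip(inst_freq, expected))
print(f"largest deviation from a linear frequency sweep: {max_dev:.2e} Hz")
```

The modulation leaves the envelope magnitude untouched and only sweeps the instantaneous frequency linearly in time. Group-delay dispersion is the counterpart operation, in which the quadratic phase is applied in the frequency domain instead.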

Chapter §11 goes through the signal transmission chain of the human ear and attempts to estimate its frequency-dependent dispersion parameters. The passive cochlea is known to be group-delay dispersive, and the magnitude of this dispersion is estimated to be much larger than that of the outer and middle ears. The group-delay dispersion associated with the neural pathways between the inner hair cells and the inferior colliculus (a quantity that has been considered to be negligible before) is estimated as well. It is speculated that the organ of Corti also functions as a time lens, and a physical principle of phase modulation is hypothesized, which has to do with the active change of stiffness that is caused by the outer hair cell electromotility. The time-lens curvature values for humans are roughly estimated based on animal data. The uncertainty in these estimates is very large and we derive approximate lower and upper bounds that may pertain to humans. Appendix §F offers an alternative derivation of the dispersion parameters using strictly psychoacoustic data instead of physiological data. The results are only partially consistent with the physiological estimates, because they introduce group-delay absorption to the parameters, which is largely ignored in this work and in optics, although it may be physically justifiable. Alternative explanations for the discrepancy are discussed.

Chapter §12 introduces the imaging equations, which require the three dispersive components to be in cascade: cochlear group-delay dispersion, time-lens curvature, and neural group-delay dispersion. The image of a pulse is computed and it is shown that it is inherently defocused in humans, based on the estimated parameters from Chapter §11. The same parameters are used to model available psychoacoustic data of the cochlear curvature in humans and high correspondence is found above 1 kHz. At lower frequencies, additional constraints of modulation bandwidth had to be introduced in order to obtain better estimates, although some uncertainty remains. The results also reveal the existence and the durations of the temporal aperture, a short sampling window that is associated with the different auditory channels. The estimates show close correspondence to additional human and animal data. They suggest that at high frequencies the aperture stop is determined by the auditory nerve, but at low frequencies it is determined by the cochlear filters.

Chapter §13 begins with the derivation of the impulse response of the defocused temporal imaging system for a single channel. The equations are then used to further derive the modulation domain transfer functions, which have not been previously introduced in optical temporal imaging, but are completely analogous to the spatial modulation and optical transfer functions from Fourier optics. Predictions are compared to available observations of the auditory temporal modulation transfer function. Interesting predictions and discrepancies are highlighted, where the dependence of the results on the degree of coherence of the stimulus is argued to be key.

The effect of sampling of continuous signals by the neural system is explored in Chapter §14. While sampling has been invoked several times in hearing models, the consequences of discretization have not been fully considered before. The significance of nonuniform sampling is discussed with respect to the modulation transfer functions, which are thought to degrade (lose modulation bandwidth) upon repeated resampling that occurs downstream, within the auditory pathways. The tradeoff between nonuniform sampling noise and aliasing from undersampling, as is known to take place on the retina, is hypothesized in the context of hearing. A psychoacoustic experiment that is interpreted as demonstrating the existence of auditory sampling is presented in Appendix §E, with emphasis on the effects of aliasing.
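
To make the notion of aliasing from undersampling concrete, the Python sketch below shows that a component above half the sampling rate produces exactly the same uniformly spaced samples as a lower-frequency component. The sampling rate and frequencies are arbitrary illustrative values, not estimates of any auditory sampling rate, and the example covers only the uniform case; nonuniform sampling, which trades this ambiguity for noise, is not simulated.

```python
import math

# All numerical values are illustrative assumptions.
fs = 100.0             # uniform sampling rate in Hz
f_high = 70.0          # component above the Nyquist limit fs / 2
f_alias = fs - f_high  # the lower frequency it is mistaken for (30 Hz)

samples_high = [math.cos(2 * math.pi * f_high * k / fs) for k in range(50)]
samples_alias = [math.cos(2 * math.pi * f_alias * k / fs) for k in range(50)]

# The two sample sequences are indistinguishable up to rounding error.
max_diff = max(abs(a - b) for a, b in zip(samples_high, samples_alias))
print(f"largest difference between the two sample sets: {max_diff:.2e}")
```

Once the samples are taken, nothing in them reveals which of the two frequencies was present, so any bandwidth above half the sampling rate folds back into the lower band.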

Chapter §15 explores in greater depth the idea of an auditory image based on all the previous findings and the principle of space-time duality. The concepts of sharpness, blur, focus, defocus, and depth of field are discussed, and a simple computation of the system temporal acuity is presented, based on its impulse response or the modulation frequency discrimination. A formal presentation of polychromatic images is made and pitch is discussed as a special case in auditory imaging that manifests in different ways. A subset of image aberrations from optics that can be relevant to hearing is discussed, with speculations about the most significant auditory aberrations in humans. Ideas from masking theory are extrapolated to examine how supra-threshold stimuli sound in the presence of other sounds. Furthermore, nonsimultaneous masking is analogized to the auditory depth of field that applies temporally and is exaggerated by the signal processing of the auditory system. Most of the imaging effects considered are well-known auditory phenomena that are reinterpreted in light of the concepts of temporal imaging. Seven rules of thumb for auditory imaging are proposed that epitomize some of the analyses in the chapter.

Chapter §16 takes these ideas a step further and hypothesizes what an auditory accommodation function, roughly analogous to accommodation in vision, could look like. Different mechanisms are proposed for parameters that can be shifted within the system. The operation of the olivocochlear efferent bundle appears to be key in accommodating the PLL gain and/or the time-lens curvature. The plausibility of other mechanisms of accommodation is discussed. The coherence of the stimulus is a recurrent key parameter in the analysis that the system is hypothesized to react to. The idea that the system may be combining coherent and incoherent imaging products in different amounts is considered and complements the earlier discussion made in the context of coherent and noncoherent communication detection in Chapter §9.

Chapter §17 brings together the ideas of auditory imaging and coherence and sets out to examine whether they can be used to shed light on known hearing impairments. Evidence for dispersive shifts in hearing-impaired individuals is examined and their effects are considered. Additionally, the possibility of aberration and accommodation impairments is explored, also by analogy with eye disorders. Out of the different impairments considered, accommodation disorders and excessive higher-order aberrations appear to have the highest likelihood of being detrimental, but more conclusive relations to known hearing disorders cannot be established before the entire temporal imaging theory is elucidated and more relevant data become available.

The treatise closes in Chapter §18, where a functional model of the hearing system is presented that encompasses all of its standard parts, as well as the imaging components and the PLL, which are roughly mapped to the different auditory organs. The chapter concludes with a general discussion that highlights open questions, limitations, weaknesses, and merits of the ideas presented in this treatise. It also highlights several topics that can benefit from future experimentation. Finally, certain novel ideas and issues that appear in this work are mentioned, which may be of interest outside hearing research.