The Böhmer Audio Room Compensation Inner Workings

Introduction

The Böhmer Audio Room Compensation system uses a psychoacoustic measurement method to capture the loudspeaker and room response. This might not sound very significant, and you would be forgiven for thinking it is only some kind of marketing hype. That is, however, far from the truth. This paper gives a high-level overview of how the psychoacoustic measurement method works and explains why it is of such fundamental importance.

All correction systems need relevant, high-quality input data on which to base their correction output. This holds regardless of whether the system is a room correction system or some other type of correction, e.g. the traction control system of a truck. The input data should correlate strongly with the property that is to be corrected or controlled. With better input data, the correction performance of the system improves.

Most modern electroacoustic measurement systems use a similar process to measure sound from a loudspeaker in a room. A test signal is played back through the speaker in the room and recorded with an omnidirectional microphone. The microphone can be placed in different positions relative to the speaker, and the test signal is often an MLS (maximum length sequence) noise signal or a swept sine wave called a chirp. The impulse response of the loudspeaker and room is then calculated from the recording using a few mathematical algorithms. A time gating window is applied to the impulse response to cut out a slice in time, and that slice is translated into a frequency response using an FFT algorithm.
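The step from recorded test signal to impulse response can be illustrated with a small sketch. It assumes a noise-free recording, a 127-sample MLS and a made-up toy room; the function names and all numeric values are hypothetical and are not taken from any particular measurement product.

```python
def mls(n_bits=7, taps=(7, 6)):
    """Generate a +/-1 maximum length sequence with a simple shift register."""
    state = [1] * n_bits
    seq = []
    for _ in range(2 ** n_bits - 1):
        seq.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq


def play_through_room(signal, room_ir):
    """Circular convolution: what an ideal microphone would record."""
    n = len(signal)
    return [sum(room_ir[m] * signal[(k - m) % n] for m in range(len(room_ir)))
            for k in range(n)]


def recover_impulse_response(recorded, signal):
    """Circular cross-correlation of the recording with the test signal."""
    n = len(signal)
    return [sum(recorded[i] * signal[(i - k) % n] for i in range(n)) / (n + 1)
            for k in range(n)]


# Toy "room": direct sound plus two reflections (illustrative values only).
room_ir = [0.0] * 127
room_ir[0], room_ir[20], room_ir[45] = 1.0, 0.5, -0.25

test_signal = mls()
recorded = play_through_room(test_signal, room_ir)
estimate = recover_impulse_response(recorded, test_signal)
```

In this sketch the estimate matches the toy room response apart from a small constant offset that is inherent to correlating with an MLS of finite length. Time gating and the FFT would then be applied to the estimate.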

Previous room correction solutions rely on variations of the described measurement method. They differ in the microphone locations they use (often several) and in the length of the time slice; some use long slices and others shorter ones. The calculated frequency response is then used to interpret the behavior of the loudspeaker and the room in order to apply correction.

This might on the surface look like a reasonable and viable way to acquire input for a room correction system, but unfortunately there is a significant problem. Human hearing does not interpret the loudspeaker’s sound in the room in anything like the way the measurement system does. There is some correlation between a response measured like this and the way a loudspeaker sounds in a room, but it is far from good. Anyone who has measured a loudspeaker in a room using the method described above has seen that the captured response varies greatly with the microphone location and the size of the gating window. The resulting frequency responses also clearly correlate poorly with how the loudspeaker’s sound is perceived when one listens to it.

This is a fundamental problem; every correction process relies completely on relevant measured input to work properly. The input data to a correction system need to correlate very well with the property it is meant to correct, or else it can’t produce a predictable and consistent result. Let’s look at an example. Suppose we are designing a traction control system for a truck and someone comes up with the idea of measuring skidding of the trailer wheels to correct the wheel spin of the drive wheels on the truck. It is rather obvious that this idea is not the best; there is likely some correlation between skidding trailer wheels and spinning drive wheels, but it is not great. Clearly the correction would work poorly and be unpredictable if we pursued this approach. The obvious and much better solution would be to measure the wheel spin on the drive wheels and lower the torque from the engine the instant the wheels start spinning. In other words, we need to measure the thing we want to correct, or the result will be unpredictable and far from optimal.

This example translates rather well to the current state of affairs in room correction systems; they measure the skidding of the trailer wheels but have no direct information about what is happening with the drive wheels. The flaws in the measurement process cause performance weaknesses and unpredictable results. There is a lack of consistency: the corrections sometimes improve the sound, sometimes do very little, and sometimes possibly make things worse. There is also a lack of uniformity of the sound in the room; the bass might be good in one location but booming and poorly articulated in another. Some people think that room correction systems sound artificial.

So, clearly the first issue is to solve the fundamental problem: the measurement of the loudspeaker and room must correlate with human perception of sound, or else we are measuring skidding trailer wheels.

The Böhmer Audio Room Compensation uses a unique, proprietary, psychoacoustically based measurement method that emulates human perception of sound through complex mathematical algorithms. We will look at how this works later, but let’s first look at basic human perception of sound.

Human perception of sound

Human perception of sound is a very complex mechanism that we will probably never be able to replicate fully using models and algorithms. Our imagination, preconceptions and expectations play a great part in our perception of sound, and such mechanisms are extremely difficult to emulate. There are many very interesting psychoacoustic studies concerning these subjects, but here we will only touch upon a few much simpler basic processes that are central to our hearing mechanism.

First, at the onset of a sound we initially listen to the individual frequency components of the sound, the buildup of the harmonics of the sound and the phase relationships between the fundamental and the harmonics. We use the first few milliseconds of the sound to determine whether the object is metallic, made of wood, rubber or something else we may recognize. We also use this section in time to determine if it is a violin, trumpet or a book that fell down on the floor.

A metallic object usually has more upper harmonics than a wooden object, and in addition the harmonics of a metallic object build up faster after the onset of the fundamental than those of wood. Notes from a violin and a trumpet have very similar waveforms, but the instruments have different fundamental versus harmonic startup patterns that we use to differentiate between the two. Without the startup section of the sound from these instruments it can actually become difficult to judge whether it is one or the other.

Many psychoacoustic studies have explored these phenomena and they are well documented in the scientific literature. With this knowledge it becomes evident that our hearing listens for the existence of various frequencies at various points in time with great accuracy and resolution. The ear apparently resolves with better than millisecond resolution the way several harmonics emerge after the onset of a fundamental. Without this ability we would not be able to distinguish between a violin and a trumpet or know whether the sound emerged from a wooden or metallic object.

The ear apparently works by analyzing both the frequency domain and the time domain at the same time. To picture this one can think about it as if our hearing mechanism uses a three dimensional sound landscape with time on one axis, frequency on another and level on the third. Think of a swimming pool with frequencies represented along the length of the pool, time along the short end and level by the level of the water in the pool. Now imagine a snapshot of the waves on the water after someone jumped into the pool and you have a pretty good picture of what the hearing mechanism is looking at to analyze the sound.
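The pool picture corresponds roughly to what signal processing calls a short-time spectrum: level as a function of both time and frequency. The following is a minimal sketch of such a landscape, assuming non-overlapping frames and a plain DFT per frame. The function name, the frame length and the toy note are illustrative assumptions and say nothing about how the ear or the Böhmer system actually computes it.

```python
import cmath
import math


def sound_landscape(samples, frame_len, hop):
    """Return landscape[t][f]: level of frequency bin f in time frame t."""
    landscape = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        levels = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            levels.append(abs(acc))
        landscape.append(levels)
    return landscape


# Toy note: a fundamental from the start, a harmonic that enters later.
FRAME = 64
signal = []
for n in range(4 * FRAME):
    value = math.sin(2 * math.pi * 4 * n / FRAME)             # fundamental, bin 4
    if n >= 2 * FRAME:
        value += 0.5 * math.sin(2 * math.pi * 8 * n / FRAME)  # harmonic, bin 8
    signal.append(value)

landscape = sound_landscape(signal, FRAME, FRAME)
```

Here `landscape[0][8]` is essentially zero while `landscape[2][8]` is not: the landscape shows when the harmonic enters, which is the kind of startup information described above.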

Secondly, there is something called the precedence effect. It is a psychoacoustic masking phenomenon, closely related to what the literature calls forward (temporal) masking: the ear is less sensitive to sound that occurs just after it has experienced a higher sound pressure. What this means is that if we stand along the short end of the pool, the time domain axis, looking at the water surface, any wave that has a preceding higher wave needs to have its level adjusted down to some degree, or the pool surface won’t be an accurate representation of what the ear actually hears.
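The idea of lowering a wave that follows a higher one can be sketched as follows. This is a hypothetical illustration only: the decay rate, the strength factor and the function name are assumptions made for the example, not values from psychoacoustic data and not the proprietary Böhmer weighting.

```python
def apply_masking(levels_db, decay_db_per_step=6.0, strength=0.5):
    """Lower levels (in dB) that follow a louder sound along the time axis.

    levels_db holds the level of one frequency band at successive time steps.
    decay_db_per_step and strength are assumed, illustrative parameters.
    """
    adjusted = []
    threshold = float("-inf")  # nothing has been heard yet
    for level in levels_db:
        threshold -= decay_db_per_step  # the masking fades with time
        if level < threshold:
            # Partly hidden by the earlier, louder sound: adjust it down.
            adjusted.append(level - strength * (threshold - level))
        else:
            adjusted.append(level)
        threshold = max(threshold, level)
    return adjusted


after_loud_onset = apply_masking([80.0, 60.0, 60.0, 60.0])
rising_sound = apply_masking([40.0, 50.0, 60.0])
```

With these assumed parameters the three 60 dB values that follow the 80 dB onset are lowered by progressively smaller amounts as the masking fades, while the rising sequence is left unchanged because nothing louder precedes it.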

Thirdly and finally, our hearing mechanism uses different portions in time, different sections of the pool surface, to determine different properties of the sound. The first 5 milliseconds are treated as the direct sound, which determines the location of the sound and the materials involved in generating it. The next portion, from approximately 5 to 25 milliseconds, is used to continue the material determination process and also to determine the type of object or instrument and its size. From about 25 milliseconds onward we start to perceive the sound as an echo that provides us with information about the acoustic environment: the size and properties of the enclosed space.

During these different sections in time the ear’s sensitivity to different aspects of the sound changes, and again there is a need to change the water surface level in the different sections to properly reflect what the ear is hearing.
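To get a feel for these time sections, the sketch below converts the extra path length of a reflection into an arrival delay and assigns it to one of the three sections. The 5 ms and 25 ms limits come from the text above; the speed of sound of 343 m/s is an approximate room-temperature value, and the function names and labels are made up for the example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for air at room temperature


def extra_delay_ms(extra_path_m):
    """Delay of a reflection that travels extra_path_m further than the direct sound."""
    return 1000.0 * extra_path_m / SPEED_OF_SOUND_M_PER_S


def perceptual_section(delay_ms):
    """Map an arrival delay to the time sections described in the text."""
    if delay_ms < 0:
        raise ValueError("delay must not be negative")
    if delay_ms < 5.0:
        return "direct sound"
    if delay_ms < 25.0:
        return "material, object type and size"
    return "echo and room information"


sections = {path_m: perceptual_section(extra_delay_ms(path_m))
            for path_m in (1.0, 3.0, 10.0)}
```

A reflection with 1 m of extra path arrives about 2.9 ms late and falls in the direct sound section, 3 m gives about 8.7 ms, and 10 m gives about 29 ms, which is in the echo section.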

These are three examples of basic psychoacoustic hearing mechanisms that have a large influence on our sound perception. Bearing these in mind, it hopefully becomes clear that our hearing looks at sound in a three dimensional landscape with time, frequency and level on the axes. The interpretation of a level depends on where in the landscape it occurs and on whether there is another sound preceding it that attenuates its perceptibility.

Now, do you believe that all the aspects of this complex three dimensional sound landscape can be compressed into one single two dimensional frequency graph?

Let’s look more closely at how a frequency response is conventionally measured and what information it actually contains.

The time gated frequency response

First there is the process of obtaining the impulse response of the loudspeaker and the room. The impulse response is simply a collection of samples that describes how the sound pressure varies over time. There are many ways to obtain this information, and the specific method employed is not important here, so just assume we have managed to acquire the impulse response of the loudspeaker and room with great accuracy. Let’s also say that our data starts just before the impulse and is 10 seconds long.

Now we are faced with the decision of what portion of the impulse data to use. This can of course be anything from all of it to a very short portion of it, so we need some reasoning behind the selection. Some room correction systems use a longer section, for example about 200 ms, and others use a much shorter time slice, say 50 ms. We already know that there is poor correlation between our measurement, whatever time slice we choose, and the sound we hear, but let’s say we choose to use 200 ms of the impulse response, which seems to be a commonly selected value.
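How much the choice of slice matters can be seen in a toy calculation. The sketch assumes an idealized impulse response with a direct sound and a single reflection 60 ms later, sampled at 2000 Hz to keep the arithmetic small; all names and values are illustrative assumptions.

```python
import cmath
import math

SAMPLE_RATE = 2000  # Hz, kept low so the plain DFT below stays fast


def gated_magnitudes(impulse_response, gate_ms):
    """Magnitude response of the first gate_ms milliseconds of the response."""
    n = int(SAMPLE_RATE * gate_ms / 1000)
    gated = impulse_response[:n]
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(gated))
        magnitudes.append(abs(acc))
    return magnitudes


def ripple_db(magnitudes):
    """Difference between the highest and lowest level, in dB."""
    return 20.0 * math.log10(max(magnitudes) / min(magnitudes))


# Toy response: direct sound at 0 ms and one reflection at 60 ms.
impulse_response = [0.0] * SAMPLE_RATE
impulse_response[0] = 1.0
impulse_response[120] = 0.5

short_gate = gated_magnitudes(impulse_response, 50)
long_gate = gated_magnitudes(impulse_response, 200)
```

The 50 ms slice never sees the reflection and gives a perfectly flat response, while the 200 ms slice shows a ripple of about 9.5 dB. The same loudspeaker and room thus produce two quite different graphs.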

What we then do is apply a mathematical transform called the FFT to the impulse response slice to get a frequency response. But what data does the frequency response contain, and how is the level at each frequency calculated?

Essentially, the algorithm looks at the time slice of the impulse response data that we selected and finds the sound pressure level for every individual frequency of the frequency response we are about to calculate. For each frequency it runs over the time slice from the start and measures, at every sample in our data, how high the sound pressure is for that particular frequency. The sound pressure level is then accumulated, much like putting balls into a bucket, with more balls representing higher sound pressure. When it comes to the end of the time slice it has a number of balls in the frequency bucket that represents the level for that particular frequency. It goes on and does the same operation for every frequency. In the end we have a row of buckets, one for each frequency, filled up to various levels. The level in each bucket represents the accumulated sound pressure level for the particular frequency. We can now plot a two dimensional graph with the levels in each bucket on one axis, the sound pressure level, and the buckets, frequencies, on the other. We have our frequency graph.

So what just happened?

The FFT algorithm accumulated the level at every frequency without any regard for where the level occurred in time and without applying any sort of weighting based on the precedence effect or anything else we understand about psychoacoustics and human perception of sound.
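The bucket picture can be written down as a plain discrete Fourier transform, which gives the same result as an FFT, only more slowly. The sketch below uses two made-up time slices that contain the same click, once early and once late; the names are illustrative.

```python
import cmath
import math


def bucket_levels(time_slice):
    """Accumulate one level per frequency by running over the whole slice."""
    n = len(time_slice)
    levels = []
    for k in range(n // 2 + 1):                    # one bucket per frequency
        bucket = 0j
        for i, pressure in enumerate(time_slice):  # run over the time slice
            bucket += pressure * cmath.exp(-2j * math.pi * k * i / n)
        levels.append(abs(bucket))                 # keep only the level
    return levels


# The same click, arriving early in one slice and late in the other.
early_click = [0.0] * 64
early_click[1] = 1.0
late_click = [0.0] * 64
late_click[40] = 1.0

early_levels = bucket_levels(early_click)
late_levels = bucket_levels(late_click)
```

Both slices give exactly the same levels in every bucket. The arrival time survives only in the phase of the accumulated value, which a magnitude frequency graph does not show.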

Thinking about what we know about how our hearing works, this just can’t be the right way to do it. It also becomes quite clear that, regardless of what length of time slice we choose, the time gated measurement method cannot give us a measurement that correlates with the way humans perceive sound in a room. The traditional time gated measurement process simply ignores human hearing and simplifies the analysis to a point where its usefulness is impaired, i.e. it is measuring skidding trailer wheels when it should be measuring spinning drive wheels on the truck.

The Böhmer Room Compensation

The Böhmer Audio Room Compensation’s psychoacoustically based measurement method looks upon the loudspeaker and room response in a three dimensional way, similar to how our hearing mechanism works. The mathematical algorithms in the Böhmer Audio Room Compensation system then operate on this psychoacoustically preprocessed three dimensional data to find optimal solutions.

When a loudspeaker is playing back recorded sound, the loudspeaker and room always introduce time smear: the impulse response is smeared out in time instead of being focused. This always happens, and the way the smearing turns out depends on the room. The construction materials of the walls, ceiling and floor have a large influence, as does the loudspeaker location within the room. Things like windows and doors and their mechanical properties can also play a significant role.

The Böhmer Audio Room Compensation system tries to minimize the time smear by looking in the psychoacoustically measured three dimensional sound landscape to find frequencies to apply correction to. The algorithms then optimize the time domain by testing thousands of solutions until the result no longer improves. All of this is done within the three dimensional sound landscape, applying our knowledge of how human hearing works to the algorithms.
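The general idea of testing many candidate solutions and keeping only those that reduce time smear can be illustrated with a very simple search. This is not the Böhmer algorithm, which is proprietary and works in the psychoacoustic landscape; the smear metric, the four-tap correction filter, the search strategy and all names below are assumptions made purely for illustration.

```python
import random


def convolve(a, b):
    """Plain linear convolution of two short sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def smear(impulse_response):
    """Toy metric: share of the energy that is not in the strongest sample."""
    energies = [x * x for x in impulse_response]
    total = sum(energies)
    return (total - max(energies)) / total


def optimize(room_ir, n_taps=4, trials=2000, step=0.1, seed=0):
    """Try random changes to a correction filter; keep only improvements."""
    rng = random.Random(seed)
    best = [1.0] + [0.0] * (n_taps - 1)   # start with "no correction"
    best_cost = smear(convolve(room_ir, best))
    for _ in range(trials):
        candidate = list(best)
        candidate[rng.randrange(1, n_taps)] += rng.gauss(0.0, step)
        cost = smear(convolve(room_ir, candidate))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


room_ir = [1.0, 0.0, 0.0, 0.6]            # direct sound plus one reflection
initial_cost = smear(room_ir)
correction, final_cost = optimize(room_ir)
```

In this toy case the search lowers the smear metric well below its starting value by partly cancelling the reflection. A real system would of course use a far richer description of both the room and of what counts as an improvement.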

This approach is fundamentally different from the traditional approaches seen to date in earlier correction systems that are based on the simple time gated measurement method.

References

[1] David M. Howard, Jamie A. S. Angus, Acoustics and Psychoacoustics, fourth edition, 2009, ISBN 978-0-240-52175-6

[2] Albert S. Bregman, Auditory Scene Analysis The Perceptual Organization of Sound, 1994, ISBN 978-0-262-52195-6

[3] David Griesinger, Pitch, Timbre, Source Separation and the Myths of Loudspeaker Imaging, Presented at the 132nd Convention of the Audio Engineering Society 2012 April 26–29, Budapest, Hungary

[4] John Atkinson, “Loudspeakers: What Measurements Can Tell Us--and What They Can't Tell Us”, Presented at the 103rd Convention of the Audio Engineering Society 1997 September 26-29 New York.

[5] Robert Berkovitz and Grayson Abbott, “A LOUDSPEAKER MEASUREMENT SYSTEM BASED ON SIGNAL PROCESSING AND BIOPHYSICAL SIMULATION”, Presented at the 67th Convention of the Audio Engineering Society, 1980 Oct. 31/Nov. 3, New York.

[6] Jorma Salmi, “A NEW, PSYCHOACOUSTICALLY MORE CORRECT WAY OF MEASURING LOUDSPEAKER FREQUENCY RESPONSES”, Presented at the 73rd Convention of the Audio Engineering Society, 1983 March 15-18, Eindhoven, The Netherlands.

[7] Bernt Böhmer, Böhmer Audio, “Psychoacoustic Measurements & Room Compensation System”, Version 1.4, July 2015.