Tuning on the Cheap: Part 1 in a 3-part Series
This week, we start a technical series that will delve into the engineering, math, hardware, and software aspects of module making. To kick it off, we have part one of Tuning on the Cheap, a three-part series on how tuning works, both in our ears and in how we implement it in Eurorack. Have a technical (or non-technical) question you'd like to see addressed here? Contact us!
Tuning on the Cheap: Introduction to the Series
Today's low-end microcontrollers are very powerful, but they still require a bit of finesse to produce musically accurate tones. How accurate is accurate enough to convince our ear that it's right? And how can we achieve this precision on a $2 microcontroller?
This first post analyzes exactly how accurate an oscillator's tuning needs to be to match the ear's listening precision. There is enough psychoacoustic literature to develop a good understanding of the limits of pitch perception. Part 2 will focus on practical aspects of the analog-to-digital conversion. Part 3 will focus on how to convert a value that is linear in musical pitch to one that is linear in time, making it suitable for generating the proper pitch. This is simple on a more powerful computer, but on a cheap microcontroller with performance, space, and accuracy constraints, it's a challenge.
Tuning on the Cheap: What is accurate enough?
Eurorack tuning starts in the analog domain. Most Eurorack-format modules use a 1v/8va scale for pitch. This simply means that a one-volt change in voltage corresponds to a one-octave change in pitch. This leads to the elegant formula
$$\mathrm{Hz} = f_0 \cdot 2^{v}$$
where Hz is the frequency, v is the voltage and f0 is the frequency at zero volts. The voltage must be buffered for noise immunity and to avoid the effects of external impedance. Once buffered, it is converted to a digital value and then to a frequency (or more likely to a period for a timer). If any step along the way is not accurate enough, the pitch will not be accurate. But then how accurate is accurate enough? To gain insight into this question we turn to the psychoacoustics literature.
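As a quick illustration of the 1v/8va relationship, here is a minimal Python sketch of the conversion in both directions. The reference frequency `F0_HZ` (roughly C0) and the function names are assumptions for the example; the article leaves f0 open, and a real module would pick its own.

```python
import math

# Assumed reference: 0 V maps to roughly C0. This value is illustrative only.
F0_HZ = 16.35

def volts_to_hz(v, f0=F0_HZ):
    """1 V/octave: every added volt doubles the frequency."""
    return f0 * 2.0 ** v

def hz_to_volts(hz, f0=F0_HZ):
    """Inverse mapping: how many octaves (volts) hz sits above f0."""
    return math.log2(hz / f0)

if __name__ == "__main__":
    for v in (0.0, 1.0, 2.0, 2.5):
        print(f"{v:4.1f} V -> {volts_to_hz(v):8.2f} Hz")
```

Because one volt is one octave, a semitone is 1/12 V, so `volts_to_hz(1 / 12)` is one semitone above f0.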
The human ability to detect pitch is quite remarkable. Mammals (that's us) have an organ in our inner ear (the cochlea) that evolved to discriminate between different sound frequencies. There is a tremendous amount of fascinating literature in bioacoustics and psychoacoustics [LINKS] but for the task at hand we are really concerned with only one question: What is the smallest change in pitch that can be detected by a human? (sidenote: if you really want to know more about hearing, let us know. Kris' PhD was in a lab focused on how animals hear...ears are weird, interesting, wacky, and a marvel of evolution.)
Pitch resolution in psychoacoustics is measured as the "Just Noticeable Difference" (JND). This measure can be applied to any sensory perception and is fundamentally just the smallest amount a stimulus must be changed for an observer to perceive a difference. Weber's Law defines it as
$$\frac{\Delta S}{S} = K$$
where S is the stimulus (frequency, intensity, whatever), ΔS is the difference threshold or change required to be perceived, and K is a constant for the sense being tested. Psychologists have applied JNDs to olfaction, vision, hearing, vibrational sensing, and more, and for a variety of applications, including marketing, room acoustics, speech modeling, and beyond.
JNDs are typically determined by testing many people on several stimulus changes and back-calculating the equation that best describes the relationship over the range of stimuli tested. Luckily for us, the Just Noticeable Difference for pitch has been extensively studied, and we can learn the limits of human pitch detection with a web search.
If you're curious about your own ability to detect differences in frequency, the Physics Department at Davidson College has put a test online that you can take yourself here. [As an aside, I love that they preface the test by making you doubt your own perceptions: "Are you quite certain that you are seeing a computer in front of you?"]
To summarize the JND research, below 500 Hz the JND is about 1 Hz; above 500 Hz, the JND is about 0.6% of the frequency. To unify the two ranges, convert the lower-frequency criterion to a percentage: 1 Hz at 500 Hz is 0.2% of the frequency, which is the tightest relative tolerance in that range. This is a reasonable, convenient, and linear-in-frequency approximation to the accuracy needed for pitch computations. We will compute our error bounds for both 0.2% and 0.6% to demonstrate the ear's differences at different frequencies.
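To see how the two rules of thumb compare, the following sketch expresses the summary as a crude piecewise function and converts it to a percentage. The function names and the hard switch at exactly 500 Hz are assumptions for illustration; real listening data varies smoothly and differs between listeners.

```python
def jnd_hz(freq_hz):
    """Approximate JND in Hz: about 1 Hz below 500 Hz, about 0.6% above."""
    if freq_hz < 500.0:
        return 1.0
    return 0.006 * freq_hz

def jnd_percent(freq_hz):
    """The same JND expressed as a percentage of the frequency."""
    return 100.0 * jnd_hz(freq_hz) / freq_hz

if __name__ == "__main__":
    for f in (100.0, 250.0, 499.0, 1000.0, 4000.0):
        print(f"{f:7.1f} Hz: JND ~ {jnd_hz(f):5.2f} Hz ({jnd_percent(f):.2f}%)")
```

Below 500 Hz the percentage shrinks as the frequency rises, reaching about 0.2% just under 500 Hz, which is why 0.2% serves as the stricter of the two bounds.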
To convert the JND to volts, we use the Hz to v formula from above, but we solve it twice: once for the voltage at a given frequency, and once for the voltage at that frequency offset by the JND (converted to Hz at that frequency). Subtracting the two results gives the JND in volts.
Once we've converted the JND to volts, it's simple to determine the number of distinguishable pitches per octave. This can then be used to compute how many bits of resolution per octave are needed for analog-to-digital conversion.
On with the calculation!
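First, our equations for a pitch and for a pitch one JND away from it, with the JND written as a fraction of the frequency (0.002 for 0.2%):
$$\mathrm{Hz} = f_0 \cdot 2^{v} \qquad \mathrm{Hz}\,(1 + \mathrm{JND}) = f_0 \cdot 2^{v_{\mathrm{JND}}}$$
We solve these for voltage:
$$v = \log_2\frac{\mathrm{Hz}}{f_0} \qquad v_{\mathrm{JND}} = \log_2\frac{\mathrm{Hz}\,(1 + \mathrm{JND})}{f_0}$$
And then subtract them to determine the JND in volts:
$$\mathrm{JND}_v = v_{\mathrm{JND}} - v = \log_2(1 + \mathrm{JND})$$
From the voltage we can determine how many uniquely distinguishable pitches exist per octave (one octave spans one volt):
$$\text{pitches per octave} = \frac{1}{\mathrm{JND}_v}$$
And from the unique pitches we can compute how many bits we need:
$$\text{bits per octave} = \log_2(\text{pitches per octave}) = -\log_2\left(\log_2(1 + \mathrm{JND})\right)$$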
It is convenient to have an integral number of bits per octave, since it allows octave math to be performed with shifts rather than multiplies, so these numbers should be rounded up.
For a JND of 0.2%, 9 bits per octave are needed.
For a JND of 0.6%, 7 bits per octave are needed.
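The calculation can be checked numerically. This sketch follows the steps described in the text (solve for the voltage twice, subtract, invert, take log2 and round up). The function names, the 440 Hz test frequency, and the f0 value are assumptions for the example; the result does not depend on either frequency.

```python
import math

def jnd_volts(jnd_fraction, hz=440.0, f0=16.35):
    """Solve the 1 V/octave formula twice and subtract to get the JND in volts."""
    v = math.log2(hz / f0)
    v_jnd = math.log2(hz * (1.0 + jnd_fraction) / f0)
    return v_jnd - v

def pitches_per_octave(jnd_fraction):
    """One octave spans one volt, so divide 1 V by the JND in volts."""
    return 1.0 / jnd_volts(jnd_fraction)

def bits_per_octave(jnd_fraction):
    """Round up to a whole number of bits so octave math can use shifts."""
    return math.ceil(math.log2(pitches_per_octave(jnd_fraction)))

if __name__ == "__main__":
    for jnd in (0.002, 0.006):
        print(f"JND {jnd:.1%}: {pitches_per_octave(jnd):6.1f} pitches/octave, "
              f"{bits_per_octave(jnd)} bits/octave")
```

Running it gives roughly 347 and 116 distinguishable pitches per octave, which round up to the 9 and 7 bits quoted above.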
To compute the total number of bits needed for the entire scale, add log2(octave count), rounded up to a whole number, to this number. For example, a 16-octave range would require 4 more bits, or 13 bits total for a JND of 0.2%.
As an aside, although this derivation is aimed at digital use, the numbers are also useful for analog work, since they give concrete targets for an acceptable noise floor and accuracy in analog CV circuits.
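Here is a small sketch of the total-bits arithmetic, along with the reason a whole number of bits per octave is convenient. The bit layout (octave number in the high bits, position within the octave in the low bits) and all function names are hypothetical, chosen only to show the shift-and-mask idea; they are not taken from any particular module.

```python
import math

def total_bits(bits_per_octave, octave_count):
    """Bits per octave plus enough bits to count the octaves."""
    return bits_per_octave + math.ceil(math.log2(octave_count))

def split_pitch(code, bits_per_octave):
    """Split a pitch code into (octave, fraction) using a shift and a mask."""
    octave = code >> bits_per_octave
    fraction = code & ((1 << bits_per_octave) - 1)
    return octave, fraction

def join_pitch(octave, fraction, bits_per_octave):
    """Inverse of split_pitch."""
    return (octave << bits_per_octave) | fraction

if __name__ == "__main__":
    print(total_bits(9, 16))                      # 13
    print(split_pitch(join_pitch(5, 3, 9), 9))    # (5, 3)
```

With this layout, moving up an octave is just adding `1 << bits_per_octave` to the code, with no multiply needed.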
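For the analog case, the same result can be read as a voltage tolerance. This sketch converts a fractional JND into millivolts on a 1v/8va CV line and checks a noise or error figure against it. The function names and the 5 mV sample value are assumptions for illustration only.

```python
import math

def jnd_millivolts(jnd_fraction):
    """JND on a 1 V/octave CV line, in millivolts."""
    return 1000.0 * math.log2(1.0 + jnd_fraction)

def within_jnd(error_mv, jnd_fraction):
    """True if a CV error of error_mv stays below the chosen JND."""
    return abs(error_mv) < jnd_millivolts(jnd_fraction)

if __name__ == "__main__":
    for jnd in (0.002, 0.006):
        print(f"JND {jnd:.1%} ~ {jnd_millivolts(jnd):.2f} mV")
    print(within_jnd(5.0, 0.002), within_jnd(5.0, 0.006))
```

Under these assumptions the tolerances come out a little under 3 mV for 0.2% and a little under 9 mV for 0.6%, so a 5 mV error would pass the looser bound but not the stricter one.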
As with most everything in life, it's a little more complicated than what I'm describing here. One of the things that stands out in the psychoacoustic literature is the absolutely enormous variation between individuals in listening tests. It's probably not terribly surprising that in many of these tests, listeners with music or audio backgrounds (trained listeners) perform substantially better than their untrained counterparts. Another aspect of interpreting the science is that the results are based on very simplified tests that are sometimes not very representative of musical tones. Despite some caveats, these minimum precision standards form a solid basis for understanding how accurate synthesizer tuning needs to be.
Useful resources for further reading
Fastl, H. and E. Zwicker. 2007. Psychoacoustics. Springer-Verlag Berlin Heidelberg. http://www.springer.com/us/book/9783540231592
Moore, B.C.J. 2013. An Introduction to the Psychology of Hearing. 6th ed. Brill. http://www.brill.com/introduction-psychology-hearing-0 (an older edition is available on Amazon)