Choosing the Right MCU for Your Audio Application

By Lee H. Goldberg

Contributed By Electronic Products

2012-05-16

MCUs currently powering many consumer products and embedded systems are now being asked to support digital audio functions that used to be handled by DSPs, ASSPs, or other dedicated silicon. Fortunately, the power and sophistication packed into many 16-bit MCUs can support basic audio processing and allow them to perform tasks such as audio record/playback, audio stream conversion, and other innovative audio applications. But how do you choose an MCU that will give your product the price, performance, and design flexibility you need? Read on for answers to these design questions, along with a roundup of the latest chips and development kits for audio applications.

Audio characteristics

The type and amount of processing power you select for your audio application is one strand in a web of relationships between solution cost, desired audio quality, and available memory space. While the actual sample size used by a particular application can range between 8 and 24 bits, we will assume that most of the ones we deal with here use 12- to 16-bit samples. Since sample rate is a prime determinant of the quality of the sound you will be dealing with, we have provided a breakdown of audio sources and the sample rates commonly used to capture or reproduce them (Table 1).

Audio Source	Frequencies	Popular Sampling Rates
Tones, Buzzers	Usually a sinusoid of single frequency within the 3 kHz range	2 to 4 times the tone with the largest frequency
DTMF	A weighted sum of two sinusoids at specific standard frequencies between 500 Hz and 3 kHz	7.2 kHz or greater
Alarms	Usually a time-varying sweep of a range of frequencies	Twice the largest frequency
Human Speech / Voice	Can be viewed as a weighted sum of signals between 300 Hz and 3.3 kHz. A human voice is capable of generating these frequencies	8 kHz, 11.025 kHz, 16 kHz
Music & Musical Instruments	Can be viewed as a weighted sum of signals between 20 Hz and 20 kHz. A human ear can perceive these frequencies	32 kHz (Good enough for most instruments), 44.1 kHz (CD quality), 48 kHz (PC sound cards)

Table 1: Characteristics of common audio sources. (Courtesy of Microchip Technology.)
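The relationship between signal bandwidth, sample rate, and raw data rate can be sketched in a few lines. The following is a minimal illustration, not part of the original article: the function names are hypothetical, and it assumes the usual Nyquist rule of sampling at no less than twice the highest frequency of interest.

```python
def min_sample_rate_hz(max_freq_hz: float) -> float:
    """Nyquist criterion: sample at least twice the highest frequency present."""
    return 2.0 * max_freq_hz


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw (uncompressed) PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Speech band from Table 1 tops out at 3.3 kHz, so 8 kHz sampling leaves margin.
print(min_sample_rate_hz(3300))

# 16-bit mono speech at 8 kHz versus 16-bit stereo audio at 44.1 kHz.
print(pcm_bit_rate(8000, 16))
print(pcm_bit_rate(44100, 16, channels=2))
```

Under these assumptions, 16-bit speech sampled at 8 kHz produces 128 kbit/s of raw data, while CD-quality stereo produces roughly eleven times as much, which is why sample rate and sample size dominate the memory and bandwidth budget.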

MCUs for voice-grade applications

Since memory space (and transmission bandwidth) is usually at a premium in an embedded system, digital compression is applied to the data stream, either throwing away part of the information it contains or using a more complex coding algorithm to represent it in a more compact manner. Compression/decompression can be done using either an external codec or a software codec running on the MCU itself. Figure 1 illustrates the quality/data rate trade-offs involved in compressing a standard 128-kbit/s audio stream with the most commonly used ITU (G.7xx) algorithms as well as the Speex¹ open-source codec.

Figure 1: Bit rate vs. audio quality for commonly used speech codecs. (Courtesy of Microchip Technology.)

The amount of processing power (MIPS) a particular codec requires varies in rough proportion to the compression ratio and quality of the audio it delivers, as illustrated in Table 2. Fortunately, even inexpensive 16-bit general-purpose MCUs can easily support the simple software codecs used for speech processing, such as adaptive differential pulse code modulation (ADPCM) or the simpler G.7xx ITU standards. The G.711 algorithm requires approximately 1 MIPS to process moderate-quality human speech at a 2:1 compression ratio. The G.722 wideband algorithm delivers better audio quality and a 4:1 compression ratio while consuming only 5 MIPS. Both codecs can be comfortably supported on 16-bit MCUs, such as Freescale’s HC12 series, Microchip’s PIC24F/PIC24H families, or Texas Instruments’ extensive line of MSP430 MCUs, with enough reserve for supervisory code or other applications. In addition to the usual telephony and VoIP applications, these inexpensive techniques can make use of the surplus MIPS lurking within many embedded systems to add audio alerts (or even speech synthesis) to smoke detectors, alarm systems, or exercise and industrial equipment.
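To give a feel for why G.711 is so cheap to run, here is a minimal sketch of its mu-law variant, which squeezes each 16-bit sample into 8 bits (the 2:1 ratio mentioned above). This is the widely published table-free routine rather than code from the article or from any vendor library; the function names are hypothetical, and G.711 also defines an A-law variant that is not shown.

```python
BIAS = 0x84    # 132, added before the segment search (standard mu-law constant)
CLIP = 32635   # largest magnitude that still fits after the bias is added


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample into one 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): position of the highest set bit, 7..0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law byte back to a signed 16-bit PCM sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


for pcm in (0, 1000, -1000, 32767):
    byte = linear_to_ulaw(pcm)
    print(pcm, hex(byte), ulaw_to_linear(byte))
```

The encoder needs only a compare, an add, a short bit search, and a shift per sample, so the quantization error grows with amplitude but the processing load stays very low.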

Algorithm	G.711	G.726A	Speex
MIPS	1	13	20
Flash (KB)	3.5	6	30
RAM (KB)	3.5	4	7
Memory needed to store 1 second of encoded speech	8 KB	2, 3, 4, or 5 KB	1 KB

Table 2: Processing requirements for commonly used speech codecs. (Courtesy of Microchip Technology.)
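The storage row of Table 2 follows directly from each codec's output bit rate. The sketch below shows the arithmetic. The bit rates are assumptions inferred from the table (64 kbit/s for G.711, 32 kbit/s as one of the G.726A options, 8 kbit/s for Speex), 1 KB is taken as 1000 bytes, and the function names are hypothetical.

```python
# Assumed codec output bit rates in bits per second (inferred from Table 2).
ASSUMED_BIT_RATES = {
    "G.711": 64_000,
    "G.726A (32 kbit/s mode)": 32_000,
    "Speex": 8_000,
}


def bytes_per_second(bit_rate_bps: int) -> float:
    """Storage consumed by one second of encoded speech."""
    return bit_rate_bps / 8


def seconds_of_speech(storage_bytes: int, bit_rate_bps: int) -> float:
    """How much encoded speech fits in a given amount of memory."""
    return storage_bytes / bytes_per_second(bit_rate_bps)


# Example: how many seconds of prompts fit in 64,000 bytes of spare flash?
for name, rate in ASSUMED_BIT_RATES.items():
    print(name, bytes_per_second(rate), seconds_of_speech(64_000, rate))
```

With these assumptions the same 64,000 bytes hold 8 seconds of G.711 speech but 64 seconds of Speex, at the cost of the higher MIPS figure listed in the table.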

MCUs for music applications

The processing and memory requirements for decoding the MP3/4 streams used in most popular media players are significantly higher than those of the voice-grade applications discussed earlier. For compact disc (CD) quality audio, the standard is 16-bit resolution with a 44.1-kHz sample rate, and many applications use 24 bits with sample rates of 96 kHz or higher. Many designs are also required to support decoding of Microsoft’s WMA and the AAC format favored by Apple, which require even more processing power. As a result, it is often more cost-effective to implement these sophisticated coding schemes using a dedicated audio decoder, such as ROHM’s BU9457KV or a Cirrus Logic audio decoder, that produces serial PCM data fed to an integrated D/A stage or an off-chip audio codec, such as Cirrus Logic’s CS4270 or NXP’s UDA1341TS.

Nevertheless, low-cost MCUs can still play a big role in consumer audio, usually by managing the digital music streams in audio accessories, such as docking stations and digital speaker sets (Figure 2). In these applications, a frame of PCM audio data (encapsulated in the USB audio class format) arrives every 1 ms via one of the processor’s SPI/I²S serial channels. The USB audio class data format also provides controls for common features such as volume, tone, gain control, and equalizers.

Figure 2: In an audio docking station, a low-cost MCU can be used to perform format conversion, sample rate adaptation, and stream management, as well as support the dock’s user interface. (Courtesy of Microchip Technology.)
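The 1 ms frame timing has a practical consequence for buffer sizing: the number of samples per frame is the sample rate divided by 1000, which is not a whole number at 44.1 kHz. The sketch below, with hypothetical function names, shows one common way to handle this, carrying the remainder forward so that the long-term average stays exact. It illustrates the arithmetic only and is not taken from the USB audio class specification or any vendor stack.

```python
from typing import List


def samples_per_frame(sample_rate_hz: int, frames: int,
                      frames_per_second: int = 1000) -> List[int]:
    """Per-channel sample count for each successive 1 ms frame."""
    counts = []
    accumulator = 0
    for _ in range(frames):
        accumulator += sample_rate_hz
        n = accumulator // frames_per_second   # whole samples due this frame
        accumulator -= n * frames_per_second   # carry the remainder forward
        counts.append(n)
    return counts


def frame_bytes(samples: int, channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Payload size of one frame of interleaved PCM."""
    return samples * channels * bytes_per_sample


print(samples_per_frame(48000, 3))    # constant frame size
print(samples_per_frame(44100, 10))   # mostly 44 samples, occasionally 45
print(frame_bytes(48))                # 16-bit stereo payload at 48 kHz
```

At 48 kHz every frame carries the same 48 samples per channel, while at 44.1 kHz nine frames of 44 samples are followed by one of 45, so receive buffers should be sized for the larger frame.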

Depending on the source, the audio stream may arrive in one of several formats (e.g., left-justified, right-justified, or I²S), and some lower-cost codecs can accept only one specific format. In these cases, the MCU must make sure the data is properly aligned before it is fed to the codec. Since not all audio sources use the same sampling rate, the codec must adapt its sampling frequency to the source or rely on the MCU to convert the sampled data stream to a common rate. A lower-cost codec also usually lacks its own buffer, so the MCU must manage the stream to avoid the under- or over-run conditions that cause the silences, pops, and other audio discontinuities associated with data loss.
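The stream management described above usually comes down to a ring buffer sitting between the incoming data and the codec. The following is a minimal sketch under simple assumptions: a single producer and a single consumer, no interrupt-safety concerns, and silence substituted on underrun. The class and attribute names are hypothetical. On a real MCU the same logic would typically be written in C and driven from DMA or interrupt handlers.

```python
class SampleRingBuffer:
    """Fixed-size FIFO for PCM samples that counts over- and under-runs."""

    def __init__(self, capacity: int) -> None:
        self._data = [0] * capacity
        self._capacity = capacity
        self._read_index = 0
        self._count = 0
        self.overruns = 0    # writes refused because the buffer was full
        self.underruns = 0   # reads served with silence because it was empty

    def write(self, sample: int) -> bool:
        """Store one sample from the source; returns False if it was dropped."""
        if self._count == self._capacity:
            self.overruns += 1
            return False
        write_index = (self._read_index + self._count) % self._capacity
        self._data[write_index] = sample
        self._count += 1
        return True

    def read(self) -> int:
        """Fetch the next sample for the codec, or silence (0) if empty."""
        if self._count == 0:
            self.underruns += 1
            return 0
        sample = self._data[self._read_index]
        self._read_index = (self._read_index + 1) % self._capacity
        self._count -= 1
        return sample

    def fill_level(self) -> int:
        """Number of samples currently buffered."""
        return self._count


buf = SampleRingBuffer(capacity=4)
for s in (100, 200, 300, 400, 500):   # fifth write exceeds capacity
    buf.write(s)
print([buf.read() for _ in range(5)], buf.overruns, buf.underruns)
```

Watching the fill level over time also gives the firmware a simple way to detect a mismatch between source and codec clocks before an overrun or underrun actually occurs.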

Some 16-bit MCUs, and nearly any 32-bit MCU, that deliver 40 MIPS or more have the processing capacity to support the stream management, buffering, and format conversion that take place in an audio dock. Some manufacturers, such as Microchip, have added special features that can cut implementation costs. For example, some MCUs in Microchip’s PIC32 MX2 family have memories specifically sized for these types of applications and provide an I²S reference clock output. This allows the MCU to supply a sample rate clock (master clock), which eliminates the need for a more expensive codec with an integrated or external PLL.
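As a rough illustration of what the MCU has to generate, the sketch below computes the master and bit clock frequencies for two common sample rates. The 256x master clock multiple and the 32-bit stereo slot width are assumptions, common choices but not values taken from this article, and the function names are hypothetical. The codec's data sheet determines the multiples a given part actually accepts.

```python
def master_clock_hz(sample_rate_hz: int, multiple: int = 256) -> int:
    """Master clock for a codec that expects a fixed multiple of the sample rate."""
    return sample_rate_hz * multiple


def bit_clock_hz(sample_rate_hz: int, bits_per_slot: int = 32, channels: int = 2) -> int:
    """Serial bit clock needed to shift out every slot of every channel."""
    return sample_rate_hz * bits_per_slot * channels


for fs in (44_100, 48_000):
    print(fs, master_clock_hz(fs), bit_clock_hz(fs))
```

Because 44.1 kHz and 48 kHz lead to master clocks that are not simple multiples of each other, a flexible reference clock generator on the MCU can stand in for the PLL a codec would otherwise need.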

In applications where a fully programmable solution is desired, it is possible to perform MP3/4, AAC, or WMA decoding using a general-purpose MCU (typically 16-bit) that can supply 40 MIPS or more of general-purpose RISC processing. The code for these applications typically occupies 128 KB of flash and can require up to 48 KB of RAM, plus memory for other functions such as the user interface or simple graphics processing for the player’s small LCD. For example, Atmel’s AT32UC3 family of MCUs is designed specifically for a variety of consumer audio applications, including docking stations, decoder/playback systems, and USB Audio Class devices (Figure 3).

Figure 3: Atmel’s versatile AT32UC3 family can be used as the basis for an audio docking station, a decoder/playback system for MP3, WMA and AAC audio, or as a USB Audio-Class device. (Courtesy of Atmel.)

Another option is to use so-called digital signal controllers (DSCs), which have extended instruction sets that support multiply-accumulate (MAC) operations and hardware accelerators that give them DSP-like capabilities. DSP-enhanced MCUs like Freescale’s 56800/E, Microchip’s dsPIC 30 series, and STMicro’s ST10 processors require fewer instructions to execute an equivalent encode/decode function, freeing resources for other functions such as rate adaptation, advanced filtering, and equalization algorithms.
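To see why a MAC instruction matters, consider a finite impulse response (FIR) filter, a basic building block of filtering and equalization. Each output sample is a sum of products, so the inner loop is one multiply-accumulate per coefficient. The sketch below is a plain illustrative implementation with a hypothetical function name. On a DSC, the marked line is the step that a hardware MAC unit would typically execute in a single cycle.

```python
from typing import List, Sequence


def fir_filter(samples: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    history = [0.0] * len(coeffs)   # most recent input first
    output = []
    for x in samples:
        history = [x] + history[:-1]        # shift in the newest sample
        acc = 0.0
        for c, h in zip(coeffs, history):
            acc += c * h                    # one multiply-accumulate (MAC)
        output.append(acc)
    return output


# Two-tap moving average as a simple low-pass example.
print(fir_filter([2.0, 4.0, 6.0, 6.0], [0.5, 0.5]))
```

A filter with N coefficients running at sample rate fs needs N times fs MACs per second, so a 64-tap filter on 48 kHz audio already requires about 3 million MACs per second per channel.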

Getting started

Most manufacturers make it easy to get started with a digital audio project by offering application-specific development kits that include all the necessary hardware, software, and development tools in a single package. One example is Microchip’s DM320013 audio development kit for its PIC32 MX1 and MX2 series (Figure 4). This flexible, USB-powered platform comes pre-loaded with demo code for an audio player with high-quality audio features, including 24-bit audio record and playback, USB Digital Audio, MP3 decode, and sample rate conversion, as well as support for development of basic user interfaces.

Figure 4: The PIC32 MX1/MX2 starter kit (DM320013) is designed for development of high-quality audio applications and basic user interfaces with mTouch buttons. (Courtesy of Microchip Technology.)

Atmel also supplies several application-specific development kits, including the EVK1104 which contains all the elements for Hi-Fi audio decoding and streaming applications. Built around Atmel’s AT32UC3A3256AU 32-bit microcontroller, the board includes a high-speed USB On-The-Go (OTG) interface, a dual SD card interface, ECC NAND flash, and a stereo 16-bit DAC. The kit also contains reference firmware for playing MP3 files from mass-storage devices, and demonstrates Atmel's patented QTouch capacitive-touch control. For docking stations, Atmel’s EVK1105 contains all required reference hardware and software for control and digital audio streaming from iPod, iPhone, and iPad devices.

Summary

There used to be a firm line that divided MCUs and DSPs and their applications, but this has blurred as the instruction sets of some general-purpose processors now sport multiply-accumulate (MAC) instructions and other DSP-like capabilities. MCUs with the ability to perform efficient signal processing make it easy to add voice recognition, audio record/playback, and other innovative audio functions to nearly any application. In addressing the question of how to select an MCU for audio applications, we’ve explored audio sample size and rates as well as processing requirements for popular codecs and then presented MCUs well-suited for audio apps. More information on the parts mentioned can be obtained by using the links provided to access product pages on the DigiKey website.

References and Footnotes

1. Speex: A patent-free audio compression format designed for speech. The Speex Project provides a free alternative to expensive proprietary speech codecs. More information at http://speex.org/
2. AN1422, “High-Quality Audio Applications Using the PIC32,” an application note from Microchip Technology.
