Use a Multicore Processor to Build Low-Power Media-Rich Smart Products

By Stephen Evanczuk

Contributed By Digi-Key's North American Editors

Smart products are increasingly combining traditional real-time embedded design requirements with more demanding application-level software. Along with greater computational load, these products need to support user interaction over advanced media services such as touchscreens, high resolution 3D graphics, and image sensors.

Even as requirements expand, developers face greater pressure to reduce power, cost, and system size, forcing them to find a more effective alternative to traditional multiple subsystem designs.

This article will discuss the evolution of design requirements and how processors have, to date, met these requirements. It will then introduce and show how to use new processors from NXP Semiconductors. These offer developers a simpler solution for meeting growing demand for designs able to handle both real-time and application computing requirements.

Scaling processors to preserve code investment

It has been common practice for developers to scale the computing power of their designs using microprocessors built with additional cores matched to specific application processing requirements. Using this method, developers could get a performance boost while retaining compatibility with their existing code base and experience with that device family.

With the NXP i.MX 6 applications processor series, for example, designers could retain code compatibility while scaling performance using an i.MX 6Solo microprocessor with a single Arm® Cortex®-A9 application processor core, an i.MX 6Dual with two cores, or an i.MX 6Quad integrating four cores.

The same need for scalable application processing is evolving rapidly in designs once considered traditional embedded systems. For these designs, product manufacturers look to add intelligence to home appliances, industrial controllers, medical instruments, and much more. Semiconductor vendors have responded with heterogeneous multicore processing (HMP) devices that integrate both application processor cores and embedded processor cores. For example, the NXP i.MX 8M family combines up to four Arm Cortex-A53 application processor cores with an Arm Cortex-M4 embedded processor core.

With this single device, developers can build smart audio products. These take full advantage of the application processor to enhance, filter, or otherwise manipulate audio data. The audio system then relies on the embedded processor core to perform the real-time functions needed for playback (Figure 1). While real-time operations proceed on the embedded processor, the application processor can be placed in a low power mode to reduce overall power consumption. When the real-time operations are complete, the embedded processor simply sends a wake-up signal to the application processor.

Diagram of heterogeneous multicore processors

Figure 1: Heterogeneous multicore processors are particularly effective for smart product designs, providing both high performance computing and real-time capabilities, without compromising tight power budgets. (Image source: NXP Semiconductors)

Along with its heterogeneous cores, the NXP i.MX 8M family integrates an extensive set of multimedia resources including a 4K video processing unit (VPU) and a high-performance 3D graphics processing unit (GPU). As a result, the family can provide an effective solution for a growing class of video and 3D graphics smart products.

Developers building high performance systems nevertheless face growing demand for lower power consumption in both battery and line-powered systems. To address this, the NXP i.MX 8M Mini processor family is fabricated with an advanced semiconductor process technology that helps reconcile the conflicting requirements of high performance mixed-load processing and reduced power consumption.

High performance, low power computing

As the first NXP HMP family fabricated with a 14 nanometer (nm) FinFET process technology, the NXP i.MX 8M Mini processors are designed for emerging industrial and Internet of Things (IoT) systems that blend requirements for high performance, strong security, and low power. As with the NXP i.MX 8M family, the NXP i.MX 8M Mini integrates an Arm Cortex-M4 for embedded processing with up to four Arm Cortex-A53 cores for application processing, along with a comprehensive security subsystem and multiple connectivity and I/O options (Figure 2).

Diagram of NXP i.MX 8M Mini processor

Figure 2: The NXP i.MX 8M Mini processor combines up to four Arm Cortex-A53 application processor cores and an Arm Cortex-M4 embedded core with a full complement of security, multimedia, system features, and I/O interfaces required in emerging smart products. (Image source: NXP Semiconductors)

To meet its role in embedded designs, the i.MX 8M Mini reduces the number of some I/O channels and the high-resolution capabilities of its multimedia subsystem compared to the previous NXP i.MX 8M. For example, i.MX 8M Mini devices including commercial (MIMX8MM6DVTLZAA) and industrial (MIMX8MM6CVTKZAA) versions offer a one shader GPU and a 1080p 60 Hertz (Hz) VPU compared to the NXP i.MX 8M’s four shader GPU and 4K VPU. Other members of the i.MX 8M Mini family including commercial (MIMX8MM5DVTLZAA) and industrial (MIMX8MM5CVTKZAA) devices eliminate the VPU entirely. As with the earlier NXP i.MX 8M, the i.MX 8M Mini lets developers further balance performance and cost by taking advantage of processor support for multiple memory types, including double data rate 3 low voltage (DDR3L), DDR4, and low power DDR4 (LPDDR4).

Reduced power consumption

To further refine performance and power consumption, developers can take advantage of the NXP i.MX 8M Mini’s ability to switch automatically or programmatically to lower power operating modes. Under certain conditions the application cores can automatically switch to idle mode, turning off the GPU, VPU, and application cores, while clock gating most of the internal logic but retaining power to L2 data cache. In this mode, the Arm Cortex-M4 core can also continue to run to perform more traditional embedded processing tasks such as sensor data acquisition.

Suspend mode is the most power efficient mode, extending the power reduction applied in idle state by also disabling the double data rate controller (DDRC) that manages the memory interfaces. Finally, secure non-volatile storage (SNVS) mode retains power only to SNVS logic and the real-time clock.

The power reduction characteristics of the 14 nm FinFET process technology used in the i.MX 8M Mini become particularly evident in the suspend and SNVS modes. In the i.MX 8M Mini with FinFET, suspend mode consumes about 7.81 milliwatts (mW). In the earlier i.MX 8M, the functionally similar mode (called deep sleep mode) consumes about 197 mW. Similar power reduction applies in SNVS mode, where the i.MX 8M Mini consumes about 0.11 mW, while the earlier i.MX 8M consumes about 17 mW.

With all the functional blocks and finely tuned operating modes in complex devices like the i.MX 8M Mini, engineers face numerous strict implementation requirements. As with other devices in this class, the i.MX 8M Mini relies on multiple power domains to optimize power management and efficiency. Starting and stopping these devices requires strict adherence to the specified power-up and power-down sequences.

Powering up the i.MX 8M Mini starts with applying power, typically 1.8 volts, to the GPIO pre-driver in the SNVS bank. Within 2 milliseconds (ms) (2.4 ms max), 0.8 volts (typical) must be applied to the SNVS core logic. This process continues with specific signals or power being applied to i.MX 8M Mini pins in sequence through 12 additional steps, each timed to occur within intervals ranging from 0.015 ms to 20 ms.

Powering down the device follows the reverse sequence, but with a uniform 10 ms delay between sequential stages. In either case, failure to follow these guidelines can prevent the device from booting, cause excessive current during power-up, or, in the worst case, result in irreversible damage to the device.

Physical design of a pc board for the processor brings its own strict requirements. For laying out the processor’s 486 ball 14 x 14 millimeter (mm) package, NXP recommends using a minimum eight layer pc board stack-up with sufficient layers dedicated to the power rails to remain within the current-resistance (IR) drop guidelines. At the same time, layout engineers must ensure minimal crosstalk on high-speed lines, including those for supported memory devices operating at speeds of 1.5 gigahertz (GHz) and 3,000 megatransfers per second (MT/s).

Rapid development

To help engineers quickly begin application development or speed development of custom hardware designs, NXP provides the NXP i.MX 8M Mini EVK (evaluation kit) and associated reference design (Figure 3). Delivered as a base board and system-on-module (SOM) board, the evaluation kit provides a complete system including external LPDDR4 memory and flash as well as USB and other interfaces.

Diagram of NXP i.MX 8M Mini EVK board set

Figure 3: The NXP i.MX 8M Mini EVK board set provides a complete system platform for immediate evaluation of i.MX 8M Mini processors and for rapid development of heterogeneous multicore processor applications. (Image source: NXP Semiconductors)

Along with Gigabit Ethernet, the kit includes Wi-Fi and Bluetooth for connectivity options. Available add-on boards including the MINISASTOCSI camera module and MX8-DSI-OLED1 organic light emitting diode (OLED) touchscreen fill out the design for video and display applications.

With available pre-built images for Embedded Linux® and Embedded Android® operating environments, developers can immediately use the kit to evaluate the i.MX 8M Mini processor and run sample applications. For developers building their own software applications, NXP offers its free MCUXpresso software development kit (SDK) which provides a full set of software components to build a typical high-performance embedded application (Figure 4).

Diagram of NXP MCUXpresso software development kit (SDK)

Figure 4: The MCUXpresso software development kit (SDK) architecture comprises the full set of software layers including drivers, board support package, and optional components required for developing most embedded applications. (Image source: NXP Semiconductors)

Using the NXP online MCUXpresso SDK Dashboard, developers can build an SDK configuration for the GCC Arm Embedded toolchain or IAR Embedded Workbench for Arm. The dashboard also lets developers add optional middleware components including the Arm Cortex Microcontroller Software Interface Standard (CMSIS) DSP library and Amazon FreeRTOS. The configured SDK distribution comes complete with Arm standard libraries, peripheral drivers, peripheral driver wrappers for FreeRTOS, and an extensive set of software samples. Among the software samples in the SDK distribution, a pair of applications demonstrate a key message exchange design pattern that is fundamental to heterogeneous computing.

In any multicore computing environment, separate processors require efficient mechanisms for passing requests and sharing data. For the i.MX 8M Mini EVK applications, NXP uses its RPMsg-Lite, a lightweight version of the Remote Processor Messaging (RPMsg) protocol. The RPMsg protocol was created as part of the Open Asymmetric Multi Processing (OpenAMP) framework project to provide a standard interface for communication between multiple cores in a heterogeneous multicore system. NXP’s RPMsg-Lite addresses resource limitations in smaller embedded systems, providing a smaller footprint and eliminating features not consistent with these systems.

Within the SDK distribution, one sample application, rpmsg_lite_pingpong_rtos, demonstrates a bare-bones exchange, using RPMsg-Lite to implement a simple ping-pong interaction between different processors (Listing 1). After creating an RPMsg queue (my_queue) and endpoint (my_ept) for the other “remote” processor, the “host” application processor signals the remote core. After receiving a handshake reply from the remote core, the host begins a loop that uses a blocking wait for a “ping” message from the remote before sending its own “pong” response.

void app_task(void *param)
{
    /* Set up this core's RPMsg-Lite instance over the shared memory region. */
    my_rpmsg = rpmsg_lite_remote_init((void *)RPMSG_LITE_SHMEM_BASE, RPMSG_LITE_LINK_ID, RL_NO_FLAGS);
    /* Busy-wait until the link to the other core is up. */
    while (!rpmsg_lite_is_link_up(my_rpmsg))
        ;
    PRINTF("Link is up!\r\n");
    /* Create the receive queue and endpoint, then announce the endpoint. */
    my_queue = rpmsg_queue_create(my_rpmsg);
    my_ept = rpmsg_lite_create_ept(my_rpmsg, LOCAL_EPT_ADDR, rpmsg_queue_rx_cb, my_queue);
    ns_handle = rpmsg_ns_bind(my_rpmsg, app_nameservice_isr_cb, NULL);
    rpmsg_ns_announce(my_rpmsg, my_ept, RPMSG_LITE_NS_ANNOUNCE_STRING, RL_NS_CREATE);
    PRINTF("Nameservice announce sent.\r\n");
    /* Wait Hello handshake message from Remote Core. */
    rpmsg_queue_recv(my_rpmsg, my_queue, (unsigned long *)&remote_addr, helloMsg, sizeof(helloMsg), NULL, RL_BLOCK);
    /* Ping-pong loop: block until a ping arrives, then reply with a pong. */
    while (msg.DATA <= 100)
    {
        PRINTF("Waiting for ping...\r\n");
        rpmsg_queue_recv(my_rpmsg, my_queue, (unsigned long *)&remote_addr, (char *)&msg, sizeof(THE_MESSAGE), NULL,
                         RL_BLOCK);
        PRINTF("Sending pong...\r\n");
        rpmsg_lite_send(my_rpmsg, my_ept, remote_addr, (char *)&msg, sizeof(THE_MESSAGE), RL_BLOCK);
    }
    /* Release the endpoint, queue, and name service binding. */
    PRINTF("Ping pong done, deinitializing...\r\n");
    rpmsg_lite_destroy_ept(my_rpmsg, my_ept);
    my_ept = NULL;
    rpmsg_queue_destroy(my_rpmsg, my_queue);
    my_queue = NULL;
    rpmsg_ns_unbind(my_rpmsg, ns_handle);
    msg.DATA = 0;
}

Listing 1: This snippet from sample code provided in the MCUXpresso software development kit demonstrates the basic design pattern for performing interactions between different processors in a heterogeneous multicore processor. (Code source: NXP Semiconductors)

Developers can easily build on this simple exchange to create complete operations designed to allocate task execution across multiple processors. Another sample application, sai_low_power_audio, uses RPMsg-Lite essentially as a low-level data link layer for a higher level Simplified Real Time Messaging (SRTM) application protocol. In this application, an Arm Cortex-A53 processor uses this SRTM protocol to request the Arm Cortex-M4 processor to play back an audio file located in shared memory. After taking control of the shared buffer, the M4 performs several operations, ultimately executing a smart direct memory access (SDMA) transaction to transfer the data to the appropriate code, and finally to the serial audio interface (SAI) for audio output. During the operation, the A53 core can enter a low-power mode. Although more complex in design than the ping-pong application, the sai_low_power_audio sample application demonstrates how developers can use heterogeneous multicore processors to maximize performance while minimizing power consumption in smart products.

Conclusion

Smart products are combining traditional real-time embedded processing systems with substantial application processing capability. Yet, developers need to meet these processing requirements while satisfying continued expectations for lower power products in battery and line-powered systems alike.

Fabricated with an advanced semiconductor process, the NXP i.MX 8M Mini applications processor provides the required mix of low power consumption and high performance heterogeneous multicore processing capability. Using NXP i.MX 8M Mini devices, developers can respond more effectively to emerging requirements for high performance computing in embedded systems designs needed in increasingly sophisticated smart products.

About this author

Stephen Evanczuk

Stephen Evanczuk has more than 20 years of experience writing for and about the electronics industry on a wide range of topics including hardware, software, systems, and applications including the IoT. He received his Ph.D. in neuroscience on neuronal networks and worked in the aerospace industry on massively distributed secure systems and algorithm acceleration methods. Currently, when he's not writing articles on technology and engineering, he's working on applications of deep learning to recognition and recommendation systems.
