Targets: Automotive, Communication & Wired, Computers & Peripherals

ARM Cortex-R4(F) Block Diagram

The ARM Cortex-R4(F) processor is a mid-range synthesizable core for deeply embedded applications, including automotive systems, baseband controllers, imaging, mass storage (HDD) and microcontrollers. Building on the ARM9E foundation, the Cortex-R4 delivers higher performance than the ARM946E-S processor and is 50 percent more efficient than it when running Thumb code. Depending on the configuration, a Cortex-R4 running at 200 MHz can be smaller and consume less power than the ARM946E-S. At reduced clock frequencies, the Cortex-R4 gate count can be as low as 180k gates.

Based on the ARMv7 instruction set architecture, the Cortex-R4 processor uses Thumb-2 technology for higher performance and better code density. Thumb-2 is a blended instruction set. It contains all the 16-bit opcodes of the original Thumb instruction set, plus a large range of 32-bit instructions that provide almost the full functionality of the ARM instruction set. As a result, 16- and 32-bit instructions can be mixed on an instruction-by-instruction basis, and a compiler can choose the mix of instruction sizes that best balances speed and code size.
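To make the mixing of 16- and 32-bit instructions concrete, the following C sketch classifies Thumb-2 code by instruction size. It relies on the ARMv7 encoding rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction, while any other halfword is a complete 16-bit instruction. The function names are hypothetical, and the halfwords are assumed to be already loaded in instruction order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the size in bytes (2 or 4) of the Thumb-2 instruction whose
 * first halfword is `hw`: top five bits 0b11101, 0b11110 or 0b11111
 * start a 32-bit instruction; anything else is a 16-bit instruction. */
unsigned thumb2_insn_size(uint16_t hw)
{
    unsigned top5 = (unsigned)(hw >> 11);
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Counts the instructions in a buffer of halfwords, showing that 16- and
 * 32-bit encodings can be interleaved freely. Stops early if a 32-bit
 * instruction would run past the end of the buffer. */
size_t count_thumb2_insns(const uint16_t *code, size_t n_halfwords)
{
    assert(code != NULL || n_halfwords == 0);
    size_t i = 0, count = 0;
    while (i < n_halfwords) {
        size_t step = thumb2_insn_size(code[i]) / 2u; /* in halfwords */
        if (i + step > n_halfwords)
            break; /* truncated 32-bit instruction at the end */
        i += step;
        count++;
    }
    return count;
}
```

For example, the halfword sequence 0xB500, 0xF000, 0xF800, 0x4770 (a 16-bit PUSH {lr}, a 32-bit BL and a 16-bit BX lr) holds three instructions in four halfwords.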

The ARMv7 architecture improves exception and interrupt handling, including better support for non-maskable interrupts (NMI). A selective superscalar eight-stage pipeline, which can issue certain pairs of instructions in the same cycle, delivers more than 1.6 DMIPS/MHz in a low-gate-count implementation.
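As a worked example, the quoted efficiency can be scaled by the 200 MHz clock frequency mentioned earlier to give a lower bound on Dhrystone throughput. This assumes the DMIPS/MHz ratio holds at that frequency, for example with code and data served from caches or tightly coupled memory without added wait states:

```latex
\[
  1.6\,\frac{\text{DMIPS}}{\text{MHz}} \times 200\,\text{MHz} = 320\,\text{DMIPS}
\]
```

Under that assumption, a 200 MHz Cortex-R4 would exceed 320 DMIPS.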

The Cortex-R4 processor supports separate instruction and data caches. Each cache is physically addressed and 4-way set associative, with a line length of 8 words (32 bytes). The two cache sizes can be set independently, from 4 KB to 64 KB, and each cache can supply 64 bits per cycle to the processor. The Cortex-R4 processor also provides flexible support for tightly coupled memories. Up to three 64-bit memory ports are available, each with independent wait and error signals for connecting RAM, ROM, e-DRAM and error correction logic. The caches can be disabled independently of the tightly coupled memories.
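The cache geometry above determines how a physical address is split into a tag, a set index and a byte offset. The C sketch below computes that split for the 4-way, 32-byte-line organization described here. It assumes power-of-two cache sizes (4, 8, 16, 32 or 64 KB), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from the text: 4-way set associative, 8-word (32-byte) lines. */
#define CACHE_WAYS       4u
#define CACHE_LINE_BYTES 32u /* 8 words x 4 bytes */

typedef struct {
    uint32_t tag;
    uint32_t set;    /* set index */
    uint32_t offset; /* byte offset within the 32-byte line */
} cache_addr_fields;

/* Number of sets in a cache of `cache_bytes` total capacity. */
uint32_t cache_num_sets(uint32_t cache_bytes)
{
    assert(cache_bytes >= 4u * 1024u && cache_bytes <= 64u * 1024u);
    return cache_bytes / (CACHE_WAYS * CACHE_LINE_BYTES);
}

/* Splits a physical address into tag, set index and line offset.
 * Assumes a power-of-two cache size so the fields are whole bit ranges. */
cache_addr_fields cache_split_address(uint32_t paddr, uint32_t cache_bytes)
{
    uint32_t sets = cache_num_sets(cache_bytes);
    cache_addr_fields f;
    f.offset = paddr % CACHE_LINE_BYTES;
    f.set    = (paddr / CACHE_LINE_BYTES) % sets;
    f.tag    = paddr / (CACHE_LINE_BYTES * sets);
    return f;
}
```

For a 16 KB cache this gives 16384 / (4 × 32) = 128 sets, so an address uses 5 offset bits, 7 set-index bits and the remaining 20 bits as the tag. Address 0x00012345 then maps to set 26, offset 5 and tag 0x12.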

The Cortex-R4F processor's FPU is IEEE 754 compatible and backward compatible with earlier ARM FPUs (VFP9/10/11). The implementation is optimized for the single-precision processing most commonly used in automotive and control applications, without sacrificing double-precision support. The FPU is particularly useful in sophisticated control applications, where algorithms are often modeled in an environment such as Simulink or ASCET-SD and the code is auto-generated using tools such as Real-Time Workshop Embedded Coder, ASCET-SE or dSPACE TargetLink.
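Control algorithms of this kind often reduce to small floating-point update steps evaluated at a fixed sample rate. The sketch below is a hypothetical PI controller step that keeps every operation in single precision, which is the case the Cortex-R4F FPU is optimized for. The `pi_ctrl` type and `pi_step` function are illustrative and are not output from any of the tools named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PI controller state; all fields are single precision. */
typedef struct {
    float kp;       /* proportional gain */
    float ki;       /* integral gain */
    float integral; /* accumulated integral term */
    float out_min;  /* actuator lower limit */
    float out_max;  /* actuator upper limit */
} pi_ctrl;

/* One controller step with sample period dt (seconds). Every operand is
 * a float and every literal carries the 'f' suffix, so no expression is
 * promoted to double precision. */
float pi_step(pi_ctrl *c, float setpoint, float measured, float dt)
{
    assert(c != NULL && dt > 0.0f);
    float error = setpoint - measured;
    c->integral += c->ki * error * dt;
    /* Clamp the integrator to the output range (simple anti-windup). */
    if (c->integral > c->out_max) c->integral = c->out_max;
    if (c->integral < c->out_min) c->integral = c->out_min;
    float out = c->kp * error + c->integral;
    if (out > c->out_max) out = c->out_max;
    if (out < c->out_min) out = c->out_min;
    return out;
}
```

In C, an unsuffixed literal such as `0.5` has type `double`. Writing `0.5 * error` instead of `0.5f * error` would therefore silently move that computation to double precision.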

The Cortex-R4 processor includes an optional memory protection unit (MPU) that can be configured with 8 or 12 regions, combining flexibility with area efficiency. If the MPU is omitted entirely, the processor uses a fixed mapping of protection attributes. The minimum size of an MPU region is 32 bytes.
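Region sizes and base addresses are constrained. The sketch below assumes the ARMv7 protected memory system architecture (PMSA) encoding. Under that encoding, a region of 2^(N+1) bytes is described by a size field N. The 32-byte minimum corresponds to N = 4, and the region base must be aligned to the region size. The helper names are hypothetical, and writing the actual MPU registers is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPU_MIN_REGION_BYTES 32u /* smallest region, size field N = 4 */

/* Returns the size field N (size = 2^(N+1) bytes) for a power-of-two
 * size of at least 32 bytes, or -1 if the size cannot be encoded.
 * The largest size representable in uint32_t here is 2 GB (N = 30). */
int mpu_size_field(uint32_t size_bytes)
{
    if (size_bytes < MPU_MIN_REGION_BYTES || (size_bytes & (size_bytes - 1u)) != 0u)
        return -1;
    int shift = 0;
    while ((1u << shift) != size_bytes) {
        shift++;
        assert(shift < 32); /* cannot fail for a nonzero power of two */
    }
    return shift - 1;
}

/* True if `base` is a legal start address for a region of `size_bytes`:
 * the size must be encodable and the base aligned to the size. */
bool mpu_region_valid(uint32_t base, uint32_t size_bytes)
{
    return mpu_size_field(size_bytes) >= 0 && (base & (size_bytes - 1u)) == 0u;
}
```

For instance, a 4 KB region encodes as N = 11 and must start on a 4 KB boundary. Address 0x20000010 is not a valid base for a 32-byte region because it is only 16-byte aligned.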

The Cortex-R4 processor uses the AMBA 3 AXI protocol for more efficient on-chip interconnect. It integrates a 64-bit master port as well as a 64-bit DMA port for direct access to the tightly coupled memories. Compared with ARM9E family processors, the flexible tightly coupled memory interface with DMA support more than doubles the time available for RAM accesses. Error detection on RAMs is supported for improved reliability, and the enhanced MPU with 8 or 12 regions provides finer granularity for stack checking. Configuration options for the caches, tightly coupled memories, MPU and debug logic can save up to 40k gates.
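Finer MPU granularity makes small stack guards practical. The sketch below assumes a full-descending stack, as specified by the ARM procedure call standard, and the 32-byte minimum region size. It reserves the lowest 32-byte-aligned block of the stack area as a guard that would be mapped as no-access, so an overflow faults instead of silently corrupting adjacent memory. The `stack_guard` type and `stack_guard_place` function are hypothetical, and programming the MPU itself is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_MIN_REGION_BYTES 32u

typedef struct {
    uint32_t guard_base; /* start of the no-access guard region */
    uint32_t guard_size; /* the 32-byte minimum region size */
    uint32_t usable_low; /* lowest address the stack may legally use */
} stack_guard;

/* Places a minimum-size guard at the bottom of a stack occupying
 * [stack_low, stack_high); the stack grows downward from stack_high. */
stack_guard stack_guard_place(uint32_t stack_low, uint32_t stack_high)
{
    /* Round up to the next 32-byte boundary, as region bases must be
     * aligned to the region size. */
    uint32_t base = (stack_low + MPU_MIN_REGION_BYTES - 1u) & ~(MPU_MIN_REGION_BYTES - 1u);
    assert(base + MPU_MIN_REGION_BYTES < stack_high); /* room left for the stack */
    stack_guard g = { base, MPU_MIN_REGION_BYTES, base + MPU_MIN_REGION_BYTES };
    return g;
}
```

A 32-byte guard only catches overflows that actually touch it. A function that allocates a large frame can step over it, so a larger guard region may be preferable when such frames are possible.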

The prefetch unit and branch prediction deliver more performance at the same clock frequency, with a branch prediction accuracy of more than 90% for typical C code. A synthesis-time option can generate a redundant copy of the processor logic that enables error detection in safety-critical systems such as ABS (anti-lock braking systems). This specialized feature allows two processors plus the associated checking logic to be implemented.
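This redundant-logic option is commonly described as dual-core lockstep. Two copies execute the same operations on the same inputs, and checking logic compares their outputs. The C sketch below models only that checking idea in software. The `core_state` type, the `core_step` function and the fault-injection parameter are hypothetical stand-ins, not the hardware interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified processor state. */
typedef struct {
    uint32_t acc;
    uint32_t pc;
} core_state;

/* Stand-in for one step of execution; both copies run identical logic. */
static void core_step(core_state *s, uint32_t input)
{
    s->acc = s->acc * 31u + input;
    s->pc += 4u;
}

/* Runs a main copy and a redundant copy in lockstep over `n` inputs and
 * compares their state after every step. `flip_at` injects a single-bit
 * fault into the redundant copy at that step (pass n or more to disable).
 * Returns the step at which a mismatch is detected, or -1 if none. */
long run_lockstep(const uint32_t *inputs, long n, long flip_at)
{
    assert(inputs != NULL || n == 0);
    core_state main_core = {0u, 0u};
    core_state shadow = {0u, 0u};
    for (long i = 0; i < n; i++) {
        core_step(&main_core, inputs[i]);
        core_step(&shadow, inputs[i]);
        if (i == flip_at)
            shadow.acc ^= 1u; /* simulated transient upset */
        if (main_core.acc != shadow.acc || main_core.pc != shadow.pc)
            return i; /* the checker flags the divergence */
    }
    return -1;
}
```

Without an injected fault the two copies never diverge. A single flipped bit is reported at the step where it occurs, which is the property the hardware checking logic provides.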