Multi-Core, Multi-Thread Superscalar Communication Processor
Overview
The NetLogic Microsystems XLP™ processor family uses a highly scalable design that incorporates key functions of a high-end communication system, including wired and wireless security, networking, storage, data center acceleration, load balancing, and other acceleration engines. The XLP processor is a third-generation architectural enhancement to NetLogic Microsystems’ industry-leading multi-core, multi-threaded XLR® processor family. The XLP processor family is designed using 40-nm technology and offers processor core frequencies from 500 MHz to 2 GHz, providing a greater than 3X performance per watt improvement over its XLR predecessor. The XLP processor family is software backward-compatible with the XLR and XLS® processor families.
XLP832 Processor EC4400 Cores
The XLP832 processor has eight EC4400 processor cores that provide optimum performance for both data plane and control plane applications. Each core includes field-proven multi-threading capabilities to provide the highest possible performance for throughput-oriented data plane processing. Each EC4400 core features a superscalar engine with out-of-order execution capabilities, and combines quad-issue instruction scheduling with simultaneous 4-way multi-threading. These features enable new classes of systems with uncompromised performance in a single-chip solution. Each of the four VirtuCore™ virtual threads embedded in an EC4400 core appears to software as a completely separate processing element, enabling extremely flexible software architectures that simultaneously simplify software development and increase overall system performance. Each EC4400 core is MIPS64 Release-II ISA-compliant and contains an IEEE754 and MIPS-compliant floating point unit. By combining architectural improvements and frequency enhancements, the EC4400 enables the XLP processor to deliver more than a 3X improvement in performance per watt over NetLogic Microsystems’ performance-leading XLR processor family.
Processor Cache Architecture
The XLP832 processor contains a MOESI+ coherent, three-level cache architecture. Each of the eight EC4400 cores contains a dedicated 64-KByte instruction cache, a 32-KByte L1 data cache, and a 512-KByte 8-way set-associative L2 cache. The cores also share access to an 8-bank, 16-way set-associative 8-MByte L3 cache, providing a total of more than 12 MBytes of cached data on the XLP832 processor.
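The "more than 12 MBytes" figure can be checked by adding up the per-core and shared caches listed above. The short sketch below does that arithmetic; it assumes binary units (1 KByte = 1024 bytes), and the constant and function names are illustrative only.

```python
# Sanity check of the total on-chip cache, using the sizes quoted in the text.
# Assumption: KByte and MByte are binary units (1024-based).
KIB = 1024
MIB = 1024 * KIB

CORES = 8
L1I_PER_CORE = 64 * KIB    # dedicated instruction cache
L1D_PER_CORE = 32 * KIB    # L1 data cache
L2_PER_CORE = 512 * KIB    # private L2 cache
L3_SHARED = 8 * MIB        # L3 cache shared by all cores


def total_cache_bytes():
    """Return the sum of all private caches plus the shared L3."""
    per_core = L1I_PER_CORE + L1D_PER_CORE + L2_PER_CORE
    return CORES * per_core + L3_SHARED


print(f"{total_cache_bytes() / MIB:.2f} MBytes")  # prints "12.75 MBytes"
```

The eight cores contribute 4.75 MBytes of private cache, and the shared L3 adds 8 MBytes.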
Memory Subsystem
The XLP832 processor’s high-performance memory subsystem contains four on-chip DDR3 memory controllers with 51.2 Gigabytes/sec of bandwidth.
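One way to arrive at the 51.2 Gigabytes/sec figure is sketched below. The text does not state the DDR3 speed grade or the channel width, so the 1600 MT/s data rate and 64-bit data bus used here are assumptions chosen because they reproduce the quoted number; the function name is hypothetical.

```python
# Peak theoretical DDR bandwidth: controllers x transfers/s x bytes per transfer.
# ASSUMPTION (not stated in the text): each controller drives one 64-bit
# channel at 1600 mega-transfers per second (DDR3-1600).

def ddr_peak_gbytes_per_s(controllers, mega_transfers_per_s, bus_width_bits):
    """Return the aggregate peak bandwidth in decimal gigabytes per second."""
    bytes_per_transfer = bus_width_bits // 8
    return controllers * mega_transfers_per_s * bytes_per_transfer / 1000


peak = ddr_peak_gbytes_per_s(controllers=4,
                             mega_transfers_per_s=1600,
                             bus_width_bits=64)
print(f"{peak:.1f} GB/s")  # prints "51.2 GB/s"
```

Under these assumptions each controller contributes 12.8 GB/s of peak bandwidth; sustained bandwidth in a real system would be lower.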
Fast Messaging Network™
A low-latency, high-speed Fast Messaging Network™ (FMN) system allows for non-intrusive internal communication and control messaging among VirtuCore threads, acceleration engines, and I/O. The FMN enables inter-unit communication without the need for spin-locks or semaphores. By passing control descriptors, the FMN also permits lockless simultaneous access to peripheral devices, which dramatically simplifies and increases the performance of the associated device drivers.
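The idea of reaching a peripheral through passed descriptors rather than through a shared lock can be illustrated with a small software analogy. The sketch below is not the FMN programming interface: the mailbox, the descriptor tuples, and all function names are hypothetical, and Python's `queue.SimpleQueue` performs its own internal synchronization, whereas the FMN does the delivery in hardware.

```python
# Software analogy only: several producer threads hand descriptors to one
# owner of a device through a mailbox, so application code needs no lock
# around the device state. All names here are hypothetical.
import queue
import threading


def run_demo(num_workers=4, msgs_per_worker=3):
    mailbox = queue.SimpleQueue()   # stands in for a message destination
    device_log = []                 # state touched only by the owner thread
    stop = object()                 # sentinel that ends the owner loop

    def device_owner():
        # The single consumer serializes all access to the device state.
        while True:
            descriptor = mailbox.get()
            if descriptor is stop:
                break
            device_log.append(descriptor)

    def worker(worker_id):
        # Producers never touch device_log; they only post descriptors.
        for seq in range(msgs_per_worker):
            mailbox.put((worker_id, seq))

    owner = threading.Thread(target=device_owner)
    owner.start()
    workers = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    mailbox.put(stop)               # all descriptors are queued before this
    owner.join()
    return device_log


log = run_demo()
print(len(log), "descriptors handled by the device owner")
```

The design point being illustrated is ownership: because only one agent ever touches the device, the producers need no spin-lock or semaphore of their own.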
Acceleration Engines
The XLP832 processor contains numerous Autonomous Acceleration Engine® modules that offload processing tasks from the EC4400 cores, thus freeing up the cores to perform other compute-intensive application-dependent tasks:
- A robust Autonomous Network Acceleration Engine® module supports up to 40 Gbps of packet throughput. Features include a programmable packet parsing engine; FCoE, iSCSI, and SCTP checksum/CRC generation and verification; TCP/UDP/IP checksum on both ingress and egress; TCP segmentation offload; and IEEE1588v2 Precision Time Protocol support.
- A Packet Ordering Engine (POE) supports packet ordering for up to 64K flows. The POE can handle up to 60 million packets per second, which corresponds to 40 Gbps with 64-byte packets.
- An Autonomous Security Acceleration Engine® module provides 40 Gbps of bandwidth.
- A compression/decompression engine operates at 10 Gbps.
- An 8-channel DMA and Storage Acceleration Engine provides RAID-5 XOR acceleration, RAID-6 P+Q Galois computations, and hardware assistance for de-duplication.
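The Packet Ordering Engine figure of 60 million packets per second at 40 Gbps with 64-byte packets can be reproduced with a line-rate calculation. The sketch below assumes Ethernet framing, where each frame also occupies 20 bytes on the wire for the preamble, start-of-frame delimiter, and inter-frame gap; the text does not spell out this overhead, and the function name is illustrative.

```python
# Packets per second at line rate for a given frame size.
# ASSUMPTION: Ethernet framing, with 20 bytes of per-frame wire overhead
# (7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap).
ETHERNET_WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps, frame_bytes,
                  overhead_bytes=ETHERNET_WIRE_OVERHEAD_BYTES):
    """Return the maximum packet rate for back-to-back frames on the link."""
    bits_per_frame_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_frame_on_wire


pps = line_rate_pps(link_bps=40e9, frame_bytes=64)
print(f"{pps / 1e6:.1f} Mpps")  # prints "59.5 Mpps"
```

With the overhead included, 40 Gbps of minimum-size frames is roughly 59.5 million packets per second, which matches the rounded 60 million figure.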
Interchip Cache Coherency
Three Interchip Coherency Interfaces seamlessly interconnect up to four XLP832 processors. Each interface has 80 Gbps of full-duplex bandwidth. These interfaces are fully software transparent. Hardware manages the chip-to-chip coherency, message passing between threads, and the sharing of memory and I/O resources.
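Three interfaces per chip is exactly what a four-chip full mesh needs, since each processor then has a direct link to each of the other three. The sketch below enumerates such a topology. The text does not explicitly state that the chips are wired as a full mesh, so that is an assumption, and the function names are hypothetical.

```python
# Enumerate point-to-point links in a full mesh of N chips.
# ASSUMPTION: the four processors are connected as a full mesh, which is the
# topology that uses all three interfaces on every chip.
from itertools import combinations


def full_mesh_links(chips):
    """Return every unordered pair of chip indices that shares a link."""
    return list(combinations(range(chips), 2))


def links_per_chip(chips):
    """Return how many links terminate on each chip in the full mesh."""
    counts = {chip: 0 for chip in range(chips)}
    for a, b in full_mesh_links(chips):
        counts[a] += 1
        counts[b] += 1
    return counts


print(full_mesh_links(4))   # six chip-to-chip links in total
print(links_per_chip(4))    # each chip terminates three of them
```

In this arrangement any chip reaches any other in a single hop, so no coherency traffic has to be forwarded through an intermediate processor.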
Product Specification
Next Generation Processor Cores
Cache Subsystem
High Performance Memory Controller
Cache Coherent Scalability
High Speed Distributed Interconnects
Autonomous Network Acceleration Engine® Module
Hardware Packet Ordering Engine
Block Diagram
Application Examples

