Intel’s Haswell CPU Microarchitecture


November 13, 2012 by David Kanter


Over the last 5 years, high performance microprocessors have changed dramatically. One of the most significant influences is the increasing level of integration that is enabled by Moore’s Law. In the context of semiconductors, integration is an ever-present fact of life, reducing system power consumption and cost and increasing performance. The latest incarnation of this trend is the System-on-a-Chip (SoC) philosophy and design approach. SoCs have been the preferred solution for extremely low power systems, such as 1W mobile phone chips. However, high performance microprocessors span a much wider design space, from 15W notebook chips to 150W server sockets, and the adoption of SoCs has been slower because of the more diverse market.

Sandy Bridge was the dawn of a new era for Intel, and the first high-end x86 microprocessor that could truly be described as an SoC, integrating the CPU, GPU, Last Level Cache and system I/O. However, Sandy Bridge largely targets conventional PC markets, such as notebooks, desktops, workstations and servers, with a smattering of embedded applications. The competition for Sandy Bridge is largely AMD’s Bulldozer family, which has suffered from poor performance in the first few iterations.

The 32nm Sandy Bridge CPU introduced AVX, a new instruction extension for floating point (FP) workloads, and fundamentally changed almost every aspect of the pipeline, from instruction fetching to memory accesses. The system architecture was radically revamped, with a coherent ring interconnect for on-chip communication, a higher bandwidth Last Level Cache (LLC), integrated graphics and I/O, and comprehensive power management. The Sandy Bridge GPU was also a new architecture that delivered acceptable performance for the first time. The server-oriented Sandy Bridge-EP started with the same building blocks, but eliminated the graphics, while adding more cores, more memory controllers, more PCI-E 3.0 I/O and coherent QPI links.

Haswell is the first family of SoCs that has been tailored to take advantage of Intel’s 22nm FinFET process technology. While Ivy Bridge is also 22nm, Intel’s circuit design team sacrificed power and performance in favor of a swift migration to a process with a radically new transistor architecture.

The Haswell family features a new CPU core, new graphics and substantial changes to the platform in terms of memory, power delivery and power management. All of these areas are significant from a technical and economic perspective and interact in various ways. However, the Haswell family represents a menu of options that are available for SoCs tailored to certain markets. Not every product requires graphics (e.g. servers), nor is a new power architecture desirable for cost-optimized products (e.g. desktops). Architects will pick and choose from the menu of options, based on a variety of technical and business factors.

The heart of the Haswell family is the eponymous CPU. The Haswell CPU core pushes beyond the PC market into new areas, such as the high-end of the emerging tablet market. Haswell SoCs are aimed at 10W, potentially with further power reductions in the future. The 22nm node enables this wider range, but Haswell’s design and architecture play critical roles in fully exploiting the benefits of the new process technology.

The Haswell CPU boasts a huge number of architectural enhancements, with four extensions that touch every aspect of the x86 instruction set architecture (ISA). AVX2 brings integer SIMD to 256-bit vectors, and adds gather instructions for sparse memory accesses. The fused multiply-add (FMA) extensions improve performance for floating point (FP) workloads, such as scientific computing, and synergize nicely with the new gather instructions. A small number of bit manipulation instructions aid cryptography, networking and certain search operations. Last, Intel has introduced TSX (Transactional Synchronization Extensions), a form of transactional memory and an incredibly powerful programming model for concurrency and multi-threaded programming. TSX improves performance and efficiency of software by better utilizing the underlying multi-core hardware.
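
To make the pairing of gather and fused multiply-add more concrete, here is a minimal sketch of a sparse dot product, written as a scalar Python model and not as real AVX2 code. The helper names (`gather`, `fma_accumulate`, `sparse_dot`) and the four-lane width (four doubles in a 256-bit vector) are assumptions chosen for illustration; actual hardware would do each step in a single vector instruction, and a true FMA rounds once where this model rounds twice.

```python
def gather(base, indices):
    """Model of a gather: load base[i] for every index i into one vector."""
    return [base[i] for i in indices]


def fma_accumulate(acc, a, b):
    """Lane-wise acc + a * b. A hardware FMA rounds once; this model rounds twice."""
    return [c + x * y for c, x, y in zip(acc, a, b)]


def sparse_dot(values, col_indices, dense, width=4):
    """Dot product of a sparse row (values, col_indices) with a dense vector.

    Processes `width` lanes per step, mimicking four doubles in a 256-bit vector.
    Assumes `dense` is non-empty so padded lanes can safely read index 0.
    """
    acc = [0.0] * width
    for start in range(0, len(values), width):
        vals = list(values[start:start + width])
        idx = list(col_indices[start:start + width])
        # Pad the final partial vector: a zero value makes the extra lanes no-ops.
        pad = width - len(vals)
        vals += [0.0] * pad
        idx += [0] * pad
        # One gather plus one multiply-add per vector of non-zero entries.
        acc = fma_accumulate(acc, vals, gather(dense, idx))
    # Horizontal reduction of the per-lane partial sums.
    return sum(acc)


values = [2.0, 3.0, 4.0, 1.0, 5.0]      # non-zero entries of the sparse row
col_indices = [0, 3, 5, 6, 1]           # their column positions
dense = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
result = sparse_dot(values, col_indices, dense)
```

The inner loop is the pattern that benefits from the two extensions together: the gather fetches the scattered elements of the dense vector, and the multiply-add folds them into the running sums without a separate multiply and add.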

Intel’s design philosophy emphasizes superb single core performance with low power. The new Haswell core achieves even higher performance than Sandy Bridge. The improvements in Haswell are concentrated in the out-of-order scheduling, execution units and especially the memory hierarchy. It is a testament to the excellent front-end in Sandy Bridge that relatively few changes were necessary. The Haswell microarchitecture is a dual-threaded, out-of-order microprocessor that is capable of decoding 5 instructions, issuing 4 fused uops and dispatching 8 uops each cycle. The Haswell core is the basis of Intel’s upcoming generation of SoCs and will be used from tablets to servers, competing with AMD and a variety of ARM-based SoC vendors.


Summary
The article discusses the evolution of high-performance microprocessors, particularly focusing on Intel's Haswell family of System-on-a-Chip (SoC) designs. Over the past five years, integration in microprocessors has increased due to Moore's Law, leading to reduced power consumption and costs while enhancing performance. The Sandy Bridge architecture marked a significant advancement for Intel, integrating CPU, GPU, and other components, primarily targeting conventional PC markets. Haswell, built on Intel's 22nm FinFET technology, represents a new generation of SoCs designed for various markets, including tablets and servers. It features a new CPU core with architectural enhancements, such as AVX2 for improved integer SIMD operations and TSX for better multi-threaded programming efficiency. Haswell aims for a power envelope of around 10W, with a focus on superb single-core performance and low power consumption. The design improvements in Haswell, particularly in scheduling and memory hierarchy, position it as a competitive option against AMD and ARM-based SoCs, catering to a diverse range of applications from mobile devices to servers.