Larrabee is a many-core processor design built from Pentium-class cores, upgraded to 64-bit x86-64 and given a new set of SIMD registers with ~250 new instructions. Sorta like SSE on steroids!
“LRBni adds two sorts of registers to the x86 architectural state. There are 32 new 512-bit vector registers, v0-v31, and 8 new 16-bit vector mask registers, k0-k7. While some core resources such as caches are shared by the core threads, that is not the case for registers; each thread has a full complement of vector and vector mask registers.”
“LRBni vector instructions are either 16-wide or 8-wide, so a vector register can be operated on by a single LRBni instruction as 16 float32s, 16 int32s, 8 float64s, or 8 int64s, with all elements operated on in parallel. LRBni vector instructions are also ternary; that is, they involve three vector registers, of which typically two are inputs and the third the output. This eliminates the need for most move instructions; such instructions are not a significant burden on out-of-order cores, which can schedule them in parallel with other work, but they would slow Larrabee’s in-order pipeline considerably.”
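To make the quoted description a bit more concrete, here is a rough scalar sketch in C++ of what one 16-wide, three-operand instruction does. This is not real LRBni code: `Vec16f`, `Mask16` and `vmul_masked` are hypothetical names made up for illustration, and the way the mask is applied (lanes with a clear bit keep their old destination value) is an assumption about write-mask behaviour, not something the quotes above spell out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for LRBni state; these are not real intrinsics.
using Vec16f = std::array<float, 16>;  // one 512-bit register viewed as 16 float32s
using Mask16 = std::uint16_t;          // one 16-bit mask register, one bit per lane

static_assert(sizeof(Vec16f) * 8 == 512, "16 float32 lanes fill 512 bits");

// Ternary form: two inputs (a, b) and a separate output (dst), so neither
// input has to be copied first to survive the operation.
// Assumed write-mask behaviour: a lane is written only if its mask bit is set;
// lanes with a clear bit keep whatever dst already held.
void vmul_masked(Vec16f& dst, const Vec16f& a, const Vec16f& b, Mask16 k) {
    for (std::size_t lane = 0; lane < dst.size(); ++lane) {
        if ((k >> lane) & 1u) {
            dst[lane] = a[lane] * b[lane];
        }
    }
}
```

On the real hardware the loop would be a single instruction working on all 16 lanes at once; the loop here only spells out the per-lane effect. Calling `vmul_masked(dst, a, b, 0x00FF)` would update the low 8 lanes and leave the high 8 untouched.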
Some excellent must-read papers.
GDC Session Overview: SIMD Programming on Larrabee
This video is a brief overview of Tom Forsyth’s session from GDC about Larrabee. Larrabee is Intel’s revolutionary approach to take the current evolving programmability of the GPGPU to its logical end. The Larrabee architecture features many cores and threads, as well as a new vector instruction-set extension, the Larrabee new instructions (LRBni). This talk follows Michael Abrash’s first glimpse into LRBni and examines the programming methods and hardware instructions that help programmers get the most out of LRBni’s extremely wide vector units. Starting with simple math examples that are fairly simple to vectorize, it moves through loops, conditionals, and more complex flow control, showing how to implement these algorithms in LRBni. Next, the numerous choices of data format are examined – when to use SOA or AOS (and what those terms mean!), and how to use gather/scatter most efficiently from the same data structures used in an existing engine. Finally, there is a quick look at efficient code scheduling and how to use the multiple hardware threads to help absorb instruction latencies.
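For anyone wondering what SOA and AOS mean in the session description above, here is a small C++ sketch of the two layouts. It is only an illustration under my own assumptions, not code from the talk: `PointAoS`, `PointsSoA`, `to_soa` and `length_squared` are hypothetical names, and plain loops stand in for the vector and gather instructions.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 16;  // matches the 16-wide float32 case

// AOS (array of structures): x, y, z of one point sit next to each other.
struct PointAoS {
    float x, y, z;
};
using PointsAoS = std::array<PointAoS, kLanes>;

// SOA (structure of arrays): all x values are contiguous, then all y, then all z.
struct PointsSoA {
    std::array<float, kLanes> x, y, z;
};

// Convert AOS to SOA. Each field is read with a stride of one whole struct,
// which is the kind of access pattern a gather is meant to serve.
PointsSoA to_soa(const PointsAoS& in) {
    PointsSoA out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out.x[i] = in[i].x;
        out.y[i] = in[i].y;
        out.z[i] = in[i].z;
    }
    return out;
}

// Squared length of every point. With SOA, each of x, y and z is already a
// contiguous run of 16 floats, so the loop body maps onto 16-wide operations.
std::array<float, kLanes> length_squared(const PointsSoA& p) {
    std::array<float, kLanes> r{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        r[i] = p.x[i] * p.x[i] + p.y[i] * p.y[i] + p.z[i] * p.z[i];
    }
    return r;
}
```

The rough idea is that an existing engine probably stores its data as AOS, while wide vector units prefer SOA, which is presumably why the talk spends time on gather/scatter as a bridge between the two.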
Thoughts On How Larrabee Will Change Game Development
As an Intel Application Engineer, Doug McNabb enables game developers to develop for Larrabee. Doug describes his background as a game developer and his thoughts on how Larrabee will change game development.