SSE4


SSE4 is an ambiguous name for a set of almost disjoint x86 instruction set extensions from Intel and AMD: SSE4.1 and SSE4.2, both by Intel, and SSE4a by AMD.

SSE4.1

In 2007, Intel introduced SSE4.1, comprising 47 new instructions, with the Penryn-based Core 2 processors of the Core microarchitecture.

Mnemonic   Description                  C-Intrinsic
pcmpeqq    packed compare equal qword   __m128i _mm_cmpeq_epi64(__m128i a, __m128i b)

See Vulnerable on distant Checks with SSE4.
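As an illustration of pcmpeqq, the following minimal sketch compares two pairs of 64-bit words, for instance two pairs of bitboards, lane by lane with _mm_cmpeq_epi64 and condenses the result into a 2-bit mask. It assumes GCC or Clang on x86-64; the function name cmpeq_pairs is hypothetical, and the __attribute__((target("sse4.1"))) annotation is a GCC/Clang extension that enables SSE4.1 for this one function without a global compiler flag. The CPU must support SSE4.1 at run time.

```c
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1: _mm_cmpeq_epi64 */

/* Hypothetical helper: compares (a_lo, a_hi) with (b_lo, b_hi) using pcmpeqq.
   Returns a 2-bit mask: bit 0 set if a_lo == b_lo, bit 1 set if a_hi == b_hi. */
__attribute__((target("sse4.1")))
static int cmpeq_pairs(uint64_t a_lo, uint64_t a_hi, uint64_t b_lo, uint64_t b_hi)
{
    /* _mm_set_epi64x takes the high lane first, then the low lane */
    __m128i a = _mm_set_epi64x((long long)a_hi, (long long)a_lo);
    __m128i b = _mm_set_epi64x((long long)b_hi, (long long)b_lo);

    /* each 64-bit lane becomes all ones if equal, all zeros otherwise */
    __m128i eq = _mm_cmpeq_epi64(a, b);

    /* gather the sign bit of each 64-bit lane into bits 0 and 1 */
    return _mm_movemask_pd(_mm_castsi128_pd(eq));
}
```

A mask of 3 means both pairs are equal, 0 means neither is.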

SSE4.2

SSE4.2, introduced in 2008 with the Nehalem-based Core i7, added 7 new instructions.

STTNI

SSE4.2 includes four String and Text New Instructions (STTNI), which work on 128-bit XMM registers as well as general-purpose registers and flags to perform character searches and comparisons on two 16-byte operands at a time, e.g. PCMPESTRI (Packed Compare Explicit Length Strings, Return Index) [1].
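To show how PCMPESTRI is typically used, here is a minimal sketch of a "find first of" search over a single 16-byte block, assuming GCC or Clang on x86-64 with SSE4.2 available at run time. The function name find_first_of16 is hypothetical; the control flags select unsigned bytes, "equal any" comparison (is a text byte in the character set?), and the index of the least significant match. Strings longer than 16 bytes would need a loop over blocks, which this sketch omits.

```c
#include <string.h>
#include <nmmintrin.h> /* SSE4.2: _mm_cmpestri and the _SIDD_* control flags */

/* Hypothetical helper: returns the index (0..15) of the first byte of text
   that equals any byte of set, or 16 if there is none. Lengths are assumed
   non-negative and are clamped to 16, since pcmpestri handles one 16-byte
   block per operand. */
__attribute__((target("sse4.2")))
static int find_first_of16(const char *set, int set_len,
                           const char *text, int text_len)
{
    char sbuf[16] = {0}, tbuf[16] = {0};  /* zero-padded 16-byte blocks */
    if (set_len > 16) set_len = 16;
    if (text_len > 16) text_len = 16;
    memcpy(sbuf, set, (size_t)set_len);
    memcpy(tbuf, text, (size_t)text_len);

    __m128i s = _mm_loadu_si128((const __m128i *)sbuf);
    __m128i t = _mm_loadu_si128((const __m128i *)tbuf);

    /* explicit lengths: bytes beyond set_len / text_len never match */
    return _mm_cmpestri(s, set_len, t, text_len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                        _SIDD_LEAST_SIGNIFICANT);
}
```

For example, searching the vowels "aeiou" in "chessprogramming" yields 2, the position of the first 'e'.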

ATAI

Popcnt and crc32, working on general-purpose registers, were dubbed Application-Targeted Accelerator Instructions (ATAI) as a subset of SSE4.2 [2] [3], but should be considered a separate instruction set with respect to SSE4 compiler optimizations.

Mnemonic   Description        C-Intrinsic
popcnt     Population Count   int _mm_popcnt_u64(unsigned __int64 a)
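Both ATAI instructions can be reached through intrinsics. The minimal sketch below, assuming GCC or Clang on x86-64 and a CPU with POPCNT and SSE4.2, counts the set bits of a bitboard with _mm_popcnt_u64 and computes a CRC-32C checksum byte by byte with _mm_crc32_u8. The names popcount_bb and crc32c are hypothetical. The crc32 instruction implements the Castagnoli polynomial without any initial value or final inversion, so the conventional all-ones start and final complement are applied explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h> /* GCC/Clang: _mm_popcnt_u64, _mm_crc32_u8 */

/* Hypothetical helper: number of set bits in a bitboard,
   e.g. the number of pieces it contains. */
__attribute__((target("popcnt")))
static int popcount_bb(uint64_t bb)
{
    return (int)_mm_popcnt_u64(bb);
}

/* Hypothetical helper: CRC-32C (Castagnoli) of a byte buffer. */
__attribute__((target("sse4.2")))
static uint32_t crc32c(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* conventional initial value */
    for (size_t i = 0; i < len; i++)
        crc = _mm_crc32_u8(crc, data[i]);    /* one crc32 instruction per byte */
    return ~crc;                             /* conventional final inversion */
}
```

With these conventions, the standard check string "123456789" yields 0xE3069283. Wider variants such as _mm_crc32_u64 process eight bytes per instruction.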

AMD SSE4a

SSE4a was introduced by AMD with the K10 (Barcelona) microarchitecture.

SIMD

Two pairs of new SIMD instructions work on XMM registers: the combined mask-shift instructions (EXTRQ/INSERTQ) and the scalar streaming store instructions (MOVNTSD/MOVNTSS). These instructions are not available in Intel’s SSE4.

Advanced Bit Manipulation

The two important instructions of this group, lzcnt and popcnt, work on general-purpose registers. Leading Zero Count was not available in Intel’s Application-Targeted Accelerator Instructions of SSE4.2, but was later incorporated with BMI.

Mnemonic   Description          C-Intrinsic
lzcnt      Leading Zero Count   unsigned __int64 _lzcnt64(unsigned __int64 a)
popcnt     Population Count     unsigned __int64 _popcnt64(unsigned __int64 a)
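A common chess programming use of lzcnt is a bitscan reverse, finding the square index of the most significant set bit of a bitboard. The minimal sketch below assumes GCC or Clang on x86-64, whose headers spell the intrinsic _lzcnt_u64 rather than the name listed in the table; the function names leading_zeros and msb_square are hypothetical. Unlike bsr, lzcnt is defined for zero and returns 64. On CPUs without LZCNT, the same encoding silently executes as bsr and gives different results, so the sketch assumes hardware support.

```c
#include <stdint.h>
#include <x86intrin.h> /* GCC/Clang: _lzcnt_u64 */

/* Hypothetical helper: number of leading zero bits, 64 for bb == 0. */
__attribute__((target("lzcnt")))
static int leading_zeros(uint64_t bb)
{
    return (int)_lzcnt_u64(bb);
}

/* Hypothetical helper: square index (0..63) of the most significant set
   bit, i.e. a bitscan reverse. Only meaningful for bb != 0. */
__attribute__((target("lzcnt")))
static int msb_square(uint64_t bb)
{
    return 63 - leading_zeros(bb);
}
```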


References

  1. PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
  2. MSDN: Streaming SIMD Extensions 4 Instructions, 2.3 SSE4.2 Instruction Set, 2.3.3 Application-Targeted Accelerator Instructions
  3. Application Targeted Accelerators Intrinsics
