SSE4
SSE4 is an ambiguous name covering three almost disjoint x86 instruction set extensions: SSE4.1 and SSE4.2, both by Intel, and SSE4a by AMD.
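Because the extensions are almost disjoint, each is reported by its own CPUID feature bit. The following is a minimal sketch of a runtime check, assuming GCC or Clang on x86 with the `<cpuid.h>` helper `__get_cpuid`. The struct `sse4_features` and the function `detect_sse4` are hypothetical names chosen for illustration. On other compilers or architectures the sketch reports every feature as absent.

```c
#include <stdbool.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Hypothetical container for the feature bits discussed on this page. */
struct sse4_features {
    bool sse41;   /* CPUID.01H:ECX bit 19 */
    bool sse42;   /* CPUID.01H:ECX bit 20 */
    bool popcnt;  /* CPUID.01H:ECX bit 23 */
    bool lzcnt;   /* CPUID.80000001H:ECX bit 5 (ABM) */
    bool sse4a;   /* CPUID.80000001H:ECX bit 6 */
};

struct sse4_features detect_sse4(void)
{
    struct sse4_features f = { false, false, false, false, false };
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;

    /* Standard feature leaf: Intel's SSE4.1, SSE4.2 and POPCNT bits. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse41  = (ecx >> 19) & 1u;
        f.sse42  = (ecx >> 20) & 1u;
        f.popcnt = (ecx >> 23) & 1u;
    }
    /* Extended feature leaf: LZCNT (ABM) and AMD's SSE4a bits. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        f.lzcnt = (ecx >> 5) & 1u;
        f.sse4a = (ecx >> 6) & 1u;
    }
#endif
    return f;
}
```

A program would typically call `detect_sse4()` once at startup and select its code paths from the result, testing each flag separately instead of treating "SSE4" as one feature.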
SSE4.1
Intel introduced SSE4.1 in 2007 with the Penryn generation of the Core microarchitecture (Core 2 brand), adding 47 new instructions.
Mnemonic | Description | C-Intrinsic
pcmpeqq | packed compare equal qword | __m128i _mm_cmpeq_epi64 (__m128i a, __m128i b)
See Vulnerable on distant Checks with SSE4.
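As an illustration of pcmpeqq, the sketch below compares two pairs of 64-bit words (for instance bitboards) with a single `_mm_cmpeq_epi64` and returns a two-bit mask of the lanes that match. It assumes GCC or Clang, using the `target("sse4.1")` function attribute and `__builtin_cpu_supports` so that it builds without extra compiler flags and falls back to scalar code elsewhere. The function names are hypothetical.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <smmintrin.h>
#define HAVE_X86_DISPATCH 1

/* SSE4.1 path: one PCMPEQQ compares both 64-bit lanes at once. */
__attribute__((target("sse4.1")))
static int equal_lanes_sse41(const uint64_t a[2], const uint64_t b[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi64(va, vb);          /* all ones in each equal lane */
    return _mm_movemask_pd(_mm_castsi128_pd(eq));  /* top bit of each lane -> bits 0 and 1 */
}
#endif

/* Scalar fallback with the same result encoding. */
static int equal_lanes_scalar(const uint64_t a[2], const uint64_t b[2])
{
    return (a[0] == b[0] ? 1 : 0) | (a[1] == b[1] ? 2 : 0);
}

/* Bit 0 is set if a[0] == b[0], bit 1 is set if a[1] == b[1]. */
int equal_lanes(const uint64_t a[2], const uint64_t b[2])
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse4.1"))
        return equal_lanes_sse41(a, b);
#endif
    return equal_lanes_scalar(a, b);
}
```

Before SSE4.1, a 64-bit equality test on XMM registers had to be composed from 32-bit compares, which is why this single instruction is listed here.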
SSE4.2
Intel introduced SSE4.2 in 2008 with the Nehalem-based Core i7, adding 7 new instructions.
STTNI
SSE4.2 includes four String and Text New Instructions (STTNI): PCMPESTRI, PCMPESTRM, PCMPISTRI and PCMPISTRM. They work on 128-bit XMM SIMD registers as well as general purpose registers and flags, and perform character searches and comparisons on two operands of 16 bytes at a time, e.g. PCMPESTRI (Packed Compare Explicit Length Strings, Return Index) [1].
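A minimal sketch of PCMPESTRI in its "equal any" mode follows: it returns the index of the first character in a chunk of at most 16 bytes that belongs to a small character set, or -1 if there is none. It assumes GCC or Clang (`target("sse4.2")` attribute, `__builtin_cpu_supports`) and falls back to a scalar loop elsewhere. The function names are hypothetical, and lengths are clamped to the range 0 to 16.

```c
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
#define HAVE_X86_DISPATCH 1

/* SSE4.2 path: PCMPESTRI with explicit lengths, "equal any" aggregation. */
__attribute__((target("sse4.2")))
static int find_any16_sse42(const char *text, int text_len,
                            const char *set, int set_len)
{
    char tbuf[16] = {0}, sbuf[16] = {0};
    /* Copy into padded buffers so the 16-byte loads never read past the inputs. */
    memcpy(tbuf, text, (size_t)text_len);
    memcpy(sbuf, set, (size_t)set_len);
    __m128i vt = _mm_loadu_si128((const __m128i *)tbuf);
    __m128i vs = _mm_loadu_si128((const __m128i *)sbuf);
    /* First operand is the character set, second operand is the text to scan. */
    int idx = _mm_cmpestri(vs, set_len, vt, text_len,
                           _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                           _SIDD_LEAST_SIGNIFICANT);
    return idx < text_len ? idx : -1;   /* the instruction yields 16 when nothing matches */
}
#endif

/* Scalar fallback with the same contract. */
static int find_any16_scalar(const char *text, int text_len,
                             const char *set, int set_len)
{
    for (int i = 0; i < text_len; i++)
        for (int j = 0; j < set_len; j++)
            if (text[i] == set[j])
                return i;
    return -1;
}

static int clamp16(int n)
{
    return n < 0 ? 0 : (n > 16 ? 16 : n);
}

/* Index of the first byte of text[0..text_len) found in set[0..set_len), or -1. */
int find_any16(const char *text, int text_len, const char *set, int set_len)
{
    text_len = clamp16(text_len);
    set_len = clamp16(set_len);
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse4.2"))
        return find_any16_sse42(text, text_len, set, set_len);
#endif
    return find_any16_scalar(text, text_len, set, set_len);
}
```

For example, `find_any16("e2e4 e7e5", 9, " /", 2)` locates the blank separating two moves. Longer inputs would be processed in 16-byte chunks by a caller.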
ATAI
POPCNT and CRC32, which work on general purpose registers, were dubbed Application-Targeted Accelerator Instructions (ATAI) as a subset of SSE4.2 [2] [3], but should be considered a disjoint instruction set as far as SSE4 compiler optimizations are concerned.
Mnemonic | Description | C-Intrinsic
popcnt | Population Count | int _mm_popcnt_u64 (unsigned __int64 a)
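The two accelerator instructions can be sketched as follows, assuming GCC or Clang on x86-64. POPCNT is dispatched on its own "popcnt" feature name, separately from "sse4.2", which reflects the point that it is best treated as a disjoint feature. The CRC32 instruction computes the CRC-32C (Castagnoli) checksum, so the software fallback uses the matching reflected polynomial 0x82F63B78. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
#define HAVE_X86_DISPATCH 1

/* Hardware population count; needs only the POPCNT feature. */
__attribute__((target("popcnt")))
static int popcount_hw(uint64_t x)
{
    return (int)_mm_popcnt_u64(x);
}

/* Hardware CRC-32C, one byte per CRC32 instruction for simplicity. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--)
        crc = _mm_crc32_u8(crc, *p++);
    return ~crc;
}
#endif

/* Software population count: clears the lowest set bit per iteration. */
static int popcount_sw(uint64_t x)
{
    int n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}

/* Software CRC-32C, bit by bit, same initial value and final inversion. */
static uint32_t crc32c_sw(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

int popcount64(uint64_t x)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw(x);
#endif
    return popcount_sw(x);
}

uint32_t crc32c(const void *data, size_t n)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse4.2"))
        return crc32c_hw((const unsigned char *)data, n);
#endif
    return crc32c_sw((const unsigned char *)data, n);
}
```

In a chess program, `popcount64` applied to a bitboard counts the pieces or attacked squares it contains. In a hot loop the feature test would normally be hoisted out and done once.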
AMD SSE4a
SSE4a was introduced by AMD with the K10 (Barcelona) microarchitecture.
SIMD
SSE4a added two groups of SIMD instructions working on XMM registers: combined mask-shift instructions (EXTRQ/INSERTQ) and scalar streaming store instructions (MOVNTSD/MOVNTSS). These instructions are not available in Intel’s SSE4.
Advanced Bit Manipulation
The two Advanced Bit Manipulation instructions, LZCNT and POPCNT, work on general purpose registers. Leading Zero Count was not part of Intel’s Application-Targeted Accelerator Instructions of SSE4.2, but was later incorporated by Intel along with BMI.
Mnemonic | Description | C-Intrinsic
lzcnt | Leading Zero Count | unsigned __int64 __lzcnt64 (unsigned __int64 a)
popcnt | Population Count | unsigned __int64 __popcnt64 (unsigned __int64 a)
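The intrinsics in the table are the Microsoft compiler forms. As a portable sketch of the same semantics, the function below reproduces what LZCNT computes, including its defined result of 64 for a zero input, using the GCC/Clang builtin `__builtin_clzll` (an assumption about the toolchain) with a plain loop as fallback. Whether the compiler actually emits LZCNT depends on its target options. The names `lzcnt64` and `msb_index` are hypothetical.

```c
#include <stdint.h>

/* Number of leading zero bits in x; 64 for x == 0, as LZCNT defines it. */
int lzcnt64(uint64_t x)
{
    if (x == 0)
        return 64;   /* __builtin_clzll is undefined for zero, so handle it here */
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_clzll(x);
#else
    int n = 0;
    while (!(x >> 63)) {   /* shift left until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
#endif
}

/* Index (0..63) of the most significant set bit; x must be nonzero. */
int msb_index(uint64_t x)
{
    return 63 - lzcnt64(x);
}
```

With bitboards, `msb_index` serves as a reverse bitscan, for example to find the most advanced square in a set.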
See also
- AltiVec
- AVX
- BMI
- MMX
- SIMD and SWAR Techniques
- SSE
- SSE2
- SSE3
- SSSE3
- SSE5
- TBM
- Vulnerable on distant Checks with SSE4
- XOP
Manuals
- Intel® SSE4 Programming Reference (pdf)
- Software Optimization Guide for AMD Family 10h and 12h Processors (pdf)
Forum Posts
- using Popcount and Prefetch with SSE4 hardware support by Engin Üstün, CCC, May 19, 2012 » Population Count, Memory
External Links
- SSE4 from Wikipedia
- MSDN - Streaming SIMD Extensions 4 Instructions
- MSDN - SSE4A and Advanced Bit Manipulation Intrinsics
- SSEPlus Project Documentation
- Agner's CPU blog by Agner Fog
- Intel Intrinsics Guide
References
- [1] PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
- [2] MSDN - Streaming SIMD Extensions 4 Instructions, 2.3 SSE4.2 INSTRUCTION SET, 2.3.3. Application-Targeted Accelerator Instructions
- [3] Application Targeted Accelerators Intrinsics