x86-64


Quad-core AMD Opteron processor [1]

x86-64 or x64, a 64-bit extension of the x86 architecture, designed by AMD as the Hammer or K8 architecture with Athlon 64 and Opteron CPUs. It was cloned by Intel under the name EM64T and later Intel 64. Beside the 64-bit general purpose extensions, x86-64 supports the MMX and x87 instruction sets as well as the 128-bit SSE and SSE2 instruction sets. Further SIMD streaming extensions, such as SSE3, SSSE3 (Intel only), SSE4 (Core 2, K10), AVX, AVX2 and AVX-512, as well as AMD's 3DNow!, Enhanced 3DNow! and XOP, can be detected via the CPUID instruction.
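The presence of these extensions can be queried at run time. A minimal sketch, assuming the <cpuid.h> header shipped with GCC and Clang (MSVC provides __cpuid/__cpuidex instead); the feature bits follow the documented CPUID leaf 1 and leaf 7 layout:

```c
#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang; MSVC offers __cpuid/__cpuidex instead */

int main(void) {
    unsigned eax, ebx, ecx, edx;

    /* Leaf 1: basic feature flags */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        printf("SSE2   : %s\n", (edx & (1u << 26)) ? "yes" : "no");
        printf("SSE3   : %s\n", (ecx & (1u <<  0)) ? "yes" : "no");
        printf("SSSE3  : %s\n", (ecx & (1u <<  9)) ? "yes" : "no");
        printf("SSE4.2 : %s\n", (ecx & (1u << 20)) ? "yes" : "no");
        printf("POPCNT : %s\n", (ecx & (1u << 23)) ? "yes" : "no");
        printf("AVX    : %s\n", (ecx & (1u << 28)) ? "yes" : "no");
    }
    /* Leaf 7, subleaf 0: extended features.
       A complete AVX/AVX-512 check would also verify OS context
       switch support via XGETBV (OSXSAVE). */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        printf("AVX2   : %s\n", (ebx & (1u <<  5)) ? "yes" : "no");
        printf("AVX512F: %s\n", (ebx & (1u << 16)) ? "yes" : "no");
    }
    return 0;
}
```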

General Purpose

The 16 general purpose registers may be treated as 64-bit Quad Word (bitboard), 32-bit Double Word, 16-bit Word, and low Byte, with a high Byte available only for the four classic registers [2]:

| 64  | 32   | 16   | 8 high | 8 low | Purpose |
| --- | ---- | ---- | ------ | ----- | ------- |
| RAX | EAX  | AX   | AH     | AL    | GP, Accumulator |
| RBX | EBX  | BX   | BH     | BL    | GP, Index Register |
| RCX | ECX  | CX   | CH     | CL    | GP, Counter, variable shift and rotate via CL |
| RDX | EDX  | DX   | DH     | DL    | GP, high Accumulator for mul/div |
| RSI | ESI  | SI   | -      | -     | GP, Source Index |
| RDI | EDI  | DI   | -      | -     | GP, Destination Index |
| RSP | ESP  | SP   | -      | -     | Stack Pointer |
| RBP | EBP  | BP   | -      | -     | GP, Base Pointer |
| R8  | R8D  | R8W  | -      | R8B   | GP |
| ... | ...  | ...  | -      | ...   | GP |
| R15 | R15D | R15W | -      | R15B  | GP |
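Since a bitboard fits exactly into one of these 64-bit quad word registers, plain C code on 64-bit integers maps directly onto them. A one-liner to illustrate the variable shift via CL noted in the RCX row (the function name is illustrative):

```c
#include <stdint.h>

/* A bitboard occupies one 64-bit general purpose register.
   With a variable shift count the compiler emits SHL reg, CL,
   i.e. the count is passed in CL as the table above indicates. */
uint64_t single_bit(unsigned square) {   /* square = 0..63 */
    return 1ULL << (square & 63);
}
```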

MMX

Eight 64-bit MMX-Registers: MM0 - MM7, aliased with the x87 floating point registers. Treated as one Quad Word, or as a vector of two Double Words or Floats (3DNow!), four Words, or eight Bytes.

SSE / SSE*

Sixteen 128-bit XMM-Registers: XMM0 - XMM15. Treated as a vector of two Doubles or Quad Words, four Floats or Double Words, eight Words, or sixteen Bytes.
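One XMM register can therefore carry a pair of bitboards. A minimal sketch, assuming a hypothetical and2 helper, intersecting two pairs with a single pand:

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Intersect two pairs of bitboards at once: one XMM register
   holds two 64-bit quad words. */
void and2(const uint64_t a[2], const uint64_t b[2], uint64_t r[2]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)r, _mm_and_si128(va, vb));
}
```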

AVX, AVX2 / XOP

Introduced with Intel Sandy Bridge and AMD Bulldozer: sixteen 256-bit YMM-Registers YMM0 - YMM15, with the XMM registers aliased to their lower halves. Treated as a vector of four Doubles or Quad Words, eight Floats or Double Words, sixteen Words, or 32 Bytes.
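With AVX2 the same idea extends to quad-bitboards: four 64-bit boards per YMM register, combined by one vpand. A sketch under that assumption (the and4 name is made up for illustration; compile with -mavx2):

```c
#include <stdint.h>
#include <immintrin.h>  /* AVX2 */

/* One YMM register holds four bitboards; VPAND combines all four
   pairs in a single instruction. */
void and4(const uint64_t a[4], const uint64_t b[4], uint64_t r[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)r, _mm256_and_si256(va, vb));
}
```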

AVX-512

Introduced with Intel Xeon Phi (2015): thirty-two 512-bit ZMM-Registers ZMM0 - ZMM31, plus eight vector mask registers K0 - K7.
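A short illustrative sketch of the mask registers, assuming AVX-512F and a hypothetical nonempty_lanes helper: a test on eight packed quad words writes one result bit per lane into a k register (compile with -mavx512f):

```c
#include <stdint.h>
#include <immintrin.h>  /* AVX-512F */

/* Eight bitboards in one ZMM register; the test writes its result
   into a vector mask register, one bit per 64-bit lane. */
uint8_t nonempty_lanes(const uint64_t bb[8]) {
    __m512i v = _mm512_loadu_si512(bb);
    /* mask bit i is set if lane i is not all-zero */
    __mmask8 m = _mm512_test_epi64_mask(v, v);
    return (uint8_t)m;
}
```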

Instructions

Instructions useful for bitboard applications are typically not directly supported by high-level programming languages, but are available through (inline) assembly or compiler intrinsics of various C compilers [3].

General Purpose

x86-64 instructions with C intrinsic references from x64 (amd64) Intrinsics List | Microsoft Docs:

| Mnemonic | Description | C-Intrinsic | Remark |
| -------- | ----------- | ----------- | ------ |
| bsf | bit scan forward | _BitScanForward64 | |
| bsr | bit scan reverse | _BitScanReverse64 | |
| bswap | byte swap | _byteswap_uint64 | |
| bt | bit test | _bittest64 | |
| btc | bit test and complement | _bittestandcomplement64 | |
| btr | bit test and reset | _bittestandreset64 | |
| bts | bit test and set | _bittestandset64 | |
| cpuid | cpuid | __cpuid | |
| imul | signed multiplication | __mulh, _mul128 | |
| lzcnt | leading zero count | __lzcnt16, __lzcnt, __lzcnt64 | cpuid, SSE4a |
| mul | unsigned multiplication | __umulh, _umul128 | |
| popcnt | population count | __popcnt16, __popcnt, __popcnt64 | cpuid, SSE4.2, SSE4a |
| rdtsc | read time-stamp counter | __rdtsc | |
| rol, ror | rotate left, right | _rotl, _rotl64, _rotr, _rotr64 | |
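Typical bitboard code combines bsf with resetting the least significant one bit to loop over all set squares. A minimal, compiler-portable sketch (helper names are illustrative):

```c
#include <stdint.h>

#ifdef _MSC_VER
#include <intrin.h>
#endif

/* Index of the least significant one bit (BSF). bb must be non-zero. */
static int lsb_index(uint64_t bb) {
#ifdef _MSC_VER
    unsigned long idx;
    _BitScanForward64(&idx, bb);     /* maps to BSF */
    return (int)idx;
#else
    return __builtin_ctzll(bb);      /* GCC/Clang builtin */
#endif
}

/* Serialize a bitboard: visit every set square. */
void for_each_square(uint64_t bb, void (*visit)(int sq)) {
    while (bb) {
        visit(lsb_index(bb));
        bb &= bb - 1;                /* clear least significant one bit */
    }
}
```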

Bit-Manipulation

SSE2

x86 and x86-64 SSE2 instructions with C intrinsic references from the Intel Intrinsics Guide:

bitwise logical

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| pand | packed and, r := a & b | __m128i _mm_and_si128(__m128i a, __m128i b) |
| pandn | packed and not, r := ~a & b | __m128i _mm_andnot_si128(__m128i a, __m128i b) |
| por | packed or, r := a \| b | __m128i _mm_or_si128(__m128i a, __m128i b) |
| pxor | packed xor, r := a ^ b | __m128i _mm_xor_si128(__m128i a, __m128i b) |

quad word shifts

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| psrlq | packed shift right logical quad | __m128i _mm_srl_epi64(__m128i a, __m128i cnt) |
| psrlq | packed shift right logical quad, immediate | __m128i _mm_srli_epi64(__m128i a, int cnt) |
| psllq | packed shift left logical quad | __m128i _mm_sll_epi64(__m128i a, __m128i cnt) |
| psllq | packed shift left logical quad, immediate | __m128i _mm_slli_epi64(__m128i a, int cnt) |

arithmetical

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| paddb | packed add bytes | __m128i _mm_add_epi8(__m128i a, __m128i b) |
| psubb | packed subtract bytes | __m128i _mm_sub_epi8(__m128i a, __m128i b) |
| psadbw | packed sum of absolute differences of bytes into a word | __m128i _mm_sad_epu8(__m128i a, __m128i b) |
| pmaxsw | packed maximum signed words | __m128i _mm_max_epi16(__m128i a, __m128i b) |
| pmaxub | packed maximum unsigned bytes | __m128i _mm_max_epu8(__m128i a, __m128i b) |
| pminsw | packed minimum signed words | __m128i _mm_min_epi16(__m128i a, __m128i b) |
| pminub | packed minimum unsigned bytes | __m128i _mm_min_epu8(__m128i a, __m128i b) |
| pcmpeqb | packed compare equal bytes | __m128i _mm_cmpeq_epi8(__m128i a, __m128i b) |
| pmullw | packed multiply low signed (unsigned) word | __m128i _mm_mullo_epi16(__m128i a, __m128i b) |
| pmulhw | packed multiply high signed word | __m128i _mm_mulhi_epi16(__m128i a, __m128i b) |
| pmulhuw | packed multiply high unsigned word | __m128i _mm_mulhi_epu16(__m128i a, __m128i b) |
| pmaddwd | packed multiply words and add doublewords | __m128i _mm_madd_epi16(__m128i a, __m128i b) |

unpack, shuffle

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| punpcklbw | unpack and interleave low bytes, hHgGfFeE:dDcCbBaA := xxxxxxxx:HGFEDCBA # xxxxxxxx:hgfedcba | __m128i _mm_unpacklo_epi8(__m128i A, __m128i a) |
| punpckhbw | unpack and interleave high bytes, hHgGfFeE:dDcCbBaA := HGFEDCBA:xxxxxxxx # hgfedcba:xxxxxxxx | __m128i _mm_unpackhi_epi8(__m128i A, __m128i a) |
| punpcklwd | unpack and interleave low words, dDcC:bBaA := xxxx:DCBA # xxxx:dcba | __m128i _mm_unpacklo_epi16(__m128i A, __m128i a) |
| punpckhwd | unpack and interleave high words, dDcC:bBaA := DCBA:xxxx # dcba:xxxx | __m128i _mm_unpackhi_epi16(__m128i A, __m128i a) |
| punpckldq | unpack and interleave low doublewords, bB:aA := xx:BA # xx:ba | __m128i _mm_unpacklo_epi32(__m128i A, __m128i a) |
| punpckhdq | unpack and interleave high doublewords, bB:aA := BA:xx # ba:xx | __m128i _mm_unpackhi_epi32(__m128i A, __m128i a) |
| punpcklqdq | unpack and interleave low quadwords, a:A := x:A # x:a | __m128i _mm_unpacklo_epi64(__m128i A, __m128i a) |
| punpckhqdq | unpack and interleave high quadwords, a:A := A:x # a:x | __m128i _mm_unpackhi_epi64(__m128i A, __m128i a) |
| pshuflw | packed shuffle low words | __m128i _mm_shufflelo_epi16(__m128i a, int imm) |
| pshufhw | packed shuffle high words | __m128i _mm_shufflehi_epi16(__m128i a, int imm) |
| pshufd | packed shuffle doublewords | __m128i _mm_shuffle_epi32(__m128i a, int imm) |

load, store, moves

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| movdqa | move aligned double quadword, xmm := *p | __m128i _mm_load_si128(__m128i const *p) |
| movdqu | move unaligned double quadword, xmm := *p | __m128i _mm_loadu_si128(__m128i const *p) |
| movdqa | move aligned double quadword, *p := xmm | void _mm_store_si128(__m128i *p, __m128i a) |
| movdqu | move unaligned double quadword, *p := xmm | void _mm_storeu_si128(__m128i *p, __m128i a) |
| movq | move quadword, xmm := gp64 | __m128i _mm_cvtsi64_si128(__int64 a) |
| movq | move quadword, gp64 := xmm | __int64 _mm_cvtsi128_si64(__m128i a) |
| movd | move doubleword or quadword, xmm := gp64 | __m128i _mm_cvtsi64x_si128(__int64 value) |
| movd | move doubleword, xmm := gp32 | __m128i _mm_cvtsi32_si128(int a) |
| movd | move doubleword, gp32 := xmm | int _mm_cvtsi128_si32(__m128i a) |
| pextrw | extract packed word, gp16 := xmm[i] | int _mm_extract_epi16(__m128i a, int imm) |
| pinsrw | packed insert word, xmm[i] := gp16 | __m128i _mm_insert_epi16(__m128i a, int b, int imm) |
| pmovmskb | packed move mask byte, gp32 := 16 sign-bits(xmm) | int _mm_movemask_epi8(__m128i a) |

cache support

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| prefetch | prefetch cache line | void _mm_prefetch(char const *p, int i) |
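As one worked example from the table above, pcmpeqb and pmovmskb together test two 128-bit board images for equality; the sketch below (equal128 is an illustrative name) compares byte-wise and condenses the 16 sign bits into a general purpose register:

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* PCMPEQB yields 0xFF per equal byte; PMOVMSKB gathers the 16
   sign bits into a GP register. All bytes equal <=> mask == 0xFFFF. */
int equal128(const uint64_t a[2], const uint64_t b[2]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```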

Software

Operating Systems

Development

Assembly

C-Compiler

See also

General Setwise Operations » BitScan

Publications

Manuals

Agner Fog

AMD

Instructions

Optimization Guides

Intel

Instructions

Optimization Guides

Forum Posts

2003 …

2010 …

2015 …

2020 …

AMD

Intel

Instruction Sets

AVX-512 from Wikipedia » AVX-512

Security Vulnerability

References

  1. Die shot of AMD Opteron quad-core processor, Wikimedia Commons
  2. Introduction to x64 Assembly | Intel® Software
  3. Intel® C++ Compiler User and Reference Guides, covers Intrinsics
  4. Advanced Matrix Extension (AMX) - x86 - WikiChip
  5. Georg Hager’s Blog | Random thoughts on High Performance Computing
  6. Intel Nehalem Core i3
  7. Application binary interface from Wikipedia
