x86-64


Quad-core AMD Opteron processor [1]

x86-64 or x64, a 64-bit extension of the x86 architecture, designed by AMD as the Hammer or K8 architecture with Athlon 64 and Opteron CPUs. It was cloned by Intel under the name EM64T and later Intel 64. Beside the 64-bit general purpose extensions, x86-64 supports the MMX and x87 instruction sets as well as the 128-bit SSE and SSE2 instruction sets. Further SIMD streaming extensions, such as SSE3, SSSE3 (Intel only), SSE4 (Core 2, K10), AVX, AVX2 and AVX-512, as well as AMD's 3DNow!, Enhanced 3DNow! and XOP, can be detected via the CPUID instruction.
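The presence of these extensions can be queried at run time. A minimal sketch, assuming the <cpuid.h> header shipped with GCC and Clang (MSVC provides __cpuid/__cpuidex instead); the feature bits follow the documented CPUID leaf 1 and leaf 7 layout:

```c
#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang; MSVC offers __cpuid/__cpuidex instead */

int main(void) {
    unsigned eax, ebx, ecx, edx;

    /* Leaf 1: basic feature flags */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        printf("SSE2   : %s\n", (edx & (1u << 26)) ? "yes" : "no");
        printf("SSE3   : %s\n", (ecx & (1u <<  0)) ? "yes" : "no");
        printf("SSSE3  : %s\n", (ecx & (1u <<  9)) ? "yes" : "no");
        printf("SSE4.2 : %s\n", (ecx & (1u << 20)) ? "yes" : "no");
        printf("POPCNT : %s\n", (ecx & (1u << 23)) ? "yes" : "no");
        printf("AVX    : %s\n", (ecx & (1u << 28)) ? "yes" : "no");
    }
    /* Leaf 7, subleaf 0: extended features.
       A complete AVX/AVX-512 check would also verify OS context
       switch support via XGETBV (OSXSAVE). */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        printf("AVX2   : %s\n", (ebx & (1u <<  5)) ? "yes" : "no");
        printf("AVX512F: %s\n", (ebx & (1u << 16)) ? "yes" : "no");
    }
    return 0;
}
```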

General Purpose

The 16 general purpose registers may be treated as 64-bit Quad Word (bitboard), 32-bit Double Word, 16-bit Word, and low Byte, with a high Byte available only for the four classic registers [2]:

| 64  | 32   | 16   | 8 high | 8 low | Purpose |
| --- | ---- | ---- | ------ | ----- | ------- |
| RAX | EAX  | AX   | AH     | AL    | GP, Accumulator |
| RBX | EBX  | BX   | BH     | BL    | GP, Index Register |
| RCX | ECX  | CX   | CH     | CL    | GP, Counter, variable shift and rotate via CL |
| RDX | EDX  | DX   | DH     | DL    | GP, high Accumulator for mul/div |
| RSI | ESI  | SI   | -      | -     | GP, Source Index |
| RDI | EDI  | DI   | -      | -     | GP, Destination Index |
| RSP | ESP  | SP   | -      | -     | Stack Pointer |
| RBP | EBP  | BP   | -      | -     | GP, Base Pointer |
| R8  | R8D  | R8W  | -      | R8B   | GP |
| ... | ...  | ...  | -      | ...   | GP |
| R15 | R15D | R15W | -      | R15B  | GP |
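Since a bitboard fits exactly into one of these 64-bit quad word registers, plain C code on 64-bit integers maps directly onto them. A one-liner to illustrate the variable shift via CL noted in the RCX row (the function name is illustrative):

```c
#include <stdint.h>

/* A bitboard occupies one 64-bit general purpose register.
   With a variable shift count the compiler emits SHL reg, CL,
   i.e. the count is passed in CL as the table above indicates. */
uint64_t single_bit(unsigned square) {   /* square = 0..63 */
    return 1ULL << (square & 63);
}
```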

MMX

Eight 64-bit MMX-Registers: MM0 - MM7, aliased with the x87 floating point registers. Treated as one Quad Word, or as a vector of two Double Words or Floats (3DNow!), four Words, or eight Bytes.

SSE / SSE*

Sixteen 128-bit XMM-Registers: XMM0 - XMM15. Treated as a vector of two Doubles or Quad Words, four Floats or Double Words, eight Words, or sixteen Bytes.
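One XMM register can therefore carry a pair of bitboards. A minimal sketch, assuming a hypothetical and2 helper, intersecting two pairs with a single pand:

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Intersect two pairs of bitboards at once: one XMM register
   holds two 64-bit quad words. */
void and2(const uint64_t a[2], const uint64_t b[2], uint64_t r[2]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)r, _mm_and_si128(va, vb));
}
```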

AVX, AVX2 / XOP

Introduced with Intel Sandy Bridge and AMD Bulldozer: sixteen 256-bit YMM-Registers YMM0 - YMM15, with the XMM registers aliased to their lower halves. Treated as a vector of four Doubles or Quad Words, eight Floats or Double Words, sixteen Words, or 32 Bytes.
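With AVX2 the same idea extends to quad-bitboards: four 64-bit boards per YMM register, combined by one vpand. A sketch under that assumption (the and4 name is made up for illustration; compile with -mavx2):

```c
#include <stdint.h>
#include <immintrin.h>  /* AVX2 */

/* One YMM register holds four bitboards; VPAND combines all four
   pairs in a single instruction. */
void and4(const uint64_t a[4], const uint64_t b[4], uint64_t r[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)r, _mm256_and_si256(va, vb));
}
```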

AVX-512

Introduced with Intel Xeon Phi (2015): thirty-two 512-bit ZMM-Registers ZMM0 - ZMM31, plus eight vector mask registers K0 - K7.
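A short illustrative sketch of the mask registers, assuming AVX-512F and a hypothetical nonempty_lanes helper: a test on eight packed quad words writes one result bit per lane into a k register (compile with -mavx512f):

```c
#include <stdint.h>
#include <immintrin.h>  /* AVX-512F */

/* Eight bitboards in one ZMM register; the test writes its result
   into a vector mask register, one bit per 64-bit lane. */
uint8_t nonempty_lanes(const uint64_t bb[8]) {
    __m512i v = _mm512_loadu_si512(bb);
    /* mask bit i is set if lane i is not all-zero */
    __mmask8 m = _mm512_test_epi64_mask(v, v);
    return (uint8_t)m;
}
```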

Instructions

Instructions useful for bitboard applications are typically not directly supported by high-level programming languages, but are available through (inline) assembly or compiler intrinsics of various C compilers [3].

General Purpose

x86-64 instructions with C intrinsic references from x64 (amd64) Intrinsics List | Microsoft Docs:

| Mnemonic | Description | C-Intrinsic | Remark |
| -------- | ----------- | ----------- | ------ |
| bsf | bit scan forward | _BitScanForward64 | |
| bsr | bit scan reverse | _BitScanReverse64 | |
| bswap | byte swap | _byteswap_uint64 | |
| bt | bit test | _bittest64 | |
| btc | bit test and complement | _bittestandcomplement64 | |
| btr | bit test and reset | _bittestandreset64 | |
| bts | bit test and set | _bittestandset64 | |
| cpuid | cpuid | __cpuid | |
| imul | signed multiplication | __mulh, _mul128 | |
| lzcnt | leading zero count | __lzcnt16, __lzcnt, __lzcnt64 | cpuid, SSE4a |
| mul | unsigned multiplication | __umulh, _umul128 | |
| popcnt | population count | __popcnt16, __popcnt, __popcnt64 | cpuid, SSE4.2, SSE4a |
| rdtsc | read time-stamp counter | __rdtsc | |
| rol, ror | rotate left, right | _rotl, _rotl64, _rotr, _rotr64 | |
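Typical bitboard code combines bsf with resetting the least significant one bit to loop over all set squares. A minimal, compiler-portable sketch (helper names are illustrative):

```c
#include <stdint.h>

#ifdef _MSC_VER
#include <intrin.h>
#endif

/* Index of the least significant one bit (BSF). bb must be non-zero. */
static int lsb_index(uint64_t bb) {
#ifdef _MSC_VER
    unsigned long idx;
    _BitScanForward64(&idx, bb);     /* maps to BSF */
    return (int)idx;
#else
    return __builtin_ctzll(bb);      /* GCC/Clang builtin */
#endif
}

/* Serialize a bitboard: visit every set square. */
void for_each_square(uint64_t bb, void (*visit)(int sq)) {
    while (bb) {
        visit(lsb_index(bb));
        bb &= bb - 1;                /* clear least significant one bit */
    }
}
```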

Bit-Manipulation

SSE2

x86 and x86-64 SSE2 instructions with C intrinsic references from the Intel Intrinsics Guide:

bitwise logical

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| pand | packed and, r := a & b | __m128i _mm_and_si128(__m128i a, __m128i b) |
| pandn | packed and not, r := ~a & b | __m128i _mm_andnot_si128(__m128i a, __m128i b) |
| por | packed or, r := a \| b | __m128i _mm_or_si128(__m128i a, __m128i b) |
| pxor | packed xor, r := a ^ b | __m128i _mm_xor_si128(__m128i a, __m128i b) |

quad word shifts

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| psrlq | packed shift right logical quad | __m128i _mm_srl_epi64(__m128i a, __m128i cnt) |
| psrlq | packed shift right logical quad, immediate | __m128i _mm_srli_epi64(__m128i a, int cnt) |
| psllq | packed shift left logical quad | __m128i _mm_sll_epi64(__m128i a, __m128i cnt) |
| psllq | packed shift left logical quad, immediate | __m128i _mm_slli_epi64(__m128i a, int cnt) |

arithmetical

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| paddb | packed add bytes | __m128i _mm_add_epi8(__m128i a, __m128i b) |
| psubb | packed subtract bytes | __m128i _mm_sub_epi8(__m128i a, __m128i b) |
| psadbw | packed sum of absolute differences of bytes into a word | __m128i _mm_sad_epu8(__m128i a, __m128i b) |
| pmaxsw | packed maximum signed words | __m128i _mm_max_epi16(__m128i a, __m128i b) |
| pmaxub | packed maximum unsigned bytes | __m128i _mm_max_epu8(__m128i a, __m128i b) |
| pminsw | packed minimum signed words | __m128i _mm_min_epi16(__m128i a, __m128i b) |
| pminub | packed minimum unsigned bytes | __m128i _mm_min_epu8(__m128i a, __m128i b) |
| pcmpeqb | packed compare equal bytes | __m128i _mm_cmpeq_epi8(__m128i a, __m128i b) |
| pmullw | packed multiply low signed (unsigned) word | __m128i _mm_mullo_epi16(__m128i a, __m128i b) |
| pmulhw | packed multiply high signed word | __m128i _mm_mulhi_epi16(__m128i a, __m128i b) |
| pmulhuw | packed multiply high unsigned word | __m128i _mm_mulhi_epu16(__m128i a, __m128i b) |
| pmaddwd | packed multiply words and add doublewords | __m128i _mm_madd_epi16(__m128i a, __m128i b) |

unpack, shuffle

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| punpcklbw | unpack and interleave low bytes, hHgGfFeE:dDcCbBaA := xxxxxxxx:HGFEDCBA # xxxxxxxx:hgfedcba | __m128i _mm_unpacklo_epi8(__m128i A, __m128i a) |
| punpckhbw | unpack and interleave high bytes, hHgGfFeE:dDcCbBaA := HGFEDCBA:xxxxxxxx # hgfedcba:xxxxxxxx | __m128i _mm_unpackhi_epi8(__m128i A, __m128i a) |
| punpcklwd | unpack and interleave low words, dDcC:bBaA := xxxx:DCBA # xxxx:dcba | __m128i _mm_unpacklo_epi16(__m128i A, __m128i a) |
| punpckhwd | unpack and interleave high words, dDcC:bBaA := DCBA:xxxx # dcba:xxxx | __m128i _mm_unpackhi_epi16(__m128i A, __m128i a) |
| punpckldq | unpack and interleave low doublewords, bB:aA := xx:BA # xx:ba | __m128i _mm_unpacklo_epi32(__m128i A, __m128i a) |
| punpckhdq | unpack and interleave high doublewords, bB:aA := BA:xx # ba:xx | __m128i _mm_unpackhi_epi32(__m128i A, __m128i a) |
| punpcklqdq | unpack and interleave low quadwords, a:A := x:A # x:a | __m128i _mm_unpacklo_epi64(__m128i A, __m128i a) |
| punpckhqdq | unpack and interleave high quadwords, a:A := A:x # a:x | __m128i _mm_unpackhi_epi64(__m128i A, __m128i a) |
| pshuflw | packed shuffle low words | __m128i _mm_shufflelo_epi16(__m128i a, int imm) |
| pshufhw | packed shuffle high words | __m128i _mm_shufflehi_epi16(__m128i a, int imm) |
| pshufd | packed shuffle doublewords | __m128i _mm_shuffle_epi32(__m128i a, int imm) |

load, store, moves

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| movdqa | move aligned double quadword, xmm := *p | __m128i _mm_load_si128(__m128i const *p) |
| movdqu | move unaligned double quadword, xmm := *p | __m128i _mm_loadu_si128(__m128i const *p) |
| movdqa | move aligned double quadword, *p := xmm | void _mm_store_si128(__m128i *p, __m128i a) |
| movdqu | move unaligned double quadword, *p := xmm | void _mm_storeu_si128(__m128i *p, __m128i a) |
| movq | move quadword, xmm := gp64 | __m128i _mm_cvtsi64_si128(__int64 a) |
| movq | move quadword, gp64 := xmm | __int64 _mm_cvtsi128_si64(__m128i a) |
| movd | move doubleword or quadword, xmm := gp64 | __m128i _mm_cvtsi64x_si128(__int64 value) |
| movd | move doubleword, xmm := gp32 | __m128i _mm_cvtsi32_si128(int a) |
| movd | move doubleword, gp32 := xmm | int _mm_cvtsi128_si32(__m128i a) |
| pextrw | extract packed word, gp16 := xmm[i] | int _mm_extract_epi16(__m128i a, int imm) |
| pinsrw | packed insert word, xmm[i] := gp16 | __m128i _mm_insert_epi16(__m128i a, int b, int imm) |
| pmovmskb | packed move mask byte, gp32 := 16 sign-bits(xmm) | int _mm_movemask_epi8(__m128i a) |

cache support

| Mnemonic | Description | C-Intrinsic |
| -------- | ----------- | ----------- |
| prefetch | prefetch cache line | void _mm_prefetch(char const *p, int i) |
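As one worked example from the table above, pcmpeqb and pmovmskb together test two 128-bit board images for equality; the sketch below (equal128 is an illustrative name) compares byte-wise and condenses the 16 sign bits into a general purpose register:

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* PCMPEQB yields 0xFF per equal byte; PMOVMSKB gathers the 16
   sign bits into a GP register. All bytes equal <=> mask == 0xFFFF. */
int equal128(const uint64_t a[2], const uint64_t b[2]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```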

Software

Operating Systems

Development

Assembly

C-Compiler

See also

General Setwise Operations » BitScan

Publications

Manuals

Agner Fog

AMD

Instructions

Optimization Guides

Intel

Instructions

Optimization Guides

Forum Posts

2003 …

2010 …

2015 …

2020 …

AMD

Intel

Instruction Sets

AVX-512 from Wikipedia » AVX-512

Security Vulnerability

References

  1. Die shot of AMD Opteron quad-core processor, Wikimedia Commons
  2. Introduction to x64 Assembly | Intel® Software
  3. Intel® C++ Compiler User and Reference Guides, covers Intrinsics
  4. Advanced Matrix Extension (AMX) - x86 - WikiChip
  5. Georg Hager’s Blog | Random thoughts on High Performance Computing
  6. Intel Nehalem Core i3
  7. Application binary interface from Wikipedia
