nasm-know-hows

nasm assembly related stuff

View on GitHub

๐Ÿ”ฌ Instruction Encoding in x86-64

Advanced Topic - Understanding how assembly becomes machine code


Overview

When you write assembly, NASM translates it into machine code - raw bytes that the CPU understands. Understanding this encoding helps you:

Note for C Programmers: This topic covers low-level machine code encoding that happens transparently when you compile C code. C programmers donโ€™t typically need to worry about instruction encoding - the compiler handles it automatically. However, understanding it helps explain why some operations are faster or more compact than others.


The Translation Process

You Write:          mov rax, rbx
                         โ†“
NASM Assembles:     48 89 D8      (3 bytes)
                         โ†“
CPU Decodes:        "Move RBX into RAX"
                         โ†“
CPU Executes:       RAX now contains value from RBX

Part 1: Instruction Format

x86-64 instructions are variable length (1-15 bytes) with this structure:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Prefixes โ”‚ Opcode โ”‚ ModR/Mโ”‚ SIB โ”‚ Displacement โ”‚ Immediate โ”‚
โ”‚  (0-4)   โ”‚ (1-3)  โ”‚ (0-1) โ”‚(0-1)โ”‚   (0,1,2,4)  โ”‚(0,1,2,4,8)โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  Optional  Required Optional Opt.    Optional      Optional

Minimum: 1 byte (opcode only)
Maximum: 15 bytes (all components)

Component Breakdown

Component Size Purpose Example
Prefixes 0-4 bytes Modify instruction behavior REX (64-bit), LOCK
Opcode 1-3 bytes The operation to perform 0x89 (MOV), 0x31 (XOR)
ModR/M 0-1 byte Specifies operands Register or memory
SIB 0-1 byte Complex addressing base + index*scale
Displacement 0,1,2,4 bytes Memory offset [rbx + 100]
Immediate 0,1,2,4,8 bytes Constant value mov eax, 42

Part 2: The REX Prefix - 64-bit Mode

The REX prefix (0x40-0x4F) enables 64-bit operations and extended registers.

REX Byte Structure

โ”Œโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”
โ”‚0100โ”‚ W โ”‚ R โ”‚ X โ”‚ B โ”‚
โ””โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”˜
Fixed   โ”‚   โ”‚   โ”‚   โ””โ”€ Extends ModR/M r/m field (R8-R15)
        โ”‚   โ”‚   โ””โ”€โ”€โ”€โ”€โ”€ Extends SIB index field (R8-R15)
        โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Extends ModR/M reg field (R8-R15)
        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ 1 = 64-bit operand, 0 = default

Common REX values:
0x48 (01001000) = REX.W - 64-bit operation
0x49 (01001001) = REX.W + REX.B - 64-bit with extended r/m
0x4C (01001100) = REX.W + REX.R + REX.X - full extensions

When REX is Required

; 64-bit operations - NEEDS REX.W
mov rax, rbx        ; 48 89 D8 (3 bytes)
add rax, rcx        ; 48 01 C8 (3 bytes)

; 32-bit operations - NO REX needed
mov eax, ebx        ; 89 D8 (2 bytes)
add eax, ecx        ; 01 C8 (2 bytes)

; Extended registers (R8-R15) - NEEDS REX
mov r8, rax         ; 49 89 C0 (3 bytes)
mov rax, r9         ; 4C 89 C8 (3 bytes)

; Accessing low bytes of RSI, RDI, RBP, RSP - NEEDS REX
mov sil, 0          ; 40 B6 00 (3 bytes)

REX Examples

; Example 1: mov rax, rbx
48 89 D8
โ”‚
โ””โ”€ 0100 1000
        โ”‚
        โ””โ”€ W=1 (64-bit operation)

; Example 2: mov r9, r10
4D 89 D1
โ”‚
โ””โ”€ 0100 1101
        โ”‚โ”‚โ”‚โ”‚
        โ”‚โ”‚โ”‚โ””โ”€ B=1 (r/m uses R8-R15)
        โ”‚โ”‚โ””โ”€โ”€ X=0 (not used)
        โ”‚โ””โ”€โ”€โ”€ R=1 (reg uses R8-R15)
        โ””โ”€โ”€โ”€โ”€ W=1 (64-bit)

Part 3: Opcodes

The opcode identifies the operation. Can be 1-3 bytes.

Common 1-Byte Opcodes

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Opcode โ”‚ Operation                       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 0x01   โ”‚ ADD r/m, r                      โ”‚
โ”‚ 0x29   โ”‚ SUB r/m, r                      โ”‚
โ”‚ 0x31   โ”‚ XOR r/m, r                      โ”‚
โ”‚ 0x89   โ”‚ MOV r/m, r                      โ”‚
โ”‚ 0x8B   โ”‚ MOV r, r/m                      โ”‚
โ”‚ 0x50-57โ”‚ PUSH r (RAX-RDI)                โ”‚
โ”‚ 0x58-5Fโ”‚ POP r (RAX-RDI)                 โ”‚
โ”‚ 0x90   โ”‚ NOP                             โ”‚
โ”‚ 0xC3   โ”‚ RET                             โ”‚
โ”‚ 0xEB   โ”‚ JMP short                       โ”‚
โ”‚ 0xFF   โ”‚ Various (INC, DEC, CALL, JMP)   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

2-Byte Opcodes (0x0F prefix)

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Opcode  โ”‚ Operation                       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 0F 84   โ”‚ JZ/JE (conditional jump)        โ”‚
โ”‚ 0F 85   โ”‚ JNZ/JNE                         โ”‚
โ”‚ 0F AF   โ”‚ IMUL r, r/m                     โ”‚
โ”‚ 0F B6   โ”‚ MOVZX r, r/m8 (zero extend)     โ”‚
โ”‚ 0F BE   โ”‚ MOVSX r, r/m8 (sign extend)     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Opcode Direction

Many operations have two opcodes for different directions:

mov eax, ebx        ; 89 D8 (opcode 0x89: MOV r/m, r)
mov ebx, eax        ; 8B D8 (opcode 0x8B: MOV r, r/m)

; The direction is encoded in the opcode itself!

Part 4: ModR/M Byte

The ModR/M byte specifies operands and addressing modes.

ModR/M Structure

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Mod   โ”‚   Reg   โ”‚   R/M   โ”‚
โ”‚ (2 bits)โ”‚ (3 bits)โ”‚ (3 bits)โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  7   6    5   4   3  2   1   0

Mod Field (Addressing Mode)

โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Mod โ”‚ Meaning                              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 00  โ”‚ [reg] - indirect, no displacement    โ”‚
โ”‚ 01  โ”‚ [reg + disp8] - 8-bit displacement   โ”‚
โ”‚ 10  โ”‚ [reg + disp32] - 32-bit displacement โ”‚
โ”‚ 11  โ”‚ Register direct (no memory access)   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Reg Field (Register)

โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Reg โ”‚ Without REX.R  โ”‚ With REX.R      โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 000 โ”‚ RAX/EAX/AX/AL  โ”‚ R8/R8D/R8W/R8B  โ”‚
โ”‚ 001 โ”‚ RCX/ECX/CX/CL  โ”‚ R9/R9D/R9W/R9B  โ”‚
โ”‚ 010 โ”‚ RDX/EDX/DX/DL  โ”‚ R10/R10D/.../.. โ”‚
โ”‚ 011 โ”‚ RBX/EBX/BX/BL  โ”‚ R11/R11D/.../.. โ”‚
โ”‚ 100 โ”‚ RSP/ESP/SP/SPL โ”‚ R12/R12D/.../.. โ”‚
โ”‚ 101 โ”‚ RBP/EBP/BP/BPL โ”‚ R13/R13D/.../.. โ”‚
โ”‚ 110 โ”‚ RSI/ESI/SI/SIL โ”‚ R14/R14D/.../.. โ”‚
โ”‚ 111 โ”‚ RDI/EDI/DI/DIL โ”‚ R15/R15D/.../.. โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

R/M Field (Register/Memory)

Same encoding as Reg field, but with special meaning when Mod=00/01/10:

When Mod != 11 (memory mode):
โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ R/M โ”‚ Meaning                             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 000 โ”‚ [RAX/R8]                            โ”‚
โ”‚ 001 โ”‚ [RCX/R9]                            โ”‚
โ”‚ 010 โ”‚ [RDX/R10]                           โ”‚
โ”‚ 011 โ”‚ [RBX/R11]                           โ”‚
โ”‚ 100 โ”‚ SIB byte follows                    โ”‚
โ”‚ 101 โ”‚ [RIP + disp32] or [RBP + disp]      โ”‚
โ”‚ 110 โ”‚ [RSI/R14]                           โ”‚
โ”‚ 111 โ”‚ [RDI/R15]                           โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

ModR/M Examples

; Example 1: mov rax, rbx
; Encoding: 48 89 D8
;           ModR/M byte = D8 = 11011000

D8 binary: 11 011 000
           โ”‚  โ”‚   โ””โ”€โ”€ R/M = 000 (RAX)
           โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 011 (RBX)
           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 11 (register direct)

Translation: "Move RBX (reg) to RAX (r/m)"

; Example 2: mov eax, [rbx]
; Encoding: 8B 03
;           ModR/M byte = 03 = 00000011

03 binary: 00 000 011
           โ”‚  โ”‚   โ””โ”€โ”€ R/M = 011 (RBX)
           โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (EAX)
           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 00 (indirect, no disp)

Translation: "Move [RBX] (memory) to EAX (reg)"

; Example 3: add eax, [rbx + 8]
; Encoding: 03 43 08
;           ModR/M byte = 43 = 01000011
;           Displacement = 08

43 binary: 01 000 011
           โ”‚  โ”‚   โ””โ”€โ”€ R/M = 011 (RBX)
           โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (EAX)
           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 01 (indirect with disp8)

Translation: "Add [RBX + 8] to EAX"

Part 5: SIB Byte (Scale-Index-Base)

Used for complex addressing: [base + index*scale + disp]

SIB Structure

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Scale  โ”‚  Index  โ”‚  Base   โ”‚
โ”‚ (2 bits)โ”‚ (3 bits)โ”‚ (3 bits)โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  7   6    5   4   3  2   1   0

Scale Field

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Scale โ”‚ Multiplier โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  00   โ”‚    *1      โ”‚
โ”‚  01   โ”‚    *2      โ”‚
โ”‚  10   โ”‚    *4      โ”‚
โ”‚  11   โ”‚    *8      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Index and Base Fields

Use same register encoding as ModR/M Reg field (000-111).

Special Cases:

SIB Examples

; Example 1: mov eax, [rbx + rcx*4]
; Encoding: 8B 04 8B
;           Opcode = 8B (MOV r, r/m)
;           ModR/M = 04 = 00000100
;           SIB = 8B = 10001011

ModR/M: 00 000 100
        โ”‚  โ”‚   โ””โ”€โ”€ R/M = 100 (SIB follows)
        โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (EAX)
        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 00 (no displacement)

SIB: 10 001 011
     โ”‚  โ”‚   โ””โ”€โ”€ Base = 011 (RBX)
     โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Index = 001 (RCX)
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Scale = 10 (*4)

Translation: "Move [RBX + RCX*4] to EAX"

; Example 2: mov eax, [rbx + rcx*4 + 8]
; Encoding: 8B 44 8B 08
;           Opcode = 8B
;           ModR/M = 44 = 01000100
;           SIB = 8B = 10001011
;           Disp8 = 08

ModR/M: 01 000 100
        โ”‚  โ”‚   โ””โ”€โ”€ R/M = 100 (SIB follows)
        โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (EAX)
        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 01 (disp8 follows)

SIB: 10 001 011
     โ”‚  โ”‚   โ””โ”€โ”€ Base = 011 (RBX)
     โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Index = 001 (RCX)
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Scale = 10 (*4)

Disp8: 08 (8 in decimal)

Translation: "Move [RBX + RCX*4 + 8] to EAX"

; Example 3: Array access pattern
; C code: array[i] where sizeof(int) = 4
mov eax, [rdi + rsi*4]  ; RDI=array base, RSI=index
; Encoding: 8B 04 B7

Part 6: Complete Encoding Examples

Example 1: nop

Assembly:   nop
Machine:    90

Analysis:
90 = Opcode for NOP (No Operation)

Total: 1 byte (shortest instruction!)

Example 2: xor eax, eax

Assembly:   xor eax, eax
Machine:    31 C0

Analysis:
31 = Opcode (XOR r/m, r)
C0 = ModR/M byte

C0 = 11 000 000
     โ”‚  โ”‚   โ””โ”€โ”€ R/M = 000 (EAX)
     โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (EAX)
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 11 (register)

Total: 2 bytes
Why it's short: No prefixes, no immediate, register-to-register

Example 3: mov eax, 0

Assembly:   mov eax, 0
Machine:    B8 00 00 00 00

Analysis:
B8 = Opcode (MOV EAX, imm32) - special short form
00 00 00 00 = 32-bit immediate (0)

Total: 5 bytes (1 opcode + 4 immediate)
Why it's long: Must encode the full 32-bit immediate value

Example 4: mov rax, rbx

Assembly:   mov rax, rbx
Machine:    48 89 D8

Analysis:
48 = REX.W (enable 64-bit)
     0100 1000
         โ”‚
         โ””โ”€ W=1 (64-bit operands)

89 = Opcode (MOV r/m, r)

D8 = ModR/M byte
     11 011 000
     โ”‚  โ”‚   โ””โ”€โ”€ R/M = 000 (RAX destination)
     โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 011 (RBX source)
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 11 (register)

Total: 3 bytes (REX + opcode + ModR/M)

Example 5: push rax

Assembly:   push rax
Machine:    50

Analysis:
50 = Opcode (PUSH RAX)

Note: PUSH has special 1-byte encodings for each register:
50 = PUSH RAX/R8
51 = PUSH RCX/R9
52 = PUSH RDX/R10
53 = PUSH RBX/R11
54 = PUSH RSP/R12
55 = PUSH RBP/R13
56 = PUSH RSI/R14
57 = PUSH RDI/R15

Total: 1 byte (most compact!)

Example 6: mov dword [rdi + rsi*4 + 100], eax

Assembly:   mov dword [rdi + rsi*4 + 100], eax
Machine:    89 84 B7 64 00 00 00

Analysis:
89 = Opcode (MOV r/m, r)

84 = ModR/M byte
     10 000 100
     โ”‚  โ”‚   โ””โ”€โ”€ R/M = 100 (SIB follows)
     โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (EAX source)
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 10 (disp32 follows)

B7 = SIB byte
     10 110 111
     โ”‚  โ”‚   โ””โ”€โ”€ Base = 111 (RDI)
     โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Index = 110 (RSI)
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Scale = 10 (*4)

64 00 00 00 = Displacement (100 in little-endian)

Total: 7 bytes (opcode + ModR/M + SIB + disp32)

Example 7: mov r15, [r14 + 8]

Assembly:   mov r15, [r14 + 8]
Machine:    4D 8B 7E 08

Analysis:
4D = REX byte
     0100 1101
         โ”‚โ”‚โ”‚โ”‚
         โ”‚โ”‚โ”‚โ””โ”€ B=1 (R/M uses R8-R15)
         โ”‚โ”‚โ””โ”€โ”€ X=0 (not used)
         โ”‚โ””โ”€โ”€โ”€ R=1 (Reg uses R8-R15)
         โ””โ”€โ”€โ”€โ”€ W=1 (64-bit)

8B = Opcode (MOV r, r/m)

7E = ModR/M byte
     01 111 110
     โ”‚  โ”‚   โ””โ”€โ”€ R/M = 110 (with REX.B = R14)
     โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 111 (with REX.R = R15)
     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 01 (disp8)

08 = Displacement (8)

Total: 4 bytes (REX + opcode + ModR/M + disp8)

Part 7: Why Instruction Size Matters

Performance Impact

Instruction Cache:
- L1 I-cache: 32-64 KB (very fast)
- Smaller code = more instructions fit in cache
- Cache miss = ~50-100 cycle penalty

Example:
1000 instances of "xor eax, eax" (2 bytes each) = 2 KB
1000 instances of "mov eax, 0" (5 bytes each) = 5 KB

Difference: 3 KB saved = more cache-friendly

Decoding Efficiency

Modern CPUs decode multiple instructions per cycle:
- Shorter instructions = more can be decoded
- Simple instructions = faster decode

Intel/AMD can decode 4-5 instructions per cycle if:
โœ“ Instructions are short
โœ“ Instructions are simple
โœ“ No complex addressing modes

Branch Prediction

Smaller code = shorter branch distances
- Short jumps: 2 bytes (JMP rel8)
- Long jumps: 5 bytes (JMP rel32)

More code fits in prediction buffers

Part 8: Optimization Techniques

1. Choose Shorter Encodings

; BAD: 7 bytes
mov rax, 0              ; 48 C7 C0 00 00 00 00

; GOOD: 3 bytes (and faster!)
xor eax, eax            ; 31 C0
; Note: zeros upper 32 bits automatically in 64-bit mode

2. Use 32-bit When Possible

; If you only need 32-bit result:
xor eax, eax            ; 2 bytes (no REX needed)
; vs
xor rax, rax            ; 3 bytes (needs REX.W)

; Both zero the full 64-bit register!

3. Leverage Special Encodings

; PUSH/POP are compact
push rax                ; 1 byte
push rbx                ; 1 byte
; ... do work ...
pop rbx                 ; 1 byte
pop rax                 ; 1 byte

; vs saving to memory (much longer)
mov [temp1], rax        ; 7-8 bytes
mov [temp2], rbx        ; 7-8 bytes

4. Minimize Immediate Values

; BAD: 5 bytes
mov eax, 1              ; B8 01 00 00 00

; BETTER: 5 bytes total, but might be faster
xor eax, eax            ; 31 C0 (2 bytes)
inc eax                 ; FF C0 (2 bytes) or 48 FF C0 (3 bytes for RAX)

; CONTEXT MATTERS: If you already have EAX=0, just INC!

5. Use LEA for Arithmetic

; Instead of multiple operations:
add eax, ebx            ; 2 bytes
add eax, 8              ; 3-6 bytes
; Total: 5-8 bytes

; Use LEA (Load Effective Address):
lea eax, [ebx + eax + 8]; 3-4 bytes
; Does (EBX + EAX + 8) in one instruction!

6. Avoid Unnecessary Prefixes

; 64-bit REX prefix adds 1 byte
; If you don't need 64-bit, don't use it

; Accessing low 32-bits:
mov eax, [rdi]          ; No REX needed if EAX is enough
; vs
mov rax, [rdi]          ; Needs REX.W

Part 9: Tools for Exploration

Using objdump

# Create test file
cat > test.asm << 'EOF'
section .text
    global _start
_start:
    xor eax, eax
    mov eax, 0
    mov rax, rbx
    push rax
    mov eax, [rbx + rcx*4 + 8]
EOF

# Assemble
nasm -f elf64 test.asm -o test.o

# Disassemble with machine code
objdump -d test.o -M intel

# Output:
# 0: 31 c0                   xor    eax,eax
# 2: b8 00 00 00 00          mov    eax,0x0
# 7: 48 89 d8                mov    rax,rbx
# a: 50                      push   rax
# b: 8b 44 8b 08             mov    eax,DWORD PTR [rbx+rcx*4+0x8]

Using hexdump

# View raw bytes
hexdump -C test.o | less

# Look for the .text section

Online Assembler

NASM List File

# Generate listing file with machine code
nasm -f elf64 -l test.lst test.asm

# View listing
cat test.lst

# Shows line numbers, addresses, machine code, and source

Part 10: Decoding Practice

Exercise 1

Machine code: 48 01 C3

Decode it:
48 = ?
01 = ?
C3 = ?

Assembly: ?
Solution ``` 48 = REX.W (64-bit) 01 = ADD r/m, r C3 = 11 000 011 โ”‚ โ”‚ โ””โ”€โ”€ R/M = 011 (RBX) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (RAX) โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 11 (register) Assembly: add rbx, rax ```

Exercise 2

Machine code: 89 D8

Decode it:
89 = ?
D8 = ?

Assembly: ?
Solution ``` 89 = MOV r/m, r D8 = 11 011 000 โ”‚ โ”‚ โ””โ”€โ”€ R/M = 000 (EAX/RAX) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 011 (EBX/RBX) โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 11 (register) Assembly: mov eax, ebx (or mov rax, rbx with REX) Context determines size! ```

Exercise 3

Machine code: 50

What instruction?
Solution ``` 50 = PUSH RAX (or PUSH R8 with REX.B) Single-byte encoding for PUSH register. ```

Exercise 4

Machine code: 8B 04 8B

Decode it:
8B = ?
04 = ?
8B = ?

Assembly: ?
Solution ``` 8B = MOV r, r/m 04 = 00 000 100 โ”‚ โ”‚ โ””โ”€โ”€ R/M = 100 (SIB follows) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€ Reg = 000 (EAX) โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Mod = 00 (no disp) 8B = 10 001 011 โ”‚ โ”‚ โ””โ”€โ”€ Base = 011 (RBX) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€ Index = 001 (RCX) โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Scale = 10 (*4) Assembly: mov eax, [rbx + rcx*4] ```

Part 11: Advanced Topics

RIP-Relative Addressing (64-bit)

; Position-independent code uses RIP-relative
mov rax, [rel myvar]    ; Encoded relative to instruction pointer

; Encoding uses ModR/M with special values:
; Mod = 00, R/M = 101 means [RIP + disp32]

VEX/EVEX Prefixes (AVX/AVX-512)

SIMD instructions use longer prefixes:
- VEX: 2-3 byte prefix (AVX, AVX2)
- EVEX: 4 byte prefix (AVX-512)

Example:
vaddps xmm0, xmm1, xmm2  ; VEX prefix + opcode

Lock Prefix

; Atomic operations
lock add [counter], 1   ; 0xF0 prefix + normal encoding

Segment Overrides

; Rarely used in 64-bit mode
mov rax, gs:[0x28]      ; TLS access (0x65 prefix)

Summary Tables

Instruction Length by Category

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Instruction Type     โ”‚ Bytes โ”‚ Example             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Simple ops           โ”‚ 1-2   โ”‚ NOP, PUSH, POP      โ”‚
โ”‚ Register-register    โ”‚ 2-3   โ”‚ ADD, MOV, XOR       โ”‚
โ”‚ Register-immediate   โ”‚ 5-7   โ”‚ MOV EAX, 42         โ”‚
โ”‚ Memory-register      โ”‚ 3-8   โ”‚ MOV [RBX], RAX      โ”‚
โ”‚ Complex addressing   โ”‚ 7-10  โ”‚ MOV [RBX+RCX*4+N],. โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Component Sizes

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Component    โ”‚ Bytes โ”‚ When Present             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ REX prefix   โ”‚ 1     โ”‚ 64-bit or R8-R15         โ”‚
โ”‚ Opcode       โ”‚ 1-3   โ”‚ Always                   โ”‚
โ”‚ ModR/M       โ”‚ 1     โ”‚ Most instructions        โ”‚
โ”‚ SIB          โ”‚ 1     โ”‚ Complex addressing       โ”‚
โ”‚ Displacement โ”‚ 1-4   โ”‚ Memory with offset       โ”‚
โ”‚ Immediate    โ”‚ 1-8   โ”‚ Constant operand         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Takeaways

  1. โœ… Variable Length: Instructions are 1-15 bytes
  2. โœ… Structured Format: Each component has a specific purpose
  3. โœ… REX is for 64-bit: Required for 64-bit ops and R8-R15
  4. โœ… Shorter = Better: Saves cache space and decoding time
  5. โœ… ModR/M is Complex: Handles registers and all addressing modes
  6. โœ… SIB for Arrays: Essential for base + index*scale patterns
  7. โœ… Special Encodings: Some instructions have compact forms (PUSH, NOP)
  8. โœ… Optimization Matters: Understanding encoding helps write better code

Further Reading

Official Documentation

Articles & Guides

Tools


Next Steps

Now that you understand instruction encoding:

  1. โœ… Read disassembly output with understanding
  2. โœ… Write more optimal assembly code
  3. โœ… Debug at the machine code level
  4. โœ… Appreciate compiler optimizations
  5. โœ… Understand performance implications

Continue with: Topic 3: Basic Instructions


โ† Back to Main Q&A Index