Assembly#
RISC-V#
RISC-V Basics#
open-sourced ISA based on RISC principles
can be used academically, and deployed in any hardware or software without royalties
RV32 for 32-bit and RV64 for 64-bit implementations
byte: 8 bits, halfword: 16 bits, word: 32 bits, double word: 64 bits
unsigned numbers are usually used for addresses, pointers and bit vectors
unsigned double words are never necessary for memory addresses, loop counters, etc.
unused upper bits of singed values will be filled with sign bit, and zeros for unsigned values
- Address Alignment
every byte in the memory has a unique address
halfword-aligned address: divisible by 2, and ends with bit 0, or 0, 2, 4, 6, 8, a, c, e in hex
word-aligned address: divisible by 4, and ends with bits 00, or 0, 4, 8, c in hex
double word-aligned address: divisible by 8, and ends with bits 000, or 0, 8 in hex
all instructions must be halfword aligned
Load and Store with unaligned addresses are implementation dependent, works fine, exception occurs, or undefined behaviour
- Code Relocation
benefit of PC relative code, where it can be anywhere in memory and still works
the distance between jump/call and the target must be constant
useful when dynamically loading the code
RISC-V Modes#
privileged instructions can only be executed in Machine and Supervisor modes
- Machine Mode
highest, and initial mode after power-on or reset
code such as timer interrupts require the handler to run in machine mode
- Supervisor Mode
mode run by all kernel code
when switching from kernel code to user code, kernel registers are saved, and will be restored once kernel code is entered again
- User Mode
mode run by all user code
running privileged instructions in this mode will cause trap and abort
RISC-V Registers#
32 general purpose registers
x0-x31
, and 1 program counter registerpc
4,096 control and status registers (CSRs) cannot be accessed by user mode
6 special purpose, 7 temporary (volatile, data does not persist through function calls), 12 saved (non-volatile), and 8 argument
arguments more than 8 or larger than register size must be passed onto the stack
registers have separate identifiers based on calling convention
- zero: x0
hard-wired zero
can also be used as a destination to discard the result
- ra: x1
return address, saver: caller
CALL
: does not push return address onto stack, but saves it inx1
RET
: jumps to the address inra
orx1
CALL
andRET
are fast as they do not need to access memorybut nested calls require additional instructions to save and restore from memory
- sp: x2
stack pointer, saver: callee
not a special stack pointer register, but used as convention only
some compressed instructions assume
x2
is the stack pointer
- gp: x3
global pointer
points to global/static variables area, and makes addressing easier
- tp: x4
thread pointer
points to area of variables such as thread ID, threat parameters, and global/static variables local to the thread
- t0-2: x5-7
temporaries, saver: caller
- s0/fp: x8
saved register/frame pointer, saver: callee
- s1: x9
saved register, saver: callee
- a0-1: x10-11
function arguments/return values, saver: caller
by convention, function returns the result in
a0
- a2-7: x12-17
function arguments, saver: caller
- s2-11: x18-27
saved registers, saver: callee
- t3-6: x28-31
temporaries, saver: caller
- pc
holds the address of next instruction to execute
32 or 64 bits accordingly, but may be smaller and implementation depends on maximum memory size
- mhartid
hard-wired core/hart id
during startup, id is copied into
tp
and the register will never change in the kernel
mstatus
&sstatus
: status register
mtvec
&stvec
: trap vector or address of the invoked handler when a trap occurs
mepc
&sepc
: exception pc, value of previous pc when a trap occurs
scause
: trap cause code
stval
: bad address or instruction
mscratch
&sscratch
: work register to be used by trap handlers
- satp
address translation pointer, points to the page table and initially 0
after initialisation, VAT is always on in supervisor and user mode
whenever
satp
is updated, all TLBs (translation lookaside buffers) must be flushed withsfence.vma
orsfence_vma()
mie
&sie
: interrupt enable, to enable interrupt selectively
sip
: interrupt pending
medeleg
: exception delegation, from machine to supervisor mode
mideleg
: interrupt delegation, from machine to supervisor mode
- pmpcfgX & pmpaddrX
physical memory protection, configuration word and address
limit physical memory access for code in supervisor or user mode
mainly for secure bootstrapping and hypervisor
RISC-V Instruction Set#
- Notation
three register instruction structure:
<opcode> <dst>,<src1>,<src2> # cmt
only
sw
has destination listed last,sw <rs1>,<rd> /* multi-line cmt */
one instruction per line
can have optional label:
<label>: <opcode> <rd>,<rs1>,<rs2>
value of the label is the address of next thing in the file, e.g. in
hello: add
, value ofhello
will be the address ofadd
instructionexample opcode:
add
,call
,c.add
,.word
which is an assembler directiveoperands can be register to register, or register to 12-bit immediate value, which can be in decimal, hex, symbolic or expression
- Full-Sized Instruction Set
mandatory on any core
every instruction is a word in size, 32 bits or 4 bytes, regardless of RV32 or RV64
- Compressed Instruction Set
optional and will not be included on some processors
to reduce code size and increase execution speed, e.g.
c.add
every instruction is halfword in size, 16 bits or 2 bytes
every compressed instruction is same as a single full-sized instruction, but not vice versa
- Option Letter Codes
RV32I and RV64I basic instructions work on entire register, e.g.
add
performs 32-bit addition on RV32, and 64-bit addition on RV64
C
: compressed instruction set
M
: multiply and divide instructions
- Arithmetic & Logic
add, sub, and, or, xor
addi, andi, ori, xori, slli, srli, srai, slti, sltui
sll
: shift left logical, rs1 << rs2
srl
: shift right logical, rs1 >> rs2
sra
: shift right arithmetic, rs1 >>> rs2, used for signed values
slt
: set if less than, (rs1 < rs2) ? 1 : 0
sltu
: set if less than unsigned, (rs1 < rs2) ? 1 : 0no subtract immediate instruction in RISC-V, use
addi
with negated value
- Load & Store
lb rd,imm12(rs1)
: load byte, upper bits will be sign extended
lh
: load halfword, upper bits will be sign extended
lw
: load word
lbu
andlhu
will be zero extended for upper bits
sb rs2,imm12(rs1)
: store byte,rs1 + imm12 = rs2
sh
for 16 bits, andsw
for 32 bits
lwu, ld, sd
for double word in RV64
- Conditional Branching
syntax:
<opcode> rs1,rs2,imm12 # if condition(rs1,rs2) goto PC+imm12
target address to jump to is PC relative
beq
: ==,bne
: !=,blt
: <,bge
: >=unsigned condition testing with
bltu, bgeu
- Pseudo Instructions
assembler will translate them into one or more machine instructions
j label
: jump to address
call label
: save return address inra
ret
: go to address inra
, translated tojalr zero,0(ra)
li rd,imm
: load immediate into destination, translated toaddi rd,zero,imm
for 12 bits, andlui rd,upper(imm)
andaddi rd,rd,lower(imm)
for >12 bits
la rd,label
: load address into destination
neg rd,rs1
: negate, translated tosub rd,zero,rs1
mov rd,rs1
: move, translated toaddi rd,rs1,0
not rd,rs1
: logical negate, translated toxori rd,rs1,-1
jr reg
: indirect jump to target, translated tojalr zero,0(reg)
bgt rs1,rs2,addr
: branch if greater than, translated toblt rs2,rs1,addr
ble rs1,rs2,addr
: branch if less than or equal, translated tobge rs2,rs1,addr
bgtu
andbleu
for unsigned comparisons
beqz
: branch if equal to zero, translated tobeq rs1,zero,addr
bnez, bltz, blez, bgtz, bgez
are also translated similarly
- Jump & Link
jal rd,imm20
, should usecall
andj
insteadsave return address in destination register
rd
, and perform PC relative jump wherepc = pc + imm20
to make instruction halfword-aligned,
imm20
is extended with additional 0 bit
jal ra,addr
: same ascall
jal zero,addr
: same asj
- Jump & Link Register
jalr rd,imm12(rs1)
save return address in destination register,
rd
, and jump topc = rs1 + imm12
jalr ra,0(t5)
: indirectcall
jalr zero,0(t5)
: indirectj
jalr zero,0(ra)
: same asret
- Load Upper Immediate
lui rd,imm20
, whererd = imm20<<12
- Long Jump & Long Call
if target address requires 32 bits, use 2 instructions by breaking address into 20 + 12 bits
long jump:
lui t0,upper20
andjalr zero,lower12(t0)
long call:
lui to,upper20
andjalr ra,lower12(t0)
- Add Upper Immediate to PC
auipc rd,imm20
, whererd = pc + (imm20<<12)
used to make long jumps and calls with PC relative
long jump:
auipc t0,upper20
andjalr zero,lower12(t0)
long call:
auipc t0,upper20
andjalr ra,lower12(t0)
- Multiply
in multiplication, lower bytes of signed and unsigned are same, but the upper bytes differ
number of result bits = 2x number of operand bits
mul rd,rs1,rs2
: lower half of the result
mulh rd,rs1,rs2
: multiply high, upper half of the result
mulhu
: unsigned, upper half of the result
mulhsu rd,rs1,rs2
:rs1
is signed, andrs2
is unsigned
- Division
number of result bits = number of operand bits
div rd,rs1,rs2
: divide
rem rd,rs1,rs2
: remainder
divu, remu
for unsigned operandsRISC-V mandates truncated division for negative operands
overflow for
max_negative / -1
: max_negative for quotient, and 0 for remainderdivision by zero: all bits set to 1 for quotient, and dividend for remainder
- 32-bit Operations for RV64
addw, sub2, addiw, sllw, srlw, sraw, slliw, srliw, sraiw
for word size
mulw
for 32-bit multiply on RV64
divw, remw, divuw, remuw
for 32-bit division on RV64result is stored in lower 32 bits of
rd
, and upper 32 bits ofrd
contain sign-extension of the lowershift amount is from 0 to 31, taken from only lower 5 bits
- Linux-based Syscall
set
a7
to the syscall number,a0..
to the arguments for the syscallinvoke
ecall
to call the kernelcan just load null terminated argument and use
call printf
if standard library is linkedcan use
.asciz
to auto null terminate# can also use addi like the second ecall .global _start _start: li a0,1 # arg1: stdout la a1,helloworld # arg2: buff li a2,13 # arg3: buff length li a7,64 # write syscall ecall addi a0,zero,0 # arg1: status addi a7,zero,93 # exit syscall ecall helloworld: .ascii "Hello, World\n"
RISC-V Directives#
- Memory Allocation
always preceded by a label to create a variable
can initialise using comma-separated values
.byte int
: allocate 1 byte and initialise with a value, often 0
.hword int
: allocate 2 bytes and initialise
.half int
: same as.hword
.word int
: allocate 4 bytes
.dword int
: allocate 8 bytes
.ascii "str"
: any Unicode or UTF-8
.asciz "str
: same as.ascii
with auto null terminated
.string "str
: same as.asciz
.skip int
: allocate given byte count, useful for arrays
- Global
.globl symbol
export and make a symbol available to the rest of the program, accessible by other object files when linking them together
assembler does not check the symbols, but the linker does
_start
symbol must be in one of the files and exported to be the entry point of the program
.equ symbol,val
: associate a value with the symbol, and the symbol can be used anywhere below the directive
.set symbol,val
: same as.equ
.align int
: add enough bytes for the next instruction to be properly align
- Segments
.text
: code, program usually begins with this directive
.data
: r/w datacan have multiple
.text
and.data
segments in the file
.bss
: only data that evaluates to zero, object file will be optimised
.rodata
: read-only data
References & External Resources#
Low Level. (2021). You Can Learn RISC-V Assembly in 10 Minutes | Getting Started RISC-V Assembly on Linux Tutorial [online]. Available at: https://youtu.be/GWiAQs4-UQ0?si=c6mdr-kJs_7gMJ9o
Scheel, Jeff. (2024). RISC-V Technical Specifications [online]. Available at: https://lf-riscv.atlassian.net/wiki/spaces/HOME/pages/16154769/RISC-V+Technical+Specifications
Borza, Juraj. (2021). RISC-V Linux syscall table [online]. Available at: https://jborza.com/post/2021-05-11-riscv-linux-syscalls/