-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathnotes.TXT
390 lines (320 loc) · 11 KB
/
notes.TXT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
this is a tutorial on how to write 32bit (x86) NASM Assembly code, compile and run it aimed at people who have some
knowledge of programming but have never used assembly before.
first make sure you have nasm installed in your linux machine!
apt-get for debian/Ubuntu/mint :
$ sudo apt-get install nasm
dnf for fedora(18 and newer)/CentOS/RHEL :
$ sudo dnf install nasm
pacman for ArchLinux :
$ sudo pacman -S nasm
Writing a NASM assembly program is not that difficult. You just create a new file with the extension '.asm', write
your code with your favorite text editor and compile and link in the end. If you name your .asm file like you want
your program to be called you won't have to rename it later.
For ease of use I have provided two scripts, compile and compile32. Both will compile and link your .asm files
for x64_86 or x86 architecture automatically, you will just have to provide a filename. Use ./compile -h for help.
for convinience I advise you to create the directory /bin/scripts and put the compile scripts there
then add the directory /bin/scripts to your PATH in the bashrc file
to do that, add this to your ~/.bashrc file:
if [ -d "/bin/scripts" ]; then
PATH="$PATH:/bin/scripts/"
fi
If you don't want to use my scripts you can just manually enter these commands to firstly compile your file and link
it afterwards.
$ nasm -f elf64 -o filename.o filename.asm
$ ld -o filename filename.o
to run the programs you've compiled enter
$ ./filename
nasm automatically adds execute rights to the program
If you write assembly code (or any other programming language) you should comment your code! This will help you or
other who will look through your program to understand what is happening.
Imagine writing a program that does it's job, but after two months it breaks down. Now you have to sit through the
source code but don't remember what any of it did. That's where your comments come in.
In NASM comments are line-based meaning if you put the special comment character (a ; (semicolon) in NASM)on one
line everything to the right of that semicolon in that line will be ignored by the compiler and linker.
visual example :
CODE_WORKING ; CODE_NOT_WORKING
;CODE_NOT_WORKING
CODE_WORKING
In assembly everything is done in registers. You can imagine these being small spaces inside of your CPU which will
store very small amounts of data but are extremely fast. A x64_86 CPU will have more registers than a x86 CPU.
This is called the architecture. All modern CPUs are x64_86 but need a 64bit operating system in order to leverage
the additional registers. This tutorial is primarily going to use x86 registers, but you can write your code in
64bit NASM if you want.
get 64bit syscalls here:
http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
important registers when using 64bit syntax:
rax - temporary register, when using a syscall rax must contain a syscall number
rdi - used to pass 1st argument to function
rsi - pointer used to pass 2nd argument to function
rdx - used to pass 3rd argument to function
rcx - 4th
r8 - 5th
r9 - 6th
get 32bit syscalls here:
http://syscalls.kernelgrok.com/
important registers when using 32bit syntax:
eax - temporary register, when using a syscall rax must contain a syscall number
ebx - used to pass 1st argument to function
ecx - pointer used to pass 2nd argument to function
edx - used to pass 3rd argument to function
esi - 4th
edi - 5th
If you want to do anything that isn't covered by the base syntax of NASM
(docs available here: http://www.nasm.us/xdoc/2.12.02/nasmdoc.pdf) then you will need to perform a syscall. These
are actions performed by the operating system kernel, the base of your os. Each syscall has a different number
associated with it called OPCODE. You can look up the OPCODEs and the arguments each syscall needs in the links
provided above.
sections - every assembly program consists of these three sections (even if you don't use one explicitly)
.data - used for declaring initialized data or constants
.bss - used for declaring uninitialized variables
.text - used for code (all code goes here)
data types
name size in bytes size in bits abbreviation uninitialized abbr.
byte 1 8 db resb
word 2 16 dw resw
doubleword 4 32 dd resd
quadword 8 64 dq resq
tword 10 80 dt rest
ocotword 16 128 do reso
yword 32 256 dy resy
zword 64 512 dz resz
while db, dw, dd, dq, dt, do, dy and dz are used for declaring initialized data,
resb, resw, resd, resq, rest, reso, resy and resz are used for declaring uninitialized variables
some syntax for both x64_86 and x86 NASM :
incbin - includes external binary files
equ - defines constants ex:
one equ 1
times - repeating instructions or data
arithmetic operations:
add add
sub substract
mul unsigned multiply
imul signed multiply
div unsigned division
idiv signed division
inc increment
dec decrement
neg negate
cmp - compares values ex:
cmp rax, 50
cmp on its own doesn't do anything. to do something we have conditionals
je - if equal
jz - if zero
jne - if not equal
jnz - if not zero
jg - if first operand is greater than second
jge - if first operand is greater or equal to second
ja - same as jg but unsigned
jae - same as jge but unsigned
ex:
C CODE:
if (rax!=50) {
exit();
} else {
right();
}
NASM x64:
cmp rax, 50 ; compare rax with 50
jne .exit ; perform .exit if rax is not equal to 50
jmp .right ; execute .right if that is not the case
jmp - unconditional jump, goto function (label) ex:
-------------------
_start:
;; ...
;; some code
;; ...
jmp exit
exit:
mov rax, 60
mov rdi, 0
syscall
-------------------
That's the most important things to get started done!
Either try writing your own code, look at the following more advanced notes
Or go to the individual lessons in the respective directories.
NOTE: The following notes all use the x64_86 sysntax!
stack is a special region in the memory that operates on the lifo principle (Last Input, First Output)
there are 16 general-purpose registers for temporary data storage, (in a 64bit CPU)
rax, rbx, rcx, rdx, rdi, rsi, rbp, rsp, and r8-15
that's not a whole lot for serious applications, so we should store data in the stack
also: when we call a function the return address is copied to the stack
ex:
section .text
global _start
_start:
mov rax, 1
call incRax
cmp rax, 2
jne exit
; ...
; do something
; ...
incRax:
inc rax
ret
the code will be executed because rax is 2 after incRax
imagine a function like this:
int foo(int a1, int a2, int a2, int a4, int a5, int a6, int a7) {
return (a1 + a2 - a3 - a4 + a5 - a6) * a7;
}
the first six arguments will be passed in registers, the 7th will be passed in stack
there are two stack pointers, the first one being rbp, or base pointer
which points to the base of the current stack
and the other one being rsp, or stack pointer
which points to the top of the current stack
the two commands for working with the stack are:
push - increments rsp and stores argument in location pointed by pointer
pop - copies data to argument from location pointed by rsp
ex:
-------------------
section .text
global _start
_start:
mov rax, 1
mov rdx, 2
push rax
push rdx
mov rax, [rsp +8]
; ...
; do something
; ...
-------------------
we put 1 into rax and 2 into rdx registers. afterwards we push these values to the stack
since the stack works with the lifo principle (Last Input, First Output) the stack will
have the following structure:
###
RSP -> 2
1
###
now lets write a simple program that will take two arguments, and output the sum
-------------------
section .data
SYS_WRITE equ 1
STD_IN equ 1
SYS_EXIT equ 60
NEW_LINE db 0xa
WRONG_ARGC db "Must be two command line arguments!", 0xa
len equ $ - WRONG_ARGC
; we first declare some constants for ease of use and convenience
; and also two strings, one being the newline symbol and the other one an error message
; terminating with the new line symbol instead of NUL
section .text
global _start
_start:
pop rcx ; put the argument count into rcx
;;
;; check argc
;;
cmp rcx, 3 ; check wether we have 2 arguments supplied
; why 3? the first argument is always the program name
jne argcError
;;
;; get arguments as strings, add them
;;
add rsp, 8 ; skip argv[0] - program name
pop rsi ; get argv[1] - put into rsi
call str_to_int; convert argv[1] to string, put into rax
mov r10, rax ; put first number into r10
add rsp, 8 ; get to the second argument
pop rsi ; get argv[2] - put into rsi
call str_to_int; convert argv[2] to string, put into rax
mov r11, rax ; put second number into r11
add r10, r11 ; add both numbers, save in r10
;;
;; convert to string
;;
mov rax, r10
xor r12, r12
jmp int_to_str
;;
;; print argc error
;;
argcError:
mov rax, SYS_WRITE
mov rdi, STD_IN
mov rsi, WRONG_ARGC
mov rdx, len
syscall
jmp exit
;;
;; convert int to string
;;
int_to_str:
mov rdx, 0
mov rbx, 10
div rbx ; rax = rax / 10
add rdx, 48
add rdx, 0x0
push rdx
inc r12
cmp rax, 0x0
jne int_to_str
jmp print
str_to_int:
xor rax, rax
mov rcx, 10
next:
cmp [rsi], byte 0 ; check if it is the end of the string
je return_str ; return int
mov bl, [rsi] ; mov current char to bl
sub bl, 48 ; get number
mul rcx ; rax = rax * 10
add rax, rbx ; ax = ax + digit
inc rsi
jmp next
return_str:
ret
;;
;; print number
;;
print:
;;
;; calculate number length
;;
mov rax, 1
mul r12
mov r12, 8
mul r12
mov rdx, rax
;;
;; print sum
;;
mov rax, SYS_WRITE
mov rdi, STD_IN
mov rsi, rsp
syscall
jmp printNewLine
;;
;; prints new line symbol
;;
printNewline:
mov rax, SYS_WRITE
mov rdi, STD_IN
mov rsi, NEW_LINE
mov rdx, 1
syscall
jmp exit
;;
;; exit from program
;;
exit:
mov rax, SYS_EXIT
mov rdi, EXIT_CODE
syscall
-------------------
lets try to understand what's happening here:
the first instruction gets the first value from the stack and puts it into the rcx register
at program start everything will be in this order:
[rsp] - top of stack will contain argument count
[rsp + 8] - will contain argv[0]
[rsp + 16] - will contain argv[1]
etc...
also: why do we compare the argument count to 3 instead of 2?
the first argument is always the program name, so argument 1 is the first
that is also why we shift rsp to 8 before comparing
then we put the first argument into rsi and call the str_to_int function
after out function ends we will have the integer value in rax and will save it to r10
then we do the same thing with r11
now we can get the sum with the add command
the two programs described here are provided within the package as program1 and program2
run with ./program1 or ./program2 {value1} {value2}