Assembly Language Programming For Beginners – Part 1

What are processors?

Processor operations mostly involve processing data. This data can be stored in memory and accessed from there. However, reading data from and storing data into memory slows down the processor, because every access involves sending the request across the system bus to the memory unit and getting the data back through the same channel.

What are registers?

To speed up processor operations, the processor includes some internal memory storage locations, called registers. Registers store data elements for processing without having to access memory. Only a limited number of registers are built into the processor chip.

 

[Figure: Registers and their classification]


Let’s go straight to a simple (yes, a very simple) assembly language program!!!

Hello World Program:

1.  section .data
2.      msg db 'Hello, world!', 0xa
3.      len equ $ - msg

4.  section .text
5.      global main

6.  main:
7.      mov eax, 4
8.      mov ebx, 1
9.      mov ecx, msg
10.     mov edx, len
11.     int 0x80

12.     mov eax, 1
13.     int 0x80

Let me start explaining each line:

Line 1:

section .data

This section holds data with initial values. It can also be written as “segment .data”.

It is as simple as this: before assigning the value ‘0’ to a variable named ‘a’ (that is, a=0), we need to write this line.

Line 2:

msg db 'Hello, world!', 0xa

‘msg’ is a label, i.e. a name for the memory location where the text “Hello, world!” is stored. You can think of it as a variable.

‘db’ means “define byte”: it places bytes of data in memory, which can be anything like characters, symbols or digits.

‘0xa’ (10 in decimal) is the ASCII line-feed character, which moves to a new line after printing (like pressing ENTER).

Hence, this line is like assigning a string (a stream of characters) to a variable “msg”, similar to msg=“Hello, world!”.

Line 3:

len equ $ - msg

This calculates the length of the string we have assigned to the “msg” variable.

‘len’ holds the length of the ‘msg’ string. Unlike ‘msg’, it is a constant, not a variable stored in memory.

‘equ’ means “equate”: it gives a name to a numeric value that the assembler computes. No memory is reserved for it.

‘$’ means the current address, i.e. the position the assembler has reached at this line.

‘-’ is an actual minus symbol.

The length is calculated by taking the difference between the current address (the “$” symbol) and the address of ‘msg’. Because this line comes right after the string, ‘$’ points just past its last byte (the 0xa), so $ - msg is the number of bytes in the string: 13 characters plus the newline, i.e. 14. The ‘len’ constant will hold that length.
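For readers who know C, $ - msg is ordinary address subtraction. The sketch below is only an analogy, not what NASM does internally: the hypothetical pointer ‘msg_end’ plays the role of ‘$’ (pointing just past the last byte of the string), and ‘msg_len’ is likewise an illustrative name.

```c
#include <stddef.h>

/* Same bytes as: msg db 'Hello, world!', 0xa
   (C adds a trailing '\0' that NASM does not, hence the "- 1" below) */
static const char msg[] = "Hello, world!\n";

/* msg_end plays the role of '$': the address just past the last byte */
static const char *const msg_end = msg + sizeof msg - 1;

/* Equivalent of: len equ $ - msg */
ptrdiff_t msg_len(void)
{
    return msg_end - msg;   /* 13 characters + 1 newline */
}
```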

Line 4:

section .text

This section holds the instructions, in simple language the actual code the program runs. It can also be written as “segment .text”.

Before beginning all operations, calculations or instructions, we need to write this line.

Line 5:

global main

It makes the symbol ‘main’ visible to the linker, so that the linker and other object files can refer to it. The linker uses the program's entry point as the place where execution starts when we run the program. GNU ld looks for a symbol named ‘_start’ by default; to make ‘main’ the entry point instead, you can pass it the option -e main.

This is similar to the ‘main’ function of C++, Java, etc. It is declared global because a symbol is private to its own object file unless declared otherwise, just as a Java class must be public to be used from outside its package.

Line 6:

main:

This label marks the beginning of the ‘main’ code; the program starts executing at the instruction right after it.

Line 7:

mov eax,4

‘mov’ copies a value from a source operand to a destination operand. An operand can be a register, a memory location (written in NASM with square brackets, such as [msg]) or, as the source, a constant value such as 4 or len. Note that x86 cannot move data directly from one memory location to another. As moving data between registers does not involve memory, it is the fastest way of processing data. In this case we are copying the value ‘4’ into the register EAX.

But putting a specific number into EAX also has a special meaning: it selects which system call the kernel will perform when we invoke it with int 0x80 (line 11). The value 4 selects ‘sys_write’. Parameters for system calls are passed in EBX, ECX, EDX, ESI, EDI and EBP, in that order, as listed in the system call chart.

Don’t worry! Just think of it this way: system calls are required to print anything on-screen, accept input from users, etc. Basically, they are needed for any activity in the program which has to access resources outside of the program. All we need to do is put the proper values into the proper registers for a particular activity.

Below is the system call chart (32-bit Linux, int 0x80):

Name        %eax   %ebx              %ecx           %edx     %esi   %edi
sys_exit    1      int
sys_fork    2      struct pt_regs
sys_read    3      unsigned int      char *         size_t
sys_write   4      unsigned int      const char *   size_t
sys_open    5      const char *      int            int
sys_close   6      unsigned int

To print something, we need to tell the system “We need to print something”. Hence, we are assigning 4 to the EAX register. From the chart above we can see that for doing a “sys_write”, we need to put some mandatory parameters into some registers. Here we assign ‘4’ to EAX, an ‘unsigned int’ value to EBX, a ‘const char *’ to ECX and a ‘size_t’, i.e. a length, to the EDX register.

We will perform these operations in the next three lines.
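To connect the chart with something more familiar: on Linux, the C library function write() (declared in <unistd.h>) is a thin wrapper around this same system call and takes the same three parameters, in the same order as EBX, ECX and EDX. A minimal sketch; ‘print_hello’ is a hypothetical helper name.

```c
#include <unistd.h>

/* sys_write parameters: file descriptor (EBX), buffer address (ECX),
   byte count (EDX). write() passes the same three values; the C library
   supplies the system call number (the EAX part) itself. */
ssize_t print_hello(void)
{
    static const char msg[] = "Hello, world!\n";
    const size_t len = sizeof msg - 1;   /* like: len equ $ - msg */

    return write(1, msg, len);           /* 1 = standard output; returns bytes written */
}
```

Letting the library fill in the number matters because system call numbers differ between architectures: the EAX values in the chart are for 32-bit int 0x80 calls, while on 64-bit Linux write is system call 1.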

Line 8:

mov ebx,1

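Now we assign the value ‘1’ to the EBX register. This is the ‘unsigned int’ value required by sys_write: the file descriptor to write to.

File descriptor 1 is standard output, which is normally the terminal (0 is standard input and 2 is standard error), so this tells the system to print to the terminal.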

Line 9:

mov ecx, msg

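Here we are moving the address of ‘msg’ into the ECX register (in NASM a bare label means its address; [msg] would mean the bytes stored there). This tells the system what to print to the terminal. This is the “const char *” (pointer to characters) value required by sys_write.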

Line 10:

mov edx, len

Here we are moving the value of ‘len’ into the EDX register; it is the length of the text to be printed. This is the “size_t” value required by sys_write.

Line 11:

int 0x80

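This raises software interrupt 0x80, which on 32-bit Linux hands control to the kernel's system call handler. The previous 4 steps were just assigning values; at this step the kernel reads EAX to find out which system call to perform and takes its parameters from EBX, ECX and EDX.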

Line 12:

mov eax,1

When we needed to print something, we informed the system that “We need to print something”. Now we are done and again we need to inform the system that “We are done. Please exit now”.

If you look at the system call chart given above, sys_exit needs the value ‘1’ in the EAX register and an ‘int’ value, the program's exit status, in EBX. EBX already holds an integer: the ‘1’ we stored in it on line 8 (the kernel leaves EBX unchanged during sys_write). Since an integer can be any number, we do not need to assign a value to EBX again and simply assign ‘1’ to EAX. Be aware that this makes the program exit with status 1; by convention 0 means success, so add mov ebx,0 before the final int 0x80 if you want a clean exit status.

Line 13:

int 0x80

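This calls the kernel again, this time to perform sys_exit. Every system call needs its own int 0x80, and without this exit call the processor would carry on executing whatever bytes follow in memory, which usually crashes the program.

Done!!! See, it was really simple. So what have we learnt so far?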

  1. What are the different program segments
  2. How to assign variables
  3. What is the purpose of MOV
  4. How to print to terminal

 


So every time we need to print anything, we need to write these five lines:

mov eax,4

mov ebx,1

mov ecx,msg

mov edx,len

int 0x80

Wait a minute, don’t you think it’s time-consuming to write the same set of instructions every time you want to print something? Yes it is! Hence, “macro” comes to the rescue.

A macro is a named set of instructions that we can reuse multiple times in a program: wherever the macro's name appears, the assembler pastes in its instructions.

So to create a macro for the above instructions, simply define it at the top of the program, before the data and text sections (a macro has to be defined before it is used).

Re-framing Hello World Program:

1.  %macro write_string 2
2.      mov eax, 4
3.      mov ebx, 1
4.      mov ecx, %1
5.      mov edx, %2
6.      int 80h
7.  %endmacro

8.  section .data
9.      msg db 'Hello, world!', 0xa
10.     len equ $ - msg

11. section .text
12.     global main

13. main:
14.     write_string msg, len

15.     mov eax, 1
16.     int 0x80

Line 1

%macro write_string 2

This line indicates start of a macro.

‘write_string’ is the name of the macro

‘2’ is the number of parameters that the macro accepts. The parameters in this case are ‘msg’ and ‘len’, hence ‘2’. Inside the macro, %1 stands for the first parameter and %2 for the second.

Line 2-6:

These are the instructions which the macro will execute every time it is called.

Line 7:

%endmacro

This indicates end of macro.
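A C preprocessor macro behaves much like this NASM macro: before compilation, every use of the name is replaced by the macro body, with the arguments substituted for the parameters (as %1 and %2 are here). A sketch for comparison; ‘WRITE_STRING’ and ‘greet’ are hypothetical names chosen to mirror write_string.

```c
#include <unistd.h>

/* Like %macro write_string 2: (s) plays the role of %1, (n) of %2 */
#define WRITE_STRING(s, n) write(1, (s), (n))

ssize_t greet(void)
{
    static const char msg[] = "Hello, world!\n";

    /* The preprocessor expands this to: write(1, (msg), (sizeof msg - 1)) */
    return WRITE_STRING(msg, sizeof msg - 1);
}
```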


Program which accepts user input:

1.  %macro write_string 2          ; defining macro for output to terminal
2.      mov eax, 4                 ; sys_write
3.      mov ebx, 1                 ; file descriptor 1 = standard output
4.      mov ecx, %1                ; address of the text (first parameter)
5.      mov edx, %2                ; number of bytes (second parameter)
6.      int 80h
7.  %endmacro

8.  section .data                  ; declaring variables with initialized values
9.      userMsg db 'Please enter a number: '
10.     lenUserMsg equ $-userMsg
11.     dispMsg db 'You have entered: '
12.     lenDispMsg equ $-dispMsg

13. section .bss                   ; declaring variables with uninitialized data
14.     num resb 5

15. section .text                  ; start of the main operations/code
16.     global main

17. main:
18.     write_string userMsg, lenUserMsg   ; ask the user for a number
19.     mov eax, 3                 ; sys_read
20.     mov ebx, 0                 ; file descriptor 0 = standard input
21.     mov ecx, num               ; store the input at 'num'
22.     mov edx, 5                 ; read at most 5 bytes
23.     int 80h

24.     write_string dispMsg, lenDispMsg   ; calling macro for output
25.     write_string num, 5                ; calling macro for output

26.     mov eax, 1                 ; sys_exit: end the program
27.     int 80h

I guess most of the code is clear except a few lines. Let me explain them.

Line 13:

section .bss

This section declares variables which are not initialized and whose values will be given later, while the program runs. This can also be written as “segment .bss”.

Line 14:

num resb 5

RESB, RESW, RESD, RESQ and REST are designed to be used in the BSS section of a module: they declare uninitialized storage space. Each takes a single operand, which is the number of bytes, words, doublewords or whatever to reserve. So RESB is Reserve Byte, RESW is Reserve Word and so on.

In this case, 5 bytes of uninitialized space are reserved at the label ‘num’.

Always remember:

‘db’ is used for initialized data in the data segment.

‘resb’ is used for uninitialized data in the bss segment.

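Line 19:

mov eax, 3

The value 3 selects the sys_read system call, which reads user input.

Line 20:

mov ebx, 0

File descriptor 0 is standard input (normally the keyboard), so this tells sys_read where to read from.

Line 21:

mov ecx, num

This puts the address of ‘num’, the space we reserved in the bss section, into ECX; the characters entered by the user are stored there.

Line 22:

mov edx, 5

This sets the maximum number of bytes to read, matching the 5 bytes reserved for ‘num’. Note that the newline from pressing ENTER counts as one of these bytes.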
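The same read system call is available in C as read(). So that the sketch can run without anyone typing, it assumes the “user input” arrives through a pipe instead of the keyboard; the helper name ‘read_number_text’ is hypothetical, while the 5-byte limit mirrors mov edx, 5.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: pretends `typed` was entered by the user by writing it
   into a pipe, then reads it back with read(), the same read system call that
   EAX = 3 selects. `buf` plays the role of 'num' (resb 5).
   Returns the number of bytes read, or -1 on error. */
ssize_t read_number_text(const char *typed, char buf[5])
{
    int fds[2];                        /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1)
        return -1;

    size_t typed_len = strlen(typed);
    if (write(fds[1], typed, typed_len) != (ssize_t)typed_len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                     /* no more "typing" will arrive */

    ssize_t n = read(fds[0], buf, 5);  /* like mov edx, 5: at most 5 bytes */
    close(fds[0]);
    return n;
}
```

As in the assembly program, read() does not add a terminating zero byte, so the returned count is the only record of how much was actually stored.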


Time to execute the program. Below are the steps for saving and executing a program on a Linux system with NASM and GNU ld installed:

  1. Save the file as hello.asm
  2. To assemble the program type:

nasm -f elf hello.asm

[an object file of your program named hello.o will be created]

  3. To link the object file and create an executable file named hello type:

ld -m elf_i386 -e main -s -o hello hello.o

[-e main makes ‘main’ the entry point; without it ld looks for ‘_start’ and prints a warning]

  4. If needed, run the following command to give execute permission (ld normally sets it already):

chmod +x hello

  5. Execute the program, type:

./hello



Now that we have successfully executed a program and understood how and why each instruction line is written in the ‘hello world’ program, let’s move on to more useful instructions.

P.S. by instructions I mean operations or functions.

 

Logical Instructions:

AND

Use: For clearing one or more bits i.e. to assign 0.

Example:

BL register contains 00111010. If you need to clear the four high-order bits to zero, you need to perform AND on it with 0FH.

BL : 00111010

0FH : 00001111

Instruction : AND BL, 0FH

BL            00111010
0FH           00001111
AND BL, 0FH   00001010

Current value of BL : 00001010
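The AND example can be checked in C, whose ‘&’ operator is the same bitwise AND. In this sketch ‘clear_high_nibble’ is an illustrative name and uint8_t stands in for the 8-bit BL register.

```c
#include <stdint.h>

/* Same as AND BL, 0FH: keep the low four bits, clear the high four */
uint8_t clear_high_nibble(uint8_t bl)
{
    return bl & 0x0F;   /* 0x0F = 00001111 */
}
```

For example, clear_high_nibble(0x3A) turns 00111010 into 00001010 (0x0A), matching the table.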

OR

Use: For setting one or more bits i.e. to assign 1.

Example:

BL register contains 00111010. If you need to set the four low order bits to one, you need to perform OR on it with 0FH.

BL : 00111010

0FH : 00001111

Instruction : OR BL, 0FH

BL            00111010
0FH           00001111
OR BL, 0FH    00111111

Current value of BL : 00111111
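Likewise, C's ‘|’ operator is a bitwise OR. A sketch, with ‘set_low_nibble’ as an illustrative name and uint8_t standing in for BL:

```c
#include <stdint.h>

/* Same as OR BL, 0FH: force the low four bits to 1, leave the rest alone */
uint8_t set_low_nibble(uint8_t bl)
{
    return bl | 0x0F;
}
```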

XOR

Use: Flips (toggles) the bits where the other operand has a ‘1’. A common use is clearing a register, i.e. changing all ‘1’s to ‘0’: XORing an operand with itself changes the operand to 0.

Example:

BL register contains 00111010. If you need to change all ‘1’s to ‘0’, you need to perform XOR on it with itself.

BL : 00111010

Instruction : XOR BL, BL

BL            00111010
BL            00111010
XOR BL, BL    00000000

Current value of BL : 00000000
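In C the bitwise XOR is ‘^’. The sketch below (function names are illustrative) shows both the clearing trick and the more general bit-flipping behaviour:

```c
#include <stdint.h>

/* XOR BL, BL: every bit is XORed with itself, and x ^ x is always 0 */
uint8_t xor_self(uint8_t bl)
{
    return bl ^ bl;
}

/* XOR with a mask flips exactly the bits where the mask has a 1 */
uint8_t toggle_bits(uint8_t bl, uint8_t mask)
{
    return bl ^ mask;
}
```

For example, toggle_bits(0x3A, 0x0F) flips the low four bits of 00111010 and gives 00110101 (0x35).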

TEST

Use: Checks bits without changing the operand, e.g. to test whether a number is odd or even. TEST performs an AND but only sets the flags and discards the result, so it is generally followed by a conditional jump.

Example:

BL register contains 00111010. To check whether it is an odd or an even number, perform TEST on it with 01H: a binary number is odd if its lowest bit is 1 and even if it is 0.

BL : 00111010

01H : 00000001

Instruction : TEST BL, 01H
              JZ   EVEN_NUMBER

BL             00111010
01H            00000001
TEST BL, 01H   00000000

The advantage is that the original value is not changed.

Current value of BL : 00111010 (Original value did not change. It is same as before)

Since the result is 00000000, the zero flag is set, so this is an even number and JZ jumps to the label “EVEN_NUMBER”.
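In C, the same odd/even check is a bitwise AND with 1 whose result is only compared, never stored, which is what TEST followed by JZ does. ‘is_even’ is an illustrative name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Like TEST BL, 01H followed by JZ: the AND result is only inspected,
   and bl itself is never modified */
bool is_even(uint8_t bl)
{
    return (bl & 0x01) == 0;   /* lowest bit 0 -> result zero -> even */
}
```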

NOT

Use: Reverses the bits in an operand i.e. changes ‘1’ to ‘0’ and vice-versa

Example:

BL register contains 00111010. If you need to change every ‘1’ to ‘0’ and every ‘0’ to ‘1’, you need to perform NOT on it.

BL : 00111010

Instruction : NOT BL

BL       00111010
NOT BL   11000101

Current value of BL : 11000101
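C's ‘~’ operator is the bitwise NOT. Because C promotes a uint8_t to int before applying ‘~’, the sketch casts the result back to 8 bits; ‘not_byte’ is an illustrative name.

```c
#include <stdint.h>

/* Same as NOT BL: ~ flips every bit; the cast keeps the result 8 bits wide */
uint8_t not_byte(uint8_t bl)
{
    return (uint8_t)~bl;
}
```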

CMP

Use: Compares two operands. CMP subtracts the second operand from the first without storing the result; it only sets the flags, and the conditional jump that follows decides whether to jump.

Example 1: Comparing a register with a value

CMP    DX, 00          ; Compare the DX value with zero

JE     L7              ; If they are equal, jump to label L7

Here, it is checking whether the DX register is equal to ‘0’; if so, it jumps to label ‘L7’.

Example 2:

CMP is often used for checking whether a counter has reached the number of times a loop needs to run. Consider the following typical condition:

INC    EDX             ; Increments EDX register by 1 i.e. EDX=EDX+1

CMP    EDX, 10         ; Compares EDX with 10

JLE    LP1             ; Jump to label ‘LP1’ if EDX is less than or equal to 10

This is a typical loop: as long as EDX is at most 10, the code jumps back to the label ‘LP1’ and runs again.
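Here is the same counting loop written in C as a sketch. The do ... while form keeps the check at the bottom, just like CMP and JLE after the loop body, so the body always runs at least once. ‘count_iterations’ and the variable ‘edx’ are illustrative names, and int is used because JLE is a signed comparison.

```c
/* Mirrors:  LP1: (loop body)  INC EDX  CMP EDX, 10  JLE LP1
   Returns how many times the loop body ran for a given start value. */
int count_iterations(int edx)
{
    int runs = 0;
    do {
        runs++;            /* loop body starting at LP1 */
        edx++;             /* INC EDX */
    } while (edx <= 10);   /* CMP EDX, 10 + JLE LP1: repeat while EDX <= 10 */
    return runs;
}
```

Starting from 1, the body runs 10 times; starting from a value above 10 it still runs once, because the test only happens after the body.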


PUSH & POP

Pushing a value means writing it to the stack. Now what is a stack?? Well, a stack is simply a region of memory that behaves like a pile of books. When we put a new book on the pile, it goes on top. When we need a book, we remove the books above it one by one, starting from the top. This is what a stack is: it follows the LIFO (Last-In-First-Out) order.

Popping means removing whatever is on top of the stack and copying it into a register (or memory location).

Note: In 32-bit code only words (16 bits) or doublewords (32 bits) can be pushed onto the stack, not a single byte.

Example 1: How to use PUSH/POP

PUSH 0x12345678        ; push a value (here 0x12345678) onto the stack

POP EAX                ; EAX is now 0x12345678

Example 2: Swapping the values of the EAX and EBX registers

PUSH EAX                            ; value of EAX pushed to stack

MOV EAX, EBX                   ; value of EBX moved to EAX

POP EBX                               ; value of stack moved to EBX
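To see the LIFO order in action, here is a tiny array-based stack in C that mirrors the swap example. ‘Stack’, ‘push’, ‘pop’ and ‘swap_via_stack’ are hypothetical names, and this is only a model: the real x86 stack is addressed through the ESP register and grows downward in memory.

```c
#include <stdint.h>

/* A fixed-size LIFO stack of 32-bit values (no overflow checks: sketch only) */
typedef struct {
    uint32_t items[16];
    int top;                       /* number of items currently stored */
} Stack;

static void push(Stack *s, uint32_t value)
{
    s->items[s->top++] = value;    /* new value goes on top of the pile */
}

static uint32_t pop(Stack *s)
{
    return s->items[--s->top];     /* last value pushed comes off first */
}

/* Mirrors: PUSH EAX / MOV EAX, EBX / POP EBX */
void swap_via_stack(uint32_t *eax, uint32_t *ebx)
{
    Stack s = { .top = 0 };
    push(&s, *eax);                /* PUSH EAX */
    *eax = *ebx;                   /* MOV EAX, EBX */
    *ebx = pop(&s);                /* POP EBX */
}
```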
