AVR Linker - Manual Link, Bare Metal and Mixed Source Code

This article discusses how to write assembly code for AVR microprocessors using the GNU assembler avr-as, and how to link the object files into AVR executables. It also shows how to write a bare metal C program that is linked manually, and how to write a program that mixes C and assembly code for AVR.

AVR-GCC, AVR-AS, AVR-LD, C language, Assembly language, linker, disassembly, compiler, ABI, AVR, ISA, Application Binary Interface, Instruction Set Architecture, Bare metal, object file, embedded

--by Captdam @ Nov 9, 2025 ~~Nov 6, 2025~~

I used to rely on GCC (or other compilers for specific machines) to compile source code into executable files without worrying about intermediate object files and linking processes. Recently, a project required me to build programs and place the code units into specific memory regions for proper initialization; hence leading me to dive into the linking process.

I chose to start with the AVR-GCC linker, not only because I am familiar with the AVR architecture, but also because microprocessors have simpler memory structures than PCs, which lets me clearly inspect the effect of the linking process.

I decided to post my notes online so I can refer to them as a template in future projects. Furthermore, most resources online seem to cover Atmel Studio’s avrasm2.exe rather than the GNU avr-as, and I spent days digging through documents and forums to find references and solutions. If you are having trouble finding information about the GNU avr-as, you may find this article helpful.

C Language Program - avr-hello.c

Let's start with a simple C program:


main.c

#include <stdint.h>

volatile uint8_t my_data[4] = {2, 3, 5, 7};

uint8_t main() {
	volatile uint8_t my_var = my_data[1];
	for(;;);
}

In this example, we load an element from an array.

We use the volatile keyword to prevent the compiler from optimizing the variable away:

The compiler tries to keep variables in registers instead of memory, because registers are faster. AVR is a load/store architecture: arithmetic instructions cannot operate on memory directly, so data must be loaded into a general purpose register (GPR) before use. Some CPUs, like the 68HC11, can operate on memory directly, but at the cost of extra clock cycles. Therefore, registers are preferred over memory for both speed and code size.
If a variable is never used, the compiler deletes that variable and all related computation to save CPU cycles.


avr-gcc -mmcu=atmega328 -O0 main.c -o main.o
avr-objcopy -O ihex main.o main.hex
avr-objdump -D main.hex -m avr5

Compile, create the flash file, and disassemble the flash file. Note that no -c flag is given, so despite its name, main.o is a fully linked executable rather than an intermediate object file.

We specify -O0 for no optimization to force the compiler to translate the C source code into machine code line by line, so we can examine the result. Otherwise, the compiler may do a lot of tricks to make the generated code efficient, but extremely hard to analyze.

The following shows the content of the flash file. Let’s divide it into segments:

Vector Table


	0:	0c 94 34 00 	jmp	0x68	;  0x68
	4:	0c 94 49 00 	jmp	0x92	;  0x92
	8:	0c 94 49 00 	jmp	0x92	;  0x92
	c:	0c 94 49 00 	jmp	0x92	;  0x92
       10:	0c 94 49 00 	jmp	0x92	;  0x92
       14:	0c 94 49 00 	jmp	0x92	;  0x92
       18:	0c 94 49 00 	jmp	0x92	;  0x92
       1c:	0c 94 49 00 	jmp	0x92	;  0x92
       20:	0c 94 49 00 	jmp	0x92	;  0x92
       24:	0c 94 49 00 	jmp	0x92	;  0x92
       28:	0c 94 49 00 	jmp	0x92	;  0x92
       2c:	0c 94 49 00 	jmp	0x92	;  0x92
       30:	0c 94 49 00 	jmp	0x92	;  0x92
       34:	0c 94 49 00 	jmp	0x92	;  0x92
       38:	0c 94 49 00 	jmp	0x92	;  0x92
       3c:	0c 94 49 00 	jmp	0x92	;  0x92
       40:	0c 94 49 00 	jmp	0x92	;  0x92
       44:	0c 94 49 00 	jmp	0x92	;  0x92
       48:	0c 94 49 00 	jmp	0x92	;  0x92
       4c:	0c 94 49 00 	jmp	0x92	;  0x92
       50:	0c 94 49 00 	jmp	0x92	;  0x92
       54:	0c 94 49 00 	jmp	0x92	;  0x92
       58:	0c 94 49 00 	jmp	0x92	;  0x92
       5c:	0c 94 49 00 	jmp	0x92	;  0x92
       60:	0c 94 49 00 	jmp	0x92	;  0x92
       64:	0c 94 49 00 	jmp	0x92	;  0x92

When a specific event happens, the CPU jumps to the entry in the vector table for that event, provided that the corresponding interrupt is enabled.

In the above example, the first interrupt vector at address 0x0000 is for reset. After reset and hardware internal initialization, the CPU jumps to address 0x0000 (the reset vector). The instruction at that vector is then executed, which in the example, causes the CPU to jump to address 0x0068.

Addresses 0x0004 to 0x0064 hold the vectors for other interrupt sources (such as timer overflow, ADC conversion complete, UART transmit complete). Since we didn’t define any of those ISRs in our C code, the toolchain fills each slot with jmp 0x0092 for us. This sends the CPU to a special handler, __bad_interrupt. If something goes wrong (for example, a timer overflow interrupt is enabled by mistake without a handler), the CPU ends up in __bad_interrupt.

__bad_interrupt serves as a last resort. It jumps back to the reset vector and restarts the software, much like rebooting a computer after a blue screen. This prevents the CPU from executing a “dangerous” program.

Different devices come with different peripherals and therefore different functionality; hence different vector tables. Furthermore, the size of each vector depends on the size of the program space: smaller devices can reach the entire program space with a 2-byte rjmp instruction, while larger devices must use a 4-byte jmp.

Regardless of the device, the first vector is always the reset vector. If interrupts are not needed, we can omit the vector table.
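The layout of the table can be checked with a little arithmetic. Below is a minimal host-side C sketch (it runs on a PC, not on the AVR); the function names are made up for illustration, and the numbers (26 vectors of 4 bytes each) are read off the listing above.

```c
#include <stdint.h>
#include <assert.h>

/* Byte address of vector number `index` (0 = reset) when every
 * vector slot is `slot_size` bytes wide (4 for jmp, 2 for rjmp).
 * Hypothetical helper, for illustration only. */
uint16_t vector_address(uint8_t index, uint8_t slot_size) {
	return (uint16_t)(index * slot_size);
}

/* First byte address after a table holding `count` vectors. */
uint16_t vector_table_end(uint8_t count, uint8_t slot_size) {
	return (uint16_t)(count * slot_size);
}
```

With 26 slots of 4 bytes, the last vector sits at 0x64 and the table ends at 0x68, which is exactly where the reset vector jumps to in the listing.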

Initialization


       68:	11 24       	eor	r1, r1
       6a:	1f be       	out	0x3f, r1	; 63
       6c:	cf ef       	ldi	r28, 0xFF	; 255
       6e:	d8 e0       	ldi	r29, 0x08	; 8
       70:	de bf       	out	0x3e, r29	; 62
       72:	cd bf       	out	0x3d, r28	; 61

The code above prepares the environment that AVR-GCC expects.

Clear register R1. AVR-GCC uses R1 as the zero register: the value in this register is always 0, which helps with arithmetic operations such as comparing with 0 and adding 0.

Reset SREG (Status register, IO address 0x3F).

Set SP (stack pointer, IO address 0x3E:0x3D) to top of SRAM (0x08FF for ATmega328).

This section is hardcoded in the AVR-GCC library. The linker copies it directly from toolchain/avr8/avr8-gnu-toolchain/avr/lib/avr5/crtatmega328.o:.init2; see details at AVR-LibC Memory Sections. This can be verified with the command avr-objdump -d crtatmega328.o.
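The stack pointer value 0x08FF loaded above is simply the last byte of SRAM. As a small host-side C sketch (the helper name is hypothetical), assuming the ATmega328 layout used throughout this article, 2K bytes of SRAM starting at data address 0x0100:

```c
#include <stdint.h>
#include <assert.h>

/* Highest valid SRAM address, used as the initial stack pointer.
 * ram_start and ram_size come from the device's memory map. */
uint16_t ram_top(uint16_t ram_start, uint16_t ram_size) {
	return (uint16_t)(ram_start + ram_size - 1);
}
```

For example, ram_top(0x0100, 2048) gives 0x08FF, matching the ldi r28, 0xFF / ldi r29, 0x08 pair in the listing.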


       74:	11 e0       	ldi	r17, 0x01	; 1
       76:	a0 e0       	ldi	r26, 0x00	; 0
       78:	b1 e0       	ldi	r27, 0x01	; 1
       7a:	ec ea       	ldi	r30, 0xAC	; 172
       7c:	f0 e0       	ldi	r31, 0x00	; 0
       7e:	02 c0       	rjmp	.+4      	;  0x84
       80:	05 90       	lpm	r0, Z+
       82:	0d 92       	st	X+, r0
       84:	a4 30       	cpi	r26, 0x04	; 4
       86:	b1 07       	cpc	r27, r17
       88:	d9 f7       	brne	.-10     	;  0x80

After reset, the contents of the data RAM are either undefined (after power-on) or left over from the previous run (after a watchdog reset or the bad interrupt handler). However, when we write our code, we expect variables defined with an initial value to hold that value (in this example, the static array my_data[4]), and static variables declared without one to be 0. AVR-GCC does this initialization for us. It stores the initial values (the ones we defined using = {2, 3, 5, 7}) in ROM and copies them into RAM at the initialization stage, before entering the user main() function.

In this example, the value is saved in flash (program) address 0x00AC and the variable is located at RAM (data) address 0x0100, length is 4 bytes. The following task is performed:


for addrRAM = 0x0100 to 0x0103, addrProg = 0x00AC to 0x00AF:
	RAM[addrRAM] ← Prog[addrProg]
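The same loop can be modelled in C on a PC. This is only a sketch of what the startup code does, not the library's actual source; the two arrays stand in for flash and RAM, and the function name is made up.

```c
#include <stdint.h>
#include <assert.h>

/* Host-side model of the startup copy loop.
 * x plays the role of index register X (R27:R26, RAM write pointer),
 * z plays the role of index register Z (R31:R30, flash read pointer). */
void copy_data(uint8_t *ram, const uint8_t *flash,
               uint16_t ram_start, uint16_t ram_stop, uint16_t flash_src) {
	uint16_t x = ram_start;
	uint16_t z = flash_src;
	while (x != ram_stop) {        /* cpi / cpc / brne */
		ram[x++] = flash[z++]; /* lpm r0, Z+ ; st X+, r0 */
	}
}
```

Calling copy_data(ram, flash, 0x0100, 0x0104, 0x00AC) copies the four bytes at flash address 0x00AC into RAM addresses 0x0100 to 0x0103. Like the listing, the condition is checked before the first copy (the rjmp .+4 jumps straight to the comparison).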

Entry Point


       8a:	0e 94 4b 00 	call	0x96	;  0x96
       8e:	0c 94 54 00 	jmp	0xa8	;  0xa8

Initialization is finished. Call the user-supplied main function at address 0x0096. Once the main function returns, jump to the library-supplied dead loop at address 0x00A8.

Bad ISR Handler


       92:	0c 94 00 00 	jmp	0	;  0x0

The bad ISR handler, which jumps to address 0 (the reset vector) to re-initialize the MCU.

Main Function


uint8_t main() {
	volatile uint8_t my_var = my_data[1];
	for(;;);
}


       96:	cf 93       	push	r28
       98:	df 93       	push	r29
       9a:	1f 92       	push	r1
       9c:	cd b7       	in	r28, 0x3d	; 61
       9e:	de b7       	in	r29, 0x3e	; 62
       a0:	80 91 01 01 	lds	r24, 0x0101	;  0x800101
       a4:	89 83       	std	Y+1, r24	; 0x01
       a6:	ff cf       	rjmp	.-2      	;  0xa6

Finally, our main function.

As the AVR-GCC ABI specifies, call-saved registers (in this example, R28 and R29) must be saved if clobbered. Therefore, they are pushed into the stack at the beginning of the function. This register pair is then used as a frame pointer.

Local variable uint8_t my_var is stored in stack, more specifically, the function frame.

Recall: A variable declared outside any function (declared globally) is static and is stored at a fixed RAM address determined at build time. A variable declared inside a function is local and is stored in the stack (function frame), which is set up on function entry. A frame pointer is used to access data (local variables) in the frame.

The AVR stack pointer points to the next available slot and grows downwards. AVR has no stack-pointer-relative addressing mode; we must copy SP into an index register to build a frame pointer before accessing data in the stack. Furthermore, the displacement addressing of the index registers (Y+q and Z+q) only supports non-negative offsets; therefore, the frame pointer must point to the lowest position and use positive offsets to address stack variables. To do so, we first decrease the stack pointer by 1, then copy SP into index register Y (R29:R28).
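To see why the local variable ends up at Y+1 and not at Y, here is a small host-side C model of the stack. The struct and function names are invented for this sketch; only the behaviour (store at SP, then decrement SP) is taken from the description above.

```c
#include <stdint.h>
#include <assert.h>

/* Minimal model of the AVR stack: SP points at the next free byte
 * and moves down on every push. */
typedef struct {
	uint8_t  ram[0x0900]; /* covers data addresses 0x0000..0x08FF */
	uint16_t sp;
} avr_stack;

void avr_push(avr_stack *s, uint8_t value) {
	s->ram[s->sp] = value; /* store at the free slot ... */
	s->sp--;               /* ... then move SP down */
}

/* Address reached by `std Y+offset, Rr` after Y was copied from SP. */
uint16_t frame_slot(const avr_stack *s, uint8_t offset) {
	uint16_t y = s->sp;    /* in r28, SPL ; in r29, SPH */
	return (uint16_t)(y + offset);
}
```

Starting from SP = 0x08FF, one push leaves SP at 0x08FE, so Y itself points at a free slot and Y+1 (0x08FF) is the byte that was just allocated.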

Load my_data[1] and store it in the local variable my_var. Since my_data is a static variable and its address is fixed (known at build time), we can use direct addressing, lds Rd, absolute_address, to load this value. The variable my_var is local to the function and lives in the stack, so its address is not known at build time; therefore, we have to use indexed addressing, std Y+positive_offset, Rs, via the frame pointer to access it.

Here is a trick. To allocate a frame in the stack, simply grow the stack (move the SP down). AVR-GCC uses the shortest instructions that have no harmful side effects:

In this example, AVR-GCC uses push r1 to simply move SP down by 1. Although this instruction also stores the value of R1 in the stack, that side effect is harmless. The data we actually need (our local variable) will overwrite it later.

AVR-GCC may use rcall .+0 (relative call) to allocate (move down) 2 bytes with only 1 instruction word. This instruction is normally used to call a routine. It pushes the return address into the stack, and since the PC is 2 bytes long on this device, that allocates 2 bytes of space; then it jumps to the routine which is 0 words away from the following instruction. The only other side effect is that 0 is added to the PC, which changes nothing.

AVR-GCC may use sbiw r28, k to allocate (move down) k bytes: it subtracts k from the copy of SP held in Y, then writes the result back to SP. This method is used when more than 2 bytes are required.

AVR-GCC adds 0x800000 to data addresses. This helps separate data in the data space (in RAM) from program code in the program space (in flash). The offset exists only in the toolchain's files; the addresses encoded in instructions keep only the lower bits (for example, lds r24, 0x0101 in the listing above).
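The relation between the two views of a data address can be written down as a pair of helpers. This is a host-side C sketch; the macro and function names are made up here, and the 16-bit mask is an assumption that fits the devices discussed in this article.

```c
#include <stdint.h>
#include <assert.h>

/* Offset seen in the listings; the macro name is my own. */
#define AVR_DATA_OFFSET 0x800000UL

/* Address as printed by the GNU tools for a data-space address. */
uint32_t data_to_elf(uint16_t data_addr) {
	return AVR_DATA_OFFSET + data_addr;
}

/* Address the CPU actually uses: keep only the low 16 bits. */
uint16_t elf_to_data(uint32_t elf_addr) {
	return (uint16_t)(elf_addr & 0xFFFFUL);
}
```

For example, the comment 0x800101 next to lds r24, 0x0101 in the listing corresponds to data address 0x0101.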

End


       a8:	f8 94       	cli
       aa:	ff cf       	rjmp	.-2      	;  0xaa

Dead loop as mentioned above.

Static Data


       ac:	02 03       	mulsu	r16, r18
       ae:	05 07       	cpc	r16, r21

Our data array. As mentioned above, initial values of variables are saved in program space and copied into data space at initialization. This is data, not code, so the disassembled instructions have no meaning.

Assembly Language Program - avr-hello.asm

The Assembly Source Code

Now, let’s try the same example using assembly code:

We will be using the GNU assembler avr-as. The assembly editor of Atmel Studio uses avrasm2.exe. They use DIFFERENT syntax and they interpret addresses DIFFERENTLY. DO NOT BE CONFUSED!
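One concrete instance of the address difference, as far as I understand the two tools: the GNU tools express program addresses in bytes, while avrasm2 labels in the code segment count 16-bit words. The operand stored inside a jmp or call instruction is also a word address. In the earlier listing, the bytes 0c 94 34 00 were shown as jmp 0x68: the stored operand is 0x0034 words, which is 0x68 bytes. A host-side C sketch of the conversion (helper names are hypothetical):

```c
#include <stdint.h>
#include <assert.h>

/* Program-space address conversion: one instruction word = 2 bytes. */
uint32_t word_to_byte_addr(uint32_t word_addr) {
	return word_addr << 1;
}

uint32_t byte_to_word_addr(uint32_t byte_addr) {
	return byte_addr >> 1;
}
```

When porting code written for avrasm2, expressions that shift a label left by 1 before using it with lpm are doing this conversion by hand; with avr-as the label is already a byte address.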


main.asm

Description


.data
my_array:	.space	4
my_array_end:

In this assembly program, we have 2 segments: the .data segment for data memory in the RAM and the .text segment for the program in the flash.

In the .data segment, we first create a symbol my_array for the static variable and allocate 4 bytes of space. We also declare a symbol my_array_end immediately following my_array, so my_array_end's address is the address of my_array plus the size of my_array. This lets us get the size of my_array, for example: size_t size = my_array_end - my_array, or iterate over it: for (uint8_t* ptr = my_array; ptr < my_array_end; ptr++).
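The end-marker idea can be tried out in plain C on a PC. In this sketch, my_array and my_array_end are ordinary C objects standing in for the two assembly symbols; the helper functions are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Stand-ins for the two assembly symbols: the end marker is the
 * address one past the last byte of the array. */
static uint8_t my_array[4];
static uint8_t *const my_array_end = my_array + sizeof my_array;

/* Size recovered from the two addresses alone. */
size_t my_array_size(void) {
	return (size_t)(my_array_end - my_array);
}

/* Walk from the start symbol up to, but excluding, the end symbol. */
void my_array_fill(uint8_t value) {
	for (uint8_t *ptr = my_array; ptr < my_array_end; ptr++) {
		*ptr = value;
	}
}
```

The benefit of the marker is that nothing else needs to change when the .space size changes: both the size and the loop bound follow automatically.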

We only declare static variables in the .data segment. Declaring a variable in the .data segment gives it a fixed address in the data memory. Local variables are stored in the stack, and their addresses change as the frame position of the parent function changes. Therefore, we should not declare local variables in the data segment.

The .space directive always counts in bytes. When writing assembly code, we should know the size of our data types: char is 1 byte, int is 2 bytes (and yes, most 8-bit systems including AVR-GCC make int 2 bytes!), and long is 4 bytes.


.text

my_routine:
	ldi	r16, 0x08
	out	SPH, r16
	ldi	r16, 0xFF
	out	SPL, r16

Here comes our program, the .text segment.

First, initialize the stack pointer (SP) to the top of ATmega328 RAM, which is 0x08FF.


	ldi	r31, hi8(my_array_flash)
	ldi	r30, lo8(my_array_flash)
	ldi	r29, hi8(my_array)
	ldi	r28, lo8(my_array)
	ldi	r27, hi8(my_array_end)
1:	lpm	r0, Z+
	st	Y+, r0
	cpi	r28, lo8(my_array_end)
	cpc	r29, r27 
	brne	1b

In the C program, AVR-GCC placed the initial values into the static variable for us; in assembly code, we have to do this ourselves.

We use index register Z (R31:R30) as a read pointer into the program space, pointing to the start of the initial values stored at my_array_flash. We also use index register Y (R29:R28) as a write pointer into the data space, pointing to the start of the static variable my_array.

A loop copies the data, using the special lpm (load from program memory) instruction. It writes starting from the symbol my_array, up to but excluding my_array_end.

Watch for the syntax difference. GNU avr-as uses hi8() and lo8() to take the higher and lower byte of a multi-byte value; Atmel Studio avrasm2.exe uses HIGH() and LOW().
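For readers more comfortable with C, the two operand modifiers behave like the following functions. This is a host-side sketch for a 16-bit value; the C functions merely borrow the assembler's names.

```c
#include <stdint.h>
#include <assert.h>

/* C equivalents of the avr-as operand modifiers lo8() and hi8(). */
uint8_t lo8(uint16_t value) {
	return (uint8_t)(value & 0xFF);
}

uint8_t hi8(uint16_t value) {
	return (uint8_t)(value >> 8);
}
```

For a symbol at address 0x002C, lo8 gives 0x2C and hi8 gives 0x00; for 0x0104 they give 0x04 and 0x01.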


	push	r0		; Allocate stack
	in	r29, SPH
	in	r28, SPL
	lds	r0, my_array + 1
	std	Y + 1, r0

Use the push instruction to move SP down by 1 byte. This allocates the stack space for the local variable my_var.

Copy the SP to index register Y to build a frame pointer.

Load the data my_array[1] (my_data[1] in the C example) into a temporary register R0. Then, use the frame pointer to write this data into the local variable my_var in the stack. Because my_array is static, its address is known at build time; we can encode its address in the instruction using the direct addressing mode lds Rd, address.


loop:	jmp	loop

Dead loop to stop the program.


my_array_flash:	.byte	2, 3, 5, 7

Initial value for the static variable saved in program space.

In AVR-GCC, the main() function is treated like any other function and follows the same rule: preserve all call-saved registers (in this example, R28 and R29, used as index register Y). Although this main() function never returns, the compiler still saves these registers so they could be restored at the end.

Technically, in C, although we are taught to write a dead loop at the end of the main() function, we don’t really need to do so. We can simply return, and the library supplied dead loop will be executed.

In assembly, we have more control. We know we are not going to return from our main function (in this example, my_routine), so there is no need to preserve the call-saved registers.

Furthermore, if we are going to write the entire program ourselves in assembly, we don’t need to worry about the ABI. We can define our own rules for call-saved and call-used registers.

The Assembled Machine Code


avr-as -m avr5 main.asm -o main.o
avr-objdump -Dx main.o

Assemble. We specify the device family by using -m family. For this example, ATmega328 is in the avr5 family. Some instructions like jmp are only available for certain families. Without specifying the family, AVR-AS will use the default family, which may not support certain instructions; in that case, the assembler reports an error.

Disassemble. The -D flag for avr-objdump disassembles all sections, and the -x flag prints all headers, including the symbol table.

The following shows the disassembly: (for your convenience, source code on the side for reference)


Disassembly of section .text:

00000000 <my_routine>:
0:	08 e0       	ldi	r16, 0x08	; 8
2:	00 b9       	out	0x00, r16	; 0
4:	0f ef       	ldi	r16, 0xFF	; 255
6:	00 b9       	out	0x00, r16	; 0

8:	f0 e0       	ldi	r31, 0x00	; 0
a:	e0 e0       	ldi	r30, 0x00	; 0
c:	d0 e0       	ldi	r29, 0x00	; 0
e:	c0 e0       	ldi	r28, 0x00	; 0
10:	b0 e0       	ldi	r27, 0x00	; 0
12:	05 90       	lpm	r0, Z+
14:	09 92       	st	Y+, r0
16:	b0 30       	cpi	r28, 0x00	; 0
18:	cb 07       	cpc	r29, r27
1a:	01 f4       	brne	.+0      	; 0x1c <my_routine+0x1c>

1c:	0f 92       	push	r0
1e:	d0 b1       	in	r29, 0x00	; 0
20:	c0 b1       	in	r28, 0x00	; 0
22:	00 90 00 00 	lds	r0, 0x0000	; 0x800000 <my_array_flash+0x7fffd4>
26:	09 82       	std	Y+1, r0	; 0x01

00000028 <loop>:
28:	0c 94 00 00 	jmp	0	; 0x0 <my_routine>

0000002c <my_array_flash>:
2c:	02 03       	mulsu	r16, r18
2e:	05 07       	cpc	r16, r21

Disassembly of section .data:

00000000 <my_array>:
0:	00 00       	nop
     ...


.text

my_routine:
	ldi	r16, 0x08
	out	SPH, r16
	ldi	r16, 0xFF
	out	SPL, r16

	ldi	r31, hi8(my_array_flash)
	ldi	r30, lo8(my_array_flash)
	ldi	r29, hi8(my_array)
	ldi	r28, lo8(my_array)
	ldi	r27, hi8(my_array_end)
1:	lpm	r0, Z+
	st	Y+, r0
	cpi	r28, lo8(my_array_end)
	cpc	r29, r27 
	brne	1b

	push	r0		; Allocate stack
	in	r29, SPH
	in	r28, SPL
	lds	r0, my_array + 1
	std	Y + 1, r0

loop:	jmp	loop

my_array_flash:	.byte	2, 3, 5, 7


.data
my_array:	.space	4
my_array_end:

As we can see, some information is missing. For example:

At disassembled program address 0x0002, instruction out SPH, r16 should be converted to out 0x3E, r16, as the IO address of SPH is 0x3E; however, we got out 0x00, r16 in the disassembly. The same happened for SPL two instructions later.
At disassembled program address 0x000A, instruction ldi r30, lo8(my_array_flash) should be converted to ldi r30, 0x2C since the address of my_array_flash is 0x002C; however, we got ldi r30, 0x00. The same happened for my_array and my_array_end in the following instructions.

Symbol Table

Now, let’s take a look at the symbol table.


00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
00000000 l       .data	00000000 my_array
00000004 l       .data	00000000 my_array_end
00000000 l       .text	00000000 my_routine
0000002c l       .text	00000000 my_array_flash
00000028 l       .text	00000000 loop

All symbols declared in the assembly code are local to that file. For example, a line in that symbol table shows:


00000004 l       .data	00000000 my_array_end

The symbol named my_array_end has a local offset of 0x00000004 in the .data segment.

This local offset tells the linker where that symbol is within its segment. It cannot be used in the final instruction, because the CPU expects the absolute address of that symbol.


00000000         *UND*	00000000 SPH
00000000         *UND*	00000000 SPL

For symbols that are used but not defined in this file (for example, SPL), the symbol table shows *UND*, for undefined.

In a C program, the extern keyword declares a variable (or, optionally, a function) that is defined in another file. It tells the compiler that the symbol is defined externally and can be resolved later. More specifically, it can be linked later, as discussed in the next section.

So, the compiler won’t complain about not finding the symbol in the current source file.

Many assemblers require a matching .extern directive. The GNU assembler (as, also known as gas) is a little bit different: it accepts .extern but ignores it, and treats all undefined symbols as external.

Assembled is NOT Finished nor Executable

Why does the disassembled code show all addresses as zero?

First, we need to know what the GNU assembler avr-as is doing. It translates the human-readable mnemonics in an assembly source file into binary machine code, one file at a time.

The assembler creates .data segments and .text segments based on the input source code. However, it has no idea where the .text segment and .data segment will be placed in the actual machine. Therefore, the assembler cannot determine absolute addresses for them.

Furthermore, multiple object files can be put together. Fixing absolute addresses for the .text segment and .data segment at this stage would cause conflicts between object files. Again, the assembler processes one file at a time. When it assembles object file A, it cannot make any arrangement for object file B.

Therefore, the assembler will not put in absolute addresses. Instead, it will:

Put a placeholder for symbols. The placeholder is resolved in the next stage, when all object files are linked into an executable. At that stage, the linker has all the information it needs to arrange addresses for symbols from different object files.
Use relative addresses for internal operations. The assembler knows the distance between symbols in the same segment; hence, it can calculate the relative offset of symbols inside the code block. This allows the linker to move the code block around.
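To make the placeholder idea concrete, consider ldi r30, lo8(my_array_flash). The ldi instruction is encoded as the bit pattern 1110 KKKK dddd KKKK, where K is the 8-bit constant and d is the register number minus 16. The assembler emits it with K = 0 (bytes e0 e0 in the listing above); filling in K later is all that is needed. A host-side C sketch of this, with made-up function names, and not how the linker is actually implemented:

```c
#include <stdint.h>
#include <assert.h>

/* Encode `ldi Rd, K` for Rd = r16..r31: 1110 KKKK dddd KKKK. */
uint16_t encode_ldi(uint8_t rd, uint8_t k) {
	return (uint16_t)(0xE000u
	                  | ((uint16_t)(k & 0xF0) << 4)
	                  | ((uint16_t)(rd - 16) << 4)
	                  | (uint16_t)(k & 0x0F));
}

/* Conceptual relocation: keep opcode and register bits, and drop
 * the final constant into the two K fields. */
uint16_t patch_ldi(uint16_t opcode, uint8_t k) {
	return (uint16_t)((opcode & 0xF0F0u)
	                  | ((uint16_t)(k & 0xF0) << 4)
	                  | (uint16_t)(k & 0x0F));
}
```

encode_ldi(30, 0) gives 0xE0E0, the placeholder word, and patching it with 0x2C gives 0xE2EC, which is stored little-endian as the bytes ec e2.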

Linker

Assembling is only the first half of the job. We still need to link the object files to create the final executable file. As the previous section showed, we have to address the following problems:

What is the memory structure? Different microprocessors have different memory structures. For example, the data memory of ATmega328 starts at 0x0100 and ends at 0x08FF; on the other hand, the data memory of ATtiny25 starts at 0x0060 and ends at 0x00DF. Should we place the .data segment at 0x0100 or at 0x0060?
What is the address of the IO registers? Different machines come with different peripherals; hence, different IO register addresses.
Where to place the .text segment? This includes program code, vector table, and program data. Furthermore, some MCUs (like the ATmega328) have a special bootloader region that has different programming properties than regular program regions. The size and starting address of the bootloader region can be adjusted by fuse as well. For those MCUs, it is important to assign .text segments into the proper region.
When multiple object files (or multiple sections) are provided, how to order them? Most importantly, which program code should be executed first?

The tool avr-gcc is a combination of compile and link. We will get an error if:

We didn’t specify the device using -mmcu=xxx. AVR-GCC needs to know the machine name to find the memory layout and addresses for IO registers.
We used an undefined symbol (variable or function name) in our code. A symbol must be defined so AVR-GCC can resolve its address.
We didn’t define the main() function. AVR-GCC uses this name as the default entry point; that is, the first program to execute after initialization.

We can use avr-gcc -c to compile without linking. This avoids the link-time errors above, but the generated object file cannot be executed until it is linked.

Object File for Machine-specific IO Register Addresses

Let’s first provide the unresolved symbols. To do so, we create an assembly file:


m328.asm

.global SPL
.equ	SPL	, 0x3D
.global SPH
.equ	SPH	, 0x3E

In this example, we use .equ to assign absolute values (the absolute IO space addresses) to the symbols.

We also make them .global. This exposes the given symbols to the linker, so other object files can reference them as external symbols at link time.

It may be a good idea to create an object file containing the addresses of all IO registers for the given microprocessor, as a stand-alone machine library. In this way, we can reuse the object file to resolve IO register addresses in future projects.


avr-as m328.asm -o m328.o
avr-objdump m328.o -Sx

Assemble. Then, disassemble the object file to check its content:


SYMBOL TABLE:
00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
0000003d g       *ABS*	00000000 SPL
0000003e g       *ABS*	00000000 SPH

As we can see, the absolute IO space addresses of the symbols are now globally available.

Linker Script

Now, let’s create the linker script to instruct the linker how to link the object files. In this example, we only consider the .text segment for program code and .data segment for static data. Fuse and EEPROM are out of this article’s scope.

In this section, I will use names prefixed with my_ (such as my_code) to show that the specific field can have an arbitrary name.


main.ld

MEMORY {
	my_code(rx): ORIGIN = 0x00, LENGTH = 32K
	my_data(rw): ORIGIN = 0x800100, LENGTH = 2K
}

SECTIONS {
	. = ORIGIN(my_code);
	.text : {
		*(.text)
	} >my_code
	
	. = ORIGIN(my_data);
	.data : {
		*(.data)
	} >my_data
}

First, we specify the MEMORY structure according to microprocessor memory size and structure:

We call the program memory my_code; it is readable and executable. Its address starts from 0x0000 and its length is 32K bytes.
We call the data memory my_data; it is readable and writable. Its address starts from 0x800100 and its length is 2K bytes. The first 0x100 bytes of data space are used by the register file and IO registers; therefore, the RAM starts at 0x100.

Length is in bytes.

The linker can use the region attributes (readable, writable and executable) to choose a region for sections that are not explicitly assigned to one, such as placing .rodata in a read-only region. However, this rule doesn’t apply to our example, because we explicitly assign .text and .data to their regions on simple hardware. We could mark both my_code and my_data read-only and it would not change the behaviour of our program at all.

We add an offset of 0x800000 to the data segment. This is not mandatory. We do this to align with AVR-GCC’s memory layout. This offset is dropped in the end because AVR's address space is limited (only the lower bits count).

Next, we tell the linker how to place the object files into SECTIONS:

Starting from the origin of my_code (address 0x0000) is the .text segment. Place .text segments from all input object files into this section. This section will be assigned to my_code.
Starting from the origin of my_data (address 0x800100) is the .data segment. Place .data segments from all input object files into this section. This section will be assigned to my_data.

Because there is only one object file with program code, and the offset of our my_code section is 0, the absolute address of each instruction will be the same as their local address in that object file.
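For this simple case, the address arithmetic the linker performs can be sketched in a single line of C. This is a host-side illustration with a hypothetical function name, and it assumes only one input object contributes to each section, so there is no extra per-file offset.

```c
#include <stdint.h>
#include <assert.h>

/* Final address of a symbol: where its output section was placed,
 * plus the symbol's local offset from the object file. */
uint32_t resolve_symbol(uint32_t section_origin, uint32_t local_offset) {
	return section_origin + local_offset;
}
```

With the script above, my_array_end (local offset 0x4 in .data) lands at 0x800100 + 0x4 = 0x800104, and my_array_flash (local offset 0x2C in .text) stays at 0x2C because my_code starts at 0.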

After reset, the hardware starts execution at program address 0x0000; therefore, the entry point will be the first instruction in that object file.

The Final Executable


avr-ld -T main.ld m328.o main.o -o app.elf
avr-objdump -Dx app.elf

Link. Disassemble to examine the result:


SYMBOL TABLE:
00000000 l    d  .text	00000000 .text
00000030 l    d  .trampolines	00000000 .trampolines
00800100 l    d  .data	00000000 .data
00000000 l    df *ABS*	00000000 main.o
00800100 l       .data	00000000 my_array
00800104 l       .data	00000000 my_array_end
00000000 l       .text	00000000 my_routine
0000002c l       .text	00000000 my_array_flash
00000028 l       .text	00000000 loop
0000003d g       *ABS*	00000000 SPL
0000003e g       *ABS*	00000000 SPH

In the symbol table, all symbols have proper values now. For example:


00800104 l       .data	00000000 my_array_end

The symbol named my_array_end is now at address 0x00800104.

Furthermore, the previously undefined symbols for IO registers are now defined globally with absolute values. For example:


0000003d g       *ABS*	00000000 SPL

The symbol named SPL now has the absolute value 0x3D.


Disassembly of section .text:

00000000 <my_routine>:
   0:	08 e0       	ldi	r16, 0x08	; 8
   2:	0e bf       	out	0x3e, r16	; 62
   4:	0f ef       	ldi	r16, 0xFF	; 255
   6:	0d bf       	out	0x3d, r16	; 61
   8:	f0 e0       	ldi	r31, 0x00	; 0
   a:	ec e2       	ldi	r30, 0x2C	; 44
   c:	d1 e0       	ldi	r29, 0x01	; 1
   e:	c0 e0       	ldi	r28, 0x00	; 0
  10:	b1 e0       	ldi	r27, 0x01	; 1
  12:	05 90       	lpm	r0, Z+
  14:	09 92       	st	Y+, r0
  16:	b4 30       	cpi	r28, 0x04	; 4
  18:	cb 07       	cpc	r29, r27
  1a:	d9 f7       	brne	.-10     	; 0x12 <my_routine+0x12>
  1c:	0f 92       	push	r0
  1e:	de b7       	in	r29, 0x3e	; 62
  20:	cd b7       	in	r28, 0x3d	; 61
  22:	00 90 01 01 	lds	r0, 0x0101	; 0x800101 <my_array+0x1>
  26:	09 82       	std	Y+1, r0	; 0x01

00000028 <loop>:
  28:	0c 94 14 00 	jmp	0x28	; 0x28 <loop>

0000002c <my_array_flash>:
  2c:	02 03       	mulsu	r16, r18
  2e:	05 07       	cpc	r16, r21

Disassembly of section .data:

00800100 <my_array>:
  800100:	00 00       	nop
	...

In the disassembled code, correct addresses are inserted. For example:


  16:	b4 30       	cpi	r28, 0x04	; 4

is assembled from cpi r28, lo8(my_array_end). The lower byte of the symbol my_array_end is 0x04, which can be seen in the instruction.

Bare Metal C

C Source Code

In this C language example, we will perform the initialization ourselves. Let’s consider the following C program:


main.c

#include <stdint.h>
#include <avr/io.h>
#include <avr/pgmspace.h>

volatile const uint8_t my_array[] PROGMEM = {2, 3, 5, 7};

volatile uint8_t some_data1, some_data2;

void __attribute__((noinline)) setOutputPins() {
	volatile uint8_t dir = 0b11111111;
	DDRB = dir;
}

void __attribute__((naked)) something() {
	setOutputPins();
	some_data1 = pgm_read_byte(&(my_array[2]));
	for(;;);
}

Data in Program

For variables defined with an initial value, the initial values are saved in the program space and copied into data space during initialization. In the first example, AVR-GCC did this for us; now, we want to handle this ourselves.

To store data in program space, we need to use the special PROGMEM macro from the AVR-LibC header avr/pgmspace.h:


volatile const uint8_t my_array[] PROGMEM = {2, 3, 5, 7};

To read data from the program space, use:


dest = pgm_read_byte(&(program_data));

Access IO Registers

To access the IO registers of the microprocessor, we need to include the AVR-LibC header avr/io.h.

When we compile the code using avr-gcc -mmcu=xxx, the compiler will find the IO definition file for that specific MCU. This file contains the addresses and bit definition of all IO registers.

Static Variables

We also declare 2 static variables. C compilers place variables in data space by default; therefore, we do not need to specifically tell the compiler to do so.

Function Call and Entry Point

To simulate function call and stack operation during function invocation, we create a function void setOutputPins(). This function first loads the IO direction into a local (stack) variable; then, writes that into the IO register. The compiler may inline our function for optimization purposes. That would defeat the purpose of this example, which is to test a function call, so we put the noinline attribute on this function.

We are not going to name our main function with the default entry point name main(); instead, we are going to call it something(). We will make this function the entry point in the linker script. Furthermore, because this function is the top function, there is no need to preserve registers or the stack when entering this function (on the other hand, any callee function must preserve them for the caller, because the caller doesn’t expect the callee to change its stack). Therefore, we put the naked attribute on it to strip the function prologue and epilogue. In this function, we call the child function, then read a value from program space into a static variable in data space.

Compile No Link


avr-gcc -c main.c -O3 -ffunction-sections -fdata-sections -o main.o -mmcu=atmega328
avr-objdump -Dx main.o

Now, compile this program with the following flags:

-c means no link, compile only. We will manually link later.
-O3 allows the compiler to perform some optimization so the generated code won’t be too stupid to read.
-ffunction-sections and -fdata-sections create a separate section for each function and each data variable, respectively. This lets us refer to individual functions and data variables in the linker script.
Although we will link this program later manually, we do like the convenience that AVR-LibC can provide the address of IO registers of a specific microprocessor. We include the header avr/io.h in our source code and specify the device using -mmcu=atmega328 option in the command. Furthermore, this enables the compiler to use some machine-specific instructions to increase performance, such as MUL for multiplication.

Then, disassemble to examine the result: (for your convenience, source code on the side for reference)

Child Function


Disassembly of section .text.setOutputPins:

00000000 <setOutputPins>:
   0:	cf 93       	push	r28
   2:	df 93       	push	r29
   4:	1f 92       	push	r1
   6:	cd b7       	in	r28, 0x3d	; 61
   8:	de b7       	in	r29, 0x3e	; 62
   a:	8f ef       	ldi	r24, 0xFF	; 255
   c:	89 83       	std	Y+1, r24	; 0x01
   e:	89 81       	ldd	r24, Y+1	; 0x01
  10:	84 b9       	out	0x04, r24	; 4
  12:	0f 90       	pop	r0
  14:	df 91       	pop	r29
  16:	cf 91       	pop	r28
  18:	08 95       	ret


void __attribute__((noinline)) setOutputPins() {
	volatile uint8_t dir = 0b11111111;
	DDRB = dir;
}

To access a variable in stack space, we must use an index register as a frame pointer. In this example, index register Y (R29:R28) is used. Since this will clobber the original content of the index register, we must preserve its content by pushing it onto the stack at the beginning of this function.

Push a random value into the stack to allocate 1-byte space for local variable uint8_t dir.

Copy the stack pointer SP to index register Y to create the frame pointer.

Use immediate addressing mode to load a value 0b11111111 into a temporary register R24; then, write that value into a local variable via the frame pointer. In AVR-GCC ABI, R24 is a call-used register; hence, it doesn’t need to be preserved.

Load that variable dir back from the stack into register R24; then, write to the IO register DDRB. Note, in the disassembled code, the IO register DDRB has the correct address 0x04, because we included the avr/io.h header and compiled it against the specified microcontroller name.

Deallocate the stack, restore the original value of the index register Y, and return from this function.
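
The operands 0x3d, 0x3e and 0x04 in the listing above are I/O-space addresses, which is what the in and out instructions expect. The same registers can also be reached with ordinary load and store instructions at a data-space address that is shifted by 0x20. The following host-side sketch shows that conversion; the helper name io_to_data_address() is an assumption made for this illustration.

```c
#include <stdint.h>

/* Classic AVR cores map the 64 I/O registers (I/O addresses 0x00-0x3F)
 * into data space right after the 32 general purpose registers. */
#define IO_SPACE_OFFSET 0x20u

/* Hypothetical helper: data-space address (as used by lds/sts/ld/st)
 * of a register, given its I/O address (as used by in/out). */
uint16_t io_to_data_address(uint8_t io_address) {
	return (uint16_t)(io_address + IO_SPACE_OFFSET);
}
```

For example, DDRB at I/O address 0x04 sits at data address 0x24, and the two halves of the stack pointer at 0x3d and 0x3e sit at 0x5d and 0x5e.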

Main Function


Disassembly of section .text.something:

00000000 <something>:
   0:	0e 94 00 00 	call	0	; 0x0 <something>
			0: R_AVR_CALL	.text.setOutputPins
   4:	e0 e0       	ldi	r30, 0x00	; 0
			4: R_AVR_LO8_LDI	.progmem.data+0x2
   6:	f0 e0       	ldi	r31, 0x00	; 0
			6: R_AVR_HI8_LDI	.progmem.data+0x2
   8:	e4 91       	lpm	r30, Z
   a:	e0 93 00 00 	sts	0x0000, r30	; 0x800000 <__SREG__+0x7fffc1>
			c: R_AVR_16	some_data1
   e:	00 c0       	rjmp	.+0      	; 0x10 <__zero_reg__+0xf>
			e: R_AVR_13_PCREL	.text.something+0xe


void __attribute__((naked)) something() {
	setOutputPins();
	some_data1 = pgm_read_byte(&(my_array[2]));
	for(;;);
}

Call the function setOutputPins(). Note, we have yet to link; so, the function address in program space is unknown. A placeholder with address zero is inserted.

Use index register Z (R31:R30) to load a value from program space. Again, we have yet to link; so, the data address in program space is unknown. A placeholder with address zero is inserted.

Write that value to a static variable using direct addressing mode instruction sts addr, Rs. Again, we have yet to link; so, the data address in data space is unknown. A placeholder with address zero is inserted.

At the end, a dead loop. Annnnnnnnnnnd again, we have yet to link; so, the instruction address of the dead loop (the current instruction) in program space is unknown. A placeholder with address zero is inserted. In fact, this one can be resolved using relative addressing now; but the compiler decides to leave it for the linker.
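
To see why the dead loop could already be resolved with relative addressing, here is a small host-side sketch of how an rjmp word is built from the addresses of the instruction and its target. The function name encode_rjmp() is an assumption for this illustration, and the sketch does not check that the offset fits in 12 bits.

```c
#include <stdint.h>

/* Encode an "rjmp" located at byte address `from` that jumps to byte
 * address `to`. The opcode is 1100 kkkk kkkk kkkk, where k is a signed
 * 12-bit word offset relative to the instruction after the rjmp. */
uint16_t encode_rjmp(uint16_t from, uint16_t to) {
	int32_t k = ((int32_t)to - ((int32_t)from + 2)) / 2;
	return (uint16_t)(0xC000u | ((uint32_t)k & 0x0FFFu));
}
```

A jump to itself always has k = -1 regardless of where the instruction ends up, so encode_rjmp(0x0e, 0x0e) gives 0xCFFF, stored little-endian as ff cf. The placeholder 00 c0 in the listing above is the word 0xC000, that is, k = 0.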

Manual Link

Now, let's create our linker script:


main.ld

MEMORY {
	my_code(rx): ORIGIN = 0x00, LENGTH = 32K
	my_data(rw): ORIGIN = 0x800100, LENGTH = 2K
}

SECTIONS {
	. = ORIGIN(my_code);
	.text : {
		main.o(.text.something)
		*(.text)
	} >my_code
	
	. = ORIGIN(my_data);
	.data : {
		*(.data)
	} >my_data
}

This is exactly the same as the linker script we used in the previous example, except for one additional line: main.o(.text.something). We instruct the linker to place main.o(.text.something) at the beginning of the text segment. In other words, function something() from object file main.o should be placed at program space address 0x0000. After reset, the microprocessor starts executing the program from this address, which effectively makes main.o(.text.something) the entry point.
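
The placement can be checked by hand. The following host-side sketch packs sections back to back from a given origin; it is a simplification that assumes the linker lays the remaining sections out directly after .text with no padding. The struct and function names are made up for this illustration. The sizes come from the unlinked listings above: something() is 0x10 bytes, setOutputPins() is 0x1a bytes, and my_array is 4 bytes.

```c
#include <stdint.h>

/* One input section: its size in bytes and the address it receives. */
struct placed_section {
	uint16_t size;
	uint16_t address;
};

/* Pack sections back to back starting at `origin`; returns the first
 * free byte address after the last section. */
uint16_t place_sections(struct placed_section *sections, int count, uint16_t origin) {
	uint16_t cursor = origin;
	for (int i = 0; i < count; i++) {
		sections[i].address = cursor;
		cursor = (uint16_t)(cursor + sections[i].size);
	}
	return cursor;
}
```

With the three sizes above and an origin of 0, the sections land at 0x0000, 0x0010 and 0x002a, so my_array[2] ends up at 0x002c.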


avr-ld -T main.ld main.o -o app.elf
avr-objdump -Dx app.elf

Link. Disassemble to examine the result:


Disassembly of section .text:

00000000 <something>:
   0:	0e 94 08 00 	call	0x10	; 0x10 <setOutputPins>
   4:	ec e2       	ldi	r30, 0x2C	; 44
   6:	f0 e0       	ldi	r31, 0x00	; 0
   8:	e4 91       	lpm	r30, Z
   a:	e0 93 00 01 	sts	0x0100, r30	; 0x800100 <some_data1>
   e:	ff cf       	rjmp	.-2      	; 0xe <__zero_reg__+0xd>

Disassembly of section .text.setOutputPins:

00000010 <setOutputPins>:
  10:	cf 93       	push	r28
  12:	df 93       	push	r29
  14:	1f 92       	push	r1
  16:	cd b7       	in	r28, 0x3d	; 61
  18:	de b7       	in	r29, 0x3e	; 62
  1a:	8f ef       	ldi	r24, 0xFF	; 255
  1c:	89 83       	std	Y+1, r24	; 0x01
  1e:	89 81       	ldd	r24, Y+1	; 0x01
  20:	84 b9       	out	0x04, r24	; 4
  22:	0f 90       	pop	r0
  24:	df 91       	pop	r29
  26:	cf 91       	pop	r28
  28:	08 95       	ret

Disassembly of section .progmem.data:

0000002a <my_array>:
  2a:	02 03       	mulsu	r16, r18
  2c:	05 07       	cpc	r16, r21

Disassembly of section .bss:

00800100 <some_data1>:
	...

00800101 <some_data2>:
	...

As we can see, function something() is placed at the beginning of the text segment at program address 0x0000. Furthermore, correct addresses are assigned to symbols in something(). For example, call 0x10 at program address 0x0000 now has the correct address of routine setOutputPins() (0x0010).
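
The bytes of the call instruction show what the linker did with the placeholder. As a rough host-side sketch (the function name patch_call() is an assumption, and a real linker applies the R_AVR_CALL relocation rather than rebuilding the opcode), the 4 bytes can be reproduced like this:

```c
#include <stdint.h>

/* Write a 4-byte "call" to `byte_address` into `code`. Encoding:
 * 1001 010k kkkk 111k kkkk kkkk kkkk kkkk, where k is the 22-bit word
 * address, stored as two little-endian 16-bit words. */
void patch_call(uint8_t code[4], uint32_t byte_address) {
	uint32_t k = byte_address / 2;  /* program memory is word addressed */
	uint16_t first = (uint16_t)(0x940Eu | (((k >> 17) & 0x1Fu) << 4) | ((k >> 16) & 0x1u));
	uint16_t second = (uint16_t)(k & 0xFFFFu);
	code[0] = (uint8_t)(first & 0xFFu);
	code[1] = (uint8_t)(first >> 8);
	code[2] = (uint8_t)(second & 0xFFu);
	code[3] = (uint8_t)(second >> 8);
}
```

Patching with byte address 0x10 produces 0e 94 08 00, the bytes in the linked listing; patching with 0 produces the placeholder 0e 94 00 00 from the unlinked object file.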

Mixing C and Assembly Code

In the following example, we will build a project from a mix of both assembly code and C code. This is helpful when a specific language is favored for different sections of the project. For example, we can use C to write most parts of the project, and use assembly for the ISR (interrupt service routine) and time critical routines.

In this example, we will use 2 PWM signals to drive LEDs. A timer overflow is used to change the intensity of the LEDs (duty of PWM) periodically. We will have 4 source code files for this project:

vector.asm - An assembly code specifying the vector table.
startup.c - A C function to initialize the IO registers and peripherals.
task.c - A C function to modify the PWM signals.
timer.asm - An assembly ISR to change the duty cycle of the PWM signals, executed periodically by timer overflow.

And 1 linker script:

make.ld - Linker script.

`vector.asm` - Assembly code specifying the vector table


vector.asm

.text
. = 0
	rjmp	startup
. = 5 * 2
	rjmp	timer
. = 8 * 2
	rjmp	timer

According to ATtiny24/44/84 [DATASHEET], the reset vector is at program address 0x0000, the timer 1 capture interrupt vector is at 0x0005, and the timer 1 overflow vector is at 0x0008. When a specific interrupt event is fired, the microprocessor hardware will jump to its associated vector. We place an rjmp instruction at each of those locations to instruct the microprocessor to jump to the corresponding ISR.

The datasheet uses word addresses, and each program word is 2 bytes long. The GNU AVR assembler uses byte addresses; therefore, we need to multiply the address by 2 when specifying a program space address in GNU avr-as.

Relative jump rjmp, which is 16 bits long (12-bit signed word offset), can address the entire program space on devices with no more than 4K words of program memory, such as ATtiny25 and ATtiny84. On these devices, each vector is 1 word long and uses the rjmp instruction. On the other hand, devices with more than 4K words of program memory, such as ATmega328, need jump jmp, which is 32 bits long (22-bit address) and can address the entire 4M-word program space. On these devices, each vector is 2 words long and uses the jmp instruction.
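
The two rules above (word addresses versus byte addresses, and the reach of rjmp) can be written down as a small host-side sketch. Both function names are assumptions made for this illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Datasheets give program addresses in 16-bit words; GNU avr-as counts bytes. */
uint32_t word_to_byte_address(uint32_t word_address) {
	return word_address * 2u;
}

/* rjmp carries a signed 12-bit word offset, which wraps around and therefore
 * covers the whole flash when it holds at most 4K words (8K bytes). */
bool rjmp_covers_flash(uint32_t flash_bytes) {
	return flash_bytes / 2u <= 4096u;
}
```

The vectors at word addresses 0x0005 and 0x0008 become byte offsets 10 and 16, which is what . = 5 * 2 and . = 8 * 2 express in vector.asm. An ATtiny44 with 4K bytes of flash passes the rjmp check, while an ATmega328 with 32K bytes does not.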

For example, after reset, the hardware will cause the CPU to go to program address 0x0000. The instruction we placed at program address 0x0000 is rjmp startup, which will cause the CPU to jump to the address represented by startup. The actual address of startup is unknown for now; it will be resolved at the linking stage.

Assemble with: avr-as vector.asm -mmcu=attiny44 -o vector.o

`startup.c` - C Function to Initialize the IO Registers and Peripherals


startup.c

#include <avr/io.h>
#include <avr/interrupt.h>

#include "task.h"

extern volatile uint8_t timer_a_scale, timer_b_scale;

void startup() {
	SP = 0x015F; // Top of t44
	asm("clr r1");

	DDRA = 0b10000000; //Output on PA7 and PB2
	DDRB = 0b0100;
	TCCR0A = (2<<COM0A0) | (2<<COM0B0) | (3<<WGM00); // Fast PWM
	TCCR0B = (1<<CS00); // clk/1
	OCR0A = 0;
	OCR0B = 0;

	TCCR1B = (3<<WGM12) | (4<<CS10); // CTC on ICR1, clk / 256
	ICR1 = 200;
	TIMSK1 = (1<<ICIE1) | (1<<TOIE1); // Interrupt on capture and overflow

	timer_a_scale = 4;
	timer_b_scale = 16;

	task_mailbox = 0; // Clear mailbox

	sei();
	task_loop();
}

Recall that, during initialization, the stack pointer should be set to the top of the stack, the zero register R1 should be cleared, and the defined static variables should be assigned their initial values. We do so at the beginning of our startup() function.
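
The constant 0x015F written to SP is simply the last SRAM address of the ATtiny44. As a host-side sketch (the function name top_of_sram() is an assumption for this illustration), it follows from the SRAM start address and size:

```c
#include <stdint.h>

/* Last valid SRAM address: the usual initial value of the stack pointer,
 * because the AVR stack grows downward from the top of SRAM. */
uint16_t top_of_sram(uint16_t sram_start, uint16_t sram_size) {
	return (uint16_t)(sram_start + sram_size - 1u);
}
```

For the ATtiny44, SRAM starts at 0x0060 and is 256 bytes long, so top_of_sram(0x0060, 256) gives 0x015F. On the target, avr/io.h also provides this value as the RAMEND macro, which avoids hard-coding the number.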

To allow PWM output, we will need to set the associated pins to output mode (by default, all pins are inputs after reset to prevent short circuits). Furthermore, we will set up the timer peripherals to allow the hardware to generate PWM signals for us. In this example, we set timer 0 to be in fast PWM mode and output a PWM signal on both channels.
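
The shift expressions in startup.c are easier to follow once they are reduced to plain numbers. The host-side sketch below recomputes the two timer control values; the bit positions are copied from the register layout in the ATtiny24/44/84 datasheet, and the _BIT names and function names are assumptions used here only to avoid clashing with the avr/io.h macros.

```c
#include <stdint.h>

/* Bit positions as laid out in the ATtiny24/44/84 datasheet; on the target,
 * avr/io.h provides them as COM0A0, COM0B0, WGM00, WGM12 and CS10. */
enum {
	COM0A0_BIT = 6, COM0B0_BIT = 4, WGM00_BIT = 0,  /* TCCR0A */
	WGM12_BIT = 3, CS10_BIT = 0                     /* TCCR1B */
};

/* Timer 0: non-inverting output on both channels, fast PWM (WGM01:0 = 3). */
uint8_t tccr0a_value(void) {
	return (uint8_t)((2u << COM0A0_BIT) | (2u << COM0B0_BIT) | (3u << WGM00_BIT));
}

/* Timer 1: WGM13:12 = 3 and clock select CS12:0 = 4. */
uint8_t tccr1b_value(void) {
	return (uint8_t)((3u << WGM12_BIT) | (4u << CS10_BIT));
}
```

The first expression evaluates to 0xA3 and the second to 0x1C, which are the bytes that end up in TCCR0A and TCCR1B.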

To allow periodic events, we will need to enable the timer overflow interrupt on timer 1. In this example, we will let timer 1 run at 1/256 of CPU speed (which is about 3.9kHz with a 1MHz CPU). We set the top of the timer to be 200, so the timer counts 201 ticks per period and should overflow at a rate of about 3.9kHz / 201 ≈ 19.4Hz.

We introduce 2 external variables: timer_a_scale and timer_b_scale. They define how much to add to the light intensities (PWM duty cycles) every time the timer overflows. We use 4 for timer_a_scale and 16 for timer_b_scale; in other words, the intensity of LED A should be increased by 4 and the intensity of LED B should be increased by 16 every time. Furthermore, because the MCU uses 8-bit wide registers for PWM duty cycles, PWM A will overflow (and wrap back) every 256 / 4 = 64 events; PWM B will overflow every 256 / 16 = 16 events. Because the timer overflows at a rate of about 19.4Hz, we should see the intensity of LED A and LED B wrap back about every 3.3s and 0.8s, respectively.
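
The wrap-around counts can be double-checked by simulating the 8-bit addition on a host machine. This is only a sketch; the function names are assumptions, and the event rate is left as a parameter.

```c
#include <stdint.h>

/* Count how many timer events it takes for an 8-bit duty value that starts
 * at 0 and grows by `scale` per event to wrap back to 0. A scale of 0
 * never changes the duty and is reported as 1 event. */
uint16_t events_until_wrap(uint8_t scale) {
	uint8_t duty = 0;
	uint16_t events = 0;
	do {
		duty = (uint8_t)(duty + scale);  /* 8-bit add, wraps modulo 256 */
		events++;
	} while (duty != 0 && events < 256);
	return events;
}

/* Time between two wraps in milliseconds for a given event rate in Hz. */
uint32_t wrap_period_ms(uint8_t scale, uint32_t events_per_second) {
	return (uint32_t)events_until_wrap(scale) * 1000u / events_per_second;
}
```

events_until_wrap(4) returns 64 and events_until_wrap(16) returns 16, matching 256 / 4 and 256 / 16. Dividing those counts by the timer event rate gives the time between two wraps.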

Clear the mailbox variable task_mailbox. This variable is not defined in startup.c itself; its declaration comes from the included header task.h.

Finally, we enable global interrupts and invoke the main loop of this program, task_loop().

Compile with: avr-gcc -c -O3 -mmcu=attiny44 startup.c -o startup.o

`task.c` - C Function to Modify the PWM Signals


task.h

#include <stdint.h>

volatile uint8_t task_a, task_b, task_mailbox;

void task_loop();

Declare 3 global variables: task_a, task_b and task_mailbox. task_a and task_b contain the new intensity of the LEDs; task_mailbox is used to request an update of the PWM peripherals with the new intensity values.


task.c

#include <avr/io.h>

#include "task.h"

void task_loop() {
	for(;;) {
		if (!task_mailbox)
			continue;
		OCR0A = task_a;
		OCR0B = task_b;
		task_mailbox = 0;
	}
}

This function is a dead loop and should be considered the main task of the program.

If the mailbox is set, this function will copy the values of task_a and task_b to the PWM peripheral registers OCR0A and OCR0B, which sets the new PWM duty cycles; then, it clears the mailbox.

Otherwise, do nothing.

Compile with: avr-gcc -c -O3 -mmcu=attiny44 task.c -o task.o

`timer.asm` - Assembly Timer Overflow ISR to Update PWM Duty Cycle


timer.asm

.data

.global timer_a_scale
.global timer_b_scale

timer_a_scale:	.space	1
timer_b_scale:	.space	1

.text

.global timer
timer:
	push	r30
	push	r31
	in	r30, 0x3f	; save SREG, because add changes the flags
	push	r30
	lds	r31, timer_a_scale
	lds	r30, task_a
	add	r30, r31
	sts	task_a, r30
	lds	r31, timer_b_scale
	lds	r30, task_b
	add	r30, r31
	sts	task_b, r30
	ldi	r31, 0xFF
	sts	task_mailbox, r31
	pop	r30
	out	0x3f, r30	; restore SREG
	pop	r31
	pop	r30
	reti

Two global variables are defined: timer_a_scale and timer_b_scale, discussed in startup.c.

Every time this ISR is fired, it will add the scales timer_a_scale and timer_b_scale to the PWM intensities task_a and task_b; then, it sets the mailbox task_mailbox (declared in task.h) to request task.c to update the PWM peripherals.
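
The handshake between the ISR and task_loop() can be tried out on a host machine. The sketch below is a model, not the firmware itself: the _model functions and the ocr0a / ocr0b variables are stand-ins invented for this illustration, and the scales are preset to the values assigned in startup.c.

```c
#include <stdint.h>

/* Host-side stand-ins for the shared variables and the two PWM registers. */
static volatile uint8_t task_a, task_b, task_mailbox;
static volatile uint8_t timer_a_scale = 4, timer_b_scale = 16;
static uint8_t ocr0a, ocr0b;  /* model OCR0A / OCR0B */

/* What the timer ISR does: advance both intensities, then raise the flag. */
void timer_isr_model(void) {
	task_a = (uint8_t)(task_a + timer_a_scale);
	task_b = (uint8_t)(task_b + timer_b_scale);
	task_mailbox = 0xFF;
}

/* One pass of task_loop(): returns 1 if an update was consumed, else 0. */
int task_step_model(void) {
	if (!task_mailbox)
		return 0;
	ocr0a = task_a;
	ocr0b = task_b;
	task_mailbox = 0;
	return 1;
}
```

Calling task_step_model() before any interrupt does nothing. After one call to timer_isr_model(), the next step copies 4 and 16 into the modelled registers and clears the mailbox, so the step after that is idle again.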

Assemble with: avr-as timer.asm -mmcu=attiny44 -o timer.o

`make.ld` - Linker Script


make.ld

MEMORY {
	my_code(rx): ORIGIN = 0x00, LENGTH = 4K
	my_data(rw): ORIGIN = 0x800060, LENGTH = 256
}

SECTIONS {
	. = ORIGIN(my_code);
	.text : {
		vector.o(.text)
		*(.text)
	} >my_code
	
	. = ORIGIN(my_data);
	.data : {
		*(.data)
	} >my_data
}

Put the .data segments and the .text segments from all the input files together. We specifically tell the linker to place the vector table from vector.o at the beginning of the program space, to allow the hardware to jump to the correct address after reset and at interrupts.

Link with: avr-ld -T make.ld vector.o startup.o timer.o task.o -o app.elf

The generated executable app.elf is ready to be burned into the MCU.
