AVR Linker - Manual Link, Bare Metal and Mixed Source Code
This article discusses how to write assembly code for AVR microcontrollers using the GNU assembler avr-as, and how to link the resulting object files into AVR executables. It also shows how to write a bare metal C program with a manual link step, and how to write a program with mixed C and assembly code for AVR.
AVR-GCC, AVR-AS, AVR-LD, C language, Assembly language, linker, disassembly, compiler, ABI, AVR, ISA, Application Binary Interface, Instruction Set Architecture, Bare metal, embedded
--by Captdam @ Nov 8, 2025
I used to rely on GCC (or other compilers for specific machines) to compile source code into executable files without worrying about intermediate object files or the linking process. Recently, a project required me to build programs and place the code units into specific memory regions for proper initialization, which led me to dive into the linking process.
I chose to start with the AVR-GCC linker, not only because I am familiar with the AVR architecture, but also because microcontrollers have simpler memory structures than PCs, which lets me clearly inspect the effect of the linking process.
I decided to post my notes online so I can refer to them as a template in future projects. Furthermore, most resources online seem to cover Atmel Studio’s avrasm2.exe instead of the GNU avr-as, and I spent days digging through documents and forums to find references and solutions. If you are having trouble finding information about the GNU avr-as, you may find this article helpful.
C Language Program - avr-hello.c
Let's start with a simple C program:
main.c
#include <stdint.h>
volatile uint8_t my_data[4] = {2, 3, 5, 7};
uint8_t main() {
volatile uint8_t my_var = my_data[1];
for(;;);
}
In this example, we load an element from an array.
We use the volatile keyword to prevent the compiler from optimizing the variables away:
- The compiler will try to keep variables in registers instead of memory, because registers are faster. AVR is a load/store architecture: arithmetic instructions cannot operate on memory directly, so data must be loaded into a general purpose register (GPR) before use. Some CPUs, like the 68HC11, allow instructions to operate on memory directly, but at the cost of extra clock cycles. Therefore, registers are preferred over memory for both speed and space.
- If a variable is not used, the compiler will delete that variable and all related computation to save CPU cycles.
avr-gcc -mmcu=atmega328 -O0 main.c -o main.o
avr-objcopy -O ihex main.o main.hex
avr-objdump -D main.hex -m avr5
Compile, create the flash file and disassemble the flash file. Note that, because we did not pass -c, avr-gcc also links; so main.o here is actually a fully linked ELF executable, despite its name.
We specify -O0 for no optimization to force the compiler to translate the C source code into machine code line by line, so we can examine the result. Otherwise, the compiler may do a lot of tricks to make the generated code efficient, but extremely hard to analyze.
The following shows the content of the flash file. Let’s divide it into segments:
Vector Table
0: 0c 94 34 00 jmp 0x68 ; 0x68
4: 0c 94 49 00 jmp 0x92 ; 0x92
8: 0c 94 49 00 jmp 0x92 ; 0x92
c: 0c 94 49 00 jmp 0x92 ; 0x92
10: 0c 94 49 00 jmp 0x92 ; 0x92
14: 0c 94 49 00 jmp 0x92 ; 0x92
18: 0c 94 49 00 jmp 0x92 ; 0x92
1c: 0c 94 49 00 jmp 0x92 ; 0x92
20: 0c 94 49 00 jmp 0x92 ; 0x92
24: 0c 94 49 00 jmp 0x92 ; 0x92
28: 0c 94 49 00 jmp 0x92 ; 0x92
2c: 0c 94 49 00 jmp 0x92 ; 0x92
30: 0c 94 49 00 jmp 0x92 ; 0x92
34: 0c 94 49 00 jmp 0x92 ; 0x92
38: 0c 94 49 00 jmp 0x92 ; 0x92
3c: 0c 94 49 00 jmp 0x92 ; 0x92
40: 0c 94 49 00 jmp 0x92 ; 0x92
44: 0c 94 49 00 jmp 0x92 ; 0x92
48: 0c 94 49 00 jmp 0x92 ; 0x92
4c: 0c 94 49 00 jmp 0x92 ; 0x92
50: 0c 94 49 00 jmp 0x92 ; 0x92
54: 0c 94 49 00 jmp 0x92 ; 0x92
58: 0c 94 49 00 jmp 0x92 ; 0x92
5c: 0c 94 49 00 jmp 0x92 ; 0x92
60: 0c 94 49 00 jmp 0x92 ; 0x92
64: 0c 94 49 00 jmp 0x92 ; 0x92
When a specific event happens, the CPU jumps to the address in the vector table for that event, provided that the specific interrupt is enabled.
In the above example, the first interrupt vector at address 0x0000 is for reset. After reset and internal hardware initialization, the CPU starts executing at address 0x0000 (the reset vector). The instruction at that vector, in this example, causes the CPU to jump to address 0x0068.
Addresses 0x0004 to 0x0064 hold the vectors for other ISRs (such as timer overflow, ADC conversion complete, UART transmit complete). Since we didn’t define any of those ISRs in our C code, the toolchain fills them with jmp 0x0092 for us. This causes the CPU to jump to a special handler, __bad_interrupt. If something goes wrong (for example, mistakenly enabling a timer overflow interrupt without defining its handler), the CPU will jump to __bad_interrupt.
__bad_interrupt serves as a last resort. It jumps back to the reset vector and restarts the program, like rebooting a computer after a blue screen. This prevents the CPU from executing a “dangerous” program.
Different devices come with different peripherals, which allow different functionalities; hence, they have different vector tables. Furthermore, the size of each vector can differ depending on the size of the program space: small devices can use the 2-byte rjmp instruction to cover the entire program space, while large devices must use the 4-byte jmp.
Regardless of the device, the first vector is always the reset vector. If interrupts are not needed, we can omit the vector table.
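As a rough illustration of the table layout above, here is a minimal C sketch that runs on a PC (not on the AVR). The function names vector_address and vector_table_end are made up for this example, and it assumes every vector slot holds exactly one jump instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of interrupt vector `index` (0 = reset), assuming every
 * vector slot holds one jump instruction of `slot_bytes` bytes
 * (2 for rjmp, 4 for jmp). */
uint16_t vector_address(uint8_t index, uint8_t slot_bytes)
{
    assert(slot_bytes == 2 || slot_bytes == 4);
    return (uint16_t)(index * slot_bytes);
}

/* First byte address after a table of `count` vectors. This is where
 * the code following the table can start. */
uint16_t vector_table_end(uint8_t count, uint8_t slot_bytes)
{
    return vector_address(count, slot_bytes);
}
```

With the 26 four-byte vectors shown in the listing, vector_table_end(26, 4) gives 0x68, which is exactly where the reset vector jumps to.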
Initialization
68: 11 24 eor r1, r1
6a: 1f be out 0x3f, r1 ; 63
6c: cf ef ldi r28, 0xFF ; 255
6e: d8 e0 ldi r29, 0x08 ; 8
70: de bf out 0x3e, r29 ; 62
72: cd bf out 0x3d, r28 ; 61
The code above prepares the AVR-GCC runtime environment:
Clear register R1. AVR-GCC uses R1 as the zero register (the value in this register is expected to always be 0, which helps with arithmetic operations such as comparing with 0 and adding 0).
Reset SREG (Status register, IO address 0x3F).
Set SP (stack pointer, IO address 0x3E:0x3D) to top of SRAM (0x08FF for ATmega328).
This section is hardcoded in the AVR-GCC library. AVR-GCC copies it directly from toolchain/avr8/avr8-gnu-toolchain/avr/lib/avr5/crtatmega328.o:.init2; see details at AVR-LibC Memory Sections. This can be verified with the command avr-objdump -d crtatmega328.o.
74: 11 e0 ldi r17, 0x01 ; 1
76: a0 e0 ldi r26, 0x00 ; 0
78: b1 e0 ldi r27, 0x01 ; 1
7a: ec ea ldi r30, 0xAC ; 172
7c: f0 e0 ldi r31, 0x00 ; 0
7e: 02 c0 rjmp .+4 ; 0x84
80: 05 90 lpm r0, Z+
82: 0d 92 st X+, r0
84: a4 30 cpi r26, 0x04 ; 4
86: b1 07 cpc r27, r17
88: d9 f7 brne .-10 ; 0x80
After reset, the content of the data memory (RAM) is either undefined (after power-on) or contaminated (after a watchdog reset or the bad interrupt handler). However, when we write our code, we expect initialized variables to hold their initial values (in this example, the static array my_data[4]), and uninitialized static variables to be 0. AVR-GCC does this initialization for us. It stores the initial values (the values we defined using = {2, 3, 5, 7}) in ROM and copies them into RAM during the initialization stage, before entering the user main() function.
In this example, the value is saved in flash (program) address 0x00AC and the variable is located at RAM (data) address 0x0100, length is 4 bytes. The following task is performed:
for addrRAM = 0x0100 to 0x0103, addrProg = 0x00AC to 0x00AF:
RAM[addrRAM ] ← Prog[addrProg]
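The copy loop above can be sketched in plain C for a PC. This is only a model: the two arrays stand in for the AVR program space and data space, and the function name copy_data_section is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the initial values of .data from "flash" to "RAM".
 * flash and ram are plain host arrays standing in for the two AVR
 * address spaces; load_addr, ram_start and ram_end are byte addresses,
 * with ram_end being one past the last byte to write. */
void copy_data_section(const uint8_t *flash, uint8_t *ram,
                       uint16_t load_addr, uint16_t ram_start, uint16_t ram_end)
{
    uint16_t z = load_addr;   /* Z: read pointer into program space */
    uint16_t x = ram_start;   /* X: write pointer into data space   */

    assert(ram_start <= ram_end);
    while (x != ram_end) {        /* cpi/cpc/brne: loop until X == end */
        ram[x++] = flash[z++];    /* lpm r0, Z+  then  st X+, r0       */
    }
}
```

Calling it with load_addr = 0x00AC, ram_start = 0x0100 and ram_end = 0x0104 mirrors the constants loaded by the ldi and cpi instructions in the listing.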
Entry Point
8a: 0e 94 4b 00 call 0x96 ; 0x96
8e: 0c 94 54 00 jmp 0xa8 ; 0xa8
Initialization finished. Call the user supplied main function at address 0x0096. Once the main function returns, jump to library supplied dead loop at address 0x00A8.
Bad ISR Handler
92: 0c 94 00 00 jmp 0 ; 0x0
The bad ISR handler, which jumps to the reset vector at address 0 to re-initialize the MCU.
Main Function
uint8_t main() {
volatile uint8_t my_var = my_data[1];
for(;;);
}
96: cf 93 push r28
98: df 93 push r29
9a: 1f 92 push r1
9c: cd b7 in r28, 0x3d ; 61
9e: de b7 in r29, 0x3e ; 62
a0: 80 91 01 01 lds r24, 0x0101 ; 0x800101
a4: 89 83 std Y+1, r24 ; 0x01
a6: ff cf rjmp .-2 ; 0xa6
Finally, our main function.
As the AVR-GCC ABI specifies, call-saved registers (in this example, R28 and R29) must be saved if clobbered. Therefore, they are pushed onto the stack at the beginning of the function. This register pair is then used as a frame pointer.
Local variable uint8_t my_var is stored in stack, more specifically, the function frame.
Recall: A variable declared outside function (declared globally) is static and will be saved in RAM at a compile-time determined, fixed address in the RAM. A variable declared inside a function is local and will be saved in stack (function frame) at function initialization. A frame pointer is used to access data (local variables) in the frame.
The AVR stack pointer points to the next available slot and grows downwards. AVR doesn’t support stack pointer relative addressing; we must use an index register (Y or Z, the two that support displacement addressing) to build a frame pointer to access data in the stack. Furthermore, the displacement cannot be negative; therefore, the frame pointer must point to the lowest position and use positive offsets to address stack variables. To do so, we first decrease the stack pointer by 1, then copy SP into index register Y (R29:R28). Because SP points to the next free slot, the allocated byte is at Y+1.
Load my_data[1] and store it in the local variable my_var. Since my_data is a static variable and its address is fixed (known at compile time), we use direct addressing lds Rd, absolute_address to load this value. The variable my_var is local to the function and lives in the stack, so its address is not known at compile time; therefore, we have to use indexed addressing std Y+positive_offset, Rs via the frame pointer to address it.
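To make the frame pointer arithmetic concrete, here is a small PC-side model in C. The Cpu struct and the helper names cpu_push and cpu_std_y are hypothetical, and the model starts with SP at the top of SRAM, ignoring the return address that call would have pushed first.

```c
#include <assert.h>
#include <stdint.h>

#define RAMEND 0x08FF  /* top of ATmega328 SRAM */

typedef struct {
    uint16_t sp;               /* stack pointer: next free slot         */
    uint8_t  ram[RAMEND + 1];  /* host array standing in for data space */
} Cpu;

/* push: store at the slot SP points to, then move SP down by one. */
void cpu_push(Cpu *cpu, uint8_t value)
{
    assert(cpu->sp > 0 && cpu->sp <= RAMEND);
    cpu->ram[cpu->sp] = value;
    cpu->sp--;
}

/* std Y+q, Rr: store through the frame pointer with a positive offset. */
void cpu_std_y(Cpu *cpu, uint16_t y, uint8_t q, uint8_t value)
{
    assert(q <= 63);                       /* displacement is limited to 0..63 */
    assert((uint32_t)y + q <= RAMEND);
    cpu->ram[y + q] = value;
}
```

After three pushes (R28, R29, then R1 as the allocation), copying SP into Y and storing at Y+1 lands exactly on the byte that push r1 reserved.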
Here is a trick. To allocate a frame in the stack, simply grow the stack (move the SP down). AVR-GCC will use the shortest instructions with no harmful side effects:
In this example, AVR-GCC uses push r1 to simply move SP down by 1. Although this instruction saves the value of R1 into the stack, it is a harmless side effect. The actual data (our local variable) will overwrite it later.
AVR-GCC may use rcall .+0 (relative call) to allocate (move down) 2 bytes with only 1 instruction word. This instruction is normally used to call a routine. It pushes the return address into the stack (since the PC is 2 bytes long on this device, this allocates 2 bytes of space), then jumps to the routine which is 0 words away from the next instruction. The side effect is that the PC is modified by adding 0, which does nothing.
When more than 2 bytes are required, AVR-GCC may use sbiw r28, k on the frame pointer copy and write the result back to SP to allocate (move down) k bytes.
AVR-GCC adds 0x800000 to data addresses. This helps separate the data in data space (in RAM) from the program code in program space (in flash). When burning the program to the MCU, the higher bits are truncated.
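The address offset can be expressed as a pair of tiny conversions. The sketch below is plain C for a PC; the names AVR_DATA_OFFSET, data_hw_address and data_elf_address are made up for this example and are not part of any toolchain header.

```c
#include <assert.h>
#include <stdint.h>

/* Offset the GNU AVR toolchain adds to data-space addresses. */
#define AVR_DATA_OFFSET 0x800000UL

/* Toolchain address (as printed by avr-objdump) -> address seen by the CPU. */
uint16_t data_hw_address(uint32_t elf_addr)
{
    assert(elf_addr >= AVR_DATA_OFFSET);   /* must be a data-space address */
    return (uint16_t)(elf_addr - AVR_DATA_OFFSET);
}

/* Address seen by the CPU -> toolchain address. */
uint32_t data_elf_address(uint16_t hw_addr)
{
    return (uint32_t)(AVR_DATA_OFFSET + hw_addr);
}
```

For instance, the comment 0x800101 next to lds r24, 0x0101 in the listing is the same location: data_hw_address(0x800101) is 0x0101.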
End
a8: f8 94 cli
aa: ff cf rjmp .-2 ; 0xaa
Dead loop as mentioned above.
Static Data
ac: 02 03 mulsu r16, r18
ae: 05 07 cpc r16, r21
Our data array. As mentioned above, the initial values of variables are saved in program space and copied into data space at initialization. Because this is data, the disassembled instructions have no meaning.
Assembly Language Program - avr-hello.asm
The Assembly Source Code
Now, let’s try the same example using assembly code:
We will be using the GNU assembler avr-as. The assembly editor of Atmel Studio uses avrasm2.exe. They use DIFFERENT syntax and they interpret addresses DIFFERENTLY. DO NOT BE CONFUSED!
main.asm
Description
.data
my_array: .space 4
my_array_end:
In this assembly program, we have 2 segments: the data segment for data memory in the RAM and the text segment for the program in the flash.
In the data segment, we first create a symbol my_array for the static variable and allocate 4 bytes of space. We also declare a symbol my_array_end immediately following my_array; so, the address of my_array_end will be the address of my_array plus the size of my_array. This helps us get the size of my_array, for example: size_t size = my_array_end - my_array and for (void* ptr = my_array; ptr < my_array_end; ptr++).
We only declare static variables in the data segment. Declaring a variable in the data segment gives it a fixed address in the data memory. Local variables are stored in the stack, and their addresses can change as the frame position of the parent function changes. Therefore, we should not declare local variables in the data segment.
Space is always in units of bytes. When writing assembly code, we should know the size of our data types: such as 1 byte for char, 2 bytes for int (and yes, most 8-bit toolchains including AVR-GCC make int 2 bytes!), and 4 bytes for long.
.text
my_routine:
ldi r16, 0x08
out SPH, r16
ldi r16, 0xFF
out SPL, r16
Here comes our program, the text segment.
First, initialize the stack pointer (SP) to the top of ATmega328 RAM, which is 0x08FF.
ldi r31, hi8(my_array_flash)
ldi r30, lo8(my_array_flash)
ldi r29, hi8(my_array)
ldi r28, lo8(my_array)
ldi r27, hi8(my_array_end)
1: lpm r0, Z+
st Y+, r0
cpi r28, lo8(my_array_end)
cpc r29, r27
brne 1b
In C, AVR-GCC placed the initial values into the static variable for us; in assembly code, we have to do this ourselves.
We will use index register Z (R31:R30) as a read pointer to the program space, pointing to the start of the defined data saved in program space my_array_flash. We also use index register Y (R29:R28) as a write pointer to the data space, pointing to the start of the static variable in data space my_array.
A loop copies the data using the special lpm (load from program space) instruction. It writes starting from the symbol my_array, up to but excluding my_array_end.
Watch for the syntax difference. GNU avr-as uses hi8() and lo8() to take the higher and lower byte of a multi-byte value; Atmel Studio avrasm2.exe uses HIGH() and LOW().
push r0 ; Allocate stack
in r29, SPH
in r28, SPL
lds r0, my_array + 1
std Y + 1, r0
Use the push instruction to move SP down by 1 byte; this allocates the stack space for the local variable my_var.
Copy the SP to index register Y to build a frame pointer.
Load the data my_array[1] into a temporary register R0. Then, use the frame pointer to write this data into the local variable my_var in the stack. Because my_array is static, its address is fixed at build time; we can encode its address in the instruction using the direct addressing mode lds Rd, address.
loop: jmp loop
Dead loop to stop the program.
my_array_flash: .byte 2, 3, 5, 7
Initial value for the static variable saved in program space.
In AVR-GCC, the main() function is treated like any other function and follows the same rules: preserve all call-saved registers (in this example, R28 and R29, used as index register Y). Although the main() function will never return, the compiler still makes sure these registers can be restored at the end.
Technically, in C, although we are taught to write a dead loop at the end of the main() function, we don’t really need to do so. We can simply return, and the library-supplied dead loop will be executed.
In assembly, we have more control. We know we are not going to return from our main routine (in this example, my_routine). There is no need to preserve the call-saved registers.
Furthermore, if we are gonna write the entire program ourselves in assembly, we don’t need to worry about the ABI. We can define our own rules for call-saved and call-used registers.
The Assembled Machine Code
avr-as -m avr5 main.asm -o main.o
avr-objdump -Dx main.o
Assemble. We specify the device family by using -m family. For this example, ATmega328 is in the avr5 family. Some instructions like jmp are only available for certain families. Without specifying the family, AVR-AS will use the default family, which may not support certain instructions; in that case, the assembler will report an error.
Disassemble. The -D flag for avr-objdump disassembles all sections, and the -x flag prints all headers, including the symbol table.
The following shows the disassembly: (for your convenience, source code on the side for reference)
Disassembly of section .text:
00000000 <my_routine>:
0: 08 e0 ldi r16, 0x08 ; 8
2: 00 b9 out 0x00, r16 ; 0
4: 0f ef ldi r16, 0xFF ; 255
6: 00 b9 out 0x00, r16 ; 0
8: f0 e0 ldi r31, 0x00 ; 0
a: e0 e0 ldi r30, 0x00 ; 0
c: d0 e0 ldi r29, 0x00 ; 0
e: c0 e0 ldi r28, 0x00 ; 0
10: b0 e0 ldi r27, 0x00 ; 0
12: 05 90 lpm r0, Z+
14: 09 92 st Y+, r0
16: b0 30 cpi r28, 0x00 ; 0
18: cb 07 cpc r29, r27
1a: 01 f4 brne .+0 ; 0x1c <my_routine+0x1c>
1c: 0f 92 push r0
1e: d0 b1 in r29, 0x00 ; 0
20: c0 b1 in r28, 0x00 ; 0
22: 00 90 00 00 lds r0, 0x0000 ; 0x800000 <my_array_flash+0x7fffd4>
26: 09 82 std Y+1, r0 ; 0x01
00000028 <loop>:
28: 0c 94 00 00 jmp 0 ; 0x0 <my_routine>
0000002c <my_array_flash>:
2c: 02 03 mulsu r16, r18
2e: 05 07 cpc r16, r21
Disassembly of section .data:
00000000 <my_array>:
0: 00 00 nop
...
.text
my_routine:
ldi r16, 0x08
out SPH, r16
ldi r16, 0xFF
out SPL, r16
ldi r31, hi8(my_array_flash)
ldi r30, lo8(my_array_flash)
ldi r29, hi8(my_array)
ldi r28, lo8(my_array)
ldi r27, hi8(my_array_end)
1: lpm r0, Z+
st Y+, r0
cpi r28, lo8(my_array_end)
cpc r29, r27
brne 1b
push r0 ; Allocate stack
in r29, SPH
in r28, SPL
lds r0, my_array + 1
std Y + 1, r0
loop: jmp loop
my_array_flash: .byte 2, 3, 5, 7
.data
my_array: .space 4
my_array_end:
As we can see, some information is missing. For example:
- At disassembled program address 0x0002, the instruction out SPH, r16 should be converted to out 0x3E, r16, as the IO address of SPH is 0x3E; however, we got out 0x00, r16 in the disassembly. The same happened for SPL two instructions later.
- At disassembled program address 0x000A, the instruction ldi r30, lo8(my_array_flash) should be converted to ldi r30, 0x2C since the address of my_array_flash is 0x002C; however, we got ldi r30, 0x00. The same happened for my_array and my_array_end in the following instructions.
Symbol Table
Now, let’s take a look at the symbol table.
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l .data 00000000 my_array
00000004 l .data 00000000 my_array_end
00000000 l .text 00000000 my_routine
0000002c l .text 00000000 my_array_flash
00000028 l .text 00000000 loop
All symbols declared in the assembly code are local to that file. For example, a line in that symbol table shows:
00000004 l .data 00000000 my_array_end
The symbol named my_array_end has a local offset of 0x00000004 in the .data segment.
This local offset tells the linker where that symbol is within its segment. It cannot be used in the final instruction, because the CPU expects the absolute address of that symbol.
00000000 *UND* 00000000 SPH
00000000 *UND* 00000000 SPL
For symbols not defined in this file (for example, SPL), the symbol table specifies *UND* for undefined.
In a C program, a declaration (with the extern keyword for variables) is required if a function or a variable is defined in another file. It tells the compiler that the symbol is defined externally and can be resolved later. More specifically, it can be linked later, as discussed in the next section.
So, the compiler won’t complain about not finding the symbol in the current source file.
Normally, we would add .extern directives. The GNU assembler (as, also known as gas) is a little bit different: it treats all undefined symbols as external.
Assembled is NOT Finished nor Executable
Why does the disassembled code show all the addresses as zero? First, we need to know what the GNU assembler avr-as is doing. It translates the human-readable mnemonics in an assembly source file into binary machine code, one file at a time.
The assembler creates .data segments and .text segments based on the input source code. However, it has no idea where the .text segment and .data segment should be placed in the actual machine. Therefore, the assembler cannot determine the absolute address for them.
Furthermore, multiple object files can be put together. Absolute addresses for .text segment and .data segment will cause conflict between object files. Again, assembler assembles one file at a time. When it assembles object file A, it cannot make any arrangement for object file B.
Therefore, the assembler will not put the absolute address. Instead, it will:
- Put a placeholder for symbols. The placeholder can be resolved when linking (the next stage) all object files into an executable. At that stage, the linker has all the information to arrange addresses for symbols from different object files.
- Use relative addresses for internal operations. The assembler knows the size of the code it emits; hence, it can calculate the relative offset of symbols inside the code block. This allows the linker to move the code block around.
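To give a feel for what "resolving a placeholder" means, here is a minimal C sketch (for a PC) that fills the 8-bit immediate of an ldi instruction, based on the ldi encoding 1110 KKKK dddd KKKK from the AVR instruction set. The function name patch_ldi_immediate is invented for this example; the real linker handles many more relocation types.

```c
#include <assert.h>
#include <stdint.h>

/* ldi Rd, K is encoded as 1110 KKKK dddd KKKK: the 8-bit immediate is
 * split into two nibbles around the register field. This function
 * replaces the immediate and keeps the register field untouched. */
uint16_t patch_ldi_immediate(uint16_t opcode, uint8_t k)
{
    assert((opcode & 0xF000) == 0xE000);              /* must be an ldi        */
    opcode = (uint16_t)(opcode & 0xF0F0);             /* clear the placeholder */
    opcode = (uint16_t)(opcode | ((k & 0xF0) << 4));  /* high nibble -> bits 11..8 */
    opcode = (uint16_t)(opcode | (k & 0x0F));         /* low nibble  -> bits 3..0  */
    return opcode;
}
```

The placeholder ldi r30, 0x00 is stored as bytes e0 e0, that is, the little-endian word 0xE0E0. Patching it with 0x2C gives 0xE2EC, which is stored as bytes ec e2.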
Linker
Assembling is only the first half of the job. We need to link the object files to create the final executable file. As shown in the previous section of this article, we have to address the following problems:
- What is the memory structure? Different microcontrollers have different memory structures. For example, the data memory of ATmega328 starts at 0x0100 and ends at 0x08FF; on the other hand, the data memory of ATtiny25 starts at 0x0060 and ends at 0x00DF. Should we place the .data segment at 0x0100 or at 0x0060?
- What is the address of the IO registers? Different machines come with different peripherals; hence, different IO register addresses.
- Where to place the .text segment? This includes program code, vector table, and program data. Furthermore, some MCUs (like the ATmega328) have a special bootloader region that has different programming properties than regular program regions. The size and starting address of the bootloader region can be adjusted by fuses as well. For those MCUs, it is important to assign .text segments into the proper region.
- When multiple object files (or multiple sections) are provided, how to order them? Most importantly, which program code should be executed first?
The tool avr-gcc is a driver that both compiles and links. We will get an error if:
- We didn’t specify the device using -mmcu=xxx. AVR-GCC needs to know the machine name to find the memory layout and addresses for IO registers.
- We used an undefined symbol (variable or function name) in our code. A symbol must be defined so AVR-GCC can resolve its address.
- We didn’t define the main() function. AVR-GCC uses this name as the default entry point; that is, the first function to execute after initialization.
We can use avr-gcc -c to compile without linking. This eliminates the above errors, but the generated object file cannot be executed until it is linked.
Object File for Machine-specific IO Register Addresses
Let’s first provide the unresolved symbols. To do so, we create an assembly file:
m328.asm
.global SPL
.equ SPL , 0x3D
.global SPH
.equ SPH , 0x3E
In this example, we use .equ to assign absolute values (the absolute IO space addresses) to the symbols.
We also make them .global. This exposes the given symbols to the linker, so other object files can reference them as external symbols at link time.
It may be a good idea to create an object file containing the addresses of all IO registers for the given microcontroller, as a stand-alone machine library. In this way, we can use the object file to resolve IO register addresses in future projects.
avr-as m328.asm -o m328.o
avr-objdump m328.o -Sx
Assemble. Then, disassemble the object file to check its content:
SYMBOL TABLE:
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
0000003d g *ABS* 00000000 SPL
0000003e g *ABS* 00000000 SPH
As we can see, the absolute IO space addresses of the symbols are now globally available.
Linker Script
Now, let’s create the linker script to instruct the linker how to link the object files. In this example, we only consider the .text segment for program code and the .data segment for static data. Fuses and EEPROM are outside the scope of this article.
In this section, I will use a naming convention like my_function to show that a specific field can have an arbitrary name.
main.ld
MEMORY {
my_code(rx): ORIGIN = 0x00, LENGTH = 32K
my_data(rw): ORIGIN = 0x800100, LENGTH = 2K
}
SECTIONS {
. = ORIGIN(my_code);
.text : {
*(.text)
} >my_code
. = ORIGIN(my_data);
.data : {
*(.data)
} >my_data
}
First, we specify the MEMORY structure according to the microcontroller’s memory size and structure:
- We call the program memory my_code; it is readable and executable. Its address starts from 0x0000 and its length is 32K bytes.
- We call the data memory my_data; it is readable and writable. Its address starts from 0x800100 and its length is 2K bytes.
Length is in bytes.
The linker uses these attributes (readable, writable and executable) to pick a region for input sections that are not explicitly assigned to one, such as .rodata for read-only data. However, this doesn’t matter in our example: we explicitly place .text and .data in a simple piece of hardware. We could mark both my_code and my_data read-only and it would not change the behaviour of our program at all.
We add an offset of 0x800000 to the data segment. This is not mandatory. We do this to align with AVR-GCC’s memory layout. This offset will be truncated because AVR's address space is limited (only the lower bits count).
Next, we tell the linker how to place the object files into SECTIONS:
- Starting from the origin of my_code (address 0x0000) is the .text segment. Place the .text segments from all input object files into this section. This section is assigned to my_code.
- Starting from the origin of my_data (address 0x800100) is the .data segment. Place the .data segments from all input object files into this section. This section is assigned to my_data.
Because there is only one object file with program code, and the offset of our my_code section is 0, the absolute address of each instruction will be the same as its local address in that object file.
After reset, the hardware starts execution at program address 0x0000; therefore, the entry point will be the first instruction in that object file.
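The placement rule described by this linker script can be modelled in a few lines of C on a PC. This is a simplified sketch: the names place_sections and symbol_address are hypothetical, and it ignores alignment and every other feature of a real linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Place same-named input sections back to back inside one memory region.
 * sizes[i] is the size of the section from object file i (in link order);
 * bases[i] receives the absolute address where that section starts.
 * Returns the first free address after the last section. */
uint32_t place_sections(uint32_t origin, const uint32_t *sizes,
                        uint32_t *bases, size_t count)
{
    uint32_t cursor = origin;          /* the linker's location counter "." */

    assert(sizes != NULL && bases != NULL);
    for (size_t i = 0; i < count; i++) {
        bases[i] = cursor;
        cursor += sizes[i];
    }
    return cursor;
}

/* Absolute address of a symbol = base of its section + its local offset. */
uint32_t symbol_address(uint32_t section_base, uint32_t local_offset)
{
    return section_base + local_offset;
}
```

For example, with a .data origin of 0x800100 and a single 4-byte .data section, a symbol at local offset 4 (like my_array_end) ends up at 0x800104.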
The Final Executable
avr-ld -T main.ld m328.o main.o -o app.elf
avr-objdump -Dx app.elf
Link. Disassemble to examine the result:
SYMBOL TABLE:
00000000 l d .text 00000000 .text
00000030 l d .trampolines 00000000 .trampolines
00800100 l d .data 00000000 .data
00000000 l df *ABS* 00000000 main.o
00800100 l .data 00000000 my_array
00800104 l .data 00000000 my_array_end
00000000 l .text 00000000 my_routine
0000002c l .text 00000000 my_array_flash
00000028 l .text 00000000 loop
0000003d g *ABS* 00000000 SPL
0000003e g *ABS* 00000000 SPH
In the symbol table, all symbols have proper values now. For example:
00800104 l .data 00000000 my_array_end
The symbol named my_array_end is now at address 0x00800104.
Furthermore, the undefined symbols for IO registers are now defined globally and absolutely. For example:
0000003d g *ABS* 00000000 SPL
The symbol named SPL is now at address 0x3D.
Disassembly of section .text:
00000000 <my_routine>:
0: 08 e0 ldi r16, 0x08 ; 8
2: 0e bf out 0x3e, r16 ; 62
4: 0f ef ldi r16, 0xFF ; 255
6: 0d bf out 0x3d, r16 ; 61
8: f0 e0 ldi r31, 0x00 ; 0
a: ec e2 ldi r30, 0x2C ; 44
c: d1 e0 ldi r29, 0x01 ; 1
e: c0 e0 ldi r28, 0x00 ; 0
10: b1 e0 ldi r27, 0x01 ; 1
12: 05 90 lpm r0, Z+
14: 09 92 st Y+, r0
16: b4 30 cpi r28, 0x04 ; 4
18: cb 07 cpc r29, r27
1a: d9 f7 brne .-10 ; 0x12 <my_routine+0x12>
1c: 0f 92 push r0
1e: de b7 in r29, 0x3e ; 62
20: cd b7 in r28, 0x3d ; 61
22: 00 90 01 01 lds r0, 0x0101 ; 0x800101 <my_array+0x1>
26: 09 82 std Y+1, r0 ; 0x01
00000028 <loop>:
28: 0c 94 14 00 jmp 0x28 ; 0x28 <loop>
0000002c <my_array_flash>:
2c: 02 03 mulsu r16, r18
2e: 05 07 cpc r16, r21
Disassembly of section .data:
00800100 <my_array>:
800100: 00 00 nop
...
In the disassembled code, correct addresses are inserted. For example:
16: b4 30 cpi r28, 0x04 ; 4
is assembled from cpi r28, lo8(my_array_end). The lower byte of the symbol my_array_end is 0x04, which can be seen in the instruction.
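For readers more comfortable with C, the hi8() and lo8() operators behave roughly like the functions below. This is a PC-side sketch; the helper load_pointer is made up to mimic a pair of ldi instructions and does not exist in any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Equivalent of the avr-as lo8() operator: bits 7..0 of a value. */
uint8_t lo8(uint32_t value) { return (uint8_t)(value & 0xFF); }

/* Equivalent of the avr-as hi8() operator: bits 15..8 of a value. */
uint8_t hi8(uint32_t value) { return (uint8_t)((value >> 8) & 0xFF); }

/* Mimic "ldi r29, hi8(sym)" followed by "ldi r28, lo8(sym)": split an
 * address into the two bytes of an index register pair. */
void load_pointer(uint32_t address, uint8_t *reg_hi, uint8_t *reg_lo)
{
    assert(reg_hi != NULL && reg_lo != NULL);
    *reg_hi = hi8(address);
    *reg_lo = lo8(address);
}
```

Note how the 0x800000 offset simply falls out: only bits 15..0 of 0x800104 survive, giving 0x01 and 0x04.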
Baremetal C
C Source Code
In this C language example, we will perform the initialization ourselves. Let’s consider the following C program:
m328.c
#include <stdint.h>
#include <avr/io.h>
#include <avr/pgmspace.h>
volatile const uint8_t my_array[] PROGMEM = {2, 3, 5, 7};
volatile uint8_t some_data1, some_data2;
void __attribute__((noinline)) setOutputPins() {
volatile uint8_t dir = 0b11111111;
DDRB = dir;
}
void __attribute__((naked)) something() {
setOutputPins();
some_data1 = pgm_read_byte(&(my_array[2]));
for(;;);
}
Data in Program
For initialized variables, the initial values are saved in the program space and copied into data space during initialization. With the AVR-GCC startup code, this was done for us; now, we want to handle this ourselves.
To store data in program space, we need to use the special PROGMEM attribute macro from the AVR-LibC header avr/pgmspace.h:
volatile const uint8_t my_array[] PROGMEM = {2, 3, 5, 7};
To read data from the program space, use:
dest = pgm_read_byte(&(program_data));
Access IO Registers
To access the IO register of the microprocessor, we will need to include the AVR library avr/io.h.
When we compile the code using avr-gcc -mmcu=xxx, the compiler will find the IO definition file for that specific MCU. This file contains the addresses and bit definition of all IO registers.
Static Variables
We also declare 2 static variables. C compilers place variables in data space by default; therefore, we do not need to specifically tell the compiler to do so.
Function Call and Entry Point
To simulate a function call and the stack operations during function invocation, we create a function void setOutputPins(). This function first loads the IO direction into a local (stack) variable; then, writes it into the IO register. The compiler may inline our function for optimization purposes. That would defeat the purpose of this example (testing the function call), so we put the noinline attribute on this function.
We are not gonna name our main function with the default entry point name main(); instead, we are gonna call it something(). We will make this function the entry point in the linker script. Furthermore, because this function is the top function, there is no need to preserve the stack when entering it (on the other hand, any callee function must preserve the stack for the caller, because the caller doesn’t expect any change in its stack by the callee). Therefore, we put the naked attribute on it to strip the function prologue and epilogue. In this function, we call the child function, then read a value from program space into a static variable in data space.
Compile No Link
avr-gcc -c main.c -O3 -ffunction-sections -fdata-sections -o main.o -mmcu=atmega328
avr-objdump -Dx main.o
Now, compile this program with the following flags:
- -c means no link, compile only. We will manually link later.
- -O3 allows the compiler to perform some optimization so the generated code won’t be too stupid to read.
- -ffunction-sections and -fdata-sections create a separate section for each function and data variable, respectively. This helps us refer to individual functions and data variables in the linker script.
- -mmcu=atmega328 specifies the device. Although we will link this program manually later, we do like the convenience that AVR-LibC can provide the addresses of the IO registers of a specific microcontroller. We include the header avr/io.h in our source code and specify the device with this option. Furthermore, this enables the compiler to use some machine-specific instructions to increase performance, such as MUL for multiplication.
Then, disassemble to examine the result: (for your convenience, source code on the side for reference)
Child Function
Disassembly of section .text.setOutputPins:
00000000 <setOutputPins>:
0: cf 93 push r28
2: df 93 push r29
4: 1f 92 push r1
6: cd b7 in r28, 0x3d ; 61
8: de b7 in r29, 0x3e ; 62
a: 8f ef ldi r24, 0xFF ; 255
c: 89 83 std Y+1, r24 ; 0x01
e: 89 81 ldd r24, Y+1 ; 0x01
10: 84 b9 out 0x04, r24 ; 4
12: 0f 90 pop r0
14: df 91 pop r29
16: cf 91 pop r28
18: 08 95 ret
void __attribute__((noinline)) setOutputPins() {
volatile uint8_t dir = 0b11111111;
DDRB = dir;
}
To access a variable in stack space, we must use an index register as a frame pointer. In this example, index register Y (R29:R28) is used. Since this will clobber the original content of the index register, we must preserve its content by pushing them into the stack at the beginning of this function.
Push an arbitrary value (here, R1) into the stack to allocate 1 byte of space for the local variable uint8_t dir.
Copy the stack pointer SP to index register Y to create the frame pointer.
Use immediate addressing mode to load the value 0b11111111 into a temporary register R24; then, write that value into the local variable via the frame pointer. In the AVR-GCC ABI, R24 is a call-used register; hence, it doesn’t need to be preserved.
Load that variable dir back from the stack into register R24; then, write to the IO register DDRB. Note, in the disassembled code, the IO register DDRB has the correct address 0x04, because we included the avr/io.h header and compiled it against the specified microprocessor name.
Deallocate the stack, restore the original value of the index register Y, and return from this function.
Main Function
Disassembly of section .text.something:
00000000 <something>:
0: 0e 94 00 00 call 0 ; 0x0 <something>
0: R_AVR_CALL .text.setOutputPins
4: e0 e0 ldi r30, 0x00 ; 0
4: R_AVR_LO8_LDI .progmem.data+0x2
6: f0 e0 ldi r31, 0x00 ; 0
6: R_AVR_HI8_LDI .progmem.data+0x2
8: e4 91 lpm r30, Z
a: e0 93 00 00 sts 0x0000, r30 ; 0x800000 <__SREG__+0x7fffc1>
c: R_AVR_16 some_data1
e: 00 c0 rjmp .+0 ; 0x10 <__zero_reg__+0xf>
e: R_AVR_13_PCREL .text.something+0xe
void __attribute__((naked)) something() {
setOutputPins();
some_data1 = pgm_read_byte(&(my_array[2]));
for(;;);
}
Call the function setOutputPins(). Note, we have yet to link; so, the function address in program space is unknown. A placeholder with address zero is inserted.
Use index register Z (R31:R30) to load a value from program space. Again, we have yet to link; so, the data address in program space is unknown. A placeholder with address zero is inserted.
Write that value to a static variable using the direct addressing mode instruction sts addr, Rs. Again, we have yet to link; so, the data address in data space is unknown. A placeholder with address zero is inserted.
At the end, a dead loop. Annnnnnnnnnnd again, we have yet to link; so, the instruction address of the dead loop (the current instruction) in program space is unknown. A placeholder with address zero is inserted. In fact, this one could be resolved right now using relative addressing; but the compiler decides to leave it for the linker.
Manual Link
Now, let's create our linker script:
main.ld
MEMORY {
my_code(rx): ORIGIN = 0x00, LENGTH = 32K
my_data(rw): ORIGIN = 0x800100, LENGTH = 2K
}
SECTIONS {
. = ORIGIN(my_code);
.text : {
main.o(.text.something)
*(.text)
} >my_code
. = ORIGIN(my_data);
.data : {
*(.data)
} >my_data
}
This is exactly the same as the linker script we used in the previous example, except for the additional line main.o(.text.something). We instruct the linker to place main.o(.text.something) at the beginning of the text segment. In other words, function something() from the .text.something section of object file main.o is placed at program space address 0x0000. After reset, the microprocessor starts executing the program from this address, which effectively makes main.o(.text.something) the entry point.
avr-ld -T main.ld main.o -o app.elf
avr-objdump -Dx app.elf
Link. Disassemble to examine the result:
Disassembly of section .text:
00000000 <something>:
0: 0e 94 08 00 call 0x10 ; 0x10 <setOutputPins>
4: ec e2 ldi r30, 0x2C ; 44
6: f0 e0 ldi r31, 0x00 ; 0
8: e4 91 lpm r30, Z
a: e0 93 00 01 sts 0x0100, r30 ; 0x800100 <some_data1>
e: ff cf rjmp .-2 ; 0xe <__zero_reg__+0xd>
Disassembly of section .text.setOutputPins:
00000010 <setOutputPins>:
10: cf 93 push r28
12: df 93 push r29
14: 1f 92 push r1
16: cd b7 in r28, 0x3d ; 61
18: de b7 in r29, 0x3e ; 62
1a: 8f ef ldi r24, 0xFF ; 255
1c: 89 83 std Y+1, r24 ; 0x01
1e: 89 81 ldd r24, Y+1 ; 0x01
20: 84 b9 out 0x04, r24 ; 4
22: 0f 90 pop r0
24: df 91 pop r29
26: cf 91 pop r28
28: 08 95 ret
Disassembly of section .progmem.data:
0000002a <my_array>:
2a: 02 03 mulsu r16, r18
2c: 05 07 cpc r16, r21
Disassembly of section .bss:
00800100 <some_data1>:
...
00800101 <some_data2>:
...
As we can see, function something() is placed at the beginning of the text segment at program address 0x0000. Furthermore, correct addresses are assigned to the symbols used in something(). For example, call 0x10 at program address 0x0000 now has the correct address of routine setOutputPins() (0x0010).
Mixing C and Assembly Code
In the following example, we will build a project from a mix of both assembly code and C code. This is helpful when a specific language is favored for different sections of the project. For example, we can use C to write most parts of the project, and use assembly for ISRs (interrupt service routines) and time-critical routines.
In this example, we will use 2 PWM signals to drive LEDs. A timer overflow is used to change the intensity of the LEDs (duty of PWM) periodically. We will have 4 source code files for this project:
- vector.asm - An assembly code specifying the vector table.
- startup.c - A C function to initialize the IO registers and peripherals.
- task.c - A C function to modify the PWM signals.
- timer.asm - An assembly ISR to change the duty cycle of the PWM signals, executed periodically by timer overflow.
And 1 linker script:
- make.ld - Linker script.
vector.asm - Assembly code specifying the vector table
vector.asm
.text
. = 0
rjmp startup
. = 5 * 2
rjmp timer
. = 8 * 2
rjmp timer
According to the ATtiny24/44/84 [DATASHEET], the reset vector is at program address 0x0000, the timer 1 capture interrupt vector at 0x0005, and the timer 1 overflow vector at 0x0008. When a specific interrupt event is fired, the microprocessor hardware will jump to its associated vector. We place an rjmp instruction at each of those locations to instruct the microprocessor to jump to the corresponding ISR.
The datasheet uses word addresses, and each program word is 2 bytes long. The GNU AVR assembler uses byte addresses; therefore, we need to multiply the address by 2 when specifying a program space address in GNU avr-as.
Relative jump rjmp, which is 16 bits long (12-bit signed offset), can address the entire program space on devices with no more than 4K words (8 KB) of program memory, such as ATtiny25 and ATtiny84. On these devices, each vector is 1 word long and uses the rjmp instruction. On devices with more than 4K words of program space, such as ATmega328, the vector table uses the absolute jump jmp instead, which is 32 bits long (22-bit address) and can address the entire 4M-word program space. On these devices, each vector is 2 words long.
For example, after reset, the hardware will cause the CPU to go to program address 0x0000. The instruction we placed at program address 0x0000 is rjmp startup, which will cause the CPU to jump to the address represented by startup. The actual address of startup is unknown for now; it will be resolved at the linking stage.
Assemble with: avr-as vector.asm -mmcu=attiny44 -o vector.o
startup.c - C Function to Initialize the IO Registers and Peripherals
startup.c
#include <avr/io.h>
#include <avr/interrupt.h>
#include "task.h"
extern volatile uint8_t timer_a_scale, timer_b_scale;
void startup() {
SP = 0x015F; // Top of t44
asm("clr r1");
DDRA = 0b10000000; //Output on PA7 and PB2
DDRB = 0b0100;
TCCR0A = (2<<COM0A0) | (2<<COM0B0) | (3<<WGM00); // Fast PWM
TCCR0B = (1<<CS00); // clk/1
OCR0A = 0;
OCR0B = 0;
TCCR1B = (3<<WGM12) | (4<<CS10); // CTC on ICR1, clk / 256
ICR1 = 200;
TIMSK1 = (1<<ICIE1) | (1<<TOIE1); // Interrupt on overflow
timer_a_scale = 4;
timer_b_scale = 16;
task_mailbox = 0; // Clear mailbox
sei();
task_loop();
}
Recall that, during initialization, the stack pointer should be set to the top of the stack, the zero register R1 should be cleared, and the defined static variables should be assigned their initial values. We do all of this in our startup() function.
To allow PWM output, we need to set the associated pins to output mode (by default, all pins are inputs after reset to prevent short circuits). Furthermore, we set up the timer peripheral so that the hardware generates the PWM signals for us. In this example, we set timer 0 to fast PWM mode and output a PWM signal on both channels.
To allow periodic events, we need to enable the timer overflow interrupt on timer 1. In this example, we let timer 1 run at 1/256 of the CPU speed (about 3.9kHz with a 1MHz CPU). We set the top of the timer to 200, which means the timer should overflow at a rate of about 3.9kHz / 200 ≈ 19.5Hz.
We introduce 2 external variables: timer_a_scale and timer_b_scale. They define how much to add to the light intensities (PWM duty cycles) every time the timer overflows. We use 4 for timer_a_scale and 16 for timer_b_scale; in other words, the intensity of LED A is increased by 4 and the intensity of LED B is increased by 16 each time. Furthermore, because the MCU uses 8-bit wide registers for PWM duty cycles, PWM A will overflow (and wrap back) every 256 / 4 = 64 events; PWM B will overflow every 256 / 16 = 16 events. Because the timer overflows at a rate of about 19.5Hz, we should see the intensity of LED A and LED B wrap back about every 3.3s and 0.8s, respectively.
Clear the mailbox variable task_mailbox. This variable is not defined in startup.c itself; it is brought in by the included header task.h, and its final address is assigned at link time.
Finally, we enable global interrupts and enter the main loop of this program, the task_loop().
Compile with: avr-gcc -c -O3 -mmcu=attiny44 startup.c -o startup.o
task.c - C Function to Modify the PWM Signals
task.h
#include <stdint.h>
volatile uint8_t task_a, task_b, task_mailbox;
void task_loop();
Declare 3 global variables: task_a, task_b and task_mailbox. task_a and task_b contain the new intensities of the LEDs; task_mailbox is used to request an update of the PWM peripherals with the new intensity values.
task.c
#include <avr/io.h>
#include "task.h"
void task_loop() {
for(;;) {
if (!task_mailbox)
continue;
OCR0A = task_a;
OCR0B = task_b;
task_mailbox = 0;
}
}
This function is a dead loop and should be considered the main task of the program.
If the mailbox is set, this function copies the values of task_a and task_b into the PWM peripheral registers OCR0A and OCR0B, which sets the new PWM duty cycles; then, it clears the mailbox.
Otherwise, it does nothing.
Compile with: avr-gcc -c -O3 -mmcu=attiny44 task.c -o task.o
timer.asm - Assembly Timer Overflow ISR to Update PWM Duty Cycle
timer.asm
.data
.global timer_a_scale
.global timer_b_scale
timer_a_scale: .space 1
timer_b_scale: .space 1
.text
.global timer
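timer:
push r30
push r31
in r30, 0x3F ; save SREG, because the add instructions below change the flags
push r30
lds r31, timer_a_scale
lds r30, task_a
add r30, r31
sts task_a, r30
lds r31, timer_b_scale
lds r30, task_b
add r30, r31
sts task_b, r30
ldi r31, 0xFF
sts task_mailbox, r31
pop r30
out 0x3F, r30 ; restore SREG for the interrupted code
pop r31
pop r30
reti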
Two global variables are defined: timer_a_scale and timer_b_scale, discussed in startup.c.
Every time this ISR fires, it adds the scales timer_a_scale and timer_b_scale to the PWM intensities task_a and task_b; then, it sets the mailbox task_mailbox (declared in task.h) to tell task_loop() in task.c to update the PWM peripherals.
Assemble with: avr-as timer.asm -mmcu=attiny44 -o timer.o
make.ld - Linker Script
make.ld
MEMORY {
my_code(rx): ORIGIN = 0x00, LENGTH = 4K
my_data(rw): ORIGIN = 0x800060, LENGTH = 256
}
SECTIONS {
. = ORIGIN(my_code);
.text : {
vector.o(.text)
*(.text)
} >my_code
. = ORIGIN(my_data);
.data : {
*(.data)
} >my_data
}
Put the .data segments and the .text segments from all the input files together. We specifically tell the linker to place the vector table from vector.o at the beginning of the program space, so that the hardware jumps to the correct address after reset and on interrupts.
Link with: avr-ld -T make.ld vector.o startup.o timer.o task.o -o app.elf
The generated executable app.elf is ready to be burned into the MCU.