Machine Code Generated by AVR-GCC
In this article, a number of experiments are used to examine the machine code generated by AVR-GCC, and possible reasons are given for why AVR-GCC generates machine code in this way.
AVR-GCC, C language, disassembly, compiler, ABI, AVR, ISA, Application Binary Interface, Instruction Set Architecture, Stack, Memory layout, Static variable
--by Captdam @ Nov 18, 2024
The most basic - General AVR
Let's start from a simple C code:
avr-hello.c
volatile char a = 'A', b, c = 'C', d;
void main() {
volatile char z = 'Z';
}
In this example, we create 4 global variables a, b, c and d, 2 of them initialized; and a local (function-scope) variable z.
We use the volatile keyword to prevent the compiler from optimizing the variables away. We will discuss this in detail later.
avr-gcc -O0 avr-hello.c -o avr-hello.o
avr-objdump -m avr2 -Dxs avr-hello.o
Compile the source code avr-hello.c
into object file avr-hello.o
. We specify -O0
for no optimization to force the compiler to translate the C source code into machine code line by line, so we can study the result. Otherwise, the compiler may do a lot of tricks and make the generated code extremely hard to analyze.
In general, we should use -mmcu
to specify the architecture (machine) we are gonna use so the compiler can look into the address of IO registers, memory size, and supported instruction set (ISA) of the given architecture. We didn’t do so; therefore, the compiler will use default architecture avr2
.
Disassemble all sections of the object file avr-hello.o
. The disassembler requires us to specify the architecture of the object file. We use -m avr2
to tell the disassembler the ISA; so, the disassembler can decode the machine code correctly.
To get all information we want, we use 3 flags -Dxs
:
-D
- Disassemble all sections. This will translate the machine code from binary to human-readable assembly code. Helpful for reading the text segment (program).
-x
- Show all headers. This will give us the section list and the symbol table. Helpful for finding variables and functions in the object file.
-s
- Show full content. This will display the content of all sections in binary format and ASCII format. Helpful for reading the content of data.
Following shows the content of the object file. Let’s divide it into segments:
Data in RAM - Static and Stack
volatile char a = 'A', b, c = 'C', d;
SYMBOL TABLE:
00800060 l d .data 00000000 .data
00800062 l d .bss 00000000 .bss
00800062 g O .bss 00000001 b
00800061 g O .data 00000001 c
00800063 g O .bss 00000001 d
00800060 g O .data 00000001 a
The above shows the partial content of the symbol table. There are 2 segments for static (global) variables: the .data
segment for initialized (we provided initial value in the source code) static variables and the .bss
segment for uninitialized static variables.
Not to be confused with static
variables in C. In this article, static
means the variable is allocated statically, its address is fixed in the data space and it is known at compile time. In contrast, static
in C has different meanings: global static
variable means visible in file-scope; in-function static
variable means allocated only once.
The first is the .data
segment, which starts at the beginning of the RAM. In this example (for avr2 family), the first 96 bytes of memory space are used by 32 general purpose registers (GPR R0 - R31) and 64 IO registers (Note: some of the addresses are not used). Therefore, the first slot in the data segment is at address 96
(Hex 0x60
).
After the .data
segment is the .bss
segment. In this example, we have 2 initialized static variables a
and c
, both are char
type and consume 1 byte of space; therefore, the size of the .data
segment will be 2 bytes. Using the start address of the .data
segment 0x60
plus the size of the .data
segment 0x02
, we can see the start address of the .bss
segment should be 0x62
.
Although we declared b
before c
, c
’s address is less than b
’s address in the symbol table. AVR-GCC changes the order of variables to group the variables into segments.
The static (global) and the stack (local) variables
For C, only the global and in-function static
variables are placed into the .data
segment or the .bss
segment. Their lifetime is through the entire duration of the program; therefore, they can be allocated at the start of the program, and they won’t be deallocated until the program ends. In other words, they will occupy that space forever. So, we are able to assign them a fixed address in the data space.
In contrast, in-function local variables cannot be allocated in the .data
segment or the .bss
segment. They are created and allocated when their parent function starts and destroyed when it returns. They are allocated on the stack. When a function is called, it starts by reserving a region of memory on the stack (moving the stack pointer) to store its local variables. A function may be called at any time, and the stack pointer at the time of the call may be at any location; therefore, it is impossible to predict the address of the in-function local variables at compile time. The only way to address them is to use the stack pointer to calculate their address on-the-fly.
This rule applies not only to AVR-GCC, but also to compilers for PCs.
Program in Flash - Initialize .data
segment
Disassembly of section .text:
00000000 <__ctors_end>:
0: 10 e0 ldi r17, 0x00 ; 0
2: a0 e6 ldi r26, 0x60 ; 96
4: b0 e0 ldi r27, 0x00 ; 0
6: e0 e4 ldi r30, 0x40 ; 64
8: f0 e0 ldi r31, 0x00 ; 0
a: 03 c0 rjmp .+6 ; 0x12 <__zero_reg__+0x11>
c: c8 95 lpm
e: 31 96 adiw r30, 0x01 ; 1
10: 0d 92 st X+, r0
12: a2 36 cpi r26, 0x62 ; 98
14: b1 07 cpc r27, r17
16: d1 f7 brne .-12 ; 0xc <__zero_reg__+0xb>
After reset, the content in RAM for data memory is undefined. The main()
function expects the static variables a
and c
already in memory with the specified initial values. Therefore, these values must be prepared before main() runs. To do so, AVR-GCC puts the initial values of the initialized variables in ROM and copies them into RAM at the initialization stage.
In this example, the values are saved in program address 0x0040
(Why? See Data stored in program), the .data
segment is located at RAM address 0x0060
and ends at RAM address 0x0062
(not included). Index register Z (R31:R30) is used to read data from program space using the special lpm
(Load program memory, this instruction only works with index register Z) instruction; index register X (R27:R26) is used to write data to data space. A for loop is used, starting from the beginning of the data segment, ending at the tail of the data segment. We can use the following pseudo-code to illustrate this section:
uint8_t* ptr_data = &(data_segment);
uint8_t* ptr_program = &(init_value_in_program);
while (ptr_data < &(data_segment_end)) {
*ptr_data = *ptr_program;
ptr_data++;
ptr_program++;
}
To compare the 16-bit address using 8-bit ALU, AVR must compare the lower byte first. In this example, the address is known at compile time, so the lower byte of the address can be encoded in the instruction cpi
(Compare with immediate). To compare the higher byte, compare with carry should be used. AVR doesn’t have instructions for “compare with immediate with carry”. Therefore, instruction cpc
(Compare with register with carry) is used. The higher byte is loaded into register R17.
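The way cpi and cpc combine into one 16-bit equality test can be sketched on a PC. The following is a minimal host-side model, not AVR code; the function name ptr_equals_end is made up for this illustration, and only the carry and zero flags are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of "cpi Rlow, K" followed by "cpc Rhigh, Rr".
 * Returns true when the Z flag would be set after both instructions,
 * i.e. when the 16-bit pointer equals the 16-bit end address. */
bool ptr_equals_end(uint16_t ptr, uint16_t end)
{
    uint8_t ptr_lo = (uint8_t)(ptr & 0xFF), ptr_hi = (uint8_t)(ptr >> 8);
    uint8_t end_lo = (uint8_t)(end & 0xFF), end_hi = (uint8_t)(end >> 8);

    /* cpi: ptr_lo - end_lo; C is set on borrow, Z is set on a zero result */
    bool carry = ptr_lo < end_lo;
    bool zero = (uint8_t)(ptr_lo - end_lo) == 0;

    /* cpc: ptr_hi - end_hi - C; Z stays set only if it was already set
     * and this result is zero as well */
    uint8_t hi_result = (uint8_t)(ptr_hi - end_hi - (carry ? 1 : 0));
    zero = zero && (hi_result == 0);

    return zero; /* brne branches back into the loop while this is false */
}
```

With the addresses from the listing, ptr_equals_end(0x0060, 0x0062) is false, so the loop keeps copying, and ptr_equals_end(0x0062, 0x0062) is true, so the loop ends.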
Program in Flash - Initialize .bss
segment
00000018 <__do_clear_bss>:
18: 20 e0 ldi r18, 0x00 ; 0
1a: a2 e6 ldi r26, 0x62 ; 98
1c: b0 e0 ldi r27, 0x00 ; 0
1e: 01 c0 rjmp .+2 ; 0x22 <.do_clear_bss_start>
00000020 <.do_clear_bss_loop>:
20: 1d 92 st X+, r1
00000022 <.do_clear_bss_start>:
22: a4 36 cpi r26, 0x64 ; 100
24: b2 07 cpc r27, r18
26: e1 f7 brne .-8 ; 0x20 <.do_clear_bss_loop>
The uninitialized variables are allocated in the .bss
segment. Because we expect them to be zero, AVR-GCC will clear them for us.
In this example, the .bss
segment is located at RAM address 0x0062
and ends at RAM address 0x0064
(not included). Zero register R1 is used to hold the value (0) to write, index register X (R27:R26) is used to write data to data space. A for loop is used, starting from the beginning of the .bss segment, ending at the tail of the .bss segment. We can use the following pseudo-code to illustrate this section:
uint8_t* ptr_bss = &(bss_segment);
while (ptr_bss < &(bss_segment_end)) {
*ptr_bss = 0;
ptr_bss++;
}
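This loop relies on R1 holding 0, but this build contains no instruction that clears R1. The general purpose registers are not guaranteed to be 0 after reset, which is why the device-specific startup code shown later clears R1 explicitly (eor r1, r1) before using it as the zero register.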
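The two initialization loops can be tried out together on a PC. This is only a sketch under stated assumptions: the arrays ram and flash_init and the function startup_model are hypothetical stand-ins for the data space, the bytes stored at program address 0x0040, and the startup code; the addresses are the ones read from the listings above.

```c
#include <stdint.h>
#include <string.h>

#define DATA_START 0x60 /* start of .data, from the symbol table */
#define DATA_END   0x62 /* end of .data, start of .bss */
#define BSS_END    0x64 /* end of .bss */

static uint8_t ram[0x100];                        /* model of the data space */
static const uint8_t flash_init[] = { 'A', 'C' }; /* model of flash at 0x0040 */

void startup_model(void)
{
    memset(ram, 0xAA, sizeof ram); /* stand-in for undefined RAM after reset */

    uint16_t x = DATA_START;       /* index register X */
    uint16_t z = 0;                /* index register Z, as offset into flash_init */

    while (x != DATA_END)          /* copy the initial values of .data */
        ram[x++] = flash_init[z++];

    while (x != BSS_END)           /* clear .bss */
        ram[x++] = 0;
}
```

After startup_model() returns, ram[0x60] and ram[0x61] hold 'A' and 'C', ram[0x62] and ram[0x63] hold 0, and everything from ram[0x64] onward still holds the 0xAA filler.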
Program in Flash - User code
void main() {
volatile char z = 'Z';
}
00000028 <main>:
28: cf 93 push r28
2a: df 93 push r29
2c: 1f 92 push r1
2e: cd b7 in r28, 0x3d ; 61
30: de b7 in r29, 0x3e ; 62
32: 8a e5 ldi r24, 0x5A ; 90
34: 89 83 std Y+1, r24 ; 0x01
36: 00 00 nop
38: 0f 90 pop r0
3a: df 91 pop r29
3c: cf 91 pop r28
3e: 08 95 ret
Our main function. In the C source code, we assign ASCII code ‘Z’ to the 1-byte size local variable z
.
In-function local variables are saved in the stack. Their address is relative to the address of the function frame; hence, they can only be addressed using the frame pointer. AVR doesn't have a native frame pointer. However, it is possible to read the value of the stack pointer (which is the frame address after allocating all the local variables on the stack) into an index register. Then, we can use that index register to access local variables in the stack.
In this example, index register Y (R29:R28) is used as the frame pointer, and the following steps are performed:
- Push the original value of index register Y into the stack to preserve it. R29 and R28 are call-saved registers.
- Use push with an irrelevant register to allocate 1 byte of space in the stack.
- Copy the stack pointer (IO address 0x3E:0x3D) to index register Y. Because the AVR stack pointer points to the next available slot and grows downwards, the content of the stack pointer is 1 byte lower than the address of variable z.
- Load the ASCII code of 'Z' (0x5A) into register R24; then, write R24 to the address pointed to by index register Y with offset +1.
- At the end of the function, deallocate the stack and return. Destroy the local variable z by popping 1 byte of data into a free-to-overwrite register R0. Restore index register Y.
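The steps above can be replayed with a small host-side model of the stack. Everything here is an assumption made for illustration: the names ram, sp, push, pop and run_frame are hypothetical, the stack pointer is modelled as a single byte, and the starting value passed to run_frame is arbitrary.

```c
#include <stdint.h>

/* Hypothetical model of the AVR stack: push stores at SP and then
 * decrements SP; pop increments SP and then loads from it. */
static uint8_t ram[256];
static uint8_t sp;

static void push(uint8_t value) { ram[sp] = value; sp--; }
static uint8_t pop(void) { sp++; return ram[sp]; }

/* Mirrors the prologue, body and epilogue of main() and returns the
 * address at which the local variable z was stored. */
uint8_t run_frame(uint8_t initial_sp)
{
    uint8_t r28 = 0x12, r29 = 0x34; /* arbitrary caller values of Y */
    sp = initial_sp;

    push(r28);                      /* push r28 */
    push(r29);                      /* push r29 */
    push(0);                        /* push r1: reserves 1 byte for z */

    uint8_t y = sp;                 /* in r28, 0x3d: Y = SP */
    uint8_t z_address = (uint8_t)(y + 1);
    ram[z_address] = 'Z';           /* std Y+1, r24 */

    (void)pop();                    /* pop r0: releases the byte of z */
    r29 = pop();                    /* pop r29 */
    r28 = pop();                    /* pop r28 */
    (void)r28;
    (void)r29;
    return z_address;
}
```

Starting from an assumed stack pointer of 0xDF, the three pushes use 0xDF, 0xDE and 0xDD, the stack pointer ends at 0xDC, and z lives at Y+1 = 0xDD, which is exactly the slot reserved by the third push.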
Register Layout
The AVR-GCC ABI (Application Binary Interface) says:
Fixed Registers are registers that won't be allocated by GCC's register allocator. Registers R0 and R1 are fixed and used implicitly while printing out assembler instructions.
The call-used or call-clobbered general purpose registers (GPRs) are registers that might be destroyed (clobbered) by a function call.
The remaining GPRs are call-saved, i.e. a function that uses such a registers must restore its original content. This applies even if the register is used to pass a function argument.
Data in Flash
As mentioned above, AVR-GCC will put the default value of defined variables in ROM and copy them into RAM at the initialization stage. AVR-GCC will place the content at the end of the .text
segment. In this example, the last instruction is at program address 0x003E
; therefore, the initial value of the .data
segment will be the next slot, which is program address 0x0040
. Note, the program address is 16-bit aligned because AVR instructions are 2-byte or 4-byte long.
The object file doesn’t show the initial value of the .data
segment. We can verify this by converting the object file into a hex file, which is ready to be burned into the microprocessor.
avr-objcopy -O ihex avr-hello.o avr-hello.hex
This converts the object file avr-hello.o
into avr-hello.hex
using ihex
(Intel HEX) format.
Open the hex file avr-hello.hex
:
:1000000010E0A0E6B0E0E0E4F0E003C0C89531966F
:100010000D92A236B107D1F720E0A2E6B0E001C010
:100020001D92A436B207E1F7CF93DF931F92CDB7AD
:10003000DEB78AE5898300000F90DF91CF910895A4
:0200400041433A
:00000001FF
The second last line shows: 0x02
byte at address 0x0040
. First byte is 0x41
, which is the ASCII code of ‘A’, the initial value of static variable a
. Second byte is 0x43
, which is the ASCII code of ‘C’, the initial value of static variable c
. Checksum of this line is 0x3A
.
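The checksum of an Intel HEX record is the two's complement of the low byte of the sum of all preceding bytes in the record. As a quick check of the values quoted above, here is a minimal sketch; the function name ihex_checksum and the array names are chosen for this illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Two's complement of the low byte of the sum of all record bytes that
 * come before the checksum (byte count, address, record type, data). */
uint8_t ihex_checksum(const uint8_t *bytes, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + bytes[i]);
    return (uint8_t)(0x100 - sum);
}

/* ":0200400041433A" without the trailing checksum byte */
static const uint8_t data_record[] = { 0x02, 0x00, 0x40, 0x00, 0x41, 0x43 };

/* ":00000001FF" without the trailing checksum byte */
static const uint8_t eof_record[] = { 0x00, 0x00, 0x00, 0x01 };
```

For data_record the bytes sum to 0xC6, so the checksum is 0x100 - 0xC6 = 0x3A, matching the hex file. For eof_record the result is 0xFF, matching the last line.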
Compile for a specific microprocessor
In the previous example, we compiled a program against a general AVR. However, when we program a microprocessor, we want to have the program run on a real chip and have some physical outputs.
In the next example, we will create a simple program that writes the IO registers of one of my favorite, and the smallest classic AVR, the ATtiny25. This chip comes with 6 IOs, 128 bytes of data memory (RAM) and 2K bytes of program memory (Flash).
t25-hello.c
#include <avr/io.h>
volatile char a = 'A';
volatile char b;
void main() {
volatile char z = 'Z';
DDRB = 0b00111111;
PORTB = 0b00111111;
}
The above C example is very similar to the last general AVR example. Before the main() function, we provide an initialized static variable volatile char a = 'A' and an uninitialized static variable volatile char b. In the main() function, we assign a value to a local variable volatile char z = 'Z'. Furthermore, we write 0b00111111 to IO register DDRB to set all pins on PORT B to output, and write 0b00111111 to IO register PORTB to set all of these pins to high.
Because IO registers are specific to each microprocessor (general computers don't have these IO registers, and different microprocessors have different ones), we need to include the special header avr/io.h.
avr-gcc -O0 t25-hello.c -mmcu=attiny25 -o t25-hello.o
avr-objdump -m avr25 -Dsx t25-hello.o
Compile the source code t25-hello.c
into object file t25-hello.o
. This time, we use -mmcu=attiny25
to specify the architecture (machine), so the compiler can look into the address of IO registers, memory size, and supported instruction set (ISA) of the given architecture.
Disassemble all sections of the object file t25-hello.o
. The disassembler requires us to specify the architecture of the object file. We use -m avr25
to tell the disassembler the ISA according to the AVR-GCC manual.
We use the same 3 flags as before (written here as -Dsx; the order of the flags does not matter).
Following shows the content of the object file. Let’s divide it into segments:
Static data
volatile char a = 'A';
volatile char b;
SYMBOL TABLE:
00800060 l d .data 00000000 .data
00800062 l d .bss 00000000 .bss
00800062 g O .bss 00000001 b
00800060 g O .data 00000001 a
The above shows the partial content of the symbol table. The .data
segment starts at the beginning of the RAM, which is at address 96
(Hex 0x60
) for ATtiny25. After this is the .bss
segment, which starts at address 0x62
.
Although there is only one initialized static variable, AVR-GCC allocates 2 bytes for the .data
segment. As we can see, the address of the .bss
segment is 2 bytes after the .data
segment. We can also confirm the size of the .data
segment in the section table:
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000082 00000000 00000000 00000094 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000002 00800060 00000082 00000116 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000001 00800062 00800062 00000118 2**0
ALLOC
If we declare 2 initialized static variables, the size of .data
segment will be 2 bytes; if we declare 3 initialized static variables, the size of the .data
segment will be 4 bytes. In conclusion, AVR-GCC aligns the .data
segment by 2 bytes. In contrast, the .bss
segment is not aligned. This may be because the initial values of the .data
segment are stored in the program space and the program is aligned by 2 bytes.
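The observed sizes can be summarized as rounding up to the next multiple of 2. The sketch below only models that observation, not the linker's actual implementation; the function names are made up for this illustration.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of 2, matching the .data
 * sizes observed above (1 -> 2, 2 -> 2, 3 -> 4). */
uint16_t data_size_aligned(uint16_t initialized_bytes)
{
    return (uint16_t)((initialized_bytes + 1u) & ~1u);
}

/* .bss follows .data directly, so it starts at data_start + aligned size. */
uint16_t bss_start(uint16_t data_start, uint16_t initialized_bytes)
{
    return (uint16_t)(data_start + data_size_aligned(initialized_bytes));
}
```

For this example, bss_start(0x60, 1) gives 0x62, which is the address of the .bss segment shown in the symbol table.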
Program in Flash - Vector table
Disassembly of section .text:
00000000 <__vectors>:
0: 0e c0 rjmp .+28 ; 0x1e <__ctors_end>
2: 26 c0 rjmp .+76 ; 0x50 <__bad_interrupt>
4: 25 c0 rjmp .+74 ; 0x50 <__bad_interrupt>
6: 24 c0 rjmp .+72 ; 0x50 <__bad_interrupt>
8: 23 c0 rjmp .+70 ; 0x50 <__bad_interrupt>
a: 22 c0 rjmp .+68 ; 0x50 <__bad_interrupt>
c: 21 c0 rjmp .+66 ; 0x50 <__bad_interrupt>
e: 20 c0 rjmp .+64 ; 0x50 <__bad_interrupt>
10: 1f c0 rjmp .+62 ; 0x50 <__bad_interrupt>
12: 1e c0 rjmp .+60 ; 0x50 <__bad_interrupt>
14: 1d c0 rjmp .+58 ; 0x50 <__bad_interrupt>
16: 1c c0 rjmp .+56 ; 0x50 <__bad_interrupt>
18: 1b c0 rjmp .+54 ; 0x50 <__bad_interrupt>
1a: 1a c0 rjmp .+52 ; 0x50 <__bad_interrupt>
1c: 19 c0 rjmp .+50 ; 0x50 <__bad_interrupt>
When a specific event happens (for example, power up, watchdog timeout, timer overflow, ADC conversion finished), and both the interrupt for that event and the global interrupt enable are enabled (reset needs no enabling), the hardware jumps to the program address associated with that event.
In this example, the first vector at program address 0x0000
is for reset, which leads to __ctors_end
at program address 0x001E
. It is followed by 14 other vectors (timer events, ADC events, communication events and so on), all of which lead to __bad_interrupt
at program address 0x0050
.
AVR-GCC will generate a full vector table if a machine is specified, even though some vectors are not used. In this example, only the first vector (reset) is used; but AVR-GCC creates vectors for all other 14 events. If we remove the unused vectors (by linking by ourselves), we can save some program space.
AVR-GCC copies the vector table from a pre-built object file at avr-gnu/avr/lib/[family]/[tiny-stack/]crt[machine].o
, where family
in the directory name is the family name, tiny-stack
in the directory name is present for chips with less than 256 bytes of memory (the stack pointer has only 1 byte), machine
in the file name specifies the name of the chip. In this example, we are using ATtiny25 from the avr25 family and with 128 bytes of memory, so, we can find this file at avr-gnu/avr/lib/avr25/tiny-stack/crtattiny25.o
. Dump this file with command avr-objdump -m avr25 -dx crtattiny25.o
, we get:
SYMBOL TABLE:
00000000 g .vectors 00000000 __vectors
00000000 w .text 00000000 __vector_1
00000000 g .text 00000000 __bad_interrupt
00000000 w .text 00000000 __vector_2
00000000 w .text 00000000 __vector_3
00000000 w .text 00000000 __vector_4
00000000 w .text 00000000 __vector_5
00000000 w .text 00000000 __vector_6
00000000 w .text 00000000 __vector_7
00000000 w .text 00000000 __vector_8
00000000 w .text 00000000 __vector_9
00000000 w .text 00000000 __vector_10
00000000 w .text 00000000 __vector_11
00000000 w .text 00000000 __vector_12
00000000 w .text 00000000 __vector_13
00000000 w .text 00000000 __vector_14
Disassembly of section .vectors:
00000000 <__vectors>:
0: 00 c0 rjmp .+0 ; 0x2 <__vectors+0x2>
0: R_AVR_13_PCREL __init
2: 00 c0 rjmp .+0 ; 0x4 <__vectors+0x4>
2: R_AVR_13_PCREL __vector_1
4: 00 c0 rjmp .+0 ; 0x6 <__vectors+0x6>
4: R_AVR_13_PCREL __vector_2
6: 00 c0 rjmp .+0 ; 0x8 <__vectors+0x8>
6: R_AVR_13_PCREL __vector_3
8: 00 c0 rjmp .+0 ; 0xa <__vectors+0xa>
8: R_AVR_13_PCREL __vector_4
a: 00 c0 rjmp .+0 ; 0xc <__vectors+0xc>
a: R_AVR_13_PCREL __vector_5
c: 00 c0 rjmp .+0 ; 0xe <__vectors+0xe>
c: R_AVR_13_PCREL __vector_6
e: 00 c0 rjmp .+0 ; 0x10 <__vectors+0x10>
e: R_AVR_13_PCREL __vector_7
10: 00 c0 rjmp .+0 ; 0x12 <__vectors+0x12>
10: R_AVR_13_PCREL __vector_8
12: 00 c0 rjmp .+0 ; 0x14 <__vectors+0x14>
12: R_AVR_13_PCREL __vector_9
14: 00 c0 rjmp .+0 ; 0x16 <__vectors+0x16>
14: R_AVR_13_PCREL __vector_10
16: 00 c0 rjmp .+0 ; 0x18 <__vectors+0x18>
16: R_AVR_13_PCREL __vector_11
18: 00 c0 rjmp .+0 ; 0x1a <__vectors+0x1a>
18: R_AVR_13_PCREL __vector_12
1a: 00 c0 rjmp .+0 ; 0x1c <__vectors+0x1c>
1a: R_AVR_13_PCREL __vector_13
1c: 00 c0 rjmp .+0 ; 0x1e <__FUSE_REGION_LENGTH__+0x1b>
1c: R_AVR_13_PCREL __vector_14
This pre-built object file provides a set of weak symbols for the vectors. If the user provides an ISR (for example, ISR (TIMER0_COMPA_vect) {...}), AVR-GCC places the program address of the user-supplied ISR in the vector table; otherwise, the weak symbol of that vector resolves to the address of __bad_interrupt.
Relative jump rjmp
, which is 16 bits long (12-bit signed offset), can reach the entire program space on devices with up to 4K words of program memory, such as ATtiny25 and ATtiny84. On these devices, each vector is 1 word long. On the other hand, devices with more than 4K words of program memory, such as ATmega328, use the jump jmp
instruction, which is 32 bits long (22-bit address) and can address the entire 4M words program space. On these devices, each vector is 2 words long.
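To see how the 12-bit offset turns into the target addresses printed by the disassembler, the rjmp encoding (1100 kkkk kkkk kkkk, target = PC + 1 + k in words) can be decoded by hand. The following is a small sketch; the function name rjmp_target is made up for this illustration, and the wrap-around at the end of the program space is not modelled.

```c
#include <stdint.h>

/* Decode the byte address targeted by an rjmp instruction.
 * opcode:  the 16-bit instruction word; the listing prints its two bytes
 *          low byte first, so "0e c0" is the word 0xC00E
 * address: byte address of the rjmp itself, as printed by avr-objdump */
int32_t rjmp_target(uint16_t opcode, uint16_t address)
{
    int32_t k = opcode & 0x0FFF;         /* 12-bit offset, counted in words */
    if (k & 0x0800)                      /* sign bit set: negative offset */
        k -= 0x1000;
    return (int32_t)address + 2 + 2 * k; /* PC + 1 + k, converted to bytes */
}
```

The reset vector "0e c0" at address 0x00 decodes to 0x1E (__ctors_end), and "d7 cf" at address 0x50 decodes to 0x00, the jump from __bad_interrupt back to the reset vector.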
Program in Flash - Initialize stack and zero register
0000001e <__ctors_end>:
1e: 11 24 eor r1, r1
20: 1f be out 0x3f, r1 ; 63
22: cf ed ldi r28, 0xDF ; 223
24: cd bf out 0x3d, r28 ; 61
Clear register R1. AVR-GCC uses R1 as zero register (the value in this register is always 0, it helps with arithmetic operations such as comparing with 0 and adding with 0).
Reset SREG (Status register, IO address 0x3F
).
Set SP (stack pointer, IO address 0x3E:0x3D
) to top of SRAM (0x00DF
for ATtiny25). For devices with more than 256 bytes of memory space, the higher byte of the SP will be set as well.
AVR-GCC copies this section from the pre-built object file at avr-gnu/avr/lib/[family]/[tiny-stack/]crt[machine].o
:
Disassembly of section .init2:
00000000 <.init2>:
0: 11 24 eor r1, r1
2: 1f be out 0x3f, r1 ; 63
4: c0 e0 ldi r28, 0x00 ; 0
4: R_AVR_LO8_LDI __stack
6: cd bf out 0x3d, r28 ; 61
The AVR-LibC Memory Sections documentation says of this section: "In C programs, weakly bound to initialize the stack, and to clear __zero_reg__ (r1)."
Program in Flash - Initialize .data
and .bss
segment
00000026 <__do_copy_data>:
26: 10 e0 ldi r17, 0x00 ; 0
28: a0 e6 ldi r26, 0x60 ; 96
2a: b0 e0 ldi r27, 0x00 ; 0
2c: e2 e8 ldi r30, 0x82 ; 130
2e: f0 e0 ldi r31, 0x00 ; 0
30: 02 c0 rjmp .+4 ; 0x36 <__do_copy_data+0x10>
32: 05 90 lpm r0, Z+
34: 0d 92 st X+, r0
36: a2 36 cpi r26, 0x62 ; 98
38: b1 07 cpc r27, r17
3a: d9 f7 brne .-10 ; 0x32 <__do_copy_data+0xc>
0000003c <__do_clear_bss>:
3c: 20 e0 ldi r18, 0x00 ; 0
3e: a2 e6 ldi r26, 0x62 ; 98
40: b0 e0 ldi r27, 0x00 ; 0
42: 01 c0 rjmp .+2 ; 0x46 <.do_clear_bss_start>
00000044 <.do_clear_bss_loop>:
44: 1d 92 st X+, r1
00000046 <.do_clear_bss_start>:
46: a3 36 cpi r26, 0x63 ; 99
48: b2 07 cpc r27, r18
4a: e1 f7 brne .-8 ; 0x44 <.do_clear_bss_loop>
Copy the initial values for the static variables in the .data
segment. Clear the space for static variables in the .bss
segment.
Program in Flash - Entry point
4c: 02 d0 rcall .+4 ; 0x52 <main>
4e: 17 c0 rjmp .+46 ; 0x7e <_exit>
00000050 <__bad_interrupt>:
50: d7 cf rjmp .-82 ; 0x0 <__vectors>
Initialization finished. Call the user supplied main function at address 0x0052
. Once the main function returns, jump to library supplied dead loop at address 0x007E
.
AVR-GCC copies this section from the pre-built object file at avr-gnu/avr/lib/[family]/[tiny-stack/]crt[machine].o
:
Disassembly of section .init9:
00000000 <.init9>:
0: 00 d0 rcall .+0 ; 0x2 <.init9+0x2>
0: R_AVR_13_PCREL main
2: 00 c0 rjmp .+0 ; 0x4 <__FUSE_REGION_LENGTH__+0x1>
2: R_AVR_13_PCREL exit
The AVR-LibC Memory Sections documentation describes this section as: "Jumps into main()."
If an interrupt without a user-supplied ISR fires, __bad_interrupt jumps back to the reset vector.
Program in Flash - User code
void main() {
volatile char z = 'Z';
DDRB = 0b00111111;
PORTB = 0b00111111;
}
00000052 <main>:
52: cf 93 push r28
54: df 93 push r29
56: 1f 92 push r1
58: cd b7 in r28, 0x3d ; 61
5a: dd 27 eor r29, r29
5c: 8a e5 ldi r24, 0x5A ; 90
5e: 89 83 std Y+1, r24 ; 0x01
60: 87 e3 ldi r24, 0x37 ; 55
62: 90 e0 ldi r25, 0x00 ; 0
64: 2f e3 ldi r18, 0x3F ; 63
66: fc 01 movw r30, r24
68: 20 83 st Z, r18
6a: 88 e3 ldi r24, 0x38 ; 56
6c: 90 e0 ldi r25, 0x00 ; 0
6e: 2f e3 ldi r18, 0x3F ; 63
70: fc 01 movw r30, r24
72: 20 83 st Z, r18
74: 00 00 nop
76: 0f 90 pop r0
78: df 91 pop r29
7a: cf 91 pop r28
7c: 08 95 ret
The following tasks are performed:
- Preserve call-saved registers by pushing them into the stack. Allocate space for local variables in the stack.
- Load value 'Z' (ASCII code 0x5A) from the instruction (immediate addressing mode) into register R24, then use index register Y (R29:R28) as the frame pointer to set the local variable.
- Load 0b00111111 (0x3F) from the instruction into register R18. Load the memory space address of IO register DDRB (0x0037) into register pair R25:R24, then transfer this address into index register Z (R31:R30). Use index register Z to assign DDRB.
- Load 0b00111111 (0x3F) from the instruction into register R18. Load the memory space address of IO register PORTB (0x0038) into register pair R25:R24, then transfer this address into index register Z (R31:R30). Use index register Z to assign PORTB.
- Restore call-saved registers, deallocate the stack and return.
For ATtiny25, the IO space addresses of DDRB
and PORTB
are 0x17
and 0x18
, respectively. The memory space addresses of DDRB
and PORTB
are 0x0037
and 0x0038
, respectively.
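The two sets of addresses differ by a fixed offset: as described earlier, the first 32 bytes of the data space are taken by the general purpose registers, so an IO space address n appears at data space address n + 0x20. A minimal sketch of the conversion (the names IO_SPACE_OFFSET and io_to_memory_address are made up for this illustration):

```c
#include <stdint.h>

/* Data space addresses 0x00-0x1F hold the 32 general purpose registers,
 * so the IO space begins at data space address 0x20. */
#define IO_SPACE_OFFSET 0x20u

/* Convert an IO space address (as used by in/out) into the data space
 * address used by ld/st and by pointers in C. */
uint16_t io_to_memory_address(uint8_t io_address)
{
    return (uint16_t)(io_address + IO_SPACE_OFFSET);
}
```

For DDRB and PORTB this gives 0x17 -> 0x0037 and 0x18 -> 0x0038, the addresses loaded into R25:R24 in the listing above.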
Program in Flash - End
0000007e <_exit>:
7e: f8 94 cli
00000080 <__stop_program>:
80: ff cf rjmp .-2 ; 0x80 <__stop_program>
Clear global interrupt flag to disable all interrupts. Stop the execution here using a dead loop.
Compile with optimization
In the previous example, we compiled the C source code without any optimization. No doubt, the generated code is not elegant at all.
- If we only use call-used registers, we can save a few bytes of stack space and drop the instructions that push and pop the call-saved registers. AVR provides not only index register Y (R29:R28) but also index register Z (R31:R30) for memory access with displacement.
- IO registers (in this example, DDRB and PORTB) are static; hence, they can be accessed using direct addressing mode. For example, use the instruction sts memory_address(DDRB), Rr.
- Some IO registers can be accessed in the IO space (for devices like ATtiny25, all IO registers are located in the 64-byte IO space; for devices like ATmega328, some of the IO registers are located outside the IO space and must be accessed through the memory space). For these, IO addressing instructions like out DDRB, Rr can be used for faster execution and less program memory consumption.
- Why does AVR-GCC load the address of registers into R25:R24 first and then transfer it to index register Z? Index register Z can be loaded directly.
- If the desired value already exists in the destination register, there is no need to reload it. In the example, AVR-GCC reloads the values for each statement in the C source code.
avr-gcc -O1 t25-hello.c -mmcu=attiny25 -o t25-hello.o
avr-objdump -Dsx t25-hello.o
Next, let’s compile with optimization on (-O1
for basic optimization). The section for the main()
function changes to:
void main() {
volatile char z = 'Z';
DDRB = 0b00111111;
PORTB = 0b00111111;
}
00000052 <main>:
52: cf 93 push r28
54: df 93 push r29
56: 1f 92 push r1
58: cd b7 in r28, 0x3d ; 61
5a: dd 27 eor r29, r29
5c: 8a e5 ldi r24, 0x5A ; 90
5e: 89 83 std Y+1, r24 ; 0x01
60: 8f e3 ldi r24, 0x3F ; 63
62: 87 bb out 0x17, r24 ; 23
64: 88 bb out 0x18, r24 ; 24
66: 0f 90 pop r0
68: df 91 pop r29
6a: cf 91 pop r28
6c: 08 95 ret
As we can see in the disassembled code, AVR-GCC is still using call-saved index register Y (R29:R28) as the frame pointer. It would be better if index register Z could be used in this case. However, with the optimization, AVR-GCC uses the fast and small out
instruction to access IO registers DDRB
and PORTB
and only loads the value 0b00111111
once.
All other sections remain the same, except that some addresses change because the size of the main()
function changes. That code is not compiled from our source, but linked from a pre-built object file.
Function and ISR (Interrupt Service Routine)
Let's start from a simple C code:
routine.c
#include <avr/io.h>
#include <avr/pgmspace.h>
#include <avr/interrupt.h>
volatile const uint8_t prog_data[] PROGMEM = {2, 3, 5, 7};
volatile uint8_t static_d1 = 11, static_d2;
void func(uint8_t dir, uint8_t mask) {
volatile uint8_t _mask = 0b00111111;
DDRB = _mask & mask & dir;
}
void main() {
func(0b01010101, 0xFF);
static_d2 = pgm_read_byte(&(prog_data[2]));
for(;;);
}
ISR (WDT_vect) {
PINB = 0b00111111;
}
We create an array of uint8_t
(8-bit unsigned integer) prog_data
saved in the program space. We use the PROGMEM
keyword to tell the compiler to not place them in the data segment in the RAM, but to store them in the flash. Assign the first four prime numbers as their value.
We also create 2 static variables: uint8_t static_d1
with initial value 11, this variable should be placed in the .data
segment; uint8_t static_d2
without initial value, this variable should be placed in the .bss
segment.
We also define a child function func()
. This function accepts 2 parameters uint8_t dir
and uint8_t mask
. In the function, we load a local variable uint8_t _mask
with value 0b00111111
. We should expect this local variable to be placed in the stack and accessed using a frame pointer. Then, we bitwise AND it with the given two parameters and write the result to the IO register DDRB
.
The main()
function. We first call the child function func()
with two parameters, 0b01010101
for dir
and 0xFF
for mask
. After this, we load element 2 (the 3rd element if counting from 1) of the array prog_data
in program space in flash to static variable static_d2
in data space in RAM. At the end, we use a dead loop for(;;);
to stop the execution of the program.
Furthermore, we create the ISR for WDT (Watchdog Timeout). This ISR should be fired when the watchdog is not reset in a given time period. In general, we should set up the watchdog first to be able to enable it and insert watchdog reset instruction wdr
in the program to properly use it; in this example, to simplify the process, we are gonna ignore them. In this ISR, we write a value to IO register PINB
.
avr-gcc -O1 routine.c -mmcu=attiny25 -o routine.o
avr-objdump -m avr25 -Dxs routine.o
Next, let's compile with basic optimization and disassemble. We get:
Static data
volatile uint8_t static_d1 = 11, static_d2;
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000000b4 00000000 00000000 00000094 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000002 00800060 000000b4 00000148 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000001 00800062 00800062 0000014a 2**0
ALLOC
SYMBOL TABLE:
00800060 l d .data 00000000 .data
00800062 l d .bss 00000000 .bss
00800062 g O .bss 00000001 static_d2
00800060 g O .data 00000001 static_d1
The above shows the partial content of the symbol table and the section table.
Static variable static_d1
is 1-byte long, residing in the .data
segment. The .data
segment consumes 2 bytes (due to alignment) and starts at the beginning of the RAM, which is at address 0x60
for ATtiny25.
After the .data
segment is the .bss
segment, which starts at address 0x62
and consumes 1 byte of space. In this segment, we can find static variable static_d2
which is 1-byte long.
Program in Flash - Vector table
Disassembly of section .text:
00000000 <__vectors>:
0: 10 c0 rjmp .+32 ; 0x22 <__ctors_end>
2: 28 c0 rjmp .+80 ; 0x54 <__bad_interrupt>
4: 27 c0 rjmp .+78 ; 0x54 <__bad_interrupt>
6: 26 c0 rjmp .+76 ; 0x54 <__bad_interrupt>
8: 25 c0 rjmp .+74 ; 0x54 <__bad_interrupt>
a: 24 c0 rjmp .+72 ; 0x54 <__bad_interrupt>
c: 23 c0 rjmp .+70 ; 0x54 <__bad_interrupt>
e: 22 c0 rjmp .+68 ; 0x54 <__bad_interrupt>
10: 21 c0 rjmp .+66 ; 0x54 <__bad_interrupt>
12: 20 c0 rjmp .+64 ; 0x54 <__bad_interrupt>
14: 1f c0 rjmp .+62 ; 0x54 <__bad_interrupt>
16: 1e c0 rjmp .+60 ; 0x54 <__bad_interrupt>
18: 36 c0 rjmp .+108 ; 0x86 <__vector_12>
1a: 1c c0 rjmp .+56 ; 0x54 <__bad_interrupt>
1c: 1b c0 rjmp .+54 ; 0x54 <__bad_interrupt>
The vector table. The first vector, which is for reset, leads to __ctors_end
at program address 0x0022
; the others lead to __bad_interrupt
at program address 0x0054
. This is pretty much the same as what we saw in the previous example, except for vector 12, which leads to __vector_12
. This is because in this example, we provide an ISR for the WDT event.
We can also check the symbol table:
SYMBOL TABLE:
00000054 w .text 00000000 __vector_1
00000094 g F .text 0000001c __vector_12
00000054 g .text 00000000 __bad_interrupt
00000054 w .text 00000000 __vector_6
00000054 w .text 00000000 __vector_3
00000054 w .text 00000000 __vector_11
00000054 w .text 00000000 __vector_13
00000054 w .text 00000000 __vector_7
00000054 w .text 00000000 __vector_5
00000054 w .text 00000000 __vector_4
00000054 w .text 00000000 __vector_9
00000054 w .text 00000000 __vector_2
00000054 w .text 00000000 __vector_8
00000054 w .text 00000000 __vector_14
00000054 w .text 00000000 __vector_10
Because we have ISR (WDT_vect) {...} in the C source code, __vector_12 changes from a weak symbol to a strong, global function symbol located at program address 0x0094.
Program data
volatile const uint8_t prog_data[] PROGMEM = {2, 3, 5, 7};
0000001e <__trampolines_end>:
1e: 02 03 mulsu r16, r18
20: 05 07 cpc r16, r21
Data is saved in program space using the PROGMEM keyword. In this example, we create the array volatile const uint8_t prog_data[] PROGMEM = {2, 3, 5, 7}; and its values are stored here, right after the vector table. The disassembler decodes these bytes as if they were instructions, but the result has no meaning.
Address of the program data can be found in the symbol table:
SYMBOL TABLE:
0000001e g O .text 00000004 prog_data
Program in Flash - Initialize stack and zero register
00000022 <__ctors_end>:
22: 11 24 eor r1, r1
24: 1f be out 0x3f, r1 ; 63
26: cf ed ldi r28, 0xDF ; 223
28: cd bf out 0x3d, r28 ; 61
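Clear register R1 (the zero register). Reset SREG (status register, IO address 0x3F) by writing R1 to it. Set SP (stack pointer, IO address 0x3E:0x3D) to the top of SRAM (0x00DF for ATtiny25). Only the low byte at IO address 0x3D is written here; the address 0xDF fits in 8 bits.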
Program in Flash - Initialize .data and .bss segments
0000002a <__do_copy_data>:
2a: 10 e0 ldi r17, 0x00 ; 0
2c: a0 e6 ldi r26, 0x60 ; 96
2e: b0 e0 ldi r27, 0x00 ; 0
30: e4 eb ldi r30, 0xB4 ; 180
32: f0 e0 ldi r31, 0x00 ; 0
34: 02 c0 rjmp .+4 ; 0x3a <__do_copy_data+0x10>
36: 05 90 lpm r0, Z+
38: 0d 92 st X+, r0
3a: a2 36 cpi r26, 0x62 ; 98
3c: b1 07 cpc r27, r17
3e: d9 f7 brne .-10 ; 0x36 <__do_copy_data+0xc>
00000040 <__do_clear_bss>:
40: 20 e0 ldi r18, 0x00 ; 0
42: a2 e6 ldi r26, 0x62 ; 98
44: b0 e0 ldi r27, 0x00 ; 0
46: 01 c0 rjmp .+2 ; 0x4a <.do_clear_bss_start>
00000048 <.do_clear_bss_loop>:
48: 1d 92 st X+, r1
0000004a <.do_clear_bss_start>:
4a: a3 36 cpi r26, 0x63 ; 99
4c: b2 07 cpc r27, r18
4e: e1 f7 brne .-8 ; 0x48 <.do_clear_bss_loop>
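Copy the initial values of the static variables in the .data segment from flash (starting at program address 0x00B4) to RAM (addresses 0x60 to 0x61). Then clear the RAM used by the static variables in the .bss segment (address 0x62).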
Program in Flash - Entry point
50: 11 d0 rcall .+34 ; 0x74 <main>
52: 2e c0 rjmp .+92 ; 0xb0 <_exit>
00000054 <__bad_interrupt>:
54: d5 cf rjmp .-86 ; 0x0 <__vectors>
Initialization is finished. Call the user-supplied main function at address 0x0074. If the main function returns, jump to the library-supplied dead loop at address 0x00B0.
If an interrupt without a defined ISR fires, jump to the reset vector.
Program in Flash - Child function
void func(uint8_t dir, uint8_t mask) {
volatile uint8_t _mask = 0b00111111;
DDRB = _mask & mask & dir;
}
00000056 <func>:
56: cf 93 push r28
58: df 93 push r29
5a: 1f 92 push r1
5c: cd b7 in r28, 0x3d ; 61
5e: dd 27 eor r29, r29
60: 9f e3 ldi r25, 0x3F ; 63
62: 99 83 std Y+1, r25 ; 0x01
64: 99 81 ldd r25, Y+1 ; 0x01
66: 89 23 and r24, r25
68: 68 23 and r22, r24
6a: 67 bb out 0x17, r22 ; 23
6c: 0f 90 pop r0
6e: df 91 pop r29
70: cf 91 pop r28
72: 08 95 ret
The following tasks are performed:
- Preserve the call-saved registers R28 and R29 by pushing them onto the stack. Allocate 1 byte of stack space for the local variable by pushing R1.
- Load the stack pointer into index register Y (R29:R28) to build the frame pointer.
- Load 0b00111111 (0x3F) from the instruction into register R25, then store it to the local variable _mask through the frame pointer (Y+1).
- Because _mask is volatile, read it back from the stack into R25. Bitwise AND it with the parameters dir in R24 and mask in R22. Next, write the result in R22 to the IO register DDRB at IO address 0x17.
- Deallocate the stack space, restore the call-saved registers, and return.
Program in Flash - Main function
void main() {
func(0b01010101, 0xFF);
volatile uint8_t temp = pgm_read_byte(&(prog_data[2]));
static_d2 = temp;
for(;;);
}
00000074 <main>:
74: cf 93 push r28
76: df 93 push r29
78: 1f 92 push r1
7a: cd b7 in r28, 0x3d ; 61
7c: dd 27 eor r29, r29
7e: 6f ef ldi r22, 0xFF ; 255
80: 85 e5 ldi r24, 0x55 ; 85
82: e9 df rcall .-46 ; 0x56 <func>
84: e0 e2 ldi r30, 0x20 ; 32
86: f0 e0 ldi r31, 0x00 ; 0
88: e4 91 lpm r30, Z
8a: e9 83 std Y+1, r30 ; 0x01
8c: 89 81 ldd r24, Y+1 ; 0x01
8e: 80 93 62 00 sts 0x0062, r24 ; 0x800062 <__data_end>
92: ff cf rjmp .-2 ; 0x92 <__DATA_REGION_LENGTH__+0x12>
The following tasks are performed:
- Preserve the call-saved registers by pushing them onto the stack. Allocate space for the local variable on the stack.
- Load the stack pointer into index register Y (R29:R28) to build the frame pointer.
- Load the parameters dir into R24 and mask into R22, then call the child function func().
- Load the address of the program data prog_data[2] (0x0020) into index register Z (R31:R30). Use the load-program-memory instruction lpm Rd, Z to load the byte of interest into register R30; then store it to the local variable temp on the stack.
- Load the local variable temp into R24, then write it to the static variable static_d2 at memory address 0x0062.
- Use a dead loop to stop execution.
Program in Flash - Interrupt Service Routine
ISR (WDT_vect) {
PINB = 0b00111111;
}
00000094 <__vector_12>:
94: 1f 92 push r1
96: 0f 92 push r0
98: 0f b6 in r0, 0x3f ; 63
9a: 0f 92 push r0
9c: 11 24 eor r1, r1
9e: 8f 93 push r24
a0: 8f e3 ldi r24, 0x3F ; 63
a2: 86 bb out 0x16, r24 ; 22
a4: 8f 91 pop r24
a6: 0f 90 pop r0
a8: 0f be out 0x3f, r0 ; 63
aa: 0f 90 pop r0
ac: 1f 90 pop r1
ae: 18 95 reti
The following tasks are performed:
- Preserve SREG and all registers used in the ISR by pushing them onto the stack.
- Clear R1. Although R1 is the zero register and should always be 0, the ISR may fire while R1 is temporarily overwritten. Therefore, R1 must be cleared again to be safe.
- Load the value 0x3F into R24, then write it to the IO register PINB at IO address 0x16.
- Restore the saved registers and SREG, then return from the interrupt with reti.
Program in Flash - End
000000b0 <_exit>:
b0: f8 94 cli
000000b2 <__stop_program>:
b2: ff cf rjmp .-2 ; 0xb2 <__stop_program>
Clear global interrupt flag to disable all interrupts. Stop the execution here using a dead loop.