Machine Code Generated by AVR-GCC

This article uses a series of experiments to examine the machine code generated by AVR-GCC, and offers possible reasons why AVR-GCC generates the code the way it does.

--by Captdam @ Nov 18, 2024

The most basic - General AVR

Let's start with a simple C program:


avr-hello.c

volatile char a = 'A', b, c = 'C', d;

void main() {
	volatile char z = 'Z';
}
	

In this example, we create four global variables a, b, c and d, two of which (a and c) are initialized, and a local (function-scope) variable z.

We use the volatile keyword to prevent the compiler from optimizing the variables away. We will discuss this in detail later.


avr-gcc -O0 avr-hello.c -o avr-hello.o
avr-objdump -m avr2 -Dxs avr-hello.o
	

Compile the source code avr-hello.c into the object file avr-hello.o. We specify -O0 (no optimization) to force the compiler to translate the C source code into machine code line by line, so we can study the result. Otherwise, the compiler may apply many tricks that make the generated code extremely hard to analyze. Because we do not pass -c, avr-gcc also links the result, so avr-hello.o is in fact a linked ELF file that already contains the startup code examined below.

In general, we should use -mmcu to specify the target device, so that the compiler knows the addresses of the IO registers, the memory size, and the supported instruction set (ISA). We didn’t do so; therefore, the compiler uses the default architecture avr2.

Disassemble all sections of the object file avr-hello.o. The disassembler requires us to specify the architecture of the object file. We use -m avr2 to tell the disassembler the ISA, so that it can decode the machine code correctly.

To get all the information we want, we use 3 flags -Dxs:

  1. -D disassembles the contents of all sections.
  2. -x displays all available header information, including the section table and the symbol table.
  3. -s displays the full contents of all sections.

The following shows the content of the object file. Let’s divide it into segments:

Data in RAM - Static and Stack


volatile char a = 'A', b, c = 'C', d;
		

SYMBOL TABLE:
00800060 l    d  .data	00000000 .data
00800062 l    d  .bss	00000000 .bss
00800062 g     O .bss	00000001 b
00800061 g     O .data	00000001 c
00800063 g     O .bss	00000001 d
00800060 g     O .data	00000001 a
		

The above shows part of the symbol table. There are 2 segments for static (global) variables: the .data segment for initialized static variables (those given an initial value in the source code) and the .bss segment for uninitialized static variables.

This should not be confused with the static keyword in C. In this article, static means the variable is allocated statically: its address is fixed in the data space and known at build time. In contrast, static in C has different meanings: on a global variable, it limits visibility to file scope; on an in-function variable, it means the variable is allocated only once and keeps its value between calls.

The first is the .data segment, which starts at the beginning of the RAM. In this example (for avr2 family), the first 96 bytes of memory space are used by 32 general purpose registers (GPR R0 - R31) and 64 IO registers (Note: some of the addresses are not used). Therefore, the first slot in the data segment is at address 96 (Hex 0x60).

After the .data segment is the .bss segment. In this example, we have 2 initialized static variables a and c, both are char type and consume 1 byte of space; therefore, the size of the .data segment will be 2 bytes. Using the start address of the .data segment 0x60 plus the size of the .data segment 0x02, we can see the start address of the .bss segment should be 0x62.

Although we declared b before c, c’s address is less than b’s address in the symbol table. AVR-GCC changes the order of variables to group the variables into segments.

RAM map of a device with internal RAM
AVR-LibC Manual - Memory Areas and Using malloc() - https://www.nongnu.org/avr-libc/user-manual/malloc.html

The static (global) and the stack (local) variables

For C, only the global and in-function static variables are placed into the .data segment or the .bss segment. Their lifetime is through the entire duration of the program; therefore, they can be allocated at the start of the program, and they won’t be deallocated until the program ends. In other words, they will occupy that space forever. So, we are able to assign them a fixed address in the data space.

In contrast, in-function local variables cannot be allocated in the .data segment or the .bss segment. They are created and allocated when their parent function starts and destroyed when their parent function returns. They are allocated in the stack. When a function is called, it starts by allocating a region of memory in the stack (moving the stack pointer) to store its local variables. A function may be called at any time, and the stack pointer at the time of the call may be at any location; therefore, it is impossible to predict the address of the in-function local variables at compile time. The only way to address them is to use the stack pointer to calculate their address on the fly.

This rule applies not only to AVR-GCC, but also to compilers targeting a PC.
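The difference between the two kinds of storage can be observed on a PC as well. The following is a minimal sketch, not taken from the AVR example: the names shared, static_address and every_call_has_own_local are made up for illustration. It shows that a static variable has one fixed address, while every active call of a function owns its own copy of a local variable at a different address.

```c
#include <stdbool.h>
#include <stddef.h>

/* Static storage: a single object at one fixed address for the whole program. */
static volatile char shared = 'A';

/* Returns the address of the static variable; it is the same on every call. */
const volatile char *static_address(void) {
    return &shared;
}

/* Recurses `depth` levels. Each active call allocates its own `local` in the
 * stack, so the address of this call's local must differ from the address of
 * the caller's local. Returns false if any two adjacent calls share one. */
bool every_call_has_own_local(const volatile char *caller_local, unsigned depth) {
    volatile char local = 'Z';
    if (caller_local == &local) {
        return false;
    }
    if (depth == 0) {
        return true;
    }
    return every_call_has_own_local(&local, depth - 1);
}
```

Calling every_call_has_own_local(NULL, 4) walks five nested calls and checks each local against the one above it.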

Program in Flash - Initialize .data segment


Disassembly of section .text:

00000000 <__ctors_end>:
	0:	10 e0       	ldi	r17, 0x00	; 0
	2:	a0 e6       	ldi	r26, 0x60	; 96
	4:	b0 e0       	ldi	r27, 0x00	; 0
	6:	e0 e4       	ldi	r30, 0x40	; 64
	8:	f0 e0       	ldi	r31, 0x00	; 0
	a:	03 c0       	rjmp	.+6      	; 0x12 <__zero_reg__+0x11>
	c:	c8 95       	lpm
	e:	31 96       	adiw	r30, 0x01	; 1
	10:	0d 92       	st	X+, r0
	12:	a2 36       	cpi	r26, 0x62	; 98
	14:	b1 07       	cpc	r27, r17
	16:	d1 f7       	brne	.-12     	; 0xc <__zero_reg__+0xb>
	

After reset, the content of RAM is undefined. The main() function expects the static variables a and c to be in memory already, holding the specified initial values. Therefore, the values must be prepared in advance. To do so, AVR-GCC puts the initial values of the initialized variables in ROM and copies them into RAM at the initialization stage.

In this example, the values are saved at program address 0x0040 (Why? See Data in Flash); the .data segment starts at RAM address 0x0060 and ends at RAM address 0x0062 (not included). Index register Z (R31:R30) is used to read data from program space using the special lpm instruction (Load program memory, which only works with index register Z); index register X (R27:R26) is used to write data to data space. A loop runs from the beginning of the .data segment to its end. We can use the following pseudo-code to illustrate this section:


uint8_t* ptr_data = &(data_segment);
uint8_t* ptr_program = &(init_value_in_program);
while (ptr_data < &(data_segment_end)) {
	*ptr_data = *ptr_program;
	ptr_data++;
	ptr_program++;
}
	

To compare a 16-bit address using the 8-bit ALU, AVR must compare the lower byte first. In this example, the address is known at build time, so the lower byte of the address can be encoded in the instruction cpi (Compare with immediate). To compare the higher byte, a compare with carry must be used. AVR doesn’t have an instruction for “compare with immediate with carry”. Therefore, the instruction cpc (Compare with register with carry) is used, with the higher byte loaded into register R17 beforehand.
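The cpi / cpc pair can be modeled on a PC. The following is a rough sketch under the assumption that only the carry and zero flags matter for the following brne; the function name equal16_via_cpi_cpc is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compares two 16-bit addresses using only 8-bit steps, the way the
 * cpi / cpc pair does. Returns the final zero flag: true when x == end.
 * brne keeps looping while this is false. */
bool equal16_via_cpi_cpc(uint16_t x, uint16_t end) {
    uint8_t xl = (uint8_t)(x & 0xFF);
    uint8_t xh = (uint8_t)(x >> 8);
    uint8_t el = (uint8_t)(end & 0xFF);
    uint8_t eh = (uint8_t)(end >> 8);

    /* cpi: low byte minus immediate; sets carry (borrow) and zero. */
    bool carry = xl < el;
    bool zero = (uint8_t)(xl - el) == 0;

    /* cpc: high byte minus register minus carry; the zero flag stays set
     * only if it was already set and this result is zero as well. */
    uint8_t high = (uint8_t)(xh - eh - (carry ? 1 : 0));
    zero = zero && (high == 0);

    return zero;
}
```

For the copy loop above, x plays the role of index register X and end is 0x0062, the end of the .data segment.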

Program in Flash - Initialize .bss segment


00000018 <__do_clear_bss>:
	18:	20 e0       	ldi	r18, 0x00	; 0
	1a:	a2 e6       	ldi	r26, 0x62	; 98
	1c:	b0 e0       	ldi	r27, 0x00	; 0
	1e:	01 c0       	rjmp	.+2      	; 0x22 <.do_clear_bss_start>

00000020 <.do_clear_bss_loop>:
	20:	1d 92       	st	X+, r1

00000022 <.do_clear_bss_start>:
	22:	a4 36       	cpi	r26, 0x64	; 100
	24:	b2 07       	cpc	r27, r18
	26:	e1 f7       	brne	.-8      	; 0x20 <.do_clear_bss_loop>
	

The uninitialized variables are allocated in the .bss segment. Because C expects them to be zero, AVR-GCC clears them for us.

In this example, the .bss segment starts at RAM address 0x0062 and ends at RAM address 0x0064 (not included). The zero register R1 holds the value (0) to write, and index register X (R27:R26) is used to write data to data space. A loop runs from the beginning of the .bss segment to its end. We can use the following pseudo-code to illustrate this section:


uint8_t* ptr_bss = &(bss_segment);
while (ptr_bss < &(bss_segment_end)) {
	*ptr_bss = 0;
	ptr_bss++;
}
	

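This code relies on R1 already holding 0 and on nothing writing to it. Register contents are not guaranteed to be 0 after every kind of reset (for example, a watchdog or external reset does not clear them), which is why the device-specific startup code shown later clears R1 explicitly with eor r1, r1.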

Program in Flash - User code


void main() {
	volatile char z = 'Z';
}
		

00000028 <main>:
	28:	cf 93       	push	r28
	2a:	df 93       	push	r29
	2c:	1f 92       	push	r1
	2e:	cd b7       	in	r28, 0x3d	; 61
	30:	de b7       	in	r29, 0x3e	; 62
	32:	8a e5       	ldi	r24, 0x5A	; 90
	34:	89 83       	std	Y+1, r24	; 0x01
	36:	00 00       	nop
	38:	0f 90       	pop	r0
	3a:	df 91       	pop	r29
	3c:	cf 91       	pop	r28
	3e:	08 95       	ret
		

This is our main function. In the C source code, we assign the ASCII code of ‘Z’ to the 1-byte local variable z.

In-function local variables are saved in the stack. Their address is relative to the address of the function frame; hence, they can only be addressed using the frame pointer. AVR doesn’t have a native frame pointer. However, it is possible to read the value of the stack pointer (which is the frame address after allocating all the local variables in the stack) into an index register. Then, we can use that index register to access local variables in the stack.

In this example, index register Y (R29:R28) is used as the frame pointer, and the following steps are performed:

  1. Push the original value of index register Y into the stack to preserve it. R29 and R28 are Call-Saved Registers.
  2. Use push with an irrelevant register to allocate 1 byte of space in the stack.
  3. Copy the stack pointer (IO address 0x3E:0x3D) to index register Y. Because the AVR stack pointer points to the next available slot and the stack grows downwards, the content of the stack pointer is 1 byte lower than the address of variable z.
  4. Load the ASCII code of ‘Z’ (0x5A) into register R24; then, write R24 to the address pointed to by index register Y with offset +1.
  5. At the end of the function, deallocate the stack and return. Destroy the local variable z by popping 1 byte of data into the free-to-overwrite register R0. Restore index register Y.
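The steps above can be replayed on a PC with a small simulation. This is only a sketch: the 256-byte ram array, the FrameTrace struct, and the starting stack pointer passed by the caller are assumptions made for illustration (the listing for the general AVR build does not show where the stack pointer starts). It models push as "store, then decrement" and pop as "increment, then load".

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t z_address;   /* where the local variable z ends up       */
    uint8_t  z_value;     /* value stored there                        */
    uint16_t sp_in_body;  /* stack pointer while the frame is active   */
    uint16_t sp_after;    /* stack pointer after the epilogue          */
    bool     y_restored;  /* true if R29:R28 got their old values back */
} FrameTrace;

static uint8_t ram[256];  /* hypothetical data space, for illustration only */

/* AVR push: store at SP, then decrement SP. */
static void push(uint16_t *sp, uint8_t value) {
    ram[*sp] = value;
    (*sp)--;
}

/* AVR pop: increment SP, then load from SP. */
static uint8_t pop(uint16_t *sp) {
    (*sp)++;
    return ram[*sp];
}

/* initial_sp must be in the range 3..255 for this small model. */
FrameTrace simulate_main_frame(uint16_t initial_sp) {
    FrameTrace trace;
    uint16_t sp = initial_sp;
    uint8_t r28 = 0x12, r29 = 0x34;   /* arbitrary caller values of Y */

    push(&sp, r28);                   /* push r28                     */
    push(&sp, r29);                   /* push r29                     */
    push(&sp, 0x00);                  /* push r1: reserves 1 byte     */

    r28 = (uint8_t)(sp & 0xFF);       /* in r28, 0x3d                 */
    r29 = (uint8_t)(sp >> 8);         /* in r29, 0x3e                 */
    uint16_t y = (uint16_t)((r29 << 8) | r28);

    ram[y + 1] = 0x5A;                /* ldi r24, 0x5A; std Y+1, r24  */
    trace.z_address = (uint16_t)(y + 1);
    trace.z_value = ram[y + 1];
    trace.sp_in_body = sp;

    (void)pop(&sp);                   /* pop r0: frees the byte of z  */
    r29 = pop(&sp);                   /* pop r29                      */
    r28 = pop(&sp);                   /* pop r28                      */

    trace.sp_after = sp;
    trace.y_restored = (r28 == 0x12 && r29 == 0x34);
    return trace;
}
```

Starting from a stack pointer of 0xDF, the three pushes leave the stack pointer at 0xDC, so z sits at Y+1 = 0xDD, and the three pops bring the stack pointer back to 0xDF.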

Register Layout

The AVR-GCC ABI (Application Binary Interface) says:

Fixed Registers are registers that won't be allocated by GCC's register allocator. Registers R0 and R1 are fixed and used implicitly while printing out assembler instructions.

The call-used or call-clobbered general purpose registers (GPRs) are registers that might be destroyed (clobbered) by a function call.

The remaining GPRs are call-saved, i.e. a function that uses such a registers must restore its original content. This applies even if the register is used to pass a function argument.

Data in Flash

As mentioned above, AVR-GCC puts the initial values of the initialized variables in ROM and copies them into RAM at the initialization stage. AVR-GCC places this content right after the end of the .text segment. In this example, the last instruction is at program address 0x003E; therefore, the initial values of the .data segment start at the next slot, which is program address 0x0040. Note that the program address is 16-bit aligned because AVR instructions are 2 or 4 bytes long.

The disassembly above doesn’t show the initial values of the .data segment. We can verify them by converting the object file into a hex file, which is ready to be burned into the microprocessor.


avr-objcopy -O ihex avr-hello.o avr-hello.hex
	

This converts the object file avr-hello.o into avr-hello.hex using ihex (Intel HEX) format.

Open the hex file avr-hello.hex:


:1000000010E0A0E6B0E0E0E4F0E003C0C89531966F
:100010000D92A236B107D1F720E0A2E6B0E001C010
:100020001D92A436B207E1F7CF93DF931F92CDB7AD
:10003000DEB78AE5898300000F90DF91CF910895A4
:0200400041433A
:00000001FF
	

The second-to-last line shows 0x02 bytes of data at address 0x0040. The first byte is 0x41, which is the ASCII code of ‘A’, the initial value of static variable a. The second byte is 0x43, which is the ASCII code of ‘C’, the initial value of static variable c. The checksum of this line is 0x3A.
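The checksum can be verified by hand. In the Intel HEX format, the checksum is the two's complement of the low byte of the sum of all preceding bytes in the record (byte count, address, record type, and data). The helper name ihex_checksum below is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the Intel HEX checksum of a record. `record` holds every byte of
 * the line except the leading ':' and the checksum itself. */
uint8_t ihex_checksum(const uint8_t *record, size_t length) {
    uint8_t sum = 0;
    for (size_t i = 0; i < length; i++) {
        sum = (uint8_t)(sum + record[i]);   /* keep only the low 8 bits */
    }
    return (uint8_t)(0x100u - sum);         /* two's complement */
}
```

For the line :0200400041433A, the bytes 02 00 40 00 41 43 add up to 0xC6, and 0x100 - 0xC6 = 0x3A. For the end-of-file line :00000001FF, the bytes add up to 0x01, which gives 0xFF.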

Compile for a specific microprocessor

In the previous example, we compiled a program for a general AVR. However, when we program a microprocessor, we want the program to run on a real chip and produce some physical output.

In the next example, we will create a simple program that writes the IO registers of one of my favorites, and the smallest classic AVR, the ATtiny25. This chip comes with 6 IOs, 128 bytes of data memory (RAM) and 2K bytes of program memory (Flash).


t25-hello.c

#include <avr/io.h>

volatile char a = 'A';
volatile char b;

void main() {
	volatile char z  = 'Z';
	DDRB = 0b00111111;
	PORTB = 0b00111111;
}
	

The above C example is very similar to the previous general AVR example. Before the main() function, we define an initialized static variable volatile char a = ‘A’ and an uninitialized static variable volatile char b. In the main() function, we assign a value to a local variable volatile char z = ‘Z’. Furthermore, we write 0b00111111 to IO register DDRB to set all pins on PORT B to output, then write 0b00111111 to IO register PORTB to drive all of those pins high.

Because IO registers are specific to microprocessors (general computers don't expose such IO registers, and different microprocessors have different IOs), we need to include the special header avr/io.h.


avr-gcc -O0 t25-hello.c -mmcu=attiny25 -o t25-hello.o
avr-objdump -m avr25 -Dsx t25-hello.o
	

Compile the source code t25-hello.c into the object file t25-hello.o. This time, we use -mmcu=attiny25 to specify the target device, so the compiler knows the addresses of the IO registers, the memory size, and the supported instruction set (ISA).

Disassemble all sections of the object file t25-hello.o. The disassembler requires us to specify the architecture of the object file. According to the AVR-GCC manual, the ATtiny25 belongs to the avr25 family, so we use -m avr25 to tell the disassembler the ISA.

To get all the information we want, we use the same 3 flags -Dxs (disassemble all sections, display all headers, display the full contents of all sections).

The following shows the content of the object file. Let’s divide it into segments:

Static data


volatile char a = 'A';
volatile char b;
		

SYMBOL TABLE:
00800060 l    d  .data	00000000 .data
00800062 l    d  .bss	00000000 .bss
00800062 g     O .bss	00000001 b
00800060 g     O .data	00000001 a
		

The above shows part of the symbol table. The .data segment starts at the beginning of the RAM, which is at address 96 (Hex 0x60) for ATtiny25. After this is the .bss segment, which starts at address 0x62.

Although there is only one initialized static variable, AVR-GCC allocates 2 bytes for the .data segment. As we can see, the address of the .bss segment is 2 bytes after the start of the .data segment. We can also confirm the size of the .data segment in the section table:


Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000082  00000000  00000000  00000094  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000002  00800060  00000082  00000116  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000001  00800062  00800062  00000118  2**0
                  ALLOC
	

If we declare 2 initialized static variables, the size of the .data segment is 2 bytes; if we declare 3 initialized static variables, the size of the .data segment is 4 bytes. It appears that AVR-GCC pads the .data segment to a multiple of 2 bytes. In contrast, the .bss segment is not padded. This may be because the initial values of the .data segment are stored in the program space, and the program space is aligned by 2 bytes.
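The observed sizes match a simple "round up to the next even number" rule. The sketch below only encodes the observation made in this article (1 byte becomes 2, 2 stays 2, 3 becomes 4); it is not taken from the AVR-GCC or linker sources, and the function name is made up for illustration.

```c
#include <stdint.h>

/* Rounds the raw size of the initialized data up to a multiple of 2 bytes,
 * matching the .data sizes observed in the section table. */
uint16_t padded_data_size(uint16_t raw_size) {
    return (uint16_t)((raw_size + 1u) & ~1u);
}
```

With 1 initialized byte, padded_data_size gives 2, so the .bss segment starts at 0x60 + 2 = 0x62, which agrees with the symbol table above.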

Program in Flash - Vector table


Disassembly of section .text:

00000000 <__vectors>:
	0:	0e c0       	rjmp	.+28     	; 0x1e <__ctors_end>
	2:	26 c0       	rjmp	.+76     	; 0x50 <__bad_interrupt>
	4:	25 c0       	rjmp	.+74     	; 0x50 <__bad_interrupt>
	6:	24 c0       	rjmp	.+72     	; 0x50 <__bad_interrupt>
	8:	23 c0       	rjmp	.+70     	; 0x50 <__bad_interrupt>
	a:	22 c0       	rjmp	.+68     	; 0x50 <__bad_interrupt>
	c:	21 c0       	rjmp	.+66     	; 0x50 <__bad_interrupt>
	e:	20 c0       	rjmp	.+64     	; 0x50 <__bad_interrupt>
	10:	1f c0       	rjmp	.+62     	; 0x50 <__bad_interrupt>
	12:	1e c0       	rjmp	.+60     	; 0x50 <__bad_interrupt>
	14:	1d c0       	rjmp	.+58     	; 0x50 <__bad_interrupt>
	16:	1c c0       	rjmp	.+56     	; 0x50 <__bad_interrupt>
	18:	1b c0       	rjmp	.+54     	; 0x50 <__bad_interrupt>
	1a:	1a c0       	rjmp	.+52     	; 0x50 <__bad_interrupt>
	1c:	19 c0       	rjmp	.+50     	; 0x50 <__bad_interrupt>
	

When a specific event happens (for example, power up, watchdog timeout, timer overflow, ADC conversion finished), and both the interrupt of that event and the global interrupt enable are enabled, the hardware jumps to the program address associated with that specific event.

In this example, the first vector at program address 0x0000 is for reset, which leads to __ctors_end at program address 0x001E. It is followed by 14 other vectors, such as timer events, ADC events, and communication events, which all lead to __bad_interrupt at program address 0x0050.

AVR-GCC generates a full vector table if a machine is specified, even if some vectors are not used. In this example, only the first vector (reset) is used, but AVR-GCC creates vectors for all 14 other events. If we remove the unused vectors (by linking by ourselves), we can save some program space.

AVR-GCC copies the vector table from a pre-built object file at avr-gnu/avr/lib/[family]/[tiny-stack/]crt[machine].o, where family in the directory name is the family name, tiny-stack in the directory name is present for chips with less than 256 bytes of memory (whose stack pointer has only 1 byte), and machine in the file name is the name of the chip. In this example, we are using the ATtiny25, from the avr25 family and with 128 bytes of memory, so we can find this file at avr-gnu/avr/lib/avr25/tiny-stack/crtattiny25.o. Dumping this file with the command avr-objdump -m avr25 -dx crtattiny25.o, we get:


SYMBOL TABLE:
00000000 g       .vectors	00000000 __vectors
00000000  w      .text	00000000 __vector_1
00000000 g       .text	00000000 __bad_interrupt
00000000  w      .text	00000000 __vector_2
00000000  w      .text	00000000 __vector_3
00000000  w      .text	00000000 __vector_4
00000000  w      .text	00000000 __vector_5
00000000  w      .text	00000000 __vector_6
00000000  w      .text	00000000 __vector_7
00000000  w      .text	00000000 __vector_8
00000000  w      .text	00000000 __vector_9
00000000  w      .text	00000000 __vector_10
00000000  w      .text	00000000 __vector_11
00000000  w      .text	00000000 __vector_12
00000000  w      .text	00000000 __vector_13
00000000  w      .text	00000000 __vector_14


Disassembly of section .vectors:

00000000 <__vectors>:
	0:	00 c0       	rjmp	.+0      	; 0x2 <__vectors+0x2>
			0: R_AVR_13_PCREL	__init
	2:	00 c0       	rjmp	.+0      	; 0x4 <__vectors+0x4>
			2: R_AVR_13_PCREL	__vector_1
	4:	00 c0       	rjmp	.+0      	; 0x6 <__vectors+0x6>
			4: R_AVR_13_PCREL	__vector_2
	6:	00 c0       	rjmp	.+0      	; 0x8 <__vectors+0x8>
			6: R_AVR_13_PCREL	__vector_3
	8:	00 c0       	rjmp	.+0      	; 0xa <__vectors+0xa>
			8: R_AVR_13_PCREL	__vector_4
	a:	00 c0       	rjmp	.+0      	; 0xc <__vectors+0xc>
			a: R_AVR_13_PCREL	__vector_5
	c:	00 c0       	rjmp	.+0      	; 0xe <__vectors+0xe>
			c: R_AVR_13_PCREL	__vector_6
	e:	00 c0       	rjmp	.+0      	; 0x10 <__vectors+0x10>
			e: R_AVR_13_PCREL	__vector_7
	10:	00 c0       	rjmp	.+0      	; 0x12 <__vectors+0x12>
			10: R_AVR_13_PCREL	__vector_8
	12:	00 c0       	rjmp	.+0      	; 0x14 <__vectors+0x14>
			12: R_AVR_13_PCREL	__vector_9
	14:	00 c0       	rjmp	.+0      	; 0x16 <__vectors+0x16>
			14: R_AVR_13_PCREL	__vector_10
	16:	00 c0       	rjmp	.+0      	; 0x18 <__vectors+0x18>
			16: R_AVR_13_PCREL	__vector_11
	18:	00 c0       	rjmp	.+0      	; 0x1a <__vectors+0x1a>
			18: R_AVR_13_PCREL	__vector_12
	1a:	00 c0       	rjmp	.+0      	; 0x1c <__vectors+0x1c>
			1a: R_AVR_13_PCREL	__vector_13
	1c:	00 c0       	rjmp	.+0      	; 0x1e <__FUSE_REGION_LENGTH__+0x1b>
			1c: R_AVR_13_PCREL	__vector_14
	

This pre-built object file provides a set of weak symbols for the vectors. If the user provides an ISR (for example, ISR (TIMER0_COMPA_vect) {...}), the program address of the user-supplied ISR is placed in the vector table; otherwise, the weak symbol resolves to the program address of __bad_interrupt.

The relative jump rjmp, which is 16 bits long (with a 12-bit signed word offset), can reach the entire program space on devices with up to 4K words of program memory, such as ATtiny25 and ATtiny84. On these devices, each vector is 1 word long. On the other hand, devices with more than 4K words of program memory, such as ATmega328, use the jump instruction jmp, which is 32 bits long (22-bit address) and can address the entire 4M-word program space. On these devices, each vector is 2 words long.
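The targets printed by the disassembler can be recomputed from the raw opcodes. The sketch below decodes the 12-bit signed word offset of an rjmp and converts it to a byte address; the function name rjmp_target is made up for illustration, and the sketch does not model wrap-around at the end of the program space.

```c
#include <stdint.h>

/* Returns the byte address an rjmp jumps to.
 * pc:     byte address of the rjmp instruction itself.
 * opcode: the 16-bit instruction word; the listing prints it low byte first,
 *         so "0e c0" is the word 0xC00E. */
uint32_t rjmp_target(uint32_t pc, uint16_t opcode) {
    int32_t k = opcode & 0x0FFF;        /* 12-bit offset, counted in words */
    if (k & 0x0800) {
        k -= 0x1000;                    /* sign-extend negative offsets    */
    }
    /* The offset is relative to the next instruction (pc + 2 bytes). */
    return (uint32_t)((int32_t)pc + 2 + 2 * k);
}
```

For the reset vector at address 0x00 with opcode 0xC00E, the offset is 14 words, so the target is 0 + 2 + 28 = 0x1E, which is __ctors_end. For __bad_interrupt at address 0x50 with opcode 0xCFD7, the offset is -41 words, so the target is 0x50 + 2 - 82 = 0x00.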

Program in Flash - Initialize stack and zero register


0000001e <__ctors_end>:
	1e:	11 24       	eor	r1, r1
	20:	1f be       	out	0x3f, r1	; 63
	22:	cf ed       	ldi	r28, 0xDF	; 223
	24:	cd bf       	out	0x3d, r28	; 61
	

Clear register R1. AVR-GCC uses R1 as the zero register (the value in this register is always 0, which helps with arithmetic operations such as comparing with 0 and adding 0 with carry).

Reset SREG (Status register, IO address 0x3F).

Set SP (stack pointer, IO address 0x3E:0x3D) to the top of SRAM (0x00DF for ATtiny25). For devices with more than 256 bytes of memory space, the higher byte of the SP is set as well.

AVR-GCC copies this section from the pre-built object file at avr-gnu/avr/lib/[family]/[tiny-stack/]crt[machine].o:


Disassembly of section .init2:

00000000 <.init2>:
	0:	11 24       	eor	r1, r1
	2:	1f be       	out	0x3f, r1	; 63
	4:	c0 e0       	ldi	r28, 0x00	; 0
			4: R_AVR_LO8_LDI	__stack
	6:	cd bf       	out	0x3d, r28	; 61
	

The AVR-LibC Memory Sections documentation says of this section: In C programs, weakly bound to initialize the stack, and to clear __zero_reg__ (r1).
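The stack pointer value 0xDF loaded above follows from the memory layout described earlier: 32 general purpose registers and 64 IO registers come first, then the RAM. The sketch below recomputes it; the function names are made up for illustration, and the layout only holds for devices that follow this 96-byte scheme, such as the ATtiny25.

```c
#include <stdint.h>

enum {
    GPR_COUNT = 32,      /* R0 - R31 occupy data addresses 0x00 - 0x1F */
    IO_REG_COUNT = 64    /* IO registers occupy 0x20 - 0x5F            */
};

/* First data address that belongs to the RAM. */
uint16_t sram_start(void) {
    return GPR_COUNT + IO_REG_COUNT;
}

/* Last data address of the RAM, which is where the stack pointer starts. */
uint16_t sram_end(uint16_t sram_bytes) {
    return (uint16_t)(sram_start() + sram_bytes - 1);
}
```

With 128 bytes of RAM, sram_end(128) is 0x60 + 128 - 1 = 0xDF, so only the lower byte of the stack pointer is needed.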

Program in Flash - Initialize .data and .bss segment


00000026 <__do_copy_data>:
	26:	10 e0       	ldi	r17, 0x00	; 0
	28:	a0 e6       	ldi	r26, 0x60	; 96
	2a:	b0 e0       	ldi	r27, 0x00	; 0
	2c:	e2 e8       	ldi	r30, 0x82	; 130
	2e:	f0 e0       	ldi	r31, 0x00	; 0
	30:	02 c0       	rjmp	.+4      	; 0x36 <__do_copy_data+0x10>
	32:	05 90       	lpm	r0, Z+
	34:	0d 92       	st	X+, r0
	36:	a2 36       	cpi	r26, 0x62	; 98
	38:	b1 07       	cpc	r27, r17
	3a:	d9 f7       	brne	.-10     	; 0x32 <__do_copy_data+0xc>

0000003c <__do_clear_bss>:
	3c:	20 e0       	ldi	r18, 0x00	; 0
	3e:	a2 e6       	ldi	r26, 0x62	; 98
	40:	b0 e0       	ldi	r27, 0x00	; 0
	42:	01 c0       	rjmp	.+2      	; 0x46 <.do_clear_bss_start>
	
00000044 <.do_clear_bss_loop>:
	44:	1d 92       	st	X+, r1
	
00000046 <.do_clear_bss_start>:
	46:	a3 36       	cpi	r26, 0x63	; 99
	48:	b2 07       	cpc	r27, r18
	4a:	e1 f7       	brne	.-8      	; 0x44 <.do_clear_bss_loop>
	

Copy the initial values for the static variables in the .data segment. Clear the space for static variables in the .bss segment.

Program in Flash - Entry point


	4c:	02 d0       	rcall	.+4      	; 0x52 <main>
	4e:	17 c0       	rjmp	.+46     	; 0x7e <_exit>

00000050 <__bad_interrupt>:
	50:	d7 cf       	rjmp	.-82     	; 0x0 <__vectors>
	

Initialization is finished. Call the user-supplied main function at address 0x0052. Once the main function returns, jump to the library-supplied infinite loop at address 0x007E.

AVR-GCC copies this section from the pre-built object file at avr-gnu/avr/lib/[family]/[tiny-stack/]crt[machine].o:


Disassembly of section .init9:

00000000 <.init9>:
	0:	00 d0       	rcall	.+0      	; 0x2 <.init9+0x2>
			0: R_AVR_13_PCREL	main
	2:	00 c0       	rjmp	.+0      	; 0x4 <__FUSE_REGION_LENGTH__+0x1>
			2: R_AVR_13_PCREL	exit
	

The AVR-LibC Memory Sections documentation says of this section: Jumps into main().

If an undefined ISR fires, __bad_interrupt jumps to the reset vector.

Program in Flash - User code


void main() {
	volatile char z  = 'Z';
	DDRB = 0b00111111;
	PORTB = 0b00111111;
}
		

00000052 <main>:
	52:	cf 93       	push	r28
	54:	df 93       	push	r29
	56:	1f 92       	push	r1
	58:	cd b7       	in	r28, 0x3d	; 61
	5a:	dd 27       	eor	r29, r29
	5c:	8a e5       	ldi	r24, 0x5A	; 90
	5e:	89 83       	std	Y+1, r24	; 0x01
	60:	87 e3       	ldi	r24, 0x37	; 55
	62:	90 e0       	ldi	r25, 0x00	; 0
	64:	2f e3       	ldi	r18, 0x3F	; 63
	66:	fc 01       	movw	r30, r24
	68:	20 83       	st	Z, r18
	6a:	88 e3       	ldi	r24, 0x38	; 56
	6c:	90 e0       	ldi	r25, 0x00	; 0
	6e:	2f e3       	ldi	r18, 0x3F	; 63
	70:	fc 01       	movw	r30, r24
	72:	20 83       	st	Z, r18
	74:	00 00       	nop
	76:	0f 90       	pop	r0
	78:	df 91       	pop	r29
	7a:	cf 91       	pop	r28
	7c:	08 95       	ret
		

The following tasks are performed:

  1. Preserve call-saved registers by pushing them into the stack. Allocate space for local variables in the stack.
  2. Load the value ‘Z’ (ASCII code 0x5A) from the instruction (immediate addressing mode) into register R24, then use index register Y (R29:R28) as the frame pointer to set the local variable.
  3. Load the memory space address of IO register DDRB (0x0037) into register pair R25:R24 and load 0b00111111 (0x3F) from the instruction into register R18. Transfer the address into index register Z (R31:R30), then use index register Z to assign DDRB.
  4. Load the memory space address of IO register PORTB (0x0038) into register pair R25:R24 and load 0b00111111 (0x3F) from the instruction into register R18. Transfer the address into index register Z (R31:R30), then use index register Z to assign PORTB.
  5. Restore call-saved registers, deallocate the stack and return.

For ATtiny25, the IO space addresses of DDRB and PORTB are 0x17 and 0x18, respectively. The memory space addresses of DDRB and PORTB are 0x0037 and 0x0038, respectively. The memory space address is the IO space address plus 0x20, because the 32 general purpose registers occupy the first 32 memory space addresses.
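The relation between the two address spaces can be written as a one-line conversion. This is a minimal sketch; the names SFR_OFFSET, io_to_mem and reachable_by_in_out are made up for illustration and are not the macros used by avr/io.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 32 general purpose registers sit below the IO registers in memory space. */
#define SFR_OFFSET 0x20u

/* Converts an IO space address (used by in / out) into a memory space
 * address (used by ld / st). */
uint16_t io_to_mem(uint8_t io_addr) {
    return (uint16_t)(io_addr + SFR_OFFSET);
}

/* True if a memory space address falls inside the 64 IO registers and can
 * therefore also be accessed with the in / out instructions. */
bool reachable_by_in_out(uint16_t mem_addr) {
    return mem_addr >= SFR_OFFSET && mem_addr < SFR_OFFSET + 64u;
}
```

io_to_mem(0x17) gives 0x37 for DDRB and io_to_mem(0x18) gives 0x38 for PORTB, the two addresses loaded into R25:R24 in the listing above.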

Program in Flash - End


0000007e <_exit>:
	7e:	f8 94       	cli

00000080 <__stop_program>:
	80:	ff cf       	rjmp	.-2      	; 0x80 <__stop_program>
	

Clear the global interrupt flag to disable all interrupts, then stop the execution here using an infinite loop.

Compile with optimization

In the previous example, we compiled the C source code without any optimization. No doubt, the generated code is not elegant at all.


avr-gcc -O1 t25-hello.c -mmcu=attiny25 -o t25-hello.o
avr-objdump -Dsx t25-hello.o
	

Next, let’s compile with optimization on (-O1 for basic optimization). The section for the main() function changes to:


void main() {
	volatile char z  = 'Z';
	DDRB = 0b00111111;
	PORTB = 0b00111111;
}
		

00000052 <main>:
	52:	cf 93       	push	r28
	54:	df 93       	push	r29
	56:	1f 92       	push	r1
	58:	cd b7       	in	r28, 0x3d	; 61
	5a:	dd 27       	eor	r29, r29
	5c:	8a e5       	ldi	r24, 0x5A	; 90
	5e:	89 83       	std	Y+1, r24	; 0x01
	60:	8f e3       	ldi	r24, 0x3F	; 63
	62:	87 bb       	out	0x17, r24	; 23
	64:	88 bb       	out	0x18, r24	; 24
	66:	0f 90       	pop	r0
	68:	df 91       	pop	r29
	6a:	cf 91       	pop	r28
	6c:	08 95       	ret
		

As we can see in the disassembled code, AVR-GCC still uses the call-saved index register Y (R29:R28) as the frame pointer. It would be better if index register Z could be used in this case, because Z is call-used and would not need to be saved and restored. However, with optimization, AVR-GCC uses the fast and small out instruction to access IO registers DDRB and PORTB, and loads the value 0b00111111 only once.

All other sections remain the same, except for some address changes due to the change in size of the main() function. This code is not compiled, but linked from a pre-built object file.

Function and ISR (Interrupt Service Routine)

Let's start with a simple C program:


routine.c

#include <avr/io.h>
#include <avr/pgmspace.h>
#include <avr/interrupt.h>

volatile const uint8_t prog_data[] PROGMEM = {2, 3, 5, 7};

volatile uint8_t static_d1 = 11, static_d2;

void func(uint8_t dir, uint8_t mask) {
	volatile uint8_t _mask = 0b00111111;
	DDRB = _mask & mask & dir;
}

void main() {
	func(0b01010101, 0xFF);
	static_d2 = pgm_read_byte(&(prog_data[2]));
	for(;;);
}

ISR (WDT_vect) {
	PINB = 0b00111111;
}
	

We create an array of uint8_t (8-bit unsigned integer) named prog_data, saved in the program space. We use the PROGMEM keyword to tell the compiler not to place it in the data segment in the RAM, but to store it in the flash. Its values are the first four prime numbers.

We also create 2 static variables: uint8_t static_d1 with initial value 11, which should be placed in the .data segment; and uint8_t static_d2 without an initial value, which should be placed in the .bss segment.

A child function func(). This function accepts 2 parameters, uint8_t dir and uint8_t mask. In the function, we load a local variable uint8_t _mask with the value 0b00111111. We should expect this local variable to be in the stack and accessed using a frame pointer. Then, we bitwise AND it with the two given parameters and write the result to the IO register DDRB.

The main() function. We first call the child function func() with two arguments, 0b01010101 for dir and 0xFF for mask. After this, we load the element at index 2 (the third element) of the array prog_data from program space in flash into the static variable static_d2 in data space in RAM. At the end, we use an infinite loop for(;;); to stop the execution of the program.

Furthermore, we create the ISR for WDT (Watchdog Timeout). This ISR fires when the watchdog is not reset within a given time period. In general, we should first configure and enable the watchdog, and insert the watchdog reset instruction wdr in the program to use it properly; in this example, to keep things simple, we skip these steps. In this ISR, we write a value to IO register PINB.


avr-gcc -O1 routine.c -mmcu=attiny25 -o routine.o
avr-objdump -m avr25 -Dxs routine.o
	

Next, we compile with basic optimization and disassemble. We get:

Static data


volatile uint8_t static_d1 = 11, static_d2;
		

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         000000b4  00000000  00000000  00000094  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000002  00800060  000000b4  00000148  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000001  00800062  00800062  0000014a  2**0
                  ALLOC

SYMBOL TABLE:
00800060 l    d  .data	00000000 .data
00800062 l    d  .bss	00000000 .bss
00800062 g     O .bss	00000001 static_d2
00800060 g     O .data	00000001 static_d1
		

The above shows part of the symbol table and the section table.

Static variable static_d1 is 1 byte long and resides in the .data segment. The .data segment consumes 2 bytes (due to alignment) and starts at the beginning of the RAM, which is at address 0x60 for ATtiny25.

After the .data segment is the .bss segment, which starts at address 0x62 and consumes 1 byte of space. In this segment, we can find static variable static_d2, which is 1 byte long.

Program in Flash - Vector table


Disassembly of section .text:

00000000 <__vectors>:
	0:	10 c0       	rjmp	.+32     	; 0x22 <__ctors_end>
	2:	28 c0       	rjmp	.+80     	; 0x54 <__bad_interrupt>
	4:	27 c0       	rjmp	.+78     	; 0x54 <__bad_interrupt>
	6:	26 c0       	rjmp	.+76     	; 0x54 <__bad_interrupt>
	8:	25 c0       	rjmp	.+74     	; 0x54 <__bad_interrupt>
	a:	24 c0       	rjmp	.+72     	; 0x54 <__bad_interrupt>
	c:	23 c0       	rjmp	.+70     	; 0x54 <__bad_interrupt>
	e:	22 c0       	rjmp	.+68     	; 0x54 <__bad_interrupt>
	10:	21 c0       	rjmp	.+66     	; 0x54 <__bad_interrupt>
	12:	20 c0       	rjmp	.+64     	; 0x54 <__bad_interrupt>
	14:	1f c0       	rjmp	.+62     	; 0x54 <__bad_interrupt>
	16:	1e c0       	rjmp	.+60     	; 0x54 <__bad_interrupt>
	18:	36 c0       	rjmp	.+108    	; 0x86 <__vector_12>
	1a:	1c c0       	rjmp	.+56     	; 0x54 <__bad_interrupt>
	1c:	1b c0       	rjmp	.+54     	; 0x54 <__bad_interrupt>
	

The vector table. The first vector, which is for reset, leads to __ctors_end at program address 0x0022; the others lead to __bad_interrupt at program address 0x0054. This is much the same as what we saw in the previous example, except for vector 12, which leads to __vector_12. This is because in this example, we provide an ISR for the WDT event.

We can also check the symbol table:


SYMBOL TABLE:
00000054  w      .text	00000000 __vector_1
00000094 g     F .text	0000001c __vector_12
00000054 g       .text	00000000 __bad_interrupt
00000054  w      .text	00000000 __vector_6
00000054  w      .text	00000000 __vector_3
00000054  w      .text	00000000 __vector_11
00000054  w      .text	00000000 __vector_13
00000054  w      .text	00000000 __vector_7
00000054  w      .text	00000000 __vector_5
00000054  w      .text	00000000 __vector_4
00000054  w      .text	00000000 __vector_9
00000054  w      .text	00000000 __vector_2
00000054  w      .text	00000000 __vector_8
00000054  w      .text	00000000 __vector_14
00000054  w      .text	00000000 __vector_10
	

Because the C source code contains ISR (WDT_vect) {...}, __vector_12 changes from a weak symbol into a strong, global function symbol.
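The weak entries all share the address of __bad_interrupt (0x54), which is how weak aliases appear in a symbol table. Below is a minimal sketch of the same mechanism in plain C, assuming GCC or Clang on an ELF target. The names bad_interrupt, vector_1, vector_12, vectors and fire are hypothetical stand-ins for illustration, not the actual startup code.

```c
#include <assert.h>

int last_handler = -1;   /* records which handler ran, for illustration only */

/* Default handler, standing in for __bad_interrupt. */
void bad_interrupt(void) { last_handler = 0; }

/* Weak alias: vector_1 resolves to bad_interrupt unless a strong
 * definition of vector_1 is linked in. */
void vector_1(void) __attribute__((weak, alias("bad_interrupt")));

/* Strong definition, playing the role of __vector_12 from ISR(WDT_vect). */
void vector_12(void) { last_handler = 12; }

/* Miniature vector table with three entries. */
void (*const vectors[])(void) = { bad_interrupt, vector_1, vector_12 };

/* Dispatch entry n of the table. */
void fire(unsigned n) {
	assert(n < sizeof vectors / sizeof vectors[0]);
	vectors[n]();
}
```

Calling fire(1) ends up in bad_interrupt because nothing overrides the weak alias, while fire(2) runs the strong vector_12.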

Program data


volatile const uint8_t prog_data[] PROGMEM = {2, 3, 5, 7};
		

0000001e <__trampolines_end>:
	1e:	02 03       	mulsu	r16, r18
	20:	05 07       	cpc	r16, r21
		

This is data stored in program space using the PROGMEM keyword. In this example, we create the array volatile const uint8_t prog_data[] PROGMEM = {2, 3, 5, 7};, and its values are stored here. The disassembler decodes these bytes as if they were instructions, but the result has no meaning because they are data.

The address of the program data can be found in the symbol table:
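To see why the data bytes show up as mulsu and cpc, note that the disassembler pairs consecutive bytes into 16-bit words, low byte first, and matches each word against the opcode patterns. Here is a minimal sketch, assuming the MULSU pattern 0000 0011 0ddd 0rrr from the AVR instruction set; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pair two consecutive flash bytes into one 16-bit word (low byte first). */
uint16_t opcode_word(const uint8_t *bytes, size_t index) {
	assert(bytes != NULL);
	return (uint16_t)(bytes[index] | (bytes[index + 1] << 8));
}

/* Nonzero if the word matches the MULSU pattern 0000 0011 0ddd 0rrr. */
int looks_like_mulsu(uint16_t word) {
	return (word & 0xFF88) == 0x0300;
}
```

The bytes 02 03 form the word 0x0302, which happens to match the MULSU pattern with d = 0 (r16) and r = 2 (r18). The bytes 05 07 form 0x0705, which does not.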


SYMBOL TABLE:
0000001e g     O .text	00000004 prog_data
	

Program in Flash - Initialize stack and zero register


00000022 <__ctors_end>:
	22:	11 24       	eor	r1, r1
	24:	1f be       	out	0x3f, r1	; 63
	26:	cf ed       	ldi	r28, 0xDF	; 223
	28:	cd bf       	out	0x3d, r28	; 61
	

Clear register R1. Reset SREG (status register, IO address 0x3F). Set SP (stack pointer, IO address 0x3E:0x3D) to the top of SRAM (0x00DF for ATtiny25). Only the low byte at IO address 0x3D is written here, since 0xDF fits in one byte.

Program in Flash - Initialize .data and .bss segment


0000002a <__do_copy_data>:
	2a:	10 e0       	ldi	r17, 0x00	; 0
	2c:	a0 e6       	ldi	r26, 0x60	; 96
	2e:	b0 e0       	ldi	r27, 0x00	; 0
	30:	e4 eb       	ldi	r30, 0xB4	; 180
	32:	f0 e0       	ldi	r31, 0x00	; 0
	34:	02 c0       	rjmp	.+4      	; 0x3a <__do_copy_data+0x10>
	36:	05 90       	lpm	r0, Z+
	38:	0d 92       	st	X+, r0
	3a:	a2 36       	cpi	r26, 0x62	; 98
	3c:	b1 07       	cpc	r27, r17
	3e:	d9 f7       	brne	.-10     	; 0x36 <__do_copy_data+0xc>

00000040 <__do_clear_bss>:
	40:	20 e0       	ldi	r18, 0x00	; 0
	42:	a2 e6       	ldi	r26, 0x62	; 98
	44:	b0 e0       	ldi	r27, 0x00	; 0
	46:	01 c0       	rjmp	.+2      	; 0x4a <.do_clear_bss_start>

00000048 <.do_clear_bss_loop>:
	48:	1d 92       	st	X+, r1

0000004a <.do_clear_bss_start>:
	4a:	a3 36       	cpi	r26, 0x63	; 99
	4c:	b2 07       	cpc	r27, r18
	4e:	e1 f7       	brne	.-8      	; 0x48 <.do_clear_bss_loop>
	

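Copy the initial values of the static variables in the .data segment from flash (starting at address 0x00B4, held in Z) to RAM addresses 0x60 and 0x61 (through X). Then clear the space for the static variables in the .bss segment (address 0x62) by storing the zero register R1.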
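The two loops can be modelled on a host machine. This is a rough sketch rather than the library's source: flash and ram are plain arrays indexed by address, the constants are read off the ldi and cpi operands in the listing, and the function name startup_init is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	DATA_START = 0x60,   /* first byte of .data in RAM */
	DATA_END   = 0x62,   /* end of .data, start of .bss */
	BSS_END    = 0x63,   /* end of .bss */
	DATA_LOAD  = 0xB4    /* flash address holding the initial .data values */
};

/* Host-side model of __do_copy_data followed by __do_clear_bss. */
void startup_init(const uint8_t *flash, uint8_t *ram) {
	assert(flash != NULL && ram != NULL);
	uint16_t z = DATA_LOAD;          /* Z pointer: source in flash */
	uint16_t x = DATA_START;         /* X pointer: destination in RAM */
	while (x != DATA_END)
		ram[x++] = flash[z++];   /* lpm r0, Z+ ; st X+, r0 */
	while (x != BSS_END)
		ram[x++] = 0;            /* st X+, r1 (R1 is the zero register) */
}
```

Like the assembly, both loops test the end address before the first store, so an empty .data or .bss segment is handled without writing anything.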

Program in Flash - Entry point


	50:	11 d0       	rcall	.+34     	; 0x74 <main>
	52:	2e c0       	rjmp	.+92     	; 0xb0 <_exit>

00000054 <__bad_interrupt>:
	54:	d5 cf       	rjmp	.-86     	; 0x0 <__vectors>
	

Initialization is finished. Call the user-supplied main function at address 0x0074. Once the main function returns, jump to the library-supplied dead loop at address 0x00B0.

If an interrupt without a defined ISR fires, jump to the reset vector.

Program in Flash - Child function


void func(uint8_t dir, uint8_t mask) {
	volatile uint8_t _mask = 0b00111111;
	DDRB = _mask & mask & dir;
}
		

00000056 <func>:
	56:	cf 93       	push	r28
	58:	df 93       	push	r29
	5a:	1f 92       	push	r1
	5c:	cd b7       	in	r28, 0x3d	; 61
	5e:	dd 27       	eor	r29, r29
	60:	9f e3       	ldi	r25, 0x3F	; 63
	62:	99 83       	std	Y+1, r25	; 0x01
	64:	99 81       	ldd	r25, Y+1	; 0x01
	66:	89 23       	and	r24, r25
	68:	68 23       	and	r22, r24
	6a:	67 bb       	out	0x17, r22	; 23
	6c:	0f 90       	pop	r0
	6e:	df 91       	pop	r29
	70:	cf 91       	pop	r28
	72:	08 95       	ret
		

The following tasks are performed:

  1. Preserve the call-saved registers R28 and R29 by pushing them onto the stack. Push R1 to allocate one byte on the stack for the local variable.
  2. Load the stack pointer into index register Y (R29:R28) to build the frame pointer. Load 0b00111111 (0x3F) from the instruction into register R25, then store it through Y+1 to set the local variable _mask.
  3. Because _mask is volatile, read it back from the stack into R25. Bitwise AND it with the parameters dir in R24 and mask in R22. Next, write the result in R22 to IO register DDRB at IO address 0x17.
  4. Deallocate the local variable (pop r0), restore the call-saved registers and return.

Program in Flash - Main function


void main() {
	func(0b01010101, 0xFF);
	volatile uint8_t temp = pgm_read_byte(&(prog_data[2]));
	static_d2 = temp;
	for(;;);
}
		

00000074 <main>:
	74:	cf 93       	push	r28
	76:	df 93       	push	r29
	78:	1f 92       	push	r1
	7a:	cd b7       	in	r28, 0x3d	; 61
	7c:	dd 27       	eor	r29, r29
	7e:	6f ef       	ldi	r22, 0xFF	; 255
	80:	85 e5       	ldi	r24, 0x55	; 85
	82:	e9 df       	rcall	.-46     	; 0x56 <func>
	84:	e0 e2       	ldi	r30, 0x20	; 32
	86:	f0 e0       	ldi	r31, 0x00	; 0
	88:	e4 91       	lpm	r30, Z
	8a:	e9 83       	std	Y+1, r30	; 0x01
	8c:	89 81       	ldd	r24, Y+1	; 0x01
	8e:	80 93 62 00 	sts	0x0062, r24	; 0x800062 <__data_end>
	92:	ff cf       	rjmp	.-2      	; 0x92 <__DATA_REGION_LENGTH__+0x12>
		

The following tasks are performed:

  1. Preserve call-saved registers by pushing them onto the stack. Allocate space for the local variable on the stack.
  2. Load the stack pointer into index register Y (R29:R28) to build the frame pointer.
  3. Load the arguments dir into R24 and mask into R22, then call the child function func().
  4. Load the address of prog_data[2] (0x0020) into index register Z (R31:R30). Use the load program memory instruction lpm Rd, Z to read the byte of interest into register R30; then store it into the local variable temp on the stack.
  5. Load the local variable temp into R24, then write it to static variable static_d2 at memory address 0x0062.
  6. Use a dead loop to stop execution.
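The values that flow through func() and main() can be traced on a host machine. The sketch below is only a model under simplifying assumptions: prog_data_model mirrors the flash array located at 0x1E, and the helper names ddrb_value and lpm_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Copy of the flash array from the listing (located at 0x1E in flash). */
const uint8_t prog_data_model[] = {2, 3, 5, 7};

/* Value func() writes to DDRB: the local _mask ANDed with both parameters. */
uint8_t ddrb_value(uint8_t dir, uint8_t mask) {
	uint8_t local_mask = 0x3F;   /* 0b00111111 */
	return local_mask & mask & dir;
}

/* Byte that lpm would fetch for a flash byte address inside prog_data. */
uint8_t lpm_model(uint16_t z) {
	assert(z >= 0x1E && z < 0x1E + sizeof prog_data_model);
	return prog_data_model[z - 0x1E];
}
```

With the arguments used in main, ddrb_value(0x55, 0xFF) is 0x15, and lpm_model(0x0020) is 5, so static_d2 ends up holding 5.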

Program in Flash - Interrupt Service Routine


ISR (WDT_vect) {
	PINB = 0b00111111;
}
		

00000094 <__vector_12>:
	94:	1f 92       	push	r1
	96:	0f 92       	push	r0
	98:	0f b6       	in	r0, 0x3f	; 63
	9a:	0f 92       	push	r0
	9c:	11 24       	eor	r1, r1
	9e:	8f 93       	push	r24
	a0:	8f e3       	ldi	r24, 0x3F	; 63
	a2:	86 bb       	out	0x16, r24	; 22
	a4:	8f 91       	pop	r24
	a6:	0f 90       	pop	r0
	a8:	0f be       	out	0x3f, r0	; 63
	aa:	0f 90       	pop	r0
	ac:	1f 90       	pop	r1
	ae:	18 95       	reti
		

The following tasks are performed:

  1. Preserve SREG and all registers used in the ISR by pushing them onto the stack.
  2. Clear R1. Although R1 is the zero register and should always be 0, the ISR may fire while R1 is temporarily overwritten. Therefore, we re-clear R1 to be safe.
  3. Load value 0x3F into R24, then write it to IO register PINB at IO address 0x16.
  4. Restore the saved registers and SREG, then return from the interrupt with reti.

Program in Flash - End


000000b0 <_exit>:
	b0:	f8 94       	cli

000000b2 <__stop_program>:
	b2:	ff cf       	rjmp	.-2      	; 0xb2 <__stop_program>
	

Clear the global interrupt flag to disable all interrupts. Stop the execution here using a dead loop.