AVR Naked Interrupt Service Routine

Using naked interrupt service routines (ISRs) in AVR development

AVR, ISR, interrupt service routine, ISR_NAKED, Assembly, avr-asm

--by Captdam @ Sep 1, 2024


Non-naked ISR

In an MCU, an interrupt service routine (ISR) is a small program that runs in the middle of the main routine. When an interrupt request is raised, the hardware suspends the current process, remembers where it was (by pushing some information onto the stack), and then passes control to the ISR. The ISR finishes by executing a RETI instruction, which makes the hardware pass control back to the interrupted process.

The interrupted process does not expect the ISR to change anything except shared variables, which are declared volatile. Therefore, most application binary interfaces (ABIs) require the ISR to restore every register it modifies. On some MCUs, such as the 68HC11, the hardware pushes the registers onto the stack before entering the ISR; this is not the case for AVR. On AVR, the hardware only pushes the program counter (PC) onto the stack; everything else must be saved manually if the ISR modifies it:

Registers R0 to R31, if they are used.
Status register SREG, which holds the flags (e.g. Zero, Overflow, Global Interrupt Enable...).
Stack pointer, if the ISR pushes data onto or pops data from the stack (in practice, every push must be balanced by a pop so that SP ends where it started).
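To see why SREG is on this list, consider a main routine that compares two registers and branches on the Zero flag, with an interrupt arriving between the compare and the branch. The following is a host-side C sketch, not AVR code: the Cpu type and the cp, eor_self and isr_* helpers are hypothetical, and only the Zero flag (bit 1 of SREG) is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model (not an emulator): a register file plus
   SREG, where only the Zero flag is modelled. Z is bit 1 of SREG on AVR. */
#define SREG_Z (1u << 1)

typedef struct {
    uint8_t r[32];
    uint8_t sreg;
} Cpu;

/* CP Rd, Rr: compare two registers; this model only updates Z. */
static void cp(Cpu *c, int d, int s) {
    assert(d >= 0 && d < 32 && s >= 0 && s < 32);
    if (c->r[d] == c->r[s])
        c->sreg |= SREG_Z;
    else
        c->sreg &= (uint8_t)~SREG_Z;
}

/* EOR Rd, Rd: clears Rd, so the result is zero and Z is set. */
static void eor_self(Cpu *c, int d) {
    assert(d >= 0 && d < 32);
    c->r[d] = 0;
    c->sreg |= SREG_Z;
}

/* ISR body: clears R1, just like the compiler-generated prologue does. */
static void isr_body(Cpu *c) { eor_self(c, 1); }

/* Nothing saved: the changes to R1 and SREG leak into the main routine. */
static void isr_unsaved(Cpu *c) { isr_body(c); }

/* R1 and SREG saved on entry and restored on exit. */
static void isr_saved(Cpu *c) {
    uint8_t saved_r1 = c->r[1];
    uint8_t saved_sreg = c->sreg;
    isr_body(c);
    c->sreg = saved_sreg;
    c->r[1] = saved_r1;
}

/* Main routine: CP R24, R25, then an interrupt, then BREQ.
   Returns true if BREQ would take the branch (Z set). */
static bool main_routine(void (*isr)(Cpu *)) {
    Cpu c = {0};
    c.r[24] = 3;
    c.r[25] = 5;
    cp(&c, 24, 25);   /* 3 != 5, so Z is cleared */
    isr(&c);          /* interrupt fires between CP and BREQ */
    return (c.sreg & SREG_Z) != 0;
}
```

Calling main_routine(isr_unsaved) reports the branch as taken even though 3 and 5 differ, because the ISR's EOR set Z behind the main routine's back; main_routine(isr_saved) keeps the correct result.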

For a normal (non-naked) ISR, the GNU AVR C compiler always generates a prologue and an epilogue to do this.

Following is an empty ISR:


#include <avr/interrupt.h>

ISR (TIMER0_COMPA_vect) {
	// Empty: no user code at all
}

Even with optimization on (-O2 or -O3), the GNU AVR C compiler generates the following machine code:


00000276 <__vector_14>:
	276:	1f 92       	push	r1
	278:	0f 92       	push	r0
	27a:	0f b6       	in	r0, 0x3f	; 63
	27c:	0f 92       	push	r0
	27e:	11 24       	eor	r1, r1
	280:	0f 90       	pop	r0
	282:	0f be       	out	0x3f, r0	; 63
	284:	0f 90       	pop	r0
	286:	1f 90       	pop	r1
	288:	18 95       	reti

In the generated machine code, the following tasks are performed:

Save R0 and R1. R0 is the compiler's temporary register; R1 is its zero register, which the interrupted code may have temporarily overwritten (for example, MUL writes its result into R1:R0).
Save SREG by copying it into R0 and then pushing R0. SREG may be modified by instructions such as EOR and ADD.
Clear R1 (EOR R1, R1) so that it holds zero, as the compiler assumes.
Execute the actual ISR code.
Restore SREG, R0 and R1.
Execute RETI to return from the ISR.
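The same sequence can be modelled on the host to check that the epilogue undoes the prologue exactly. This is a minimal sketch in C under simplifying assumptions: the Core type, the 64-byte stack and the helper names are hypothetical, and only R0, R1 and SREG are tracked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model: R0, R1, SREG and a small RAM whose top
   bytes serve as the stack. Like AVR PUSH/POP, the stack pointer is
   post-decremented on push and pre-incremented on pop. */
typedef struct {
    uint8_t r0, r1, sreg;
    uint8_t ram[64];
    uint8_t sp;
} Core;

static void push(Core *c, uint8_t v) {
    assert(c->sp > 0);
    c->ram[c->sp--] = v;
}

static uint8_t pop(Core *c) {
    assert(c->sp < sizeof c->ram - 1);
    return c->ram[++c->sp];
}

/* Prologue, instruction by instruction, as in the listing above. */
static void prologue(Core *c) {
    push(c, c->r1);            /* push r1 */
    push(c, c->r0);            /* push r0 */
    c->r0 = c->sreg;           /* in   r0, 0x3f (SREG) */
    push(c, c->r0);            /* push r0 */
    c->r1 = 0;                 /* eor  r1, r1: result is zero, so ... */
    c->sreg = (uint8_t)((c->sreg & ~0x1Eu) | 0x02u); /* ... S, V, N cleared, Z set */
}

/* Epilogue: the exact reverse. */
static void epilogue(Core *c) {
    c->r0 = pop(c);            /* pop  r0 */
    c->sreg = c->r0;           /* out  0x3f (SREG), r0 */
    c->r0 = pop(c);            /* pop  r0 */
    c->r1 = pop(c);            /* pop  r1 */
}
```

After prologue(), R1 is zero, the flags reflect the EOR, and three bytes are on the stack; after epilogue(), R0, R1, SREG and the stack pointer are back to their original values.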

If the ISR uses any register other than R0 and R1, the compiler will generate code to save and restore those registers as well.

Naked ISR

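It is possible to stop the compiler from generating the prologue and the epilogue by adding the ISR_NAKED flag. The ISR body then becomes responsible for saving and restoring everything it uses, and for ending with RETI.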

In most cases, we should not use the ISR_NAKED flag, because the prologue and the epilogue are there to help us write functionally correct code; however, sometimes we run into performance issues, or we just want to have some fun and play some tricks.

For example, consider an ISR that resets a variable when the timer compare matches:


#include <avr/interrupt.h>

volatile uint8_t value;
ISR (TIMER0_COMPA_vect) {
	value = 20;	// reset the shared variable
}

Now, let's compile this code. With -O2 or -O3, we get:


00000276 <__vector_14>:
	276:	1f 92       	push	r1
	278:	0f 92       	push	r0
	27a:	0f b6       	in	r0, 0x3f	; 63
	27c:	0f 92       	push	r0
	27e:	11 24       	eor	r1, r1
	280:	8f 93       	push	r24
	282:	84 e1       	ldi	r24, 0x14	; 20
	284:	80 93 a0 01 	sts	0x01A0, r24	; 0x8001a0 <value>
	288:	8f 91       	pop	r24
	28a:	0f 90       	pop	r0
	28c:	0f be       	out	0x3f, r0	; 63
	28e:	0f 90       	pop	r0
	290:	1f 90       	pop	r1
	292:	18 95       	reti

This ISR takes 14 instructions, occupying 15 words (STS is a two-word instruction). The code does its job correctly, but we are not happy with its performance:

The compiler saved and restored the temporary register (R0), but the ISR body never uses it.
The compiler saved, cleared and restored the zero register (R1), but the ISR body never uses it.
The EOR R1, R1 instruction that clears the zero register modifies SREG, so SREG has to be saved and restored as well.
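The reasoning above amounts to a simple rule: save only what the body changes, and save SREG only if some instruction changes a flag (which in turn needs a scratch register to copy SREG through). Below is a hypothetical C sketch of that rule; it is not the compiler's actual algorithm, and the BodyInfo, SavePlan and plan_saves names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what an ISR body touches. */
typedef struct {
    uint32_t regs_written;   /* bit n set = Rn is modified by the body */
    bool     changes_flags;  /* true if any instruction updates SREG */
} BodyInfo;

typedef struct {
    uint32_t regs_to_save;   /* registers that need PUSH ... POP */
    bool     save_sreg;      /* SREG needs IN/PUSH ... POP/OUT */
} SavePlan;

/* Minimal save set: only what the body actually changes. Saving SREG
   needs a scratch register to copy it through (the compiler uses R0),
   so that register joins the save set too. */
static SavePlan plan_saves(BodyInfo b, int sreg_scratch) {
    assert(sreg_scratch >= 0 && sreg_scratch < 32);
    SavePlan p = { b.regs_written, b.changes_flags };
    if (p.save_sreg)
        p.regs_to_save |= UINT32_C(1) << sreg_scratch;
    return p;
}
```

Treating the compiler's own EOR R1, R1 as part of the body, the plan for the listing above is R0, R1, R24 and SREG, which is exactly what the compiler saved; a body that only uses LDI and STS through R16 needs R16 alone.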

Since we only load a constant into value, we can rewrite this ISR in assembly to improve performance:


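#include <avr/interrupt.h>

volatile uint8_t value;
ISR (TIMER0_COMPA_vect, ISR_NAKED) {
	asm volatile (
		"	PUSH	R16 \n"		// save R16, the only register we use
		"	LDI	R16, %[dat] \n"	// load the constant into R16
		"	STS	%[var], R16 \n"	// store R16 into value
		"	POP	R16 \n"		// restore R16
		"	RETI	\n"		// return; RETI also sets the I flag again
		:
		: [var] "m"(value)
		, [dat] "M"(20)		// "M": 8-bit constant (0 to 255), the range LDI accepts
	);
}

Note that STS can only store a register, so its source operand must be R16; putting the constant 20 there would make the assembler read it as register R20.

Now, let's compile this code. We get the following code:


00000276 <__vector_14>:
	276:	0f 93       	push	r16
	278:	04 e1       	ldi	r16, 0x14	; 20
	27a:	00 93 a0 01 	sts	0x01A0, r16	; 0x8001a0 <value>
	27e:	0f 91       	pop	r16
	280:	18 95       	reti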

Here is what we did:

We saved an upper register (R16) so that the ISR can use it. On AVR, LDI can only load a constant into the upper 16 registers (R16 to R31).
We loaded the value into that register, then wrote that register to the memory location of our variable.
We restored that register and passed control back with RETI.
None of PUSH, LDI, STS and POP changes any flag, so SREG does not need to be saved.
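The restriction to R16 to R31 comes from the LDI encoding, 1110 KKKK dddd KKKK: the register field dddd is only 4 bits wide and selects R16 + dddd. A small C sketch of the encoder (the function name encode_ldi is hypothetical) reproduces the bytes in the listings:

```c
#include <assert.h>
#include <stdint.h>

/* Encode AVR "LDI Rd, K" using the pattern 1110 KKKK dddd KKKK.
   The dddd field is only 4 bits wide and selects R16 + dddd, which is
   why LDI cannot target R0 to R15. */
static uint16_t encode_ldi(int rd, uint8_t k) {
    assert(rd >= 16 && rd <= 31);                  /* upper registers only */
    return (uint16_t)(0xE000u
                      | ((k & 0xF0u) << 4)         /* high nibble of K */
                      | ((unsigned)(rd - 16) << 4) /* register field dddd */
                      | (k & 0x0Fu));              /* low nibble of K */
}
```

encode_ldi(16, 20) gives 0xE104, stored little-endian as 04 e1, and encode_ldi(24, 20) gives 0xE184, the 84 e1 in the compiler's listing.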

The new naked ISR performs the same task in only 5 instructions and 6 words, instead of 14 instructions and 15 words, and it needs less than half the clock cycles.
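To put numbers on the comparison, the following sketch tallies words and clock cycles for both listings. The per-instruction costs are assumptions taken from the AVR instruction set manual for a classic core with a 16-bit program counter (e.g. ATmega328P); other AVR cores differ, and the interrupt response time and the vector-table jump are not included. The Op enum and tally function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum Op { OP_PUSH, OP_POP, OP_IN, OP_OUT, OP_EOR, OP_LDI, OP_STS, OP_RETI };

/* Assumed costs on a classic AVR core with a 16-bit PC (e.g. ATmega328P). */
static const struct { int words, cycles; } COST[] = {
    [OP_PUSH] = {1, 2}, [OP_POP] = {1, 2}, [OP_IN]  = {1, 1}, [OP_OUT]  = {1, 1},
    [OP_EOR]  = {1, 1}, [OP_LDI] = {1, 1}, [OP_STS] = {2, 2}, [OP_RETI] = {1, 4},
};

/* Sum words and cycles over a straight-line instruction sequence. */
static void tally(const enum Op *code, size_t n, int *words, int *cycles) {
    *words = 0;
    *cycles = 0;
    for (size_t i = 0; i < n; i++) {
        *words += COST[code[i]].words;
        *cycles += COST[code[i]].cycles;
    }
}

/* Compiler-generated ISR, instruction by instruction (14 instructions). */
static const enum Op compiler_isr[] = {
    OP_PUSH, OP_PUSH, OP_IN, OP_PUSH, OP_EOR, OP_PUSH, OP_LDI,
    OP_STS, OP_POP, OP_POP, OP_OUT, OP_POP, OP_POP, OP_RETI,
};

/* Hand-written naked ISR (5 instructions). */
static const enum Op naked_isr[] = { OP_PUSH, OP_LDI, OP_STS, OP_POP, OP_RETI };
```

For the compiler's ISR this gives 15 words and 26 cycles; for the naked ISR, 6 words and 11 cycles.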

When to use naked ISRs and naked functions?

Generally speaking, we should avoid naked ISRs and naked functions as much as possible. Life is so good with automatically generated code. But who doesn't want to have some fun?

Disclaimer: This is my opinion.

For performance reasons, I sometimes use naked ISRs. For naked functions, I say no in most cases.

This is because an ISR needs to be small and fast, so it is sometimes worth making it naked and writing it in assembly. Functions in the main routine, on the other hand, do not have the strict timing requirements that ISRs do, so I am happy with lower performance there.

The ABI for an ISR is simple: restore whatever you use in the ISR, so housekeeping in an ISR is simple. For function calls, the ABI is complex: there are call-saved registers, call-used registers, argument registers and stack, and return-value registers and stack. Different compilers use different ABIs; it is a mess. So I say no to naked functions unless I have to.
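The difference in housekeeping can be expressed as register masks. This sketch assumes the avr-gcc register conventions (call-saved: R2 to R17, R28 and R29; call-used: R18 to R27, R30 and R31; R0 and R1 are fixed registers). The macro and function names are hypothetical, and SREG and R1 handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n of a mask stands for register Rn. */
#define REG(n) (UINT32_C(1) << (n))
/* Registers lo..hi inclusive. */
#define RANGE(lo, hi) ((UINT32_MAX >> (31 - (hi))) & ~(REG(lo) - 1))

/* avr-gcc call-saved registers (assumed convention, see lead-in). */
static const uint32_t CALL_SAVED = RANGE(2, 17) | REG(28) | REG(29);

/* An ordinary function only has to preserve the call-saved registers it
   touches; the caller already treats call-used registers as clobbered. */
static uint32_t function_must_save(uint32_t used) { return used & CALL_SAVED; }

/* An ISR can interrupt any instruction, so nothing may be assumed
   clobbered: every register it touches must be preserved. */
static uint32_t isr_must_save(uint32_t used) { return used; }
```

For a body that writes R16 and R24, a function only has to save R16, while an ISR has to save both.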

A little bit of history

In the early era, when memory in MCUs was expensive and CPUs had only one or a few registers connected to the ALU, developers had to fit their code into limited space.

To save code memory, the hardware pushes everything onto the stack automatically when an ISR is invoked. This eliminates the code needed to push registers onto the stack; however, pushing the registers takes clock cycles, which slows down the ISR response. For example, upon an interrupt request, the 68HC11 spends machine cycles pushing both accumulators (2 bytes), both index registers (2 × 2 bytes), the program counter (2 bytes) and the condition code register (1 byte) onto the stack. The reverse is performed when returning from the interrupt.

As we can see, the machine automatically pushes and pulls 9 bytes to and from the stack. That would be a lot of code in ROM if you pushed and pulled them manually. The cost, however, is machine cycles: even when the ISR does not need those registers, the machine still spends cycles saving them. As a compensation, the 68HC11 has a special instruction, WAI (wait for interrupt), that pushes the registers onto the stack in advance to reduce the ISR latency.

Nowadays, as manufacturing processes have evolved, memory in MCUs is no longer that expensive. On the other hand, modern RISC CPUs tend to have many more registers (two 8-bit accumulators in the old CISC 68HC11, 32 8-bit registers in the modern RISC AVR), so it is no longer practical to push all registers onto the stack.

Since code memory is now more than sufficient and there are too many registers to save them all in every ISR, the design principle changed. Instead of automatically pushing all registers onto the stack, the hardware lets the developer decide what to save. This provides not only a faster ISR response, but also lower stack consumption. The cost is more code in the ISR to push and pull registers.