Raspberry Pi Pico (RP2040) SRAM and Flash Programming

Programming the Pico (RP2040) from the perspective of an 8-bit MCU developer, focusing on the details of SRAM and flash memory programming, with an analysis of the SDK 2nd stage bootloader.

--by Captdam @ Sep 16, 2025 Aug 27, 2025

I have been working with 8-bit MCUs like the 8051 and AVR for a long time, but I am new to 32-bit MCUs like the RP2040 on the Raspberry Pi Pico board. In this blog, I will record my first trial on the Pico from the perspective of an 8-bit MCU developer. I will focus on programming the on-chip run-time memory and the off-chip flash memory.

I will be directly writing / reading the MCU registers instead of using libraries. This allows me to touch the actual system control registers and explore the full functionality of the MCU.

Difference Between 8-bit MCU and 32-bit MCU

8-bit MCUs in Harvard Architecture

[Figure: Harvard Architecture Found on ATtiny24]

8-bit MCUs like the 8051 and AVR use the Harvard architecture. The instruction (program) bus is connected to the program memory, and the data bus is connected to the data memory. The two buses do not cross.

The program is saved in the on-chip program memory, which in most cases is the on-chip flash. When we download the binary to the MCU, we burn the program to the on-chip flash. When the MCU starts, the CPU directly reads the on-chip flash.

Other than the flash, there is the data memory. It can be an on-chip SRAM, or an external memory chip, or both. This memory is for data only; the CPU cannot execute from it.

32-bit MCUs in Von Neumann Architecture

[Figure: Von Neumann Architecture Found on RP2040]

32-bit MCUs like the RP2040 use the Von Neumann architecture. Instructions and data share the same bus and the same memory. On the RP2040, the CPU reaches every memory through one bus fabric, the AHB-Lite crossbar.

More specifically, there is one memory space. From the CPU’s perspective, it only sees one memory space. Regions in this memory space can be specialized: some allow write and read, some are read only; some can be used for data only, some for program only, some for both (and some hold system control registers). The bus maps the different physical memories into this one memory space.

To execute a program, a bootloader must load the program into the main memory (and in a region that can be used for program) before the CPU can execute it.

RP2040 supports execution in place (XIP). It seems that the CPU can execute from off-chip flash (or any device connected to the QSPI port); however, the XIP hardware must first cache the program into the on-chip memory. Therefore, the CPU is executing the program in the on-chip memory.

HelloWorld.asm

We will write a very simple program that blinks the on-board LED on GPIO25.

Here is a link to my files: asm-blink-sram.zip.

ASM Program

Following is our program source code in assembly language in file main.s.


.cpu cortex-m0plus
.thumb
.align 2
.thumb_func
	

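At the beginning, we tell the assembler that we are using the Cortex-M0+ CPU (the CPU used by the RP2040, see key features in Chapter 1 Introduction).

We also let the assembler use the Thumb instruction set (2-byte instruction words) and align the code. Note that on ARM targets the GNU assembler treats the argument of .align as a power of two, so .align 2 means a 4-byte boundary, which also satisfies the 2-byte alignment that Thumb instructions need.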


.global reset
reset:
	ldr	r0, =0x20041000	@ Stack Pointer @ SRAM bank 4
	mov	sp, r0

	ldr	r3, =0x4000f000	@ Reset clear: 0x4000C000 for RESET_BASE + 0x3000 for Atomic bit clear
	mov	r2, #32		@ IO_BANK0 Reset after power-on, need manual release
	str	r2, [r3, #0]

	ldr	r3, =0x400140cc	@ IO_BANK0.GPIO25_CTRL
	mov	r2, #5		@ Function 5 (SIO)
	str	r2, [r3, #0]

	ldr	r3, =0xd0000020	@ SIO_BASE.GPIO_OE: Output or Input
	ldr	r2, =(1<<25)	@ GPIO25
	str	r2, [r3, #0]
	

The first routine is called reset. It does the following:

  1. Use SRAM bank 4 (address 0x20040000 - 0x20040FFF, 4kB) as our stack. The ARM Cortex-M0+ stack pointer points at the last item pushed, and the stack grows downward; therefore, we set the initial SP to the top of the memory region + 1. That is, 0x20041000.
  2. Release the user IOs (IO Bank 0, GPIO 0 - 29) from reset. They are held in reset after power-on, so we must release them before any GPIO operation. To do so, we clear bit 5 in the RESET register. To clear a bit atomically, we write 1 to that bit at an address offset of 0x3000. Therefore, we write (1<<5) to address 0x4000F000.
  3. To set the function of GPIO25, write GPIO25_CTRL at 0x400140CC. To use it as a simple IO (SIO), write 5.
  4. To enable output on GPIO25, write 1 to bit 25 of the GPIO_OE register at 0xD0000020.
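
The addresses used in these steps all come from a base address plus an offset. As a quick sanity check on the PC, here is a small Python sketch that recomputes them. The constant names are my own labels for this example (not SDK identifiers); the values are the ones used in the assembly above.

```python
# Base addresses as used in the assembly above (names are illustrative labels).
RESETS_BASE = 0x4000C000
IO_BANK0_BASE = 0x40014000
SIO_BASE = 0xD0000000
ATOMIC_CLEAR = 0x3000           # alias offset: writing 1 to a bit clears it

IO_BANK0_RESET_BIT = 5          # bit 5 of the RESET register controls IO_BANK0
LED_PIN = 25

# Step 2: where and what to write to release IO_BANK0 from reset.
reset_clear_addr = RESETS_BASE + ATOMIC_CLEAR
reset_clear_value = 1 << IO_BANK0_RESET_BIT

# Step 3: every GPIO owns 8 bytes (STATUS then CTRL), so CTRL sits at +4.
gpio25_ctrl_addr = IO_BANK0_BASE + LED_PIN * 8 + 4

# Step 4 and the blink loop: SIO registers and the bit mask for GPIO25.
gpio_oe_addr = SIO_BASE + 0x20
gpio_out_xor_addr = SIO_BASE + 0x1C
led_mask = 1 << LED_PIN

# Step 1: SRAM bank 4 starts at 0x20040000 and is 4 kB long.
initial_sp = 0x20040000 + 0x1000

for name, value in [("reset clear address", reset_clear_addr),
                    ("reset clear value", reset_clear_value),
                    ("GPIO25_CTRL", gpio25_ctrl_addr),
                    ("GPIO_OE", gpio_oe_addr),
                    ("GPIO_OUT_XOR", gpio_out_xor_addr),
                    ("LED mask", led_mask),
                    ("initial SP", initial_sp)]:
    print(f"{name:20s} 0x{value:08X}")
```

The printed values can be compared against the literal pool at the end of the assembled program.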

Furthermore, we need to make the label reset global. In that way, the reset routine will be visible in the linking stage.


blink:
	ldr	r3, =0xd000001c	@ SIO_BASE.GPIO_XOR
	ldr	r2, =(1<<25)	@ GPIO25
	str	r2, [r3, #0]
	
	ldr	r0, =650000
	bl	delay
	b	blink

delay:
	mov	r4,r0
loop:
	sub	r4,r4,#1
	cmp	r4, #0
	bne	loop
	bx	lr
	

The routine blink is used to blink the LED. To blink the LED on GPIO25, we XOR that bit. To do so, write 1 to bit 25 of the GPIO_OUT_XOR register at 0xD000001C.

The routine delay is used to generate some delay between each XOR operation using the loop under the label loop.

These labels / routines don't need to be visible in the linking stage; therefore, they don't need to be global.


.align 4
	

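It is always a good idea to end the code on a word boundary for the 32-bit CPU. Note that on ARM targets the argument of .align is a power of two, so .align 4 actually requests a 16-byte boundary, which is stricter than the 4-byte alignment we need; .align 2 would be enough.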

To assemble this code, use arm-none-eabi-as --warn --fatal-warnings -g main.s -o main.o, which produces the file main.o.

Linker Script

Once assembled, we have to link the program to the target platform.

Following is our linker script in file main.ld.


MEMORY {
	SRAM(rwx) : ORIGIN = 0x20000000, LENGTH = 264k
}

ENTRY(reset)

SECTIONS {
	. = ORIGIN(SRAM);
	.text : {
		*(.text)
	} >SRAM
}
	

In this linker script, we tell the linker:

  1. There is a readable, writable and executable (rwx) memory called SRAM, starting at address 0x20000000, that is 264kB long.
  2. Entry point is the routine reset. Start execution from that position.
  3. Put the text segment (program code) in SRAM, starting from the origin (beginning) of SRAM.

To link this program, use arm-none-eabi-ld -nostdlib -nostartfiles -T main.ld main.o -o main.elf, which produces the file main.elf.

UF2 File

Now, the program is ready. To download it to the Pico, we need one more step: pack it into the uf2 format to download via USB.

The RP2040 provides a very easy-to-use way to download. Compared to the traditional method that requires a physical programmer (or a special interface not common on modern PCs), the RP2040 supports download over USB. Once powered up, the RP2040 will present itself as a mass storage device. This allows the developer to download the program file the same way as copying a file onto a USB disk.

To generate the uf2 file, use pico-elf2uf2 main.elf main.uf2, which produces the file main.uf2.

You can download the SDK source code but you will need to build the SDK to run it on your computer.

You can find the elf2uf2 program in pico-sdk/elf2uf2 (SDK version 1). See source code in /tools/elf2uf2 in SDK1.5.1 on GitHub.

I copied the elf2uf2 program into my environment /bin to use it directly.

We can open the uf2 file in hex mode to check its content, use od -t x4 main.uf2:


Addr    X0       X4       X8       XC	
0000000 0a324655 9e5d5157 00002000 20000000
0000020 00000100 00000000 00000001 e48bff56
0000040 4685480b 22204b0b 4b0b601a 601a2205
0000060 4a0b4b0a 4b0b601a 601a4a09 f000480a
0000100 e7f8f801 3c011c04 d1fc2c00 46c04770
0000120 20041000 4000f000 400140cc d0000020
0000140 02000000 d000001c 0009eb10 00000000
0000160 00000000 00000000 00000000 00000000
*
0000760 00000000 00000000 00000000 0ab16f30
0001000
	

Cross-referencing the UF2 format, we can understand the content as follows:

UF2 File Content

Offset,Size | Name | Content | Comment
0,8 | Magic numbers: 0x0A324655, 0x9E5D5157 | 0a324655 9e5d5157 |
8,4 | Flag | 00002000 | familyID present - when set, the fileSize/familyID holds a value identifying the board family (usually corresponds to an MCU)
12,4 | Address in flash where the data should be written | 20000000 | Same as the linker script ORIGIN(SRAM), beginning of SECTIONS
16,4 | Number of bytes used in data (often 256) | 00000100 | 256 bytes long
20,4 | Sequential block number; starts at 0 | 00000000 | 0, first block
24,4 | Total number of blocks in file | 00000001 | 1, 1 block only
28,4 | File size or board family ID or zero | e48bff56 | Family ID for RP2040, see this list
32,476 | Data, padded with zeros | 4685480b... | The program, verify with arm-none-eabi-objdump -d main.elf
508,4 | Final magic number: 0x0AB16F30 | 0ab16f30 |
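
To double check the field layout, the header can also be decoded with a few lines of Python instead of reading the hex dump by eye. This is only a sketch: the function name parse_uf2_block is made up for this example, and to keep it self-contained the block is rebuilt from the header words of the dump above with an all-zero data area (the real file holds the program there).

```python
import struct

# The eight header words and the final magic number, copied from the od dump.
header_words = [0x0A324655, 0x9E5D5157, 0x00002000, 0x20000000,
                0x00000100, 0x00000000, 0x00000001, 0xE48BFF56]
final_magic = 0x0AB16F30

# 32-byte header + 476-byte data area (zeros here) + 4-byte final magic = 512.
block = struct.pack("<8I", *header_words) + bytes(476) + struct.pack("<I", final_magic)


def parse_uf2_block(raw):
    """Split one 512-byte UF2 block into its named fields."""
    if len(raw) != 512:
        raise ValueError("a UF2 block is always 512 bytes long")
    (magic0, magic1, flags, address, size,
     block_no, block_count, family) = struct.unpack_from("<8I", raw, 0)
    (magic_end,) = struct.unpack_from("<I", raw, 508)
    if (magic0, magic1, magic_end) != (0x0A324655, 0x9E5D5157, 0x0AB16F30):
        raise ValueError("magic numbers do not match")
    return {
        "family_id_present": bool(flags & 0x00002000),
        "address": address,
        "payload_size": size,
        "block_no": block_no,
        "block_count": block_count,
        "family_or_file_size": family,
        "data": raw[32:32 + size],
    }


fields = parse_uf2_block(block)
for key, value in fields.items():
    if key != "data":
        print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

With a real file, read 512 bytes at a time from main.uf2 and pass each chunk to the same function.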

Download to Pico

When we power the Pico up, the CPU starts executing the on-chip bootloader (often called the first stage bootloader). This bootloader program is located in the boot ROM and cannot be changed. This bootloader allows the Pico to be connected to a PC via USB as a mass storage device (like a USB disk) for programming.

The RP2040 will start as a USB mass storage device if the BOOTSEL button is pressed (it is connected to the on-board flash CSn pin, RP2040 pin 56), or if there is no valid program in the flash.

We can program the chip by copying the uf2 program file into the mass storage device.

Once the uf2 file is received, the bootloader will try to copy the content in that file into specified location.

[Figure: LED on Pico]

We can see the LED on the Pico board blinking.

Download to SRAM vs to Flash

Cut the power, then reconnect it. The program has disappeared: the Pico starts as a mass storage device again. Or, if there is another valid program in the flash, that program runs instead of the blink program we just downloaded.

Download to SRAM

In our linker script, we have:


MEMORY {
	SRAM(rwx) : ORIGIN = 0x20000000, LENGTH = 264k
}
	

This causes the uf2 file to contain the 0x20000000 as target address. Therefore, the bootloader will copy the content of the uf2 file (which is our program) to address 0x20000000. For RP2040, this address is mapped to SRAM.

The SRAM is volatile. Once the power is lost, its content is lost.

In fact, even if the program were not lost from SRAM, the on-chip bootloader would not run it after a reset: it stays in the bootloader stage if there is no valid program in the flash, or loads the program from the flash if there is one.

Download to Flash

According to the datasheet, the XIP address starts from 0x10000000. This address maps the external 2MB on-board flash chip.

RP2040’s flash address width allows a size of up to 16MB (0x01000000). The flash is mirrored 4 times, spanning from 0x10000000 to 0x13FFFFFF, for different cache strategies. 0x10000000 - 0x10FFFFFF represents the most basic one, the “Cacheable, allocating - Normal cache operation”. For the on-chip bootloader, writing to 0x10000000 - 0x10FFFFFF means writing to the flash; writing to 0x11000000 - 0x13FFFFFF is not allowed.
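
To make the mirroring concrete, here is a small Python sketch that splits an XIP address into the mirror it falls in and the offset inside the flash. The function name is my own, and the names of the three upper mirrors follow my reading of the datasheet, so treat them as an assumption.

```python
XIP_BASE = 0x10000000
MIRROR_SIZE = 0x01000000        # 16 MB of flash address space per mirror

# Mirror 0 is named in the text above; the other three names are my reading
# of the datasheet and should be checked there.
MIRRORS = [
    "cacheable, allocating",
    "cacheable, non-allocating",
    "non-cacheable, allocating",
    "non-cacheable, non-allocating",
]


def decode_xip_address(addr):
    """Return (mirror name, offset inside the flash) for an XIP address."""
    if not XIP_BASE <= addr < XIP_BASE + len(MIRRORS) * MIRROR_SIZE:
        raise ValueError("address is outside 0x10000000 - 0x13FFFFFF")
    window_offset = addr - XIP_BASE
    return MIRRORS[window_offset // MIRROR_SIZE], window_offset % MIRROR_SIZE


for address in (0x10000000, 0x10000100, 0x13000100):
    mirror, offset = decode_xip_address(address)
    print(f"0x{address:08X}: flash offset 0x{offset:06X}, {mirror}")
```

All four mirrors reach the same flash bytes; only the cache behaviour differs.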

To download to the XIP (flash) memory, we will need to link the program to that location in order to let the on-chip bootloader copy our program into that memory region.

Let’s modify the linker script to write to 0x10000000, as follows:


MEMORY {
	Flash(rx) : ORIGIN = 0x10000000, LENGTH = 264k
}

ENTRY(reset)

SECTIONS {
	. = ORIGIN(Flash);
	.text : {
		*(.text)
	} >Flash
}
	

Keep the assembly source file untouched. Link and generate the uf2 file again (the object file main.o has not changed): use arm-none-eabi-ld -nostdlib -nostartfiles -T main.ld main.o -o main.elf, then pico-elf2uf2 main.elf main.uf2.

If we check the content of the newly generated uf2 file, we can find that the content at file offset 12 (Address in flash where the data should be written) changed to 0x10000000. This tells the on-chip bootloader to write the program to 0x10000000, which is mapped to the on-board flash.

Program the chip by copying the uf2 program file into the mass storage device. It failed. The LED is not blinking. The chip stalls at on-chip bootloader stage. Shortly after Pico received the uf2 file, it presents itself as a mass storage device again.

If there is a valid program in the flash, it will be overwritten.

The Checksum

[Figure: RP2040 Boot Sequence]

From the CPU’s perspective, there is no such thing as a programmed or unprogrammed flash. Reading the flash always produces some result. An unprogrammed flash returns whatever its cells hold, typically all 0xFF bytes for erased NOR flash or leftover bytes from earlier use; there is always some data.

At power up, the on-chip bootloader will read 256 bytes of data from the on-board flash. Then, it verifies this data with a checksum stored in the last 4 of those bytes.

We had the program in flash, but without a valid checksum. So, the on-chip bootloader refused to run our program.

Download to Flash

Now, we will program the flash, so the program is preserved after power is lost.

Here is a link to my files: asm-blink-flash.zip.

The Source Code

Before we add the checksum, we have to make a slight modification on our assembly source code:


.cpu cortex-m0plus
.thumb
.align 2
.thumb_func

.section .boot2, "ax"
.global reset
reset:
	ldr	r0, =0x20041000	@ Stack Pointer @ SRAM bank 4
	mov	sp, r0

	ldr	r3, =0x4000f000	@ Reset clear: 0x4000C000 for RESET_BASE + 0x3000 for Atomic bit clear
	mov	r2, #32		@ IO_BANK0 Reset after power-on, need manual release
	str	r2, [r3, #0]

	ldr	r3, =0x400140cc	@ IO_BANK0.GPIO25_CTRL
	mov	r2, #5		@ Function 5 (SIO)
	str	r2, [r3, #0]

	ldr	r3, =0xd0000020	@ SIO_BASE.GPIO_OE: Output or Input
	ldr	r2, =(1<<25)	@ GPIO25
	str	r2, [r3, #0]

blink:
	ldr	r3, =0xd000001c	@ SIO_BASE.GPIO_XOR
	ldr	r2, =(1<<25)	@ GPIO25
	str	r2, [r3, #0]
	ldr	r0, =650000
	bl	delay
	b	blink

delay:
	mov	r4,r0
loop:
	sub	r4,r4,#1
	cmp	r4, #0
	bne	loop
	bx	lr

.align 4
	

We add .section .boot2, "ax". This names the section for the following program code boot2, and marks it as allocatable (a) and executable (x).

Save this source file as boot2_src.s.

Assemble it with arm-none-eabi-as --warn --fatal-warnings -g boot2_src.s -o boot2_src.o, which produces the file boot2_src.o.


We also need some modification on the linker script:


MEMORY {
	FLASH(rx) : ORIGIN = 0x10000000, LENGTH = 2048k
	SRAM(rwx) : ORIGIN = 0x20000000, LENGTH = 264k
}

SECTIONS {
	. = ORIGIN(FLASH);
	.text : {
		KEEP(*(.boot2))
	} >FLASH
}
	

Store the boot2 section at the beginning of flash.

Save this linker script as boot2_src.ld.

Link it with arm-none-eabi-ld -nostdlib -nostartfiles -T boot2_src.ld boot2_src.o -o boot2_src.elf, which produces the file boot2_src.elf.


Generate a binary file containing the program code with arm-none-eabi-objcopy -O binary boot2_src.elf boot2_src.bin, which produces the file boot2_src.bin. We can check the content of this binary file using od -t x4 boot2_src.bin:


0000000 4685480b 22204b0b 4b0b601a 601a2205
0000020 4a0b4b0a 4b0b601a 601a4a09 f000480a
0000040 e7f8f801 3c011c04 d1fc2c00 46c04770
0000060 20041000 4000f000 400140cc d0000020
0000100 02000000 d000001c 0009eb10
0000114
	

To be honest, it is necessary neither to name the section boot2 in the source code nor to specify the memories and sections in the linker script. None of this information survives in the raw binary file.

However, it is a good idea to properly name and link them, so we don’t get confused in the future.

Append the Checksum

The RP2040 datasheet says that the bootloader will load the first 256 bytes from the flash into SRAM bank 5 and verify the checksum.

We will add the checksum using the tool that comes with the SDK. Use pico-pad_checksum -s 0xFFFFFFFF boot2_src.bin boot2.s, result in boot2.s. We can open this assembly source code file in text mode:


// Padded and checksummed version of: boot2_src.bin

.cpu cortex-m0plus
.thumb

.section .boot2, "ax"

.byte 0x0b, 0x48, 0x85, 0x46, 0x0b, 0x4b, 0x20, 0x22, 0x1a, 0x60, 0x0b, 0x4b, 0x05, 0x22, 0x1a, 0x60
.byte 0x0a, 0x4b, 0x0b, 0x4a, 0x1a, 0x60, 0x0b, 0x4b, 0x09, 0x4a, 0x1a, 0x60, 0x0a, 0x48, 0x00, 0xf0
.byte 0x01, 0xf8, 0xf8, 0xe7, 0x04, 0x1c, 0x01, 0x3c, 0x00, 0x2c, 0xfc, 0xd1, 0x70, 0x47, 0xc0, 0x46
.byte 0x00, 0x10, 0x04, 0x20, 0x00, 0xf0, 0x00, 0x40, 0xcc, 0x40, 0x01, 0x40, 0x20, 0x00, 0x00, 0xd0
.byte 0x00, 0x00, 0x00, 0x02, 0x1c, 0x00, 0x00, 0xd0, 0x10, 0xeb, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xda, 0x37, 0x23, 0x75
	

As we can see, this file contains the program code (see file boot2_src.bin), zero padding up to 252 bytes, plus 4 bytes of checksum at the end of the file. The total length is 256 bytes.

Note that at the top of this assembly code, the data has been given the section name boot2. This is because in most cases, this code (the first 256 bytes read by the on-chip bootloader) is used as the second stage bootloader, which loads the main program.

You can find the pad_checksum program in pico-sdk/src/rp2_common/boot_stage2 (SDK version 1). See source code in /src/rp2_common/boot_stage2 in SDK1.5.1 on GitHub.

I copied the pad_checksum program into my environment /bin to use it directly.

This tool is a Python script. By default, it uses 256 as pad size, 0 as seed (initial value). We are OK with 256 pad size, but we would like to change the initial value (seed) to 0xFFFFFFFF by using -s 0xFFFFFFFF.

Inside this tool, line 42 holds the expression (binascii.crc32(bytes(bitrev(b, 8) for b in idata_padded), args.seed ^ 0xffffffff) ^ 0xffffffff) & 0xffffffff, 32). The trailing , 32) closes a call to the script's own bitrev function that is opened on the previous line, so the 32-bit result is bit-reversed as well.

The Python documentation states: binascii.crc32: Compute CRC-32, the unsigned 32-bit checksum of data, starting with an initial CRC of value. … The algorithm is consistent with the ZIP file checksum. According to the GZIP File Format Specification, page 12, the polynomial in the ZIP algorithm is 0xedb88320, which is the bit-reversed form of the 0x04c11db7 required by the RP2040 on-chip bootloader. That is why the tool bit-reverses every input byte and the final result.
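
To convince myself that the bit-reversal trick really gives a plain CRC-32 with polynomial 0x04c11db7, here is a Python sketch that computes the checksum both ways and compares them. It is not the SDK tool: the function names are made up for this example, the seed is fixed to 0xFFFFFFFF, and the program words are copied from the od dump of boot2_src.bin above. The generated boot2.s ends with the bytes da 37 23 75, so that is the value to compare the printed result against.

```python
import binascii
import struct

SEED = 0xFFFFFFFF
POLY = 0x04C11DB7


def bitrev(x, width):
    """Reverse the lowest `width` bits of x."""
    return int("{:0{w}b}".format(x, w=width)[::-1], 2)


def checksum_via_binascii(data):
    """Same shape as the expression quoted above: ZIP CRC on bit-reversed bytes."""
    raw = (binascii.crc32(bytes(bitrev(b, 8) for b in data), SEED ^ 0xFFFFFFFF)
           ^ 0xFFFFFFFF) & 0xFFFFFFFF
    return bitrev(raw, 32)


def checksum_bitwise(data):
    """MSB-first CRC-32 with polynomial 0x04C11DB7 and no final XOR."""
    crc = SEED
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


# boot2_src.bin as 32-bit little-endian words, copied from the od dump.
program_words = [
    0x4685480B, 0x22204B0B, 0x4B0B601A, 0x601A2205,
    0x4A0B4B0A, 0x4B0B601A, 0x601A4A09, 0xF000480A,
    0xE7F8F801, 0x3C011C04, 0xD1FC2C00, 0x46C04770,
    0x20041000, 0x4000F000, 0x400140CC, 0xD0000020,
    0x02000000, 0xD000001C, 0x0009EB10,
]
program = struct.pack("<%dI" % len(program_words), *program_words)

# Pad with zeros to 252 bytes; the checksum takes the last 4 of the 256 bytes.
padded = program + bytes(252 - len(program))
checksum = checksum_bitwise(padded)
boot_block = padded + struct.pack("<I", checksum)

print("both methods agree:", checksum == checksum_via_binascii(padded))
print("checksum bytes in file order:", boot_block[-4:].hex())
```

The bitwise version is slower but shows the algorithm the boot ROM expects without any reversal step.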

UF2 File with Checksum

Finally, we can download the program with proper checksum.

Assemble the padded assembly code with arm-none-eabi-as --warn --fatal-warnings -g boot2.s -o boot2.o, which produces the file boot2.o.

Create a new linker script and save it as main.ld:


MEMORY {
	FLASH(rx) : ORIGIN = 0x10000000, LENGTH = 2048k
	SRAM(rwx) : ORIGIN = 0x20000000, LENGTH = 264k
}

SECTIONS {
	. = ORIGIN(FLASH);
	.text : {
		KEEP(*(.boot2))
	} >FLASH
}
	

This will place the boot2 section (our program with checksum) at the beginning of the flash (mapped to XIP at address 0x10000000). This should request the on-chip bootloader to burn the program to flash after it received the uf2 file.

Link it with arm-none-eabi-ld -nostdlib -nostartfiles -T main.ld boot2.o -o main.elf, which produces the file main.elf.

Generate the uf2 file with pico-elf2uf2 main.elf boot2.uf2, which produces the file boot2.uf2.

Program the chip. Remove the power and reapply the power.

Our program is kept.

SDK 2nd Stage Bootloader - XIP

During the on-chip bootloader stage, only 256 bytes (including the checksum) of program data will be copied into the main memory SRAM bank 5 to execute. That’s far too small to store a reasonable program.

Therefore, these 256 bytes should only contain a bootloader program (commonly called the second stage bootloader, or boot2), which will bring the main program into the main memory.

The SDK boot2 Source Code

To learn about the second stage bootloader, let’s check the official SDK’s 2nd stage bootloader; the source code can be found here.

However, this source code is not really easy to read. It includes tons of other files; personally, I don’t like it. It is a dependency hell in source files.

Rather, I will disassemble an executable. The one I chose is the blink program in pico-example/blink. Use arm-none-eabi-objdump --disassembler-options=force-thumb -j .boot2 -Dxs blink.elf:


10000000 <__boot2_start__>:
10000000:	b500      	push	{lr}
10000002:	4b32      	ldr	r3, [pc, #200]	; (100000cc <__boot2_start__+0xcc>)
10000004:	2021      	movs	r0, #33	; 0x21
10000006:	6058      	str	r0, [r3, #4]
10000008:	6898      	ldr	r0, [r3, #8]
1000000a:	2102      	movs	r1, #2
1000000c:	4388      	bics	r0, r1
1000000e:	6098      	str	r0, [r3, #8]
10000010:	60d8      	str	r0, [r3, #12]
10000012:	6118      	str	r0, [r3, #16]
10000014:	6158      	str	r0, [r3, #20]
10000016:	4b2e      	ldr	r3, [pc, #184]	; (100000d0 <__boot2_start__+0xd0>)
10000018:	2100      	movs	r1, #0
1000001a:	6099      	str	r1, [r3, #8]
1000001c:	2102      	movs	r1, #2
1000001e:	6159      	str	r1, [r3, #20]
10000020:	2101      	movs	r1, #1
10000022:	22f0      	movs	r2, #240	; 0xf0
10000024:	5099      	str	r1, [r3, r2]
10000026:	492b      	ldr	r1, [pc, #172]	; (100000d4 <__boot2_start__+0xd4>)
10000028:	6019      	str	r1, [r3, #0]
1000002a:	2101      	movs	r1, #1
1000002c:	6099      	str	r1, [r3, #8]
1000002e:	2035      	movs	r0, #53	; 0x35
10000030:	f000 f844 	bl	100000bc <__boot2_start__+0xbc>
10000034:	2202      	movs	r2, #2
10000036:	4290      	cmp	r0, r2
10000038:	d014      	beq.n	10000064 <__boot2_start__+0x64>
1000003a:	2106      	movs	r1, #6
1000003c:	6619      	str	r1, [r3, #96]	; 0x60
1000003e:	f000 f834 	bl	100000aa <__boot2_start__+0xaa>
10000042:	6e19      	ldr	r1, [r3, #96]	; 0x60
10000044:	2101      	movs	r1, #1
10000046:	6619      	str	r1, [r3, #96]	; 0x60
10000048:	2000      	movs	r0, #0
1000004a:	6618      	str	r0, [r3, #96]	; 0x60
1000004c:	661a      	str	r2, [r3, #96]	; 0x60
1000004e:	f000 f82c 	bl	100000aa <__boot2_start__+0xaa>
10000052:	6e19      	ldr	r1, [r3, #96]	; 0x60
10000054:	6e19      	ldr	r1, [r3, #96]	; 0x60
10000056:	6e19      	ldr	r1, [r3, #96]	; 0x60
10000058:	2005      	movs	r0, #5
1000005a:	f000 f82f 	bl	100000bc <__boot2_start__+0xbc>
1000005e:	2101      	movs	r1, #1
10000060:	4208      	tst	r0, r1
10000062:	d1f9      	bne.n	10000058 <__boot2_start__+0x58>
10000064:	2100      	movs	r1, #0
10000066:	6099      	str	r1, [r3, #8]
10000068:	491b      	ldr	r1, [pc, #108]	; (100000d8 <__boot2_start__+0xd8>)
1000006a:	6019      	str	r1, [r3, #0]
1000006c:	2100      	movs	r1, #0
1000006e:	6059      	str	r1, [r3, #4]
10000070:	491a      	ldr	r1, [pc, #104]	; (100000dc <__boot2_start__+0xdc>)
10000072:	481b      	ldr	r0, [pc, #108]	; (100000e0 <__boot2_start__+0xe0>)
10000074:	6001      	str	r1, [r0, #0]
10000076:	2101      	movs	r1, #1
10000078:	6099      	str	r1, [r3, #8]
1000007a:	21eb      	movs	r1, #235	; 0xeb
1000007c:	6619      	str	r1, [r3, #96]	; 0x60
1000007e:	21a0      	movs	r1, #160	; 0xa0
10000080:	6619      	str	r1, [r3, #96]	; 0x60
10000082:	f000 f812 	bl	100000aa <__boot2_start__+0xaa>
10000086:	2100      	movs	r1, #0
10000088:	6099      	str	r1, [r3, #8]
1000008a:	4916      	ldr	r1, [pc, #88]	; (100000e4 <__boot2_start__+0xe4>)
1000008c:	4814      	ldr	r0, [pc, #80]	; (100000e0 <__boot2_start__+0xe0>)
1000008e:	6001      	str	r1, [r0, #0]
10000090:	2101      	movs	r1, #1
10000092:	6099      	str	r1, [r3, #8]
10000094:	bc01      	pop	{r0}
10000096:	2800      	cmp	r0, #0
10000098:	d000      	beq.n	1000009c <__boot2_start__+0x9c>
1000009a:	4700      	bx	r0
1000009c:	4812      	ldr	r0, [pc, #72]	; (100000e8 <__boot2_start__+0xe8>)
1000009e:	4913      	ldr	r1, [pc, #76]	; (100000ec <__boot2_start__+0xec>)
100000a0:	6008      	str	r0, [r1, #0]
100000a2:	c803      	ldmia	r0, {r0, r1}
100000a4:	f380 8808 	msr	MSP, r0
100000a8:	4708      	bx	r1
100000aa:	b503      	push	{r0, r1, lr}
100000ac:	6a99      	ldr	r1, [r3, #40]	; 0x28
100000ae:	2004      	movs	r0, #4
100000b0:	4201      	tst	r1, r0
100000b2:	d0fb      	beq.n	100000ac <__boot2_start__+0xac>
100000b4:	2001      	movs	r0, #1
100000b6:	4201      	tst	r1, r0
100000b8:	d1f8      	bne.n	100000ac <__boot2_start__+0xac>
100000ba:	bd03      	pop	{r0, r1, pc}
100000bc:	b502      	push	{r1, lr}
100000be:	6618      	str	r0, [r3, #96]	; 0x60
100000c0:	6618      	str	r0, [r3, #96]	; 0x60
100000c2:	f7ff fff2 	bl	100000aa <__boot2_start__+0xaa>
100000c6:	6e18      	ldr	r0, [r3, #96]	; 0x60
100000c8:	6e18      	ldr	r0, [r3, #96]	; 0x60
100000ca:	bd02      	pop	{r1, pc}
100000cc:	0000      	movs	r0, r0
100000ce:	4002      	ands	r2, r0
100000d0:	0000      	movs	r0, r0
100000d2:	1800      	adds	r0, r0, r0
100000d4:	0000      	movs	r0, r0
100000d6:	0007      	movs	r7, r0
100000d8:	0300      	lsls	r0, r0, #12
100000da:	005f      	lsls	r7, r3, #1
100000dc:	2221      	movs	r2, #33	; 0x21
100000de:	0000      	movs	r0, r0
100000e0:	00f4      	lsls	r4, r6, #3
100000e2:	1800      	adds	r0, r0, r0
100000e4:	2022      	movs	r0, #34	; 0x22
100000e6:	a000      	add	r0, pc, #0	; (adr r0, 100000e8 <__boot2_start__+0xe8>)
100000e8:	0100      	lsls	r0, r0, #4
100000ea:	1000      	asrs	r0, r0, #32
100000ec:	ed08 e000 	stc	0, cr14, [r8, #-0]
	...
100000fc:	b274      	sxtb	r4, r6
100000fe:	7a4e      	ldrb	r6, [r1, #9]
	...
10000100:	2000      	movs	r0, #0
10000102:	2004      	movs	r0, #4
10000104:	01f7      	lsls	r7, r6, #7
10000106:	1000      	asrs	r0, r0, #32
	...
100001f6 <_reset_handler>:
100001f6:	481d      	ldr	r0, [pc, #116]	; (1000026c <hold_non_core0_in_bootrom+0xe>)
100001f8:	6800      	ldr	r0, [r0, #0]
100001fa:	2800      	cmp	r0, #0
	

Analyzing the SDK boot2

The SDK 2nd stage bootloader is designed to configure the SSI to support XIP (execute in place). It assumes the main program is stored immediately after the 2nd stage bootloader, at 0x10000100.

Let's analyze the disassembled code:

Main routine


__boot2_start__:
10000000:	b500      	push	{lr}
	

Store the link register (return address). This is used to determine who calls our 2nd stage bootloader, either from on-chip bootloader, or the user code (main program).



10000002:	4b32      	ldr	r3, =0x40020000	@ PADS_QSPI_BASE
10000004:	2021      	movs	r0, #0x21	@ 2 (8mA) << DRIVE | 1 << SLEWFAST
10000006:	6058      	str	r0, [r3, #4]	@ PADS_QSPI_BASE + GPIO_QSPI_SCLK
10000008:	6898      	ldr	r0, [r3, #8]	@ PADS_QSPI_BASE + GPIO_QSPI_SD0
1000000a:	2102      	movs	r1, #2		@ 1 << Schmitt
1000000c:	4388      	bics	r0, r1
1000000e:	6098      	str	r0, [r3, #8]	@ PADS_QSPI_BASE + GPIO_QSPI_SD0
10000010:	60d8      	str	r0, [r3, #12]	@ PADS_QSPI_BASE + GPIO_QSPI_SD1
10000012:	6118      	str	r0, [r3, #16]	@ PADS_QSPI_BASE + GPIO_QSPI_SD2
10000014:	6158      	str	r0, [r3, #20]	@ PADS_QSPI_BASE + GPIO_QSPI_SD3
	

Set up the electrical characteristics of the QSPI interface to the on-board flash. This includes:

  1. QSPI SCLK pin: Fast slew rate, 8mA drive strength.
  2. QSPI Data pins: Disable the Schmitt trigger on all data pins. This reduces the input delay and so allows a higher transfer rate (at the cost of a less noise-tolerant input).
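
To read the pad control values without counting bits by hand, here is a Python sketch that decodes one pad register. The function name and dictionary keys are my own, and the bit positions other than DRIVE, SLEWFAST and Schmitt (which appear in the comments above) follow my reading of the datasheet. The value 0x52 in the second example is only an illustrative starting value, not something I read back from the hardware.

```python
DRIVE_STRENGTH_MA = {0: 2, 1: 4, 2: 8, 3: 12}
SCHMITT_BIT = 0x02


def decode_pad_ctrl(value):
    """Decode one pad control register value into named fields."""
    return {
        "slew_fast": bool(value & 0x01),
        "schmitt": bool(value & SCHMITT_BIT),
        "pull_down": bool(value & 0x04),
        "pull_up": bool(value & 0x08),
        "drive_mA": DRIVE_STRENGTH_MA[(value >> 4) & 0x3],
        "input_enable": bool(value & 0x40),
        "output_disable": bool(value & 0x80),
    }


# Value written to the SCLK pad at code address 0x10000006.
print(decode_pad_ctrl(0x21))

# The data pads are read, bit 1 is cleared (bics), and the result is written
# back to SD0..SD3. 0x52 is an assumed example value for illustration only.
before = 0x52
after = before & ~SCHMITT_BIT
print(hex(after), decode_pad_ctrl(after))
```

Note that the same modified value read from SD0 is written to all four data pads, so they end up with identical settings.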


10000016:	4b2e      	ldr	r3, =0x18000000	@ XIP_SSI_BASE
10000018:	2100      	movs	r1, #0
1000001a:	6099      	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
1000001c:	2102      	movs	r1, #2
1000001e:	6159      	str	r1, [r3, #20]	@ XIP_SSI_BASE + BAUD
10000020:	2101      	movs	r1, #1
10000022:	22f0      	movs	r2, #0xf0
10000024:	5099      	str	r1, [r3, r2]	@ XIP_SSI_BASE + RX_SAMPLE_DLY
10000026:	492b      	ldr	r1, =0x00070000	@ 0 (STD) << SPI_FRF | 7 << DFS32 | 0 (TX_AND_RX) << TMOD | 0 << SCPH
10000028:	6019      	str	r1, [r3, #0]	@ XIP_SSI_BASE + CTRLR0
1000002a:	2101      	movs	r1, #1
1000002c:	6099      	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
	

Disable SSI for configuration.

Set the clock divider (BAUD) to 2, so the serial clock runs at half the SSI clock.

Set Rx delay to 1 clock cycle to compensate for the signal propagation delay.

Use 8-bit frame size with standard SPI, enable both Tx and Rx.

When the SCPH bit in CTRLR0 is 0, the slave select pin will be held high when the SSI is inactive. That means, the hardware will automatically pull the slave select low when sending / receiving data; and pull the line high after transmission finished.

Re-enable SSI.
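
The CTRLR0 value is easier to check when decoded field by field. Below is a Python sketch for that; the function name and the text labels are my own, and the field positions are inferred from the comment on the ldr line above (SPI_FRF, DFS32, TMOD, SCPH), so treat them as my reading of the datasheet.

```python
FRAME_FORMATS = {0: "standard", 1: "dual", 2: "quad"}
TRANSFER_MODES = {0: "tx and rx", 1: "tx only", 2: "rx only", 3: "eeprom read"}


def decode_ctrlr0(value):
    """Decode the SSI CTRLR0 fields used by the 2nd stage bootloader."""
    return {
        "spi_frf": FRAME_FORMATS.get((value >> 21) & 0x3, "reserved"),
        # DFS_32 stores the frame size minus one.
        "frame_bits": ((value >> 16) & 0x1F) + 1,
        "tmod": TRANSFER_MODES[(value >> 8) & 0x3],
        "scph": (value >> 6) & 0x1,
    }


# Value written at code address 0x10000028.
print(decode_ctrlr0(0x00070000))
```

The same function can be reused on any other value the bootloader writes to CTRLR0.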



1000002e:	2035      	movs	r0, #53
10000030:	f000 f844 	bl	xip_write2read2
10000034:	2202      	movs	r2, #2
10000036:	4290      	cmp	r0, r2
10000038:	d014      	beq.n	set_qspi
	

Send 53 (0x35). Is the result 2?

Sending 53 (0x35) to the W25Q80 flash reads status register 2.

2 (0b00000010) in status register 2 means that bit 1, the Quad Enable (QE) bit, is set and all other bits are clear.

If yes, the flash is ready to be used in QSPI mode, so skip to set_qspi. Otherwise, initialize the flash device.



1000003a:	2106      	movs	r1, #6
1000003c:	6619      	str	r1, [r3, #96]	@ XIP_SSI_BASE + DR0
1000003e:	f000 f834 	bl	xip_wait_send
10000042:	6e19      	ldr	r1, [r3, #96]
	

Send 6. Discard one returned data.

Sending 6 to W25Q80 flash enables write, no return data required.



10000044:	2101      	movs	r1, #1
10000046:	6619      	str	r1, [r3, #96]	@ XIP_SSI_BASE + DR0
10000048:	2000      	movs	r0, #0
1000004a:	6618      	str	r0, [r3, #96]
1000004c:	661a      	str	r2, [r3, #96]
1000004e:	f000 f82c 	bl	xip_wait_send
10000052:	6e19      	ldr	r1, [r3, #96]
10000054:	6e19      	ldr	r1, [r3, #96]
10000056:	6e19      	ldr	r1, [r3, #96]
	

Send 1, 0, 2 (r2 holds #2, see code address 0x10000034). Discard three returned data.

Sending 1 to the W25Q80 flash writes the 2 bytes that follow to status registers 1 and 2: here, 0 to status register 1 and 2 (QE set) to status register 2.



check_flash_busy:
10000058:	2005      	movs	r0, #5
1000005a:	f000 f82f 	bl	xip_write2read2
1000005e:	2101      	movs	r1, #1
10000060:	4208      	tst	r0, r1
10000062:	d1f9      	bne.n	check_flash_busy
	

Send 5. Is bit 0 set?

Sending 5 to the W25Q80 flash reads status register 1.

Bit 0 of status register 1 is the BUSY flag. If bit 0 is set, the flash is busy (here, with the status register write). Poll this flag until the flash is free.
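
The whole flash setup sequence so far (read status register 2, and only if needed: write enable, write status, poll busy) can be summarized in a few lines. The following Python sketch replays that logic against a toy model of the flash. FakeFlash and ensure_quad_enabled are names I made up; the model only knows the two status registers and pretends to be busy for two polls after a status write.

```python
CMD_WRITE_STATUS = 0x01
CMD_READ_STATUS1 = 0x05
CMD_WRITE_ENABLE = 0x06
CMD_READ_STATUS2 = 0x35
SR1_BUSY = 0x01
SR2_QE = 0x02


class FakeFlash:
    """Toy stand-in for the flash chip; models only the two status registers."""

    def __init__(self, sr2=0x00):
        self.sr1 = 0x00
        self.sr2 = sr2
        self.busy_polls = 0
        self.log = []          # every command sequence sent, for inspection

    def transfer(self, tx):
        self.log.append(list(tx))
        cmd = tx[0]
        if cmd == CMD_READ_STATUS2:
            return self.sr2
        if cmd == CMD_READ_STATUS1:
            if self.busy_polls > 0:
                self.busy_polls -= 1
                return self.sr1 | SR1_BUSY
            return self.sr1
        if cmd == CMD_WRITE_STATUS:
            self.sr1, self.sr2 = tx[1], tx[2]
            self.busy_polls = 2    # pretend the write takes two polls
        return 0


def ensure_quad_enabled(flash):
    """Same decision flow as the disassembly between 0x1000002e and 0x10000062."""
    if flash.transfer([CMD_READ_STATUS2]) == SR2_QE:
        return                                  # already set up, skip to set_qspi
    flash.transfer([CMD_WRITE_ENABLE])
    flash.transfer([CMD_WRITE_STATUS, 0x00, SR2_QE])
    while flash.transfer([CMD_READ_STATUS1]) & SR1_BUSY:
        pass                                    # poll until the flash is free


fresh = FakeFlash(sr2=0x00)
ensure_quad_enabled(fresh)
print(fresh.log)

ready = FakeFlash(sr2=SR2_QE)
ensure_quad_enabled(ready)
print(ready.log)
```

On a flash that already has the right status, only the single status read is sent, which avoids an unnecessary write to the status register at every boot.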



set_qspi:
10000064:	2100      	movs	r1, #0
10000066:	6099      	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
10000068:	491b      	ldr	r1, =0x005f0300	@ 2 (QUAD) << SPI_FRF | 31 << DFS_32 | 3 (EEPROM_READ) << TMOD | 0 << SCPH
1000006a:	6019      	str	r1, [r3, #0]	@ XIP_SSI_BASE + CTRLR0
1000006c:	2100      	movs	r1, #0
1000006e:	6059      	str	r1, [r3, #4]	@ XIP_SSI_BASE + CTRLR1
10000070:	491a      	ldr	r1, =0x00002221	@ 4 << WAIT_CYCLES | 2 (8B) << INST_L | 8 (x4) << ADDR_L | 1 (1C2A) << TRANS_TYPE
10000072:	481b      	ldr	r0, =0x180000f4	@ XIP_SSI_BASE + SPI_CTRLR0 
10000074:	6001      	str	r1, [r0, #0]
10000076:	2101      	movs	r1, #1
10000078:	6099      	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
	

Disable SSI for configuration.

Use 32-bit frame size with QSPI, use EEPROM mode (send then read). The slave select pin is controlled by hardware.

CTRLR1 holds the number of data frames to receive, minus one. For now, we don’t care about the data the slave returns, so we write 0 to CTRLR1, which requests a single frame.

Set command format in standard SPI mode, address in FRF (that is QSPI, set at code address 0x10000068). Use 32-bit address length, 8-bit instruction length, wait 4 cycles between control frame and data receiving. This format is required by W25Q80 fast read quad IO operation.

Re-enable SSI.



1000007a:	21eb      	movs	r1, #235
1000007c:	6619      	str	r1, [r3, #96]	@ XIP_SSI_BASE + DR0
1000007e:	21a0      	movs	r1, #160
10000080:	6619      	str	r1, [r3, #96]
10000082:	f000 f812 	bl	xip_wait_send
	

Send 235, 160.

The first word will be 8-bit command 235 (0xEB), as we set the command in standard SPI mode. Sending 235 (0xEB) to W25Q80 flash reads data in fast quad IO mode.

The next word will be 32-bit address 160 (0x000000A0). For the W25Q80 flash, the first 24 bits will be the actual address (A23-0), and the last 8 bits will be the continuation code (M7-M0). If M5-4 is 0b10, we will not need to issue the instruction in the next read, which reduces command overhead. In this instance, we send a dummy address (0x000000) and the continuation code (0b10100000).
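
The layout of this 32-bit word (24-bit address followed by the 8-bit continuation code) can be written down as a small Python sketch. Both function names are my own, made up for this example.

```python
def fast_read_quad_address_word(address, mode):
    """Pack a 24-bit flash address (A23-0) and the mode byte (M7-0) into one word."""
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    if not 0 <= mode <= 0xFF:
        raise ValueError("mode must fit in 8 bits")
    return (address << 8) | mode


def is_continuous_read(mode):
    """True when M5-4 is 0b10, so the next read may skip the instruction."""
    return (mode >> 4) & 0x3 == 0b10


word = fast_read_quad_address_word(0x000000, 0xA0)
print(hex(word), is_continuous_read(0xA0))
```

With a real address, for example 0x000100, the same packing gives 0x000100A0, which is the kind of word the XIP hardware sends for each read once the instruction is no longer needed.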

Only a single data frame is requested, since we set 0 in CTRLR1, and we do not use it.

[Figure: W25Q Flash Fast Read Quad IO]

Different chips require different continuation code, even in the same W25Q family. Check the manual first.



10000086:	2100      	movs	r1, #0
10000088:	6099      	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
1000008a:	4916      	ldr	r1, =0xa0002022	@ 0xA0 << XIP_CMD | 4 << WAIT_CYCLES | 0 (0B) << INST_L | 8 (x4) << ADDR_L | 2 (2C2A) << TRANS_TYPE
1000008c:	4814      	ldr	r0, =0x180000f4	@ XIP_SSI_BASE + SPI_CTRLR0 
1000008e:	6001      	str	r1, [r0, #0]
10000090:	2101      	movs	r1, #1
10000092:	6099      	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
	

Disable SSI for configuration.

Send both the command (no longer used, because the instruction length is now 0) and the address in the format selected by FRF (that is QSPI, set at code address 0x10000068). Use a 32-bit address length (the 24-bit actual address followed by the continuation code 0xA0, taken from the XIP_CMD field) and no instruction. Wait 4 cycles between the control frame and data reception, as required by the W25Q80 flash.

Re-enable SSI.
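Going the other way, this sketch unpacks a SPI_CTRLR0 value into its fields, which is handy when reading a disassembly. The field positions are assumed from the RP2040 datasheet, and the struct and function names are invented for this example.

```c
#include <stdint.h>

/* Hypothetical container for the SPI_CTRLR0 fields discussed here. */
struct spi_ctrlr0_fields {
    uint32_t xip_cmd;      /* bits 31:24 */
    uint32_t wait_cycles;  /* bits 15:11 */
    uint32_t inst_l;       /* bits 9:8   */
    uint32_t addr_l;       /* bits 5:2, in units of 4 address bits */
    uint32_t trans_type;   /* bits 1:0   */
};

static struct spi_ctrlr0_fields spi_ctrlr0_unpack(uint32_t value)
{
    struct spi_ctrlr0_fields f;
    f.trans_type  =  value        & 0x03u;
    f.addr_l      = (value >> 2)  & 0x0Fu;
    f.inst_l      = (value >> 8)  & 0x03u;
    f.wait_cycles = (value >> 11) & 0x1Fu;
    f.xip_cmd     = (value >> 24) & 0xFFu;
    return f;
}
```

Unpacking 0xa0002022 gives XIP_CMD = 0xA0, WAIT_CYCLES = 4, INST_L = 0, ADDR_L = 8 (32 bits) and TRANS_TYPE = 2, matching the comment in the listing.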



10000094:	bc01      	pop	{r0}
10000096:	2800      	cmp	r0, #0
10000098:	d000      	beq.n	boot_launch
1000009a:	4700      	bx	r0
boot_launch:
1000009c:	4812      	ldr	r0, =0x10000100
1000009e:	4913      	ldr	r1, =0xe000ed08	@ PPB_BASE + VTOR
100000a0:	6008      	str	r0, [r1, #0]
100000a2:	c803      	ldmia	r0, {r0, r1}	@ r0 <= [r0, #0], r1 <= [r0, #4]
100000a4:	f380 8808 	msr	MSP, r0
100000a8:	4708      	bx	r1
	

Check the return address saved at entry (push {lr}). If it is non-zero, boot2 was called as a subroutine, so branch and exchange back to that address; otherwise, we entered from the on-chip bootloader and continue to launch the main program.

Set 0x10000100 as the address of the vector table; note that the address must be aligned to 256 bytes. In this program, we assume the vector table is placed immediately after the 2nd stage bootloader.

Load the first vector (vector table address + 0), which is the initial stack address, into MSP (main stack pointer). In this program, we use 0x20042000, which is the top of SRAM bank 5.

Branch and exchange to the second vector (vector table address + 4), which is the reset vector. In this program, it is 0x100001f7, that is, address 0x100001f6 with bit 0 set to select Thumb state. It points to the reset handler of the main program.
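As a host-side illustration of what boot_launch reads, the sketch below decodes the first two little-endian words of a vector table and splits the reset vector into its code address and Thumb bit. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical view of the two vectors used at launch. */
struct boot_entry {
    uint32_t initial_msp;     /* vector 0: initial main stack pointer */
    uint32_t reset_vector;    /* vector 1: as stored, Thumb bit included */
    uint32_t reset_code_addr; /* vector 1 with bit 0 cleared */
    int      thumb;           /* bit 0 of vector 1 */
};

static struct boot_entry decode_vector_table(const uint8_t *table)
{
    struct boot_entry e;
    e.initial_msp     = read_le32(table);
    e.reset_vector    = read_le32(table + 4);
    e.thumb           = (int)(e.reset_vector & 1u);
    e.reset_code_addr = e.reset_vector & ~(uint32_t)1u;
    return e;
}

/* VTOR on the RP2040 needs a table address aligned to 256 bytes. */
static int vtor_is_aligned(uint32_t table_addr)
{
    return (table_addr & 0xFFu) == 0u;
}
```

Feeding it the bytes 00 20 04 20 f7 01 00 10 yields an initial MSP of 0x20042000 and a reset handler at 0x100001f6 in Thumb state.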

Subroutine


xip_wait_send:
100000aa:	b503      	push	{r0, r1, lr}
xip_wait_send_loop:
100000ac:	6a99      	ldr	r1, [r3, #40]	@ XIP_SSI_BASE + SR
100000ae:	2004      	movs	r0, #4		@ 1 << TFE
100000b0:	4201      	tst	r1, r0
100000b2:	d0fb      	beq.n	xip_wait_send_loop
100000b4:	2001      	movs	r0, #1		@ 1 << BUSY
100000b6:	4201      	tst	r1, r0
100000b8:	d1f8      	bne.n	xip_wait_send_loop
100000ba:	bd03      	pop	{r0, r1, pc}
	

Wait until the Tx FIFO is empty (all frames drained) and the SSI is no longer busy (the last frame has left the output hardware).
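The same polling logic can be written in C. This is a minimal sketch, not the SDK's code: the status register is passed in as a pointer so the function can be tried on a host, whereas on the Pico it would point at XIP_SSI_BASE + 0x28. The names are my own.

```c
#include <stdint.h>

#define SSI_SR_BUSY (1u << 0)  /* transfer in progress */
#define SSI_SR_TFE  (1u << 2)  /* transmit FIFO empty */

/* Done when the Tx FIFO is empty and the SSI is idle. */
static int ssi_transfer_done(uint32_t sr)
{
    return (sr & SSI_SR_TFE) != 0u && (sr & SSI_SR_BUSY) == 0u;
}

/* Busy-wait on the status register; one read per loop, as in the assembly. */
static void ssi_wait_send(const volatile uint32_t *sr)
{
    while (!ssi_transfer_done(*sr)) {
        /* keep polling */
    }
}
```

Both conditions are checked on the same read of SR, which mirrors the two tst instructions on r1.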



xip_write2read2:
100000bc:	b502      	push	{r1, lr}
100000be:	6618      	str	r0, [r3, #96]	@ XIP_SSI_BASE + DR0
100000c0:	6618      	str	r0, [r3, #96]
100000c2:	f7ff fff2 	bl	xip_wait_send
100000c6:	6e18      	ldr	r0, [r3, #96]
100000c8:	6e18      	ldr	r0, [r3, #96]
100000ca:	bd02      	pop	{r1, pc}
	

Write a word (data from r0) to the Tx FIFO twice, wait until the data is fully sent, then read a word from the Rx FIFO twice (data to r0). The first word read is overwritten by the second.

This is useful when we need to send 1 byte (in most cases, the opcode) and receive 1 byte (the result).

SPI is full duplex: when we send the first byte (the opcode) to the flash, the flash returns a dummy byte to us at the same time.

In the next frame, we send a dummy byte to the flash (here, simply the same value again), and the flash returns the data to us at the same time.

So, in effect, 1 byte is sent and 1 byte is received.
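To make the two-frame exchange concrete, here is a toy model in C. The mock flash is entirely hypothetical: it only answers opcode 0x35 (53, the value boot2 sends to check QSPI mode) and returns 0xFF whenever it has nothing to say. It is meant to show the timing of the frames, not real flash behaviour.

```c
#include <stdint.h>

#define CMD_READ_STATUS2 0x35u  /* opcode 53, as sent by boot2 */

/* Hypothetical mock of a flash chip on a full-duplex SPI bus. */
struct mock_flash {
    uint8_t status2;     /* value the mock reports for opcode 0x35 */
    uint8_t opcode;      /* opcode latched in the first frame */
    int     have_opcode; /* non-zero once the first frame has been seen */
};

/* One SPI frame: one byte goes out, one byte comes back at the same time. */
static uint8_t mock_flash_exchange(struct mock_flash *f, uint8_t tx)
{
    if (!f->have_opcode) {
        f->opcode = tx;
        f->have_opcode = 1;
        return 0xFFu;  /* the flash has no data yet: dummy byte */
    }
    /* Later frames: tx is ignored, the flash shifts out its answer. */
    return (f->opcode == CMD_READ_STATUS2) ? f->status2 : 0xFFu;
}

/* Same shape as xip_write2read2: two frames out, keep the second byte in. */
static uint8_t write2read2(struct mock_flash *f, uint8_t cmd)
{
    f->have_opcode = 0;                  /* chip select asserted: new transaction */
    (void)mock_flash_exchange(f, cmd);   /* frame 1: opcode out, dummy in */
    return mock_flash_exchange(f, cmd);  /* frame 2: dummy out, result in */
}
```

With status2 set to 0x02, write2read2(&flash, 0x35) returns 0x02, which is the value boot2 compares against before deciding whether to jump to set_qspi.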

Misc

Program addresses 0x100000cc to 0x100000ff contain the data table, which includes:

  1. Literal pool: constants loaded by ldr rd, [pc, offset].
  2. Zero-paddings.
  3. Checksum.
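For the padding and checksum, here is a sketch of what I understand the boot ROM expects: the first 252 bytes are covered by a CRC-32 (polynomial 0x04C11DB7, seed 0xFFFFFFFF, no bit reflection, no final XOR), stored little-endian in the last 4 bytes. Treat these parameters as my reading of the RP2040 datasheet; the function names are made up, and this is not the source of the pad_checksum tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT2_SIZE     256u
#define BOOT2_CODE_MAX 252u  /* 256 bytes minus the 4-byte checksum */

/* Bitwise CRC-32, MSB first, polynomial 0x04C11DB7, no final XOR. */
static uint32_t boot2_crc32(const uint8_t *data, size_t len, uint32_t seed)
{
    uint32_t crc = seed;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return crc;
}

/* Zero-pad the code to 252 bytes and append the checksum.
 * Returns 0 on success, -1 if the code does not fit. */
static int boot2_pad_checksum(const uint8_t *code, size_t code_len,
                              uint8_t out[BOOT2_SIZE])
{
    if (code_len > BOOT2_CODE_MAX)
        return -1;
    memset(out, 0, BOOT2_SIZE);
    memcpy(out, code, code_len);
    uint32_t crc = boot2_crc32(out, BOOT2_CODE_MAX, 0xFFFFFFFFu);
    out[252] = (uint8_t)(crc);
    out[253] = (uint8_t)(crc >> 8);
    out[254] = (uint8_t)(crc >> 16);
    out[255] = (uint8_t)(crc >> 24);
    return 0;
}
```

The seed argument corresponds to the -s 0xFFFFFFFF option passed on the command line when the checksum is added.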

The main program starts at program address 0x10000100:


10000100 <__VECTOR_TABLE>:
10000100:	2000
10000102:	2004
10000104:	01f7
10000106:	1000
...
100001f6 <_reset_handler>:
100001f6:	481d      	ldr	r0, [pc, #116]
100001f8:	6800      	ldr	r0, [r0, #0]
100001fa:	2800      	cmp	r0, #0
	

Note the vector and the address of the reset handler.

Execute the Main Program in Place

Now, the SSI is ready for XIP. All we need to do is place our main program in the flash memory immediately after the 2nd stage bootloader.

Here is a link to my files: c-blink-xip.zip.

Use the same boot2 code as the SDK, in file boot2_src.s:


.cpu cortex-m0plus
.thumb
.align 2
.section .boot2, "ax"

boot2:
	push	{lr}

@ Pad setup
	ldr	r3, =0x40020000	@ PADS_QSPI_BASE
	movs	r0, #0x21	@ 2 (8mA) << DRIVE | 1 << SLEWFAST
	str	r0, [r3, #4]	@ PADS_QSPI_BASE + GPIO_QSPI_SCLK
	ldr	r0, [r3, #8]	@ PADS_QSPI_BASE + GPIO_QSPI_SD0
	movs	r1, #2		@ 1 << Schmitt
	bic	r0, r1
	str	r0, [r3, #8]	@ PADS_QSPI_BASE + GPIO_QSPI_SD0
	str	r0, [r3, #12]	@ PADS_QSPI_BASE + GPIO_QSPI_SD1
	str	r0, [r3, #16]	@ PADS_QSPI_BASE + GPIO_QSPI_SD2
	str	r0, [r3, #20]	@ PADS_QSPI_BASE + GPIO_QSPI_SD3

@ Use standard SPI for flash configuration
	ldr	r3, =0x18000000	@ XIP_SSI_BASE
	movs	r1, #0
	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
	movs	r1, #2
	str	r1, [r3, #20]	@ XIP_SSI_BASE + BAUD
	movs	r1, #1
	movs	r2, #0xf0
	str	r1, [r3, r2]	@ XIP_SSI_BASE + RX_SAMPLE_DLY
	ldr	r1, =0x00070000	@ 0 (STD) << SPI_FRF | 7 << DFS_32 | 0 (TX_AND_RX) << TMOD | 0 << SCPH
	str	r1, [r3, #0]	@ XIP_SSI_BASE + CTRLR0
	movs	r1, #1
	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR

@ Check flash in QSPI mode
	movs	r0, #53
	bl	xip_write2read2
	movs	r2, #2
	cmp	r0, r2
	beq	set_qspi

@ Enable flash write
	movs	r1, #6
	str	r1, [r3, #96]	@ XIP_SSI_BASE + DR0
	bl	xip_wait_send
	ldr	r1, [r3, #96]

@ Setup flash QSPI mode
	movs	r1, #1
	str	r1, [r3, #96]	@ XIP_SSI_BASE + DR0
	movs	r0, #0
	str	r0, [r3, #96]
	str	r2, [r3, #96]
	bl	xip_wait_send
	ldr	r1, [r3, #96]
	ldr	r1, [r3, #96]
	ldr	r1, [r3, #96]

@ Wait flash ready
check_flash_busy:
	movs	r0, #5
	bl	xip_write2read2
	movs	r1, #1
	tst	r0, r1
	bne	check_flash_busy

@ Use QSPI and send fast read command to flash
set_qspi:
	movs	r1, #0
	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
	ldr	r1, =0x005f0300	@ 2 (QUAD) << SPI_FRF | 31 << DFS_32 | 3 (EEPROM_READ) << TMOD | 0 << SCPH
	str	r1, [r3, #0]	@ XIP_SSI_BASE + CTRLR0
	movs	r1, #0
	str	r1, [r3, #4]	@ XIP_SSI_BASE + CTRLR1
	ldr	r1, =0x00002221	@ 4 << WAIT_CYCLES | 2 (8B) << INST_L | 8 (x4) << ADDR_L | 1 (1C2A) << TRANS_TYPE
	ldr	r0, =0x180000f4	@ XIP_SSI_BASE + SPI_CTRLR0 
	str	r1, [r0, #0]
	movs	r1, #1
	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR

@ Setup instruction bypass
	movs	r1, #235
	str	r1, [r3, #96]	@ XIP_SSI_BASE + DR0
	movs	r1, #160
	str	r1, [r3, #96]
	bl	xip_wait_send

@ Setup QSPI with instruction bypass
	movs	r1, #0
	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR
	ldr	r1, =0xa0002022	@ 0xA0 << XIP_CMD | 4 << WAIT_CYCLES | 0 (0B) << INST_L | 8 (x4) << ADDR_L | 2 (2C2A) << TRANS_TYPE
	ldr	r0, =0x180000f4	@ XIP_SSI_BASE + SPI_CTRLR0 
	str	r1, [r0, #0]
	movs	r1, #1
	str	r1, [r3, #8]	@ XIP_SSI_BASE + SSIENR

@ Exit 2nd stage bootloader
	pop	{r0}
	cmp	r0, #0
	beq	boot_launch
	bx	r0
boot_launch:
	ldr	r0, =0x10000100
	ldr	r1, =0xe000ed08	@ PPB_BASE + VTOR
	str	r0, [r1, #0]
	ldmia	r0, {r0, r1}	@ r0 <= [r0, #0], r1 <= [r0, #4]
	msr	MSP, r0
	bx	r1

@ Wait SPI sent
xip_wait_send:
	push	{r0, r1, lr}
xip_wait_send_loop:
	ldr	r1, [r3, #40]	@ XIP_SSI_BASE + SR
	movs	r0, #4		@ 1 << TFE
	tst	r1, r0
	beq	xip_wait_send_loop
	movs	r0, #1		@ 1 << BUSY
	tst	r1, r0
	bne	xip_wait_send_loop
	pop	{r0, r1, pc}

@ SPI send 1 command and receive 1 data
xip_write2read2:
	push	{r1, lr}
	str	r0, [r3, #96]	@ XIP_SSI_BASE + DR0
	str	r0, [r3, #96]
	bl	xip_wait_send
	ldr	r0, [r3, #96]
	ldr	r0, [r3, #96]
	pop	{r1, pc}
	

Use the same linker script as earlier, in file boot2_src.ld:


MEMORY {
	BOOT2(rx) : ORIGIN = 0x10000000, LENGTH = 256
}

SECTIONS {
	. = ORIGIN(BOOT2);
	.text : {
		KEEP(*(.boot2))
	} >BOOT2
}
	

Assemble, link, and add the checksum; the result is file boot2.o:


arm-none-eabi-as --warn --fatal-warnings -g boot2_src.s -o boot2_src.o
arm-none-eabi-ld -nostdlib -nostartfiles -T boot2_src.ld boot2_src.o -o boot2_src.elf

arm-none-eabi-objcopy -O binary boot2_src.elf boot2_src.bin
pico-pad_checksum -s 0xFFFFFFFF boot2_src.bin boot2.s
arm-none-eabi-as --warn --fatal-warnings -g boot2.s -o boot2.o
	

Let’s create the same blink program; however, we will write it in C this time, in file main.c:


#include <stdint.h>

void reset();

uint32_t vector[48] __attribute__ ((section (".vector"))) = {
	0x20042000,
	(uint32_t)reset
};

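void reset() {
	*(uint32_t volatile * const)(0x4000f000) = (1<<5);	// RESETS RESET (atomic clear alias): take IO_BANK0 out of reset
	*(uint32_t volatile * const)(0x400140cc) = 5;		// IO_BANK0 GPIO25_CTRL: select function 5 (SIO)
	*(uint32_t volatile * const)(0xd0000020) = (1<<25);	// SIO GPIO_OE: enable output on GPIO25

	for (;;) {
		*(uint32_t volatile * const)(0xd000001c) = (1<<25);	// SIO GPIO_OUT_XOR: toggle GPIO25
		for (uint32_t i = 30000; i; i--) { __asm("nop\n\t"); }	// Crude delay loop
	}
}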
	

At the beginning, we need to define the vector table (a set of addresses). The first vector is the initial stack pointer; the second vector is the reset handler (our main program). The ARM documentation specifies 16 + n vectors, where n is the number of IRQs; the RP2040 documentation lists 32 IRQs. Therefore, the vector table contains 48 32-bit addresses.

Place this table in a section named .vector, so we can position it during linking.

We discussed the register addresses in previous sections. The *(uint32_t volatile * const)(0x4000f000) = data syntax writes data directly to that address.
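For readers who prefer names over raw numbers, the addresses in main.c can be rebuilt from base addresses and offsets. The sketch below is one possible way to do it; the base addresses and offsets are my reading of the RP2040 datasheet, and all macro names (including REG32) are my own and do not come from the SDK headers.

```c
#include <stdint.h>

/* Peripheral base addresses (assumed from the RP2040 datasheet). */
#define RESETS_BASE       0x4000c000u
#define IO_BANK0_BASE     0x40014000u
#define SIO_BASE          0xd0000000u

/* Writing through this alias atomically clears the bits that are set. */
#define REG_ALIAS_CLR     0x3000u

#define RESETS_RESET      (RESETS_BASE + 0x00u)
#define RESETS_IO_BANK0   (1u << 5)

/* Each GPIO has a STATUS word followed by a CTRL word (8 bytes per pin). */
#define GPIO_CTRL(n)      (IO_BANK0_BASE + 8u * (n) + 4u)
#define GPIO_FUNC_SIO     5u

#define SIO_GPIO_OUT_XOR  (SIO_BASE + 0x1cu)
#define SIO_GPIO_OE       (SIO_BASE + 0x20u)

/* Hypothetical accessor: treat an address as a volatile 32-bit register. */
#define REG32(addr)       (*(volatile uint32_t *)(uintptr_t)(addr))
```

With these, the first line of reset() could be written as REG32(RESETS_RESET + REG_ALIAS_CLR) = RESETS_IO_BANK0; and the LED toggle as REG32(SIO_GPIO_OUT_XOR) = 1u << 25;.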

Compile this program for the Cortex-M0+ CPU with arm-none-eabi-gcc -mcpu=cortex-m0plus -c -O3 main.c -o main.o. Note the -c flag: it tells GCC to compile the code only, without linking it. The result is file main.o.

Create linker script in file flash.ld:


MEMORY {
	FLASH(rx) : ORIGIN = 0x10000000, LENGTH = 2048k
	SRAM(rwx) : ORIGIN = 0x20000000, LENGTH = 264k
}

SECTIONS {
	.text : {
		. = ORIGIN(FLASH);
		KEEP(*(.boot2))
		KEEP(*(.vector))
		KEEP(*(.text))
	} >FLASH
}
	

In this linker script, we place the boot2 code (2nd stage bootloader with checksum) at the beginning of the flash region, as required by the on-chip bootloader.

Immediately after the boot2 is the vector table. In other words, the vector table will be placed at address 0x10000100.

After the vector table is the text, which contains all other program code.
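The resulting layout can be checked with a little arithmetic. The sketch below assumes boot2 fills exactly 256 bytes and that the linker inserts no padding between the three input sections; the macro names are only for illustration.

```c
#include <stdint.h>

#define FLASH_ORIGIN      0x10000000u
#define BOOT2_SIZE        256u   /* 252 bytes of code and padding + 4-byte checksum */
#define CORE_EXCEPTIONS   16u    /* initial SP, reset, NMI, HardFault, ... */
#define RP2040_IRQS       32u    /* n in "16 + n" */

#define VECTOR_COUNT      (CORE_EXCEPTIONS + RP2040_IRQS)
#define VECTOR_TABLE_ADDR (FLASH_ORIGIN + BOOT2_SIZE)
#define VECTOR_TABLE_SIZE (VECTOR_COUNT * 4u)  /* 4 bytes per vector */
#define TEXT_START        (VECTOR_TABLE_ADDR + VECTOR_TABLE_SIZE)

/* Compile-time checks of the claims made in the text. */
_Static_assert(VECTOR_COUNT == 48u, "vector[] in main.c has 48 entries");
_Static_assert((VECTOR_TABLE_ADDR & 0xFFu) == 0u,
               "the vector table must be aligned to 256 bytes");
```

Under these assumptions, the vector table occupies 0x10000100 to 0x100001bf (192 bytes), and the code from .text starts at 0x100001c0.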

Link the program and create the uf2 file:


arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld boot2.o main.o -o flash.elf
pico-elf2uf2 flash.elf flash.uf2
	

The generated image file flash.uf2 can now be downloaded to the Pico.