ATtiny based WS2812 Light Bar Controller

This article shows how to use an AVR MCU (ATtiny) to drive WS2812 LEDs efficiently, using parallel signals to drive a segmented LED strip.

--by Captdam @ Aug 1, 2025 Jul 9, 2025

Frame Format

LED Bit format

WS2812 uses PWM signals to send 0 and 1 bits. Each bit starts with a high level for a specific period, followed by a low level. The length of the high portion determines the value: a short high level means a 0-bit, and a long high level means a 1-bit.

Raw Timing

Different manufacturers and models use different timing specifications. In most cases, the high level lasts less than 0.5 microseconds for a 0-bit and more than 0.5 microseconds for a 1-bit. The low level period varies.

The following table shows the timing specifications of some WS2812 variants:

Bit Format: Min-Typical-Max (nanoseconds)
T0h/T0l: 0-bit high/low time; T1h/T1l: 1-bit high/low time; Bit period = Tnh + Tnl; X: not specified
Model                        T0h           T0l           T1h           T1l          Bit period     Reset
XL-6028RGBW-WS2812B          X-250-470     X-1000-X      580-850-1000  X-400-X      1200-1250-X    80,000+
XL-5050RGBW-WS2812B-HM       200-295-350   550-595-1200  550-595-1200  200-295-350  900-X-X        80,000+
XL-3210RGBC-WS2812B          300-X-X       900-X-X       900-X-X       300-X-X      1200-X-X       200,000+
Worldsemi WS2812             220-X-380     580-X-1000    580-X-1000    580-X-1000   N/A            280,000+
Worldsemi WS2812C-2020-V1/W  220-X-380     580-X-1000    580-X-1000    580-X-1000   N/A            280,000+
Worldsemi WS2812B            250-400-550   700-850-1000  650-800-950   300-450-600  650-1250-1850  50,000+

As we can see, it is chaotic. Different manufacturers and models use completely different timing, so it is hard to find a universal timing that fits all. Furthermore, when you buy WS2812 LEDs on eBay or Amazon, the vendor usually only states "WS2812", and there is no way to tell the manufacturer or the specific model. Unless you buy a whole reel (so you can see the label), the best solution is to test the LEDs to find the actual timing. And of course, do not mix them unless you have tested their timing compatibility.

Timing for CPU clocked @ 8MHz

Although the AVR can be clocked at any frequency between 0 and 20MHz, a low-cost system will use the internal 8MHz RC oscillator. This saves a crystal and two capacitors, and frees two IO pins. At 8MHz, every CPU cycle is 125 nanoseconds. Therefore, in terms of CPU cycles, sending a 1 or a 0 takes:

The RC clock frequency depends on voltage and temperature! You must calibrate the RC clock to ensure correct timing. Note that a change in ambient temperature changes the clock frequency, which makes the output signal longer or shorter than required. Leave some safety margin. (Example: 3 cycles is 375 nanoseconds, which is too close to the 380-nanosecond maximum of the Worldsemi WS2812.)

The RC clock can also be deliberately calibrated slightly away from the nominal 8MHz to satisfy the LED timing requirement.

CPU cycles @ 8MHz (Min rounded up - Typical - Max rounded down)
Model                        T0h       T0l       T1h       T1l       Bit period  Reset
XL-6028RGBW-WS2812B          X-2-3     X-8-X     5-7-8     X-3-X     10-10-X     640+
XL-5050RGBW-WS2812B-HM       2-2-2     5-5-9     5-5-9     2-2-2     8-X-X       640+
XL-3210RGBC-WS2812B          3-X-X     8-X-X     8-X-X     3-X-X     10-X-X      1600+
Worldsemi WS2812             2-X-3     5-X-8     5-X-8     5-X-8     N/A         2240+
Worldsemi WS2812C-2020-V1/W  2-X-3     5-X-8     5-X-8     5-X-8     N/A         2240+
Worldsemi WS2812B            2-3-4     6-7-8     6-6-7     3-3-4     6-10-14     400+
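The cycle counts in these tables follow one rule: round the minimum up and the maximum down, so the chosen cycle count never falls outside the allowed window. The sketch below reproduces that arithmetic on a PC. `cycles_min` and `cycles_max` are hypothetical helper names, not part of any library, and the "typical" columns are not covered.

```c
#include <stdint.h>

/* Hypothetical helper: convert a minimum time in nanoseconds to CPU cycles
 * at f_cpu Hz, rounding UP so the signal is never shorter than allowed. */
uint32_t cycles_min(uint32_t ns, uint32_t f_cpu) {
    uint64_t num = (uint64_t)ns * f_cpu;                  /* ns * Hz */
    return (uint32_t)((num + 999999999ULL) / 1000000000ULL); /* ceiling */
}

/* Hypothetical helper: convert a maximum time in nanoseconds to CPU cycles,
 * rounding DOWN so the signal is never longer than allowed. */
uint32_t cycles_max(uint32_t ns, uint32_t f_cpu) {
    return (uint32_t)(((uint64_t)ns * f_cpu) / 1000000000ULL); /* floor */
}
```

For example, `cycles_min(300, 8000000)` gives 3 and `cycles_max(600, 8000000)` gives 4, which is the Worldsemi WS2812B 1-bit low window at 8MHz. Changing `f_cpu` to 16000000 or 20000000 reproduces the min and max columns of the next two tables.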

Timing for CPU clocked @ 16MHz

If you want to clock the CPU at 16MHz (like an Arduino) using an external crystal:

Some AVRs come with an internal PLL that provides a higher clock frequency without an external crystal or clock input pins.

CPU cycles @ 16MHz (Min-Typical-Max)
Model                        T0h       T0l       T1h       T1l       Bit period  Reset
XL-6028RGBW-WS2812B          X-4-7     X-16-X    10-14-16  X-6-X     20-20-X     1280+
XL-5050RGBW-WS2812B-HM       4-5-5     9-10-19   9-10-19   4-5-5     15-X-X      1280+
XL-3210RGBC-WS2812B          5-X-X     15-X-X    15-X-X    5-X-X     20-X-X      3200+
Worldsemi WS2812             4-X-6     10-X-16   10-X-16   10-X-16   N/A         4480+
Worldsemi WS2812C-2020-V1/W  4-X-6     10-X-16   10-X-16   10-X-16   N/A         4480+
Worldsemi WS2812B            4-6-8     12-14-16  11-13-15  5-7-9     11-20-29    800+

Timing for CPU clocked @ 20MHz

If you want to clock the CPU at its maximum frequency (20MHz) using an external crystal:

CPU cycles @ 20MHz (Min-Typical-Max)
Model                        T0h       T0l       T1h       T1l       Bit period  Reset
XL-6028RGBW-WS2812B          X-5-9     X-20-X    12-17-20  X-8-X     24-25-X     1600+
XL-5050RGBW-WS2812B-HM       4-6-7     11-12-24  11-12-24  4-6-7     18-X-X      1600+
XL-3210RGBC-WS2812B          6-X-X     18-X-X    18-X-X    6-X-X     24-X-X      4000+
Worldsemi WS2812             5-X-7     12-X-20   12-X-20   12-X-20   N/A         5600+
Worldsemi WS2812C-2020-V1/W  5-X-7     12-X-20   12-X-20   12-X-20   N/A         5600+
Worldsemi WS2812B            5-8-11    14-17-20  13-16-19  6-9-12    13-25-37    1000+

Example: XL-5050RGBW-WS2812B-HM & Worldsemi WS2812B with CPUs at Different Clocks

Let's use XL-5050RGBW-WS2812B-HM and Worldsemi WS2812B, with the CPU at different clocks, as our next example:

XL-5050RGBW-WS2812B-HM & Worldsemi WS2812B with CPUs at Different Clocks (bit period = Tnh + Tnl)
                      T0h                   T0l                   T1h                   T1l                   Bit period
XL-5050RGBW-WS2812B-HM
  Raw timing (ns)     200-295-350           550-595-1200          550-595-1200          200-295-350           900-X-X
  CPU cycles @ 8MHz   2 cycles / 250 ns     6 cycles / 750 ns     6 cycles / 750 ns     2 cycles / 250 ns     8 cycles / 1000 ns
  CPU cycles @ 16MHz  5 cycles / 312.5 ns   10 cycles / 625 ns    10 cycles / 625 ns    5 cycles / 312.5 ns   15 cycles / 937.5 ns
Worldsemi WS2812B
  Raw timing (ns)     250-400-550           700-850-1000          650-800-950           300-450-600           650-1250-1850
  CPU cycles @ 8MHz   3 cycles / 375 ns     7 cycles / 875 ns     6 cycles / 750 ns     4 cycles / 500 ns     10 cycles / 1250 ns
  CPU cycles @ 16MHz  6 cycles / 375 ns     14 cycles / 875 ns    13 cycles / 812.5 ns  7 cycles / 437.5 ns   20 cycles / 1250 ns

Although the table lists a 2-cycle high time, 8MHz is too slow for XL-5050RGBW-WS2812B-HM in practice. The reason is explained in the Bit Driver section below.

Furthermore, both the 0-bit and the 1-bit start with a high level. At a specific cycle (the 2nd cycle for the XL-5050RGBW-WS2812B-HM at 8MHz), we output the data: high for a 1-bit, low for a 0-bit. At another specific cycle (the 6th cycle in this example), we output low.
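The following host-side model shows the line level during each cycle of one bit slot, which makes the two output points concrete. It assumes the schedule described above: high at cycle 0, data at `t_data`, low at `t_low`, and the next bit at `period`. `bit_level` and `high_cycles` are hypothetical names used only for illustration.

```c
/* Hypothetical model of one WS2812 bit slot: returns the line level
 * (1 = high, 0 = low) during `cycle`, for a loop that sets the line high
 * at cycle 0, outputs the data bit at `t_data`, forces it low at `t_low`,
 * and starts the next bit at `period`. */
int bit_level(int bit, int cycle, int t_data, int t_low, int period) {
    int c = cycle % period;          /* position inside the current bit slot */
    if (c < t_data) return 1;        /* high part shared by 0-bits and 1-bits */
    if (c < t_low) return bit & 1;   /* stays high only for a 1-bit */
    return 0;                        /* low tail shared by both */
}

/* Hypothetical helper: number of cycles the line is high for one bit. */
int high_cycles(int bit, int t_data, int t_low, int period) {
    int n = 0;
    for (int c = 0; c < period; c++)
        n += bit_level(bit, c, t_data, t_low, period);
    return n;
}
```

With the XL-5050RGBW-WS2812B-HM numbers at 8MHz (`t_data = 2`, `t_low = 6`, `period = 8`), a 0-bit is high for 2 cycles (250 ns) and a 1-bit for 6 cycles (750 ns), which matches the table.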

Clearly, the timing requirement is tight for our application when the CPU runs at 8MHz. There are only a few cycles between output changes, which is not enough to fetch or compute color data on the fly. We will have to buffer the data in SRAM first, and then burst it out to satisfy the timing requirement.

Reset

To reset (start a new stream), we need to pull the signal low for a long period. The examples above require between 50 and 280 microseconds, and other models may vary.

This time period allows us to load the prepared data in SRAM for burst output.

Furthermore, we can use this time period to send data streams to other LEDs. For example, we can spend 100 microseconds sending data to LED series A, then the next 100 microseconds sending data to LED series B. Once B is finished, we can go back to A, which has now been held low long enough and is ready for the next data stream.
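How many series we need depends on how long each burst takes compared with the reset time. The sketch below does that arithmetic. It assumes every series takes the same `burst_us` and that bursts are sent back to back. `series_needed` is a hypothetical name.

```c
/* Hypothetical helper: minimum number of LED series to rotate through so
 * that each series stays low for at least reset_us while the others are
 * being sent. Assumes equal burst lengths, sent back to back (burst_us > 0). */
unsigned series_needed(unsigned burst_us, unsigned reset_us) {
    /* A series idles while the other (n - 1) series are sent,
     * so we need (n - 1) * burst_us >= reset_us. */
    unsigned others = (reset_us + burst_us - 1) / burst_us;  /* ceiling division */
    return others + 1;
}
```

With 100-microsecond bursts and an 80-microsecond reset, 2 series (A and B) are enough, as in the example above. A 280-microsecond reset with the same bursts would need 4.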

LED Word format

Here, a word means a data unit. Its width depends on the application; it does NOT always mean 16 or 32 bits.

LED0 G7:G0 | LED0 R7:R0 | LED0 B7:B0 | LED1 G7:G0 | LED1 R7:R0 | LED1 B7:B0

WS2812 supports 24-bit RGB color. That is, it takes 24 bits of data to program one LED: green first, then red, then blue, each sent MSB first.

Multiple LEDs are connected in series. We send a data stream to program them. The first LED in the series will consume the first 24 bits of data (bit 0 to bit 23); the second LED will consume the second 24 bits of data (bit 24 to bit 47), and so on.
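The word format and the series ordering can be sketched on a PC as follows. `grb_word` packs one LED in green-red-blue order. `stream_bit` returns the n-th bit of the serial stream (MSB of each word first), so LED 0 consumes bits 0 to 23 and LED 1 consumes bits 24 to 47. Both names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: pack one LED into its 24-bit word, green first. */
uint32_t grb_word(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)g << 16) | ((uint32_t)r << 8) | b;
}

/* Hypothetical helper: bit `n` of the serial stream for `count` LEDs.
 * Bits 0..23 belong to LED 0, bits 24..47 to LED 1, and so on;
 * each 24-bit word is sent MSB first. */
int stream_bit(const uint32_t *words, size_t count, size_t n) {
    size_t led = n / 24;
    if (led >= count) return 0;              /* past the last LED: no data */
    unsigned pos = 23 - (unsigned)(n % 24);  /* MSB first */
    return (int)((words[led] >> pos) & 1u);
}
```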

8-Lane Parallel Bus

Segment 0 LED0 G7:G0 | Segment 0 LED0 R7:R0 | Segment 0 LED0 B7:B0 | Segment 0 LED1 G7:G0 | Segment 0 LED1 R7:R0 | Segment 0 LED1 B7:B0 | Segment 1 LED0 G7:G0 | Segment 1 LED0 R7:R0 | Segment 1 LED0 B7:B0 | Segment 1 LED1 G7:G0 | Segment 1 LED1 R7:R0 | Segment 1 LED1 B7:B0 | Segment 2 LED0 G7:G0 | Segment 2 LED0 R7:R0 | Segment 2 LED0 B7:B0 | Segment 2 LED1 G7:G0 | Segment 2 LED1 R7:R0 | Segment 2 LED1 B7:B0

The more LEDs there are, the longer it takes to program the entire light bar, because every LED adds 24 bit periods to a single serial stream.
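To put a number on it, the sketch below assumes a 1250 ns bit period (the WS2812B typical value). One 144-LED strip then takes 144 × 24 × 1.25 µs = 4.32 ms per frame. Split into 8 parallel lanes, as described next, it takes 540 µs. `frame_ns` is a hypothetical name, and the reset gap is not included.

```c
/* Hypothetical helper: time in nanoseconds to send one frame (reset gap
 * not included). Each LED needs 24 bits of bit_ns each; with `lanes`
 * segments driven in parallel, each lane carries ceil(leds / lanes) LEDs. */
unsigned long frame_ns(unsigned long leds, unsigned long lanes, unsigned long bit_ns) {
    unsigned long per_lane = (leds + lanes - 1) / lanes;  /* ceiling division */
    return per_lane * 24UL * bit_ns;
}
```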

Furthermore, a single stream uses only 1 output bit. AVR (and other 8-bit MCUs) group 8 bits into a byte-wide IO port, so outputting one bit at a time wastes the other 7 pins of the port.

To speed up the transmission, we can break the light bar into 8 segments. We can then use an 8-bit IO port to drive the segmented light bar, one bit for each segment.

As the above figure shows, the first byte output by the MCU will contain the first bit of all 8 segments, the second byte output by the MCU will contain the second bit of all 8 segments, and so on...

Example: 24 LEDs in 8 Segments

The following example shows 24 LEDs divided into 8 segments, that is, 3 LEDs per segment.

Without the segmented design, the controller would have to output the following color bytes one after another, segment by segment:


		LED 0 G		LED 0 R		LED 0 B		LED 1 G		LED 1 R		LED 1 B		LED 2 G		LED 2 R		LED 2 B
Segment 0	0000 0000	0000 0001	0000 0010	0000 0100	0000 0101	0000 0110	0000 1000	0000 1001	0000 1010
Segment 1	0001 0000	0001 0001	0001 0010	0001 0100	0001 0101	0001 0110	0001 1000	0001 1001	0001 1010
Segment 2	0010 0000	0010 0001	0010 0010	0010 0100	0010 0101	0010 0110	0010 1000	0010 1001	0010 1010
Segment 3	0011 0000	0011 0001	0011 0010	0011 0100	0011 0101	0011 0110	0011 1000	0011 1001	0011 1010
Segment 4	0100 0000	0100 0001	0100 0010	0100 0100	0100 0101	0100 0110	0100 1000	0100 1001	0100 1010
Segment 5	0101 0000	0101 0001	0101 0010	0101 0100	0101 0101	0101 0110	0101 1000	0101 1001	0101 1010
Segment 6	0110 0000	0110 0001	0110 0010	0110 0100	0110 0101	0110 0110	0110 1000	0110 1001	0110 1010
Segment 7	0111 0000	0111 0001	0111 0010	0111 0100	0111 0101	0111 0110	0111 1000	0111 1001	0111 1010
	

With the segmented design, the controller should output:


00000000 00001111 00110011 01010101 00000000 00000000 00000000 00000000
00000000 00001111 00110011 01010101 00000000 00000000 00000000 11111111
00000000 00001111 00110011 01010101 00000000 00000000 11111111 00000000
00000000 00001111 00110011 01010101 00000000 11111111 00000000 00000000
00000000 00001111 00110011 01010101 00000000 11111111 00000000 11111111
00000000 00001111 00110011 01010101 00000000 11111111 11111111 00000000
00000000 00001111 00110011 01010101 11111111 00000000 00000000 00000000
00000000 00001111 00110011 01010101 11111111 00000000 00000000 11111111
00000000 00001111 00110011 01010101 11111111 00000000 11111111 00000000
	

The first bits of segments 0 to 7 are 0, 0, 0, 0, 0, 0, 0, 0; therefore, we output 00000000 first. The second bits are 0, 0, 0, 0, 1, 1, 1, 1; therefore, we output 00001111 next. The third bits are 0, 0, 1, 1, 0, 0, 1, 1; therefore, we output 00110011 after that, and so on.
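This conversion is a bit transpose: for every source bit position, we collect that bit from all 8 segments into one byte. Below is a minimal sketch for exactly 8 segments. `interleave8` is a hypothetical name. It places segment s in bit 7 - s, so printing each byte MSB first lists segment 0 on the left, as in the table above. If segment 0 is wired to port bit 0 instead, shift by s rather than 7 - s.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: build the parallel stream for 8 segments.
 * seg[s] points to the serial GRB8 bytes of segment s (nbytes each).
 * For every source bit (MSB first) one output byte is produced, with
 * segment s in bit (7 - s). `out` must hold nbytes * 8 bytes. */
void interleave8(const uint8_t *seg[8], size_t nbytes, uint8_t *out) {
    for (size_t i = 0; i < nbytes; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            uint8_t o = 0;
            for (int s = 0; s < 8; s++)
                o |= (uint8_t)(((seg[s][i] >> bit) & 1u) << (7 - s));
            *out++ = o;
        }
    }
}
```

Feeding it the 8 × 9 example bytes from the first table reproduces the rows of the segmented table: 00000000, 00001111, 00110011, 01010101, then four 00000000 bytes for LED 0 green, and so on.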

Driver Software

Bit Driver

As discussed in the previous section, the timing requirement is tight. For XL-5050RGBW-WS2812B-HM in particular, the short high level lasts at most 350 nanoseconds, which is less than 3 cycles at 8MHz (375 nanoseconds). Therefore, the most important property of this driver is that it is fast and accurate.

For each data bit, we will need to:

  1. At the beginning, set the output high.
  2. Read the color data bit, and advance the read pointer.
  3. At a specific cycle, output the color data bit.
  4. At another specific cycle, set the output low.
  5. Check for the end of the data stream (the number of data bits in each segment).
    • If it is not the last data bit: at the 8th cycle, go to step 1.
    • If it is the last data bit: stop sending, and keep the output low for at least 80 microseconds. We can prepare data for the next output burst during this time.
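We can check on a PC whether a given schedule fits an LED before touching hardware. The sketch below assumes a loop that sets the output high at cycle 0, outputs the data at cycle `t_data`, pulls low at cycle `t_low`, and repeats every `period` cycles. `Window`, `in_window` and `schedule_ok` are hypothetical names. The windows are the min/max columns of the raw timing table, with 0 standing for an unspecified maximum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: allowed range for one timing parameter, in ns.
 * max_ns == 0 means "no upper limit" (an X in the tables). */
typedef struct { uint32_t min_ns, max_ns; } Window;

/* Is `cycles` at f_hz inside the window? Computed in picoseconds so that
 * 16MHz (62.5 ns per cycle) stays exact. */
static bool in_window(uint32_t cycles, uint32_t f_hz, Window w) {
    uint64_t ps = (uint64_t)cycles * 1000000000000ULL / f_hz;
    return ps >= (uint64_t)w.min_ns * 1000u &&
           (w.max_ns == 0 || ps <= (uint64_t)w.max_ns * 1000u);
}

/* Hypothetical check of the four high/low times produced by the schedule. */
bool schedule_ok(uint32_t t_data, uint32_t t_low, uint32_t period, uint32_t f_hz,
                 Window t0h, Window t0l, Window t1h, Window t1l) {
    return in_window(t_data, f_hz, t0h) &&          /* 0-bit high */
           in_window(period - t_data, f_hz, t0l) && /* 0-bit low  */
           in_window(t_low, f_hz, t1h) &&           /* 1-bit high */
           in_window(period - t_low, f_hz, t1l);    /* 1-bit low  */
}
```

At 8MHz, the schedule "data at cycle 3, low at cycle 6, 10 cycles per bit" fits the Worldsemi WS2812B. It does not fit the XL-5050RGBW-WS2812B-HM, because the resulting 375 ns 0-bit high time exceeds its 350 ns maximum. A data point at cycle 2 would fit, but loading the data from SRAM alone takes 2 cycles after the rising edge.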

Let’s write this task in C, assuming we use PORTA on the ATtiny461 to drive the LEDs:


1.c
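#define SIZE (24 * 8) // 24 bytes per LED (one byte per bit, shared by 8 segments), 8 LEDs per segment

#include <avr/io.h>

volatile uint8_t data[SIZE];

int main(void) {
	DDRA = 0xFF;			// All PORTA pins are outputs, one per segment
	for (volatile uint8_t* p = data; p < &(data[SIZE]); p++) {
		PORTA = 0xFF;		// Start of bit: all segments high
		PORTA = *p;		// Data: segments sending a 0-bit go low here
		/* asm("nop\n"); */	// Optional padding to lengthen the 1-bit high time
		PORTA = 0x00;		// End of high time: all segments low
	}
	for(;;);
}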
	

Compile with avr-gcc 1.c -o 1.out -mmcu=attiny461 -O3.

The Evil for Loop with 16-bit Memory Addressing

Let's disassemble the above C code with avr-objdump -m avr25 -D 1.out > 1.txt:


00000048 <main>:
  48:	8f ef       	ldi	r24, 0xFF	; 255
  4a:	8a bb       	out	0x1a, r24	; 26
  4c:	e0 e6       	ldi	r30, 0x60	; 96
  4e:	f0 e0       	ldi	r31, 0x00	; 0
  50:	9f ef       	ldi	r25, 0xFF	; 255
  52:	9b bb       	out	0x1b, r25	; 27 === Cycle 0   ===
  54:	81 91       	ld	r24, Z+		; load needs 2 cycles
  56:	8b bb       	out	0x1b, r24	; 27 === Cycle 3   ===
  ...               	nop			;    === Cycle 3+n ===
  58:	1b ba       	out	0x1b, r1	; 27 === Cycle 4+n ===
  5a:	81 e0       	ldi	r24, 0x01	; 1
  5c:	e0 32       	cpi	r30, 0x20	; 32
  5e:	f8 07       	cpc	r31, r24
  60:	a8 f3       	brcs	.-16     	; 0x52 <main+0xa> === Cycle 8+n, plus another 2 cycles to jump to beginning of the loop ===
  62:	ff cf       	rjmp	.-2      	; 0x68 <main+0x20>
	

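Problem: all the loop bookkeeping (loading the upper byte of the end address, the 16-bit compare of the Z pointer, and the branch) runs after the output goes low, which adds 5 cycles to every bit. Together with the nop padding needed for the 1-bit high time, the bit period grows to 10+n cycles, while the CPU simply idles during the high period.

But... what if we could rearrange the instructions to perform the index check while the CPU would otherwise be idling for timing (the nop instructions)?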

Be Fast!

We have to use assembly for this purpose, so that we control exactly which instruction executes on which cycle:


2.c

#define SIZE (24 * 8) // 24 bytes per LED (one byte per bit, shared by 8 segments), 8 LEDs per segment

#include <avr/io.h>
uint8_t data[SIZE];

__attribute__((naked)) void burst(uint8_t* start, uint8_t* end) {
	asm(
		"movw	r30, r24\n\t"		// First argument in r25:r24; r31:r30 (Z) are call-used regs
		"ldi	r24, 0xFF\n\t"		// R24 is call-used; its content was copied to R30 and is no longer required
		// Loop starts
		"1:	\n\t"
		"out	%[port], r24\n\t"	// Cycle 0: all segments high
		"ld	r25, Z+\n\t"		// R25 is call-used (load requires 2 cycles)
		"out	%[port], r25\n\t"	// Cycle 3: output the data bits
		"cp	r30, r22\n\t"		// Second argument in r23:r22
		"cpc	r31, r23\n\t"
		"out	%[port], r1\n\t"	// Cycle 6: all low; R1 is always 0, and OUT does NOT clobber the zero flag
		"nop	\n\t"
		"nop	\n\t"
		"brne	1b\n\t"			// Cycle 9 and 10 (branch requires 2 cycles)
		"ret	\n\t"
		:
		: [port] "I" (_SFR_IO_ADDR(PORTA))
	);
}

int main(void) {
	DDRA = 0xFF;
	burst(data, &(data[SIZE]));
	for(;;);
}
		

Compile with avr-gcc 2.c -o 2.out -mmcu=attiny461 -O3.

Disassemble with avr-objdump -m avr25 -D 2.out > 2.txt:


00000048 <burst>:
  48:	fc 01       	movw	r30, r24
  4a:	8f ef       	ldi	r24, 0xFF	; 255
  4c:	8b bb       	out	0x1b, r24	; 27
  4e:	91 91       	ld	r25, Z+
  50:	9b bb       	out	0x1b, r25	; 27
  52:	e6 17       	cp	r30, r22
  54:	f7 07       	cpc	r31, r23
  56:	00 00       	nop
  58:	00 00       	nop
  5a:	1b ba       	out	0x1b, r1	; 27
  5c:	c1 f3       	brne	.-18     	; 0x4c <burst+0x4>
  5e:	08 95       	ret

00000060 <main>:
  60:	8f ef       	ldi	r24, 0xFF	; 255
  62:	8a bb       	out	0x1a, r24	; 26
  64:	60 e2       	ldi	r22, 0x20	; 32
  66:	71 e0       	ldi	r23, 0x01	; 1
  68:	80 e6       	ldi	r24, 0x60	; 96
  6a:	90 e0       	ldi	r25, 0x00	; 0
  6c:	ee df       	rcall	.-38     	; 0x48 <burst>
  6e:	ff cf       	rjmp	.-2      	; 0x6e <main+0xe>
		

We successfully satisfy the timing requirement.

This driver relies on the AVR-GCC calling convention (first argument in r25:r24, second argument in r23:r22, r1 always zero), so it is compatible with the AVR-GCC ABI ONLY!

Word Format Converter

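Since we are using a parallel design, we have to convert the color data from the original GRB8 format (3 bytes per LED, one segment after another) into the interleaved format, where each byte carries one bit of every segment.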

The following C code shows how to perform the conversion:


#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <errno.h>

typedef struct __attribute__((packed)) RGB {
	uint8_t r, g, b;
} RGB;
typedef struct __attribute__((packed)) Lane {
	uint8_t g[8];
	uint8_t r[8];
	uint8_t b[8];
} Lane;

int main(int argc, char* argv[]) {
	if (argc != 7) {
		fprintf(stderr, "Use: this leds segments lines frames source dest, (%d given)\n", argc);
		return 1;
	}
	int led = atoi(argv[1]);	// LEDs per segment
	int segment = atoi(argv[2]);	// Number of segments (parallel lanes)
	int line = atoi(argv[3]);
	int frame = atoi(argv[4]);
	if (led <= 0 || segment <= 0 || line <= 0 || frame <= 0) {
		fprintf(stderr, "leds, segments, lines and frames must be positive\n");
		return 1;
	}
	fprintf(stderr, "%d LEDs per segment, %d segments, %d bits buffer, %d lines, %d frames\n", led, segment, led * segment * 24, line, frame);

	uint8_t* bin = malloc(led * segment * 3);	// GRB8 input: all LEDs of segment 0, then segment 1, ...
	uint8_t* bout = malloc(led * segment * 3);	// Interleaved output, same size
	FILE* fin = fopen(argv[5], "rb");
	FILE* fout = fopen(argv[6], "wb");
	if (!bin || !bout || !fin || !fout) {
		perror("Cannot allocate buffers or open files");
		return 1;
	}
	int pRead[segment]; // Offset in bit (not byte)
	int pWrite;

	for (int iframe = 0; iframe < frame; iframe++) {
		for (int iline = 0; iline < line; iline++) {
			pWrite = 0;
			for (int isegment = 0; isegment < segment; isegment++) {
				pRead[isegment] = isegment * led * 24; // Each segment has 24 * led bits
			}
			if (fread(bin, 3, led * segment, fin) != (size_t)(led * segment)) {
				fprintf(stderr, "Input file too short (frame %d, line %d)\n", iframe, iline);
				goto done;
			}
			for (int iled = 0; iled < led; iled++) {
				for (int ibit = 0; ibit < 24; ibit++) {
					for (int isegment = 0; isegment < segment; isegment++) {
						// Take the next bit (MSB first) of this segment's serial stream
						int readIdx = pRead[isegment] >> 3;
						int readBit = 7 - (pRead[isegment] & 0b111);
						uint8_t x = ( bin[readIdx] >> readBit ) & 1;
						pRead[isegment]++;
						//fprintf(stderr, "Read %d: Byte %d, bit %d: %d\n", ibit, readIdx, readBit, x);
						// Append it to the output; with 8 segments, segment 0 lands in bit 7 (MSB), segment 7 in bit 0
						int writeIdx = pWrite >> 3;
						int writeBit = 7 - (pWrite & 0b111);
						uint8_t write = bout[writeIdx];
						write &= ~(1<<writeBit);
						write |= x << writeBit;
						bout[writeIdx] = write;
						pWrite++;
					}
				}
			}
			fwrite(bout, 3, led * segment, fout);
			// Debug dump: one output byte per line, MSB first, blank line after every 8 bytes
			for (int i = 0; i < 3 * led * segment; i++) {
				fprintf(stderr, "Write %d:\t", i);
				for (int j = 7; j >= 0; j--) {
					fprintf(stderr, "%c", '0' + ((bout[i] >> j) & 1));
				}
				fprintf(stderr, "\n");
				if (i % 8 == 0b111) fprintf(stderr, "\n");
			}
		}
	}

done:
	free(bin);
	free(bout);
	fclose(fin);
	fclose(fout);
	return 0;
}
	

Compile with gcc -O3 image.c -o image

Assume we have 24 LEDs in 8 segments (so, 3 LEDs each segment). If we save the original GRB8 data in a file named test.in, and we are planning to generate the processed data in test.out, we can use the command: ./image 3 8 1 1 test.in test.out.
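To try the converter on this example, test.in must hold the bytes from the first segment table above: segment s, LED m, channel c is stored as (s << 4) | (m << 2) | c. A small generator is sketched below. `fill_pattern` and `write_pattern` are hypothetical names, and the encoding only works for up to 16 segments and 4 LEDs per segment.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical generator for the example input: byte = (segment << 4) |
 * (led << 2) | channel, with channel 0 = G, 1 = R, 2 = B. Data is stored
 * segment by segment in GRB8 order, which is the layout the converter reads. */
void fill_pattern(uint8_t *buf, int segments, int leds) {
    for (int s = 0; s < segments; s++)
        for (int m = 0; m < leds; m++)
            for (int c = 0; c < 3; c++)
                buf[((size_t)s * leds + m) * 3 + c] = (uint8_t)((s << 4) | (m << 2) | c);
}

/* Hypothetical helper: write the pattern to `path`; returns 0 on success. */
int write_pattern(const char *path, int segments, int leds) {
    uint8_t buf[16 * 4 * 3];                  /* largest pattern the encoding allows */
    if (segments < 1 || segments > 16 || leds < 1 || leds > 4) return -1;
    fill_pattern(buf, segments, leds);
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t n = (size_t)segments * leds * 3;
    int ok = fwrite(buf, 1, n, f) == n;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

After calling `write_pattern("test.in", 8, 3)`, running `./image 3 8 1 1 test.in test.out` should make the converter's debug dump start with 00000000, 00001111, 00110011, 01010101, which is the first row of the segmented table.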

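We should pre-process the data on a PC. Letting the MCU process the data on the fly is far too heavy: the bit transpose costs several instructions per output bit, while the burst loop only has a few cycles per bit.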

Hardware Selection

ATtiny Family MCU Selection

For maximum performance, we need an MCU with at least one full 8-bit IO port.

According to another of my blog posts, we have the following options:

Memory Size Selection

To satisfy the tight timing requirement, we need to save the color data into an SRAM buffer first, then burst out the data from the buffer.

Since each LED word is 24 bits and we divide the light bar into 8 segments, every 8 LEDs (one in each segment) need 24 bytes of buffer.

That means, theoretically, the maximum number of LEDs the ATtiny MCU could allow is:

Number of LEDs v.s. MCU Memory Size
ATtiny model  SRAM size (bytes)  Maximum LEDs  Required buffer size (bytes)  Free memory (bytes)
2X            128                40            120                           8
4X            256                80            240                           16
8X            512                168           504                           8
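The sizing rule behind this table is sketched below with hypothetical helper names. It gives an upper bound only: the stack and any other variables also need SRAM, so in practice the usable LED count may be lower.

```c
/* Hypothetical helper: with 8 segments, one byte holds the same bit of all
 * 8 segments, so every 8 LEDs (one per segment) take 24 bytes. */
unsigned max_leds(unsigned sram_bytes) { return sram_bytes / 24 * 8; }

/* Hypothetical helper: bytes needed for `leds` LEDs spread over 8 segments,
 * rounded up to a whole LED per segment. */
unsigned buffer_bytes(unsigned leds) { return (leds + 7) / 8 * 24; }
```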

If you want larger buffer size, consider ATmega.

You may fabricate a PCB and solder the LEDs on it (if you really enjoy working with SMDs). However, the easiest way to make a WS2812-based light bar is probably to buy a common, cost-effective LED strip and stick it on a hard wood or metal plate.

These LED strips come in different densities; the most common ones are 144, 60 and 30 LEDs per meter. The following table helps you choose the right LED strip for your length requirement.

Number of LEDs v.s. Common Light Strip Length
Buffer (bytes)  LEDs  144/m    60/m     30/m
120             40    0.278m   0.667m   1.333m
240             80    0.556m   1.333m   2.667m
504             168   1.167m   2.8m     5.6m
1008            336   2.333m   5.6m     11.2m
2040            680   4.722m   11.333m  22.667m
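Each length in this table is simply the LED count divided by the strip density. A one-line sketch (hypothetical name `strip_length_mm`) that rounds to the nearest millimetre:

```c
/* Hypothetical helper: strip length in millimetres for `leds` LEDs at
 * `per_metre` LEDs per metre, rounded to the nearest millimetre. */
unsigned long strip_length_mm(unsigned long leds, unsigned long per_metre) {
    return (leds * 1000UL + per_metre / 2) / per_metre;
}
```

For example, `strip_length_mm(168, 144)` is 1167, i.e. about 1.167 m.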

Unit Test

Test Circuit

Now, we will build a 4-LED PCB test unit, which simulates a 1-segment, 4-LED light bar. The following diagram shows the PCB design; you can also cut a section of WS2812 strip to build the test unit.

Test Circuit Schematic
Test Circuit PCB

Note that the signal input is on the left side and the output is on the right side; however, power is on the bottom side and GND is on the top side. This is unusual, and I don't know why the package was designed with power at the bottom and GND at the top.

We will connect the unit to an ATtiny44 MCU. The data input pin of the test unit is connected to port A pin 0 (PA0) of the MCU, which simulates segment 0.

Fabricated Test Unit PCB
Test Circuit Connection

All LEDs On with Different Color


#define SIZE (24 * 3) // 24 bytes per LED (one byte per bit, shared by 8 segments), 3 LEDs per segment

#include <avr/io.h>

uint8_t data[SIZE] = {
//	MSB 7 6     5     4     3     2     1     0 LSB
	
	/* LED 0 */
	0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, //Green
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Red
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Blue
	/* LED 1 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Green
	0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Red
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Blue
	/* LED 2 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Green
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Red
	0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Blue
	
	/* Parallel design, each bit in a byte represents a segment */
};

__attribute__((naked)) void burst(uint8_t* start, uint8_t* end) {
	asm(
		"movw	r30, r24\n\t"		// First argument in r25:r24; r31:r30 (Z) are call-used regs
		"ldi	r24, 0xFF\n\t"		// R24 is call-used; its content was copied to R30 and is no longer required
		// Loop starts
		"1:	\n\t"
		"out	%[port], r24\n\t"	// Cycle 0: all segments high
		"ld	r25, Z+\n\t"		// R25 is call-used (load requires 2 cycles)
		"out	%[port], r25\n\t"	// Cycle 3: output the data bits
		"cp	r30, r22\n\t"		// Second argument in r23:r22
		"cpc	r31, r23\n\t"
		"out	%[port], r1\n\t"	// Cycle 6: all low; R1 is always 0, and OUT does not clobber the zero flag
		"nop	\n\t"
		"brne	1b\n\t"			// Cycle 8 and 9 (taken branch requires 2 cycles); next bit starts at cycle 10
		"ret	\n\t"
		:
		: [port] "I" (_SFR_IO_ADDR(PORTA))
	);
}

int main() {
	DDRA = 0xFF;
	PORTA = 0x00;
	
	for (;;) {
		burst(data, &(data[SIZE]));
		
		for (volatile uint32_t i = 0; i < 100000; i++); // Reset; volatile stops the compiler from deleting this delay loop
	}
	
	return 0;
}
	

In the above example, we drive 24 LEDs divided into 8 segments (so, 3 LEDs in each segment). The color data is saved in the array data[SIZE].


/* LED 0 */
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, //Green
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Red
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Blue
	

The first 24 bytes contain the data for the first LED in each segment. Bit 0 of each byte is for segment 0, bit 1 is for segment 1, and so on.

In this example, the MCU writes 11111111 00000000 00000000 to all segments, which turns the first LED of every segment green at maximum intensity.


/* LED 1 */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Green
0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Red
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Blue
	

The second 24 bytes contain the data for the second LED in each segment.

In this example, the MCU writes 00000000 10000000 00000000 to all segments, which turns the second LED red at half intensity.


/* LED 2 */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Green
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Red
0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //Blue
	

The third 24 bytes contain the data for the third LED in each segment.

We need to look at this example bit by bit. Because the first byte of the blue channel is 0x55 (0b01010101), the MCU writes 00000000 00000000 10000000 to segments 0, 2, 4 and 6, and 00000000 00000000 00000000 to segments 1, 3, 5 and 7. This turns the third LED blue at half intensity in even segments, and off in odd segments.

Since the buffer is 72 bytes large, the MCU will output 72 bits on each segment. Each LED will consume 24 bits of data; therefore, any LED after the 3rd LED will receive no data.
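Editing this buffer by hand gets tedious for anything beyond a test pattern. The sketch below writes one LED's color for one segment into the parallel buffer, using the layout of this test program (bit `seg` of each byte belongs to PORTA pin `seg`). `set_pixel` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: write one LED's color into the parallel buffer.
 * LED `led` of every segment occupies bytes led*24 .. led*24+23
 * (G7..G0, R7..R0, B7..B0); bit `seg` of each byte belongs to segment
 * `seg`, i.e. PORTA pin `seg`. */
void set_pixel(uint8_t *data, uint8_t seg, uint8_t led,
               uint8_t r, uint8_t g, uint8_t b) {
    const uint8_t color[3] = { g, r, b };   /* WS2812 order: green, red, blue */
    const uint8_t mask = (uint8_t)(1u << seg);
    uint8_t *p = &data[led * 24];
    for (int c = 0; c < 3; c++) {
        for (int bit = 7; bit >= 0; bit--, p++) {  /* MSB first */
            if (color[c] & (1u << bit))
                *p |= mask;
            else
                *p &= (uint8_t)~mask;
        }
    }
}
```

Setting LED 0 to green 0xFF and LED 1 to red 0x80 in all 8 segments, plus LED 2 to blue 0x80 in segments 0, 2, 4 and 6, rebuilds the data array above, including the 0x55 byte.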

All LEDs On with Different Color

After programming the MCU, we can see that the first LED is green at maximum intensity, and the second and third LEDs are red and blue at half intensity.

The MCU is clocked at 8MHz. Use avrdude -P com3 -c arduino -p t44 -U lfuse:w:0xa2:m to program the low fuse of the ATtiny44 to use the internal 8MHz RC clock.
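The low fuse value can be decoded bit by bit. This sketch assumes the ATtiny24/44/84 low fuse layout from its datasheet: CKDIV8 in bit 7, CKOUT in bit 6, SUT1:0 in bits 5:4, CKSEL3:0 in bits 3:0, where a 0 bit means "programmed". `LowFuse` and `decode_lfuse` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the ATtiny24/44/84 low fuse byte.
 * Fuse bits are active low: a 0 bit means "programmed". */
typedef struct {
    bool ckdiv8;    /* bit 7: divide the clock by 8 at startup */
    bool ckout;     /* bit 6: output the system clock on CKOUT (PB2) */
    uint8_t sut;    /* bits 5:4: start-up time */
    uint8_t cksel;  /* bits 3:0: clock source, 0010 = internal 8MHz RC */
} LowFuse;

LowFuse decode_lfuse(uint8_t v) {
    LowFuse f;
    f.ckdiv8 = !(v & 0x80);
    f.ckout  = !(v & 0x40);
    f.sut    = (v >> 4) & 0x3;
    f.cksel  = v & 0x0F;
    return f;
}
```

For 0xA2, this gives:

- CKDIV8 unprogrammed, so the CPU runs at the full 8MHz instead of 1MHz.
- CKOUT programmed, so the system clock is output on PB2. This is handy for measuring the RC frequency while calibrating.
- CKSEL = 0010, which selects the internal 8MHz RC oscillator.

The factory default 0x62 differs only in those two top bits.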

Blinking LED


int main() {
	DDRA = 0xFF;
	PORTA = 0x00;
	
	for (;;) {
		data[0] = ~data[0];		// Toggle the MSB of LED 0 green (all segments)
		burst(data, &(data[SIZE]));
		for (volatile uint32_t i = 0; i < 100000; i++); // Reset; volatile keeps the delay loop
		data[32] = ~data[32];		// Toggle the MSB of LED 1 red
		burst(data, &(data[SIZE]));
		for (volatile uint32_t i = 0; i < 100000; i++); // Reset
		data[64] = ~data[64];		// Toggle the MSB of LED 2 blue
		burst(data, &(data[SIZE]));
		for (volatile uint32_t i = 0; i < 100000; i++); // Reset
	}
	
	return 0;
}
	

In this example, the MCU toggles the MSB (first bit) of LED 0 green, LED 1 red and LED 2 blue in turn. This causes the LEDs in segment 0 to cycle through the following patterns:

Blinking Test Patterns (segment 0, G-R-B)
Time  LED 0                       LED 1                       LED 2                       Note
Init  11111111-00000000-00000000  00000000-10000000-00000000  00000000-00000000-10000000  Beginning pattern in buffer
6n+1  01111111-00000000-00000000  00000000-10000000-00000000  00000000-00000000-10000000  LED 0 drops to half intensity
6n+2  01111111-00000000-00000000  00000000-00000000-00000000  00000000-00000000-10000000  LED 1 off
6n+3  01111111-00000000-00000000  00000000-00000000-00000000  00000000-00000000-00000000  LED 2 off
6n+4  11111111-00000000-00000000  00000000-00000000-00000000  00000000-00000000-00000000  LED 0 back to full intensity
6n+5  11111111-00000000-00000000  00000000-10000000-00000000  00000000-00000000-00000000  LED 1 half on
6n+6  11111111-00000000-00000000  00000000-10000000-00000000  00000000-00000000-10000000  LED 2 half on
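The table can be checked on a PC by reading the buffer back per segment. The sketch below rebuilds the 24-bit G-R-B word that one LED of one segment receives. `get_grb` is a hypothetical name, and the buffer layout is the one used by the test program.

```c
#include <stdint.h>

/* Hypothetical helper: read back the 24-bit G-R-B word that LED `led` of
 * segment `seg` receives from the parallel buffer (bit `seg` of each byte). */
uint32_t get_grb(const uint8_t *data, uint8_t seg, uint8_t led) {
    uint32_t w = 0;
    for (int k = 0; k < 24; k++)                       /* G7 first, B0 last */
        w = (w << 1) | ((data[led * 24 + k] >> seg) & 1u);
    return w;
}
```

Starting from the example buffer, `get_grb(data, 0, 0)` is 0xFF0000. After `data[0] = ~data[0]` it becomes 0x7F0000, which is the 6n+1 row.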

Now, program the MCU.

Blinking LEDs

We can confirm that the LEDs turn on and off following the pattern table above. Note the change in green intensity.