Captdam

Understanding MODBUS RTU

Introduction to MODBUS and the MODBUS RTU protocol. This blog presents the format of a MODBUS RTU packet, including the request from master to slave and the response from slave to master. It also shows how to compute the CRC-16 value for a MODBUS packet in AVR assembly language.

--by Captdam @ 2023-08-20 23:00 GMT edited

[Chinese] Here is the Chinese version of this article

Introduction

Recently, I was working on a project that reads data from a PLC and manipulates the data using my custom-designed hardware.

My first thought was to use the RS-232 / RS-485 port of the PLC to implement a custom protocol. For example, when I send 0x01 to the PLC, the PLC should return a packet as the result of function 1; when I send 0x02 to the PLC, the PLC should return a packet as the result of function 2.

While reading the manual for that PLC, I could not find much detail about how to receive and send data through the serial port (RS-232 or RS-485). However, the term “MODBUS” kept popping up in the manual and got my attention.

After some research on “MODBUS” on Wikipedia and on the MODBUS website, I realized that “MODBUS” is exactly the diamond I was looking for. The serial version of MODBUS (MODBUS RTU, which stands for MODBUS Remote Terminal Unit), developed in 1979 by Modicon (now Schneider), is a very elegant protocol and a de facto standard in PLCs and other industrial devices. Furthermore, it is a royalty-free protocol.

It is elegant because the protocol defines a simple and compact packet format. A MODBUS RTU packet has three segments: the address, the payload, and the checksum.

MODBUS packet structure - Screenshot from MODBUS APPLICATION PROTOCOL SPECIFICATION V1.1b3

The total length of a packet (ADU, Application Data Unit) includes the address, the payload and the checksum, and it must be less than or equal to 256 bytes. Since the address is 1 byte long and the checksum is 2 bytes long, the maximum length of the payload (PDU, Protocol Data Unit) is 253 bytes.

For multi-byte data, MODBUS requires the implementation to send the high byte first. For example, to send the 16-bit integer 0x1234, send 0x12 first, then 0x34.
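The high-byte-first rule can be sketched in C as follows. This is only a minimal illustration; the helper names put_u16_be and get_u16_be are made up for this example and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value high byte first, the order MODBUS uses for data fields.
 * (Hypothetical helper name, for illustration only.) */
static void put_u16_be(uint8_t *dst, uint16_t value) {
    assert(dst != NULL);
    dst[0] = (uint8_t)(value >> 8);   /* high byte goes on the wire first */
    dst[1] = (uint8_t)(value & 0xFF); /* low byte follows */
}

/* Rebuild a 16-bit value from two bytes received high byte first. */
static uint16_t get_u16_be(const uint8_t *src) {
    assert(src != NULL);
    return (uint16_t)(((uint16_t)src[0] << 8) | src[1]);
}
```

Calling put_u16_be(buf, 0x1234) leaves 0x12 in buf[0] and 0x34 in buf[1], matching the example above.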

We can arrange a 256-byte block aligned to a 256-byte page to store the packet in our Tx buffer. For a typical 8-bit MCU (microcontroller) like AVR, or 8051 with XRAM, we can use a 16-bit pointer with a fixed high byte. As a micro-optimization, we can start the transmission after setting the Tx buffer pointer, increase the low byte of this pointer at each Tx interrupt, and stop when the low byte is zero. Single-byte comparison is way easier than multi-byte comparison in the 8-bit world. See the implementation code for details.

ELEGANT! ELEGANT!!!
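To make the page-aligned buffer trick concrete, here is a small host-side C model of it. It is a sketch under assumptions: the struct tx_page_t and the functions tx_load and tx_step are hypothetical names, and tx_step only imitates what a real Tx interrupt handler would do on the MCU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 256u

/* Hypothetical Tx state: one 256-byte page plus the low byte of the pointer.
 * On the real MCU the high byte of the pointer stays fixed. */
typedef struct {
    uint8_t page[PAGE_SIZE];
    uint8_t low;
} tx_page_t;

/* Copy a packet so that its last byte lands at offset 0xFF of the page. */
static void tx_load(tx_page_t *tx, const uint8_t *packet, uint16_t len) {
    assert(len >= 1 && len <= PAGE_SIZE);
    tx->low = (uint8_t)(PAGE_SIZE - len);
    memcpy(&tx->page[tx->low], packet, len);
}

/* Model of one Tx interrupt: emit one byte, then bump the low byte.
 * Returns 1 while bytes remain, 0 once the low byte has wrapped to zero. */
static int tx_step(tx_page_t *tx, uint8_t *out) {
    *out = tx->page[tx->low];
    tx->low++;               /* wraps from 0xFF to 0x00 after the last byte */
    return tx->low != 0;     /* a single-byte compare against zero */
}
```

With an 8-byte packet, tx_load sets the low byte to 0xF8, and tx_step reports the end of the packet after exactly 8 calls.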

MODBUS transmission without error - Screenshot from MODBUS APPLICATION PROTOCOL SPECIFICATION V1.1b3

MODBUS RTU is a request-response protocol, like many other common protocols. The master sends a request packet to a specific slave device, then the addressed slave device returns the requested data to the master. The master’s Tx is connected to the Rx of all slaves, and all slaves’ Tx are connected together to the master’s Rx.

A transmission can be divided into two steps: the master sends a request, then the addressed slave returns a response.

MODBUS transmission with error - Screenshot from MODBUS APPLICATION PROTOCOL SPECIFICATION V1.1b3

Of course, both the master and the slaves check each packet to verify transmission integrity. The master also has a timeout and retry mechanism in case a packet is lost during transmission.

Both the request and the response packets have the address segment and the checksum segment. The payload segment differs.

A single-data read/write is equivalent to a multi-data read/write with a length of 1. If we want to go to the extreme on bandwidth control, we can implement the single-data read/write as well. In this article, we will only discuss the concepts of multi-data read/write. More detail can be found in the MODBUS Specification.
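The timeout-and-retry idea on the master side can be sketched like this. Everything here is an assumption made for illustration: exchange_fn stands for whatever routine sends a request and waits for a reply on your hardware, and flaky_exchange is a fake link used only to exercise the retry loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport: sends req, waits for a reply, and returns the reply
 * length, or 0 when the wait timed out. */
typedef int (*exchange_fn)(void *ctx, const uint8_t *req, uint16_t req_len,
                           uint8_t *resp, uint16_t resp_cap);

/* Try once, then retry up to max_retries more times.
 * Returns the reply length, or -1 if every attempt timed out. */
static int request_with_retry(exchange_fn exchange, void *ctx,
                              const uint8_t *req, uint16_t req_len,
                              uint8_t *resp, uint16_t resp_cap,
                              unsigned max_retries) {
    assert(exchange != NULL);
    for (unsigned attempt = 0; attempt <= max_retries; attempt++) {
        int n = exchange(ctx, req, req_len, resp, resp_cap);
        if (n > 0) return n;   /* got a reply; integrity is checked elsewhere */
    }
    return -1;
}

/* Fake link for experiments: ctx points to an int holding the number of
 * attempts that should time out before one reply (an echo of the first
 * request byte) gets through. */
static int flaky_exchange(void *ctx, const uint8_t *req, uint16_t req_len,
                          uint8_t *resp, uint16_t resp_cap) {
    int *timeouts_left = (int *)ctx;
    if (*timeouts_left > 0) { (*timeouts_left)--; return 0; }
    if (req_len == 0 || resp_cap == 0) return 0;
    resp[0] = req[0];
    return 1;
}
```

A real master would also verify the checksum of each reply before accepting it; this sketch leaves that step out to keep the retry logic visible.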

For a read process:

The request on the master-side: the payload has a 1-byte long function code; followed by a 2-byte long start address of the data in slave's memory; followed by a 2-byte long length of data to read (note, this is the number of data elements not the length in bytes).

The response on the slave-side: the payload has a 1-byte long function code, same as the function code in the request; followed by a 1-byte long length of data (note this number is the length of data in bytes); followed by the requested data.
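The read request payload described above could be assembled as in the sketch below. The function names are hypothetical, and the response-length helper assumes that every data element is a 16-bit register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 5-byte payload (PDU) of a read request:
 * function code, 2-byte start address, 2-byte element count.
 * Returns the number of bytes written. (Hypothetical helper.) */
static uint8_t build_read_pdu(uint8_t *pdu, uint8_t function,
                              uint16_t start, uint16_t count) {
    assert(pdu != NULL && count >= 1);
    pdu[0] = function;
    pdu[1] = (uint8_t)(start >> 8);    /* high byte first */
    pdu[2] = (uint8_t)(start & 0xFF);
    pdu[3] = (uint8_t)(count >> 8);
    pdu[4] = (uint8_t)(count & 0xFF);
    return 5;
}

/* Payload length of the matching response, ASSUMING 16-bit registers:
 * function code + byte count + 2 bytes per register. */
static uint16_t read_response_pdu_len(uint16_t count) {
    return (uint16_t)(2u + 2u * count);
}
```

Knowing the expected response length in advance lets the master decide how many bytes to wait for before it starts checking the packet.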

For a write process:

The request on the master-side: the payload has a 1-byte long function code; followed by a 2-byte long start address of the data in slave's memory; followed by a 2-byte long length of data to write (note, this is the number of data elements not the length in bytes); followed by a 1-byte long length of data (this time in unit of bytes); followed by variable length of data to write.

The response on the slave-side: the payload has a 1-byte long function code, same as the function code in the request; followed by a 2-byte long address of data in slave's memory; followed by a 2-byte long length of data written (note, this is the number of data elements not the length in bytes).

The MODBUS protocol was designed in 1979; it does not cover the fancy functionality found in some more modern protocols. However, my application only needs to read basic data from the PLC, so MODBUS is good enough for my requirements.
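For the write process, a request payload could be built as sketched below. The function name is hypothetical. The function code 0x10 (Write Multiple Registers) and the limit of 123 registers per request are taken from the MODBUS Specification, not from the experiments in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the payload (PDU) of a write request for 16-bit registers:
 * function code, start address, register count, byte count, then the data.
 * Returns the number of bytes written. (Hypothetical helper.) */
static uint8_t build_write_registers_pdu(uint8_t *pdu, uint16_t start,
                                         const uint16_t *values, uint8_t count) {
    assert(pdu != NULL && values != NULL);
    assert(count >= 1 && count <= 123);    /* keeps the PDU within 253 bytes */
    uint8_t n = 0;
    pdu[n++] = 0x10;                       /* Write Multiple Registers */
    pdu[n++] = (uint8_t)(start >> 8);
    pdu[n++] = (uint8_t)(start & 0xFF);
    pdu[n++] = 0x00;                       /* count high byte (count <= 123) */
    pdu[n++] = count;                      /* count low byte: data elements */
    pdu[n++] = (uint8_t)(2u * count);      /* length of the data in bytes */
    for (uint8_t i = 0; i < count; i++) {
        pdu[n++] = (uint8_t)(values[i] >> 8);
        pdu[n++] = (uint8_t)(values[i] & 0xFF);
    }
    return n;
}
```

The response to such a request simply repeats the first five bytes of this payload.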

Bench Study

People say don't reinvent the wheel; but sometimes reading other people's libraries actually costs more time than developing your own. Plus, you learn what is happening, you have full control, and you know where to look during debugging.

Probably one of the best ways to inspect a MODBUS packet is to let the PLC send a request to a computer so we can analyze the request packet format; then return that same packet (or flip some bits to simulate a transmission error) to the PLC and see what happens on the PLC.

Enable MODBUS on PLC

First, let's configure the PLC to enable the MODBUS functionality.

PLC program to enable MODBUS

This ladder sets up the PLC to use MODBUS on the serial communication port at 9600 baud. It also sets the slave ID this PLC uses when receiving requests, and defines the retry, delay and timeout values this PLC uses when sending requests.

View Request Packet

Because we have no idea what a MODBUS packet looks like, we will write a program in our PLC to send a “Read registers” request to our computer. Of course, our computer will not return anything. All we need to know in this step is what the request looks like; we don't care what the response would be or what the PLC does when there is no response.

PLC program to send MODBUS request

This ladder lets the PLC send a Read registers command on the rising edge of input pin 7: read 3 registers from slave 2, starting at address 16, and save them to address 32 in the PLC.

The following data are placed on the bus:

0x02 0x03 0x00 0x10 0x00 0x03 0x04 0x3D

Because MODBUS sends the high byte first for multi-byte data, the above packet can be read as:

  1. 0x02: Slave ID

  2. 0x03: Function code = Read registers

  3. 0x0010: Start address = 16

  4. 0x0003: Length of data = 3 data

  5. 0x04 0x3D: Checksum = 0x3D04 (unlike the data fields, the CRC is sent low byte first)
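The field-by-field reading above can be written down as a small decoder. This is a sketch only: the struct and function names are made up, and it assumes the fixed 8-byte layout of this particular read request. Note that the data fields are combined high byte first, while the two CRC bytes arrive low byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the 8-byte read request captured above (hypothetical struct). */
typedef struct {
    uint8_t  slave;
    uint8_t  function;
    uint16_t start;
    uint16_t count;
    uint16_t crc;
} read_request_t;

/* Split an 8-byte read request into its fields. No CRC check is done here. */
static read_request_t decode_read_request(const uint8_t *frame) {
    assert(frame != NULL);
    read_request_t r;
    r.slave    = frame[0];
    r.function = frame[1];
    r.start    = (uint16_t)(((uint16_t)frame[2] << 8) | frame[3]); /* high byte first */
    r.count    = (uint16_t)(((uint16_t)frame[4] << 8) | frame[5]); /* high byte first */
    r.crc      = (uint16_t)(frame[6] | ((uint16_t)frame[7] << 8)); /* low byte first  */
    return r;
}
```

Fed with the captured bytes, this gives slave 2, function 3, start address 16, 3 registers, and a checksum value of 0x3D04.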

View Response Packet (No Error)

Now that we have a valid request packet, we can echo this request back to the PLC to read registers on the PLC.

Let's send the above request to the PLC. We get the packet:

0x02 0x03 0x06 0x30 0x39 0x00 0xF4 0x00 0xF3 0xAD 0xC7

where:

  1. 0x02: Slave ID

  2. 0x03: Function code = Read registers

  3. 0x06: Length of data = 6 bytes

  4. 0x3039: Data in register 16 = 12345

  5. 0x00F4: Data in register 17 = 0x00F4

  6. 0x00F3: Data in register 18 = 0x00F3

  7. 0xADC7: Checksum = 0xC7AD
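Pulling the register values out of such a response might look like the sketch below. The function name is hypothetical, the data elements are assumed to be 16-bit registers, and the checksum is deliberately not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract 16-bit registers from a read response frame laid out as
 * slave ID, function code, byte count, data..., 2 CRC bytes.
 * Returns the number of registers stored, or -1 if the frame length and
 * the byte count field do not agree. The CRC is NOT checked here. */
static int decode_read_response(const uint8_t *frame, uint16_t len,
                                uint16_t *regs, uint8_t max_regs) {
    assert(frame != NULL && regs != NULL);
    if (len < 5) return -1;
    uint8_t byte_count = frame[2];
    if (byte_count % 2 != 0) return -1;                 /* registers are 2 bytes each */
    if (len != (uint16_t)(3u + byte_count + 2u)) return -1;
    uint8_t n = (uint8_t)(byte_count / 2);
    if (n > max_regs) return -1;
    for (uint8_t i = 0; i < n; i++) {
        regs[i] = (uint16_t)(((uint16_t)frame[3 + 2 * i] << 8) | frame[4 + 2 * i]);
    }
    return n;
}
```

For the response captured above it returns 3 registers holding 12345, 0x00F4 and 0x00F3.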

View Response Packet (Error in Packet)

Let's re-send the above request to the PLC, but with some bits flipped. We get a packet indicating a checksum error:

0x02 0x83 0x01 0x70 0xF0

where:

  1. 0x02: Slave ID

  2. 0x83: Function code = Read registers (0x03) with the highest bit set, which marks an exception response

  3. 0x01: Error code = 01

  4. 0x70F0: Checksum = 0xF070
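In the MODBUS Specification, an exception response repeats the function code of the request with its highest bit set (0x03 becomes 0x83), followed by a 1-byte exception code. A master can use that bit to tell the two kinds of response apart. The helper names below are hypothetical and this is only a minimal sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the exception code if the frame is an exception response,
 * or 0 if it is a normal response (or too short to tell).
 * Shortest exception frame: slave ID + function + code + 2 CRC bytes. */
static uint8_t modbus_exception_code(const uint8_t *frame, uint16_t len) {
    assert(frame != NULL);
    if (len < 5) return 0;
    if ((frame[1] & 0x80) == 0) return 0;   /* highest bit clear: normal response */
    return frame[2];
}

/* Function code of the request this response answers (highest bit removed). */
static uint8_t modbus_request_function(const uint8_t *frame) {
    assert(frame != NULL);
    return (uint8_t)(frame[1] & 0x7F);
}
```

A frame that starts with 0x02 0x83 0x01 yields exception code 1 for a request with function code 0x03.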

Specification

Let's now verify what we found in the experiment against the MODBUS Specification.

MODBUS transmission example - Screenshot from MODBUS APPLICATION PROTOCOL SPECIFICATION V1.1b3

Timing

Now we have learned that a MODBUS response contains a byte indicating the length of the data in the packet. Using this value can help us determine the end of the packet (in other words, when to stop receiving and proceed to the next step).

However, we are still facing two issues:

These issues will cause our program to enter a dead waiting loop.

Another solution is to set a watchdog timer. At timeout, we conclude that either the slave is offline or the packet has ended. In the MODBUS spec, the gap between two packets should be at least the time needed to send 3.5 characters.
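To turn "3.5 characters" into a timer value we need the length of one character on the wire. The sketch below assumes 11 bits per character (start bit, 8 data bits, parity bit, stop bit), which is what the MODBUS over Serial Line guide uses for RTU, and it follows that guide's recommendation of a fixed 1750 µs above 19200 baud. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 11 bits per character (start + 8 data + parity + stop). */
#define BITS_PER_CHAR 11u

/* Minimum silent gap between two packets (3.5 character times), in
 * microseconds, rounded up. Above 19200 baud a fixed 1750 us is returned,
 * as the MODBUS over Serial Line guide recommends. */
static uint32_t t35_us(uint32_t baud) {
    assert(baud > 0);
    if (baud > 19200u) return 1750u;
    /* 3.5 chars * 11 bits * 1,000,000 us/s = 38,500,000 bit-microseconds */
    uint32_t bit_us = 35u * BITS_PER_CHAR * 100000u;
    return (bit_us + baud - 1u) / baud;
}
```

At the 9600 baud configured on the PLC earlier, this gives 4011 µs, so a receive timer of a bit over 4 ms would be a reasonable starting point under these assumptions.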

MODBUS signal on logic analyzer

The above figure shows a MODBUS transaction. In this figure, the bus is set to 19200 baud. As we can see, the slave takes about 20 character times to respond; we measured this gap at about 12 ms. On the other hand, the gap between two characters in the same packet is about 1 to 2 bit times.

Therefore, we will need 2 watchdog timers.

CRC Compute

MODBUS RTU is a very basic data read/write protocol. It is quite easy to understand the packet format; the only challenge involved is computing the CRC checksum.

MODBUS RTU uses an initial value of 0xFFFF with a polynomial of 0xA001. AVR GCC provides a ready-to-use library for this, and there are also tons of tutorials about CRC computation available online. Consider the following C code for an AVR embedded system:


#include <avr/io.h>
#include <stdint.h>

volatile uint8_t modbus_buf[256] __attribute__ ((aligned(256)));

int main(void) {
	// Init
	modbus_buf[0xF8] = 0x02;
	modbus_buf[0xF9] = 0x03;
	modbus_buf[0xFA] = 0x00;
	modbus_buf[0xFB] = 0x10;
	modbus_buf[0xFC] = 0x00;
	modbus_buf[0xFD] = 0x03;
	modbus_buf[0xFE] = 0x04;
	modbus_buf[0xFF] = 0x3D;
	
	// Process
	uint16_t val = 0xFFFF;
	for (volatile uint8_t* ptr = modbus_buf + 0xF8; ptr < modbus_buf + 0x100; ptr++) {
		val ^= *ptr;
		for (uint8_t i = 0; i < 8; i++) {
			if (val & 0x0001) {
				val >>= 1;
				val ^= 0xA001;
			} else {
				val >>= 1;
			}
		}
		PORTD = val;
	}
}
	

First, we write a test packet into a dedicated buffer we created for MODBUS. This packet contains 6 bytes of data and 2 bytes of checksum. In the loop, we explicitly add PORTD = val; to force the compiler to output the checksum, preventing it from being optimized away. Because we have 6 bytes of data, the 16-bit value val should contain the CRC-16 checksum the 6th time we reach PORTD = val;. If this packet is error free, val should be 0 the 8th time we reach PORTD = val;.

Now, compile the code. We compile in release mode to generate an optimized program. The following is the disassembly of the program; only the CRC computation part is shown here:
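The same computation can be tried on a PC before going to the MCU. The sketch below wraps the loop into reusable functions; the names modbus_crc16, modbus_append_crc and modbus_frame_ok are made up for this example. It relies on the two properties described above: the CRC over the data bytes is what the sender appends (low byte first), and the CRC over data plus checksum comes out as 0 for an intact packet.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise CRC-16 for MODBUS RTU: initial value 0xFFFF, polynomial 0xA001. */
static uint16_t modbus_crc16(const uint8_t *data, uint16_t len) {
    uint16_t crc = 0xFFFF;
    for (uint16_t n = 0; n < len; n++) {
        crc ^= data[n];
        for (uint8_t bit = 0; bit < 8; bit++) {
            if (crc & 0x0001) {
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            } else {
                crc >>= 1;
            }
        }
    }
    return crc;
}

/* Sender side: append the CRC, low byte first, after len bytes.
 * The buffer must have room for len + 2 bytes. Returns the new length. */
static uint16_t modbus_append_crc(uint8_t *frame, uint16_t len) {
    assert(len >= 2 && len <= 254);
    uint16_t crc = modbus_crc16(frame, len);
    frame[len]     = (uint8_t)(crc & 0xFF);
    frame[len + 1] = (uint8_t)(crc >> 8);
    return (uint16_t)(len + 2);
}

/* Receiver side: running the CRC over the whole frame, checksum included,
 * leaves 0 when nothing was corrupted. */
static int modbus_frame_ok(const uint8_t *frame, uint16_t len) {
    return len >= 4 && modbus_crc16(frame, len) == 0;
}
```

For the six data bytes of the test packet, modbus_crc16 returns 0x3D04, and modbus_append_crc writes 0x04 followed by 0x3D, the same two bytes the PLC put on the bus.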


0000005D e8.ef                LDI R30,0xF8		Load immediate 
0000005E f1.e0                LDI R31,0x01		Load immediate 
0000005F 8f.ef                SER R24			Set Register 
00000060 9f.ef                SER R25			Set Register 
00000061 22.e0                LDI R18,0x02		Load immediate 
00000062 e0.30                CPI R30,0x00		Compare with immediate 
00000063 f2.07                CPC R31,R18		Compare with carry 
00000064 a0.f4                BRCC PC+0x15		Branch if carry cleared 
00000065 20.81                LDD R18,Z+0		Load indirect with displacement 
00000066 82.27                EOR R24,R18		Exclusive OR 
00000067 28.e0                LDI R18,0x08		Load immediate 
00000068 ac.01                MOVW R20,R24		Copy register pair 
00000069 56.95                LSR R21			Logical shift right 
0000006A 47.95                ROR R20			Rotate right through carry 
0000006B 80.ff                SBRS R24,0		Skip if bit in register set 
0000006C 06.c0                RJMP PC+0x0007		Relative jump 
0000006D ca.01                MOVW R24,R20		Copy register pair 
0000006E 31.e0                LDI R19,0x01		Load immediate 
0000006F 83.27                EOR R24,R19		Exclusive OR 
00000070 30.ea                LDI R19,0xA0		Load immediate 
00000071 93.27                EOR R25,R19		Exclusive OR 
00000072 01.c0                RJMP PC+0x0002		Relative jump 
00000073 ca.01                MOVW R24,R20		Copy register pair 
00000074 21.50                SUBI R18,0x01		Subtract immediate 
00000075 91.f7                BRNE PC-0x0D		Branch if not equal 
00000076 8b.b9                OUT 0x0B,R24		Out to I/O location 
00000077 31.96                ADIW R30,0x01		Add immediate to word 
00000078 e8.cf                RJMP PC-0x0017		Relative jump 
00000079 80.e0                LDI R24,0x00		Load immediate 
0000007A 90.e0                LDI R25,0x00		Load immediate 
0000007B 08.95                RET 			Subroutine return 
0000007C f8.94                CLI 			Global Interrupt Disable 
0000007D ff.cf                RJMP PC-0x0000		Relative jump 
0000007E ff.ff                NOP 			Undefined
	

Then, execute the program in the simulator. The following table shows the cycle count, the PC and the CRC value at some key moments:


CYCLE	PC	OP		val	Comments
1575	0x5B	STS 0x01FF,R24	-	Ready to compute CRC
1693	0x76	OUT 0x0B,R24	0x813E	Finished computing 1st byte
1815	0x76	OUT 0x0B,R24	0xD140	Finished computing 2nd byte
1917	0x76	OUT 0x0B,R24	0xF0D0	Finished computing 3rd byte
2014	0x76	OUT 0x0B,R24	0x50F0	Finished computing 4th byte
2116	0x76	OUT 0x0B,R24	0x4450	Finished computing 5th byte
2223	0x76	OUT 0x0B,R24	0x3D04	Finished computing 6th byte; this value is the CRC to use when sending a packet
2315	0x76	OUT 0x0B,R24	0x003D	Finished computing 7th byte
2407	0x76	OUT 0x0B,R24	0x0000	Finished computing 8th byte; this should be 0 when receiving a packet if there is no transmission error
CYCLE USED: 832 / (8 x 8) = 13 cycles/bit
	

NOT ELEGANT!

Since MODBUS was developed in 1979, we should write our program using the old school way as people did in 1979; that is, write assembly:


.dseg
.org	0x0200
MODBUS_RTU_BUFFER:	.byte	0x100

.cseg
.org	0x0000

.macro	lsr16 ; high-byte, low-byte
	lsr	@0
	ror	@1
.endmacro
.macro	eor16 ; dest-high-byte, dest-low-byte, src-high-byte, src-low-byte
	eor	@0, @2
	eor	@1, @3
.endmacro

.org	0x34					; ATmega328
reset:
	ldi	r16, 0x02
	sts	0x02F8, r16
	ldi	r16, 0x03
	sts	0x02F9, r16
	ldi	r16, 0x00
	sts	0x02FA, r16
	ldi	r16, 0x10
	sts	0x02FB, r16
	ldi	r16, 0x00
	sts	0x02FC, r16
	ldi	r16, 0x03
	sts	0x02FD, r16
	ldi	r16, 0x04
	sts	0x02FE, r16
	ldi	r16, 0x3D
	sts	0x02FF, r16

eth_check:
	ldi	r19, 0xA0			; R19:R18 = CRC 0xA001
	ldi	r18, 0x01
	ser	r17				; R17:R16 = Init value 0xFFFF
	ser	r16
	ldi	r31, high(MODBUS_RTU_BUFFER)
	ldi	r30, 0xF8
	eth_check_crc:
	ld	r0, z+
	eor	r16, r0
	lsr16	r17,r16				; crc = crc >> 1;
	brcc	eth_check_crc_b0		; crc ^= (crc & 1) ? 0xA001 : 0x0000 (the old bit 0 is now in carry)
	eor16	r17,r16, r19,r18
	eth_check_crc_b0:
	lsr16	r17,r16
	brcc	eth_check_crc_b1
	eor16	r17,r16, r19,r18
	eth_check_crc_b1:
	lsr16	r17,r16
	brcc	eth_check_crc_b2
	eor16	r17,r16, r19,r18
	eth_check_crc_b2:
	lsr16	r17,r16
	brcc	eth_check_crc_b3
	eor16	r17,r16, r19,r18
	eth_check_crc_b3:
	lsr16	r17,r16
	brcc	eth_check_crc_b4
	eor16	r17,r16, r19,r18
	eth_check_crc_b4:
	lsr16	r17,r16
	brcc	eth_check_crc_b5
	eor16	r17,r16, r19,r18
	eth_check_crc_b5:
	lsr16	r17,r16
	brcc	eth_check_crc_b6
	eor16	r17,r16, r19,r18
	eth_check_crc_b6:
	lsr16	r17,r16
	brcc	eth_check_crc_b7
	eor16	r17,r16, r19,r18
	eth_check_crc_b7:
	tst	r30
	brne	eth_check_crc

	rjmp	eth_check
	

We made the following optimizations in the assembly code:

  1. Unroll the inner loop to reduce loop control overhead.

  2. Use a 256-byte aligned 256-byte block for the buffer. When writing into the buffer, make sure the last byte of the packet is aligned to the end of the buffer (at address 0xNFF). In this way, we can determine the end of the packet by simply comparing the low byte of the pointer against zero (the fastest comparison).

  3. Instead of looking at the LSB of the byte to determine whether to XOR the polynomial after shifting, we shift first, then check the carry flag to determine whether to skip the XOR step. On AVR, shifting writes the shifted-out bit into the carry flag, and there are carry-branch opcodes.

  4. Use registers in the correct way. (AVR GCC has a set of rules about register usage; but they are not smart)
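Optimization 3 can be checked on a PC by writing both orderings in C and comparing them. This is only a model: C has no carry flag, so the second function saves the bit that LSR/ROR would push into carry before shifting. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ordering used by the C version: test bit 0 first, then shift. */
static uint16_t crc_step_test_then_shift(uint16_t crc, uint8_t byte) {
    crc ^= byte;
    for (uint8_t bit = 0; bit < 8; bit++) {
        if (crc & 0x0001) {
            crc = (uint16_t)((crc >> 1) ^ 0xA001);
        } else {
            crc >>= 1;
        }
    }
    return crc;
}

/* Ordering used by the assembly version: shift first, then look at the
 * bit that fell out (the carry flag on AVR). */
static uint16_t crc_step_shift_then_carry(uint16_t crc, uint8_t byte) {
    crc ^= byte;
    for (uint8_t bit = 0; bit < 8; bit++) {
        uint8_t carry = (uint8_t)(crc & 0x0001); /* what LSR/ROR leaves in carry */
        crc >>= 1;
        if (carry) crc ^= 0xA001;                /* BRCC skips this when carry is clear */
    }
    return crc;
}

/* Sanity helper: both orderings must agree for every possible input byte. */
static void assert_orderings_agree(uint16_t crc) {
    for (unsigned b = 0; b < 256; b++) {
        assert(crc_step_test_then_shift(crc, (uint8_t)b) ==
               crc_step_shift_then_carry(crc, (uint8_t)b));
    }
}
```

Both functions produce identical results; the second form just maps more directly onto the LSR, ROR, BRCC sequence in the listing above.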

Now, let's assemble the code:


0000004C 30.ea                LDI R19,0xA0		Load immediate 
0000004D 21.e0                LDI R18,0x01		Load immediate 
0000004E 1f.ef                SER R17			Set Register 
0000004F 0f.ef                SER R16			Set Register 
00000050 f2.e0                LDI R31,0x02		Load immediate 
00000051 e8.ef                LDI R30,0xF8		Load immediate 

00000052 01.90                LD R0,Z+			Load indirect and postincrement 
00000053 00.25                EOR R16,R0		Exclusive OR 
00000054 16.95                LSR R17			Logical shift right 
00000055 07.95                ROR R16			Rotate right through carry 
00000056 10.f4                BRCC PC+0x03		Branch if carry cleared 
00000057 13.27                EOR R17,R19		Exclusive OR 
00000058 02.27                EOR R16,R18		Exclusive OR 
00000059 16.95                LSR R17			Logical shift right 
0000005A 07.95                ROR R16			Rotate right through carry 
0000005B 10.f4                BRCC PC+0x03		Branch if carry cleared 
0000005C 13.27                EOR R17,R19		Exclusive OR 
0000005D 02.27                EOR R16,R18		Exclusive OR 
0000005E 16.95                LSR R17			Logical shift right 
0000005F 07.95                ROR R16			Rotate right through carry 
00000060 10.f4                BRCC PC+0x03		Branch if carry cleared 
00000061 13.27                EOR R17,R19		Exclusive OR 
00000062 02.27                EOR R16,R18		Exclusive OR 
00000063 16.95                LSR R17			Logical shift right 
00000064 07.95                ROR R16			Rotate right through carry 
00000065 10.f4                BRCC PC+0x03		Branch if carry cleared 
00000066 13.27                EOR R17,R19		Exclusive OR 
00000067 02.27                EOR R16,R18		Exclusive OR 
00000068 16.95                LSR R17			Logical shift right 
00000069 07.95                ROR R16			Rotate right through carry 
0000006A 10.f4                BRCC PC+0x03		Branch if carry cleared 
0000006B 13.27                EOR R17,R19		Exclusive OR 
0000006C 02.27                EOR R16,R18		Exclusive OR 
0000006D 16.95                LSR R17			Logical shift right 
0000006E 07.95                ROR R16			Rotate right through carry 
0000006F 10.f4                BRCC PC+0x03		Branch if carry cleared 
00000070 13.27                EOR R17,R19		Exclusive OR 
00000071 02.27                EOR R16,R18		Exclusive OR 
00000072 16.95                LSR R17			Logical shift right 
00000073 07.95                ROR R16			Rotate right through carry 
00000074 10.f4                BRCC PC+0x03		Branch if carry cleared 
00000075 13.27                EOR R17,R19		Exclusive OR 
00000076 02.27                EOR R16,R18		Exclusive OR 
00000077 16.95                LSR R17			Logical shift right 
00000078 07.95                ROR R16			Rotate right through carry 
00000079 10.f4                BRCC PC+0x03		Branch if carry cleared 
0000007A 13.27                EOR R17,R19		Exclusive OR 
0000007B 02.27                EOR R16,R18		Exclusive OR 
0000007C ee.23                TST R30			Test for Zero or Minus 
0000007D a1.f6                BRNE PC-0x2B		Branch if not equal 
0000007E cd.cf                RJMP PC-0x0032		Relative jump 
0000007F ff.ff                NOP 			Undefined 
	

Then, execute the program in the simulator. The following table shows the cycle count, the PC and the CRC value at some key moments:


CYCLE	PC	OP		val	Comments
31	0x51	LDI R30,0xF8	0xFFFF	Ready to compute CRC
73	0x7D	BRNE PC-0x2B	0x813E	Finished computing 1st byte
117	0x7D	BRNE PC-0x2B	0xD140	Finished computing 2nd byte
157	0x7D	BRNE PC-0x2B	0xF0D0	Finished computing 3rd byte
196	0x7D	BRNE PC-0x2B	0x50F0	Finished computing 4th byte
236	0x7D	BRNE PC-0x2B	0x4450	Finished computing 5th byte
277	0x7D	BRNE PC-0x2B	0x3D04	Finished computing 6th byte; this value is the CRC to use when sending a packet
315	0x7D	BRNE PC-0x2B	0x003D	Finished computing 7th byte
353	0x7D	BRNE PC-0x2B	0x0000	Finished computing 8th byte; this should be 0 when receiving a packet if there is no transmission error
CYCLE USED: 322 / (8 x 8) = 5.03 cycles/bit
	

That more than doubles the performance.

ELEGANT! ELEGANT!!!