Lynn's Industrial Automation Protocol Tips

The information on this web page is either Lynn's opinion or something Lynn agrees with. He has complete control over its contents. Nothing contained within these pages is approved by or represents the opinion of Digi International, The Modbus Organization, the Modbus-IDA Group, ODVA, or any other organization Lynn is associated with.

Modbus Bridge Tips

Here is my collection of tips for designing your MB/TCP bridge.

Question? How many TCP sockets should I support?

Answer: As many as you can!

Many MB/TCP clients will attempt to open one TCP socket for every slave they wish to poll. This means if you have 32 slaves on an RS-485 multi-drop they'll try to open 32 TCP sockets! So only supporting 4 or 8 TCP sockets means you can only support 4 or 8 slaves on your RS-485.

Plus many network hiccups can cause a client to uncleanly drop its previous TCP sockets and attempt to open another set. So if you only support 8 sockets and a client drops 8 sockets and tries to open 8 MORE, very likely the client will be locked out for a few minutes until your MB/TCP bridge can detect the drop and free up your limited resources.

So bottom line - if you support RS-485 you should try to support a minimum of 32 MB/TCP sockets and 64 is better still.
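To make the one-socket-per-slave behavior concrete, here is a minimal sketch (names are illustrative, and the listener is simplified to just accepting connections) of a client opening one TCP socket per polled slave against a bridge listener that keeps every inbound socket instead of a small fixed pool:

```python
# Sketch: a typical MB/TCP client opens one TCP socket per polled slave,
# so a 32-slave RS-485 drop means 32 inbound connections arriving at once.
import socket
import threading

def bridge_listener(srv, accepted, stop):
    """Accept every inbound connection; do not cap at a small fixed pool."""
    srv.settimeout(0.1)
    while not stop.is_set():
        try:
            conn, _ = srv.accept()
            accepted.append(conn)   # keep the socket; OS limits are the ceiling
        except socket.timeout:
            continue

def client_open_per_slave(host, port, slave_count):
    """Mimic a common MB/TCP client: one socket for each slave it polls."""
    return [socket.create_connection((host, port)) for _ in range(slave_count)]
```

A deep `listen()` backlog matters too: after a network hiccup, a client's reconnect burst arrives all at once, on top of sockets the bridge has not yet detected as dead.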


Question? If my serial slave didn't answer, how does the bridge respond?

Answer: You must return Exception 11 (hex 0B)

Serial Modbus has always used no response to indicate a slave being off-line or a CRC error. But TCP/IP is a reliable medium - once your bridge accepts the connection, the remote client will ASSUME you'll answer all requests. Since the socket is open, you obviously are NOT off-line. Since TCP/IP uses internal retries, there obviously won't be a CRC error. Several common MB/TCP clients behave very badly if you return nothing. So return exception 11 instead.
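The exception response itself is small: echo the request's function code with the high bit set, followed by exception code 0x0B. A minimal sketch of building the full MB/TCP frame (the MBAP length field counts the unit id plus the PDU):

```python
# Sketch: build a Modbus/TCP exception response for a request
# the serial slave never answered.
import struct

GATEWAY_TARGET_FAILED = 0x0B   # exception 11

def exception_response(transaction_id, unit_id, function_code,
                       exc_code=GATEWAY_TARGET_FAILED):
    """Return a complete MB/TCP ADU carrying an exception PDU."""
    # Exception PDU: function code with bit 0x80 set, then the exception code
    pdu = struct.pack("BB", function_code | 0x80, exc_code)
    # MBAP header: transaction id, protocol id (always 0),
    # length = 1 (unit id) + len(PDU), unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, 1 + len(pdu), unit_id)
    return mbap + pdu
```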


Question? Must my bridge support exception 10 and 11?

Answer: Yes, by default. But it's a bit more complex than that.

While most MB/TCP clients run better with these exceptions, some clients may be using MB/RTU logic and will not understand these two newer exceptions. It may even be that the remote client is just a MB/RTU master behind a bridge. In fact, a few DCS gateways that include support for redundant channels will NOT properly failover when they see any exception response. They reason "Well, there was an answer, so the slave must just be busy". So this means if your bridge has a serial Master attached, you need to protect it from exception 10/11 unless you know the serial Master is designed to handle them.

So it's important to offer the following:

  • By default: always return exception 10 and 11 to remote MB/TCP clients.
  • But allow the user to disable these responses if they don't want them.
  • By default: Always filter out exception 10 and 11 when talking to a serial Modbus Master.
  • But allow the user to enable these responses if they desire them.
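The four rules above reduce to one decision function. A minimal sketch, assuming hypothetical flag names (`tcp_exceptions_enabled`, `serial_exceptions_enabled`) whose defaults match the bullets:

```python
# Sketch: should a gateway exception (10 or 11) be forwarded, or filtered?
GATEWAY_PATH_UNAVAILABLE = 0x0A   # exception 10
GATEWAY_TARGET_FAILED = 0x0B      # exception 11

def forward_exception(exc_code, to_serial_master,
                      tcp_exceptions_enabled=True,
                      serial_exceptions_enabled=False):
    """Defaults follow the rules above: always return 10/11 to remote
    MB/TCP clients; always filter them toward a serial Modbus Master."""
    if exc_code not in (GATEWAY_PATH_UNAVAILABLE, GATEWAY_TARGET_FAILED):
        return True                        # ordinary exceptions always pass
    if to_serial_master:
        return serial_exceptions_enabled   # default False: suppress (no response)
    return tcp_exceptions_enabled          # default True: forward
```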


Question? What is network overload or run-away queue build-up?

Answer: It's when network clients continue to send more requests than your bridge can answer.

For example, your bridge is receiving 8 requests every second and is only able to answer 6 of those, leaving a net build-up of 2 requests per second - or 120 per minute! Since TCP is reliable and includes flow control, the TCP socket will be able to hold thousands of these stale requests. The primary symptoms of network overload are:

  • Data writes take longer and longer to impact the slave,
  • the MB/TCP client marks ALL slaves as being off-line even though LED activity shows there is communication,
  • the bridge continues to poll the serial line (LED activity) after the MB/TCP clients have stopped issuing new requests,
  • a power cycle of the bridge fixes this (ie: it flushes the TCP socket).
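The build-up arithmetic above can be checked with a few lines. A minimal sketch of the net queue growth when arrivals outpace service:

```python
# Sketch: net queue growth when requests arrive faster than they are answered.
def queue_depth_after(seconds, arrivals_per_s=8, serviced_per_s=6):
    """Simulate one-second ticks: requests arrive, the bridge answers
    what it can, and the remainder accumulates in the TCP queue."""
    backlog = 0
    for _ in range(seconds):
        backlog += arrivals_per_s
        backlog -= min(backlog, serviced_per_s)
    return backlog
```

With the example's numbers (8 in, 6 answered), the backlog grows by 2 per second: 120 stale requests after one minute, and thousands within an hour, all held happily by TCP flow control.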

A common network overload (meltdown) scenario

A MB/TCP client/master has been happily issuing 8 requests each 1 second for months. It "pipe-lines" these requests to a bridge with 8 RS-485 slaves, meaning it sends them all at once. If it takes an average of 100 msec for your bridge to obtain the RS-485 slave response, then the client will see all 8 responses every second. In order for the meltdown to occur, a number of 'design assumptions' must have been built into both the MB/TCP client and your bridge. I'll highlight these assumptions after the example, but unfortunately these 'design assumptions' are extremely likely to be true.

diagram showing queue build-up

Now the melt-down starts. Assume the MB/TCP client has a Modbus/RTU mentality and just blindly retries, after a 1-second timeout, any polls that went unanswered last cycle. There are two very common causes for the network overload. One is that a second MB/TCP client comes in and also asks 8 requests per second. Since our serial line with an average response time of 100 msec has a maximum bandwidth of 10 transactions per second, meltdown now starts because the bridge is seeing 16 requests per second.

A second, more common but stealthy cause is when 1 of the 8 slaves goes off-line. Say the bridge has a slave timeout of 500 msec, thus the time to issue 8 transaction polls and receive a response or timeout is now 1200 msec (7 * 100 msec + 1 * 500 msec). So in this example the bridge will only be able to answer an average of 6 of the 8 requests every second. So the MB/TCP client times out and aborts the 2 of 8 requests not returned, and promptly issues 8 more. Of course, your bridge does NOT understand that the 2 remaining polls are no longer desired, so it handles them in the next 200 msec of time and returns responses with unexpected MB/TCP sequence numbers - which the MB/TCP client silently discards.
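The cycle-time arithmetic above generalizes to any mix of live and dead slaves. A small sketch of the calculation:

```python
# Sketch: poll-cycle time and effective throughput on a serial drop
# where some slaves answer and some time out.
def cycle_time_ms(online_slaves, offline_slaves,
                  response_ms=100, slave_timeout_ms=500):
    """Time for one full poll cycle: live slaves answer in response_ms,
    dead slaves each burn a full slave timeout."""
    return online_slaves * response_ms + offline_slaves * slave_timeout_ms

def answered_per_second(requests_per_cycle, cycle_ms):
    """Average requests the bridge can answer per second at this cycle time."""
    return requests_per_cycle * 1000.0 / cycle_ms
```

With all 8 slaves on-line the cycle is 800 msec and the bridge keeps up; one dead slave stretches it to 1200 msec, dropping throughput below the 8 requests per second arriving, and the meltdown begins.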

diagram showing queue build-up

After just 4 seconds, the network overload (melt-down) is in full force. Your bridge is happily answering 6 requests per second; it notices neither the steady queue build-up nor that all 6 of the requests it is answering are stale and unexpected by the MB/TCP client. The MB/TCP client receives the 6 responses, sees the MB/TCP header sequence numbers are not expected, and discards the responses. So even though your bridge is still answering 6 of 8 requests per second, the MB/TCP client is seeing responses to 0 of the 8 new requests it sends each second.

If the MB/TCP client fails to implement the MB/TCP header sequence number (a sadly too common wimp-out), then reads will appear almost normal. The client will be mismatching old, stale responses to new requests, plus you'll have 1 or 2 "timeouts" per cycle that appear to move from slave to slave. But writes will appear to take longer and longer as they wait in the building TCP queue longer and longer.


Question? How do I avoid network overload or run-away queue build-up?

Answer: Depends on if you are the client or server ...

Likely you only control one-half of the system here. Either you are supplying the bridge or the TCP client/Master, such as an OPC server. If you are the MB/TCP server, then:

  • Do not take the easy design path and leave all pipelined requests buffered in TCP. When you do this, your bridge can never detect that a "new" request seen is really very old.
  • Instead, ideally dequeue every MB/TCP request immediately, time-stamp them, and place them into an internal queue. The server can then use a client (Master) timeout to detect when requests are too old.
  • Implement the exception response 11 (hex 0x0B) to inform the MB/TCP client that the request either received no slave response (a true timeout) or cannot be answered because of congestion.
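One way to implement the time-stamped internal queue is sketched below, assuming a hypothetical `RequestQueue` class and a 1-second client timeout; the `now` parameter exists only to make the logic testable without waiting:

```python
# Sketch: dequeue MB/TCP requests immediately, time-stamp them, and
# drop anything older than the client timeout instead of answering it.
import time
from collections import deque

class RequestQueue:
    def __init__(self, client_timeout_s=1.0):
        self.timeout = client_timeout_s
        self.q = deque()

    def put(self, request, now=None):
        stamp = now if now is not None else time.monotonic()
        self.q.append((stamp, request))

    def next_fresh(self, now=None):
        """Return the next request still worth answering, or None."""
        now = now if now is not None else time.monotonic()
        while self.q:
            stamp, req = self.q.popleft()
            if now - stamp <= self.timeout:
                return req
            # Too old: the client has already timed out and retried, so
            # answering would only produce a stale, discarded response.
        return None
```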

If you are the MB/TCP client, then:

  • SUPER CRITICAL! Use the MB/TCP header field "sequence number". If you blindly set this to one (1) in every message, you'll never detect that you are receiving old/stale responses. This can lead to serious control glitches!
  • Do not silently discard old/stale MB/TCP responses. Instead, use them as feedback that you are polling too fast. Find some way to notify the user that slave response is slower than expected, and start polling slower. This is a radically different design than most OPC servers use! Polling slower is not likely what the user expects, but it is better than causing active slaves to appear off-line and out of control!
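A sketch of the client-side sequence-number check, with hypothetical names. Each request gets a unique transaction id; a response whose id is not outstanding is counted as stale feedback rather than silently ignored or, worse, mismatched to a new poll:

```python
# Sketch: track outstanding MB/TCP transaction ids so stale responses
# are detected and counted instead of corrupting fresh data.
class TcpClientTracker:
    def __init__(self):
        self.next_tid = 0
        self.outstanding = set()
        self.stale_seen = 0      # feedback counter: polling too fast?

    def send(self):
        """Allocate a unique transaction id for a new request."""
        self.next_tid = (self.next_tid + 1) & 0xFFFF
        self.outstanding.add(self.next_tid)
        return self.next_tid

    def on_response(self, tid):
        """True if this response matches a live request; False if stale."""
        if tid in self.outstanding:
            self.outstanding.discard(tid)
            return True
        self.stale_seen += 1     # signal to the poll scheduler: slow down
        return False

    def on_timeout(self, tid):
        """Give up on a transaction; any later reply to it is stale."""
        self.outstanding.discard(tid)
```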
