Lynn's Industrial Protocols over IP Discussing my adventures in "IP-Enabling" commercial devices that don't always seem to appreciate being liberated from serial or Ethernet. I have been Ethernet-enabling serial devices for over a decade. However, my current vocation involves putting Ethernet devices up on cellular (as in cell-phone) machine-to-machine data plans. tag:blogger.com,1999:blog-35259639 2006-12-07T17:45:58Z Blogger
true Lynn August Linse 2006-12-07T11:00:00-06:00 2006-12-07T17:45:58Z 2006-12-07T17:30:28Z tag:blogger.com,1999:blog-35259639.post-116551262871190759 Rockwell Protocol Documents
A friend just pointed out this public web page to me: How to Communicate with Rockwell Automation Products. While I have seen many of these documents before, a few of them were new to me. It includes information on:
  • How to talk to ControlLogix tag data via Ethernet/IP
  • How to understand ControlLogix data structure packing when read raw
  • The DF1 serial protocol specification
  • How to encapsulate CIP over DF1 (i.e., talk to the serial port of a Compact/ControlLogix)
  • How to use Ethernet/IP explicit messaging to ControlLogix
  • How to use Ethernet/IP I/O messaging with ControlLogix

In addition, I see www.ab.com has added a new DF1 supplement to its Knowledge Base. Since you have to log in, giving you a direct link is pointless, but it is called "DF1 Protocol supplement 17706516". It compares PLC5 and SLC5 communications, covers some useful commands such as 0x0AB "Protected Typed Logical Write with Mask" to write individual bits, and describes new data file types not covered in the latest (1996) version of the DF1 specification.

While we are discussing new Rockwell protocol information, you should also be aware of the new "DF1 Radio Modem" protocol. I don't think there is a formal specification, but you can find a file in the http://www.ab.com/ Knowledge Base that describes the simple differences between it and DF1 Full-Duplex. In summary, DF1 Radio Modem *IS* DF1 Full-Duplex without the protocol ACK/NAK. It is designed for use in radio systems where powering up slave modems just to ACK something they will respond to anyway only slows down overall polling. I'm also finding it ideal for cellular IP networks, since it literally cuts your data usage by 50 to 60% to NOT be moving 2-byte DF1 ACKs within TCP/IP, which also adds its own TCP acknowledgements. Since DF1 includes a TNS or transaction number, there is no problem with mishandling lost or delayed messages.

The main catch today is that I think neither RSLinx nor ControlLogix supports DF1 Radio Modem - it is mainly a MicroLogix and SLC5 family resource. However, the next release of the Digi One IAP will include the ability to bridge to DF1 Radio Modem from Ethernet/IP, CSPv4 (AB/Ethernet) and DF1 Full-Duplex.

false
Lynn August Linse 2006-12-06T08:59:00-06:00 2006-12-07T17:45:16Z 2006-12-06T15:16:44Z tag:blogger.com,1999:blog-35259639.post-116541820473804629 Cellular IP-Friendly Apps - It Costs to Talk
Summary: All communication must be "under control". All data sent into the cellular system costs money; even if the remote cellular device is powered off, the customer still pays for data sent to it.

As a follow-on to the discussion of Retrying TCP Socket Opens, applications must allow the user to both understand and limit all aspects of protocol usage and retry. Users must be allowed to limit and predict a reasonable worst-case traffic cost. For example, some protocols include large blocks of initial connection negotiation, which means talking once per minute over a continuously open socket could result in much less cost than talking once per 10 minutes over a socket opened just for one transaction. I have seen applications that allow users to set a maximum desired retry setting - then not always follow that setting and do retries anyway in certain fault conditions.

Recommendation: application-writers must step back and examine every place within the application they create traffic and confirm users have the ability to limit the traffic created.

Example and Numbers: now most of you will be saying "Yah, dahh - so obvious why is this even mentioned?". Well, I'll give you an all too typical example of how this affects real customers. A customer (call him Joe) running a pilot on cellular data access calls to complain his costs are higher than expected. He says he's just polling 3 Modbus registers every 5 minutes. Being no dummy, Joe has already calculated that each request should be 12 bytes of data (one Modbus/TCP function 3 read) and each response should be 17 bytes of data (one Modbus/TCP response with 4 registers, since he is reading 4x00003, 4x00004 and 4x00006 so one assumes 4x00005 comes along for the ride). One poll every 5 minutes works out to 8640 polls per 30 days, so he had hoped to see only about a quarter-megabyte of traffic a month. Yet Joe was seeing data bills for 6 to 10 MB of traffic a month. This means his $20 per month 5MB plan was costing him closer to $60 per month with data overages.

First, Joe overlooked the fact that he has to pay for not only his Modbus data, but also the TCP and IP overhead used to move it. Standard Windows-generated TCP headers are 20 bytes and so are the IP headers. Linux tends to default to using TCP time-stamps and thus creates 28-byte TCP headers. So each request is NOT 12 bytes, but 52-60 bytes ... plus the TCP Acknowledge frame will add an additional 40-48 bytes. Yes, YOU pay for the TCP Acknowledgements as well! With headers and TCP Acknowledgement, his responses will be 97-113 bytes, not 17. So right off the bat, I can see that he has been under-estimating his monthly traffic. Since he is using Windows, he should be seeing at least 1.6MB of traffic and never 0.25MB.
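The arithmetic above can be sketched in a few lines of Python. This is a simplified model, not a measurement: it assumes one segment per request, one per response, and a separate bare TCP acknowledgement for each, and the function names are made up for illustration. The UDP variant (8-byte UDP header, no acknowledgements) corresponds to the UDP/IP option mentioned in the lessons further down.

```python
POLLS_PER_MONTH = 12 * 24 * 30  # one poll per 5 minutes for 30 days = 8640


def tcp_poll_bytes(request, response, tcp_header=20, ip_header=20):
    """Bytes billed for one poll: request + ACK + response + ACK."""
    overhead = tcp_header + ip_header
    return (request + overhead) + overhead + (response + overhead) + overhead


def udp_poll_bytes(request, response, udp_header=8, ip_header=20):
    """Bytes billed for one poll over UDP: no acknowledgement frames."""
    overhead = udp_header + ip_header
    return (request + overhead) + (response + overhead)


raw = (12 + 17) * POLLS_PER_MONTH                                # 250,560
windows_tcp = tcp_poll_bytes(12, 17) * POLLS_PER_MONTH           # 1,632,960
linux_tcp = tcp_poll_bytes(12, 17, tcp_header=28) * POLLS_PER_MONTH  # 1,909,440
udp = udp_poll_bytes(12, 17) * POLLS_PER_MONTH                   # 734,400

for name, total in [("raw Modbus", raw), ("Windows TCP", windows_tcp),
                    ("Linux TCP", linux_tcp), ("UDP", udp)]:
    print(f"{name:12s} {total / 1_000_000:.2f} MB per month")
```

In this model the Windows case lands on the 1.6MB figure quoted above, more than six times the raw Modbus byte count.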

So I visit Joe and do a network trace of his OPC server traffic. We see that OPC is issuing 3 Modbus polls every 5 minutes - not 1. Hmmm, of course Joe's first reaction is "Heck no - I'm not polling 3 - just 1" but the proof is there as colored pixels on my notebook display. We decode the polls and see the OPC server is polling 3 blocks of 32 registers each. After decoding the Modbus/TCP bytes we learn the exact registers being polled and Joe eventually discovers why these are being polled:

  • One block of 32 registers is fetching his 3 desired values of 4x00003, 4x00004 and 4x00006. Reading the fine print in the OPC manual we see that the OPC server decided this was a "scattered poll" of 2 separate memory areas so it bumped the size up to 32 registers. So just for this one poll, his monthly budget is up to 2.8MB instead of 0.25MB.
  • A second block was caused by Joe programming an HMI display to pop up if a certain alarm condition were true in the field. This was a demo he'd done to impress a customer, but Joe hadn't thought to disable it nor had he realized the exact "cost" of such a feature. So the OPC server needs a single register off somewhere else in the PLC memory to satisfy the HMI's alarm/event function. We don't know why this is polled as 32 registers instead of 1 - it is not a "scattered poll" as defined by the OPC vendor's documentation. Perhaps his HMI or OPC server software has a bug in it. Since this is Modbus/TCP (not serial) it is unlikely anyone else has noticed or cared that the application is moving 62 bytes of extra data in every poll. After all, Ethernet is fast and costs nothing to use. It is possible the programmers at the OPC vendor just decided there was no reason to ever poll less than 32 registers when using fast, free "Ethernet".
  • The final block was being caused by Joe's boss leaving open an HMI display in another room that wasn't supposed to be left open - human error (or is it?). Joe learned instantly how important it was for him to properly configure the HMI display settings which timed out displays - either closing the window or just stopping the supporting data polls. He had done that for the normal "user display", but had been lazy and not put such settings into the various diagnostic displays users weren't expected to use!
So now his 6 to 10MB of traffic a month begins to make sense. Each distinct poll is creating nearly 3MB of traffic per month, and his traffic is influenced by which HMI displays users open. Multiply 3MB by 3 polls and you get roughly 9MB per month.

What has been learned here?

  • With the overhead of TCP/IP, Joe learned that he had to pay for over 4 times more traffic than his raw Modbus byte calculations had led him to believe.
  • Joe learned that he should be looking at using UDP/IP instead of TCP/IP for his Modbus/TCP since this would cut 40-60% off his bill instantly. Modbus doesn't really require the TCP Acknowledgement and my own tests of UDP/IP over cellular shows it to be about 99.99% reliable - or put another way, I only see about 1 packet lost per 10,000 sent.
  • Joe learned how to review his OPC server's data statistics page. His OPC server had been (indirectly) giving him the answer as to why his data usage was so high. While his OPC server never totaled up the data bytes to include TCP/IP overhead, it was able to show him the 36 polls per hour he was moving instead of his expected 12 (one per 5 minutes).
  • Joe learned that perhaps he needs to look for a new OPC supplier, since his present vendor just doesn't seem to see the big picture of IP-enabled protocols; that Ethernet is not the only media using TCP/IP. Increasingly people expect TCP/IP to move through diverse media which is not always "fast and free" like Ethernet. Joe's present OPC supplier didn't give him the ability to reduce the poll block size below 32 registers when the OPC system thought "Ethernet" was being used.
  • Joe learned he had to be more aggressive in his HMI display design. He couldn't assume users would only look at certain displays and not leave open displays unexpectedly. Joe needed to actively set every possible display to automatically close or stop generating new polls. In fact, after review he discovered that most of his displays had no need for "real-time" update and he could just set them to display the data once as read without any refresh. Users always had the option to manually redisplay the page.
  • Joe learned that maybe just reading data from the RTU program directly was not such a wise idea. His RTU had the ability to copy and repack data into special polling areas to eliminate "scattered polls". In fact, in the above example we traced at Joe's site, all of the data in those 3 polls could have easily fit within a single 13 register block. So Joe is reviewing his RTU program design to repack ALL data of interest - even data supporting rare HMI displays - into a dedicated memory area. While Joe had previously hoped to avoid this work, he now sees the potential dollar saving or cost penalty his company could face if he avoided this work.
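To make the "scattered poll" idea concrete, here is a minimal Python sketch of how a master could merge the registers it needs into contiguous read blocks. It is not how Joe's OPC server works (that logic is unknown); `plan_blocks` and its `max_gap` parameter are hypothetical names used only for illustration.

```python
def plan_blocks(registers, max_gap=8):
    """Merge register numbers into contiguous read blocks.

    Registers no more than max_gap apart are read as one block, so the
    registers in between "come along for the ride".
    Returns a list of (first_register, count) tuples.
    """
    blocks = []
    for reg in sorted(set(registers)):
        if blocks and reg - blocks[-1][1] <= max_gap:
            blocks[-1][1] = reg            # extend the current block
        else:
            blocks.append([reg, reg])      # start a new block
    return [(first, last - first + 1) for first, last in blocks]


# Joe's three values: one block of 4 registers starting at 3 (5 rides along).
print(plan_blocks([3, 4, 6]))          # [(3, 4)]
# A value far away in memory forces a second poll.
print(plan_blocks([3, 4, 6, 500]))     # [(3, 4), (500, 1)]
```

Repacking data inside the RTU, as Joe is now planning, attacks the same problem from the other end: if every value of interest sits in one small area, any sensible block planner produces a single poll.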

So really in summary I have to say: your data polling needs to be UNDER CONTROL, as in being controlled. You need both the tools and the investment in effort to define as exactly as possible each and every data poll.

false
Lynn August Linse 2006-11-17T16:29:00-06:00 2006-11-22T15:15:16Z 2006-11-17T23:17:37Z tag:blogger.com,1999:blog-35259639.post-116380545798335422 TCP/IP Encapsulation Limited by Distance?
Summary: serial tunneling or TCP encapsulation is NOT directly affected by distance. However, it is affected by "hops" or how many routers and segments it goes through. To reduce the effect "hops" have on your TCP/IP Encapsulation, you need to set the correct options in your Digi Device Server. These are NOT standard defaults since what works best for Wide-Area-Network is not best for local direct Ethernet links.

What is serial TCP/IP encapsulation? It is also called serial bridging or serial tunneling. Think of it as the modern IP equivalent to old short-haul modems or leased line modems. At each end you have a "modem", you connect a serial cable to each, and you create a virtual serial link running from end to end.
Diagram showing serial tunnel

During a webinar I gave, a traffic-industry user asked if serial TCP encapsulation is limited by distance. If he serial bridges between 2 intersections of a road, things work fine. However, if he tries the same serial bridge between an intersection and the home office, then serial bridging does NOT work. So he was wondering "what is the distance limit for serial bridging or encapsulation?"

The simple answer is "There is no distance limitation". However, the longer the distance you move your serial encapsulation, the more routers and network segments (hops) your traffic moves through. The more hops your traffic moves through, the more variable latency (or delay or jitter) is introduced between consecutive network packets. This affects the timing and different protocols and software implementations react differently to it.

Let me just throw some hypothetical numbers together. Let's say the device sends data as a block of 100 bytes; the receiving device will loop and collect this data. Of course the receiving device cannot just wait for 100 bytes - what if the line breaks? It could sit there forever waiting for the end of the message. So various timers are coded to enable the receiving device to understand failure and abort receiving. Let's say the receiving device waits at most 20 milliseconds for the next byte. On a direct serial link this is very common - once the sending device starts sending data it is very unlikely that even a 2 or 3 millisecond gap will appear between bytes.
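The hypothetical receiver described above can be simulated in Python. This is only a sketch of the timing rule (give up when the gap between consecutive bytes exceeds the timeout); `bytes_received` is an invented name, and the arrival times are made-up illustration data.

```python
def bytes_received(arrival_ms, gap_timeout_ms=20):
    """Count bytes accepted before an inter-byte gap exceeds the timeout."""
    count = 0
    previous = None
    for t in arrival_ms:
        if previous is not None and t - previous > gap_timeout_ms:
            break  # receiver assumes the message ended or the line broke
        count += 1
        previous = t
    return count


# 100 bytes arriving 1 ms apart, as on a direct serial link.
steady = list(range(100))
# The same 100 bytes, but with a single 50 ms stall after the 25th byte.
stalled = list(range(25)) + [24 + 50 + i for i in range(75)]

print(bytes_received(steady))    # 100
print(bytes_received(stalled))   # 25
```

The second case is the failure mode that matters here: the data is all delivered eventually, but the receiver has already given up after the first 25 bytes.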

Enter serial encapsulation - either by radio or Ethernet or any IP-based media. All of these technologies are packet-based and most include some form of error-retry. So now the serial bytes collect in a buffer up to some point, then a chunk of them move together as one packet. Ideally, the full message moves as a single packet. However, if the message becomes split between 2 or more packets it is possible a gap will be detected by the receiving device. So for illustration we'll say the 100 bytes is split into 4 x 25-byte packets. On a single-hop network, there is much less opportunity for timing delays to open gaps in the final serial data. This diagram shows a small gap between the 25th and 26th bytes:
Diagram shows less variability with single IP hop

But running the serial encapsulation through many network hops greatly increases the accumulated delays added to each packet. So each hop has the opportunity to increase the latency and lag. This diagram shows the much larger gaps that may occur when the packets recreate the serial traffic at the remote end:
Diagram shows more variability with multiple IP hops

How to solve this problem? On the Digi Connect products, you'll be using one of these serial port profiles: TCP Sockets or Serial Bridge. Under the Advanced Serial Settings you need to enable the check box labeled: [ ] Send data only under any of the following conditions. If you do not check this option, the Digi Device Server purposely fragments the serial data into many TCP segments to provide more realistic end-to-end performance on direct Ethernet links. However, since you want to move data through a wide-area network, you are less concerned with raw throughput than in preventing message fragmentation. You may also be interested in creating fewer TCP/IP packets to send more serial data. Changing this setting accomplishes both of these things.

You now have 2 options to define when TCP packets are moved:
  • "Send when data is present on the serial line" allows you to define an end-of-message pattern such as a carriage-return (\r or \r\n, etc).
  • "Send after the following number of idle milliseconds" allows you to define an idle or quiet time to wait before sending data. This second option is generally safest and I find a value of 10 or 25 milliseconds to be ideal with most automated devices.

Note: do NOT change the setting in the "Send after the following number of bytes" field! This is rarely useful and it does NOT mean (when unchecked) that the Digi Device Server must see 1024 bytes before it sends anything. I have had too many users change this to 1 and then wonder why they have a huge amount of network traffic!
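As an illustration of what an idle-time rule does to the packet count, here is a small Python sketch. It is not Digi firmware logic, just the general technique: buffer serial bytes and close the packet once the line has been quiet for a set number of milliseconds. The name `packetize_on_idle` and the sample data are assumptions.

```python
def packetize_on_idle(arrivals, idle_ms=10):
    """Group (time_ms, byte) pairs into packets.

    A packet is closed once idle_ms or more elapse between two bytes.
    """
    packets = []
    current = bytearray()
    last_time = None
    for time_ms, value in arrivals:
        if current and time_ms - last_time >= idle_ms:
            packets.append(bytes(current))
            current = bytearray()
        current.append(value)
        last_time = time_ms
    if current:
        packets.append(bytes(current))
    return packets


# Two 100-byte messages at 1 ms per byte, separated by a 200 ms pause.
arrivals = [(i, 0x41) for i in range(100)] + [(300 + i, 0x42) for i in range(100)]
print([len(p) for p in packetize_on_idle(arrivals)])   # [100, 100]
```

Each serial message travels as one packet, so no gap can open in the middle of it at the far end, and only two sets of TCP/IP headers are paid for.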

false
Lynn August Linse 2006-11-10T10:31:00-06:00 2006-11-10T22:53:52Z 2006-11-10T16:54:23Z tag:blogger.com,1999:blog-35259639.post-116317766368549125 Application Pitfalls: Serial DF1 over WAN
Digi's wireless group was asking me why AB/DF1 didn't always work over radio when per the specification, DF1 has a nice end-of-message pattern. One would think moving serial DF1 through radio or cellular-IP would be natural and painless.

However, the problem I see watching Windows applications use the serial API (via the PortMon utility) is that they ASSUME a small delay or gap between the (DLE)(ACK) bytes and the response from the slave.

So the application uses the incorrect algorithm:
  1. Read 2 Bytes
  2. Ask Windows to notify application when more data comes
  3. Loop, reading buffered data and waiting until full response seen
The problem with this algorithm is that it assumes the response will NOT have been received by the time the application sees the 2-byte (DLE)(ACK). Because we're dealing with radio or IP systems that make an effort to packetize data, there is a high probability that the (DLE)(ACK) and the slave response arrived at the same time within the same packet without any noticeable gap. Therefore, Windows NEVER notifies the application that more data has arrived ... because no more ever comes! The full response has already been received and buffered. The application makes the false assumption that measurable time will pass between step #1 and the start of loop #3.

The correct algorithm would be:
  1. Read 2 Bytes
  2. Loop, reading buffered data and waiting until full response seen
This works because it handles all three situations: no response, a response already received and fully buffered, and a response trickling in over time.
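A minimal Python sketch of the corrected loop follows. It assumes a DF1 Full-Duplex frame with a 1-byte BCC (a CRC would need 2 trailing bytes), and it takes a caller-supplied `read_buffered` function that returns whatever bytes are currently buffered, possibly none; that function, `frame_complete`, and `read_response` are illustrative names, not part of any real serial library. For brevity the sketch does not read the 2-byte (DLE)(ACK) separately; it simply skips ahead to the (DLE)(STX) that starts the response. The point is that buffered data is checked BEFORE any waiting happens.

```python
import time

DLE, STX, ETX = 0x10, 0x02, 0x03


def frame_complete(buf):
    """True once buf holds DLE STX ... DLE ETX plus a 1-byte BCC."""
    i = buf.find(bytes([DLE, STX]))
    if i < 0:
        return False
    i += 2
    while i + 1 < len(buf):
        if buf[i] == DLE:
            if buf[i + 1] == ETX:
                return len(buf) >= i + 3   # one BCC byte follows DLE ETX
            i += 2                          # DLE DLE is an escaped data byte
        else:
            i += 1
    return False


def read_response(read_buffered, timeout_s=5.0, poll_s=0.01):
    """Collect bytes until a full frame is seen or timeout_s expires."""
    deadline = time.monotonic() + timeout_s
    buf = b""
    while True:
        buf += read_buffered()              # look at buffered data first
        if frame_complete(buf):
            return buf
        if time.monotonic() >= deadline:
            return None                     # no (complete) response
        time.sleep(poll_s)
```

Because the first pass through the loop already inspects the buffer, an ACK and response that arrived together in one packet are handled the same way as a response that trickles in later.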

false
Lynn August Linse 2006-11-09T08:15:00-06:00 2006-11-09T14:24:38Z 2006-11-09T14:24:38Z tag:blogger.com,1999:blog-35259639.post-116308227867152086 City-wide WiFi - it's not Ethernet
One of my customers is struggling to IP-enable a few dozen Ethernet PLCs via one of these newfangled city-wide WiFi systems that are all the rage now. It looked good on paper, but they can only keep the PLCs online for about 20 minutes at a time.

Why? Is the WiFi system defective? Of course not ... it is just behaving more like a Wide-Area-Network than a Local-Area-Network. I am not directly involved in this, but I'd wager the problem is neither the WiFi nor the PLC. The problem is the host software making Ethernet LAN assumptions about the system. I should offer to go out and do some latency tests; I'd wager the system has high and variable latency more like satellite or cellular.

So industrial control application developers beware: migrating your Ethernet-enabled, LAN-friendly applications to be true IP-enabled, WAN-friendly applications will become more important every time another city announces the installation of a city-wide wireless infrastructure.
false
Lynn August Linse 2006-11-06T16:54:00-06:00 2006-11-06T23:20:36Z 2006-11-06T23:00:34Z tag:blogger.com,1999:blog-35259639.post-116285403496356567 Cellular-IP Friendly Apps - Retrying Socket Opens
Most industrial applications allow the user to set a slow poll rate – such as one poll per 5 minutes. This allows a user to budget a cell plan at 5MB per month and be quite assured of not going over. Unfortunately, this steady-state poll rate is unrelated to initial TCP/IP socket connection opens!

If the remote device is powered down or the TCP socket open fails for any reason, most applications will attempt to reopen the TCP socket continuously. On Ethernet this may make sense; the more frequently the open is retried, the sooner the failed connection will recover. Most Ethernet-based applications will retry opening a TCP socket every 5 to 30 seconds forever. However, for cellular you are paying for all traffic entering the cellular system. It is not Cingular or Sprint or Verizon's fault your remote device is off-line. You will be billed for each and every TCP retry. I have literally seen applications create up to 1000 MB of traffic each day attempting to reopen a TCP socket to an unreachable remote IP. On a 5 MB per month plan, this 1GB of overage could easily cost you $1000 or more for the month!

Recommendation: all applications must include a user-settable option to delay attempts to reopen TCP sockets. This value can default to no-delay, but users must be able to set a delay of at least 1 hour between retries. This enables the user to define and stay within their data usage budget regardless of success or failure of the TCP connection.

Impact: On Ethernet this should have no direct consequences since the recommended default is no delay. Cellular users must adjust this retry delay to match their data traffic expectations and their cell plan budget.
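As a sketch of the recommendation, the Python function below retries a connection with a caller-chosen delay between attempts. The names (`connect_with_retry_delay`, `try_connect`, `retry_delay_s`, `max_attempts`) are illustrative assumptions; `try_connect` stands in for whatever your application uses to open its socket and is expected to raise `OSError` on failure.

```python
import time


def connect_with_retry_delay(try_connect, retry_delay_s=3600.0,
                             max_attempts=None, sleep=time.sleep):
    """Call try_connect() until it succeeds.

    Waits retry_delay_s seconds after each failure so the traffic created
    by reconnect attempts stays predictable. If max_attempts is given, the
    last failure is re-raised once that many attempts have been made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return try_connect()
        except OSError:
            if max_attempts is not None and attempts >= max_attempts:
                raise
        sleep(retry_delay_s)
```

Setting `retry_delay_s=0` reproduces the usual Ethernet behavior of retrying immediately, while a cellular user might choose 300 or 3600 seconds. The `sleep` parameter exists only so the delay can be replaced in tests.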

For example, an application polling 10 Modbus registers per 5 minutes via TCP/IP creates about 198 bytes per poll. This works out to 2376 bytes per hour or a little under 2 MB per month. This is a very safe poll rate when paying for a 5 MB per month plan.

Therefore the desired TCP reconnect scenario should also create no more than 2400 bytes per hour. Consider that a 20-second timeout under Windows creates at least 120 bytes of traffic to an off-line remote. Windows sends a 40-byte [SYN] packet and retries the same 40-byte [SYN] in roughly 3 and then 8 seconds from the previous [SYN] packet. Increasing the timeout to 30 or more seconds creates a fourth 40-byte [SYN] packet sent about 18 seconds after the third. So forcing an application to only attempt one connection per 5 minutes will create from 1440 to 1920 bytes of traffic per hour. This will not break our budgeted cell plan.
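The reconnect budget above can be checked with a short Python sketch. It uses only the figures from this post (40-byte [SYN] packets, 3 or 4 of them per failed attempt); the function names are made up for illustration.

```python
def reconnect_bytes_per_hour(retry_interval_s, syn_packets=3, syn_bytes=40):
    """Traffic created per hour by failed connection attempts."""
    attempts_per_hour = 3600 / retry_interval_s
    return attempts_per_hour * syn_packets * syn_bytes


def min_retry_interval_s(budget_bytes_per_hour, syn_packets=4, syn_bytes=40):
    """Shortest retry interval that stays within an hourly byte budget."""
    return 3600 * syn_packets * syn_bytes / budget_bytes_per_hour


print(reconnect_bytes_per_hour(300))                   # 1440.0 (3 SYNs per attempt)
print(reconnect_bytes_per_hour(300, syn_packets=4))    # 1920.0 (4 SYNs per attempt)
print(reconnect_bytes_per_hour(30))                    # 14400.0 when retrying every 30 s
print(min_retry_interval_s(2400))                      # 240.0 seconds
```

So with four [SYN] packets per attempt, any retry interval of 4 minutes or longer keeps a dead remote within the same 2400 bytes per hour as normal polling.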
false
Lynn August Linse 2006-10-31T09:01:00-06:00 2006-10-31T15:44:17Z 2006-10-31T15:14:36Z tag:blogger.com,1999:blog-35259639.post-116230767695390205 Modbus Bid Spec Suggestions
A large customer asked me for advice on bid-specing the use of Modbus/TCP. They are expecting HVAC and other non-production systems to provide "gateways" with Modbus/TCP to simplify central HMI and data collection. Experienced field people know this is not quite as easy as it sounds.

So I stepped back and put myself in their shoes - if I were trying to design a new assembly plant and I hoped to monitor HVAC, chemical tank farm, and such by Modbus/TCP, then what issues would hinder this? What details NOT included in the http://www.modbus-ida.org/ protocol specification would complicate true interoperability? I have many experiences of integration problems with Modbus masters and slaves from 2 different vendors not quite talking as expected. Usually the customer ends up PAYING one vendor or the other to change; or the customer has to buy a 3rd party box to fix the difference of opinion.

So how to avoid this up front? Here is the list I created:
  1. The required interface is Modbus/TCP running on standard Ethernet II frames and following the published specification at http://www.modbus-ida.org/.
  2. All devices must support at least 100M half-duplex Ethernet and if auto-negotiate is supported, they must be manually configurable to force 100M Half-Duplex.
  3. If the supplied product uses serial Modbus/RTU or Modbus/ASCII, then the vendor must supply a configured, tested, and powered Modbus Ethernet-to-Serial bridge (such as the Digi One IA, model 70001862) to bridge this to Modbus/TCP on Ethernet.
  4. All data must be available and/or mirrored within the Modbus 4x or "Holding Register" memory area. The other areas can be optionally supported, but all 0x, 1x, and 3x data must be readable in the 4x memory area. For digital writes, support of single-bit writes (function 5) to the 0x area is acceptable. Products that require access to the 1x and 3x area to operate are not acceptable; access to the 1x/3x area must be optional.
  5. Modbus 32-bit longs and floating points must be available in Modicon 984 Compatibility format, which means as two consecutive 16-bit big-endian registers, with the low word in the first register. Other forms (Daniels/Enron or high-word first) can exist but must be optional.
  6. All gateways or converters bridging non-Modbus data to Modbus must not provide stale data and must not require special "status registers" be monitored to confirm data validity. If the source device of the non-Modbus data is unavailable or the data is out-of-date, then Modbus/TCP requests must return an exception such as 0x0B until the source data is valid again.
  7. Register 4x00001 must exist and be readable to allow simple, predictable "comm tests".
  8. Software tools must function properly with slaves only supporting Modbus functions 3 and 16. Requiring diagnostic function 8 support is not acceptable.
  9. Software tools must be configurable to write a single register as either function 6 or 16.
  10. Software tools must be configurable to limit reads and writes to user-selectable limits; for example, the software must accept being limited to reading 1 register per transaction and writing 1 register per transaction.
  11. Software tools must allow setting the Modbus/TCP "Unit Id" to a value other than zero. This is required for Ethernet-to-Serial bridging.
  12. Software tools must use the Modbus/TCP sequence number and modify it between polls. The tool must not leave it set as 0 or 1 all the time.
  13. To support future wide-area-network usage, all "Masters" must permit TCP socket opens to take up to 30 seconds.
  14. To support future wide-area-network usage, all "Masters" must permit slave timeouts to be set to at least 30 seconds.
  15. To support future wide-area-network usage, all serial slave devices must have a configurable "gap" or intercharacter delay timeout. The Modbus spec's "3.5 character times" is problematic when dealing with radio and other error-correcting media.
  16. All devices must be capable of transport via wireless bridging by common Ethernet radio systems such as 802.11 bridges and more traditional 900MHz line-of-sight bridges.

Now, will all vendors be able to meet all of these requirements? Probably not, since many of them are not required per the Modbus-IDA specifications. However, at least this brings the issues up front to be addressed during the bid award phase. If custom firmware modifications are required, they can be addressed up front and not during factory acceptance testing.

false
Lynn August Linse 2006-10-23T10:20:00-05:00 2006-10-23T16:03:22Z 2006-10-23T15:20:48Z tag:blogger.com,1999:blog-35259639.post-116161684833205300 Cellular-IP Friendly Apps - Socket Open
Most applications attempt to open a TCP socket using the OS/Windows default timeout. This results in an unpredictable timeout. I looked through Microsoft's VS.NET documentation looking for the "How long?" answer ... and never found an answer. I suspect it depends on your version and service-pack levels. I did a web search to discover the truth and found people claiming Windows timed out in 2 seconds, 5 seconds, 10 seconds, 20 seconds, 20-30 seconds, and even one claiming 5 minutes. Sadly, most of these people were looking for a way to force Windows to use a connection timeout of 1 second or less - which will prevent their applications from working on normal wide-area networks.

Such short connection attempts are not suitable for cellular networks, where the first response packet from an idle remote tends to complete in 3-4 seconds during average conditions. Therefore even a 5 second timeout is too close to the norm to be suitable.

Recommendation: all applications must use an explicit, predictable timeout during a TCP socket open request. This value can be user-settable higher or lower, but for cellular should default to 20 seconds and be settable to at least 60 seconds for satellite.

Impact: On Ethernet this should have limited direct consequences since the timeout only has an effect if the remote is not available. If having your application wait 20 seconds for an inaccessible remote is a problem, then enable a user setting to select either "local-area-network" or "wide-area-network" mode and adjust the default connection timeout as appropriate.

In a best case scenario, failing to wait long enough to open a TCP socket when the network is sluggish could prevent connecting for many minutes: the sockets do open successfully, but the OS aborts each open before the successful response can come back from the remote. Keep in mind that over cellular the end user is paying for at least 120 bytes of data for every open attempt, and that TCP retransmissions likely make this 160 or 200 bytes.

In a worst case scenario, this aborting of opened sockets on a remote with limited resources risks tying up all resources with past failed opens. Remember, just because your OS timed out the open does not mean the remote device didn't allocate the connection resource and send a successful response. The lack of resources blocks new attempts by the application to reconnect until TCP keepalive or some other mechanism detects the broken sockets and frees up the resources.

Be warned that under cellular - as if in defiance of traditional faith in the reliability of the TCP state machine - TCP sockets break on rare occasions in ways that common operating systems will fail to detect! During cellular network hiccups, I have seen machines "hang" for 11 hours waiting for a TCP Acknowledgement that never comes! This is even with TCP Keepalive enabled for 5 minutes.

Some Visual Studio discussion: Just out of curiosity, I did some snooping around inside the Visual Studio .NET documentation. I didn't find a good answer, so cannot explain how to solve this problem.

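Here is example VB.NET code to open a TCP socket:
    Imports System.Net
    Imports System.Net.Sockets

    ' Connect blocks until the OS connects or gives up; no timeout can be passed in.
    Dim tcpClient As New TcpClient()
    Dim ipAddress As IPAddress = Dns.GetHostEntry("www.digi.com").AddressList(0)
    tcpClient.Connect(ipAddress, 11003)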
Notice we cannot ask the OS to wait "longer" or "shorter". The documentation says "The Connect method will block until it either connects or fails." A connection timeout results in a SocketException failure being thrown. The TcpClient class has ReceiveTimeout and SendTimeout properties, but these only relate to reads and writes on the connection and have no impact on the initial connection open. Suggestions to use the System.Net.Sockets.Socket class instead aren't helpful since this class also doesn't offer a simple connection timeout mechanism.

The only way to define a predictable TCP socket connection timeout appears to be to use an asynchronous design with BeginConnect and some form of external timer to call EndConnect at the desired timeout.

To rephrase myself, I am not saying the default Windows connection timeout is incorrect - I am saying the evidence is that you cannot predict what timeout your customer will see if you don't explicitly define one. So while your application running on your computer may default to a nice 20 second timeout, what happens if your customer runs the same application on an older computer and sees a 3 second or 5 second timeout? The answer is they won't be able to reliably connect to cellular or satellite remote IPs, and either won't buy your product again or will call Tech Support.
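For comparison only, here is what an explicit, predictable connection timeout looks like in Python rather than VB.NET. The standard library's `socket.create_connection` accepts a timeout directly; the wrapper name `open_with_timeout` and the 20 second default (taken from the recommendation above) are illustrative choices. It does not solve the .NET problem discussed here, it just shows the behavior an application should end up with.

```python
import socket


def open_with_timeout(host, port, timeout_s=20.0):
    """Open a TCP connection, giving up after timeout_s seconds.

    The timeout applies to each address tried for the host, and it stays
    in effect for later reads and writes unless changed with settimeout().
    A failed or timed-out attempt raises OSError.
    """
    return socket.create_connection((host, port), timeout=timeout_s)
```

A caller would use something like `open_with_timeout("192.0.2.10", 502, timeout_s=60.0)` for a satellite link; the address here is a documentation placeholder, not a real device.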
false
Lynn August Linse 2006-10-20T08:58:00-05:00 2006-10-23T14:36:14Z 2006-10-20T14:08:12Z tag:blogger.com,1999:blog-35259639.post-116135329235915179 Cellular-IP Friendly Applications - Intro
In theory, host applications using TCP/IP on Ethernet should work over wide-area networks which support TCP/IP. Unfortunately, most host applications are written and tested for Ethernet, not generic IP. When you move into cellular IP or satellite, the high and variable latency introduced causes many host applications to either fail or generate an order of magnitude more traffic than they should.

For example, here is a chart of 1000 Modbus/RTU polls over cellular TCP/IP. There is a random delay between polls of 30 seconds to 30 minutes. The patterns are rather striking: most polls complete in between 1 to 2 seconds, but there is clearly some systematic "aliasing" causing responses to complete in 2.8, 3.8, and 10.8 seconds.

chart showing Modbus times


After years of troubleshooting customers' systems, I have been creating a running document and commentary on "bad things host apps do". I will be publishing these things over time in this blog. But to summarize:
  • The default OS timeout on opening TCP Sockets may be too short.
  • Attempting to open TCP sockets to unresponsive remotes must be a controlled process, since retries cost money.
  • Since all packets and retries cost money, all aspects of the implemented protocol must be controlled and adjustable.
  • OS stack calls may not return if the OS fails to detect a response or socket failure.
  • Responses from the remote could take 15 to 60 seconds.
  • TCP segment fragmentation and reassembly is exaggerated; can have many seconds of delay between fragments.
  • TCP sockets idle longer than 5 minutes often go away without error or detection.
  • Every byte your application sends (or resends) costs your customer money.
false
Lynn August Linse 2006-10-12T09:36:00-05:00 2006-12-07T13:23:02Z 2006-10-12T14:38:38Z tag:blogger.com,1999:blog-35259639.post-116066391826591935 Siemens PLC via Cellular
We succeeded in getting a Siemens S7-226 with CP243 and PPI serial up on the Digi Connect WAN, which is a cellular router for GSM or CDMA with local Ethernet and serial port.

In Summary:

  • To talk to S7-226 by serial PPI, you need a newer Siemens PC-to-PPI cable - the older one doesn't work. I am not sure why, but that is what we found. Using Digi RealPort we enabled a redirected COM port to the remote Digi Connect WAN's serial port, which is connected to the PPI port of the S7-200. We then defined a radio modem port within MicroWin using that COM port. Although a 30 second timeout would be ideal, MicroWin only gives options for 1, 10 or 100 seconds of timeout. You should probably select 100 seconds to minimize your comm costs. Now MicroWin or Step7 can freely connect to and reprogram the S7-200. The high end-to-end latency of cellular IP networks makes the performance pretty sluggish when compared to direct serial, but it works.
  • To talk to S7-315 by serial MPI, you need the special Siemens PC-to-MPI cable. Just as with the S7-200, we set up a redirected Digi RealPort; however, we did NOT need to fool Step7 into thinking this was a radio modem. It just worked fine as is when given longer timeout settings.
  • To talk to CP243 by S7 protocol over ISODE on Ethernet, we enabled TCP port forwarding of port 102 on the Digi Connect WAN to the CP243 module. The CP243 is configured to treat the Digi Connect WAN as its Gateway IP. This also worked fine as is when given longer timeout settings.

false