Embedded Networking

Trials and Tribulations of using Embedded TCP

Issues with using TCP/IP in embedded systems

I was recently called in for a consult on a program that was having trouble with their 802.11 link.

The team working on this program had created a system using a number of embedded micros which were to communicate via Ethernet on an embedded LAN. In my experience, network communications on an embedded LAN normally run fairly smoothly because you are in total control of the environment. You can design the system based on the bandwidth required and put in Ethernet controllers which support those bandwidth requirements.  You can control who talks when and totally avoid the possibility of collisions occurring.

As it turned out, the system was so complex that the team was unable to get all these micros communicating effectively in a timely fashion while at the same time doing all the number crunching that needed to be done.  The decision was made to backtrack a bit and prototype some of the systems on PCs instead of micros.

This course of action led to the use of the 802.11 link; what was to be an embedded LAN now became partially embedded and partially a wireless LAN connecting the PCs.  Wireless LANs have their own issues (link saturation, SNR, etc.), some of which I’d had to deal with on prior projects. This is what prompted the request for my help; the team was getting very little data across their wireless link and couldn’t understand why.

After asking a few questions I discovered a couple of things:

  1. they were using TCP/IP for their network connections, and
  2. the software engineers had never done network programming

These two factors, combined with the wireless LAN, made for the perfect storm.

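The low bandwidth that the team was seeing was due to TCP’s exponential backoff. When a segment goes unacknowledged, TCP retransmits it and doubles its retransmission timeout after each successive failure, so on a lossy link the connection spends most of its time waiting rather than sending. What caused the backoff to occur in the first place were some easily fixed wireless hardware issues.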
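To get a feel for how quickly the backoff adds up, here is a minimal sketch of the arithmetic. The 1 second initial timeout and 60 second cap are illustrative assumptions (real stacks pick their own values), and rto_schedule is a hypothetical helper, not part of any TCP implementation.

```python
def rto_schedule(initial_rto=1.0, max_rto=60.0, retries=6):
    """Return the wait (in seconds) before each successive retransmission.

    The timeout doubles after every unacknowledged attempt, up to max_rto.
    Both default values are assumptions chosen for illustration.
    """
    waits = []
    rto = initial_rto
    for _ in range(retries):
        waits.append(rto)
        rto = min(rto * 2, max_rto)
    return waits


waits = rto_schedule()
print(waits)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(sum(waits))  # 63.0 seconds spent waiting across six failed attempts
```

Under these assumptions, six lost retransmissions in a row leave the connection idle for about a minute, which is why a marginal wireless link can reduce TCP throughput to a trickle.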

What compounded the issue was the fact that the socket code on the micros was sending data without regard for the health and status of the socket.  In essence, they were also overflowing their transmit buffers.  This was because the engineers writing the code didn’t know any better.

After shaking my head and rolling my eyes at the state of affairs, I fixed the issues by resolving the wireless hardware problems and instructing the engineers in the use of the select() function to control the flow of data on the socket and monitor its health.
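The general pattern looks something like the sketch below. It is written in Python for brevity (its select and socket modules mirror the BSD calls available on most embedded stacks) and is not the team's actual code. The function name send_with_select and the 1 second timeout are assumptions for illustration.

```python
import select
import socket


def send_with_select(sock, data, timeout=1.0):
    """Send data on a non-blocking socket, pausing until it is writable.

    Returns the number of bytes handed to the socket. Stops early if the
    socket does not become writable within `timeout` seconds or raises an
    error, so the caller can decide whether to retry, drop, or reconnect.
    """
    sock.setblocking(False)
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # Wait here, not inside send(), until the transmit buffer has room.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            break  # buffer still full: the peer or the link is not keeping up
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            continue  # buffer filled between select() and send(); wait again
        except OSError:
            break  # connection reset or closed: treat the socket as unhealthy
    return sent


# Demo on a local socket pair so no network is needed.
left, right = socket.socketpair()
print(send_with_select(left, b"status: ok\n"))
left.close()
right.close()
```

The point of the design is that the sender never pushes more than the socket can accept: a short return value tells the application that the link is backed up, instead of data being silently queued or dropped.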

The system now works and the team recently executed a very successful demonstration, but I still have an issue with the fact they are using TCP in the system. Since you control the network and all the traffic on an embedded LAN, TCP is not required.  TCP is designed for traveling long distances through hardware of unknown origin and state; it is not required in a highly controlled embedded environment. In this environment, for this program, UDP is more than sufficient. Here’s why:

  1. The system is tolerant to a small percentage of data loss.
  2. UDP datagrams are checksummed at several levels: the Ethernet frame CRC, the IP header checksum, and UDP’s own checksum over the payload (optional under IPv4, but normally enabled).  If you get a packet, you are pretty much guaranteed the data is correct.
  3. The 100 Mbps links on the system above provide more than ten times the bandwidth required: there were 5 nodes, each transmitting less than 1 Mbps.  Staggering their communications to avoid collisions is a simple matter.
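  4. Fragmentation can be eliminated by sending data in blocks that fit within a single MTU (on standard Ethernet with IPv4, 1500 bytes less the 20-byte IP header and 8-byte UDP header, or 1472 bytes of payload).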
  5. UDP simplifies.  Creating and maintaining connections of a TCP socket can be time consuming and distracting, adding a lot of code with no added value.
  6. UDP datagram loss on a closed embedded LAN is negligible.
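Items 3 and 4 come down to simple arithmetic, sketched below. The header sizes assume standard Ethernet and IPv4 without options. The 10 ms cycle and the helper names (chunk, slot_offsets_ms, utilization) are hypothetical and chosen only for illustration.

```python
ETHERNET_MTU = 1500  # bytes, standard Ethernet
IPV4_HEADER = 20     # bytes, assuming no IP options
UDP_HEADER = 8       # bytes
MAX_PAYLOAD = ETHERNET_MTU - IPV4_HEADER - UDP_HEADER  # 1472 bytes


def chunk(data, size=MAX_PAYLOAD):
    """Split data into blocks that each fit in one unfragmented datagram."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def slot_offsets_ms(node_count, cycle_ms):
    """Start of each node's transmit slot within one cycle, in milliseconds."""
    return [i * cycle_ms / node_count for i in range(node_count)]


def utilization(node_count, per_node_bps, link_bps):
    """Fraction of the link consumed when every node sends at per_node_bps."""
    return node_count * per_node_bps / link_bps


print([len(block) for block in chunk(bytes(3000))])  # [1472, 1472, 56]
print(slot_offsets_ms(5, 10))                        # [0.0, 2.0, 4.0, 6.0, 8.0]
print(utilization(5, 1_000_000, 100_000_000))        # 0.05
```

With five nodes at 1 Mbps each, the link is at most 5% loaded, so giving each node its own 2 ms slot in a 10 ms cycle leaves plenty of idle time between transmissions.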

Items 5 and 6 above were particularly costly in this instance: many hours were spent maintaining connection-oriented code when the occasional loss of data would not have had a negative impact on the system results.  In this case, even including the wireless LAN, iperf tests showed less than 0.02% datagram loss at the bandwidths this system was running.
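If the application wants to keep an eye on loss itself rather than rely on a one-off iperf run, a sequence number in each datagram is usually enough. The sketch below is one way to do it, not something the program described here used. The 4-byte header layout and the names add_sequence and LossCounter are assumptions, and it ignores sequence wraparound and treats reordering as rare, which is reasonable only on a closed LAN.

```python
import struct

SEQ_HEADER = struct.Struct("!I")  # 32-bit sequence number, network byte order


def add_sequence(seq, payload):
    """Prefix a payload with its sequence number before sending."""
    return SEQ_HEADER.pack(seq) + payload


class LossCounter:
    """Counts gaps in the sequence numbers of received datagrams."""

    def __init__(self):
        self.next_expected = None
        self.received = 0
        self.lost = 0

    def feed(self, datagram):
        """Record one datagram and return its payload."""
        (seq,) = SEQ_HEADER.unpack_from(datagram)
        self.received += 1
        if self.next_expected is None or seq >= self.next_expected:
            if self.next_expected is not None:
                self.lost += seq - self.next_expected  # size of the gap, if any
            self.next_expected = seq + 1
        # A lower-than-expected number is a late or duplicate datagram;
        # it is passed through without adjusting the loss count.
        return datagram[SEQ_HEADER.size:]

    def loss_percent(self):
        total = self.received + self.lost
        return 100.0 * self.lost / total if total else 0.0


counter = LossCounter()
for seq in (0, 1, 2, 4, 5, 8):  # datagrams 3, 6 and 7 never arrive
    counter.feed(add_sequence(seq, b"sample"))
print(counter.received, counter.lost)  # 6 3
```

The receiver simply carries on with whatever arrives, which is all a loss-tolerant system needs, while the counter shows whether the loss rate is staying in the negligible range.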

As with everything else posted here, this is one engineer’s opinion.  I hope that by stating it, I can help you avoid some of the travails I’ve experienced.