I wrote a custom protocol on top of ‘Layer 3’ for my previous employer. The idea of this protocol was to use the capabilities of managed network switches to ‘cast’ data to hundreds of computers simultaneously, while the sending computer sends the data to only ONE destination IP address.

The destination address is not a computer’s IP address, nor is it the ‘broadcast’ address; it is a special ‘multicast’ group address chosen by the sender, with group membership handled through IGMP. Potential receiving machines are ‘invited’ to join this group via a one-off broadcast message, which is repeated 3 times.
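To make the addressing a bit more concrete, here is a minimal Python sketch of the sender side. It is not the original implementation: the invite layout (`INVT` tag, group address, port), the port number 50000 and all function names are assumptions made up for illustration. Only the 3 repeats and the use of a multicast group address come from the description above.

```python
import ipaddress
import socket
import struct

INVITE_REPEATS = 3  # the invitation is sent 3 times


def is_multicast_group(addr: str) -> bool:
    """True if addr lies in the IPv4 multicast range 224.0.0.0/4."""
    return ipaddress.ip_address(addr).is_multicast


def build_invite(group: str, port: int) -> bytes:
    """Hypothetical invite layout: 4-byte tag, 4-byte group address, 2-byte port."""
    return struct.pack("!4s4sH", b"INVT", socket.inet_aton(group), port)


def send_invites(group: str, port: int, invite_port: int = 50000) -> None:
    """Broadcast the invite on the local subnet, INVITE_REPEATS times."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        invite = build_invite(group, port)
        for _ in range(INVITE_REPEATS):
            sock.sendto(invite, ("255.255.255.255", invite_port))


def open_sender() -> socket.socket:
    """UDP socket for sending to the group; TTL 1 keeps datagrams on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 1))
    return sock
```

After the invites, the sender would simply `sendto()` every data packet to `(group, port)`, and the switch takes care of the fan-out.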

Computers that want the file ask the network switch, via IGMP, to add them to the group. Machines have to check in quickly to beat the check-in time-out, so software which uses the protocol has to be already running in an idle state. After a short delay and a subsequent speed ramp, they all receive the data at full speed, provided they can keep up. The result is that the data effectively gets ‘amplified’, without interfering with other network traffic. This means you can replicate data across machines in a fraction of the time you would normally spend copying it to each one.
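On the receiving side, joining a group with the standard sockets API looks roughly like the sketch below. This is generic multicast code, not the protocol’s own; the function names are mine. Setting `IP_ADD_MEMBERSHIP` makes the operating system send the IGMP membership report, and a switch doing IGMP snooping then starts forwarding the group’s traffic to that port.

```python
import socket
import struct


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_group(group: str, port: int) -> socket.socket:
    """Bind a UDP socket to the port and ask the OS to join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining triggers the IGMP membership report on the chosen interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock
```

An idle receiver would call something like `join_group(group, port)` as soon as it sees an invite, then loop on `recvfrom()`.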

Each computer sends back performance-related information at timed intervals, so the speed can be calibrated and retransmissions avoided.
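As a rough illustration of how such feedback can drive the send rate, here is a small Python sketch. The report fields, the 1% loss limit and the back-off and ramp factors are all invented for the example; the real protocol’s calibration rules are not described here.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical per-receiver status report."""
    received: int  # packets received since the last report
    missing: int   # packets the receiver noticed it is missing


def next_interval(interval_us: float, reports: list[Report],
                  loss_limit: float = 0.01,
                  min_us: float = 1.0, max_us: float = 10_000.0) -> float:
    """Return the new inter-packet gap in microseconds.

    If the worst receiver is missing more than loss_limit of its packets,
    double the gap (slow down); otherwise shrink it by 1/8 (ramp up).
    """
    worst = max(
        (r.missing / max(r.received + r.missing, 1) for r in reports),
        default=0.0,
    )
    factor = 2.0 if worst > loss_limit else 0.875
    return min(max(interval_us * factor, min_us), max_us)
```

Pacing to the slowest receiver is the design choice that matters: one struggling machine slows everybody slightly, but nobody needs a retransmit.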

The new protocol also supported real-time zlib compression as part of its design (to get effective rates of over 1Gbps on a 1Gbps network), and had a huge 4-second retransmit window, giving even the laggiest PCs time to ‘catch up’ before retransmits are sent. Excessive retransmits result in the protocol ‘rethinking’ how to send data and reviewing its strategy on inter-packet intervals. In tests, 99% of the network bandwidth was used with almost zero retransmits, depending on the speed ramping and ‘aggression’ of the protocol.
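The compression side can be sketched with Python’s `zlib` module. This is only one way to do streaming compression, assuming a single zlib stream flushed after every chunk; how the original protocol framed its compressed data is not covered here, and the function names are mine.

```python
import zlib


def compress_chunks(chunks, level: int = 1):
    """Yield compressed bytes for each input chunk, using one zlib stream.

    Z_SYNC_FLUSH after every chunk means everything sent so far can be
    decoded by the receiver without waiting for the end of the stream.
    """
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        if out:
            yield out
    tail = comp.flush()  # finish the stream
    if tail:
        yield tail


def decompress_chunks(packets):
    """Yield the decoded bytes for each compressed packet, in order."""
    decomp = zlib.decompressobj()
    for packet in packets:
        yield decomp.decompress(packet)
```

A low compression level keeps the CPU cost down, which matters when the goal is to stay ahead of the wire; whenever the data compresses at all, the effective rate is the wire rate multiplied by the compression ratio.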

The theoretical top speed of the protocol was 128 Tbps (terabits per second), if the full 4000ms of latency was used. At this point ‘sequence numbers’ run out in the retransmit buffer, but even then the protocol copes by putting a ‘temporary hold’ on the transmission.
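The link between sequence numbers and top speed is simple arithmetic: the sender must not reuse a sequence number while the old packet could still be retransmitted. The sketch below shows the calculation. The 32-bit sequence width and 16 KiB payload are guesses of mine, picked only because they reproduce the 128 figure when a terabit is read as 2^40 bits; they are not stated anywhere in the protocol description above.

```python
def max_rate_bps(seq_bits: int, payload_bytes: int, window_s: float) -> float:
    """Highest payload rate at which sequence numbers do not repeat within the window."""
    packets_in_window = 2 ** seq_bits
    return packets_in_window * payload_bytes * 8 / window_s


# Assumed values (not from the original design): 32-bit sequence numbers,
# 16 KiB of payload per packet, and the 4-second retransmit window.
rate = max_rate_bps(seq_bits=32, payload_bytes=16 * 1024, window_s=4.0)
print(rate / 2**40)  # 128.0, in units of 2**40 bits per second
```

Other combinations of width and packet size would give the same number, so treat this as a way to see where such a limit comes from rather than as the actual parameters.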

I wrote official RFC documentation for the protocol (but you will not find it in the RFCs)!
