Thursday, February 3, 2011

Implementing IP fragmentation

I'm working on an application that generates network traffic. The application keeps eth1 in promiscuous mode, so it handles all incoming and outgoing traffic directly.

One of the features I'm implementing is IP fragmentation and defragmentation. Incoming fragments need to be reassembled, and outgoing packets need to be fragmented if their sizes exceed the MTU of 1500 bytes.
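For the outgoing direction, every fragment except the last must carry a multiple of 8 bytes of data, because the IPv4 Fragment Offset field counts in 8-byte units. The following is a minimal Python sketch of how a payload might be split, assuming a 20-byte IPv4 header without options. `fragment_plan` is a hypothetical helper for illustration, not code from the application.

```python
def fragment_plan(payload_len, mtu=1500, ip_header_len=20):
    """Return (offset_bytes, length, more_fragments) for each fragment.

    Assumes every fragment repeats an IPv4 header of ip_header_len bytes.
    """
    # Data per fragment is rounded down to a multiple of 8 bytes.
    max_data = (mtu - ip_header_len) // 8 * 8
    plan = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len  # MF flag is set on all but the last
        plan.append((offset, length, more))
        offset += length
    return plan

# ping -s 20000 sends 20000 data bytes plus an 8-byte ICMP echo header.
plan = fragment_plan(20000 + 8)
print(len(plan), plan[-1])  # 14 (19240, 768, False)
```

The value actually written into each header's Fragment Offset field would be `offset_bytes // 8`.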

One simple way to test my application is to ping the IP address of eth1:

ping -c1 -s 20000 10.3.2.1

This is working fine.

However, once the packet size exceeds ~53000 bytes, it fails. According to Wireshark, I receive fragments up to a fragment offset of ~51000, then nothing, followed by a reassembly timeout.

The maximum size of an IP packet is 65535 bytes. The ping command allows specifying a size of up to 65507 bytes, and that size actually works if I ping eth0 (which is controlled by the OS).
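The 65507 limit follows from the 16-bit Total Length field minus the IPv4 header and the ICMP echo header. A small Python sketch of that arithmetic, assuming a 20-byte IPv4 header without options and a 1500-byte MTU; the function names are hypothetical.

```python
IPV4_MAX_TOTAL_LENGTH = 0xFFFF  # 16-bit Total Length field: 65535 bytes
IPV4_HEADER_LEN = 20            # assumes no IP options
ICMP_ECHO_HEADER_LEN = 8        # type, code, checksum, identifier, sequence

def max_ping_data(ip_header_len=IPV4_HEADER_LEN):
    """Largest ping -s value that still fits in one IPv4 datagram."""
    return IPV4_MAX_TOTAL_LENGTH - ip_header_len - ICMP_ECHO_HEADER_LEN

def fragment_count(ip_payload_len, mtu=1500, ip_header_len=IPV4_HEADER_LEN):
    """Number of fragments needed when each carries 8-byte-aligned data."""
    max_data = (mtu - ip_header_len) // 8 * 8
    return -(-ip_payload_len // max_data)  # ceiling division

print(max_ping_data())                                          # 65507
print(fragment_count(max_ping_data() + ICMP_ECHO_HEADER_LEN))   # 45
```

So a maximum-size ping is reassembled from, and answered with, 45 fragments sent back to back.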

When I inspect the application from the inside with GDB, everything goes well. Stepping through the code shows that the fragments enter my application, the IP packet is successfully reassembled and then fragmented again, and the fragments are sent back to the sender. Even for the last fragment, the return value of send(...) (socket API) equals the size of the fragment, indicating success.
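The reassembly step described above can be modelled roughly as follows. This is a hypothetical Python sketch, not the author's code: fragments are grouped by the (source, destination, identification, protocol) tuple that IPv4 uses to identify a datagram, the total length becomes known once the fragment with More Fragments cleared arrives, and incomplete datagrams are dropped after a timeout (30 seconds here is an assumed value).

```python
import time

class Reassembler:
    """Hypothetical IPv4 reassembly buffer (illustrative sketch only)."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout  # assumed reassembly timeout in seconds
        self.clock = clock
        # key -> {"frags": {offset: bytes}, "total": int or None, "first_seen": float}
        self.pending = {}

    def add(self, key, offset, data, more_fragments):
        """Store one fragment; return the full payload once it is complete."""
        now = self.clock()
        self._expire(now)
        entry = self.pending.setdefault(
            key, {"frags": {}, "total": None, "first_seen": now})
        entry["frags"][offset] = data
        if not more_fragments:
            # The last fragment tells us the total payload length.
            entry["total"] = offset + len(data)
        payload = self._try_complete(entry)
        if payload is not None:
            del self.pending[key]
        return payload

    def _try_complete(self, entry):
        total = entry["total"]
        if total is None:
            return None  # last fragment not seen yet
        buf = bytearray(total)
        covered = 0
        for offset in sorted(entry["frags"]):
            data = entry["frags"][offset]
            if offset > covered:
                return None  # hole: a fragment is still missing
            buf[offset:offset + len(data)] = data
            covered = max(covered, offset + len(data))
        return bytes(buf[:total]) if covered >= total else None

    def _expire(self, now):
        # Drop datagrams whose first fragment is older than the timeout.
        stale = [k for k, e in self.pending.items()
                 if now - e["first_seen"] > self.timeout]
        for k in stale:
            del self.pending[k]
```

In this model, a datagram whose final fragments never arrive simply ages out of `pending`, which is the "reassembly timeout" that Wireshark reports on the receiving side.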

Does anybody have an idea what could be going wrong?

Operating system is Linux (based on RTLinux).

  • The maximum TCP segment you can have is 65,495 bytes. 
    

    If you take the headers into account ... it could be the bound you are looking for :>

    Also, do you use a constant TCP window, or is it adapting itself? Maybe a smaller window is being used and you can't send more data.

    Chris S : The TCP/IP header is normally between 40 and 60 bytes; up to 100 bytes with all the options (which isn't possible in all packets). Ethernet adds 38, so let's use 100 bytes of overhead in total. With an MTU of 1500, that leaves 1400 bytes of data per packet. To carry 65,495 bytes you need 47 packets, which takes 4,700 bytes off the top and still leaves over 60 KB for data. If the MTU were significantly smaller this might be a problem, but it doesn't explain the 51 KB limit he's hitting.
    StackedCrooked : The reassembly and fragmentation occur on IP level. The response is initiated by ICMP. So TCP is never involved.
    Nikolaidis Fotis : Here is a scenario that involves TCP. Let's say the bitrate differs between uplink and downlink, and as a result the TCP window is different in each direction: A -> B (TCP window 50000), B -> A (TCP window 45000). A sends a 50000-byte TCP segment fragmented into N packets. B reassembles the packet. Now there are two cases: B could either pass the packet up to the upper layer and deliver 45000 bytes, or it could send the packet back to A. Then A receives a 50000-byte packet, but its receive buffer (window) is only 45000 bytes. So ... you can't reassemble it :>
    Nikolaidis Fotis : Another point is ... what's your topology? Maybe the timeout is too short and you should increase it, or the reassembly buffer.
  • If the packets are correctly fragmented and then reassembled on the client side, but the reverse does not work,

    A -----> B
             |
    A <--x-- B
    

    I would first suggest inverting the roles (since the problem is on the way back) and checking whether, this time, the fragmentation/reassembly can be performed by A

    A <----- B
    |
    ?
    

    If not, there is a problem with packet handling from B to A: it could be a firewall limitation, or any router or switch you may have in between.

    From ring0
  • Solved

    The issue was packet loss. The IP response was sent to a switch over a gigabit line. The switch in turn forwarded the messages over a 100 Mbit line. This means packets arrived faster than they could leave, causing memory usage inside the switch to rise quickly. Once all memory was used, the switch had no option but to start dropping packets.
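The speed mismatch in the solution can be put into numbers. A rough Python sketch, assuming the reply fragments arrive back to back at full gigabit line rate and the 100 Mbit egress port carries no other traffic; `backlog_bytes` is a hypothetical helper, and the switch's real buffer size is not stated in the post.

```python
def backlog_bytes(burst_bytes, ingress_bps, egress_bps):
    """Bytes still queued in the switch when a back-to-back burst finishes arriving.

    Assumes the burst arrives at full ingress line rate and the egress port
    drains at its own line rate with no competing traffic.
    """
    if egress_bps >= ingress_bps:
        return 0.0  # the egress keeps up, so nothing accumulates
    arrival_time_s = burst_bytes * 8 / ingress_bps
    drained_bytes = egress_bps * arrival_time_s / 8
    return burst_bytes - drained_bytes

GIGABIT = 1_000_000_000
FAST_ETHERNET = 100_000_000

# Approximate IP bytes of a 65507-byte ping reply: 65515 bytes of ICMP data
# plus a 20-byte IPv4 header on each of its 45 fragments.
burst = 65515 + 45 * 20
print(backlog_bytes(burst, GIGABIT, FAST_ETHERNET))
```

Under these assumptions, about 90% of each burst has to sit in the switch's memory at once, so any port buffer smaller than that fraction of the reply will overflow, and the tail of the reply (the highest fragment offsets) is what gets dropped.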
