Showing posts from July, 2011

Zero-copy network transmission with vmsplice

This post completes a set that also includes asynchronous reading with PACKET_RX_RING and asynchronous writing with PACKET_TX_RING. In this post, we look at sending packets out over a raw socket in zero-copy fashion.

First, understand that the presented code is a silly hack: it requires four system calls for each transmitted packet, so it will never be fast. I post it here because the code can be helpful in other situations where you want to use vmsplice.

Second, note that splice support is a protocol-specific feature. Raw sockets support it as of recent kernels. I believe 2.6.36 has it, but would not be surprised if it is lacking in 2.6.34, for instance (please leave a comment if you know when it was introduced).

The basic idea is to send data to a network socket without copying using vmsplice(). But, the vmsplice syscall will only splice into a pipe, not a network socket. Thus, the data first has to be appended to a pipe and then has to be moved to the socket using a splice() call. One e…
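As a rough illustration of the two-step path described above (user memory into a pipe with vmsplice(), then pipe into a file descriptor with splice()), here is a minimal sketch. It is not the post's original listing: the helper name `send_zero_copy` and its parameters are hypothetical, and it assumes the caller has already created the pipe and that `out_fd` refers to something that supports splice (on a raw socket this depends on the kernel version, as noted above).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): hand `len` bytes at
 * `buf` to `out_fd` by way of the pipe `pfd` (pfd[0] = read end,
 * pfd[1] = write end). Returns the number of bytes moved, or -1 on error. */
static ssize_t send_zero_copy(int pfd[2], int out_fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t total = 0;

    assert(buf != NULL && len > 0);

    while (iov.iov_len > 0) {
        /* Step 1: attach the user pages to the pipe without copying.
         * A pipe has limited capacity, so this may accept only part. */
        ssize_t in = vmsplice(pfd[1], &iov, 1, 0);
        if (in <= 0)
            return -1;
        iov.iov_base = (char *)iov.iov_base + in;
        iov.iov_len -= (size_t)in;

        /* Step 2: move what is now in the pipe on to the destination. */
        while (in > 0) {
            ssize_t out = splice(pfd[0], NULL, out_fd, NULL, (size_t)in, SPLICE_F_MOVE);
            if (out <= 0)
                return -1;
            in -= out;
            total += out;
        }
    }
    return total;
}
```

Because the pages are shared rather than copied, the caller should not modify `buf` until the kernel has finished with the data. Reusing one pipe across packets, as this sketch assumes, at least avoids creating and closing a pipe for every send.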

Asynchronous packet socket writing with PACKET_TX_RING

2016 update This post is quite old by now. For a more recent example, take a look at

In my last post, I showed how you can read packets enqueued on a packet socket without system calls, by setting up a memory mapped ring buffer between kernel and userspace. Since version 2.6.31, the kernel also supports a transmission ring (or at least, the macro exists since that version; I tested this code against version 2.6.36).

Setting up a transmission ring is trivial once you know how to create a reception ring. In the setup snippet of the previous post, simply change the call to init_packetsock to read
  fd = init_packetsock(&ring, PACKET_TX_RING);
Then, at runtime, write packets as follows:
  /// transmit a packet using packet ring
  // NOTE: for high rate processing try to batch system calls,
  // by writing multiple packets to the ring before calling send()
  //
  // @param pkt is a packet from the network la…
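Separately from the listing above, the following is a minimal sketch of the ring-side half of a transmit: copying one packet into a free frame and marking it for the kernel. It assumes a TPACKET_V1 ring (struct tpacket_hdr) in which every frame is `frame_size` bytes; the helper name `tx_enqueue` and its parameters are hypothetical. The actual transmission still needs a send() on the packet socket afterwards, which is where the batching advice in the comment above applies.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): place one packet in
 * frame `idx` of a TPACKET_V1 transmit ring and flag it for sending.
 * Returns 0 on success, -1 if the frame is not free or the packet does
 * not fit. */
static int tx_enqueue(char *ring, unsigned idx, size_t frame_size,
                      const void *pkt, size_t len)
{
    struct tpacket_hdr *hdr;
    /* With TPACKET_V1, packet data starts right after the aligned header. */
    size_t off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    assert(ring != NULL && pkt != NULL);
    hdr = (struct tpacket_hdr *)(ring + (size_t)idx * frame_size);

    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return -1;                      /* kernel has not released this frame yet */
    if (off + len > frame_size)
        return -1;                      /* packet too large for one frame */

    memcpy((char *)hdr + off, pkt, len);
    hdr->tp_len = (unsigned int)len;
    __sync_synchronize();               /* publish the data before the status flag */
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}
```

After one or more successful calls, a single `send(fd, NULL, 0, 0)` on the packet socket asks the kernel to transmit every frame marked TP_STATUS_SEND_REQUEST.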

Asynchronous packet socket reading with PACKET_RX_RING

2016 update This post is quite old by now. For a more recent example, take a look at

Since Linux 2.6.2x, processes can read network packets asynchronously using a packet socket ring buffer. When a process sets the socket option PACKET_RX_RING (at level SOL_PACKET) on a packet socket, the kernel allocates a ring buffer to hold packets. It then copies every packet that the caller would otherwise have had to fetch with read() into this ring buffer. The caller maps the ring into its virtual memory by executing an mmap() call on the packet socket and from then on can read packets without issuing any system calls. It signals the kernel that it has finished processing a packet by setting a value in a header structure that is prefixed to the packet. Once the caller has processed all outstanding packets, it can block by issuing a select() involving the packet socket.
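The ownership handshake described above can be sketched as follows. This is an illustration under stated assumptions, not the post's own code: it assumes a TPACKET_V1 ring (struct tpacket_hdr) with `frame_nr` frames of `frame_size` bytes each, already mapped at `ring`, and the names `rx_drain`, `pkt_handler` and `count_bytes` are hypothetical.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stddef.h>

/* Hypothetical callback type: receives one captured packet. */
typedef void (*pkt_handler)(const unsigned char *data, unsigned len, void *ctx);

/* Example handler: adds the captured length to a running total. */
static void count_bytes(const unsigned char *data, unsigned len, void *ctx)
{
    (void)data;
    *(unsigned long *)ctx += len;
}

/* Hypothetical helper: process every frame that the kernel has handed to
 * user space, starting at *idx, and give each frame back afterwards.
 * Returns the number of frames handled; *idx is left at the next frame. */
static unsigned rx_drain(char *ring, unsigned frame_nr, size_t frame_size,
                         unsigned *idx, pkt_handler fn, void *ctx)
{
    unsigned handled = 0;

    assert(ring != NULL && idx != NULL && fn != NULL && *idx < frame_nr);

    while (handled < frame_nr) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)(ring + (size_t)*idx * frame_size);

        if (!(hdr->tp_status & TP_STATUS_USER))
            break;                      /* kernel still owns it: time to select() */

        /* tp_mac is the offset of the link-layer header within the frame. */
        fn((const unsigned char *)hdr + hdr->tp_mac, hdr->tp_snaplen, ctx);

        __sync_synchronize();           /* finish reading before releasing */
        hdr->tp_status = TP_STATUS_KERNEL;
        *idx = (*idx + 1) % frame_nr;
        handled++;
    }
    return handled;
}
```

When `rx_drain` returns 0, nothing is pending and the caller can block in select() or poll() on the packet socket before trying again.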

This snippet shows how to set up a packet socket with ring
  #include <stdlib.h>…

HOWTO: bind to a non local address (transparent proxy)

In certain situations, you may want to send packets as if they're coming from a different computer. Linux prevents such IP address spoofing by default, because its best-known use is as a malicious attack. Still, there are legitimate reasons. For instance, a transparent proxy intercepts traffic and replies in the name of the original destination. Especially with larger sites, it is common to set up a virtual destination address and have a set of servers handle the load by mimicking this virtual host.

In Linux 2.6+, to spoof packets in IPv4, bind an INET socket to a non-local address, as in this straightforward example:

  #include <errno.h>
  #include <netinet/in.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <unistd.h>

  #define SPOOFPORT 80 // really, whichever you want
  #define SPOOFADDR ((214U << 24) + 1) // address to impersonate (unsigned, to avoid signed overflow)

  int…
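Independently of the listing above, here is a minimal sketch of one way to bind to a non-local IPv4 address on Linux. It relies on the IP_FREEBIND socket option from ip(7); whether the original example uses this option or another mechanism (such as IP_TRANSPARENT, which needs extra privileges) is an assumption left open here. The helper name `bind_nonlocal` is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): return a TCP socket
 * bound to addr:port even if addr is not configured on this host, or -1
 * on failure. Assumes Linux, where IP_FREEBIND is available. */
static int bind_nonlocal(const char *addr, unsigned short port)
{
    struct sockaddr_in sin;
    int one = 1;
    int fd;

    assert(addr != NULL);

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
        return -1;                      /* not a valid dotted-quad address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* IP_FREEBIND lets bind() accept an address that is not local. */
    if (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as `bind_nonlocal("214.0.0.1", 8080)` would then bind to the address used in the listing above; binding to port 80 itself additionally requires the privilege to use ports below 1024.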