UDP Tunnel
Idea
Normally, when we talk about network tunneling, it involves the presence of a tunnel interface.
This tunnel interface is a purely software interface, residing in the network (IP) stack, with its own configuration. Upon writing to it (sending a packet through it), the tunnel interface wraps the packet with outer headers (encapsulates the packet) and sends it over some virtual link (typically a TCP or UDP socket, or some lower-level connection-oriented or connectionless transport). Similarly, upon receiving from that virtual link, the tunnel interface strips the outer headers (decapsulates the packet) and delivers the packet to its destination in the internal network.
The aim of this tutorial is to explore a different approach to tunneling, without the need for a dedicated (additional) tunnel interface. Instead, we will simply put an anchor on an existing interface to monitor the incoming packets, and if they are destined for the other end of our tunnel, we will tunnel them by sending them over a UDP socket.
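To make the idea concrete before switching to the PacketCord API, here is a minimal conceptual sketch of the same approach using nothing but the plain Linux socket API: a packet socket acts as the anchor on the ingress interface, and an ordinary UDP socket acts as the virtual link. It is purely illustrative (IPv4 only, no destination matching, no error handling), and the interface name and peer address are placeholders, not values used later in this tutorial.
/* Conceptual sketch only: raw anchor -> UDP "virtual link". Not PacketCord code. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* Anchor: a packet socket that sees every IPv4 frame arriving on the interface. */
    int anchor = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
    struct sockaddr_ll sll = { .sll_family = AF_PACKET,
                               .sll_protocol = htons(ETH_P_IP),
                               .sll_ifindex = if_nametoindex("eth0") };   /* placeholder interface */
    bind(anchor, (struct sockaddr *)&sll, sizeof(sll));

    /* Virtual link: an ordinary UDP socket towards the other end of the tunnel. */
    int link = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in peer = { .sin_family = AF_INET, .sin_port = htons(50000) };
    inet_pton(AF_INET, "203.0.113.1", &peer.sin_addr);   /* placeholder peer address */

    unsigned char frame[2048];
    for (;;) {
        ssize_t n = recv(anchor, frame, sizeof(frame), 0);
        if (n <= 14)
            continue;   /* too short to carry an Ethernet header plus payload */
        /* A real tunnel would first check that the inner destination matches the tunneled
         * prefix; here we simply forward the L3 portion (everything after the 14-byte
         * Ethernet header) as the UDP payload. */
        sendto(link, frame + 14, (size_t)n - 14, 0, (struct sockaddr *)&peer, sizeof(peer));
    }
}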
Setup and topology
For demonstration purposes, we will build a setup with a physically connected bare-metal Raspberry Pi and a bare-metal PC acting as the tunnel termination. This aims to show the capability of the PacketCord library to work consistently across different processor architectures, including x86-64 (a PC with a Ryzen CPU) and ARM32 (a fairly ancient Raspberry Pi 2 Model B Rev 1.1). The network topology is depicted in the diagrams below:
The other side of the tunnel (we will call it the server side) will be an Ubuntu server with a network namespace.
Encapsulation
For the sake of simplicity, the UDP payload will be the L3-and-above portion of the incoming packet, thus giving us Layer 3 tunnel functionality.
If needed, we can encrypt any part of the encapsulated packet (headers, payload) directly from the code, by using the CORD-CRYPTO library.
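On the wire, an encapsulated packet therefore looks as follows: the outer Ethernet, IPv4 and UDP headers are produced by the host's regular UDP stack, and our code only supplies the inner IPv4 packet as the datagram payload.
+----------------+-------------------+-----------------+---------------------------------------+
| outer Ethernet | outer IPv4 (20 B) | outer UDP (8 B) | inner IPv4 packet (headers + payload) |
+----------------+-------------------+-----------------+---------------------------------------+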
Code
We will use the example from the official PacketCord.io repo. We only need to modify the IP addresses to match our setup, for both the client and the server side.
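Concretely, the values to adapt are the MATCH_IP_TO_TUNNEL / MATCH_NETMASK defines, the interface name given to the L2 raw socket flow point, and the addresses and ports given to the L4 UDP flow point. Judging from the client and server values below and from the port the app later reports as bound, the UDP flow point arguments appear to be ordered as local address, remote address, local port, remote port; treat this ordering as an assumption and verify it against the PacketCord headers.
/* Assumed argument order (verify against cord_l4_udp_flow_point.h):
 *   id, local address, remote address, local UDP port, remote UDP port */
CORD_CREATE_L4_UDP_FLOW_POINT('B',
                              inet_addr("192.168.100.5"),   /* local address to bind to */
                              inet_addr("38.242.203.214"),  /* remote tunnel endpoint   */
                              60000,                        /* local UDP port           */
                              50000);                       /* remote UDP port          */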
Client code
This is the code that terminates the right side of the tunnel on the above diagram. It runs on the Ubuntu PC that is directly connected to the Raspberry Pi.
#include <cord_flow/event_handler/cord_linux_api_event_handler.h>
#include <cord_flow/flow_point/cord_l2_raw_socket_flow_point.h>
#include <cord_flow/flow_point/cord_l3_stack_inject_flow_point.h>
#include <cord_flow/flow_point/cord_l4_udp_flow_point.h>
#include <cord_flow/memory/cord_memory.h>
#include <cord_flow/match/cord_match.h>
#include <cord_error.h>
/* Standard headers for inet_pton()/inet_addr(), signal() and errno
 * (they may already be pulled in transitively by the cord headers). */
#include <arpa/inet.h>
#include <errno.h>
#include <signal.h>
#define MTU_SIZE 1420
#define ETHERNET_HEADER_SIZE 14
#define DOT1Q_TAG_SIZE 4
#define BUFFER_SIZE (MTU_SIZE + ETHERNET_HEADER_SIZE)
#define MATCH_IP_TO_TUNNEL "11.11.11.100"
#define MATCH_NETMASK "255.255.255.255"
static struct
{
CordFlowPoint *l2_eth;
CordFlowPoint *l3_si;
CordFlowPoint *l4_udp;
CordEventHandler *evh;
} cord_app_context;
static void cord_app_setup(void)
{
CORD_LOG("[CordApp] Expecting manual additional setup - blackhole routes, interface MTU.\n");
}
static void cord_app_cleanup(void)
{
CORD_LOG("[CordApp] Destroying all objects!\n");
CORD_DESTROY_FLOW_POINT(cord_app_context.l2_eth);
CORD_DESTROY_FLOW_POINT(cord_app_context.l3_si);
CORD_DESTROY_FLOW_POINT(cord_app_context.l4_udp);
CORD_DESTROY_EVENT_HANDLER(cord_app_context.evh);
CORD_LOG("[CordApp] Expecting manual additional cleanup.\n");
}
static void cord_app_sigint_callback(int sig)
{
cord_app_cleanup();
CORD_LOG("[CordApp] Terminating the PacketCord Tunnel App!\n");
CORD_ASYNC_SAFE_EXIT(CORD_OK);
}
int main(void)
{
struct in_addr prefix_ip, netmask;
inet_pton(AF_INET, MATCH_IP_TO_TUNNEL, &prefix_ip);
inet_pton(AF_INET, MATCH_NETMASK, &netmask);
cord_retval_t cord_retval;
CORD_BUFFER(buffer, BUFFER_SIZE);
size_t rx_bytes = 0;
size_t tx_bytes = 0;
cord_ipv4_hdr_t *ip = NULL;
cord_udp_hdr_t *udp = NULL;
CORD_LOG("[CordApp] Launching the PacketCord Tunnel App!\n");
signal(SIGINT, cord_app_sigint_callback);
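/* Flow points: an L2 raw socket on the physical interface (our packet anchor), an L3 stack-inject
 * flow point used to hand decapsulated packets to the local IP stack, and a UDP flow point that
 * carries the encapsulated traffic to the tunnel peer. */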
cord_app_context.l2_eth = CORD_CREATE_L2_RAW_SOCKET_FLOW_POINT('A', "enp6s0");
cord_app_context.l3_si = CORD_CREATE_L3_STACK_INJECT_FLOW_POINT('I');
cord_app_context.l4_udp = CORD_CREATE_L4_UDP_FLOW_POINT('B', inet_addr("192.168.100.5"), inet_addr("38.242.203.214"), 60000, 50000);
cord_app_context.evh = CORD_CREATE_LINUX_API_EVENT_HANDLER('E', -1);
cord_retval = CORD_EVENT_HANDLER_REGISTER_FLOW_POINT(cord_app_context.evh, cord_app_context.l2_eth);
cord_retval = CORD_EVENT_HANDLER_REGISTER_FLOW_POINT(cord_app_context.evh, cord_app_context.l4_udp);
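/* Main loop: wait for I/O events on the registered flow points and dispatch below. */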
while (1)
{
int nb_fds = CORD_EVENT_HANDLER_WAIT(cord_app_context.evh);
if (nb_fds == -1)
{
if (errno == EINTR)
continue;
else
{
CORD_ERROR("[CordApp] Error: CORD_EVENT_HANDLER_WAIT()");
CORD_EXIT(CORD_ERR);
}
}
for (uint8_t n = 0; n < nb_fds; n++)
{
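/* Event on the physical interface: validate the frame and, if its destination matches the
 * tunneled prefix, send its L3 portion over the UDP flow point (encapsulation). */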
if (cord_app_context.evh->events[n].data.fd == cord_app_context.l2_eth->io_handle)
{
cord_retval = CORD_FLOW_POINT_RX(cord_app_context.l2_eth, buffer, BUFFER_SIZE, &rx_bytes);
if (cord_retval != CORD_OK)
continue; // Raw socket receive error
if (rx_bytes < sizeof(cord_eth_hdr_t))
continue; // Packet too short to contain Ethernet header
cord_eth_hdr_t *eth = cord_get_eth_hdr(buffer);
if (!cord_match_eth_type(eth, CORD_ETH_P_IP))
continue; // Only handle IPv4 packets
if (rx_bytes < sizeof(cord_eth_hdr_t) + sizeof(cord_ipv4_hdr_t))
continue; // Too short for IP header
ip = cord_get_ipv4_hdr_from_eth(eth);
if (!cord_match_ipv4_version(ip))
continue; // Not IPv4
int iphdr_len = cord_get_ipv4_header_length(ip);
if (rx_bytes < sizeof(cord_eth_hdr_t) + iphdr_len)
continue; // IP header incomplete
if (CORD_L2_RAW_SOCKET_FLOW_POINT_ENSURE_INBOUD(cord_app_context.l2_eth) != CORD_OK)
continue; // Ensure this is not an outgoing packet
if (rx_bytes < sizeof(cord_eth_hdr_t) + iphdr_len + sizeof(cord_udp_hdr_t))
continue; // Too short for UDP header
udp = cord_get_udp_hdr_ipv4(ip);
uint32_t src_ip = cord_get_ipv4_src_addr_ntohl(ip);
uint32_t dst_ip = cord_get_ipv4_dst_addr_ntohl(ip);
if (cord_match_ipv4_dst_subnet(ip, cord_ntohl(prefix_ip.s_addr), cord_ntohl(netmask.s_addr)))
{
uint16_t total_len = cord_get_ipv4_total_length_ntohs(ip);
cord_retval = CORD_FLOW_POINT_TX(cord_app_context.l4_udp, ip, total_len, &tx_bytes);
if (cord_retval != CORD_OK)
{
// Handle the error
}
}
}
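/* Event on the UDP socket: the datagram payload is a complete inner IPv4 packet from the
 * tunnel peer; point the stack-inject flow point at its destination and hand it to the
 * local IP stack (decapsulation). */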
if (cord_app_context.evh->events[n].data.fd == cord_app_context.l4_udp->io_handle)
{
cord_retval = CORD_FLOW_POINT_RX(cord_app_context.l4_udp, buffer, BUFFER_SIZE, &rx_bytes);
if (cord_retval != CORD_OK)
continue; // Raw socket receive error
cord_ipv4_hdr_t *ip_inner = cord_get_ipv4_hdr_l3(buffer);
if (rx_bytes < sizeof(cord_ipv4_hdr_t))
continue; // Too short to contain an IPv4 header
if (rx_bytes != cord_get_ipv4_total_length_ntohs(ip_inner))
continue; // Packet partially received
if (!cord_match_ipv4_version(ip_inner))
continue;
int ip_inner_hdrlen = cord_get_ipv4_header_length(ip_inner);
CORD_L3_STACK_INJECT_FLOW_POINT_SET_TARGET_IPV4(cord_app_context.l3_si, cord_get_ipv4_dst_addr_l3(ip_inner));
cord_retval = CORD_FLOW_POINT_TX(cord_app_context.l3_si, buffer, cord_get_ipv4_total_length_ntohs(ip_inner), &tx_bytes);
if (cord_retval != CORD_OK)
{
// Handle the error
}
}
}
}
cord_app_cleanup();
return CORD_OK;
}
Server code
This is the code that terminates the other end of the tunnel. It runs on the Ubuntu server (the VPS) and mirrors the client code; only the anchored interface (veth0), the tunnel endpoint addresses and ports, and the match prefix (192.168.111.2, the address of the Raspberry Pi client) differ.
#include <cord_flow/event_handler/cord_linux_api_event_handler.h>
#include <cord_flow/flow_point/cord_l2_raw_socket_flow_point.h>
#include <cord_flow/flow_point/cord_l3_stack_inject_flow_point.h>
#include <cord_flow/flow_point/cord_l4_udp_flow_point.h>
#include <cord_flow/memory/cord_memory.h>
#include <cord_flow/match/cord_match.h>
#include <cord_error.h>
/* Standard headers for inet_pton()/inet_addr(), signal() and errno
 * (they may already be pulled in transitively by the cord headers). */
#include <arpa/inet.h>
#include <errno.h>
#include <signal.h>
#define MTU_SIZE 1420
#define ETHERNET_HEADER_SIZE 14
#define DOT1Q_TAG_SIZE 4
#define BUFFER_SIZE (MTU_SIZE + ETHERNET_HEADER_SIZE)
#define MATCH_IP_TO_TUNNEL "192.168.111.2"
#define MATCH_NETMASK "255.255.255.255"
static struct
{
CordFlowPoint *l2_eth;
CordFlowPoint *l3_si;
CordFlowPoint *l4_udp;
CordEventHandler *evh;
} cord_app_context;
static void cord_app_setup(void)
{
CORD_LOG("[CordApp] Expecting manual additional setup - blackhole routes, interface MTU.\n");
}
static void cord_app_cleanup(void)
{
CORD_LOG("[CordApp] Destroying all objects!\n");
CORD_DESTROY_FLOW_POINT(cord_app_context.l2_eth);
CORD_DESTROY_FLOW_POINT(cord_app_context.l3_si);
CORD_DESTROY_FLOW_POINT(cord_app_context.l4_udp);
CORD_DESTROY_EVENT_HANDLER(cord_app_context.evh);
CORD_LOG("[CordApp] Expecting manual additional cleanup.\n");
}
static void cord_app_sigint_callback(int sig)
{
cord_app_cleanup();
CORD_LOG("[CordApp] Terminating the PacketCord Tunnel App!\n");
CORD_ASYNC_SAFE_EXIT(CORD_OK);
}
int main(void)
{
struct in_addr prefix_ip, netmask;
inet_pton(AF_INET, MATCH_IP_TO_TUNNEL, &prefix_ip);
inet_pton(AF_INET, MATCH_NETMASK, &netmask);
cord_retval_t cord_retval;
CORD_BUFFER(buffer, BUFFER_SIZE);
size_t rx_bytes = 0;
size_t tx_bytes = 0;
cord_ipv4_hdr_t *ip = NULL;
cord_udp_hdr_t *udp = NULL;
CORD_LOG("[CordApp] Launching the PacketCord Tunnel App!\n");
signal(SIGINT, cord_app_sigint_callback);
cord_app_context.l2_eth = CORD_CREATE_L2_RAW_SOCKET_FLOW_POINT('A', "veth0");
cord_app_context.l3_si = CORD_CREATE_L3_STACK_INJECT_FLOW_POINT('I');
cord_app_context.l4_udp = CORD_CREATE_L4_UDP_FLOW_POINT('B', inet_addr("38.242.203.214"), inet_addr("78.83.207.86"), 50000, 60000);
cord_app_context.evh = CORD_CREATE_LINUX_API_EVENT_HANDLER('E', -1);
cord_retval = CORD_EVENT_HANDLER_REGISTER_FLOW_POINT(cord_app_context.evh, cord_app_context.l2_eth);
cord_retval = CORD_EVENT_HANDLER_REGISTER_FLOW_POINT(cord_app_context.evh, cord_app_context.l4_udp);
while (1)
{
int nb_fds = CORD_EVENT_HANDLER_WAIT(cord_app_context.evh);
if (nb_fds == -1)
{
if (errno == EINTR)
continue;
else
{
CORD_ERROR("[CordApp] Error: CORD_EVENT_HANDLER_WAIT()");
CORD_EXIT(CORD_ERR);
}
}
for (uint8_t n = 0; n < nb_fds; n++)
{
if (cord_app_context.evh->events[n].data.fd == cord_app_context.l2_eth->io_handle)
{
cord_retval = CORD_FLOW_POINT_RX(cord_app_context.l2_eth, buffer, BUFFER_SIZE, &rx_bytes);
if (cord_retval != CORD_OK)
continue; // Raw socket receive error
if (rx_bytes < sizeof(cord_eth_hdr_t))
continue; // Packet too short to contain Ethernet header
cord_eth_hdr_t *eth = cord_get_eth_hdr(buffer);
if (!cord_match_eth_type(eth, CORD_ETH_P_IP))
continue; // Only handle IPv4 packets
if (rx_bytes < sizeof(cord_eth_hdr_t) + sizeof(cord_ipv4_hdr_t))
continue; // Too short for IP header
ip = cord_get_ipv4_hdr_from_eth(eth);
if (!cord_match_ipv4_version(ip))
continue; // Not IPv4
int iphdr_len = cord_get_ipv4_header_length(ip);
if (rx_bytes < sizeof(cord_eth_hdr_t) + iphdr_len)
continue; // IP header incomplete
if (CORD_L2_RAW_SOCKET_FLOW_POINT_ENSURE_INBOUD(cord_app_context.l2_eth) != CORD_OK)
continue; // Ensure this is not an outgoing packet
if (rx_bytes < sizeof(cord_eth_hdr_t) + iphdr_len + sizeof(cord_udp_hdr_t))
continue; // Too short for UDP header
udp = cord_get_udp_hdr_ipv4(ip);
uint32_t src_ip = cord_get_ipv4_src_addr_ntohl(ip);
uint32_t dst_ip = cord_get_ipv4_dst_addr_ntohl(ip);
if (cord_match_ipv4_dst_subnet(ip, cord_ntohl(prefix_ip.s_addr), cord_ntohl(netmask.s_addr)))
{
uint16_t total_len = cord_get_ipv4_total_length_ntohs(ip);
cord_retval = CORD_FLOW_POINT_TX(cord_app_context.l4_udp, ip, total_len, &tx_bytes);
if (cord_retval != CORD_OK)
{
// Handle the error
}
}
}
if (cord_app_context.evh->events[n].data.fd == cord_app_context.l4_udp->io_handle)
{
cord_retval = CORD_FLOW_POINT_RX(cord_app_context.l4_udp, buffer, BUFFER_SIZE, &rx_bytes);
if (cord_retval != CORD_OK)
continue; // Raw socket receive error
cord_ipv4_hdr_t *ip_inner = cord_get_ipv4_hdr_l3(buffer);
if (rx_bytes < sizeof(cord_ipv4_hdr_t))
continue; // Too short to contain an IPv4 header
if (rx_bytes != cord_get_ipv4_total_length_ntohs(ip_inner))
continue; // Packet partially received
if (!cord_match_ipv4_version(ip_inner))
continue;
int ip_inner_hdrlen = cord_get_ipv4_header_length(ip_inner);
CORD_L3_STACK_INJECT_FLOW_POINT_SET_TARGET_IPV4(cord_app_context.l3_si, cord_get_ipv4_dst_addr_l3(ip_inner));
cord_retval = CORD_FLOW_POINT_TX(cord_app_context.l3_si, buffer, cord_get_ipv4_total_length_ntohs(ip_inner), &tx_bytes);
if (cord_retval != CORD_OK)
{
// Handle the error
}
}
}
}
cord_app_cleanup();
return CORD_OK;
}
Configuration
We will perform the necessary additional configuration manually. The user can also make use of static void cord_app_setup(void) and static void cord_app_cleanup(void) for that purpose, but in order to keep the tutorial simple and clear, we will do the fine tuning separately.
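For completeness, here is one way the client-side steps below could be automated from cord_app_setup(). This is only a sketch: it assumes that shelling out with system() is acceptable in your deployment, it uses the interface name and prefix from this tutorial, and cord_app_restore_config() is a hypothetical helper. Note that system() is not async-signal-safe, so you would not want to call it from the SIGINT handler path.
#include <stdlib.h>   /* for system() */

static void cord_app_setup(void)
{
    /* Sketch: automate the manual client-side tuning (MTU + blackhole route) shown below. */
    system("ip link set mtu 1420 dev enp6s0");
    system("ip route add blackhole 11.11.11.0/24");
}

/* Hypothetical helper to undo the configuration; call it on normal shutdown,
 * not from the asynchronous SIGINT handler. */
static void cord_app_restore_config(void)
{
    system("ip route del blackhole 11.11.11.0/24");
    system("ip link set mtu 1500 dev enp6s0");
}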
Client side
Our client side comprises two devices: the Raspberry Pi client and the directly connected PC that terminates the tunnel (and runs the PacketCord app). The only consideration we need to take care of is reducing the MTU on the link between the two points, in order to account for the overhead added by our UDP encapsulation; let's set it to 1420.
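As a quick sanity check on that number (assuming a standard 1500-byte Ethernet MTU along the path and no extra encryption overhead):
outer IPv4 header:          20 bytes
outer UDP header:            8 bytes
encapsulation overhead:     28 bytes
largest safe inner packet:  1500 - 28 = 1472 bytes
chosen inner MTU:           1420 bytes (52 bytes of spare headroom)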
Raspberry Pi
sudo ip link set mtu 1420 dev eth0
Host
sudo ip link set mtu 1420 dev enp6s0
We also need to blackhole the route towards the target tunnel subnet. This is because we are using a normal (v1) raw socket, which only receives a copy of each packet while the kernel keeps processing the original. Our app wraps the user-space copy, so we need to get rid of the packet inside the kernel.
sudo ip route add blackhole 11.11.11.0/24
Server side
The server side is similar to the client side in terms of termination and endpoint organisation, but instead of a physical device as the endpoint, we will create a network namespace connected to the host's IP stack via a veth (virtual Ethernet) pair:
# 1. Create the namespace
sudo ip netns add ns1
# 2. Create the veth pair
sudo ip link add veth0 type veth peer name veth1
# 3. Move veth1 to namespace ns1
sudo ip link set veth1 netns ns1
# 4. Assign IP address to veth0 (host side)
sudo ip addr add 11.11.11.1/24 dev veth0
# 5. Bring up veth0
sudo ip link set veth0 up
# 6. Inside the namespace, assign IP to veth1
sudo ip netns exec ns1 ip addr add 11.11.11.100/24 dev veth1
# 7. Bring up veth1 inside namespace
sudo ip netns exec ns1 ip link set veth1 up
# 8. Bring up the loopback interface inside namespace
sudo ip netns exec ns1 ip link set lo up
# 9. Set default route inside namespace via veth0's IP
sudo ip netns exec ns1 ip route add default via 11.11.11.1
# 10. Disable the offload functionality on the vethX interfaces
sudo ethtool --offload veth0 rx off tx off
sudo ip netns exec ns1 ethtool --offload veth1 rx off tx off
# 11. Set the MTU size on both ends of the veth pair
sudo ip link set mtu 1420 dev veth0
sudo ip netns exec ns1 ip link set mtu 1420 dev veth1
Just like on the client side, we also need the blackhole route trick:
sudo ip route add blackhole 192.168.111.0/30
Compile and run
On both sides, compile the application and run it with sudo:
mkdir build
cd build/
cmake ..
make
sudo ./apps/l3_tunnel/l3_tunnel_app
If no errors have occurred, the console output should look something like:
[CordApp] Launching the PacketCord Tunnel App!
[CordL4UdpFlowPoint] Successfully bound to port 60000
Ping test
Let's initiate the ping from the client side towards the server first (since the client is behind NAT, the first packet has to come from it in order to open the NAT mapping).
$ ping 11.11.11.100 -c 4
PING 11.11.11.100 (11.11.11.100) 56(84) bytes of data.
64 bytes from 11.11.11.100: icmp_seq=1 ttl=64 time=40.6 ms
64 bytes from 11.11.11.100: icmp_seq=2 ttl=64 time=43.9 ms
64 bytes from 11.11.11.100: icmp_seq=3 ttl=64 time=39.1 ms
64 bytes from 11.11.11.100: icmp_seq=4 ttl=64 time=40.1 ms
Then, the reverse path (from server to the client end) should also work:
# ip netns exec ns1 ping 192.168.111.2 -c 4
PING 192.168.111.2 (192.168.111.2) 56(84) bytes of data.
64 bytes from 192.168.111.2: icmp_seq=1 ttl=64 time=39.3 ms
64 bytes from 192.168.111.2: icmp_seq=2 ttl=64 time=40.1 ms
64 bytes from 192.168.111.2: icmp_seq=3 ttl=64 time=39.8 ms
64 bytes from 192.168.111.2: icmp_seq=4 ttl=64 time=39.3 ms
Traffic capture
Here are some hints for sniffing the packets.
In the example below, we run the following command on the server to observe the ICMP packets encapsulated within UDP:
# tcpdump -n -i eth0 udp port 50000
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:58:58.017466 IP 78.83.207.86.60000 > 38.242.203.214.50000: UDP, length 84
17:58:58.018840 IP 38.242.203.214.50000 > 78.83.207.86.60000: UDP, length 84
17:58:59.018242 IP 78.83.207.86.60000 > 38.242.203.214.50000: UDP, length 84
17:58:59.018543 IP 38.242.203.214.50000 > 78.83.207.86.60000: UDP, length 84
17:59:00.019320 IP 78.83.207.86.60000 > 38.242.203.214.50000: UDP, length 84
17:59:00.019538 IP 38.242.203.214.50000 > 78.83.207.86.60000: UDP, length 84
Again, on the server, we are observing the decapsulated ICMP packets between the two IPs that we are tunneling (namely 192.168.111.2/32 and 11.11.11.100/32):
# ip netns exec ns1 tcpdump -ni veth1
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C18:02:48.072207 IP 192.168.111.2 > 11.11.11.100: ICMP echo request, id 40, seq 1, length 64
18:02:48.072234 IP 11.11.11.100 > 192.168.111.2: ICMP echo reply, id 40, seq 1, length 64
18:02:49.076955 IP 192.168.111.2 > 11.11.11.100: ICMP echo request, id 40, seq 2, length 64
18:02:49.076981 IP 11.11.11.100 > 192.168.111.2: ICMP echo reply, id 40, seq 2, length 64
18:02:50.073511 IP 192.168.111.2 > 11.11.11.100: ICMP echo request, id 40, seq 3, length 64
18:02:50.073539 IP 11.11.11.100 > 192.168.111.2: ICMP echo reply, id 40, seq 3, length 64
Result
Now that both sides can ping each other, we can try a performance test.
On the server, we start the iperf3 server:
ip netns exec ns1 iperf3 -s 11.11.11.100
And on the Raspberry Pi, we launch the client test:
iperf3 -c 11.11.11.100
Connecting to host 11.11.11.100, port 5201
[ 5] local 192.168.111.2 port 37940 connected to 11.11.11.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 3.95 MBytes 33.1 Mbits/sec 77 207 KBytes
[ 5] 1.00-2.00 sec 4.05 MBytes 33.9 Mbits/sec 19 167 KBytes
[ 5] 2.00-3.00 sec 2.82 MBytes 23.7 Mbits/sec 3 132 KBytes
[ 5] 3.00-4.00 sec 2.94 MBytes 24.7 Mbits/sec 0 146 KBytes
[ 5] 4.00-5.00 sec 3.00 MBytes 25.2 Mbits/sec 3 122 KBytes
[ 5] 5.00-6.00 sec 1.35 MBytes 11.3 Mbits/sec 4 70.8 KBytes
[ 5] 6.00-7.00 sec 1.53 MBytes 12.9 Mbits/sec 5 42.8 KBytes
[ 5] 7.00-8.00 sec 628 KBytes 5.14 Mbits/sec 8 17.4 KBytes
[ 5] 8.00-9.00 sec 628 KBytes 5.15 Mbits/sec 0 33.4 KBytes
[ 5] 9.00-10.00 sec 314 KBytes 2.57 Mbits/sec 8 10.7 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 21.2 MBytes 17.8 Mbits/sec 127 sender
[ 5] 0.00-10.05 sec 20.4 MBytes 17.0 Mbits/sec receiver
Note: We like and respect iperf and iperf3, but real performance tests should be done with professional equipment and software dedicated to that purpose. Also, the results may vary due to the different network appliances between the two endpoints. At the moment of writing this material, we are using a VPS (virtual private server) from a public hosting provider, so we have no control over the security, traffic metering and shaping along the path to our target destination.
Outro
We have managed to successfully demonstrate a container- and virtualisation-friendly tunneling solution (using both bare metal and Linux namespaces): user-space, software-defined, at source code level (we could also refer to the compiled output as an NFV application), and not reliant on the old-school tunneling methods. On-demand encryption that utilises the CPU instructions for hardware acceleration could be added via the CORD-CRYPTO library.
Just what is required, in a fully transparent, programmable way: nothing more, nothing less. And the focus is on network programming, not merely on editing the configuration files of something that already exists. You can always react to the market needs and to forthcoming regulations (like the European CRA), and you can run your custom programmable packet processing logic not only on server-grade hardware, but also on embedded devices like the Raspberry Pi and Toradex.