Virtual network interfaces in Linux



Author: Suzhou program white
About the author: member of the China DBA Alliance (ACDU) and administrator of a nationwide programmers' community on CSDN. Currently works on industrial automation software development, with experience in C#, Java, machine vision and low-level algorithms. Founded Qiyue software studio in 2019.
Preface


Note: network configuration created or modified with the ip command in this article is not persistent and disappears when the host restarts.

Linux has powerful virtual networking capabilities, which underpin virtual networks such as OpenStack networking, Docker container networks and Kubernetes networks.

Here we introduce the types of virtual network interfaces commonly used in Linux: TUN/TAP, bridge, veth, ipvlan/macvlan, vlan and vxlan/geneve.

tun/tap virtual network interface

tun/tap are virtual network devices in the operating system kernel. They let user-space programs receive and send network data.

An ordinary physical network interface such as eth0 has the kernel protocol stack on one end and the external physical network on the other.

For a TUN/TAP virtual interface such as tun0, one end is always a user-space program, while the other end depends on the configuration: it can be attached directly to the kernel protocol stack or to a bridge (described later). Linux provides TUN/TAP through the kernel module tun, which exposes the device file /dev/net/tun. A user-space program reads and writes /dev/net/tun to exchange data with the host kernel protocol stack.

    > modinfo tun
    filename:       /lib/modules/5.13.6-1-default/kernel/drivers/net/tun.ko.xz
    alias:          devname:net/tun
    alias:          char-major-10-200
    license:        GPL
    author:         (C) 1999-2004 Max Krasnyansky <[email protected]>
    description:    Universal TUN/TAP device driver
    ...

    > ls /dev/net/tun
    /dev/net/tun

An example diagram of a TUN device is as follows:

    +----------------------------------------------------------------------+
    |                                                                      |
    |  +--------------------+      +--------------------+                  |
    |  | User Application A |      | User Application B +<-----+           |
    |  +------------+-------+      +-------+------------+      |           |
    |               | 1                    | 5                 |           |
    |...............+......................+...................|...........|
    |               ↓                      ↓                   |           |
    |         +----------+           +----------+              |           |
    |         | socket A |           | socket B |              |           |
    |         +-------+--+           +--+-------+              |           |
    |                 | 2               | 6                    |           |
    |.................+.................+......................|...........|
    |                 ↓                 ↓                      |           |
    |             +------------------------+          +--------+-------+   |
    |             | Network Protocol Stack |          |  /dev/net/tun  |   |
    |             +--+-------------------+-+          +--------+-------+   |
    |                | 7                 | 3                   ^           |
    |................+...................+.....................|...........|
    |                ↓                   ↓                     |           |
    |        +----------------+    +----------------+        4 |           |
    |        |      eth0      |    |      tun0      |          |           |
    |        +-------+--------+    +-----+----------+          |           |
    |    10.32.0.11  |                   |   192.168.3.11      |           |
    |                | 8                 +---------------------+           |
    |                |                                                     |
    +----------------+-----------------------------------------------------+
                     ↓
             Physical Network

Because one end of a TUN/TAP device is the kernel protocol stack, packets flowing into tun0 are first matched against the local routing rules.

Once a route matches and the packet is sent to tun0, tun0 finds that its other end is connected to application B through /dev/net/tun, so it hands the data to application B.

After processing the packet, the application may construct a new packet and send it through the physical network card. A typical VPN program, for example, encapsulates/encrypts the original packet and sends it to the VPN server.

Testing a TUN device with a C program

To use a tun/tap device, a user-space program opens /dev/net/tun with the open() system call to obtain a file descriptor (FD) for reading and writing the device, then calls ioctl() to register a virtual network card of type TUN or TAP with the kernel (that is, to instantiate a tun/tap device), which may be named tun0, tap0, and so on.

After that, the program can interact with the host kernel protocol stack (or other network devices) through the TUN/TAP virtual network card. When the program exits, the kernel releases the TUN/TAP virtual network card it registered, together with the routing table entries generated for it.

The user-space program can be regarded as another host on the network, connected through the tun/tap virtual network card.

A simple C program example is as follows. Each time it receives data, it simply prints the number of bytes received:

    #include <linux/if.h>
    #include <linux/if_tun.h>
    #include <sys/ioctl.h>

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>

    int tun_alloc(int flags)
    {
        struct ifreq ifr;
        int fd, err;
        const char *clonedev = "/dev/net/tun";

        // Open the clone device to get a file descriptor
        if ((fd = open(clonedev, O_RDWR)) < 0) {
            return fd;
        }

        memset(&ifr, 0, sizeof(ifr));
        ifr.ifr_flags = flags;

        // Register a TUN network card with the kernel and associate it with fd.
        // When the program exits, the tun network card and the routes
        // generated for it are released automatically.
        if ((err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0) {
            close(fd);
            return err;
        }

        printf("Open tun/tap device: %s for reading...\n", ifr.ifr_name);

        return fd;
    }

    int main()
    {
        int tun_fd, nread;
        char buffer[1500];

        /* Flags: IFF_TUN   - TUN device (no Ethernet headers)
         *        IFF_TAP   - TAP device
         *        IFF_NO_PI - Do not provide packet information
         */
        tun_fd = tun_alloc(IFF_TUN | IFF_NO_PI);

        if (tun_fd < 0) {
            perror("Allocating interface");
            exit(1);
        }

        while (1) {
            nread = read(tun_fd, buffer, sizeof(buffer));
            if (nread < 0) {
                perror("Reading from interface");
                close(tun_fd);
                exit(1);
            }

            printf("Read %d bytes from tun/tap device\n", nread);
        }
        return 0;
    }

Next, open three terminal windows to test the program: one for the tun program above, one for tcpdump, and one for iproute2 commands.

First, compile and run the C program above. The program blocks and waits for data to arrive:

    # Compile; some warnings can be ignored
    > gcc mytun.c -o mytun

    # Root permission is required to create and listen on tun devices
    > sudo ./mytun
    Open tun/tap device: tun0 for reading...

Now use iproute2 to look at the link-layer devices:

    # An interface named tun0 appears at the end, but its state is DOWN
    ❯ ip addr ls
    ......
    3: wlp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether c0:3c:59:36:a4:16 brd ff:ff:ff:ff:ff:ff
        inet 192.168.31.228/24 brd 192.168.31.255 scope global dynamic noprefixroute wlp4s0
           valid_lft 41010sec preferred_lft 41010sec
        inet6 fe80::4ab0:130f:423b:5d37/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
    7: tun0: <POINTOPOINT,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 500
        link/none

    # Assign an IP address to tun0. It must not be in the same network segment as another interface, or the routes will conflict
    > sudo ip addr add 172.21.22.23/24 dev tun0
    # Bring tun0 up. This automatically adds a route that sends 172.21.22.0/24 to tun0
    > sudo ip link set tun0 up

    # Confirm that the route added in the previous step exists
    ❯ ip route ls
    default via 192.168.31.1 dev wlp4s0 proto dhcp metric 600
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    172.21.22.0/24 dev tun0 proto kernel scope link src 172.21.22.23
    192.168.31.0/24 dev wlp4s0 proto kernel scope link src 192.168.31.228 metric 600

    # Check the interface again: the state of tun0 is now UNKNOWN
    > ip addr ls
    ......
    8: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 500
        link/none
        inet 172.21.22.23/24 scope global tun0
           valid_lft forever preferred_lft forever
        inet6 fe80::3d52:49b5:1cf3:38fd/64 scope link stable-privacy
           valid_lft forever preferred_lft forever

    # Capture traffic on tun0 with tcpdump; it blocks here and waits for data
    > tcpdump -i tun0

Now open the third window, send some data to tun0, and keep watching the tcpdump and mytun output from before:

    # Ping the address of tun0 itself. The data does not reach the mytun program, yet the ping is answered
    # (the kernel answers pings to one of its own addresses locally)
    ❯ ping -c 4 172.21.22.23
    PING 172.21.22.23 (172.21.22.23) 56(84) bytes of data.
    64 bytes from 172.21.22.23: icmp_seq=1 ttl=64 time=0.167 ms
    64 bytes from 172.21.22.23: icmp_seq=2 ttl=64 time=0.180 ms
    64 bytes from 172.21.22.23: icmp_seq=3 ttl=64 time=0.126 ms
    64 bytes from 172.21.22.23: icmp_seq=4 ttl=64 time=0.141 ms

    --- 172.21.22.23 ping statistics ---
    4 packets transmitted, 4 received, 0% packet loss, time 3060ms
    rtt min/avg/max/mdev = 0.126/0.153/0.180/0.021 ms

    # Pinging any other address in the segment sends the traffic to the mytun program.
    # Because mytun never replies, the packet loss is naturally 100%
    # Both tcpdump and mytun print the corresponding logs
    ❯ ping -c 4 172.21.22.26
    PING 172.21.22.26 (172.21.22.26) 56(84) bytes of data.

    --- 172.21.22.26 ping statistics ---
    4 packets transmitted, 0 received, 100% packet loss, time 3055ms

The output of mytun is shown below:

    Read 84 bytes from tun/tap device
    Read 84 bytes from tun/tap device
    Read 84 bytes from tun/tap device
    Read 84 bytes from tun/tap device
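The 84 bytes are one complete IPv4 packet: a 20-byte IP header, an 8-byte ICMP header and the 56 bytes of ping payload. Because the device was opened with IFF_TUN | IFF_NO_PI, the buffer starts directly at the IP header. The following is only a sketch of how mytun could decode that header instead of just printing the length; struct ipv4_info and parse_ipv4 are names made up for this illustration and are not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from an IPv4 header (hypothetical helper type). */
struct ipv4_info {
    uint8_t  version;     /* 4 for IPv4 */
    uint8_t  header_len;  /* header length in bytes, 20..60 */
    uint16_t total_len;   /* header + payload, in bytes */
    uint8_t  protocol;    /* 1 = ICMP, 6 = TCP, 17 = UDP */
    uint8_t  src[4];
    uint8_t  dst[4];
};

/* Decode the buffer returned by read() on a TUN fd opened with IFF_NO_PI.
 * Returns 0 on success, -1 if the buffer is not a plausible IPv4 packet. */
int parse_ipv4(const uint8_t *buf, size_t len, struct ipv4_info *out)
{
    int i;

    assert(buf != NULL && out != NULL);
    if (len < 20)
        return -1;

    out->version = (uint8_t)(buf[0] >> 4);
    out->header_len = (uint8_t)((buf[0] & 0x0F) * 4);
    if (out->version != 4 || out->header_len < 20 || out->header_len > len)
        return -1;

    /* Multi-byte fields are in network byte order (big endian). */
    out->total_len = (uint16_t)((buf[2] << 8) | buf[3]);
    out->protocol = buf[9];
    for (i = 0; i < 4; i++) {
        out->src[i] = buf[12 + i];
        out->dst[i] = buf[16 + i];
    }
    return 0;
}
```

In the read loop this could be called as parse_ipv4((const uint8_t *)buffer, (size_t)nread, &info), after which info.protocol and info.dst tell you what was pinged.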

And tcpdump output:

    00:22:03.622684 IP (tos 0x0, ttl 64, id 37341, offset 0, flags [DF], proto ICMP (1), length 84)
        172.21.22.23 > 172.21.22.26: ICMP echo request, id 11, seq 1, length 64
    00:22:04.633394 IP (tos 0x0, ttl 64, id 37522, offset 0, flags [DF], proto ICMP (1), length 84)
        172.21.22.23 > 172.21.22.26: ICMP echo request, id 11, seq 2, length 64
    00:22:05.653356 IP (tos 0x0, ttl 64, id 37637, offset 0, flags [DF], proto ICMP (1), length 84)
        172.21.22.23 > 172.21.22.26: ICMP echo request, id 11, seq 3, length 64
    00:22:06.677341 IP (tos 0x0, ttl 64, id 37667, offset 0, flags [DF], proto ICMP (1), length 84)
        172.21.22.23 > 172.21.22.26: ICMP echo request, id 11, seq 4, length 64
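The pings to 172.21.22.26 are lost only because mytun never writes anything back. As a rough sketch of what "acting as another host" would involve, the function below turns an echo request into an echo reply in place: it swaps the addresses, changes the ICMP type from 8 to 0 and recomputes both checksums. The names inet_checksum and make_echo_reply are invented for this example; writing the modified buffer back with write(tun_fd, ...) is assumed to be the way the reply would be handed to the kernel, and is not shown or tested here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071) over len bytes, 16-bit big-endian words. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                         /* odd length: pad with a zero byte */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                    /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Rewrite an IPv4 ICMP echo request as an echo reply, in place.
 * Returns 0 on success, -1 if the packet is not an echo request. */
int make_echo_reply(uint8_t *pkt, size_t len)
{
    size_t ihl, total, i;
    uint16_t csum;

    assert(pkt != NULL);
    if (len < 28 || (pkt[0] >> 4) != 4)
        return -1;
    ihl = (size_t)(pkt[0] & 0x0F) * 4;
    total = ((size_t)pkt[2] << 8) | pkt[3];
    if (ihl < 20 || total > len || total < ihl + 8)
        return -1;
    if (pkt[9] != 1 || pkt[ihl] != 8)    /* not ICMP, or not an echo request */
        return -1;

    /* Swap source and destination addresses. */
    for (i = 0; i < 4; i++) {
        uint8_t tmp = pkt[12 + i];
        pkt[12 + i] = pkt[16 + i];
        pkt[16 + i] = tmp;
    }

    /* Recompute the IPv4 header checksum. */
    pkt[10] = pkt[11] = 0;
    csum = inet_checksum(pkt, ihl);
    pkt[10] = (uint8_t)(csum >> 8);
    pkt[11] = (uint8_t)(csum & 0xFF);

    /* ICMP type 0 is echo reply; the checksum covers the whole ICMP message. */
    pkt[ihl] = 0;
    pkt[ihl + 2] = pkt[ihl + 3] = 0;
    csum = inet_checksum(pkt + ihl, total - ihl);
    pkt[ihl + 2] = (uint8_t)(csum >> 8);
    pkt[ihl + 3] = (uint8_t)(csum & 0xFF);
    return 0;
}
```

A handy property for checking the result: running inet_checksum over a region that already contains a correct checksum yields 0.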

The difference between TUN and TAP

The difference between TUN and TAP is the network layer they operate at. Through a TUN device a user program can only read and write network-layer IP packets, while a TAP device supports reading and writing link-layer frames (usually Ethernet frames, including the Ethernet header).
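In practice this means a program reading from a TAP device (IFF_TAP | IFF_NO_PI) first has to step over a 14-byte Ethernet header before it reaches what a TUN device would have delivered directly. A minimal sketch, handling untagged Ethernet frames only; struct eth_info, parse_eth and TAP_ETH_HEADER_LEN are hypothetical names used just for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAP_ETH_HEADER_LEN 14   /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Decoded Ethernet header of a frame read from a TAP device. */
struct eth_info {
    uint8_t        dst[6];
    uint8_t        src[6];
    uint16_t       ethertype;    /* 0x0800 IPv4, 0x0806 ARP, 0x86DD IPv6 */
    const uint8_t *payload;      /* what a TUN device would hand over directly */
    size_t         payload_len;
};

/* Returns 0 on success, -1 if the frame is shorter than an Ethernet header. */
int parse_eth(const uint8_t *frame, size_t len, struct eth_info *out)
{
    assert(frame != NULL && out != NULL);
    if (len < TAP_ETH_HEADER_LEN)
        return -1;

    memcpy(out->dst, frame, 6);
    memcpy(out->src, frame + 6, 6);
    out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    out->payload = frame + TAP_ETH_HEADER_LEN;
    out->payload_len = len - TAP_ETH_HEADER_LEN;
    return 0;
}
```

One visible consequence: a TAP reader also sees ARP frames (EtherType 0x0806), which never show up on a TUN device.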

The relationship between TUN and TAP is similar to that between an ordinary socket and a raw socket.

The most common use of TUN/TAP is in VPNs and proxies, for example:

1. clash: a rule-based tunnel that supports TUN mode.

2. tun2socks: a global transparent proxy that works the same way a VPN does. It creates a virtual network card and modifies the routing table to proxy system traffic at the network layer (layer 3).

veth

veth interfaces always appear in pairs. A pair of veth interfaces is like a network cable, and the data coming in from one end will go out from the other end.

At the same time, veth is a virtual network interface, so like TUN/TAP or a physical network interface it can be configured with a MAC/IP address (although it does not have to be).

Its main function is to connect different networks. For example, in the container network, it is used to connect the container's namespace with the bridge br0 of the root namespace. In the container network, the veth on the container side sets the ip/mac address and is renamed eth0 as the network interface of the container, while the veth on the host side is directly connected to docker0/br0.

Using veth to implement container network needs to be combined with the bridge introduced in the next section, which will give the container network structure diagram.

bridge

Linux Bridge is a network switch working in the link layer. It is provided by the Linux kernel module bridge. It is responsible for forwarding link layer data packets between all interfaces connected to it.

Devices added to the Bridge are set to accept only layer-2 frames and to hand every received packet to the Bridge. The Bridge then applies the same logic as a physical switch: it looks up the MAC-to-port mapping table, forwards the frame, and updates the mapping table. As a result a frame is forwarded to another interface, discarded, broadcast, or passed up to the protocol stack. This is how the Bridge implements data forwarding.
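The learn-then-look-up logic can be sketched in a few lines of C. This is a toy model and not the kernel's bridge code: it uses a small fixed array without any aging, it does not model delivery to the upper protocol stack, and all names (struct fdb, bridge_forward, PORT_FLOOD, PORT_DROP) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE   64
#define PORT_FLOOD (-1)   /* send out of every port except the ingress port */
#define PORT_DROP  (-2)   /* destination sits behind the ingress port */

struct fdb_entry { uint8_t mac[6]; int port; int used; };
struct fdb { struct fdb_entry e[FDB_SIZE]; };

void fdb_init(struct fdb *t) { memset(t, 0, sizeof(*t)); }

static struct fdb_entry *fdb_find(struct fdb *t, const uint8_t *mac)
{
    for (int i = 0; i < FDB_SIZE; i++)
        if (t->e[i].used && memcmp(t->e[i].mac, mac, 6) == 0)
            return &t->e[i];
    return NULL;
}

/* Remember (or refresh) which port a source MAC was last seen on. */
static void fdb_learn(struct fdb *t, const uint8_t *mac, int port)
{
    struct fdb_entry *ent = fdb_find(t, mac);

    if (ent) {
        ent->port = port;
        return;
    }
    for (int i = 0; i < FDB_SIZE; i++) {
        if (!t->e[i].used) {
            memcpy(t->e[i].mac, mac, 6);
            t->e[i].port = port;
            t->e[i].used = 1;
            return;
        }
    }
    /* Table full: skip learning, frames to this MAC will simply be flooded. */
}

/* Decide what to do with a frame that arrived on in_port.
 * Returns the egress port number, PORT_FLOOD or PORT_DROP. */
int bridge_forward(struct fdb *t, int in_port, const uint8_t *frame, size_t len)
{
    const struct fdb_entry *ent;

    assert(t != NULL && frame != NULL);
    if (len < 14)
        return PORT_DROP;

    fdb_learn(t, frame + 6, in_port);    /* bytes 6..11: source MAC */
    if (frame[0] & 0x01)                 /* broadcast or multicast destination */
        return PORT_FLOOD;
    ent = fdb_find(t, frame);            /* bytes 0..5: destination MAC */
    if (ent == NULL)
        return PORT_FLOOD;               /* unknown unicast */
    return ent->port == in_port ? PORT_DROP : ent->port;
}
```

The first frame from A to an unknown B is flooded; once B answers, both MACs are in the table and later frames go out of exactly one port.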

If you use tcpdump to capture packets on the Bridge interface, you can catch the packets in and out of all interfaces on the Bridge, because these packets must be forwarded through the Bridge.

Unlike a physical switch, the Bridge itself can be given an IP address. You can think of it this way: when a bridge br0 is created with brctl addbr br0, the system also creates a hidden network interface with the same name, br0. Once br0 has an IP address, this hidden br0 interface can take part in IP-layer routing as a routing interface (check the Iface column, the last one, in the output of route -n). Therefore the Bridge can pass packets up to the protocol stack only when br0 has an IP address.

However, the network cards added to the Bridge cannot be configured with IP addresses. They work in the data link layer and are not visible to the routing system.

The Bridge is often used to forward data between virtual machines, or between different network namespaces on a host.

Virtual machine scenario (bridge mode)

Take QEMU-KVM as an example. In the virtual machine's bridge mode, QEMU-KVM creates a tun/tap virtual network card for each virtual machine and attaches it to the br0 bridge. The network interface eth0 inside the virtual machine is emulated by QEMU-KVM in software; sending and receiving network data inside the virtual machine is actually converted by QEMU-KVM into reads and writes on /dev/net/tun.

Taking sending data as an example, the whole process is as follows:

  • The packets sent by the virtual machine first arrive at the QEMU-KVM program.

  • The user-space program QEMU-KVM writes the data to /dev/net/tun, and it reaches the tap device.

  • The tap device passes the data to the br0 bridge.

  • br0 sends the data out through eth0.

Throughout this process the packets never pass through the host protocol stack, which makes it efficient.

    +------------------------------------------------+-----------------------------------+-----------------------------------+
    |                      Host                      |          VirtualMachine1          |          VirtualMachine2          |
    |                                                |                                   |                                   |
    |    +--------------------------------------+    |    +-------------------------+    |    +-------------------------+    |
    |    |        Network Protocol Stack        |    |    | Network Protocol Stack  |    |    | Network Protocol Stack  |    |
    |    +--------------------------------------+    |    +-------------------------+    |    +-------------------------+    |
    |                       ↑                        |                ↑                  |                 ↑                 |
    |.......................|........................|................|..................|.................|.................|
    |                       ↓                        |                ↓                  |                 ↓                 |
    |                   +--------+                   |            +-------+              |             +-------+             |
    |                   | .3.101 |                   |            | .3.102|              |             | .3.103|             |
    |      +------+     +--------+     +-------+     |            +-------+              |             +-------+             |
    |      | eth0 |<--->|   br0  |<--->|tun/tap|     |            | eth0  |              |             | eth0  |             |
    |      +------+     +--------+     +-------+     |            +-------+              |             +-------+             |
    |         ↑             ↑             ↑      +--------+           ↑                  |                 ↑                 |
    |         |             |             +------|qemu-kvm|-----------+                  |                 |                 |
    |         |             ↓                    +--------+                              |                 |                 |
    |         |         +-------+                    |                                   |                 |                 |
    |         |         |tun/tap|                    |                                   |                 |                 |
    |         |         +-------+                    |                                   |                 |                 |
    |         |             ↑                        |            +--------+             |                 |                 |
    |         |             +-------------------------------------|qemu-kvm|-------------|-----------------+                 |
    |         |                                      |            +--------+             |                                   |
    |         |                                      |                                   |                                   |
    +---------|--------------------------------------+-----------------------------------+-----------------------------------+
              ↓
         Physical Network  (192.168.3.0/24)

Cross namespace communication scenario (container network, NAT mode)

Because containers run in their own separate network namespace, like virtual machines, they also have their own separate protocol stack.

The structure of the container network is similar to that of the virtual machine, but it uses the NAT network and replaces tun/tap with veth. As a result, the data from docker0 must pass through the host protocol stack before entering the veth interface.

One more layer of NAT and one more layer of host protocol stack will lead to performance degradation.

The schematic diagram is as follows:

    +-----------------------------------------------+-----------------------------------+-----------------------------------+
    |                     Host                      |            Container 1            |            Container 2            |
    |                                               |                                   |                                   |
    |   +---------------------------------------+   |    +-------------------------+    |    +-------------------------+    |
    |   |        Network Protocol Stack         |   |    | Network Protocol Stack  |    |    | Network Protocol Stack  |    |
    |   +----+-------------+--------------------+   |    +-----------+-------------+    |    +------------+------------+    |
    |        ^             ^                        |                ^                  |                 ^                 |
    |........|.............|........................|................|..................|.................|.................|
    |        v             v                ↓       |                v                  |                 v                 |
    |   +----+----+  +-----+------+                 |          +-----+-------+          |           +-----+-------+         |
    |   | .31.101 |  | 172.17.0.1 |      +------+   |          | 172.17.0.2  |          |           | 172.17.0.3  |         |
    |   +---------+  +-------------<---->+ veth |   |          +-------------+          |           +-------------+         |
    |   |  eth0   |  |   docker0  |      +--+---+   |          | eth0(veth)  |          |           | eth0(veth)  |         |
    |   +----+----+  +-----+------+         ^       |          +-----+-------+          |           +-----+-------+         |
    |        ^             ^                |       |                ^                  |                 ^                 |
    |        |             |                +------------------------+                  |                 |                 |
    |        |             v                        |                                   |                 |                 |
    |        |          +--+---+                    |                                   |                 |                 |
    |        |          | veth |                    |                                   |                 |                 |
    |        |          +--+---+                    |                                   |                 |                 |
    |        |             ^                        |                                   |                 |                 |
    |        |             +------------------------------------------------------------------------------+                 |
    |        |                                      |                                   |                                   |
    |        |                                      |                                   |                                   |
    +-----------------------------------------------+-----------------------------------+-----------------------------------+
             v
        Physical Network  (192.168.31.0/24)

Every time a new container is created, a new veth interface is created in the container's namespace and named eth0. At the same time, a veth is created in the main namespace to connect the container's eth0 to docker0.

You can see through iproute2 in the container that the interface type of eth0 is veth:

    ❯ docker run -it --rm debian:buster bash
    root@5facbe4ddc1e:/# ip --details addr ls
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    20: eth0@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
        veth numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
           valid_lft forever preferred_lft forever

At the same time, it can be seen in the host that the corresponding veth device is bound to the docker0 bridge:

    ❯ sudo brctl show
    bridge name     bridge id               STP enabled     interfaces
    docker0         8000.0242fce99ef5       no              vethea4171a

macvlan

Currently, docker/podman support creating Linux container networks based on macvlan.

Note that macvlan is not compatible with WiFi, so you may run into trouble if you test on a laptop.

macvlan is a relatively new Linux feature that requires kernel version >= 3.9. It is used to configure multiple virtual sub-interfaces on a network interface of the host (the parent interface). Each sub-interface has its own independent MAC address and can also be given an IP address for communication.

Virtual machines or containers on a macvlan network are in the same network segment as the host and share the same broadcast domain. macvlan is similar to a bridge, but because no bridge is involved it is simpler to configure and debug, and relatively more efficient. In addition, macvlan itself supports VLAN well.

If you want the container or virtual machine to be placed in the same network as the host and enjoy various advantages of the existing network stack, you can consider macvlan.

ipvlan

linux network virtualization: ipvlan

cilium 1.9 provides an ipvlan-based network (beta feature) to replace the traditional veth+bridge container network. For details, see IPVLAN based Networking (beta) - Cilium 1.9 Docs.

ipvlan is very similar to macvlan in function. It is also used to configure multiple virtual sub-interfaces on a network interface of the host (the parent interface). The difference is that ipvlan sub-interfaces do not have independent MAC addresses; they share the MAC address of the host's parent interface.

Because the MAC address is shared, note that DHCP cannot rely on the MAC address to identify a client; a unique client ID has to be configured in addition.

Consider using ipvlan in the following situations:

  • The parent interface limits the number of MAC addresses, or too many MAC addresses would cause a serious performance loss.

  • You need to work on an 802.11 (wireless) network (macvlan cannot work with wireless networks).

  • You want to build a more complex network topology (not a simple layer-2 network with VLANs), for example one that works with BGP.

A container network based on ipvlan/macvlan performs better than veth+bridge+iptables.

vlan

vlan, that is, virtual LAN, is a link-layer technique for isolating broadcast domains. It can be used to segment a LAN in order to address broadcast flooding and security problems. Isolated broadcast domains can communicate with each other only through layer 3.

VLANs can be set up on common enterprise routers such as the ER-X, and Linux also supports VLANs directly.

The Ethernet packet has a special field for VLAN. The VLAN packet will record its VLAN ID at this location. The switch uses this ID to distinguish different VLANs, and only broadcasts the Ethernet packet to the VLAN corresponding to this ID.
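The special field is the 4-byte 802.1Q tag inserted after the source MAC address: a 16-bit TPID of 0x8100 followed by a 16-bit TCI that packs a 3-bit priority, a 1-bit drop-eligible flag and the 12-bit VLAN ID. A small sketch of extracting it from a raw frame, with hypothetical names (struct vlan_tag, parse_vlan_tag) and only single-tagged frames considered:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100

/* Decoded 802.1Q tag. */
struct vlan_tag {
    uint8_t  pcp;               /* priority code point, 3 bits */
    uint8_t  dei;               /* drop eligible indicator, 1 bit */
    uint16_t vid;               /* VLAN ID, 12 bits */
    uint16_t inner_ethertype;   /* EtherType of the encapsulated payload */
};

/* Returns 1 if the frame carries an 802.1Q tag (out is filled in),
 * 0 if it is untagged, -1 if it is too short to tell. */
int parse_vlan_tag(const uint8_t *frame, size_t len, struct vlan_tag *out)
{
    uint16_t tpid, tci;

    assert(frame != NULL && out != NULL);
    if (len < 14)
        return -1;
    tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q)
        return 0;
    if (len < 18)
        return -1;

    tci = (uint16_t)((frame[14] << 8) | frame[15]);
    out->pcp = (uint8_t)(tci >> 13);
    out->dei = (uint8_t)((tci >> 12) & 0x1);
    out->vid = (uint16_t)(tci & 0x0FFF);
    out->inner_ethertype = (uint16_t)((frame[16] << 8) | frame[17]);
    return 1;
}
```

Since the VLAN ID is 12 bits wide, a single tag can distinguish at most 4096 values, which is one reason overlay protocols with a 24-bit identifier are used for larger deployments.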

vxlan/geneve

rfc8926 - Geneve: Generic Network Virtualization Encapsulation
rfc7348 - Virtual eXtensible Local Area Network (VXLAN)

Implementing vxlan network on Linux

Before introducing vxlan, explain the meanings of the following two terms:

  • underlay network: physical network.

  • overlay network: refers to the virtual network built on the existing physical network. In fact, it is a tunnel technology, which encapsulates the original layer-2 data frame message and transmits it through the tunnel.

vxlan and geneve are both overlay network protocols. Both of them use UDP packets to encapsulate the Ethernet frame of the link layer.
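Wrapping a whole Ethernet frame in UDP has a cost in bytes. For VXLAN over IPv4 the extra headers add up to 50 bytes, so the usable MTU inside the tunnel is smaller than that of the underlay. The arithmetic below is a sketch that assumes an outer IPv4 header without options and no outer VLAN tag; the function and constant names are made up for the example.

```c
#include <assert.h>

/* Header sizes in bytes for VXLAN carried over IPv4. */
enum {
    OUTER_IPV4_HDR = 20,   /* outer IP header, no options (assumption) */
    OUTER_UDP_HDR  = 8,
    VXLAN_HDR      = 8,
    INNER_ETH_HDR  = 14    /* the encapsulated frame's own Ethernet header */
};

/* Bytes of the underlay IP packet that are consumed by encapsulation. */
int vxlan_overhead_ipv4(void)
{
    return OUTER_IPV4_HDR + OUTER_UDP_HDR + VXLAN_HDR + INNER_ETH_HDR;
}

/* Largest inner IP packet that fits into one underlay packet. */
int vxlan_inner_mtu(int underlay_mtu)
{
    assert(underlay_mtu > vxlan_overhead_ipv4());
    return underlay_mtu - vxlan_overhead_ipv4();
}
```

With the usual 1500-byte Ethernet MTU on the underlay this gives 1450 bytes inside the tunnel.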

vxlan was standardized in 2014, while geneve only left the draft stage at the end of 2020, when it was published as RFC 8926. Even so, linux/cilium already support geneve.

The biggest difference between geneve and vxlan is that it is more flexible -- its header length is variable.
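Concretely, the Geneve header described in RFC 8926 has an 8-byte fixed part followed by options. The low 6 bits of the first byte (Opt Len) give the length of the options in 4-byte units, so the header ranges from 8 to 260 bytes, while the VXLAN header is always 8 bytes. A sketch of reading that length and the 24-bit VNI follows; the function names are illustrative only and the options themselves are not interpreted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GENEVE_FIXED_HDR 8   /* fixed part of the header, in bytes */

/* Total Geneve header length (fixed part + options) in bytes.
 * Returns 0 if the buffer is too short or the version is not 0. */
size_t geneve_header_len(const uint8_t *hdr, size_t len)
{
    size_t total;

    assert(hdr != NULL);
    if (len < GENEVE_FIXED_HDR)
        return 0;
    if ((hdr[0] >> 6) != 0)              /* top 2 bits: version, currently 0 */
        return 0;
    /* Low 6 bits: option length in multiples of 4 bytes. */
    total = GENEVE_FIXED_HDR + (size_t)(hdr[0] & 0x3F) * 4;
    return total <= len ? total : 0;
}

/* 24-bit virtual network identifier, stored in bytes 4..6.
 * The caller must have validated the header with geneve_header_len(). */
uint32_t geneve_vni(const uint8_t *hdr)
{
    assert(hdr != NULL);
    return ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
}
```

The encapsulated Ethernet frame starts right after the returned header length.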

At present, almost all overlay cross host container network schemes are implemented based on vxlan (exception: cilium also supports geneve).

In a single-machine container network you will not come across vxlan, but when you study cross-host container network schemes such as flannel/calico/cilium, you will inevitably meet vxlan (overlay) and BGP (underlay).

The following describes the packet structure of vxlan:

When creating the vtep virtual device of vxlan, we need to manually set the following properties in the figure:

  • VXLAN destination port: the port on which the receiving vtep listens. The port assigned by IANA is 4789, but only calico's vxlan mode uses it by default; cilium/flannel default to 8472, which is also the Linux default.

  • VNID: each VXLAN network interface will be assigned an independent VNID.
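On the wire, the VNID travels in the 8-byte VXLAN header that follows the outer UDP header: one flags byte in which bit 0x08 marks the VNI as valid, three reserved bytes, the 24-bit VNI, and one more reserved byte (RFC 7348). A sketch of building and reading that header; the function and macro names are invented for this example, and the kernel's vxlan driver is of course not implemented this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_HDR_LEN        8
#define VXLAN_FLAG_VNI_VALID 0x08   /* the "I" flag in RFC 7348 */

/* Fill in the 8-byte VXLAN header for the given 24-bit VNI. */
void vxlan_build_header(uint8_t hdr[VXLAN_HDR_LEN], uint32_t vni)
{
    assert(hdr != NULL && vni <= 0xFFFFFF);
    hdr[0] = VXLAN_FLAG_VNI_VALID;
    hdr[1] = hdr[2] = hdr[3] = 0;        /* reserved */
    hdr[4] = (uint8_t)(vni >> 16);
    hdr[5] = (uint8_t)(vni >> 8);
    hdr[6] = (uint8_t)vni;
    hdr[7] = 0;                          /* reserved */
}

/* Extract the VNI. Returns 0 on success, -1 if the buffer is too short
 * or the VNI-valid flag is not set. */
int vxlan_parse_vni(const uint8_t *hdr, size_t len, uint32_t *vni)
{
    assert(hdr != NULL && vni != NULL);
    if (len < VXLAN_HDR_LEN || !(hdr[0] & VXLAN_FLAG_VNI_VALID))
        return -1;
    *vni = ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
    return 0;
}
```

A receiving vtep would compare the parsed value with its own VNID and discard the packet when they differ.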

The network architecture of a point-to-point vxlan is as follows:


It can be seen that each virtual machine VM will be assigned a unique VNID, and then a VXLAN tunnel is established between the two physical machines through the VTEP virtual network device. All virtual machines in the VXLAN network communicate with each other through VTEP.

With the above knowledge, we can establish a point-to-point VXLAN tunnel between two Linux machines through the following command:

    # On host A, create the VTEP device vxlan0
    # and build a tunnel to the other vtep, on host B (192.168.8.101)
    # The local address used by vxlan0 is 192.168.8.100
    # The VXLAN destination port is 4789 (IANA standard)
    ip link add vxlan0 type vxlan \
        id 42 \
        dstport 4789 \
        remote 192.168.8.101 \
        local 192.168.8.100 \
        dev enp0s8
    # Set the virtual network segment for our VXLAN network. vxlan0 is the default gateway
    ip addr add 10.20.1.2/24 dev vxlan0
    # Bring up vxlan0, which automatically generates the routing rule
    ip link set vxlan0 up

    # Now run the following commands on host B to create its VTEP device vxlan0.
    # The remote and local addresses are swapped relative to host A.
    # Note that the VNID and dstport must be exactly the same as on host A
    ip link add vxlan0 type vxlan \
        id 42 \
        dstport 4789 \
        remote 192.168.8.100 \
        local 192.168.8.101 \
        dev enp0s8
    # Set the virtual network segment for our VXLAN network. vxlan0 is the default gateway
    ip addr add 10.20.1.3/24 dev vxlan0
    ip link set vxlan0 up

    # The two machines are now connected and can communicate.
    # Ping 10.20.1.2 from host B and you should receive a response from host A.
    ping 10.20.1.2

The point-to-point vxlan tunnel is of little practical use. If each node in the cluster builds vxlan tunnels with each other, the cost is too high.

A better way is to use the vxlan tunnel of "multicast mode". In this mode, a vtep can establish a tunnel with all vteps in the group at one time. The example command is as follows (the information on how to set the multicast address 239.1.1.1 is omitted here):

    ip link add vxlan0 type vxlan \
        id 42 \
        dstport 4789 \
        group 239.1.1.1 \
        dev enp0s8
    ip addr add 10.20.1.2/24 dev vxlan0
    ip link set vxlan0 up

As you can see, you only need to replace local/remote with a multicast group address. Multicast delivers each packet to every vtep interface in the group, but only the vteps whose VNID matches process the packet; the other vteps discard it.

Next, in order to enable all virtual machines / containers to communicate through vtep, we add a bridge network to act as the switch between vtep and the container. The architecture is as follows:

Use the ip command to create a network bridge, a network namespace, and veth pairs to form the container network in the figure above:

    # Create br0 and attach vxlan0 to it
    ip link add br0 type bridge
    ip link set vxlan0 master br0
    ip link set vxlan0 up
    ip link set br0 up

    # Simulate adding a container to the bridge
    ip netns add container1

    ## Create a veth pair and attach one end to the bridge
    ip link add veth0 type veth peer name veth1
    ip link set dev veth0 master br0
    ip link set dev veth0 up

    ## Configure the network and IP inside the container
    ip link set dev veth1 netns container1
    ip netns exec container1 ip link set lo up
    ip netns exec container1 ip link set veth1 name eth0
    ip netns exec container1 ip addr add 10.20.1.11/24 dev eth0
    ip netns exec container1 ip link set eth0 up

Then do the same on another machine and create a new container there. The two containers can then communicate through vxlan.

More efficient vxlan implementation than multicast

The biggest problem with multicast is that a vtep does not know where the destination of a packet is, so a copy is sent to every vtep in the group. If each packet could be sent only to the vtep that actually hosts the destination, a lot of resources would be saved.

Another problem is that ARP queries will also be multicast. You should know that vxlan itself is an overlay network, and the cost of ARP is also very high.

The above problems can be solved through a centralized Registry (such as etcd). The registration and changes of all containers and networks are written into this registry, and then the program automatically maintains the tunnel, fdb table and ARP table between vtep.

Rate of virtual network interfaces

Loopback, like the other virtual network interfaces mentioned in this chapter, is a network device simulated in software. Is its rate limited by the bandwidth of a link layer (such as Ethernet), the way a physical link is?

For example, many old network devices only support 100 Mb/s Ethernet, which caps their bandwidth. Even newer devices mostly support only Gigabit Ethernet, that is, the 1GbE standard. The virtual network interfaces discussed in this article only communicate within the machine. Are they subject to the same limit? Can they only reach 1 Gb/s?

Check with ethtool:

    # The veth interface rate of the docker container
    > ethtool vethe899841 | grep Speed
            Speed: 10000Mb/s

    # The bridge does not appear to have a fixed rate
    > ethtool docker0 | grep Speed
            Speed: Unknown!

    # The default speed of the tun0 device seems to be 10Mb/s?
    > ethtool tun0 | grep Speed
            Speed: 10Mb/s

    # In addition, ethtool cannot check the rate of lo or of the wifi interface

Network performance measurement

Next comes the actual test. First, the parameters of the test machine:

    ❯ cat /etc/os-release
    NAME="openSUSE Tumbleweed"
    # VERSION="20210810"
    ...

    ❯ uname -a
    Linux legion-book 5.13.8-1-default #1 SMP Thu Aug 5 08:56:22 UTC 2021 (967c6a8) x86_64 x86_64 x86_64 GNU/Linux

    ❯ lscpu
    Architecture:                    x86_64
    CPU(s):                          16
    Model name:                      AMD Ryzen 7 5800H with Radeon Graphics
    ...

    # Memory in MB
    ❯ free -m
                   total        used        free      shared  buff/cache   available
    Mem:           27929        4482       17324         249        6122       22797
    Swap:           2048           0        2048

Test with iperf3:

# Start the server
iperf3 -s

-------------

# Start the client in a new window and access the iperf3 server through the loopback interface: about 49Gb/s
❯ iperf3 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 48656 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.46 GBytes  38.3 Gbits/sec    0   1.62 MBytes
[  5]   1.00-2.00   sec  4.61 GBytes  39.6 Gbits/sec    0   1.62 MBytes
[  5]   2.00-3.00   sec  5.69 GBytes  48.9 Gbits/sec    0   1.62 MBytes
[  5]   3.00-4.00   sec  6.11 GBytes  52.5 Gbits/sec    0   1.62 MBytes
[  5]   4.00-5.00   sec  6.04 GBytes  51.9 Gbits/sec    0   1.62 MBytes
[  5]   5.00-6.00   sec  6.05 GBytes  52.0 Gbits/sec    0   1.62 MBytes
[  5]   6.00-7.00   sec  6.01 GBytes  51.6 Gbits/sec    0   1.62 MBytes
[  5]   7.00-8.00   sec  6.05 GBytes  52.0 Gbits/sec    0   1.62 MBytes
[  5]   8.00-9.00   sec  6.34 GBytes  54.5 Gbits/sec    0   1.62 MBytes
[  5]   9.00-10.00  sec  5.91 GBytes  50.8 Gbits/sec    0   1.62 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  57.3 GBytes  49.2 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  57.3 GBytes  49.2 Gbits/sec                  receiver

# The client accesses the iperf3 server through the address of the wlp4s0 wifi network card (192.168.31.228).
# The traffic still never leaves the local machine, yet it is slightly faster than loopback.
# This may be due to differences in default settings.
❯ iperf3 -c 192.168.31.228
Connecting to host 192.168.31.228, port 5201
[  5] local 192.168.31.228 port 43430 connected to 192.168.31.228 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  5.12 GBytes  43.9 Gbits/sec    0   1.25 MBytes
[  5]   1.00-2.00   sec  5.29 GBytes  45.5 Gbits/sec    0   1.25 MBytes
[  5]   2.00-3.00   sec  5.92 GBytes  50.9 Gbits/sec    0   1.25 MBytes
[  5]   3.00-4.00   sec  6.00 GBytes  51.5 Gbits/sec    0   1.25 MBytes
[  5]   4.00-5.00   sec  5.98 GBytes  51.4 Gbits/sec    0   1.25 MBytes
[  5]   5.00-6.00   sec  6.05 GBytes  52.0 Gbits/sec    0   1.25 MBytes
[  5]   6.00-7.00   sec  6.16 GBytes  52.9 Gbits/sec    0   1.25 MBytes
[  5]   7.00-8.00   sec  6.08 GBytes  52.2 Gbits/sec    0   1.25 MBytes
[  5]   8.00-9.00   sec  6.00 GBytes  51.6 Gbits/sec    0   1.25 MBytes
[  5]   9.00-10.00  sec  6.01 GBytes  51.6 Gbits/sec    0   1.25 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  58.6 GBytes  50.3 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  58.6 GBytes  50.3 Gbits/sec                  receiver

# Accessing the host's iperf3 server from a container shows almost no difference in speed
❯ docker run -it --rm --name=iperf3-server networkstatic/iperf3 -c 192.168.31.228
Connecting to host 192.168.31.228, port 5201
[  5] local 172.17.0.2 port 43436 connected to 192.168.31.228 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.49 GBytes  38.5 Gbits/sec    0    403 KBytes
[  5]   1.00-2.00   sec  5.31 GBytes  45.6 Gbits/sec    0    544 KBytes
[  5]   2.00-3.00   sec  6.14 GBytes  52.8 Gbits/sec    0    544 KBytes
[  5]   3.00-4.00   sec  5.85 GBytes  50.3 Gbits/sec    0    544 KBytes
[  5]   4.00-5.00   sec  6.14 GBytes  52.7 Gbits/sec    0    544 KBytes
[  5]   5.00-6.00   sec  5.99 GBytes  51.5 Gbits/sec    0    544 KBytes
[  5]   6.00-7.00   sec  5.86 GBytes  50.4 Gbits/sec    0    544 KBytes
[  5]   7.00-8.00   sec  6.05 GBytes  52.0 Gbits/sec    0    544 KBytes
[  5]   8.00-9.00   sec  5.99 GBytes  51.5 Gbits/sec    0    544 KBytes
[  5]   9.00-10.00  sec  6.12 GBytes  52.5 Gbits/sec    0    544 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  58.0 GBytes  49.8 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  58.0 GBytes  49.8 Gbits/sec                  receiver
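When reading the summary lines above, note that the Transfer and Bitrate columns use different unit bases: the figures are only consistent if GBytes is taken as 2^30 bytes and Gbits/sec as 10^9 bits per second. The small sketch below cross-checks the three summaries from this article under that assumption; the function name `bitrate_gbits` is made up for illustration.

```python
def bitrate_gbits(transfer_gbytes: float, seconds: float) -> float:
    """Convert an iperf3 transfer figure (GBytes, 2**30 bytes each)
    over a duration into Gbits/sec (10**9 bits each)."""
    total_bits = transfer_gbytes * 2**30 * 8
    return total_bits / seconds / 1e9


# (transfer in GBytes, bitrate in Gbits/sec) taken from the 10-second summaries above.
summaries = {
    "loopback": (57.3, 49.2),
    "wlp4s0 address": (58.6, 50.3),
    "container to host": (58.0, 49.8),
}

for name, (gbytes, reported) in summaries.items():
    computed = bitrate_gbits(gbytes, 10)
    print(f"{name}: computed {computed:.1f} Gbits/sec, reported {reported} Gbits/sec")
```

With decimal gigabytes the loopback run would come out near 45.8 Gbits/sec instead of 49.2, which is how the binary interpretation can be confirmed from the data itself.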

Now run the iperf3 server in a container and test again:

# Start the iperf3 server in a container and map it to host port 6201
> docker run -it --rm --name=iperf3-server -p 6201:5201 networkstatic/iperf3 -s
> docker inspect --format "{{ .NetworkSettings.IPAddress }}" iperf3-server
172.17.0.2

-----------------------------

# Test the speed between two containers. The ip is that of the iperf3 server container. This is slower.
# After all, the traffic passes through five virtual network interfaces: veth -> veth -> docker0 -> veth -> veth
❯ docker run -it --rm networkstatic/iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[  5] local 172.17.0.3 port 40776 connected to 172.17.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.74 GBytes  40.7 Gbits/sec    0    600 KBytes
[  5]   1.00-2.00   sec  4.48 GBytes  38.5 Gbits/sec    0    600 KBytes
[  5]   2.00-3.00   sec  5.38 GBytes  46.2 Gbits/sec    0    600 KBytes
[  5]   3.00-4.00   sec  5.39 GBytes  46.3 Gbits/sec    0    600 KBytes
[  5]   4.00-5.00   sec  5.42 GBytes  46.6 Gbits/sec    0    600 KBytes
[  5]   5.00-6.00   sec  5.39 GBytes  46.3 Gbits/sec    0    600 KBytes
[  5]   6.00-7.00   sec  5.38 GBytes  46.2 Gbits/sec    0    635 KBytes
[  5]   7.00-8.00   sec  5.37 GBytes  46.1 Gbits/sec    0    667 KBytes
[  5]   8.00-9.00   sec  6.01 GBytes  51.7 Gbits/sec    0    735 KBytes
[  5]   9.00-10.00  sec  5.74 GBytes  49.3 Gbits/sec    0    735 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  53.3 GBytes  45.8 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  53.3 GBytes  45.8 Gbits/sec                  receiver

# The host accesses the container ip directly, going through the docker0 bridge. Surprisingly, this is the fastest path.
❯ iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[  5] local 172.17.0.1 port 56486 connected to 172.17.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  5.01 GBytes  43.0 Gbits/sec    0    632 KBytes
[  5]   1.00-2.00   sec  5.19 GBytes  44.6 Gbits/sec    0    703 KBytes
[  5]   2.00-3.00   sec  6.46 GBytes  55.5 Gbits/sec    0    789 KBytes
[  5]   3.00-4.00   sec  6.80 GBytes  58.4 Gbits/sec    0    789 KBytes
[  5]   4.00-5.00   sec  6.82 GBytes  58.6 Gbits/sec    0    913 KBytes
[  5]   5.00-6.00   sec  6.79 GBytes  58.3 Gbits/sec    0   1007 KBytes
[  5]   6.00-7.00   sec  6.63 GBytes  56.9 Gbits/sec    0   1.04 MBytes
[  5]   7.00-8.00   sec  6.75 GBytes  58.0 Gbits/sec    0   1.04 MBytes
[  5]   8.00-9.00   sec  6.19 GBytes  53.2 Gbits/sec    0   1.04 MBytes
[  5]   9.00-10.00  sec  6.55 GBytes  56.3 Gbits/sec    0   1.04 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  63.2 GBytes  54.3 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  63.2 GBytes  54.3 Gbits/sec                  receiver

# With the local loopback address + container port mapping, the speed is much lower
# Maybe this is caused by using iptables for port mapping?
❯ iperf3 -c 127.0.0.1 -p 6201
Connecting to host 127.0.0.1, port 6201
[  5] local 127.0.0.1 port 48862 connected to 127.0.0.1 port 6201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.71 GBytes  23.3 Gbits/sec    0   1.37 MBytes
[  5]   1.00-2.00   sec  3.64 GBytes  31.3 Gbits/sec    0   1.37 MBytes
[  5]   2.00-3.00   sec  4.08 GBytes  35.0 Gbits/sec    0   1.37 MBytes
[  5]   3.00-4.00   sec  3.49 GBytes  30.0 Gbits/sec    0   1.37 MBytes
[  5]   4.00-5.00   sec  5.50 GBytes  47.2 Gbits/sec    2   1.37 MBytes
[  5]   5.00-6.00   sec  4.06 GBytes  34.9 Gbits/sec    0   1.37 MBytes
[  5]   6.00-7.00   sec  4.12 GBytes  35.4 Gbits/sec    0   1.37 MBytes
[  5]   7.00-8.00   sec  3.99 GBytes  34.3 Gbits/sec    0   1.37 MBytes
[  5]   8.00-9.00   sec  3.49 GBytes  30.0 Gbits/sec    0   1.37 MBytes
[  5]   9.00-10.00  sec  5.51 GBytes  47.3 Gbits/sec    0   1.37 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  40.6 GBytes  34.9 Gbits/sec    2             sender
[  5]   0.00-10.00  sec  40.6 GBytes  34.9 Gbits/sec                  receiver

# Going through the wlp4s0 address + container port mapping instead, the speed is not slow
❯ iperf3 -c 192.168.31.228 -p 6201
Connecting to host 192.168.31.228, port 6201
[  5] local 192.168.31.228 port 54582 connected to 192.168.31.228 port 6201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.34 GBytes  37.3 Gbits/sec    0    795 KBytes
[  5]   1.00-2.00   sec  4.78 GBytes  41.0 Gbits/sec    0    834 KBytes
[  5]   2.00-3.00   sec  6.26 GBytes  53.7 Gbits/sec    0    834 KBytes
[  5]   3.00-4.00   sec  6.30 GBytes  54.1 Gbits/sec    0    875 KBytes
[  5]   4.00-5.00   sec  6.26 GBytes  53.8 Gbits/sec    0    875 KBytes
[  5]   5.00-6.00   sec  5.75 GBytes  49.4 Gbits/sec    0    875 KBytes
[  5]   6.00-7.00   sec  5.49 GBytes  47.2 Gbits/sec    0    966 KBytes
[  5]   7.00-8.00   sec  5.72 GBytes  49.1 Gbits/sec    2    966 KBytes
[  5]   8.00-9.00   sec  4.81 GBytes  41.3 Gbits/sec    2    966 KBytes
[  5]   9.00-10.00  sec  5.98 GBytes  51.4 Gbits/sec    0    966 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  55.7 GBytes  47.8 Gbits/sec    4             sender
[  5]   0.00-10.00  sec  55.7 GBytes  47.8 Gbits/sec                  receiver
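Repeating these runs by hand and reading the tables gets tedious. iperf3 can emit a JSON report with the `-J` flag, which makes the summary easy to collect from a script. The sketch below assumes the usual TCP report layout, with `end.sum_sent` and `end.sum_received` each carrying `bits_per_second`; the helper names `run_iperf3` and `summarize` are hypothetical, and `sample_report` is a hand-written, trimmed stand-in filled with the loopback + port mapping figures from above, not real iperf3 output.

```python
import json
import subprocess


def run_iperf3(host: str, port: int = 5201) -> dict:
    """Run an iperf3 client against host:port and return the parsed JSON report.

    Requires the iperf3 binary on PATH and a running server (iperf3 -s).
    """
    result = subprocess.run(
        ["iperf3", "-c", host, "-p", str(port), "-J"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def summarize(report: dict) -> dict:
    """Extract sender/receiver throughput in Gbits/sec from an iperf3 TCP report.

    Assumes the report has end.sum_sent and end.sum_received sections.
    """
    end = report["end"]
    return {
        "sent_gbits": end["sum_sent"]["bits_per_second"] / 1e9,
        "received_gbits": end["sum_received"]["bits_per_second"] / 1e9,
        "retransmits": end["sum_sent"].get("retransmits"),
    }


# Hand-written stand-in for a report (values from the 127.0.0.1:6201 run above).
sample_report = {
    "end": {
        "sum_sent": {"bits_per_second": 34.9e9, "retransmits": 2},
        "sum_received": {"bits_per_second": 34.9e9},
    }
}

print(summarize(sample_report))
```

On a machine with the server running, `summarize(run_iperf3("127.0.0.1", 6201))` would produce the same kind of summary for a live run.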

Generally speaking, loopback, bridge and veth interfaces are essentially not rate limited. The 10000Mb/s (10Gb/s) that ethtool reports for veth is only a nominal value. The measured throughput is roughly between 35Gb/s and 55Gb/s, and it fluctuates with the scenario.
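To put the seven runs side by side, the sender summaries can be expressed relative to the plain loopback result. This is just a small sketch over the numbers measured above; the path labels and the helper `relative_to_baseline` are chosen here for illustration.

```python
def relative_to_baseline(value: float, baseline: float) -> float:
    """Return value as a percentage of baseline."""
    return value / baseline * 100


# Sender bitrates in Gbits/sec from the 10-second iperf3 summaries in this article.
measured = {
    "loopback (127.0.0.1)": 49.2,
    "host via wlp4s0 address": 50.3,
    "container -> host": 49.8,
    "container -> container": 45.8,
    "host -> container ip (docker0)": 54.3,
    "loopback + port mapping": 34.9,
    "wlp4s0 address + port mapping": 47.8,
}

baseline = measured["loopback (127.0.0.1)"]
for path, gbits in sorted(measured.items(), key=lambda item: item[1], reverse=True):
    percent = relative_to_baseline(gbits, baseline)
    print(f"{path:32s} {gbits:5.1f} Gbits/sec  {percent:5.1f}% of loopback")

slowest = min(measured, key=measured.get)
fastest = max(measured, key=measured.get)
```

Each figure comes from a single 10-second run, so small differences between paths may well be within normal run-to-run variation.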

The performance differences are related to the path the traffic takes and the types of virtual network devices on it, and possibly also to differences in their default configuration.

The TUN device was not measured here. The value reported by ethtool tun0 is an outrageous 10Mb/s, but the real throughput is unlikely to be that low. It can be measured later when time permits.


16 October 2021, 03:28 | Views: 8982
