Practicing VxLAN on Linux


In the previous article we discussed the concepts and basic principles of VxLAN. This article puts VxLAN into practice on Linux. If the relevant concepts are unfamiliar, read that article first.

01 Linux Support for VxLAN

First, look at Linux's support for VxLAN. Linux support for the VxLAN protocol is relatively recent: Stephen Hemminger merged the related work into the kernel in 2012, and it first appeared in kernel version 3.7. For stability and full functionality, you will find that some software recommends using VxLAN on kernel 3.9 or 3.10 and later.

These kernel versions provide full VxLAN support, including unicast and multicast as well as IPv4 and IPv6. Use man to view the link subcommand of ip and check whether the vxlan type is listed:

man ip link
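
If you want to confirm support on your own system before starting, you can also check the running kernel version and whether the vxlan module is available (a quick sanity check; on some kernels vxlan is built in rather than a loadable module, so lsmod may show nothing until an interface is created):

# uname -r
# modinfo vxlan | head -3
# lsmod | grep vxlan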

The following experiments were performed in the following environments:

  • Operating System Version: CentOS Linux release 7.4.1708 (Core)
  • Kernel version: 3.10.0-693.2.2.el7.x86_64
  • Cloud virtual machine vm1 eth0 network interface IP 172.31.0.106, cloud virtual machine vm2 eth0 network interface IP 172.31.0.107

02 Experiment 1: The simplest point-to-point VxLAN

Creating a simple point-to-point VxLAN environment is straightforward. As shown in the figure below, each of the two machines (physical or virtual; in this experiment, cloud virtual machines) only needs one vxlan-type network interface, and that interface, vxlan1, serves as the VTEP.

In this environment, note that the IP addresses of the vxlan interfaces are configured in the 10.0.0.0/24 segment. Once the address is assigned, the Linux routing table contains a route that sends packets destined for 10.0.0.0/24 out through vxlan1. When vm1 sends a packet to 10.0.0.0/24, it is encapsulated in VxLAN on vxlan1, with inner address 10.0.0.106 and outer address 172.31.0.106. The VxLAN packet travels over the physical network to the VTEP vxlan1 on vm2, where the VxLAN encapsulation is stripped on vm2's vxlan1 interface, completing the whole process.

The figure above shows the physical topology; the logical VxLAN overlay network it forms is shown below. The dotted parts indicate that the overlay network and the VxLAN tunnel are both logical concepts. Containers and virtual machines attached to the logical overlay network 10.0.0.0/24 do not need to be aware of the underlying physical network at all: to them, the remote end appears to be in the same Layer 2 domain, as if a VxLAN tunnel were built directly between the VTEP devices and the overlay network interfaces were connected at Layer 2.

The specific configuration requires only three commands. The following commands are executed on vm1:

# ip link add vxlan1 type vxlan id 1 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.106/24 dev vxlan1

The first command above creates a network interface of type vxlan on Linux called vxlan1.

  • id: the VNI (VxLAN Network Identifier), here 1.
  • remote: a VTEP that encapsulates and decapsulates VxLAN packets must know which remote VTEP to send them to. Linux lets you either specify a multicast group address with group or the unicast address of the remote end with remote. Multicast is not supported in the cloud environment used for this experiment, so remote is used to point at the peer address 172.31.0.107.
  • dstport: the destination UDP port, here 4789. When VxLAN was first implemented in Linux kernel 3.7, the UDP port had not yet been standardized; many vendors used 8472 and Linux adopted the same default. IANA later assigned 4789 as the VxLAN UDP port, so to use the IANA port it must be specified explicitly with dstport.
  • dev: the physical device the VTEP communicates through, here eth0.

The second command brings the vxlan1 interface up. The third command assigns it the IP address 10.0.0.106 with a 24-bit (255.255.255.0) subnet mask.
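
To double-check the parameters that were actually applied, you can inspect the interface with the -d (details) flag of ip; the output should include the VNI, the remote address, and the destination port (shown here only as a quick check):

# ip -d link show vxlan1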

On vm2, a network interface named vxlan1 is created using a similar method.

# ip link add vxlan1 type vxlan id 1 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.107/24 dev vxlan1

These few commands complete the configuration. With ifconfig you can see the vxlan1 interface as follows. Note the MTU of 1450: the kernel reserves 50 bytes for the VxLAN overhead (outer Ethernet, IP, UDP, and VxLAN headers).

# ifconfig vxlan1
vxlan1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.0.0.106  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 22:2d:c4:f0:c7:29  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Look at the routing table on vm1 below. Packets destined for the 10.0.0.0/24 segment are sent out through the vxlan1 interface.

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.0.253    0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 vxlan1
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
172.31.0.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
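
You can also ask the kernel directly which route a given destination would take; for an overlay address it should resolve to vxlan1 (a quick check):

# ip route get 10.0.0.107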

On vm1, ping the remote overlay address 10.0.0.107; the ping succeeds.

# ping 10.0.0.107 -c 3
PING 10.0.0.107 (10.0.0.107) 56(84) bytes of data.
64 bytes from 10.0.0.107: icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from 10.0.0.107: icmp_seq=2 ttl=64 time=0.361 ms
64 bytes from 10.0.0.107: icmp_seq=3 ttl=64 time=0.394 ms

--- 10.0.0.107 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.361/0.400/0.447/0.042 ms

While the ping is running, use tcpdump to capture packets on vm1's eth0 interface. Because the packets are encapsulated by the vxlan1 interface before they reach eth0, the capture should show complete VxLAN packets.

Capture only the traffic exchanged with the remote end 172.31.0.107, as follows:

# tcpdump -i eth0 host 172.31.0.107 -s0 -v -w vxlan_vni_1.pcap

The result: Wireshark automatically identifies packets with UDP destination port 4789 as VxLAN and displays the inner packets directly, with the protocol shown as ICMP. If the default Linux port 8472 had been used, the packets would only be shown as UDP, and Wireshark's protocol settings would have to be changed to decode them as VxLAN.
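
If a capture on port 8472 needs to be decoded from the command line, one option (a sketch, assuming tshark is installed and using the capture file name from above) is tshark's decode-as flag; in the Wireshark GUI the equivalent is "Decode As..." with VXLAN selected:

# tshark -r vxlan_vni_1.pcap -d udp.port==8472,vxlan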

03 Experiment 2: Container Cross-host Communication

The point-to-point VxLAN experiment above is only a brief demonstration and has little practical engineering value. This section uses container communication to demonstrate a more complete scenario.

Scenario description: deploy a Docker container on each of vm1 and vm2. By default, containers on the same host can communicate directly over their private IP addresses because they are attached to the same bridge, while containers on different hosts cannot. The network layer built by container orchestration software such as Kubernetes does exactly this work, so that containers on different hosts can communicate directly. This section uses native Docker together with vxlan interfaces created on the hosts to connect containers on different hosts, so that they can communicate directly over their internal IP addresses.

Note: because this experiment runs on cloud virtual machines, the container hosts mentioned above are cloud VMs. Physical machines as container hosts would give the same result.

3.1 Prepare the Docker containers

Installing Docker is not covered here; the official Docker documentation describes it in detail. After installing Docker on Linux, an additional docker0 interface appears, defaulting to the 172.17.0.0/16 segment. It is a bridge that connects the local containers.

# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:44:e8:74:e8  txqueuelen 0  (Ethernet)
        RX packets 6548  bytes 360176 (351.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7489  bytes 40249455 (38.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

With the default 172.17.0.0/16 segment, container IP addresses are allocated starting from 172.17.0.2. To let the containers on vm1 and vm2 use different IP addresses, the address must be set explicitly when starting the container with docker run. Since the --ip parameter is only supported on user-defined networks, first create a custom network and specify the segment 172.18.0.0/16.

# docker network create --subnet 172.18.0.0/16 mynetwork
3231f89d69f6b3fbe2550392ebe4d00daa3d19e251f66ed2d81f61f2b9184362
# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
1cb284a6cb33        bridge              bridge              local
069538be0246        host                host                local
3231f89d69f6        mynetwork           bridge              local
0b7934996485        none                null                local

docker network ls shows that a new bridge network has been created with the specified name mynetwork. ifconfig shows an additional network interface, whose name is not dockerXX but a bridge name beginning with br- followed by the network ID.

br-3231f89d69f6: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:97:22:a5:f9  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
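
The suffix of the bridge name matches the ID of the mynetwork entry shown by docker network ls. You can also confirm the subnet bound to the custom network with docker network inspect (a quick check):

# docker network inspect mynetwork | grep -i subnet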

Create a new container as follows:

# docker run -itd --net mynetwork --ip 172.18.0.2 centos
16bbaeaaebfccd2a497e3284600f5c0ce230e89678e0ff92f6f4b738c6349f8d

  • --net: specifies the custom network
  • --ip: specifies the IP address
  • centos: specifies the image

View the container ID and status, then log in to its shell:

# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
16bbaeaaebfc        centos              "/bin/bash"         2 minutes ago       Up 2 minutes                            condescending_swartz
# docker exec -it 16bbaeaaebfc /bin/bash
[root@16bbaeaaebfc /]# ifconfig
bash: ifconfig: command not found

Note: Docker images are usually kept small so that containers can be created efficiently, which means many common tools have to be installed, such as ifconfig in the centos image. Use the yum whatprovides ifconfig command to find out which package provides ifconfig; it belongs to net-tools-2.0-0.22.20131004git.el7.x86_64 and can be installed directly with yum install net-tools -y. Running ifconfig then shows that the container's eth0 interface has the IP address 172.18.0.2.
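
For reference, the corresponding commands inside the container look like this (the package version on your system may differ from the one quoted above):

[root@16bbaeaaebfc /]# yum whatprovides ifconfig
[root@16bbaeaaebfc /]# yum install -y net-tools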

[root@16bbaeaaebfc /]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.2  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:ac:12:00:02  txqueuelen 0  (Ethernet)
        RX packets 3319  bytes 19221325 (18.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2015  bytes 132903 (129.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Do the same on vm2, specifying 172.18.0.3 as the container's IP address; the container environment is now ready. In the centos container on vm1, ping 172.18.0.3. As expected, the ping fails, while the bridge address 172.18.0.1 can be reached.

[root@16bbaeaaebfc /]# ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
From 172.18.0.2 icmp_seq=1 Destination Host Unreachable
From 172.18.0.2 icmp_seq=2 Destination Host Unreachable

--- 172.18.0.3 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1000ms
pipe 2
[root@16bbaeaaebfc /]# ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.060 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.079 ms

--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.060/0.069/0.079/0.012 ms

3.2 Create a VxLAN interface and attach it to the Docker bridge

First, let's sort out what Docker and its containers set up in the host's Linux network stack. Once this is clear, the way to connect Docker containers on different hosts becomes straightforward. From the perspective of the host Linux system, the network devices are summarized as follows:

  • docker0 interface: the bridge created by default when Docker is installed. Its segment is 172.17.0.0/16, and the bridge's default IP address is 172.17.0.1.
  • br-xxxx interface: the bridge created when a custom Docker network is created. Its segment is the user-specified 172.18.0.0/16, and the bridge's default IP address is 172.18.0.1.
  • vethxxxx interfaces: veth network interfaces created along with each Docker container; N running containers mean N veth interfaces. The eth0 interface inside a container and a veth interface on the host form a veth pair. The host-side veth interface is attached as a port to a Docker bridge, either docker0 or a custom bridge. This is why containers on the same host can communicate by default: they are attached to the same bridge after creation.

To make this easier to see, two containers were also created on the default 172.17.0.0/16 segment, in addition to the container already created on the custom segment above. View the bridges and their interfaces with brctl:

# brctl show
bridge name    bridge id        STP enabled    interfaces
br-3231f89d69f6        8000.02429722a5f9    no        veth2fa4c50
docker0        8000.024244e874e8    no        vethc7cd982
                                       vethd3d0c18

The output shows two interfaces on the default bridge docker0, vethc7cd982 and vethd3d0c18, while the custom bridge br-3231f89d69f6 has the interface veth2fa4c50 attached. Each of the three veth interfaces is paired with the eth0 interface of one Docker container; vethc7cd982 and vethd3d0c18 are attached to the same default bridge, which is why those two containers can reach each other.
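
To confirm which veth interface belongs to which container, one common trick (a small sketch, not required for this experiment) is to read the peer interface index from inside the container and then look it up on the host, replacing <index> with the number printed by the first command:

# docker exec -it 16bbaeaaebfc cat /sys/class/net/eth0/iflink
# ip -o link | grep "^<index>:"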

With this picture in mind, plus the basics of the VxLAN interface from the first experiment, the method for connecting Docker containers on different hosts becomes clear: create a VxLAN interface on each of the two container hosts and attach it as a port to the Docker bridge, as follows:

With the VxLAN interfaces attached, when a packet from the Docker container on vm1 reaches the Docker bridge, it can be encapsulated into a VxLAN packet at the VTEP (the VxLAN interface) and carried over the physical network to vm2, where the peer VTEP lives. If the peer VTEP decapsulates the VxLAN packet correctly, the inner packet is delivered through the Docker bridge on vm2 up to the Docker container.

The specific configuration is as follows. On vm1:

# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-3231f89d69f6 vxlan_docker

  • The first command creates a VxLAN network interface named vxlan_docker with VNI 200; the parameters are similar to those in Experiment 1.
  • The third command attaches the newly created VxLAN interface vxlan_docker to the Docker bridge br-3231f89d69f6 as a port.

On vm2, enter the following commands:

# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-f4b35af34313 vxlan_docker
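
Before testing, it is worth confirming that the VxLAN interface is actually attached to the bridge; brctl show should now list vxlan_docker alongside the container's veth interface on the custom bridge (a quick check):

# brctl show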

From the Docker container on vm1, ping 172.18.0.3. The results below show that the ping now succeeds. Note the RTT: pinging 172.18.0.3 takes on the order of 10^-1 ms, while pinging 172.18.0.1 takes on the order of 10^-2 ms. The former crosses the physical network and the latter only the local protocol stack, hence the order-of-magnitude difference.

# docker exec -it 16bbaeaaebfc ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.2  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:ac:12:00:02  txqueuelen 0  (Ethernet)
        RX packets 3431  bytes 19230266 (18.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2132  bytes 141908 (138.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# docker exec -it 16bbaeaaebfc ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
64 bytes from 172.18.0.3: icmp_seq=1 ttl=64 time=0.544 ms
64 bytes from 172.18.0.3: icmp_seq=2 ttl=64 time=0.396 ms

--- 172.18.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.396/0.470/0.544/0.074 ms
#
# docker exec -it 16bbaeaaebfc ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.072 ms

--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms

04 Summary

This article's purpose was to demonstrate the use of VxLAN on Linux by building a simple, though not production-ready, scenario: using VxLAN to connect containers across hosts at Layer 2. In engineering practice there are many more aspects to consider for cross-host container communication, and many projects are devoted to it. Flannel, for example, provides a virtual network for containers by assigning a subnet to each host; it is based on Linux TUN/TAP, encapsulates IP packets in UDP to build an L3 overlay network, and uses etcd to maintain the network allocation.

Reference sources: https://www.cnblogs.com/wipan...
