In the previous article, we covered the concept and basic principles of VxLAN. This article is a hands-on VxLAN practice on Linux; if the concepts are unfamiliar, please read that article first.
01 Linux Support for VxLAN
First, let's look at Linux support for VxLAN. Kernel support for the VxLAN protocol is relatively recent: Stephen Hemminger merged the related work into the kernel in 2012, and it first appeared in kernel version 3.7.0. For stability and full functionality, some software recommends running VxLAN on kernel 3.9.0 or 3.10.0 and later.
These kernel versions support VxLAN fully, including unicast and multicast, IPv4 and IPv6. Using man to view the link subcommand of ip, you can check whether the vxlan type is available, as follows:
man ip link
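As a quick sanity check before creating any interfaces, you can also verify the kernel version and, if VxLAN is built as a module, that the module is available. A minimal sketch; the exact output varies by distribution:

# uname -r          # kernel version; 3.9+/3.10+ recommended for full VxLAN support
3.10.0-693.2.2.el7.x86_64
# modinfo vxlan     # should print module information if the vxlan module is present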
The experiments below were performed in the following environment:
- Operating System Version: CentOS Linux release 7.4.1708 (Core)
- Kernel version: 3.10.0-693.2.2.el7.x86_64
- Cloud virtual machine vm1: eth0 IP 172.31.0.106; cloud virtual machine vm2: eth0 IP 172.31.0.107
02 Experiment 1: The simplest point-to-point VxLAN
Creating a simple point-to-point VxLAN environment is very easy. As shown in the figure below, only one vxlan-type network interface is needed on each of the two machines (physical or virtual; this experiment uses cloud virtual machines), and the vxlan-type interface vxlan1 acts as the VTEP.
In this environment, note that we configure the IP addresses of the vxlan interfaces in the 10.0.0.0/24 segment. After the IP address is assigned, the Linux routing table gains a route that sends traffic for 10.0.0.0/24 out through the vxlan1 interface. When vm1 sends a packet to 10.0.0.0/24, it is encapsulated into VxLAN on vxlan1, with the inner address 10.0.0.106 and the outer address 172.31.0.106. The VxLAN packet travels over the physical network to the VTEP vxlan1 on vm2, where it is decapsulated on vm2's vxlan1 interface, completing the whole process.
The figure above is the physical view. Logically, the resulting VxLAN overlay network looks as follows; the dotted parts indicate that the overlay network and the VxLAN tunnel are both logical concepts. Containers or virtual machines attached to the logical overlay network 10.0.0.0/24 do not need to be aware of the underlying physical network at all: the remote end appears to be on the same Layer 2 network, as if a VxLAN tunnel were built directly between the VTEP devices, connecting the overlay network interfaces at Layer 2.
The configuration requires only three commands. Execute the following on vm1:
# ip link add vxlan1 type vxlan id 1 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.106/24 dev vxlan1
The first command above creates a network interface of type vxlan on Linux called vxlan1.
- id: the VNI identifier, here 1.
- remote: as a VTEP that encapsulates and decapsulates VxLAN packets, the device needs to know which VTEP to send encapsulated packets to. Linux can either use group to specify a multicast group address, or remote to specify the unicast address of the peer VTEP. Multicast is not supported by default in the cloud environment used for this experiment, so remote is used to specify the peer's address 172.31.0.107 point-to-point.
- dstport: the destination UDP port, 4789. When VxLAN was first implemented in kernel 3.7, the IANA port had not yet been assigned; many vendors used 8472, and Linux defaults to the same port. IANA later allocated 4789 as the VxLAN UDP port, so to use the IANA port you must specify it with dstport.
- dev: the physical device the VTEP communicates through, here eth0.
The second command brings the vxlan1 interface up. The third command assigns it the IP address 10.0.0.106 with a /24 (255.255.255.0) netmask.
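To confirm the parameters took effect, you can display the interface with protocol details; on most iproute2 versions the output includes the vxlan id, the remote address, and the dstport:

# ip -d link show vxlan1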
On vm2, a network interface named vxlan1 is created using a similar method.
# ip link add vxlan1 type vxlan id 1 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.107/24 dev vxlan1
These few commands complete the whole configuration. With ifconfig you can see the vxlan1 network interface, as follows:
# ifconfig vxlan1
vxlan1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.0.0.106  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 22:2d:c4:f0:c7:29  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Look at vm1's routing table below: packets destined for the 10.0.0.0/24 segment go out through the vxlan1 interface.
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.0.253    0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 vxlan1
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
172.31.0.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
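You can also ask the kernel directly which interface a specific overlay destination would use, a quick check that the route above is in effect:

# ip route get 10.0.0.107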
On vm1, ping the peer's overlay address 10.0.0.107; it is reachable.
# ping 10.0.0.107 -c 3
PING 10.0.0.107 (10.0.0.107) 56(84) bytes of data.
64 bytes from 10.0.0.107: icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from 10.0.0.107: icmp_seq=2 ttl=64 time=0.361 ms
64 bytes from 10.0.0.107: icmp_seq=3 ttl=64 time=0.394 ms

--- 10.0.0.107 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.361/0.400/0.447/0.042 ms
While pinging, we capture packets on vm1's eth0 interface with tcpdump. Because packets are encapsulated by the vxlan1 interface before they reach eth0, the complete VxLAN packets should appear in the capture.
Capture only the traffic exchanged with the peer 172.31.0.107, as follows:
# tcpdump -i eth0 host 172.31.0.107 -s0 -v -w vxlan_vni_1.pcap
The result is shown below: Wireshark automatically identifies packets with UDP destination port 4789 as VxLAN and directly displays the inner packet, whose protocol is ICMP. If the Linux default port 8472 is used instead, Wireshark shows plain UDP, and you need to adjust its protocol settings to decode the traffic as VxLAN.
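If a capture was taken with the Linux default port 8472, one hedged way to decode it offline is tshark's "Decode As" option, assuming tshark is installed and the capture file is the vxlan_vni_1.pcap written above:

# tshark -r vxlan_vni_1.pcap -d udp.port==8472,vxlan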
03 Experiment 2: Container Cross-host Communication
The point-to-point VxLAN experiment above is only a brief demonstration with little practical engineering value. In this section, container communication is used to demonstrate a more complete scenario.
Scenario description: deploy a Docker container on each of vm1 and vm2. By default, containers on the same host can communicate directly via their private IP addresses because they are attached to the same bridge, while containers on different hosts cannot. The network setup in container orchestration software such as Kubernetes does exactly this work so that containers on different hosts can reach each other directly. This section uses plain Docker plus a vxlan interface built on the host to connect containers on different hosts so that they can communicate directly via their internal IPs.
Note: because the experiment is done on cloud virtual machines, the container hosts mentioned above are cloud VMs; physical machines as container hosts would give the same result.
3.1 Prepare docker container
Installing Docker is not covered here; the official Docker documentation describes it in detail. After installing Docker on Linux, you will see an additional docker0 network interface, which defaults to the 172.17.0.0/16 segment. This is the bridge that connects the local containers.
# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:44:e8:74:e8  txqueuelen 0  (Ethernet)
        RX packets 6548  bytes 360176 (351.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7489  bytes 40249455 (38.3 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
With the default 172.17.0.0/16 segment, container IP addresses are allocated starting from 172.17.0.2. To give the containers on vm1 and vm2 different IP addresses, we assign them explicitly when starting the containers with docker run; since the --ip option is only supported on user-defined networks, we first create a custom network and specify the segment 172.18.0.0/16.
# docker network create --subnet 172.18.0.0/16 mynetwork
3231f89d69f6b3fbe2550392ebe4d00daa3d19e251f66ed2d81f61f2b9184362
# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
1cb284a6cb33        bridge              bridge              local
069538be0246        host                host                local
3231f89d69f6        mynetwork           bridge              local
0b7934996485        none                null                local
docker network ls shows that a new bridge network has been created with the specified name mynetwork. ifconfig shows an additional network interface; its name does not start with docker but is a bridge whose name starts with br-.
br-3231f89d69f6: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:97:22:a5:f9  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Create a new container as follows:
# docker run -itd --net mynetwork --ip 172.18.0.2 centos
16bbaeaaebfccd2a497e3284600f5c0ce230e89678e0ff92f6f4b738c6349f8d
- --net: specifies the custom network
- --ip: specifies the IP address
- centos: specifies the image
View the container ID and status, then open a shell in the container, as follows:
# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
16bbaeaaebfc        centos              "/bin/bash"         2 minutes ago       Up 2 minutes                            condescending_swartz
# docker exec -it 16bbaeaaebfc /bin/bash
[root@16bbaeaaebfc /]# ifconfig
bash: ifconfig: command not found
Note: Docker images are usually kept small so that containers can be created efficiently, which means many common tools are missing, for example ifconfig in the centos image. You can use yum whatprovides ifconfig to see which package provides ifconfig; it belongs to net-tools-2.0-0.22.20131004git.el7.x86_64, which can be installed with yum install net-tools -y. After installing, ifconfig shows that the container's eth0 interface has the IP address 172.18.0.2.
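For reference, the install steps inside the container would look roughly like this (a sketch; the exact package version may differ on other CentOS releases):

[root@16bbaeaaebfc /]# yum whatprovides ifconfig   # reports net-tools as the providing package
[root@16bbaeaaebfc /]# yum install -y net-tools    # installs ifconfig and related tools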
[root@16bbaeaaebfc /]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.2  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:ac:12:00:02  txqueuelen 0  (Ethernet)
        RX packets 3319  bytes 19221325 (18.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2015  bytes 132903 (129.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
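On vm2, the same container preparation is repeated; roughly the following, a sketch assuming the same network name mynetwork is also used on vm2:

# docker network create --subnet 172.18.0.0/16 mynetwork
# docker run -itd --net mynetwork --ip 172.18.0.3 centos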
With the container on vm2 created with the IP address 172.18.0.3, the container environment is ready. In the centos container on vm1, ping 172.18.0.3; as expected, it cannot be reached, while the bridge address 172.18.0.1 on the same host can:
[root@16bbaeaaebfc /]# ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
From 172.18.0.2 icmp_seq=1 Destination Host Unreachable
From 172.18.0.2 icmp_seq=2 Destination Host Unreachable

--- 172.18.0.3 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1000ms
pipe 2
[root@16bbaeaaebfc /]# ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.060 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.079 ms

--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.060/0.069/0.079/0.012 ms
3.2 Create the VxLAN interface and attach it to the docker bridge
First, let's sort out how Docker and its containers appear in the Linux host's network stack; once that is clear, the way to connect Docker containers on different hosts becomes obvious. From the perspective of the host Linux system, the network devices involved are:
- docker0 interface: the bridge created by default when Docker is installed. Its segment is 172.17.0.0/16, and the bridge's default IP address is 172.17.0.1.
- br-xxxx interface: a bridge created when a user-defined Docker network is created. Its segment is the one specified by the user, here 172.18.0.0/16, and the bridge's default IP address is 172.18.0.1.
- vethxxxx interface: a veth interface created when a specific container is started; N running containers mean N veth interfaces on the host. The eth0 interface inside a container and the corresponding veth interface on the host form a veth pair, and the host-side veth interface is attached as a port to a Docker bridge, either docker0 or a custom bridge. This is why containers on the same host can communicate by default: they are attached to the same bridge at creation time (see the sketch below).
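One way to see the pairing from the host, sketched here with the container ID 16bbaeaaebfc created above, is to read the peer interface index from inside the container and match it against the host's interface list:

# idx=$(docker exec 16bbaeaaebfc cat /sys/class/net/eth0/iflink)   # ifindex of the host-side peer
# ip -o link | grep "^${idx}:"                                     # prints the matching vethXXXX interface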
To make this easier to see, two containers are created in the default 172.17.0.0/16 segment, in addition to the container already created on the custom segment above. The bridges and their ports are viewed with brctl, as follows:
# brctl show
bridge name         bridge id           STP enabled     interfaces
br-3231f89d69f6     8000.02429722a5f9   no              veth2fa4c50
docker0             8000.024244e874e8   no              vethc7cd982
                                                        vethd3d0c18
The output shows two interfaces on the default bridge docker0: vethc7cd982 and vethd3d0c18. On the user-defined bridge br-3231f89d69f6, the veth2fa4c50 interface is attached. Each of the three veth interfaces is paired with the eth0 interface of a Docker container; vethc7cd982 and vethd3d0c18 hang off the same default bridge.
With this picture and the basic knowledge of VxLAN interfaces from the first section of this article, the way to connect Docker containers on different hosts should now be clear: create a VxLAN interface on each of the two container hosts and attach it as a port to the Docker bridge, as follows:
With the VxLAN interfaces attached, when a packet from the Docker container on vm1 reaches the Docker bridge, it can be encapsulated into a VxLAN packet at the VTEP (the VxLAN interface) and sent over the physical network to vm2, where the peer VTEP lives. If the VTEP on vm2 decapsulates the VxLAN packet correctly, the inner packet can be delivered through the Docker bridge on vm2 to the container above it.
The specific configuration is as follows; on vm1:
# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-3231f89d69f6 vxlan_docker
- The first command creates a VxLAN network interface named vxlan_docker with VNI 200; the parameters are similar to those in Experiment 1.
- The third command connects the newly created VXLAN interface vxlan_docker to docker bridge br-3231f89d69f6.
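To confirm the attachment, you can list the ports of that bridge again; vxlan_docker should now appear in the interfaces column:

# brctl show br-3231f89d69f6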
On vm2, enter the following commands:
# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-f4b35af34313 vxlan_docker
Then ping 172.18.0.3 from the Docker container on vm1. The result is shown below: the ping now succeeds. Note the RTTs: pinging 172.18.0.3 takes on the order of 10^-1 ms, while pinging 172.18.0.1 takes on the order of 10^-2 ms. The former crosses the physical network, the latter only the local protocol stack, hence the order-of-magnitude difference.
# docker exec -it 16bbaeaaebfc ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.2  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:ac:12:00:02  txqueuelen 0  (Ethernet)
        RX packets 3431  bytes 19230266 (18.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2132  bytes 141908 (138.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
# docker exec -it 16bbaeaaebfc ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
64 bytes from 172.18.0.3: icmp_seq=1 ttl=64 time=0.544 ms
64 bytes from 172.18.0.3: icmp_seq=2 ttl=64 time=0.396 ms

--- 172.18.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.396/0.470/0.544/0.074 ms
#
# docker exec -it 16bbaeaaebfc ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.072 ms

--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
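As in Experiment 1, you can confirm on the host that the container traffic really leaves eth0 as VxLAN by capturing UDP port 4789 while the ping runs; a quick check:

# tcpdump -i eth0 -nn udp port 4789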
04 Summary
Finally, the purpose of this article is to demonstrate the use of VxLAN on Linux by constructing this simple but not production-grade scenario, using VxLAN to connect containers across hosts at Layer 2. In practice there are many more aspects to consider for cross-host container communication, and many projects are devoted to it. Flannel, for example, assigns a subnet to each host to provide a virtual network for containers; it is based on Linux TUN/TAP, uses UDP to encapsulate IP packets to build an L3 overlay network, and uses etcd to maintain the network allocation.
Reference sources: https://www.cnblogs.com/wipan...