The security of Docker containers depends largely on the Linux system itself. When evaluating Docker security, the following aspects are the main considerations:
The isolation security that the Linux kernel's namespace mechanism provides for containers.
The resource security that the Linux control group mechanism provides for containers.
The operation permission security that the Linux kernel's capability mechanism provides.
The resistance of the Docker program itself (especially the server) to attack.
The impact of other security enhancement mechanisms on container security.
Namespace isolation security:
When docker run starts a container, Docker creates an independent set of namespaces for it in the background. Namespaces provide the most basic and direct form of isolation. Compared with a virtual machine, isolation through Linux namespaces is less complete: the Linux kernel contains many resources and objects that cannot be namespaced, such as the system time.
Control group resource security:
When docker run starts a container, Docker creates a separate set of control group policies for it in the background. Linux Cgroups provide many useful features that let containers share the host's memory, CPU, disk IO and other resources fairly. They also ensure that resource pressure inside one container does not affect the host system or other containers, which is essential for preventing denial-of-service (DoS) attacks.
Kernel capability mechanism:
The capability mechanism is a powerful Linux kernel feature that provides fine-grained access control. In most cases a container does not need "real" root permissions; it only needs a few capabilities. By default, Docker uses a whitelist mechanism and disables every permission other than the required capabilities.
Docker server protection:
The Docker server is at the core of using Docker containers, so make sure that only trusted users can access the Docker service. Two further measures protect the server:
Map the container's root user to a non-root user on the host, which reduces the security problems caused by privilege escalation between the container and the host.
Allow the Docker server to run without root permissions, and delegate operations that require privileges to safe and reliable child processes. These child processes are only allowed to operate within a specific scope.
Other security features:
Enable GRSEC and PAX in the kernel. This adds more compile-time and runtime security checks and uses address randomization to hinder malicious probing. Enabling this feature requires no Docker configuration.
Use container templates with enhanced security features.
Define stricter access control mechanisms to customize security policies.
When a file system is mounted inside the container, configure it as read-only so that applications in the container cannot damage the external environment through the file system, especially directories related to the running state of the system.
2. Container resource control
The full name of Linux Cgroups is Linux Control Groups.
Their main purpose is to cap the resources that a group of processes can use, including CPU, memory, disk, network bandwidth, etc.
The operating interface exposed by Linux Cgroups to users is the file system.
It is organized as files and directories under the /sys/fs/cgroup path of the operating system.
Under /sys/fs/cgroup there are many subdirectories such as cpuset, cpu and memory, also known as subsystems.
Under each subsystem, a control group is created for each container (that is, a new directory is created).
The values written to the resource files in the control group come from the parameters that the user passes to docker run.
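Because the interface is just files, a limit can be read back with ordinary file tools. The following is a minimal sketch, assuming the cgroup v1 layout described above; the helper name show_limit_mib is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print the memory limit of a cgroup v1 control group in MiB.
# $1 is the control group directory, for example /sys/fs/cgroup/memory/x1
show_limit_mib() {
    limit_file="$1/memory.limit_in_bytes"
    if [ ! -r "$limit_file" ]; then
        echo "cannot read $limit_file" >&2
        return 1
    fi
    bytes=$(cat "$limit_file")
    echo "$((bytes / 1024 / 1024)) MiB"
}
```

Calling show_limit_mib with the directory of a container's control group prints the limit that docker run wrote there.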
2.1. Memory limit
The memory available to a container consists of two parts: physical memory and swap.
Enter the container's control group directory and check the memory limit. When no limit is set, it matches the host's limit, because the container inherits it.
The container's control group directory:
docker run --help | grep mem ##View the docker run options for memory limits
docker run -it --memory 200M -d --name demo nginx ##Limit memory to 200M and start the container
cd /sys/fs/cgroup/memory/docker/
Check that the maximum memory limit is 200M:
cat memory.limit_in_bytes
cat memory.memsw.limit_in_bytes
Import the stress image, which provides the stress load-testing tool used in the tests below:
docker load -i stress.tar
docker tag reg.westos.org/library/stress:latest stress:latest
docker rmi reg.westos.org/library/stress
Create a dedicated directory x1 for testing memory limits:
cd /sys/fs/cgroup/memory
mkdir x1
cd x1/
ls ##The new directory automatically gets the same set of files as the parent directory
Then limit the maximum memory to 200M:
echo 209715200 > memory.limit_in_bytes
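The value 209715200 is 200M expressed in bytes (200 × 1024 × 1024). As a small sketch, the conversion can be done with a helper; the name to_bytes is hypothetical, and it assumes binary units with a single K, M or G suffix.

```shell
# Hypothetical helper: convert a size such as 200M or 1G to bytes (binary units).
to_bytes() {
    num=${1%[KkMmGg]}          # strip a trailing unit letter, if any
    case "$1" in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;  # no suffix: already bytes
    esac
}

to_bytes 200M   # prints 209715200
```

The result can then be written to the limit file, for example echo "$(to_bytes 200M)" > memory.limit_in_bytes.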
Install the libcgroup-tools package:
yum install -y libcgroup-tools.x86_64
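Write test files into /dev/shm, a memory-backed tmpfs, so that the written data counts against the memory limit of x1:
cd /dev/shm/
cgexec -g memory:x1 dd if=/dev/zero of=bigfile bs=1M count=100
cgexec -g memory:x1 dd if=/dev/zero of=bigfile bs=1M count=300
Both commands succeed, because data beyond the 200M limit is moved to swap. If the swap partition is turned off, the 300M test fails.
However, in actual production the swap partition is in use and cannot be turned off.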
So we need another way to solve this problem: apply the 200M limit to memory + swap combined.
rm -f bigfile
echo 209715200 > /sys/fs/cgroup/memory/x1/memory.memsw.limit_in_bytes ##Memory + swap: 200M in total
cgexec -g memory:x1 dd if=/dev/zero of=bigfile bs=1M count=300
This time the attempt to write 300M fails immediately.
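The reason is that memory.memsw.limit_in_bytes covers memory and swap together, so the swap a control group may use is the difference between the two limits. A minimal sketch of that arithmetic follows; the helper name swap_headroom_mib is hypothetical.

```shell
# Hypothetical helper: swap a control group may use, in MiB, given
# memory.limit_in_bytes ($1) and memory.memsw.limit_in_bytes ($2).
swap_headroom_mib() {
    echo $(( ($2 - $1) / 1024 / 1024 ))
}

swap_headroom_mib 209715200 209715200   # both limits 200M: prints 0
swap_headroom_mib 209715200 419430400   # memsw is 400M: prints 200
```

With both limits at 200M there is no swap headroom left, so a 300M write cannot be absorbed.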
2.2. CPU limit
cpu.shares sets the relative CPU priority (weight) of the control group.
cpu.cfs_period_us is the length of one scheduling period of the CFS algorithm, in microseconds. The usual value is 100000, i.e. 100ms.
cpu.cfs_quota_us is the run time that the control group is allowed within one scheduling period of the CFS algorithm. For example, a value of 50000 means 50ms.
Dividing cpu.cfs_quota_us by the scheduling period cpu.cfs_period_us gives the CPU quota: 50ms/100ms = 0.5, which means this control group may use at most 0.5 CPUs.
docker run -it --cpu-quota 20000 --rm stress -c 2
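With a quota of 20000 against the default period of 100000, the two stress workers together are limited to about 0.2 CPUs.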
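The quota-to-CPU calculation can be checked with a one-line helper. This is only a sketch; the name cpu_fraction is hypothetical, and awk is used because shell arithmetic is integer-only.

```shell
# Hypothetical helper: CPU share granted by a CFS quota ($1) and period ($2),
# both in microseconds.
cpu_fraction() {
    awk -v quota="$1" -v period="$2" 'BEGIN { printf "%.2f\n", quota / period }'
}

cpu_fraction 50000 100000   # prints 0.50
cpu_fraction 20000 100000   # prints 0.20
```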
2.3. Block IO limit (disk IO)
--device-write-bps limits the write rate to a device in bytes per second.
The block IO limit currently only applies to direct IO, which bypasses the file cache.
docker run -it --rm --device-write-bps /dev/vda:30MB ubuntu dd if=/dev/zero of=bigfile bs=1M count=200 oflag=direct
With the oflag=direct parameter the write speed is about 30MB/s; without it, the speed is about 2GB/s.
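A quick way to sanity-check the throttle is to compare the measured run time with the expected one. The sketch below only does the arithmetic; the helper name expected_seconds is hypothetical.

```shell
# Hypothetical helper: seconds needed to write $1 MB at a throttled rate of $2 MB/s.
expected_seconds() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

expected_seconds 200 30   # prints 6.7
```

If the 200M dd test above finishes in roughly this time, the limit is taking effect.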
3. Docker security hardening
3.1. Enhancing Docker container isolation and resource visibility with LXCFS
yum install lxcfs-2.0.5-3.el7.centos.x86_64.rpm -y
lxcfs /var/lib/lxcfs & ##Run lxcfs in the background
docker run -it --memory 256M \
    -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
    -v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
    -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
    -v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
    -v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
    -v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
    ubuntu
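The six -v options differ only in the file name, so they can be generated in a loop. This is a sketch under that assumption; the helper name lxcfs_mounts is hypothetical.

```shell
# Hypothetical helper: print one -v option per proc file that the command
# above mounts from lxcfs. $1 is the lxcfs mount point.
lxcfs_mounts() {
    for f in cpuinfo diskstats meminfo stat swaps uptime; do
        printf '%s\n' "-v $1/proc/$f:/proc/$f:rw"
    done
}

lxcfs_mounts /var/lib/lxcfs
```

The output can be spliced into the command line, for example docker run -it --memory 256M $(lxcfs_mounts /var/lib/lxcfs) ubuntu, where the unquoted substitution is split into separate arguments.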
Inside the container, the reported sizes of mem and swap are now the same and reflect the container's limit.
In an ordinary container, an attempt to change the network configuration is rejected, because the container does not have enough permissions.
3.2. Running a container in privileged mode
Sometimes we need the container to have more permissions, such as operating the kernel module, controlling the swap partition, mounting the USB disk, modifying the MAC address, etc.
docker run -it --rm --privileged=true busyboxplus
ip link set down eth0
3.2.1. Setting a container whitelist
Because --privileged=true grants far too many permissions, you can restrict them with a capability whitelist instead.
--cap-add=NET_ADMIN puts the network administration capability on the whitelist, so the only extra thing the container can do is perform network operations.
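Whether a capability is actually on the whitelist can be checked from the effective capability mask, which the kernel reports as a hex value on the CapEff line of /proc/<pid>/status. The sketch below tests a single bit; the helper name has_cap is hypothetical, and the sample mask is the value commonly seen with Docker's default capability set, which should be treated as an assumption for any particular Docker version.

```shell
# Hypothetical helper: succeed if capability bit $2 is set in the hex mask $1
# (the format of the CapEff line in /proc/<pid>/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_ADMIN=12   # bit number defined in linux/capability.h

# Assumed sample mask for a container started without --cap-add.
if has_cap 00000000a80425fb "$CAP_NET_ADMIN"; then
    echo "NET_ADMIN present"
else
    echo "NET_ADMIN missing"
fi
```

Inside a running container the mask can be obtained with grep CapEff /proc/self/status and passed to the helper.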
3.3. Isolating containers using user namespaces
Linux namespaces isolate running processes and restrict their access to system resources, without the processes being aware of the restrictions. For more information about Linux namespaces, see Linux namespaces.
The best way to prevent privilege escalation attacks from within a container is to configure the container's application to run as a non-privileged user. For containers whose processes must run as root inside the container, this user can be remapped to a less privileged user on the Docker host. The mapped user is assigned a range of UIDs that function inside the namespace as normal UIDs from 0 to 65536, but have no privileges on the host.
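The remapping is a fixed offset: a UID inside the container corresponds to the start of the user's subordinate UID range plus that UID. The sketch below illustrates this with an /etc/subuid style entry; the helper name host_uid and the sample entry devops:100000:65536 are hypothetical.

```shell
# Hypothetical helper: translate a container UID ($2) to the host UID it is
# remapped to, given an /etc/subuid style entry "name:start:count" ($1).
host_uid() {
    start=$(echo "$1" | cut -d: -f2)
    count=$(echo "$1" | cut -d: -f3)
    if [ "$2" -ge "$count" ]; then
        echo "UID $2 is outside the mapped range" >&2
        return 1
    fi
    echo $((start + $2))
}

host_uid devops:100000:65536 0   # container root: prints 100000
```

Container root therefore appears on the host as an ordinary high-numbered UID with no special privileges.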
systemctl stop docker
systemctl stop docker.socket
Add a new user
useradd devops
passwd devops
The shadow-utils version shipped with the system is too old and needs to be updated:
yum update shadow-utils-22.214.171.124-25.el7.x86_64
ssh devops@localhost ##Log in as the new user
If the installation reports an error, apply the setting shown in its prompt:
user.max_user_namespaces = 28633
export PATH=/usr/bin:$PATH
export DOCKER_HOST=unix:///run/user/1001/docker.sock
dockerd-rootless.sh --storage-driver=vfs &
3.4. Security hardening ideas
Ensure the security of the image:
  Use a secure base image
  Remove setuid and setgid permissions in the image
  Enable Docker Content Trust
  Follow the minimal installation principle
  Scan the image for security vulnerabilities (image security scanner: Clair)
  Run containers as a non-root user
Ensure the security of the container:
  Harden the Docker host
  Limit network traffic between containers
  Configure TLS authentication for the Docker daemon
  Enable user namespace support (userns-remap)
  Limit the memory usage of the container
  Set the container's CPU priority appropriately
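Several of these ideas map directly onto docker run options. The sketch below only prints a command line rather than running it; the helper name hardened_run_cmd, the image name myapp:latest, the UID 1000 and the limit values are all hypothetical, and a real application may need writable paths or extra capabilities on top of this.

```shell
# Hypothetical helper: print (not run) a docker command that applies some of
# the container-level ideas above. $1 is the image name.
hardened_run_cmd() {
    echo "docker run -d" \
        "--user 1000:1000" \
        "--read-only" \
        "--cap-drop ALL" \
        "--memory 200M" \
        "--cpu-shares 512" \
        "$1"
}

hardened_run_cmd myapp:latest
```

Here --user runs the process as a non-root user, --read-only mounts the root file system read-only, --cap-drop ALL starts from an empty capability whitelist, and --memory and --cpu-shares set the memory limit and CPU priority.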