Learning Go programming and Linux containerization by reimplementing https://github.com/lizrice/containers-from-scratch step by step.
Note: every step is tagged in this repository. You can jump to the final state of each step with the following git command (replace x with the step number):
git checkout -f stepx
git checkout -f step1
Add a run() function that prints all arguments from the third one onward (os.Args[2:]), i.e. everything after run:
$go run main.go run Hello, world
Running [Hello, world]
git checkout -f step2
Modify the run() function so that it executes the command given in the arguments:
$go run main.go run echo Hello, world
Running [echo Hello, world]
Hello, world
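A sketch of the modified run, assuming the same dispatch as before. It starts the first argument as a program with the remaining arguments and connects the child's standard streams to the parent's, so interactive programs such as /bin/bash also work. Returning an error instead of panicking is an illustrative choice.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run prints the command and then executes it, e.g. ["echo", "Hello,", "world"].
func run(args []string) error {
	fmt.Printf("Running %v\n", args)
	cmd := exec.Command(args[0], args[1:]...)
	// Share our terminal with the child so its input and output are visible.
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run() // starts the command and waits for it to exit
}

func main() {
	if len(os.Args) < 3 || os.Args[1] != "run" {
		fmt.Fprintln(os.Stderr, "usage: main run <command> [args...]")
		os.Exit(2)
	}
	if err := run(os.Args[2:]); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```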
git checkout -f step3
By enabling the UTS namespace, the hostname can be changed inside the containerized process without affecting the host's hostname.
root@ubuntu18:$go run main.go run /bin/bash
Running [/bin/bash]
root@ubuntu18:$hostname
ubuntu18
root@ubuntu18:$hostname container
root@ubuntu18:$hostname
container
Meanwhile, in the parent (host) shell the hostname remains unchanged:
root@ubuntu18:$hostname
ubuntu18
git checkout -f step4
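In order to display the updated hostname in the bash prompt, add a child function and let run execute it (by re-executing /proc/self/exe child, as the ps fax output below shows) in the new UTS namespace, where it sets the updated hostname before starting the command.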
root@ubuntu18:$go run main.go run /bin/bash
Running [/bin/bash]
Running in child [/bin/bash]
root@container:$hostname
container
However, if you run ps in this containerized bash, you can still see all the parent processes of this bash process, and the PIDs are still the host OS PIDs, which means the process is not yet completely containerized.
root@container:$ps
PID TTY TIME CMD
27070 pts/10 00:00:00 sudo
27071 pts/10 00:00:00 bash
27823 pts/10 00:00:00 go
27844 pts/10 00:00:00 main
27848 pts/10 00:00:00 exe
27852 pts/10 00:00:00 bash
28292 pts/10 00:00:00 ps
root@container:$ps fax
...
27070 pts/10 S 0:00 | | \_ sudo /bin/bash
27071 pts/10 S 0:00 | | \_ /bin/bash
27823 pts/10 SLl 0:00 | | \_ go run main.go run /bin/bash
27844 pts/10 SLl 0:00 | | \_ /tmp/go-build645762770/b0
27848 pts/10 SLl 0:00 | | \_ /proc/self/exe child
27852 pts/10 S 0:00 | | \_ /bin/bash
28779 pts/10 R+ 0:00 | | \_ ps fax
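Which namespaces a process shares with the host can be checked through the symlinks under /proc/<pid>/ns/: two processes are in the same namespace when their links point at the same inode. A small standalone sketch (nsID is a hypothetical helper); at this step, running it inside the container and on the host should show different uts identifiers but the same pid identifier, consistent with the ps output above.

```go
package main

import (
	"fmt"
	"os"
)

// nsID returns the namespace identifier of the given kind for a process,
// e.g. "uts:[4026531838]". pid may be a number or "self".
func nsID(pid, kind string) (string, error) {
	return os.Readlink("/proc/" + pid + "/ns/" + kind)
}

func main() {
	for _, kind := range []string{"uts", "pid", "mnt"} {
		id, err := nsID("self", kind)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		fmt.Printf("%-4s %s\n", kind, id)
	}
}
```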
git checkout -f step5
In the run function, enable the PID namespace so that PIDs inside the container start from 1 (the child process becomes PID 1).
root@ubuntu18:$go run main.go run /bin/bash
Running [/bin/bash] as 30582
Running in child [/bin/bash] as 1
But if you run ps again, you still see all the parent processes of this bash process, and the PIDs are still the host OS PIDs. This is because ps reads process information from the /proc folder, which is not isolated by the UTS and PID namespaces and is therefore still shared between the host and the containerized bash.
root@container:$ps
PID TTY TIME CMD
30191 pts/10 00:00:00 sudo
30192 pts/10 00:00:00 bash
30561 pts/10 00:00:00 go
30582 pts/10 00:00:00 main
30586 pts/10 00:00:00 exe
30590 pts/10 00:00:00 bash
30818 pts/10 00:00:00 ps
git checkout -f step6
In order to stop the containerized process from sharing the host OS /proc, use the chroot and chdir syscalls to set a new root directory for the containerized process.
First, use the following commands to prepare a clean Ubuntu filesystem:
$ CID=$(docker create ubuntu)
$ ROOTFS=~/ubuntufs
$ mkdir -p $ROOTFS
$ docker export $CID | tar -xf - -C $ROOTFS
In the child function, call the chroot and chdir syscalls to set the root directory to the ubuntufs folder. You can check the new root of the containerized process with the following commands.
In the containerized bash (note that ps now fails, as there is nothing under /proc yet):
root@container:$sleep 100
root@container:$ps
Error, do this: mount -t proc proc /proc
In the host OS bash:
root@ubuntu18:$ps -C sleep
PID TTY TIME CMD
5697 pts/10 00:00:00 sleep
root@ubuntu18:$ls -l /proc/5697/root
lrwxrwxrwx 1 root root 0 Apr 28 23:36 /proc/5697/root -> /home/jizg/ubuntufs
So the containerized process now uses the extracted ubuntu:latest image from Docker Hub as its new root directory.
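The chroot part of child can be sketched as follows, assuming Linux and root. The chdir("/") right after chroot matters because chroot does not change the current working directory, which would otherwise still point outside the new root. enterRoot is a hypothetical helper; this standalone main takes the rootfs path (for example the directory prepared above) as its only argument and lists the new /.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// enterRoot makes rootfs the new "/" of the calling process and moves the
// working directory into it. In child it would run before the command starts.
func enterRoot(rootfs string) error {
	if err := syscall.Chroot(rootfs); err != nil {
		return fmt.Errorf("chroot %s: %w", rootfs, err)
	}
	if err := syscall.Chdir("/"); err != nil {
		return fmt.Errorf("chdir /: %w", err)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: chroot-demo <rootfs>   (requires root)")
		os.Exit(2)
	}
	if err := enterRoot(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	entries, err := os.ReadDir("/")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, e := range entries {
		fmt.Println(e.Name()) // entries of the Ubuntu rootfs, not the host's /
	}
}
```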
Step7. Enable Mount Namespace in run function, and mount /proc to containerized process in child function
git checkout -f step7
By mounting /proc in the new root filesystem, ps will only display the processes inside the container.
root@container:$ps
PID TTY TIME CMD
1 ? 00:00:00 exe
5 ? 00:00:00 bash
7 ? 00:00:00 ps
In the host OS:
root@ubuntu18:$mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=25,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13556)
proc on /home/jizg/ubuntufs/proc type proc (rw,relatime)
The last line above shows the container's /proc mount leaking into the host's mount table. After adding the unshare flags for the Mount Namespace, mounts made inside the new root directory are no longer exposed to the host OS, so mount | grep proc on the host no longer shows the containerized process's /proc.
Step8. Add a pids cgroup (process number controller) in the child function to limit the maximum number of processes in the containerized environment to 20
git checkout -f step8
Add a new function cg that configures the pids cgroup and sets the maximum number of processes to 20, and call cg in child.
root@container:$ps
PID TTY TIME CMD
1 ? 00:00:00 exe
5 ? 00:00:00 bash
7 ? 00:00:00 ps
root@container:$sleep 100
In the host OS:
root@ubuntu18:$ps -C sleep
PID TTY TIME CMD
30048 pts/10 00:00:00 sleep
root@ubuntu18:$cd /sys/fs/cgroup/pids/jizg
root@ubuntu18:$cat cgroup.procs
30038
30045
30048
You can also run :() { : | : & }; : (a fork bomb) in the containerized bash. It forks new processes endlessly, but forking eventually fails once the total number of processes reaches 20.