Docker Networking
This is a write-up of my docker networking demo at the Docker Orlando meetup.Let’s begin with the diagram.
We can see in figure 1 a server which can be a physical device or a vm(cloud or not)
connected to the Internet by an interface ens3 as its gateway interface - ens3
because kvm configures it by default.
Then we have four Linux bridges. One is docker0 which is the default linux bridge
configured when Docker is set up. It has connection to the outside world, and it has
inter-container communication enabled by default. Then we have two network segments managed by Docker
which are user-defined networks. These bridges are also Linux bridges. One of these
bridges(demo_internal) is an internal network which means it doesn’t have connection to the outside, and
the other one(demo_net) has connection to the outside world. Finally, there’s
demo_ns which is only a network namespace and NOT A VIRTUAL SERVER. This
segment has two virtual ethernet interfaces which their final end are connected
to a Linux bridge nsbr0 and to the server respectively.
Docker0
docker0 is just a Linux bridge with no modifications whatsoever managed by thedocker engine. It gives the subnet 172.17.0.0/16, so you can fire up plenty of containers
to play with. An interesting observation is to see if the host’s mac address table
can map as much containers as the network segment valid hosts.
The docker0 is part of the docker default networks. Docker supports three types
of networks: bridge(docker0), none, and host.
- Bridge is just a Linux bridge where all the containers if no network is specified
are allocated to it. The bridge network is customisable, but the docker daemon needs
to be restarted. Options of this bridge can be found indocker network inspect bridge
.
Options | Values |
---|---|
com.docker.network.bridge.default_bridge | true or false |
com.docker.network.bridge.enable_icc | true or false |
com.docker.network.bridge.enable_ip_masquerade | true or false |
com.docker.network.bridge.host_binding_ipv4 | ipv4 to bind |
com.docker.network.bridge.name | bridge name |
com.docker.network.driver.mtu | mtu |
- None disables network capabilities to containers; in other words, it’s attached to itself.
- Host adds a container on the host’s network stack.
at installation time. But let’s take a look to the network environment before doing it.
$iptables -L -v && sudo iptables -t nat -L -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
$ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:43:8b:a8 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.194/24 brd 192.168.122.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe43:8ba8/64 scope link
valid_lft forever preferred_lft forever
$ip route show
default via 192.168.122.1 dev ens3
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.194
Iptables is accepting everything with no rules defined. Also, the routing table onlyshows the directly connected interface and the default gateway rule. Last, there’s only
two interfaces.
Let’s install docker.
$curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$sudo apt update && sudo apt install docker-ce -y
# Output might be slightly different from yours
$sudo iptables -L -v && sudo iptables -t nat -L -v
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER-ISOLATION all -- any any anywhere anywhere
0 0 DOCKER all -- any docker0 anywhere anywhere
0 0 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
0 0 ACCEPT all -- docker0 !docker0 anywhere anywhere
0 0 ACCEPT all -- docker0 docker0 anywhere anywhere
Chain DOCKER (1 references)
pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION (1 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- any any anywhere anywhere
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- any any anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain OUTPUT (policy ACCEPT 2 packets, 137 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- any any anywhere !localhost/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 2 packets, 137 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 any anywhere anywhere
$ip addr show
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:be:33:09:ba brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
$ip route show
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
First of all, Docker enables ip forwarding at the kernel level(echo 1 > /proc/sys/net/ipv4/ip_forward
)to have communication between internal and external hosts.In addition, it creates some
firewall rules for docker isolation to
DROP
or ACCEPT
communication betweencontainers and other hosts throught the
FORWARD
chain.It’s better to check this official explanation
from the docker site about container communication.
Let’s disable inter-container communication (
--icc=false
) and create some containers. $sudo systemctl stop docker
$sudo dockerd --icc=false &
$docker run -itd --name=c_in_bridge busybox
$docker run -itd --name=c_in_internal busybox
$docker run -itd --name=c_in_net busybox
$docker run -itd --name=web_in_all httpd:2.4
# Inspecting iptables and interfaces after docker is installed
# Output has been cut
$sudo iptables -L FORWARD -v
$sudo iptables -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT 72 packets, 4997 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
$ sudo iptables -L FORWARD -v
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER-ISOLATION all -- any any anywhere anywhere
0 0 DOCKER all -- any docker0 anywhere anywhere
0 0 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
0 0 ACCEPT all -- docker0 !docker0 anywhere anywhere
0 0 DROP all -- docker0 docker0 anywhere anywhere
$ip addr show
5: vethe4f578b@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether 26:1f:8b:1a:7b:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::241f:8bff:fe1a:7b82/64 scope link
valid_lft forever preferred_lft foreve
7: veth896d794@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether 3e:63:f6:3f:0e:62 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::3c63:f6ff:fe3f:e62/64 scope link
valid_lft forever preferred_lft forever
9: vethf085771@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether a6:c1:31:d7:f1:22 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::a4c1:31ff:fed7:f122/64 scope link
valid_lft forever preferred_lft forever
11: veth690dd4c@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether 6e:9a:80:a5:01:fc brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::6c9a:80ff:fea5:1fc/64 scope link
valid_lft forever preferred_lft forever
Each time a container is run with a published port(web_in_all
), docker insertsa
POSTROUTING
rule to nat the pusblished ports from the host to the container.Also, notice that the rule to communicate between container has been changed from DROP to ACCEPT
when docker is run with inter-container communication disabled. Furthermore, we see
four new virtual interfaces, one for each container. Each interface is connected
to the docker0 bridge in the global namepsace, and the other end is connected to
the container network namespace; however,
vethXXXXX
doesn’t say anything ofwhich interface belongs to which container on the host. Nonetheless, the following one liner can
help you identifying the interface index, because each veth pair is created sequentially.
$DOCKER_ID=`docker ps -aqf "name=web_in_all"`
$docker inspect --format='{{.State.Pid}}' $(DOCKER_ID) | xargs -I '{}' sudo nsenter -t '{}' -n ethtool -S eth0
NIC statistics:
peer_ifindex: 11
We can verify communication in docker0
$ docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' c_in_bridge
172.17.0.3
$ docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' web_in_all
172.17.0.2
$ docker exec c_in_bridge ping 172.17.0.1 -c 2
PING 172.17.0.1 (172.17.0.1): 56 data bytes
64 bytes from 172.17.0.1: seq=0 ttl=64 time=0.604 ms
64 bytes from 172.17.0.1: seq=1 ttl=64 time=0.367 ms
$ docker exec c_in_bridge ping 172.17.0.2 -c 2
PING 172.17.0.2 (172.17.0.2): 56 data bytes
--- 172.17.0.2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
$ docker exec c_in_bridge ping web_in_all -c 2
ping: bad address 'web_in_all'
$ docker exec c_in_bridge ping example.com -c 2
PING example.com (93.184.216.34): 56 data bytes
64 bytes from 93.184.216.34: seq=0 ttl=52 time=25.110 ms
64 bytes from 93.184.216.34: seq=1 ttl=52 time=31.700 ms
As docker0
is run with inter-container communication disabled, there’s no communicationbetween containers. Also, there’s no dns, so no resolution using other container names.
Communication to the outside world is permited.
User-defined Networks
In this doc we are not explaining about docker swarm(vxlan) at all. PleaseDocker provides something called user-defined networks which are Linux bridges with
refer to the official documentation
to learn more.
DNS resolution without having to configure your own DNS server. It has several
network drivers: bridge, overlay, macvlan; and supports network plugins to build
your own network driver. Also, It permits to connect several containers to
different networks(including the docker0 network). As you can connect a container
to several networks, its external connectivity is provided via the first
non-internal network, in lexical order.
Docker has an option for linking containers inNow, let’s create the docker networks.docker0
, with user-defined
network I don’t see a reason to keep using it. Also, linking is not supported
in user-defined networks.
$docker network create -o "com.docker.network.kbridge.enable_icc=false" --internal demo_internal
$docker network create demo_net
$docker network ls
NETWORK ID NAME DRIVER SCOPE
77283fff31b2 bridge bridge local
22abcb2ef140 demo_internal bridge local
119bb8423775 demo_net bridge local
e2dfddfecaaa host host local
11899b8b3162 none null local
# Inspecting iptables after creating the networks
# Output has been cut
$ sudo iptables -L -v
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER-ISOLATION all -- any any anywhere anywhere
0 0 DOCKER all -- any docker0 anywhere anywhere
0 0 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
0 0 ACCEPT all -- docker0 !docker0 anywhere anywhere
0 0 DOCKER all -- any br-119bb8423775 anywhere anywhere
0 0 ACCEPT all -- any br-119bb8423775 anywhere anywhere ctstate RELATED,ESTABLISHED
0 0 ACCEPT all -- br-119bb8423775 !br-119bb8423775 anywhere anywhere
0 0 ACCEPT all -- br-119bb8423775 br-119bb8423775 anywhere anywhere
0 0 ACCEPT all -- br-22abcb2ef140 br-22abcb2ef140 anywhere anywhere
0 0 DROP all -- docker0 docker0 anywhere anywhere
Chain OUTPUT (policy ACCEPT 158 packets, 19432 bytes)
pkts bytes target prot opt in out source destination
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION (1 references)
pkts bytes target prot opt in out source destination
0 0 DROP all -- br-119bb8423775 docker0 anywhere anywhere
0 0 DROP all -- docker0 br-119bb8423775 anywhere anywhere
0 0 DROP all -- any br-22abcb2ef140 !172.18.0.0/16 anywhere
0 0 DROP all -- br-22abcb2ef140 any anywhere !172.18.0.0/16
0 0 RETURN all -- any any anywhere anywhere
$sudo iptables -t nat -L -v
Chain POSTROUTING (policy ACCEPT 2 packets, 138 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- any !br-119bb8423775 172.19.0.0/16 anywhere
0 0 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- br-119bb8423775 any anywhere anywhere
0 0 RETURN all -- docker0 any anywhere anywhere
br-22abcb2ef140
is the bridge for demo_internal
in my environment. The bridgename follows the syntax
br-$(network-id)
. The network id can be obtained bydocker network ls
command or docker network inspect [network-name]
. The same appliesfor
demo_net
with its bridge br-119bb8423775
.We can see that two new networks has been created. Also, there’s one new rule in
the
FORWARD
chain, and there are two new rules in the DOCKER-ISOLATION
chain.In addition, there’s one rule added in the
POSTROUTING
chain and one in the DOCKER
(PREROUTING) chainfor
demo_net
. This mean that demo_net
has NAT capabilities to communicate to the outside world,but
demo_internal
has not. DOCKER-ISOLATION
chain isolates completelythe
demo_internal
network, but for docker_net
means there’s no communication to the docker0 bridge. In theFORWARD chain we see that both networks have accepted connections to communicate
between containers. In the case of
demo_net
it also has communication to theoutside world.
We can attach/dettach running containers to several networks. At this moment, both
networks are down, because there’s now device connected to the bridge.
$ip link show
13: br-22abcb2ef140: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:0b:7a:5f:f6 brd ff:ff:ff:ff:ff:ff
15: br-119bb8423775: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:89:3b:e5:2e brd ff:ff:ff:ff:ff:ff
To connect the containers to the different networks use docker network connect [network_name] [container]
. $docker network connect bridge web_in_all
$docker network connect demo_internal web_in_all
$docker network connect demo_internal c_in_internal
$docker network connect demo_net web_in_all
$docker network connect demo_net c_in_net
$docker network disconnect bridge c_in_internal
$docker network disconnect bridge c_in_internal
# Inspecting the networks
# Output has been cut
$docker network inspect bridge
[
{
"Name": "bridge",
"Id": "b04143660337bc477e4d2b420f3a799340ae2a40c9585ceb81410bf630b6a49c",
"Driver": "bridge",
"Internal": false,
"Containers": {
"270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
"Name": "web_in_all",
"EndpointID": "32147fecafb40f926d3193f4a0d61e628530b71c7c2006523e70a27d36fddfd1",
"MacAddress": "02:42:ac:11:00:02",
"IPv4Address": "172.17.0.2/16",
"IPv6Address": ""
},
"81533d0311f8d634ce84f6da6a13d0745c14a6c13150750f6563674584259175": {
"Name": "c_in_bridge",
"EndpointID": "d459ffd8177b85b5de687f5fd48e2a1b8c8b2c991af213b24f23532f64bd3ee6",
"MacAddress": "02:42:ac:11:00:05",
"IPv4Address": "172.17.0.5/16",
"IPv6Address": ""
},
},
}
]
$ docker network inspect demo_internal
[
{
"Name": "demo_internal",
"Id": "22abcb2ef14008d075290e75bf4f25463c5849617cc736c8b0196de0cabcb86f",
"Driver": "bridge",
"Internal": true,
"Attachable": false,
"Containers": {
"270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
"Name": "web_in_all",
"EndpointID": "9e22fbea0191ee648f4f4553d995fc3d10e20d821200efd01b248c77c9592c45",
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
},
"56a0bb976ced986f3b7889f5dbcc5f5a3449f61b858b1857103b5c2287b5c40e": {
"Name": "c_in_internal",
"EndpointID": "1a8300e23320560e64bf984b08e98810427a88ae9b25d1c1bccce86876e3ddad",
"MacAddress": "02:42:ac:12:00:03",
"IPv4Address": "172.18.0.3/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.kbridge.enable_icc": "false"
},
}
]
$ docker network inspect demo_net
[
{
"Name": "demo_net",
"Id": "119bb8423775647a339c3c4cfbb29f30c4c99ad1e66a0606b2799c7c6bf840a4",
"Internal": false,
"Containers": {
"270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
"Name": "web_in_all",
"EndpointID": "4c69c773b612af883268b91037f2bc2280c4f01f8bb01f86b06635b479595e1e",
"MacAddress": "02:42:ac:13:00:02",
"IPv4Address": "172.19.0.2/16",
"IPv6Address": ""
},
"a9294f31b168363cd99e3ec1a4b5a125d06dabfa2422f84fcc66b57033bc556c": {
"Name": "c_in_net",
"EndpointID": "b651d1ba56524e10302a716930e6da42b1997bdd7b371a276dff6cd2fc5b5ca3",
"MacAddress": "02:42:ac:13:00:03",
"IPv4Address": "172.19.0.3/16",
"IPv6Address": ""
}
},
"Options": {},
}
]
web_in_all
is the only container connected to all the networks. $ ip route show
default via 192.168.122.1 dev ens3
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev br-22abcb2ef140 proto kernel scope link src 172.18.0.1
172.19.0.0/16 dev br-119bb8423775 proto kernel scope link src 172.19.0.1
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.194
$ ip link show
13: br-22abcb2ef140: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:0b:7a:5f:f6 brd ff:ff:ff:ff:ff:ff
15: br-119bb8423775: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:89:3b:e5:2e brd ff:ff:ff:ff:ff:ff
Both bridges now are up after we connect containers to them. There are alsotwo rules added to the routing table as directly connected.
# Exploring demo_internal
$ docker inspect -f '{{.NetworkSettings.Networks.demo_internal.IPAddress}}' web_in_all
172.18.0.2
$ docker inspect -f '{{.NetworkSettings.Networks.demo_internal.IPAddress}}' c_in_internal
172.18.0.3
$ docker exec c_in_internal ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1): 56 data bytes
64 bytes from 172.18.0.1: seq=0 ttl=64 time=0.111 ms
64 bytes from 172.18.0.1: seq=1 ttl=64 time=0.295 ms
$ docker exec c_in_internal ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3): 56 data bytes
64 bytes from 172.18.0.3: seq=0 ttl=64 time=0.102 ms
64 bytes from 172.18.0.3: seq=1 ttl=64 time=0.272 ms
$ docker exec c_in_internal ping web_in_all -c 2
PING web_in_all (172.18.0.2): 56 data bytes
64 bytes from 172.18.0.2: seq=0 ttl=64 time=0.099 ms
64 bytes from 172.18.0.2: seq=1 ttl=64 time=0.618 ms
$ docker exec c_in_internal ping example.com -c 2
ping: bad address 'example.com'
Verifying connectivity in the demo_internal network we notice connection betweencontainer even though inter-container communication has been disabled. NICE BUG YOU
HAVE THERE DOCKER. In addition, in a user-defined network we can use the container
name just fine to communicate between containers. As
demo_internal
is an internalnetwork, there’s no connection to the outside world.
# Exploring demo_internal
$ docker inspect -f '{{.NetworkSettings.Networks.demo_net.IPAddress}}' c_in_net
172.19.0.3
$ docker inspect -f '{{.NetworkSettings.Networks.demo_net.IPAddress}}' web_in_all
172.19.0.2
$ docker exec c_in_net ping 172.19.0.1 -c 2
PING 172.19.0.1 (172.19.0.1): 56 data bytes
64 bytes from 172.19.0.1: seq=0 ttl=64 time=0.115 ms
64 bytes from 172.19.0.1: seq=1 ttl=64 time=0.269 ms
$ docker exec c_in_net ping 172.19.0.2 -c 2
PING 172.19.0.2 (172.19.0.2): 56 data bytes
64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.105 ms
64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.374 ms
$ docker exec c_in_net ping web_in_all -c 2
PING web_in_all (172.19.0.2): 56 data bytes
64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.059 ms
64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.276 ms
$ docker exec c_in_net ping example.com -c 2
PING example.com (93.184.216.34): 56 data bytes
64 bytes from 93.184.216.34: seq=0 ttl=52 time=25.192 ms
64 bytes from 93.184.216.34: seq=1 ttl=52 time=29.856 ms
Two main differences of demo_net
between demo_internal
. First, it has inter-containercommunication enabled(so, no bug). Second,
demo_net
has connection to the outside world.Network Namespaces
Provides isolation of the system resources associated with networking: networkdevices, IPv4 and IPv6 protocol stacks, IP routing tables, firewalls,
the /proc/net directory, the /sys/class/net directory, port numbers (sockets),
and so on. This means that each network namespace has its own networking stack.
It uses a virtual device(veth) pair to create a tunnel
for communication between namespaces. It always comes in pair, with one end in
the root namespace and the other end in a namespace.
The device is created in whatever namespace is current in. If a device does not
belong to the current namespace, it becomes invisible.
In the case of docker containers, each container has their own network stack.
The network namespace is located in /proc/$pid/ns/ for each process:
$DOCKER_ID=`docker ps -aqf "name=web_in_all"`
/proc/`docker inspect --format='{{.State.Pid}}' ${DOCKER_ID}`/ns/net
To configure a network namespace by hand we’ll use the ip
command. $ sudo ip netns add demo_ns
$ sudo ip netns ls
demo_ns
$mount | grep demo_ns
nsfs on /run/netns/demo_ns type nsfs (rw)
nsfs on /run/netns/demo_ns type nsfs (rw)
We have created and listed a new namespace; however, ip
mounts a a virtualfilesystem named nsfs to keep
demo_ns
alive; otherwise, the namespace would have endedwhen
ip
command terminated executing without mounting the network namespace.Let’s create a veth pair device and send one of them to
demo_ns
$ sudo ip link add v-eth0 type veth peer name v-peer0
$ip link show
18: v-peer0@v-eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff
19: v-eth0@v-peer0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff
$sudo ip link set v-peer0 netns demo_ns
$ip link show
19: v-eth0@if18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
Notice that only v-eth0
remains in the global namespace. $ sudo ip addr add 10.100.0.1/24 dev v-eth0
$ sudo ip link set v-eth0 up
$ sudo ip netns exec demo_ns ip addr add 10.100.0.2/24 dev v-peer0
$ sudo ip netns exec demo_ns ip link set v-peer0 up
$ sudo ip netns exec demo_ns ip link set lo up
$ sudo ip netns exec demo_ns ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
18: v-peer0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.100.2.0/24 scope global v-peer0
valid_lft forever preferred_lft forever
inet6 fe80::2446:1cff:fe69:7542/64 scope link
valid_lft forever preferred_lft forever
$ip addr show v-eth0
19: v-eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
inet 10.100.0.1/24 scope global v-eth0
valid_lft forever preferred_lft forever
inet6 fe80::a892:deff:fea0:77a8/64 scope link
valid_lft forever preferred_lft forever
$ ip route
default via 192.168.122.1 dev ens3
10.100.0.0/24 dev v-eth0 proto kernel scope link src 10.100.0.1
ip netns exec [net_ns]
permits executing commands inside a network namespace.In this case we have configure
v-eth0
in the global namespace and v-peer0
insidedemo_ns
. Also notice that the host creates a route for 10.100.0.0, because it’sdirected connected device. Iptables has no changes at all.
$ ping 10.100.0.2 -c 2
PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
64 bytes from 10.100.0.2: icmp_seq=1 ttl=64 time=0.094 ms
64 bytes from 10.100.0.2: icmp_seq=2 ttl=64 time=0.057 ms
$ sudo ip netns exec demo_ns ping 10.100.0.1 -c 2
PING 10.100.0.1 (10.100.0.1) 56(84) bytes of data.
64 bytes from 10.100.0.1: icmp_seq=1 ttl=64 time=0.057 ms
64 bytes from 10.100.0.1: icmp_seq=2 ttl=64 time=0.064 ms
We have verified there’s connection inside the tunnel.Let’s get weird
First of all, we are going to linkweb_in_all
network namespace to /var/run/netns
,so it can be managed by the
ip
command. $DOCKER_ID=`docker ps -aqf "name=web_in_all"`
$sudo ln -s /proc/`docker inspect --format='{{.State.Pid}}' ${DOCKER_ID}`/ns/net /var/run/netns/${DOCKER_ID}
$sudo ip netns ls
270dca5b1f67 (id: 0)
demo_ns (id: 4)
Then, we are going to install and configure a Linux bridge named nsbr0
. Thisbridge will have two veth connected to it. One is a new veth pair which one end is
going to be in the
demo_ns
network namespace. The other veth pair will be atunnel connected to the
web_in_all
network namespace. $sudo apt install bridge-utils -y
$sudo ip link add v-eth1 type veth peer name v-peer-1
$sudo ip link add v-ethc type veth peer name v-peer-c
$sudo brctl addbr nsbr0
$ sudo brctl addif nsbr0 v-eth1
$ sudo brctl addif nsbr0 v-ethc
$ip link show
20: nsbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff
23: v-peer-1@v-eth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 6a:56:f9:8a:22:c2 brd ff:ff:ff:ff:ff:ff
24: v-eth1@v-peer-1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master nsbr0 state DOWN mode DEFAULT group default qlen 1000
link/ether b2:9f:3c:ba:0c:97 brd ff:ff:ff:ff:ff:ff
25: v-peer-c@v-ethc: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether a6:4c:ea:4e:f0:a1 brd ff:ff:ff:ff:ff:ff
26: v-ethc@v-peer-c: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master nsbr0 state DOWN mode DEFAULT group default qlen 1000
$sudo ip link set v-peer-c netns ${DOCKER_ID}
$sudo ip link set v-peer-1 netns demo_ns
$sudo ip link set v-eth1 up
$sudo ip link set v-ethc up
$sudo ip link set nsbr0 up
$sudo ip netns exec ${DOCKER_ID} ip link set v-peer-c up
$sudo ip netns exec ${DOCKER_ID} ip addr add 10.200.0.3/24 dev v-peer-c
$sudo ip addr add 10.200.0.1/24 dev nsbr0
$sudo ip netns exec demo_ns ip addr add 10.200.0.2/24 dev v-peer-1
$sudo ip netns exec demo_ns ip link set v-peer-1 up
After configuring our new bridge and veths devices. Let’s explore how’s the viewin each network namespace:
$ip link show
20: nsbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff
24: v-eth1@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nsbr0 state UP mode DEFAULT group default qlen 1000
link/ether b2:9f:3c:ba:0c:97 brd ff:ff:ff:ff:ff:ff link-netnsid 4
26: v-ethc@if25: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue master nsbr0 state UP mode DEFAULT group default qlen 1000
link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff link-netnsid 0
$ ip route
10.200.0.0/24 dev nsbr0 proto kernel scope link src 10.200.0.1
Now we have v-eth1
and v-ethc
connected to nsbr0
. and ip route shows a new routefor
10.200.0.1
. $ sudo ip netns exec ${DOCKER_ID} ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: eth1@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: eth2@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
25: v-peer-c@if26: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a6:4c:ea:4e:f0:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
$ sudo ip netns exec ${DOCKER_ID} ip route
default via 172.17.0.1 dev eth0
10.200.0.0/24 dev v-peer-c proto kernel scope link src 10.200.0.3 linkdown
web_in_all
has a new veth and a new route for 10.200.0.0. $sudo iptables -t nat -A POSTROUTING -s 10.200.0.0/24 -o ens3 -j MASQUERADE
$sudo iptables -A FORWARD -o nsbr0 -j ACCEPT
$sudo iptables -A FORWARD -i nsbr0 -j ACCEPT
$sudo ip netns exec demo_ns ip route add default via 10.200.0.1
$ sudo ip netns exec demo_ns ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
18: v-peer0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
23: v-peer-1@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 6a:56:f9:8a:22:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
$ sudo ip netns exec demo_ns ip route
default via 10.200.0.1 dev v-peer-1
10.100.0.0/24 dev v-peer0 proto kernel scope link src 10.100.0.2
10.200.0.0/24 dev v-peer-1 proto kernel scope link src 10.200.0.2
We added a rule in the POSTROUTING
chain to map demo_ns connections to nsbr0 ip address. Then,a default route has been added to demo_ns to route connections to the outside world.
Nevertheless, which dns resolver is using demo_ns?. Remember that a network namespace
just gives a new IP stack. A dns resolver is not part of the network namespace.
Well, when you execute
ip netns exec [net_ns]
you are still using the globalmount namespace, so the files you are using are the same as the root filesystem.
In other words,
/etc/resolv.conf
is the same for all the namespaces created withthe
ip netns
command. However, with ip netns
you might use a different resolvercreating a new file in /etc/netns/[netns_name]/resolv.conf for each network namespace.
$ echo '127.0.0.1 mytest' | sudo tee -a /etc/hosts
127.0.0.1 mytest
$ sudo ip netns exec demo_ns ping mytest -c 2
PING mytest (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.070 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.086 ms
As you can see, we hava added a new entry in /etc/hosts
. Pinging from demo_ns
resulted succesful for
mytest
. $sudo mkdir -p /etc/netns/demo_ns/
$echo '127.0.0.1 myns' | sudo tee -a /etc/netns/demo_ns/hosts
$ sudo ip netns exec demo_ns ping myns -c 2
PING myns (127.0.0.1) 56(84) bytes of data.
64 bytes from myns (127.0.0.1): icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from myns (127.0.0.1): icmp_seq=2 ttl=64 time=0.093 m
$ sudo ip netns exec demo_ns ping mytest -c 2
ping: unknown host mytest
If we add a new resolver for the demo_ns
we notice that mytest
is no longerreachable, but
myns
is. Just remember that /etc/netns/[netns_name]/
only workswith the
ip netns
command.What about giving web_in_all exit to the outside world by nsbr0.
# Some output has been cut
$sudo ip netns exec ${DOCKER_ID} ip route add 93.184.216.34/32 via 10.200.0.1
$sudo ip netns exec ${DOCKER_ID} ip route
default via 172.17.0.1 dev eth0
10.200.0.0/24 dev v-peer-c proto kernel scope link src 10.200.0.3
93.184.216.34 via 10.200.0.1 dev v-peer-c
$sudo ip netns exec ${DOCKER_ID} ip route get 93.184.216.34
93.184.216.34 via 10.200.0.1 dev v-peer-c src 10.200.0.3
cache
$sudo ip netns exec ${DOCKER_ID} ping 93.184.216.34 -c 2
PING 93.184.216.34 (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=52 time=22.9 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=52 time=33.4 ms
$sudo ip netns exec ${DOCKER_ID} ping example.com -c 2
PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=52 time=53.1 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=52 time=26.5 ms
We added a static route for example.com
and we verified that is using nsbr0
as their gateway with
ip route get
command.which resolver is using
web_in_all
? As it uses its own mount namespace, it usesthe resolver configured by docker, but you can use a different resolver if you connect
to the
web_in_all
network namespace using ip netns
command and follow correctnetwork configurations.
Conclusion
- It’s better to use user-defined networks to segment, organize and isolate containers.
- You can still find bugs in Docker, some of them can be confusing.
- Network namespace provides isolation for network resources.
- Network namespace is not only used by Docker, but is fundamental in kubernetes,
openstack and many others.
Bibliography
[1] http://stackoverflow.com/a/34497614/3621080[2] https://github.com/moby/moby/issues/20224
[3] namespaces(7)
[4] https://github.com/torvalds/linux/blob/master/fs/nsfs.c
[5] ip-netns(8)
[6] ip-route(8)