Saturday, May 20, 2017

Docker Networking + Network Namespaces


Docker Networking

This is a write-up of my docker networking demo at the Docker Orlando meetup.
Let’s begin with the diagram.
In figure 1 we can see a server, which can be a physical device or a VM (cloud or not),
connected to the Internet through the interface ens3, its gateway interface - it is named
ens3 because KVM configures it that way by default.
Then we have four Linux bridges. One is docker0, the default Linux bridge
configured when Docker is set up. It has connectivity to the outside world, and it has
inter-container communication enabled by default. Next, there are two network segments managed by Docker
which are user-defined networks; these bridges are also Linux bridges. One of these
bridges (demo_internal) is an internal network, which means it has no connectivity to the outside, and
the other one (demo_net) does have connectivity to the outside world. Finally, there’s
demo_ns, which is only a network namespace and NOT A VIRTUAL SERVER. This
segment has two virtual ethernet interfaces whose far ends are connected
to the fourth Linux bridge, nsbr0, and to the server respectively.

Docker0

docker0 is just a plain Linux bridge, with no modifications whatsoever, managed by the
Docker engine. It is assigned the subnet 172.17.0.0/16, so you can fire up plenty of containers
to play with. An interesting exercise is to check whether the host’s MAC address table
can map as many containers as the segment has valid hosts.
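If you want to poke at that, the bridge command from iproute2 can dump the forwarding database of docker0 (a quick sketch; run it once a few containers are attached):
    # Show the MAC addresses docker0 has learned, filtering out the bridge's own entries
    $bridge fdb show br docker0 | grep -v permanent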
docker0 is part of the Docker default networks. Docker ships with three default networks:
bridge(docker0), none, and host.
  • Bridge is just a Linux bridge; all containers started without an explicit network
    are attached to it. The bridge network is customisable, but the docker daemon needs
    to be restarted for changes to take effect. Its options can be seen with docker network inspect bridge:
Option                                          Value
com.docker.network.bridge.default_bridge       true or false
com.docker.network.bridge.enable_icc           true or false
com.docker.network.bridge.enable_ip_masquerade true or false
com.docker.network.bridge.host_binding_ipv4    IPv4 address to bind
com.docker.network.bridge.name                 bridge name
com.docker.network.driver.mtu                  MTU
  • None disables networking for the container; in other words, only the loopback interface remains.
  • Host puts the container directly on the host’s network stack (a quick check of both modes is sketched below).
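As a quick check (a sketch, not part of the original demo), you can start throwaway busybox containers on the none and host networks and compare what they see:
    # With the none network only the loopback interface is present
    $docker run --rm --network=none busybox ip addr show

    # With the host network the container sees the host's interfaces (ens3, docker0, ...)
    $docker run --rm --network=host busybox ip addr show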
Let’s configure the docker0 bridge. As we said previously, this is pretty much configured
at installation time, but let’s take a look at the network environment before doing it.
    $sudo iptables -L -v && sudo iptables -t nat -L -v
    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination
    Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    $ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 52:54:00:43:8b:a8 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.194/24 brd 192.168.122.255 scope global ens3
           valid_lft forever preferred_lft forever
        inet6 fe80::5054:ff:fe43:8ba8/64 scope link
           valid_lft forever preferred_lft forever

    $ip route show
    default via 192.168.122.1 dev ens3
    192.168.122.0/24 dev ens3  proto kernel  scope link  src 192.168.122.194
iptables is accepting everything, with no rules defined. Also, the routing table only
shows the directly connected network and the default gateway. Finally, there are only
two interfaces.
Let’s install Docker.
    $curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    $sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    $sudo apt update && sudo apt install docker-ce -y

    # Output might differ slightly from yours
    $sudo iptables -L -v && sudo iptables -t nat -L -v
    Chain FORWARD (policy DROP 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 DOCKER-ISOLATION  all  --  any    any     anywhere             anywhere
        0     0 DOCKER     all  --  any    docker0  anywhere             anywhere
        0     0 ACCEPT     all  --  any    docker0  anywhere             anywhere             ctstate RELATED,ESTABLISHED
        0     0 ACCEPT     all  --  docker0 !docker0  anywhere             anywhere
        0     0 ACCEPT     all  --  docker0 docker0  anywhere             anywhere

    Chain DOCKER (1 references)
     pkts bytes target     prot opt in     out     source               destination

    Chain DOCKER-ISOLATION (1 references)
     pkts bytes target     prot opt in     out     source               destination
        0     0 RETURN     all  --  any    any     anywhere             anywhere


    Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 DOCKER     all  --  any    any     anywhere             anywhere             ADDRTYPE match dst-type LOCAL

    Chain OUTPUT (policy ACCEPT 2 packets, 137 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 DOCKER     all  --  any    any     anywhere            !localhost/8          ADDRTYPE match dst-type LOCAL

    Chain POSTROUTING (policy ACCEPT 2 packets, 137 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 MASQUERADE  all  --  any    !docker0  172.17.0.0/16        anywhere

    Chain DOCKER (2 references)
     pkts bytes target     prot opt in     out     source               destination
        0     0 RETURN     all  --  docker0 any     anywhere             anywhere

    $ip addr show
    3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:be:33:09:ba brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.1/16 scope global docker0
           valid_lft forever preferred_lft forever

    $ip route show
    172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown
First of all, Docker enables IP forwarding at the kernel level (echo 1 > /proc/sys/net/ipv4/ip_forward)
to allow communication between internal and external hosts. In addition, it creates some
firewall rules for Docker isolation to DROP or ACCEPT communication between
containers and other hosts through the FORWARD chain.
It’s worth checking the official explanation
from the docker site about container communication.
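A quick way to confirm the forwarding setting on your host (it should read 1 once the Docker daemon has started):
    # 1 means the kernel forwards packets between interfaces; 0 means it does not
    $cat /proc/sys/net/ipv4/ip_forward
    1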
Let’s disable inter-container communication (--icc=false) and create some containers.
    $sudo systemctl stop docker
    $sudo dockerd --icc=false &

    $docker run -itd --name=c_in_bridge busybox
    $docker run -itd --name=c_in_internal busybox
    $docker run -itd --name=c_in_net busybox
    $docker run -itd --name=web_in_all httpd:2.4

    # Inspecting iptables and interfaces after starting the containers
    # Output has been cut
    $sudo iptables -t nat -L POSTROUTING
    Chain POSTROUTING (policy ACCEPT 72 packets, 4997 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 MASQUERADE  all  --  any    !docker0  172.17.0.0/16        anywhere

    $ sudo iptables -L FORWARD -v
    Chain FORWARD (policy DROP 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 DOCKER-ISOLATION  all  --  any    any     anywhere             anywhere
        0     0 DOCKER     all  --  any    docker0  anywhere             anywhere
        0     0 ACCEPT     all  --  any    docker0  anywhere             anywhere             ctstate RELATED,ESTABLISHED
        0     0 ACCEPT     all  --  docker0 !docker0  anywhere             anywhere
        0     0 DROP       all  --  docker0 docker0  anywhere             anywhere

    $ip addr show
    5: vethe4f578b@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
        link/ether 26:1f:8b:1a:7b:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet6 fe80::241f:8bff:fe1a:7b82/64 scope link
           valid_lft forever preferred_lft foreve
    7: veth896d794@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
        link/ether 3e:63:f6:3f:0e:62 brd ff:ff:ff:ff:ff:ff link-netnsid 1
        inet6 fe80::3c63:f6ff:fe3f:e62/64 scope link
           valid_lft forever preferred_lft forever
    9: vethf085771@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
        link/ether a6:c1:31:d7:f1:22 brd ff:ff:ff:ff:ff:ff link-netnsid 2
        inet6 fe80::a4c1:31ff:fed7:f122/64 scope link
           valid_lft forever preferred_lft forever
    11: veth690dd4c@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
        link/ether 6e:9a:80:a5:01:fc brd ff:ff:ff:ff:ff:ff link-netnsid 3
        inet6 fe80::6c9a:80ff:fea5:1fc/64 scope link
           valid_lft forever preferred_lft forever
Each time a container is run with a published port (e.g. web_in_all with -p 80:80), Docker inserts
NAT rules to forward the published ports from the host to the container.
Also, notice that the rule for communication between containers has changed from ACCEPT to DROP
now that the daemon runs with inter-container communication disabled. Furthermore, we see
four new virtual interfaces, one for each container. One end of each veth pair is connected
to the docker0 bridge in the global namespace, and the other end is connected to
the container’s network namespace; however, the vethXXXXX name on the host doesn’t tell you
which interface belongs to which container. Nonetheless, the following one-liner can
help you identify the interface index, because each veth pair is created sequentially.
    $DOCKER_ID=`docker ps -aqf "name=web_in_all"`
    $docker inspect --format='{{.State.Pid}}' ${DOCKER_ID} | xargs -I '{}' sudo nsenter -t '{}' -n ethtool -S eth0
    NIC statistics:
         peer_ifindex: 11
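With the peer_ifindex in hand, you can look up that index on the host to find the matching veth (the index 11 comes from the output above and will differ in your environment):
    # Index 11 on the host is the peer of web_in_all's eth0
    $ip -o link show | grep '^11:'
    11: veth690dd4c@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...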
We can verify communication on the docker0 network.
    $ docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' c_in_bridge
    172.17.0.3

    $ docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' web_in_all
    172.17.0.2

    $ docker exec c_in_bridge ping 172.17.0.1 -c 2
    PING 172.17.0.1 (172.17.0.1): 56 data bytes
    64 bytes from 172.17.0.1: seq=0 ttl=64 time=0.604 ms
    64 bytes from 172.17.0.1: seq=1 ttl=64 time=0.367 ms

    $ docker exec c_in_bridge ping 172.17.0.2 -c 2
    PING 172.17.0.2 (172.17.0.2): 56 data bytes
    --- 172.17.0.2 ping statistics ---
    2 packets transmitted, 0 packets received, 100% packet loss

    $ docker exec c_in_bridge ping web_in_all -c 2
    ping: bad address 'web_in_all'

    $ docker exec c_in_bridge ping example.com -c 2
    PING example.com (93.184.216.34): 56 data bytes
    64 bytes from 93.184.216.34: seq=0 ttl=52 time=25.110 ms
    64 bytes from 93.184.216.34: seq=1 ttl=52 time=31.700 ms
Since the daemon is running with inter-container communication disabled, there’s no communication
between containers on docker0. Also, there’s no DNS, so no resolution of other container names.
Communication to the outside world is permitted.

User-defined Networks

This doc does not cover Docker swarm (VXLAN) at all. Please
refer to the official documentation
to learn more.
Docker provides something called user-defined networks, which are Linux bridges with
DNS resolution, so you don’t have to configure your own DNS server. Several
network drivers are available: bridge, overlay, and macvlan; network plugins are also
supported so you can build your own network driver. Docker also lets you connect a container to
several networks (including the docker0 network). When a container is connected
to several networks, its external connectivity is provided via the first
non-internal network, in lexical order.
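For example (a hypothetical setup, not part of this demo; alpha_net and beta_net are throwaway names), you can see which network provides external connectivity by checking a multi-homed container’s default route:
    # Create two non-internal networks and a container attached to both
    $docker network create alpha_net
    $docker network create beta_net
    $docker run -itd --name=multi_homed --network=alpha_net busybox
    $docker network connect beta_net multi_homed

    # The default route should point at the gateway of the first network in lexical order (alpha_net)
    $docker exec multi_homed ip route | grep default

    # Clean up so the rest of the demo is not affected
    $docker rm -f multi_homed && docker network rm alpha_net beta_net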
Docker has an option for linking containers on docker0, but with user-defined
networks I don’t see a reason to keep using it. Besides, that kind of linking is not supported
in user-defined networks.
Now, let’s create the docker networks.
    $docker network create -o "com.docker.network.kbridge.enable_icc=false" --internal demo_internal

    $docker network create demo_net

    $docker network ls
    NETWORK ID          NAME                DRIVER              SCOPE
    77283fff31b2        bridge              bridge              local
    22abcb2ef140        demo_internal       bridge              local
    119bb8423775        demo_net            bridge              local
    e2dfddfecaaa        host                host                local
    11899b8b3162        none                null                local

    # Inspecting iptables after creating the networks 
    # Output has been cut
    $ sudo iptables -L -v
    Chain FORWARD (policy DROP 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 DOCKER-ISOLATION  all  --  any    any     anywhere             anywhere
        0     0 DOCKER     all  --  any    docker0  anywhere             anywhere
        0     0 ACCEPT     all  --  any    docker0  anywhere             anywhere             ctstate RELATED,ESTABLISHED
        0     0 ACCEPT     all  --  docker0 !docker0  anywhere             anywhere
        0     0 DOCKER     all  --  any    br-119bb8423775  anywhere             anywhere
        0     0 ACCEPT     all  --  any    br-119bb8423775  anywhere             anywhere             ctstate RELATED,ESTABLISHED
        0     0 ACCEPT     all  --  br-119bb8423775 !br-119bb8423775  anywhere             anywhere
        0     0 ACCEPT     all  --  br-119bb8423775 br-119bb8423775  anywhere             anywhere
        0     0 ACCEPT     all  --  br-22abcb2ef140 br-22abcb2ef140  anywhere             anywhere
        0     0 DROP       all  --  docker0 docker0  anywhere             anywhere

    Chain OUTPUT (policy ACCEPT 158 packets, 19432 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain DOCKER (2 references)
     pkts bytes target     prot opt in     out     source               destination

    Chain DOCKER-ISOLATION (1 references)
     pkts bytes target     prot opt in     out     source               destination
        0     0 DROP       all  --  br-119bb8423775 docker0  anywhere             anywhere
        0     0 DROP       all  --  docker0 br-119bb8423775  anywhere             anywhere
        0     0 DROP       all  --  any    br-22abcb2ef140 !172.18.0.0/16        anywhere
        0     0 DROP       all  --  br-22abcb2ef140 any     anywhere            !172.18.0.0/16
        0     0 RETURN     all  --  any    any     anywhere             anywhere

    $sudo iptables -t nat -L -v
    Chain POSTROUTING (policy ACCEPT 2 packets, 138 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 MASQUERADE  all  --  any    !br-119bb8423775  172.19.0.0/16        anywhere
        0     0 MASQUERADE  all  --  any    !docker0  172.17.0.0/16        anywhere

    Chain DOCKER (2 references)
     pkts bytes target     prot opt in     out     source               destination
        0     0 RETURN     all  --  br-119bb8423775 any     anywhere             anywhere
        0     0 RETURN     all  --  docker0 any     anywhere             anywhere
br-22abcb2ef140 is the bridge for demo_internal in my environment. The bridge
name is br- followed by the first 12 characters of the network id. The network id can be obtained
with the docker network ls command or with docker network inspect [network-name]. The same applies
to demo_net and its bridge br-119bb8423775.
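A quick way to go from a network name to its bridge interface (a sketch using the format flag of docker network inspect):
    # The bridge is named br- plus the first 12 characters of the network id
    $echo "br-$(docker network inspect -f '{{.Id}}' demo_net | cut -c1-12)"
    br-119bb8423775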
We can see that the two new networks have been created. New rules have also appeared in
the FORWARD chain for both bridges, and the DOCKER-ISOLATION chain has gained DROP rules for them.
In addition, one rule has been added to the POSTROUTING chain and one to the DOCKER (PREROUTING) chain
for demo_net. This means that demo_net has NAT capabilities to communicate with the outside world,
but demo_internal has not. The DOCKER-ISOLATION chain isolates the demo_internal network
completely, while for demo_net it only blocks communication with the docker0 bridge. In the
FORWARD chain we see that both networks accept traffic between their own containers,
and demo_net additionally has communication with the outside world.
We can attach/detach running containers to/from several networks. At this moment, both
bridges are down, because there’s no device connected to them.
    $ip link show
    13: br-22abcb2ef140: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
        link/ether 02:42:0b:7a:5f:f6 brd ff:ff:ff:ff:ff:ff
    15: br-119bb8423775: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
        link/ether 02:42:89:3b:e5:2e brd ff:ff:ff:ff:ff:ff
To connect the containers to the different networks, use docker network connect [network_name] [container].
    $docker network connect bridge web_in_all
    $docker network connect demo_internal web_in_all
    $docker network connect demo_internal c_in_internal
    $docker network connect demo_net web_in_all
    $docker network connect demo_net c_in_net
    $docker network disconnect bridge c_in_internal
    $docker network disconnect bridge c_in_net
    
    # Inspecting the networks
    # Output has been cut
    $docker network inspect bridge
    [
        {
            "Name": "bridge",
            "Id": "b04143660337bc477e4d2b420f3a799340ae2a40c9585ceb81410bf630b6a49c",
            "Driver": "bridge",
            "Internal": false,
            "Containers": {
                "270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
                    "Name": "web_in_all",
                    "EndpointID": "32147fecafb40f926d3193f4a0d61e628530b71c7c2006523e70a27d36fddfd1",
                    "MacAddress": "02:42:ac:11:00:02",
                    "IPv4Address": "172.17.0.2/16",
                    "IPv6Address": ""
                },
                "81533d0311f8d634ce84f6da6a13d0745c14a6c13150750f6563674584259175": {
                    "Name": "c_in_bridge",
                    "EndpointID": "d459ffd8177b85b5de687f5fd48e2a1b8c8b2c991af213b24f23532f64bd3ee6",
                    "MacAddress": "02:42:ac:11:00:05",
                    "IPv4Address": "172.17.0.5/16",
                    "IPv6Address": ""
                },
            },
        }
    ]

    $ docker network inspect demo_internal
    [
        {
            "Name": "demo_internal",
            "Id": "22abcb2ef14008d075290e75bf4f25463c5849617cc736c8b0196de0cabcb86f",
            "Driver": "bridge",
            "Internal": true,
            "Attachable": false,
            "Containers": {
                "270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
                    "Name": "web_in_all",
                    "EndpointID": "9e22fbea0191ee648f4f4553d995fc3d10e20d821200efd01b248c77c9592c45",
                    "MacAddress": "02:42:ac:12:00:02",
                    "IPv4Address": "172.18.0.2/16",
                    "IPv6Address": ""
                },
                "56a0bb976ced986f3b7889f5dbcc5f5a3449f61b858b1857103b5c2287b5c40e": {
                    "Name": "c_in_internal",
                    "EndpointID": "1a8300e23320560e64bf984b08e98810427a88ae9b25d1c1bccce86876e3ddad",
                    "MacAddress": "02:42:ac:12:00:03",
                    "IPv4Address": "172.18.0.3/16",
                    "IPv6Address": ""
                }
            },
            "Options": {
                "com.docker.network.kbridge.enable_icc": "false"
            },
        }
    ]

    $ docker network inspect demo_net
    [
        {
            "Name": "demo_net",
            "Id": "119bb8423775647a339c3c4cfbb29f30c4c99ad1e66a0606b2799c7c6bf840a4",
            "Internal": false,
            "Containers": {
                "270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
                    "Name": "web_in_all",
                    "EndpointID": "4c69c773b612af883268b91037f2bc2280c4f01f8bb01f86b06635b479595e1e",
                    "MacAddress": "02:42:ac:13:00:02",
                    "IPv4Address": "172.19.0.2/16",
                    "IPv6Address": ""
                },
                "a9294f31b168363cd99e3ec1a4b5a125d06dabfa2422f84fcc66b57033bc556c": {
                    "Name": "c_in_net",
                    "EndpointID": "b651d1ba56524e10302a716930e6da42b1997bdd7b371a276dff6cd2fc5b5ca3",
                    "MacAddress": "02:42:ac:13:00:03",
                    "IPv4Address": "172.19.0.3/16",
                    "IPv6Address": ""
                }
            },
            "Options": {},
        }
    ]
web_in_all is the only container connected to all the networks.
    $ ip route show
    default via 192.168.122.1 dev ens3
    172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
    172.18.0.0/16 dev br-22abcb2ef140  proto kernel  scope link  src 172.18.0.1
    172.19.0.0/16 dev br-119bb8423775  proto kernel  scope link  src 172.19.0.1
    192.168.122.0/24 dev ens3  proto kernel  scope link  src 192.168.122.194

    $ ip link show
    13: br-22abcb2ef140: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
        link/ether 02:42:0b:7a:5f:f6 brd ff:ff:ff:ff:ff:ff
    15: br-119bb8423775: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
        link/ether 02:42:89:3b:e5:2e brd ff:ff:ff:ff:ff:ff
Both bridges are now up after connecting containers to them. Two directly connected
routes have also been added to the routing table.
    # Exploring demo_internal
    $ docker inspect -f '{{.NetworkSettings.Networks.demo_internal.IPAddress}}' web_in_all
    172.18.0.2

    $ docker inspect -f '{{.NetworkSettings.Networks.demo_internal.IPAddress}}' c_in_internal
    172.18.0.3

    $ docker exec c_in_internal ping 172.18.0.1 -c 2
    PING 172.18.0.1 (172.18.0.1): 56 data bytes
    64 bytes from 172.18.0.1: seq=0 ttl=64 time=0.111 ms
    64 bytes from 172.18.0.1: seq=1 ttl=64 time=0.295 ms

    $ docker exec c_in_internal ping 172.18.0.3 -c 2
    PING 172.18.0.3 (172.18.0.3): 56 data bytes
    64 bytes from 172.18.0.3: seq=0 ttl=64 time=0.102 ms
    64 bytes from 172.18.0.3: seq=1 ttl=64 time=0.272 ms

    $ docker exec c_in_internal ping web_in_all -c 2
    PING web_in_all (172.18.0.2): 56 data bytes
    64 bytes from 172.18.0.2: seq=0 ttl=64 time=0.099 ms
    64 bytes from 172.18.0.2: seq=1 ttl=64 time=0.618 ms

    $ docker exec c_in_internal ping example.com -c 2
    ping: bad address 'example.com'
Verifying connectivity in the demo_internal network, we notice that containers can talk
to each other even though inter-container communication was supposedly disabled. NICE BUG YOU
HAVE THERE DOCKER. In addition, in a user-defined network we can use the container
name just fine to communicate between containers. As demo_internal is an internal
network, there’s no connection to the outside world.
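One way to see that the enable_icc option did not take effect is to look at the FORWARD chain for the demo_internal bridge: with icc really disabled we would expect a DROP rule between the bridge’s own ports, like the one docker0 has, but there is only an ACCEPT (the bridge name below is from my environment):
    # Compare with the 'DROP all -- docker0 docker0' rule shown earlier
    $sudo iptables -L FORWARD -v | grep br-22abcb2ef140
        0     0 ACCEPT     all  --  br-22abcb2ef140 br-22abcb2ef140  anywhere             anywhere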
    # Exploring demo_net
    $ docker inspect -f '{{.NetworkSettings.Networks.demo_net.IPAddress}}' c_in_net
    172.19.0.3

    $ docker inspect -f '{{.NetworkSettings.Networks.demo_net.IPAddress}}' web_in_all
    172.19.0.2

    $ docker exec c_in_net ping 172.19.0.1 -c 2
    PING 172.19.0.1 (172.19.0.1): 56 data bytes
    64 bytes from 172.19.0.1: seq=0 ttl=64 time=0.115 ms
    64 bytes from 172.19.0.1: seq=1 ttl=64 time=0.269 ms

    $ docker exec c_in_net ping 172.19.0.2 -c 2
    PING 172.19.0.2 (172.19.0.2): 56 data bytes
    64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.105 ms
    64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.374 ms

    $ docker exec c_in_net ping web_in_all -c 2
    PING web_in_all (172.19.0.2): 56 data bytes
    64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.059 ms
    64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.276 ms

    $ docker exec c_in_net ping example.com -c 2
    PING example.com (93.184.216.34): 56 data bytes
    64 bytes from 93.184.216.34: seq=0 ttl=52 time=25.192 ms
    64 bytes from 93.184.216.34: seq=1 ttl=52 time=29.856 ms
There are two main differences between demo_net and demo_internal. First, demo_net has inter-container
communication enabled (so, no bug there). Second, demo_net has connectivity to the outside world.

Network Namespaces

A network namespace provides isolation of the system resources associated with networking: network
devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewalls,
the /proc/net directory, the /sys/class/net directory, port numbers (sockets),
and so on. This means that each network namespace has its own networking stack.
A virtual ethernet (veth) device pair is used to create a tunnel
for communication between namespaces. veth devices always come in pairs, typically with one
end in the root namespace and the other end inside another namespace.
A device is only visible in the namespace it currently belongs to; from any other
namespace it cannot be seen.
In the case of Docker containers, each container has its own network stack.
The network namespace of each process is located in /proc/$pid/ns/:
    $DOCKER_ID=`docker ps -aqf "name=web_in_all"`
    /proc/`docker inspect --format='{{.State.Pid}}' ${DOCKER_ID}`/ns/net
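You can compare the namespace identifiers directly; /proc/$pid/ns/net is a magic symlink of the form net:[inode], and the container’s inode differs from the host’s (a quick sketch reusing DOCKER_ID from above):
    # The root network namespace, as seen from PID 1
    $sudo readlink /proc/1/ns/net

    # web_in_all's network namespace has a different inode
    $sudo readlink /proc/`docker inspect --format='{{.State.Pid}}' ${DOCKER_ID}`/ns/net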
To configure a network namespace by hand we’ll use the ip command.
    $ sudo ip netns add demo_ns

    $ sudo ip netns ls
    demo_ns

    $mount | grep demo_ns
    nsfs on /run/netns/demo_ns type nsfs (rw)
    nsfs on /run/netns/demo_ns type nsfs (rw)
We have created and listed a new namespace. Note that ip mounts a virtual
filesystem named nsfs to keep demo_ns alive; otherwise, the namespace would have disappeared
as soon as the ip command finished executing.
Let’s create a veth pair and move one end into demo_ns.
    $ sudo ip link add v-eth0 type veth peer name v-peer0
    $ip link show
    18: v-peer0@v-eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff
    19: v-eth0@v-peer0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff

    $sudo ip link set v-peer0 netns demo_ns
    $ip link show
    19: v-eth0@if18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
Notice that only v-eth0 remains in the global namespace.
    $ sudo ip addr add 10.100.0.1/24 dev v-eth0
    $ sudo ip link set v-eth0 up
    $ sudo ip netns exec demo_ns ip addr add 10.100.0.2/24 dev v-peer0
    $ sudo ip netns exec demo_ns ip link set v-peer0 up
    $ sudo ip netns exec demo_ns ip link set lo up
    $ sudo ip netns exec demo_ns ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    18: v-peer0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.100.0.2/24 scope global v-peer0
           valid_lft forever preferred_lft forever
        inet6 fe80::2446:1cff:fe69:7542/64 scope link
           valid_lft forever preferred_lft forever
    $ip addr show v-eth0
    19: v-eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
        inet 10.100.0.1/24 scope global v-eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::a892:deff:fea0:77a8/64 scope link
           valid_lft forever preferred_lft forever
    $ ip route
    default via 192.168.122.1 dev ens3
    10.100.0.0/24 dev v-eth0  proto kernel  scope link  src 10.100.0.1
ip netns exec [net_ns] permits executing commands inside a network namespace.
In this case we have configured v-eth0 in the global namespace and v-peer0 inside
demo_ns. Also notice that the host creates a route for 10.100.0.0/24, because it’s a
directly connected network. iptables has not changed at all.
    $ ping 10.100.0.2 -c 2
    PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
    64 bytes from 10.100.0.2: icmp_seq=1 ttl=64 time=0.094 ms
    64 bytes from 10.100.0.2: icmp_seq=2 ttl=64 time=0.057 ms

    $ sudo ip netns exec demo_ns ping 10.100.0.1 -c 2
    PING 10.100.0.1 (10.100.0.1) 56(84) bytes of data.
    64 bytes from 10.100.0.1: icmp_seq=1 ttl=64 time=0.057 ms
    64 bytes from 10.100.0.1: icmp_seq=2 ttl=64 time=0.064 ms
We have verified there’s connectivity across the tunnel.

Let’s get weird

First of all, we are going to link the web_in_all network namespace into /var/run/netns,
so it can be managed by the ip command.
    $DOCKER_ID=`docker ps -aqf "name=web_in_all"`
    $sudo ln -s /proc/`docker inspect --format='{{.State.Pid}}' ${DOCKER_ID}`/ns/net /var/run/netns/${DOCKER_ID}
    $sudo ip netns ls
    270dca5b1f67 (id: 0)
    demo_ns (id: 4)
Then, we are going to install bridge-utils and configure a Linux bridge named nsbr0. This
bridge will have two veth interfaces connected to it. One belongs to a new veth pair whose
other end goes into the demo_ns network namespace; the other belongs to a pair that acts as
a tunnel into the web_in_all network namespace.
    $sudo apt install bridge-utils -y

    $sudo ip link add v-eth1 type veth peer name v-peer-1
    $sudo ip link add v-ethc type veth peer name v-peer-c

    $sudo brctl addbr nsbr0
    $ sudo brctl addif nsbr0 v-eth1
    $ sudo brctl addif nsbr0 v-ethc

    $ip link show
    20: nsbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff
    23: v-peer-1@v-eth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 6a:56:f9:8a:22:c2 brd ff:ff:ff:ff:ff:ff
    24: v-eth1@v-peer-1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master nsbr0 state DOWN mode DEFAULT group default qlen 1000
        link/ether b2:9f:3c:ba:0c:97 brd ff:ff:ff:ff:ff:ff
    25: v-peer-c@v-ethc: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether a6:4c:ea:4e:f0:a1 brd ff:ff:ff:ff:ff:ff
    26: v-ethc@v-peer-c: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master nsbr0 state DOWN mode DEFAULT group default qlen 1000

    $sudo ip link set v-peer-c netns ${DOCKER_ID}
    $sudo ip link set v-peer-1 netns demo_ns
    $sudo ip link set v-eth1 up
    $sudo ip link set v-ethc up
    $sudo ip link set nsbr0 up
    $sudo ip netns exec ${DOCKER_ID} ip link set v-peer-c up
    $sudo ip netns exec ${DOCKER_ID} ip addr add 10.200.0.3/24 dev v-peer-c
    $sudo ip addr add 10.200.0.1/24 dev nsbr0
    $sudo ip netns exec demo_ns ip addr add 10.200.0.2/24 dev v-peer-1
    $sudo ip netns exec demo_ns ip link set v-peer-1 up
After configuring our new bridge and veth devices, let’s explore the view from
each network namespace:
    $ip link show 
    20: nsbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff
    24: v-eth1@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nsbr0 state UP mode DEFAULT group default qlen 1000
        link/ether b2:9f:3c:ba:0c:97 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    26: v-ethc@if25: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue master nsbr0 state UP mode DEFAULT group default qlen 1000
        link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff link-netnsid 0

    $ ip route
    10.200.0.0/24 dev nsbr0  proto kernel  scope link  src 10.200.0.1
Now we have v-eth1 and v-ethc connected to nsbr0, and ip route shows a new route
for 10.200.0.0/24.
    $ sudo ip netns exec ${DOCKER_ID} ip link show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
        link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    8: eth1@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
        link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    10: eth2@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
        link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    25: v-peer-c@if26: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        link/ether a6:4c:ea:4e:f0:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0

    $ sudo ip netns exec ${DOCKER_ID} ip route
    default via 172.17.0.1 dev eth0
    10.200.0.0/24 dev v-peer-c  proto kernel  scope link  src 10.200.0.3 linkdown
web_in_all has a new veth and a new route for 10.200.0.0/24.
    $sudo iptables -t nat -A POSTROUTING -s 10.200.0.0/24 -o ens3 -j MASQUERADE
    $sudo iptables -A FORWARD -o nsbr0 -j ACCEPT
    $sudo iptables -A FORWARD -i nsbr0 -j ACCEPT
    $sudo ip netns exec demo_ns ip route add default via 10.200.0.1

    $ sudo ip netns exec demo_ns  ip link show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    18: v-peer0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    23: v-peer-1@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        link/ether 6a:56:f9:8a:22:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0

    $ sudo ip netns exec demo_ns ip route
    default via 10.200.0.1 dev v-peer-1
    10.100.0.0/24 dev v-peer0  proto kernel  scope link  src 10.100.0.2
    10.200.0.0/24 dev v-peer-1  proto kernel  scope link  src 10.200.0.2
We added a rule to the POSTROUTING chain to masquerade demo_ns connections behind the host’s
ens3 address. Then, a default route was added to demo_ns to route connections to the outside world.
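Before worrying about name resolution, we can confirm plain IP connectivity from demo_ns to the outside (8.8.8.8 is just an arbitrary public address for this check):
    # Traffic from 10.200.0.0/24 leaves masqueraded behind the host's ens3 address
    $sudo ip netns exec demo_ns ping 8.8.8.8 -c 2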
Nevertheless, which DNS resolver is demo_ns using? Remember that a network namespace
only gives you a new IP stack; a DNS resolver is not part of the network namespace.
Well, when you execute ip netns exec [net_ns] you are still using the global
mount namespace, so the files you see are the same ones on the root filesystem.
In other words, /etc/resolv.conf is the same for all the namespaces created with
the ip netns command. However, with ip netns you can use a different resolver by
creating a file at /etc/netns/[netns_name]/resolv.conf for each network namespace.
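A minimal sketch of a per-namespace resolver for demo_ns (8.8.8.8 is only an example nameserver; resolution also depends on your nsswitch configuration):
    # Files under /etc/netns/demo_ns/ are bind-mounted over /etc/ by ip netns exec
    $sudo mkdir -p /etc/netns/demo_ns
    $echo 'nameserver 8.8.8.8' | sudo tee /etc/netns/demo_ns/resolv.conf
    $sudo ip netns exec demo_ns ping example.com -c 2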
    $ echo '127.0.0.1 mytest' | sudo tee -a /etc/hosts
    127.0.0.1 mytest
    $ sudo ip netns exec demo_ns ping mytest -c 2
    PING mytest (127.0.0.1) 56(84) bytes of data.
    64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.070 ms
    64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.086 ms
As you can see, we have added a new entry in /etc/hosts. Pinging mytest from demo_ns
succeeds.
    $sudo mkdir -p /etc/netns/demo_ns/
    $echo '127.0.0.1 myns' | sudo tee -a /etc/netns/demo_ns/hosts

    $ sudo ip netns exec demo_ns ping myns -c 2
    PING myns (127.0.0.1) 56(84) bytes of data.
    64 bytes from myns (127.0.0.1): icmp_seq=1 ttl=64 time=0.072 ms
    64 bytes from myns (127.0.0.1): icmp_seq=2 ttl=64 time=0.093 m

    $ sudo ip netns exec demo_ns ping mytest -c 2
    ping: unknown host mytest
If we add a namespace-specific hosts file for demo_ns, we notice that mytest is no longer
resolvable there, but myns is. Just remember that /etc/netns/[netns_name]/ only works
with the ip netns command.
What about giving web_in_all a way out to the outside world through nsbr0?
    # Some output has been cut

    $sudo ip netns exec ${DOCKER_ID} ip route add  93.184.216.34/32 via 10.200.0.1

    $sudo ip netns exec ${DOCKER_ID} ip route
    default via 172.17.0.1 dev eth0
    10.200.0.0/24 dev v-peer-c  proto kernel  scope link  src 10.200.0.3
    93.184.216.34 via 10.200.0.1 dev v-peer-c

    $sudo ip netns exec ${DOCKER_ID} ip route get 93.184.216.34
    93.184.216.34 via 10.200.0.1 dev v-peer-c  src 10.200.0.3
        cache

    $sudo ip netns exec ${DOCKER_ID} ping 93.184.216.34 -c 2
    PING 93.184.216.34 (93.184.216.34) 56(84) bytes of data.
    64 bytes from 93.184.216.34: icmp_seq=1 ttl=52 time=22.9 ms
    64 bytes from 93.184.216.34: icmp_seq=2 ttl=52 time=33.4 ms

    $sudo ip netns exec ${DOCKER_ID} ping example.com -c 2
    PING example.com (93.184.216.34) 56(84) bytes of data.
    64 bytes from 93.184.216.34: icmp_seq=1 ttl=52 time=53.1 ms
    64 bytes from 93.184.216.34: icmp_seq=2 ttl=52 time=26.5 ms
We added a static route for example.com and verified with the ip route get command that it
uses nsbr0 as its gateway.
Which resolver is web_in_all using? Since the container has its own mount namespace, it uses
the resolver configured by Docker, but you can use a different resolver if you enter
the web_in_all network namespace with the ip netns command and apply the proper
network configuration.
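A quick way to check which resolver the container sees (the httpd image includes cat):
    # This file lives in the container's own mount namespace and is managed by Docker
    $docker exec web_in_all cat /etc/resolv.conf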

Conclusion

  • It’s better to use user-defined networks to segment, organize, and isolate containers.
  • You can still find bugs in Docker, and some of them can be confusing.
  • Network namespaces provide isolation for network resources.
  • Network namespaces are not only used by Docker; they are also fundamental in Kubernetes,
    OpenStack, and many other projects.

Bibliography

[1] http://stackoverflow.com/a/34497614/3621080
[2] https://github.com/moby/moby/issues/20224
[3] namespaces(7)
[4] https://github.com/torvalds/linux/blob/master/fs/nsfs.c
[5] ip-netns(8)
[6] ip-route(8)