Linux Containers with OpenWrt

Linux Containers Part 2


Virtual Network in the Palm of your hand

Although you can turn your Pi into an OpenWrt router, it never appealed to me, since the Pi has so few (2) interfaces. But after playing with LXD and transparent bridged access for the containers, it made sense that a virtual router might be useful. And after creating a server farm on a Raspberry Pi, I can see why some would want a firewall in front of the servers to reduce the threat surface.

Docker attempts this by fronting the containers with the dockerd daemon, but the networking is kludgy at best. If you choose to go it on your own and use Docker’s routing, you will quickly find yourself back in the 90s, where everything must be manually configured (address ranges, gateway addresses, static routes to get into and out of the container network). The other option is to use NAT44 and NAT66, which is just wrong, and results in losing true peer-to-peer connectivity, limited server access (since only one server can be on port 80 or 443), and the rest of the brokenness of NAT.

OpenWrt, on the other hand, is a widely used open-source router software project, running on hundreds of different routers. It includes excellent IPv6 support, including DHCPv6-PD (prefix delegation for automatic addressing of the container network, plus route insertion), an easy-to-use firewall web interface, and full routing protocol support (such as RIPng or OSPF) if needed.

Going Virtual

The goal is to create a virtual environment which not only has the excellent network management of LXC, but also an easy-to-use router/firewall via the OpenWrt web interface (called LuCI), all running on the Raspberry Pi (or any Linux machine).

Virtual router Network

Motivation

The OpenWrt project does an excellent job of creating images for hundreds of routers. I wanted to take a generic existing image and make it work on LXD without recompiling or building OpenWrt from source.

Additionally, I wanted it to run on a Raspberry Pi (ARM processor). Most implementations of OpenWrt in virtual environments run on x86 machines.

If you would rather build OpenWrt, please see the github project https://github.com/mikma/lxd-openwrt (x86 support only)

Installing LXD on the Raspberry Pi

Unfortunately the default Raspbian image does not support namespaces or cgroups, which are used to isolate Linux Containers. Fortunately, there is an Ubuntu 18.04 image available for the Pi which does.

If you haven’t already installed LXD on your Raspberry Pi, please look at Linux Containers on the Pi blog post.

Creating a LXD Image

NOTE: Unless otherwise stated, all commands are run on the Raspberry Pi

Using lxc image import, an image can be pulled into LXD. The steps are:

  1. Download the OpenWrt rootfs tarball
  2. Create a metadata.yaml file, and place into a tar file
  3. Import the rootfs tarball and metadata tarball to create an image

Getting OpenWrt rootfs

The OpenWrt project not only provides squashfs and ext4 images, but also simple tar.gz files of the rootfs. The (almost) current release is 18.06.1, and I recommend starting with it.

The ARM-virt rootfs tarball can be found at OpenWrt

Download the OpenWrt 18.06.1 rootfs tarball for Arm.

Create a metadata.yaml file

Although the yaml file can contain quite a bit of information, the minimum requirement is architecture and creation_date. Use your favourite editor to create a file named metadata.yaml:

architecture: "armhf"
creation_date: 1544922658

The creation date is the current time in seconds since the unix epoch (1 Jan 1970). The easiest way to get this value is to find it on the web, such as at the EpochConverter.
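
If you are already at a shell on the Pi, the same value can be printed with the date command:

$ date +%s
1544922658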

Once the metadata.yaml file is created, tar it up and name it anything that makes sense to you.

tar cvf openwrt-meta.tar metadata.yaml

Import the image into LXD

Place both tar files (metadata & rootfs) in the same directory on the Raspberry Pi, and use the following command to import the image:

lxc image import openwrt-meta.tar default-root.tar.gz  --alias openwrt_armhf
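
Once imported, the new image should show up in the local image store under its alias:

$ lxc image list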

Starting up Virtual OpenWrt

Unfortunately, the OpenWrt container won’t boot as-is from the imported image. A helper script has been developed which creates the required device nodes in /dev so that OpenWrt boots properly.

The steps to get your virtual OpenWrt up and running are:

  1. Create the container
  2. Adjust some of the parameters of the container
  3. Download init.sh script from github
  4. Copy the init.sh script to /root on the image
  5. Log into the OpenWrt container and execute sh init.sh
  6. Validate that OpenWrt has completed booting

I use router as the name of the OpenWrt container

lxc init local:openwrt_armhf router
lxc config set router security.privileged true

In order for init.sh to run the mknod command the container must run as privileged.

Download init.sh from the OpenWrt-LXD open source project

The init.sh script is open source and resides on github. To download it on your Pi, use curl (you may have to install curl)

curl https://raw.githubusercontent.com/cvmiller/openwrt-lxd/master/init.sh > init.sh
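
Copying the script into the container and starting it up are each a single lxc command (the container was created with lxc init, so it still needs to be started before you can exec into it):

# copy init.sh into /root inside the router container
lxc file push init.sh router/root/init.sh
# start the container
lxc start router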

With the script copied in and the container running, log into the router container using the lxc exec command, and run the init.sh script.

lxc exec router sh
#
# sh init.sh

Managing the Virtual OpenWrt router

The LuCI web interface by default is blocked on the WAN interface. In order to manage the router from the outside, a firewall rule allowing web access from the WAN must be inserted. It is possible to edit the /etc/config/firewall file within the OpenWrt container and open port 80 for external management.

lxc exec router sh

# vi /etc/config/firewall

...
config rule                      
        option target 'ACCEPT'   
        option src 'wan'         
        option proto 'tcp'       
        option dest_port '80'    
        option name 'ext_web'                                   

Save the file and then restart the firewall within the OpenWrt container.

/etc/init.d/firewall restart
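
If you prefer the command line to editing files, the same rule can be added with OpenWrt’s uci utility (a sketch equivalent to the stanza above), followed by the same firewall restart:

uci add firewall rule
uci set firewall.@rule[-1].name='ext_web'
uci set firewall.@rule[-1].src='wan'
uci set firewall.@rule[-1].proto='tcp'
uci set firewall.@rule[-1].dest_port='80'
uci set firewall.@rule[-1].target='ACCEPT'
uci commit firewall
/etc/init.d/firewall restart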

Now you should be able to point your web browser at the WAN address and log in; the password is blank.

Step back and admire your work

Type exit to return to the Raspberry Pi prompt. By looking at some lxc output, we can see the virtual network up and running.

$ lxc ls
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
|  NAME   |  STATE  |          IPV4          |                     IPV6                      |    TYPE    | SNAPSHOTS |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
| docker1 | RUNNING | 192.168.215.220 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe58:1ac9 (eth0)  | PERSISTENT | 0         |
|         |         | 172.17.0.1 (docker0)   | fd4b:7e4:111:0:216:3eff:fe58:1ac9 (eth0)      |            |           |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe58:1ac9 (eth0)  |            |           |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
| router  | RUNNING | 192.168.215.198 (eth1) | fd6a:c19d:b07:2084::1 (br-lan)                | PERSISTENT | 1         |
|         |         | 192.168.181.1 (br-lan) | fd6a:c19d:b07:2080::8d1 (eth1)                |            |           |
|         |         |                        | fd6a:c19d:b07:2080:216:3eff:fe72:44b6 (eth1)  |            |           |
|         |         |                        | fd4b:7e4:111::1 (br-lan)                      |            |           |
|         |         |                        | fd4b:7e4:111:0:216:3eff:fe72:44b6 (eth1)      |            |           |
|         |         |                        | 2001:db8:ebbd:2084::1 (br-lan)                |            |           |
|         |         |                        | 2001:db8:ebbd:2080::8d1 (eth1)                |            |           |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe72:44b6 (eth1)  |            |           |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
| www     | RUNNING | 192.168.181.158 (eth0) | fd6a:c19d:b07:2084:216:3eff:fe01:e0a3 (eth0)  | PERSISTENT | 0         |
|         |         |                        | fd4b:7e4:111:0:216:3eff:fe01:e0a3 (eth0)      |            |           |
|         |         |                        | fd42:dc68:dae9:28e9:216:3eff:fe01:e0a3 (eth0) |            |           |
|         |         |                        | 2001:db8:ebbd:2084:216:3eff:fe01:e0a3 (eth0)  |            |           |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+

The docker1 container is still running from Part 1, and still connected to the outside network br0. You can see this from the addresses assigned (both v4 and v6).

The router container (which is running OpenWrt) has both eth1 (aka WAN) and br-lan (aka LAN) interfaces. The br-lan interface is connected to the inside lxdbr0 virtual network. And OpenWrt routes between the two networks.
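
You can inspect how LXD defines that inside bridge with the lxc network command:

$ lxc network show lxdbr0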

Limitations of Virtual OpenWrt

There are some limitations to the virtual OpenWrt. Please see the github project for the most current list. Most notably, ssh, although it works, needs improving.

Address Stability

Because all of this is running on LXC, there is address stability. No matter how many times you reboot the Raspberry Pi, or restart containers in a different order, the addresses remain the same. This means the addresses above can be entered into your DNS server without churn. Something Docker doesn’t provide.

Running a Virtual Network

LXC is the best at container customization and virtual networking (IPv4 and IPv6). With LXC’s flexibility, it is easy to create templates to scale up multiple applications (e.g. a webserver farm running in the palm of your hand). OpenWrt is one of the best open source router projects, and now it can be run virtually as well. Now you have a server farm in the palm of your hand, with excellent IPv6 support and a firewall! Perhaps the Docker folks will take note.

Article originally appeared (with more detail) on www.makikiweb.com

Linux Containers with IPv6 GUAs on the Pi

Server Farm in the Palm of your hand

I have recently been exploring Docker containers on SBCs (Small Board Computers), including the Raspberry Pi. The Docker eco-system is impressive in the number of preconfigured containers that are available. However, as I have written before, it falls down on networking support, specifically the bolted-on-afterthought IPv6. The best one can do is NAT66 on IPv6, which just perpetuates the complexities (and evils) of NAT.

The biggest problem with the Docker IPv6 implementation is that it was an afterthought. Unfortunately, this is not uncommon. Think of adding security after the fact, and you will quickly discover a poorly implemented security model. Docker is limited by this kind of afterthought thinking.

Linux Containers

Another container technology which can also run on SBCs is Linux Containers (LXC/LXD). LXC shares the host’s kernel and is lighter weight than traditional Virtual Machines. But each LXC container is isolated via namespaces and control groups, so it appears to have its own network stack, and is therefore more flexible than Docker.

What is the difference between LXC and LXD?

In this article I will treat LXC and LXD together as LXC, but they are separate. LXC (or Linux Containers) existed first; versions 1 & 2 created containers on their own. With version 3, LXD was added, which provides a daemon that allows easier image management, including publishing images, and an API to control LXC on remote machines. LXD complements LXC by providing more features.

Qualifying the SBC OS for LXC/LXD

Unfortunately, the Raspbian kernel from raspberrypi.org doesn’t support namespaces.

Fortunately, there is an unofficial Ubuntu 18.04 image available for the Pi which does. This image is compressed and must be decompressed before being flashed to an SD Card. In Linux you can do this on the fly:

$ xzcat ubuntu-18.04-preinstalled-server-armhf+raspi3.img.xz | sudo dd  of=/dev/sdZ bs=100M

Change sdZ to the device of your SD Card. Be careful with this command: if you give it the device of your boot drive, it will happily overwrite your boot device.
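
If you are unsure which device the SD Card is, lsblk will list the block devices and their sizes before you commit to dd:

$ lsblk -o NAME,SIZE,MODEL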

If you are using Windows or a Mac, I suggest using Etcher, which makes creating bootable SD Cards easy.

Make sure you follow the steps on the Ubuntu page to set an initial password for the ubuntu user. Best practice is to set up a non-privileged user which you will use most of the time. This can be done with the adduser command. Below I have created a user craig with sudo privileges.

$ sudo adduser --ingroup sudo craig

Preparing the LXC Host (aka the Pi)

The key networking difference between Docker and LXC is that with LXC one can attach a container to any bridge on the Host. This includes a bridge on the outside interface. Via transparent bridging the container can have unfettered access to the existing IPv6 subnet, including picking up Global Unique Addresses (GUAs) without the host having to do router-like functions, such as adding routes, auto propagation of prefixes (with DHCPv6-PD), redistribution of routes, etc. Again, things which Docker doesn’t support.

Setting up an external bridge interface on the Host

Once you have the right kernel and distro, configure a bridge br0 which will in turn have the ethernet interface as a member. This is best done from the Pi itself using a keyboard and monitor, rather than ssh-ing to a headless device, because when you mess up, you are still connected to the Pi (believe me, it is easy to get disconnected with all interfaces down). Logically the bridge br0 will be attached not only to the eth0 interface, but later on, to the LXC containers as well.

External Bridge

Set up the bridge

1) Install bridge-utils, which provides brctl, the utility that creates/controls linux bridges. Also install the ifupdown package, which will be used later.

sudo apt-get install bridge-utils ifupdown

2) Edit the /etc/network/interfaces file to automatically set up the bridge br0 and attach the ethernet device. Add the following lines:
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0
iface br0 inet6 dhcp

Because Ubuntu uses systemd we must let systemd know about the bridge, or the IPv6 default route will disappear after about 5 minutes (not good).

3) Create/Edit /etc/systemd/network/br0.network file, and add the following:

[Match]
Name=br0

[Network]
DHCP=yes

Lastly, in order to make this all work when the Pi is rebooted, we have to hack at /etc/rc.local a bit to make sure the bridge is brought up and systemd is minding it at boot up time.

4) Create/Edit /etc/rc.local and add the following, and don’t forget to make it executable.

#!/bin/bash
#
## put hacks here

# fix for br0 interface
/sbin/ifup br0
# kick networkd as well
/bin/systemctl restart systemd-networkd
echo "Bridge is up"
exit 0

Make it executable:

$ sudo chmod 754 /etc/rc.local

Finally, reboot, log in, and see that the Pi’s br0 network is up:

$ ip addr
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
    link/ether b8:27:eb:6c:02:88 brd ff:ff:ff:ff:ff:ff
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:27:eb:6c:02:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.215.141/24 brd 192.168.215.255 scope global dynamic br0
       valid_lft 1995525700sec preferred_lft 1995525700sec
    inet6 2001:db8:ebbd:2080::9c5/128 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 2001:db8:ebbd:2080:ba27:ebff:fe6c:288/64 scope global mngtmpaddr noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::ba27:ebff:fe6c:288/64 scope link 
       valid_lft forever preferred_lft forever

As you can see, br0 has all the IPv4 and IPv6 addresses, which is what we want. Now you can go back to headless access (via ssh) if, like me, your Pi usually just sits on a shelf (with power and network).
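
brctl gives another quick view of the bridge membership; eth0 should be listed as a member of br0, with output roughly like this:

$ brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.b827eb6c0288       no              eth0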

Installing LXC/LXD

Once setting up the br0 interface is done, we can install lxd and lxd-client. Linux Containers has been evolving over the years, and is now (as I write this) up to version 3.0.2.

A note about versions

There is quite a bit on the internet about older versions of Linux Containers. If you see hyphenated commands like lxc-launch then stop and move to another page. Hyphenated commands are the older version 1 or 2 of Linux Containers.

A quick tour of LXC/LXD

Canonical has a nice Try It page, where you can run LXC/LXD in the comfort of your web browser without installing anything on your local machine. The Try It page sets up a VM which has IPv6 access to the outside world, where you can install and configure LXC/LXD, and even create Linux Containers. It is well worth the 10 minutes to run through the hands-on tutorial.

Doing the install

But wait! LXD is already installed on this image, although it is version 3.0.0. The easiest way to get to the latest version is to run:

$ sudo apt-get update
$ sudo apt-get upgrade lxd lxd-client

Add yourself to the lxd group so you won’t have to type sudo all the time.

sudo usermod -aG lxd craig
newgrp lxd

LXD Init

The LXD init script sets up LXD on the machine with a set of interactive questions. It is safe to accept all the defaults (just press return):

$ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (btrfs, dir, lvm) [default=btrfs]: 
Create a new BTRFS pool? (yes/no) [default=yes]: 
Would you like to use an existing block device? (yes/no) [default=no]: 
Size in GB of the new loop device (1GB minimum) [default=15GB]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: 
Would you like LXD to be available over the network? (yes/no) [default=no]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] no
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: no

On the Pi, LXD will take a while to think about all this, just be patient (might be 10 minutes or so).

Default LXD Networking

Since we took all the defaults of lxd init, it created another bridge on the system, lxdbr0. The YAML file would lead you to believe it is also bridged to the outside world, but it is not. The default config is similar to Docker, in that it creates a lxdbr0 bridge which uses NAT44 and NAT66 to connect to the outside world.

But we don’t care, because we have created a bridge br0 which is transparently bridged to the outside world. And unlike Docker, individual LXC containers can be attached to any bridge (either br0 or, if you want NAT, lxdbr0).

Create a profile for the external transparent bridge (br0)

There is one more thing we have to do before running the first Linux Container: create a profile for the br0 bridge. Edit the profile to match the info below:

lxc profile create extbridge
lxc profile edit extbridge
    config: {}
    description: bridged networking LXD profile
    devices:
      eth0:
        name: eth0
        nictype: bridged
        parent: br0
        type: nic
    name: extbridge
    used_by:
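
If you would rather avoid the interactive editor, the same device can be added to the profile non-interactively (the create step stays the same):

lxc profile create extbridge
lxc profile device add extbridge eth0 nic nictype=bridged parent=br0 name=eth0
lxc profile show extbridge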

The Linux Container network is now ready to attach containers to the br0 bridge like this:

container network

You may notice the bottom LXC container is running Docker; more on this later.

Running the first Linux Container

So now it is time to have fun by running the first container. I suggest Alpine Linux because it is small and quick to load. To create and start the container, type the following:

lxc launch -p default -p extbridge images:alpine/3.8 alpine

LXD will automatically download the Alpine Linux image from the Linux Containers image server, and create a container with the name alpine. We’ll use the name alpine to manage the container going forward.

Typing lxc ls will list the running containers

$ lxc ls
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
|  NAME   |  STATE  |          IPV4          |                     IPV6                     |    TYPE    | SNAPSHOTS |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| alpine  | RUNNING | 192.168.215.104 (eth0) | fd6a:c19d:b07:2080:216:3eff:fecf:bef5 (eth0) | PERSISTENT | 0         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fecf:bef5 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+

You will note that the container has not only an IPv4 address from my upstream DHCP server, but also an IPv6 GUA (and, in this case, an additional IPv6 ULA, Unique Local Address).

YAML overlaying

The alpine container has a GUA because we used two -p (profile) parameters when creating it. The first is the default profile, which as I mentioned earlier is set up for NAT44 and NAT66. The second is the extbridge profile we set up earlier. The lxc launch command pulls in the YAML info from the default profile, and then overlays the extbridge profile, effectively overwriting the parts we want so that the alpine container is attached to br0 and the outside world!
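
You can see the result of the overlay by asking LXD for the container’s expanded configuration, which merges every applied profile; the eth0 device should show parent br0:

$ lxc config show alpine --expanded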

Stepping into Alpine

Of course, what good is starting a Linux Container if all you can do is start and stop it? A key difference from Docker is that Linux Containers are not read-only, but rather you can install software, configure it the way you like, and then stop the container. When you start it again, all the changes you made are still there. I’ll talk about the goodness of this a little later.

But in order to do that customization one needs to get inside the container. This is done with the following command:

$ lxc exec alpine -- /bin/sh
~ # 

And now you are inside the running container as root. Here you can do anything you can do on a normal Linux machine: install software, add users, start sshd so you can ssh to it later, and so on. When you are done customizing the container, type:

~ # exit
craig@pai:~$ 

And you are back on the LXC Host.
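
For example, making the alpine container reachable over ssh (as suggested above) might look roughly like this inside the container (package and service names as found on Alpine 3.8):

$ lxc exec alpine -- /bin/sh
~ # apk update
~ # apk add openssh
~ # rc-update add sshd default
~ # /etc/init.d/sshd start
~ # exit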

Advantages of customizing a container

A key advantage of customizing a container is that you can create a template image, which can then be used to create many instances of that customized application. For example, I started with alpine, installed nginx and php7, and created a template image, which I called web_image. I used the following commands on the host, after installing the webserver with PHP inside the container:

$ lxc snapshot alpine snapshot_web                   # Make a back up of the container
$ lxc publish alpine/snapshot_web --alias web_image  # publish the back up as an image
$ lxc image list                                     # show the list of images
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+
|    ALIAS     | FINGERPRINT  | PUBLIC |             DESCRIPTION              |  ARCH  |   SIZE   |         UPLOAD DATE         |
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+
| web_image    | 84a4b1f466ad | no     |                                      | armv7l | 12.86MB  | Dec 4, 2018 at 2:46am (UTC) |
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+
|              | 49b522955166 | no     | Alpine 3.8 armhf (20181203_13:03)    | armv7l | 2.26MB   | Dec 3, 2018 at 5:11pm (UTC) |
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+

Scaling up the template container

And with that webserver image, I can replicate it as many times as I have disk space and memory for. I tried 10, but based on how much memory they were using, I could get to twenty on the Pi, with room for more.
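
The replicas are launched from the published image; a small shell loop on the host (a sketch, matching the w2 through w10 names below) does the work:

for i in $(seq 2 10); do
    lxc launch -p default -p extbridge local:web_image w$i
done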

$ lxc ls
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
|  NAME  |  STATE  |          IPV4          |                     IPV6                     |    TYPE    | SNAPSHOTS |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| alpine | RUNNING | 192.168.215.104 (eth0) | fd6a:c19d:b07:2080:216:3eff:fecf:bef5 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fecf:bef5 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w10    | RUNNING | 192.168.215.225 (eth0) | fd6a:c19d:b07:2080:216:3eff:feb2:f03d (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:feb2:f03d (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w2     | RUNNING | 192.168.215.232 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe7f:b6a5 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe7f:b6a5 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w3     | RUNNING | 192.168.215.208 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe63:4544 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe63:4544 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w4     | RUNNING | 192.168.215.244 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe99:a784 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe99:a784 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w5     | RUNNING | 192.168.215.118 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe31:690e (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe31:690e (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w6     | RUNNING | 192.168.215.200 (eth0) | fd6a:c19d:b07:2080:216:3eff:fee2:8fc7 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fee2:8fc7 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w7     | RUNNING | 192.168.215.105 (eth0) | fd6a:c19d:b07:2080:216:3eff:feec:baf7 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:feec:baf7 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w8     | RUNNING | 192.168.215.196 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe90:10b2 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe90:10b2 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w9     | RUNNING | 192.168.215.148 (eth0) | fd6a:c19d:b07:2080:216:3eff:fee3:e5b2 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fee3:e5b2 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| web    | RUNNING | 192.168.215.110 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe29:7f8 (eth0)  | PERSISTENT | 1         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe29:7f8 (eth0)  |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+

All of the webservers have their own unique IPv6 address, and all of them are running on port 80, something that can’t be done using NAT.

LXC plays well with DNS

Unlike Docker, LXC containers retain the same IPv6 address after being started and stopped. And if you are starting multiple containers, the order of starting doesn’t change the addresses (as it does with Docker).

This means that you can assign names to your LXC Containers without a lot of DNS churn. Here’s a chunk from my DNS zone file:

lxcdebian   IN  AAAA    2001:db8:ebbd:2080:216:3eff:feae:a30
lxcalpine   IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe4c:4ab2
lxcweb      IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe29:7f8
lxcw2       IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe7f:b6a5
lxcdocker1  IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe58:1ac9

DNS is your friend when using IPv6. With DNS entries, I can point my web browser to the servers running on these containers. I can even ssh into a container, just like any host on my network.

$ ssh -X craig@lxcdebian
craig@lxcdebian's password: **********
craig@debian:~$

Key differences between LXC and Docker

Here’s a chart to show the key differences between Docker and Linux Containers.

Docker                                          | LXC/LXD
------------------------------------------------+---------------------------------------------------------------------
Always routed, and with NAT (default)           | Any container can be routed or bridged
IP addressing depends on container start order  | Addressing is stable regardless of start order, plays well with DNS
A read-only-like container                      | A read-write container, easy to customize and templatize
No check on architectures (x86, ARM)            | LXC automatically selects the correct architecture
Most containers are IPv4-only                   | Containers support IPv4 & IPv6
Containers see the Docker NAT address in logs   | Containers log real source addresses
Many, many containers to choose from            | By comparison, only a handful of pre-built containers

A key advantage of Docker is the last one: the sheer number of Docker containers is amazing. But what if you could have the best of both worlds?

Linux Containers + Docker

No limitations

Since LXC containers are customizable, and since it is easy to make a template image and replicate containers based on that template, why not install Docker inside an LXC Container, and have the best of both worlds?

Actually it is quite easy to do. Start with an image that is a bit more full-featured than Alpine Linux, like Debian. I used Debian 10 (the next version of Debian). Create the container with:

lxc launch -p default -p extbridge images:debian/10 debian

Enable the nesting feature, which allows LXC to run containers inside of containers:

lxc config set debian security.nesting true
lxc restart debian      # restart with nesting enabled

Step into the container and customize it by installing Docker:

$ lxc exec debian -- /bin/bash

# make a content directory for Docker/nginx
mkdir -p /root/nginx/www
# create some content
echo "<h2> Testing </h2>" > /root/nginx/www/index.html

# install docker and start it
apt-get install docker.io
/etc/init.d/docker start

# pull down nginx Docker Container for armhf
docker create --name=nginx -v /root/nginx:/usr/share/nginx/html:ro -p 80:80 -p 443:443 armhfbuild/nginx
# start the docker container
docker restart nginx
exit
$ 

And now nginx is up and running inside Docker, inside an LXC Container with dual-stack predictable addressing and transparent bridging.

Make a template of the LXC + Docker Container

Follow the earlier procedure to create an image which will be used to launch customized LXC + Docker containers:

lxc snapshot debian docker_base_image
lxc publish debian/docker_base_image --alias docker_image       # publish image to local:
lxc image list                                                  # see the list of images
# start a lxc/docker container called docker1
lxc launch -p default -p extbridge local:docker_image docker1
# set config to allow nesting for docker1
lxc config set docker1 security.nesting true
lxc restart docker1

Looking at the running LXC containers, it is easy to spot the one running Docker (hint: look for the Docker NAT address).

$ lxc ls
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
|  NAME   |  STATE  |          IPV4          |                     IPV6                     |    TYPE    | SNAPSHOTS |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| alpine  | RUNNING | 192.168.215.104 (eth0) | fd6a:c19d:b07:2080:216:3eff:fecf:bef5 (eth0) | PERSISTENT | 0         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fecf:bef5 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| docker1 | RUNNING | 192.168.215.220 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe58:1ac9 (eth0) | PERSISTENT | 0         |
|         |         | 172.17.0.1 (docker0)   | 2001:db8:ebbd:2080:216:3eff:fe58:1ac9 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| w2      | RUNNING | 192.168.215.232 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe7f:b6a5 (eth0) | PERSISTENT | 0         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe7f:b6a5 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| web     | RUNNING | 192.168.215.110 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe29:7f8 (eth0)  | PERSISTENT | 1         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe29:7f8 (eth0)  |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+

After getting it started, it is easy to step into the LXC container docker1 and query Docker about its container:

lxc exec docker1 -- /bin/bash
root@docker1:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                                      NAMES
19f3cbeba6d3        linuxserver/nginx   "/init"             3 hours ago         Up 3 hours          0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   nginx
root@docker1:~#
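
From the LXC host, a quick curl against docker1’s address (taken from the lxc ls output above) will confirm the published nginx port is answering; the IPv4 address is used here since that is where Docker publishes the port (0.0.0.0:80):

$ curl http://192.168.215.220/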

Running multiple LXC + Docker containers

Now that there is a template image, docker_image, it is a breeze to spawn multiple LXC + Docker Containers. Don’t want them all to run nginx webservers? Easy: step into each, delete the nginx webserver, and run one of the other thousands of Docker Containers.

Best of both Worlds

LXC is the best at container customization and networking (IPv4 and IPv6). Docker is the best in the sheer volume of pre-built Docker containers (assuming you select the correct architecture, armhf for the Pi). With LXC’s flexibility, it is easy to create templates to scale up multiple applications (e.g. a webserver farm running in the palm of your hand). And with LXC, it is possible to overcome many of Docker’s limitations, opening up the world of Docker Containers to the LXC world. The best of both worlds.


originally posted with more detail on www.makikiweb.com

What makes a good IPv6 implementation?


Bad IPv6 support costs Money

I have been working with Docker lately, and as cool as the container technology is, it was originally built without consideration for IPv6, which was then bolted on later. This makes supporting IPv6 full of expensive work-arounds.

But that got me thinking: what makes a good IPv6 implementation? Of course this is my opinion, and you are free to toss in other criteria, so think of this as a thought starter.

Why is this important?

With 25% of the internet carried over IPv6 as of this writing, if you are developing a product which has a lifetime of 5 to 10 years, and you aren’t giving thought as to how you will support IPv6, then your product will:

  • A) fail, or
  • B) need IPv6 bolted on the side, or
  • C) have to be completely rewritten.

All of that costs money.

A good IPv6 device implementation

There are broad areas where IPv6 should work well.

Addressing

As much as I like the simplicity of SLAAC (Stateless Address Auto Config), there are certainly use cases where DHCPv6 is a better choice. A good implementation should:

  • Support both addressing methods, SLAAC, and DHCPv6
  • Be able to reestablish an IPv6 GUA (Global Unique Address) once the device comes out of sleep/suspend or a link down/up event (systemd suffers from this problem)
  • Play well with DNS. Very few of us enjoy typing IPv6 addresses; the implementation should have a stable IPv6 address which can be entered into DNS without requiring a lot of DNS churn.

Routing

IPv6 is not IPv4 with colons. There are some things which are different for good reason.

  • Default routes are link-local addresses (Docker fails on this one big time). GUAs may change, link-locals shouldn’t.
  • Supports RA (Router Advertisement) fields, RDNSS (DNS server), and DNSSL (DNS domain search list). Not much use having an address if the host can’t resolve names
  • If the device is routing (such as Docker) then support DHCPv6-PD, and provide the option of prefix delegation into the container/downstream network.

Resiliency

Basic protection from network misconfiguration or outright attacks makes the IPv6 device better prepared for production use.

  • Rational limit on the number of IPv6 addresses an interface may have. Before systemd, the Linux kernel defaulted to 16, which seemed like a good compromise. Back in systemd v232, it was possible to exhaust memory on an IPv6 host by feeding it random RA addresses, creating a denial of service. FreeBSD v11.5 has a similar problem, where the system will add over 3000 IPv6 addresses and slow to a crawl.
  • Rational limit on the number of neighbours. IPv6 /64 networks are sparsely populated, so one shouldn’t have to support all 16 quintillion (2^64) neighbours. Something like 1000, or even 256, should be enough (see the sysctl example after this list).
  • Don’t assume that the Linux stack has your back. Since systemd has become widespread, there are many IPv6 systemd bugs which weren’t there in the pre-systemd kernel days. IPv6 is a different stack; be sure to test it.
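
On Linux, both of these limits are exposed as sysctls which can be inspected (and lowered) without patching anything; for example:

# maximum number of autoconfigured addresses per interface (defaults to 16)
sysctl net.ipv6.conf.all.max_addresses
# upper bound on the IPv6 neighbour table before the kernel refuses new entries
sysctl net.ipv6.neigh.default.gc_thresh3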

Summary

I am sure I am missing a few, but this is a start. When developing a product, the business case for supporting IPv6 well is that it will save you money in the long run, by not having to go back and try to bolt IPv6 on, or rewrite your network stack, later.

P.S. I wouldn’t recommend putting Docker into production because of the severe IPv6 limitations. I’ll be looking at LXC next.

Image: Yachts colliding: Creative Commons/Mark Pilbeam

 

Hi Neighbour!


Mr. Rogers

Neighbour Discovery Protocol (NDP) is more than just an IPv6 version of ARP (Address Resolution Protocol). It really is Neighbour Discovery, handling Layer 2 MAC address resolution, but also router discovery, and even better path handling (Redirection).

ARP is a funny protocol, as it isn’t part of the IPv4 suite, but IPv4 will not work without it. The creators of IPv6 were looking for an inclusive method (e.g. part of the IPv6 protocol) to accomplish MAC address resolution. They chose to use ICMPv6, which is a Layer 4 protocol.

But how does one use a Layer 4 (L4) protocol to resolve Layer 2 (L2) information (the MAC address) when both L2 and L3 information is needed before the packet requesting Address Resolution can be sent?

ND Addressing

To answer this question, some special L3 addresses had to be created. The first is the Link-Local address, a non-routable address limited to the scope of the link (think broadcast domain), which always starts with FE80.

The second involves multicast. Since it was decided that broadcast wouldn’t be used in IPv6 (since every end-station must listen to broadcast), a few special multicast addresses were created.

  • FF02::1 All Nodes Address (all nodes must listen to this address on the link)
  • FF02::2 All Routers Address (all routers must listen to this address)
  • FF02::1:FFXX:YYZZ Solicited Node Address, where the last 3 bytes are the last 3 bytes of the L3 destination address (each node must listen to its own Solicited Node Multicast Address)

An example of the Solicited Node Address where the L3 address is: 2001:db8:ebbd:0:5066:64f6:547e:4872 would be ff02::1:ff7e:4872

ND Functions

Neighbour Discovery (ND) does more than just MAC Address Resolution. It also performs:

  • Duplicate Address Detection – DAD
  • Redirection (to a router with a better path)

In the early days of IPv4, duplicate addresses on the network were easy to create, since there was no automatic way to get an address (pre-DHCP) and addresses were manually entered. Creating a list of used addresses was quite common; some even used MS Excel. After DHCP had been created, duplicate addresses seemed like a thing of the past. But then VMs (Virtual Machines) came along, and cloning VMs became common. A clone which included the MAC address would get the same IP address from the DHCP server as the original (one reason why DHCPv6 does not use a simple MAC address as an identifier).

Duplicate Address Detection

But back in the early days of creating IPv6, there was no DHCP, and the creators wanted to improve upon the situation by creating Duplicate Address Detection (DAD).

Because IPv6 nodes can create a globally unique address (GUA) without contacting any servers, using SLAAC (StateLess Auto Address Config), DAD becomes important in preventing duplicate addressing. When a node forms its GUA, it sends out a Neighbour Solicitation (NS) message to its own Solicited Node Address, similar to a gratuitous ARP in IPv4.

But the key difference from ARP is that if there isn’t a duplicate address on the network, no other host will hear the NS message. Why? Because no other nodes are listening to that specific multicast address, whereas ARP requires ALL nodes to listen.

Redirection

Part of knowing the neighbourhood is knowing what routers exist on the link. Communication with routers uses another special multicast address, FF02::2.

As a host becomes active on the network, it will send Router Solicitations (RS) to the All Routers address. The routers on the link will then respond with Router Advertisements (RA), which include important bits of information, such as default gateway, prefixes defined on the link, even DNS servers.

On a network (shown below) where there is more than one router on the link, the default gateway router (e.g. the Production Router) may send a Redirect to a host (the DNS Server), stating that another router (the Test Network Router) on the link is a better path. The Redirect will also include the link-local address of the better-path router.

Network Diagram

Unlike IPv4, next hop addresses and default gateways are usually link-local addresses, rather than GUAs. Since link-local addresses are limited to the link, they are a good choice for next hop addresses. IPv6 Tip: resist the temptation to make all router interface link-local addresses fe80::1.

  • This will break Redirection, which will lower the resiliency of your network (how do you say "use this other address that is the same as my address"?)
  • Should two networks get crossed (someone plugs an ethernet cable into the wrong port), you will spend a long time sniffing fe80::1 packets, wondering which router they are coming from.

Link-layer or MAC Address Resolution

Oh, and ND also does MAC Address resolution (like ARP). When Host A has the IP address, but not the MAC address of Host B, it will send a NS message to the Multicast Solicited Node address of Host B, using the last 3 bytes of the IPv6 address as the last 3 bytes of the Solicited Node Address.

Since on the average network only one host will be listening to the Solicited Node Multicast Address, Host B will respond with a NA which includes the destination MAC address ***.

After receiving the destination MAC address from Host B, Host A can send an IPv6 packet to Host B.
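
On a Linux host, the results of this exchange land in the neighbour cache, the IPv6 equivalent of the ARP table, which can be inspected with:

$ ip -6 neigh show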

ICMPv6 and ND

ND uses the following ICMPv6 message types to discover what is happening in the neighbourhood:

  • NS (type 135)
  • NA (type 136)
  • RS (type 133)
  • RA (type 134)
  • Redirect (type 137)

Because ND uses ICMPv6 extensively, the quickest way to cut yourself off from the internet is to block all ICMPv6 traffic on your firewall. Although this is common in IPv4, resist the temptation. ICMPv6 messages are required to remain connected in IPv6.
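
If you run an ip6tables-based firewall on a host, a blanket accept for ICMPv6 is a blunt but workable starting point (RFC 4890 has finer-grained filtering recommendations):

# allow all ICMPv6 into the host
ip6tables -A INPUT -p ipv6-icmp -j ACCEPT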

Free IPv6 Book

If you want to learn how IPv6 really works, check out this free eBook (PDF). It is an excellent Cisco Press book published in 2013. It may not have the latest RFC references, like RFC 8200 (the IPv6 Standard RFC), but it is still a good read.

IPv6 Fundamentals

Download IPv6 Fundamentals

The second edition is even better (but not free)

** Fred Rogers Creative Commons

*** Of course, it is possible that the last 3 bytes of the IPv6 address will match those of another host on the network (a 1 in 16 million chance), but multicast will prevent the rest of the hosts on the network from hearing the idle chatter, unlike ARP.

Babel: a routing protocol with wireless support


Previously I wrote about resurrecting the old, forgotten routing protocol RIPng. In a small network of more than one router, you need a routing protocol to share information between the routers. I used RIPng for about six months: turned it on, and pretty much forgot that it was running. It worked like a charm in my wired network.

I moved to a new (to me) house this summer, and thought it was a good opportunity to try out a routing protocol which not only handles wired networks but also wireless. Babel seemed just the thing for this environment.

Enter Babel

Babel is a loop-avoiding distance-vector routing protocol that is robust and efficient both in ordinary wired networks and in wireless mesh networks. Based on the loss of hellos, the cost of wireless links can be increased, making sketchy wireless links less preferred.

RFC 6126 standardizes the routing protocol. There are two implementations which are supported on OpenWrt routers, babeld and bird.

Creating a network with redundant paths

Like anything in networking, it starts with the physical layer (wireless is a form of physical layer). I attached the wireless links of the backup link router to the production and test routers, thus creating redundant paths of connectivity within my house.

Network Diagram

Running BIRD with Babel

I chose bird6 (the IPv6 version of bird on OpenWrt) because I already had it installed on the routers for RIPng. It was merely a matter of commenting out the RIP section in the /etc/bird6.conf file, and enabling Babel.

The Bird Documentation provides an example. Add the following to /etc/bird6.conf to get Babel running in bird6:

protocol babel {
    interface "wlan0", "wlan1" {
        type wireless;
        hello interval 1;
        rxcost 512;
    };
    interface "br-lan" {
        type wired;
    };
    import all;
    export all;
}

In the example above, wlan0 is the 2.4 Ghz radio, and wlan1 is the 5 Ghz radio.

Checking the path of connectivity

When determining the connectivity path, traceroute6 (the IPv6 version) is your friend. Checking between the laptop and the DNS server, the path is:

$ traceroute6 6dns
traceroute to 6lilikoi.hoomaha.net (2001:db8:ebbd:4118::1) from 2001:db8:ebbd:bac0:d999:cd8a:cd9b:2037, port 33434, from port 49819, 30 hops max, 60 bytes packets
 1  2001:db8:ebbd:bac0::1 (2001:db8:ebbd:bac0::1)  4.561 ms  0.510 ms  0.487 ms 
 2  2001:db8:ebbd:4118::1 (2001:db8:ebbd:4118::1)  2.562 ms  2.193 ms  1.927 ms 
$ 

The traceroute is showing the path going clockwise through the 2.4 Ghz wireless link.

Network Failure!

To test how well Babel can automatically route around failed links, I started a ping to the DNS server from the laptop, disabled the 2.4 Ghz radio (thus blocking the link the pings were using), and waited…

$ ping6 6dns
PING 6dns(2001:db8:ebbd:4118::1) 56 data bytes
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=1 ttl=63 time=3.54 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=2 ttl=63 time=1.64 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=3 ttl=63 time=2.02 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=4 ttl=63 time=1.64 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=5 ttl=63 time=1.51 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=6 ttl=63 time=1.65 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=7 ttl=63 time=1.58 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=8 ttl=63 time=5.80 ms
From 2001:db8:ebbd:bac0::1 icmp_seq=33 Destination unreachable: No route
From 2001:db8:ebbd:bac0::1 icmp_seq=34 Destination unreachable: No route
...
From 2001:db8:ebbd:bac0::1 icmp_seq=48 Destination unreachable: No route
From 2001:db8:ebbd:bac0::1 icmp_seq=49 Destination unreachable: No route
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=101 ttl=61 time=2.12 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=102 ttl=61 time=3.42 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=103 ttl=61 time=3.16 ms

As you can see, the outage was 93 seconds (101 – 8). Not a record time (OSPF would converge much faster), but it did fix itself without human intervention.

Checking the connectivity path with traceroute6:

$ traceroute6 6dns
traceroute to 6lilikoi.hoomaha.net (2001:db8:ebbd:4118::1) from 2001:db8:ebbd:bac0:d999:cd8a:cd9b:2037, port 33434, from port 47725, 30 hops max, 60 bytes packets
 1  2001:db8:ebbd:bac0::1 (2001:db8:ebbd:bac0::1)  0.541 ms  0.445 ms  0.437 ms 
 2  2001:db8:ebbd:2080::1 (2001:db8:ebbd:2080::1)  1.705 ms  1.832 ms  1.817 ms 
 3  2001:db8:ebbd:2000::1 (2001:db8:ebbd:2000::1)  2.273 ms  1.891 ms  2.584 ms 
 4  2001:db8:ebbd:4118::1 (2001:db8:ebbd:4118::1)  2.348 ms  2.822 ms  2.289 ms 
$

The path can now be seen to be traveling counter-clockwise around the circle via the 5 Ghz link. The Babel routing protocol is routing packets around the failure.

Wireless is great, except …

As more and more things come online using wireless, there will be more interference and contention for bandwidth, especially in the 2.4 Ghz band. Babel enables routing packets around sketchy wireless links caused by interference in a crowded wifi environment.

Your Metric may vary

Because wireless is variable, Babel applies differing metrics to routes as the wireless signal changes. An unfortunate side effect of this is that the network is continuously converging (or changing). The route that was used to reach a remote host one minute may be invalid the next.

I noticed this as my previously very stable IPv6-only servers were now disconnecting, or worse, not reachable.

Route Flapping!

As I looked at the OpenWrt syslog (using the logread command) I could see that the routes were continually changing.

Tue Jul 24 14:46:45 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:46:45 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:46:46 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:01 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:01 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:02 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:33 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:33 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:34 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:49 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:49 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:50 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:48:53 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:48:53 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:48:54 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
...

The problem with this route flapping is that it was being propagated to the other routers, which were busy adding and removing routes, causing parts of my network to become unreachable. Not a desired behaviour.

Settling things down

To rid my network of the route churn, I changed the Babel wireless interfaces to wired, giving them a stable metric, no longer tied to the variability of the wireless signal quality (signal to noise).

The /etc/bird6.conf now looks like:

protocol babel {
    interface "wlan0", "wlan1" {
        type wired;
        hello interval 5;
    };
    interface "br-lan" {
        type wired;
    };
    import all;
    export all;
}

After restarting bird6 and watching the syslog, a brief burst of activity can be seen, then the route churn stops, and the network is stable.
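
On OpenWrt that restart is just the init script, and the bird command-line client can confirm the babel protocol is up:

# /etc/init.d/bird6 restart
# birdc6 show protocols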

Babel, still a work in progress

Babel is still being actively developed, and has a more modern approach to wireless links (something that was nearly non-existent when RIPng was being standardized back in 1997). Like RIPng, it is easy to set up without having to understand the complexities of OSPF. It is easy to set up on OpenWrt routers and provides redundancy in your network. That said, the wireless functionality as implemented by Bird (v1.6.3) is not quite there. Fortunately, Bird v2.0 is out, and I look forward to giving it a try when it comes to OpenWrt.

Postscript

Although the route churn had subsided, I re-measured the convergence time for Babel, and it was quite long: 317 seconds, probably due to the hello timer being set to 5 seconds.

In the end, I reverted my house network to RIPng. Running the same convergence test yielded an outage of only 11 seconds, with no route churn.

Perhaps many of the Babel issues are just Bird’s implementation, and there may be tweaks to reduce network convergence times. I’d happily give Babel another chance, but for now, I’ll stick with good ol’ RIPng.

** If you are running a firewall (the default on OpenWrt/LEDE), you will need to put in a rule to accept IPv6 UDP port 6696.
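
Such a rule in /etc/config/firewall might look like this (a sketch; adjust the src zone to wherever your Babel interfaces live):

config rule
        option name 'Allow-Babel'
        option src 'lan'
        option proto 'udp'
        option dest_port '6696'
        option family 'ipv6'
        option target 'ACCEPT'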

** Originally posted (with more detail) on www.makikiweb.com

 

RIPE-NCC IPv6 Educa

RIPE held an IPv6 Education seminar in Amsterdam on this year’s World IPv6 Day (6 June). There were 18 presentations in seven hours. Lots of good presentations; here are some of the highlights.

ISP Status

The good news is that IPv6 deployment continues to increase with Mobile operators taking the lead.

IP Transitions

Sadly, there are still IPv4-only sites out there in the world. We are going to need Transition Mechanisms to get there. NAT64 is not without its problems, most notably that it breaks DNSSEC. But its cousin, 464XLAT, does not break DNSSEC, and it provides IPv4 support for the apps which, unfortunately, still use IPv4-only socket calls.

Guidance on how to get to IPv6-only

There was also good guidance on how to get to IPv6-only. After all, dual-stack is not the end goal, but a transition mechanism to get to IPv6-only. OpenWrt routers already support a CLAT client, making 464XLAT even easier to deploy.

The cost of IPv4 is going up

Lastly, it was pointed out that the cost of computing continues its downward trend while the cost of IPv4 addresses is increasing. And this has proven to be one of the business case drivers for some Cloud Providers to move to IPv6-only.

One can now buy a Raspberry Pi Zero for $5, and the IPv4 address for that computer is going to cost nearly 5 times as much.

Celebrating IPv6 Day

The IPv6 Educa was well attended, with about 150 people online, and was quite timely given that 6 June was the 6th anniversary of World IPv6 Launch Day (in 2012). The Educa Agenda and recording can be found online.

Go forth and Deploy.

Link-Local Address to the rescue

I was recently setting up an OpenWrt router as a Managed AP (Access Point), but was having challenges setting up a global IPv6 address on the device. I could have put a static IPv4 address on the device, but it is on my IPv6-only network, and that just seemed like the wrong direction.

Then it occurred to me I had a tool, like the slingshot in my back pocket, which I could use: the link-local address of the AP. After all, every IPv6 interface has a link-local address. Of course this has the limitation that the PC must be on the same link, but that is less of an issue for me than putting an IPv4 address on an IPv6-only network.

Link-local addresses can also be used when your network has gone terribly wrong, such as when RAs (router advertisements) are no longer being sent and you still need to get to that device to fix it.

Using IPv6 Scoped Addresses

The thing about using link-local addresses is that you have to indicate which interface to use (or scope), since all interfaces will have the same link-local subnet (FE80::/64). Fortunately RFC 4007 gives guidance on how to represent a scoped IPv6 Address. From the RFC (section 11):

 <address>%<zone_id>

where

<address> is a literal IPv6 address,

<zone_id> is a string identifying the zone of the address, and

`%' is a delimiter character to distinguish between <address> and
<zone_id>.
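
The scoped form works directly with most command-line tools. For example, here is a quick reachability check of the AP used below (the address and the wlan0 interface are from my setup; substitute your own):

$ ping6 fe80::6a1:51ff:fea0:9338%wlan0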

SSH using an IPv6 scoped link-local address

By using the scoped address, it is easy to ssh to the AP using the link-local address:

$ ssh admin@fe80::6a1:51ff:fea0:9338%wlan0
admin@fe80::6a1:51ff:fea0:9338%wlan0's password:

BusyBox v1.25.1 () built-in shell (ash)

     _________
    /        /\      _    ___ ___  ___
   /  LE    /  \    | |  | __|   \| __|
  /    DE  /    \   | |__| _|| |) | _|
 /________/  LE  \  |____|___|___/|___|                      lede-project.org
 \        \   DE /
  \    LE  \    /  -----------------------------------------------------------
   \  DE    \  /    Reboot (17.01.4, r3560-79f57e422d)
    \________\/    -----------------------------------------------------------

admin@pia:~#

Using the Link-Local Address in a Web Browser

The AP also has a web interface, which can transform a complex configuration into a click of the mouse. But using scoped addresses in your browser isn’t as straightforward as it would seem.

RFC 6874 shows how to encode a scoped address in a URL. Since IPv6 literals are enclosed in square brackets, the URL should be as simple as: http://[fe80::6a1:51ff:fea0:9338%wlan0]/

But, wait, the percent sign is used to escape characters in a URL. So if we escape the percent sign, then the URL should be http://[fe80::6a1:51ff:fea0:9338%25wlan0]/. Unfortunately, this doesn’t work.

No support for scoped addresses in the major browsers

Unfortunately, this doesn’t work in most browsers. The browser manufacturers have decided not to implement RFC 6874. In some cases, such as Chrome, they have publicly said that it isn’t worth the effort to parse the URL. Firefox has also decided it won’t fix this issue. Even w3.org, which makes the reference browser Arora, has decided it won’t fix this lack of conformance to RFC 6874.

SSH to the rescue

Fortunately, there is a work-around which can trick your modern browser into using a scoped link-local address. It requires the help of ssh and its TCP forwarding capability. Use the following command (substituting your own link-local address and scope interface):

ssh -L '8080:[fe80::6a1:51ff:fea0:9338%wlan0]:80' localhost

Now return to your favourite browser and use the URL http://localhost:8080/, and magically you will be transported to your device (in my case, the AP) over the scoped link-local address.

Tricking the Browser

Not the prettiest of solutions, but until the browser folks realize there is a need for scoped link-local addresses, it is what we have.

Using a link-local address as a last resort, when the network has gone wonky or when the device won’t take a GUA (Global Unicast Address), can be very useful. Keep it in your back pocket (like a slingshot) for when you really need to get there using IPv6.

Writing IPv6 Apps: Python Webserver


Moving to IPv6 starts at home. Applications have to speak IPv6 as well as the network. The good news is that there is lots of software available which already supports IPv6. Unfortunately, there is much more that doesn’t.

An example: a Python-based webserver. It is certainly not ready for a production network, but it is handy as a learning tool for how easy it can be to support IPv6 in your application.

Why Python? Python is a wonderful programming language, and it is only getting better with version 3. There are libraries for most needs, including one which serves up the web. And it runs just about anywhere that Python runs (Windows, Linux, BSD, Mac, Pi, ODROID, etc.).

Python module SimpleHTTPServer

The Python module SimpleHTTPServer supports IPv4 out of the box with this simple command:

python -m SimpleHTTPServer

However, it does not support IPv6, and there is no one-line equivalent that does, so a small script is required.

Looking at the code

The ipv6-httpd script is a short script that supports both IPv4 and IPv6. We will look at the following sections:

  1. Initialization of the HTTPServer object (from Python’s built-in HTTP server modules)
  2. Creation of a new class (HTTPServerV6) to support IPv6

As with many Python scripts, the details unwind from the bottom up. Initialization occurs in main():

def main():
    global server
    server = HTTPServerV6(('::', listen_port), MyHandler)
    print('Listening on port:' + str(listen_port) + '\nPress ^C to quit')
    server.serve_forever()
    os._exit(0)

The IPv6 part

In order to support IPv6, we use a bit of object-oriented inheritance trickery to change a default of the existing HTTPServer class:

class HTTPServerV6(HTTPServer):
    address_family = socket.AF_INET6

This creates a new class (which is used for our server) with address_family set to AF_INET6 (aka IPv6). This two-line change transforms an IPv4-only script into an application that also supports IPv6.
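
Putting the pieces together, a complete minimal version looks something like the sketch below. This is a reconstruction for illustration, not the original ipv6-httpd script: it assumes Python 2 (since SimpleHTTPServer is a Python 2 module), and the SIGINT handler and the listen_port value are assumptions.

#!/usr/bin/env python
# Minimal sketch of an ipv6-httpd style server (Python 2).
# The SIGINT handler and listen_port value are illustrative assumptions.
import os
import signal
import socket
from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

listen_port = 8080

class HTTPServerV6(HTTPServer):
    # Listen on an IPv6 socket; with the usual Linux defaults this also
    # accepts IPv4 clients (as IPv4-mapped addresses).
    address_family = socket.AF_INET6

class MyHandler(SimpleHTTPRequestHandler):
    # Serve files from the current working directory, unchanged.
    pass

def sigint_handler(signum, frame):
    print('Caught SIGINT, dying')
    os._exit(0)

def main():
    global server
    signal.signal(signal.SIGINT, sigint_handler)
    server = HTTPServerV6(('::', listen_port), MyHandler)
    print('Listening on port:' + str(listen_port) + '\nPress ^C to quit')
    server.serve_forever()
    os._exit(0)

if __name__ == '__main__':
    main()

Save it somewhere handy (for example as ~/bin/ipv6-httpd.py), make it executable with chmod +x, and it is ready to run.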

Running the code

Now that we have a server which supports both IPv4 and IPv6, all we need to do is cd to the directory we wish to share and start the server:

$ cd public/
$ ~/bin/ipv6-httpd.py 
Listening on port:8080
Press ^C to quit
2001:db8:ebbd:0:4d18:71cd:b814:9508 - - [09/Jul/2017 11:49:41] "GET / HTTP/1.1" 200 -
^CCaught SIGINT, dying

The webserver log is sent to standard out (stdout) and can be redirected to a file if desired. In the above example, an IPv6 client (ending in :9508) requests the index page, and then the server is terminated with a ^C.

Want to serve from a different directory? Stop the server, cd to another directory, and restart the server; it will now serve files from the new current working directory (cwd).

Security (or lack thereof)

This is a personal webserver, designed to be started and stopped whenever you need it. It is NOT a production quality webserver that you should put on the internet.

It does not serve encrypted pages (read: no SSL/TLS), and it is single-threaded, which means that if a second request comes in while the first is being served, the second request will have to wait.

Adding IPv6 Support to your App

As you can see, it doesn’t have to be difficult to add IPv6 to your apps; you just need to give it some thought when planning your app. By adding IPv6, you future-proof your app, making it ready for the future of the Internet.

Video From The IPv6 Solutions Tutorials at I2 Tech Exchange

Thanks to the long and hard work of Richard Machida from U. Alaska, we have a video on YouTube with content from the tutorials we did at Internet2 Tech Exchange 2017 in San Francisco on October 15, 2017.


Presentation Markers, with links to start times in the video:

0:00:00 – 0:22:00 – Intro and Context

0:22:58 – 1:42:01 – IPv6 Security

  • Jeff Harrington, NYSERNET

1:42:10 – 1:44:49 – Acknowledgments

1:44:49 – 2:51:30 – The IPv6-Only Network:

(Experiences setting up NAT64, DNS64, 464XLAT)

  • Alan Whinery – U. Hawaii
  • Jeffry Handal, Cisco Meraki

 

 

RIPng the forgotten routing protocol

Traffic

In a small network with more than one router, one will quickly figure out that you can’t get there from here, even with DHCPv6-PD (Prefix Delegation) providing IPv6 addressing. Sure, you can get to the internet, but you can’t get to the other parts of the network which are behind the other routers. You need a routing protocol to share information between the routers on your small network.

How to Get There?

So when we all have /56s or larger delegated to our house/SOHO, and we put in multiple IPv6-capable routers, how will the routers convey connectivity information? Well, we could use HomeNet (RFC 7788), but currently home routers don’t come out of the box configured for HomeNet (for example, the Wi-Fi is on the same subnet as the LAN). Not to diminish HomeNet’s work, but there is a forgotten protocol that can solve the problem.

The Forgotten Protocol: RIPng

RIPng (RFC 2080), the IPv6 cousin of RIPv2 (RFC 2453), has all the advantages of RIPv2, except it conveys IPv6 route information (rather than RIPv2’s IPv4-only info). The same principles apply: easy to deploy, works pretty well, and solves the problem (of distributing routes to multiple routers).

But support and documentation for RIPng on the internet are threadbare. Even the venerable BIRD (BIRD Internet Routing Daemon) has a non-functional example configuration, not to mention that they just call it RIP, making it unclear whether it is for IPv4 or IPv6.

But why has it been forgotten? Because real men use OSPF (or IS-IS). And you can too, if you want to spend a week (or more) learning the complexities of the protocol. But what if you just want it to work, and move on to the next problem?

Getting from here to there

Although some ISPs will give a fairly stable prefix (via DHCPv6-PD), others do not, leading to a very dynamic prefix environment at home or in the SOHO (see Request: Less Dynamic Prefix, Mr./Ms. ISP). Cascaded routers can further divide the /56 via DHCPv6-PD, allocating smaller chunks downstream. If one only needs to get to the internet (everything is in the cloud, right?), this works quite well. After all, each cascaded router has a default route pointing upstream (towards the ISP).

But if you have your NAS on Network B and your client on Network C, it isn’t going to work.

Network

Router 3 doesn’t know about Router 2, or even about subnet B behind Router 2.

Of course, you could put a static route in each of Routers 2 & 3. But not only does that take a bit more knowledge of routing (hint: the next hop is a link-local address), it also doesn’t work in a dynamic prefix environment (where the ISP changes the prefix).
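
For illustration, a static route with a link-local next hop looks something like this on Linux/OpenWrt (the prefix and next-hop address are borrowed from the routing table shown further down; note that the outgoing interface must be named explicitly):

# ip -6 route add 2001:db8:ebbd:4::/64 via fe80::221:29ff:fec3:6cb0 dev eth1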

RIPng to the rescue

So, enter an easy-to-configure, easy-to-use routing protocol: RIPng. Running RIPng, Routers 2 & 3 can share information about themselves and, more importantly, about the subnets they are serving. If the ISP changes the prefix, DHCPv6-PD will propagate the new prefix down to the cascaded routers, and RIPng will refresh the routing information between the routers, all automagically.

Making RIPng work

First, use router software that supports RIPng. Of course Cisco supports RIPng, but not everyone has several of those at home. OpenWrt/LEDE is a good option; if your router is less than 5 years old, it is probably supported. And it has BIRD as an easy-to-install package.

Next, install bird6 & birdc6 (the CLI for bird6). The bird6 package is a direct port from the bird maintainers, and therefore uses its own configuration file (rather than UCI). But RIPng is easy, so on to the next step.

Uncomment the following lines in the package-provided configuration file /etc/bird6.conf

router id 192.168.0.1;

protocol kernel {
    learn;          # Learn all alien routes from the kernel
    scan time 20;       # Scan kernel routing table every 20 seconds
    export all;     # Default is export none
}

protocol rip {
    import all;
    export all;
    infinity 16;
    interface "*" { mode multicast; };
}

Even though RIPng does not require a “router id”, bird6 won’t run without one. Restart the bird6 daemon with /etc/init.d/bird6 restart, and you are done**
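
To have bird6 start again after a reboot, enable its init script as well (standard OpenWrt init script commands):

# /etc/init.d/bird6 enable
# /etc/init.d/bird6 restart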

Of course, you could optimize it a bit by telling the config which interfaces to use, such as interface "eth1" { mode multicast; };, but this article is about making it easy, and sending route updates out the other interfaces doesn’t break anything.

How to tell RIPng is working

ssh to the router. The birdc6 CLI is the easiest way to show the routing table, including the sources of the routes, like this:

# birdc6
bird> show route
fdcd:5cb4:d0e1::/64 dev br-lan [kernel1 2017-11-30] * (10)
fdcd:5cb4:d0e1::/48 unreachable [kernel1 2017-11-30] * (10)
2001:db8:ebbd:8::/64 dev br-lan [kernel1 2017-11-30] * (10)
2001:db8:ebbd:8::/62 unreachable [kernel1 2017-11-30] * (10)
2001:db8:ebbd:c::/64 via fe80::224:a5ff:fed7:3089 on eth1 [rip1 23:30:58] * (120/2)
2001:db8:ebbd:c::/62 via fe80::224:a5ff:fed7:3089 on eth1 [rip1 23:30:58] * (120/2)
2001:db8:ebbd::/60 via fe80::221:29ff:fec3:6cb0 on eth1 [rip1 00:00:44] * (120/2)
2001:db8:ebbd:4::/64 via fe80::221:29ff:fec3:6cb0 on eth1 [rip1 00:00:44] * (120/2)
2001:db8:ebbd:4::/62 via fe80::221:29ff:fec3:6cb0 on eth1 [rip1 00:00:44] * (120/2)
2001:270:ebbd::/60 via fe80::221:29ff:fec3:6cb0 on eth1 [kernel1 2017-11-30] * (10)
fd37:5218::/64     via fe80::221:29ff:fec3:6cb0 on eth1 [rip1 00:00:44] * (120/2)
fd37:5218::/48     via fe80::221:29ff:fec3:6cb0 on eth1 [rip1 00:00:44] * (120/2)
64:ff9b::/96       via fe80::221:29ff:fec3:6cb0 on eth1 [rip1 00:00:44] * (120/2)
fd11::/64          via fe80::224:a5ff:fed7:3089 on eth1 [rip1 23:30:58] * (120/2)
fd11::/48          via fe80::224:a5ff:fed7:3089 on eth1 [rip1 23:30:58] * (120/2)
bird> 

All those “rip1” lines are routes learned from RIPng. Since no filters were used, not only are the global unicast addresses shared, but so are the Unique Local Addresses (ULAs) configured in each router. Here’s the fun part: even though there is no ULA on subnet A, RIPng provides connectivity between the ULA islands. Of course, it would be better to have a single ULA /48 for the entire network, but this is a test network, and OpenWrt/LEDE creates a ULA by default on each router. I wanted to see how RIPng would handle this non-optimal situation, and it handles it pretty well.

While it is possible to figure out who the other RIPng routers are from the routing table, it is easier to use the birdc6 command:

bird> show rip neigh
rip1:
IP address                Interface  Metric Routes   Seen
fe80::224:a5ff:fed7:3089  eth1            1      4     27
fe80::221:29ff:fec3:6cb0  eth1            1      9      6
bird> 

What’s missing in this simple configuration: authentication

RIPv2 included authentication, to ensure that the routing updates that are being received, are from the correct sources.

RIPng (RFC 2080) does not include authentication natively, but rather relies on IPv6’s Authentication Header (AH), one of the extension header types (see Stretching IPv6 with extension headers). The configuration above does not use the AH header; however, the risk should be low in a small network. More to come later on RIPng with AH.

Don’t forget DNS

Of course, if you are creating a network with IPv6 and more than one subnet, don’t forget the DNS. Have each router forward DNS requests to your local DNS server (you do have a local DNS server, right?).

Add your local DNS server address to the dnsmasq section of /etc/config/dhcp:

config dnsmasq
    list server '2001:db8:ebbd:0::dbc8'

Now all your IPv6 addresses will have names, and it will be a breeze getting around your network.

Wrapping up RIPng

RIPng may have been forgotten and overlooked for the past 20 years, but that doesn’t mean it can’t still solve problems for small networks (say, fewer than 25 routers) quickly and easily. Sure, you could put in a more complex routing protocol like OSPFv3, but on the other hand, you might rather spend your time going to the NAS, firing up a good movie, and enjoying the popcorn.

*traffic photo – Creative Commons

** If you are running a firewall (the default on OpenWrt/LEDE), you will need to put in a rule to accept IPv6 UDP on port 521 (the port RIPng uses).
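
As with the Babel rule earlier, a rule along these lines in /etc/config/firewall will do it (a sketch only: the rule name is arbitrary, and the src zone should be whichever zone your RIPng neighbours are on):

config rule
    option name 'Allow-RIPng'
    option src 'wan'
    option proto 'udp'
    option dest_port '521'
    option family 'ipv6'
    option target 'ACCEPT'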

 

See www.makikiweb.com for the full article, including tcpdump output.