IPv6 Christmas Tree

Christmas has come and gone, but the Christmas tree in the corner still has a few days left before we take it down.

On the IPv6 Internet (the “other” internet that the IPv4-only folks can’t see) there is an IPv6 Christmas Tree that can be decorated with your IPv6 pings.

By assigning 16 million IPv6 addresses to a single Christmas Tree, one can adjust the colours of the lights on the tree. According to the website:

2a05:9b81:2020::AA:BB:CC for HTML Color #AABBCC

There are many resources to convert colours to hex, but to light the lights red, one would ping:

ping -6 2a05:9b81:2020::FF:00:00
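If you would rather not assemble the address by hand, the mapping is easy to script. Here is a minimal Python sketch, assuming only the 2a05:9b81:2020::AA:BB:CC pattern quoted from the website; the function name tree_address is made up for illustration.

```python
import re

# Prefix taken from the pattern quoted on the tree's website.
TREE_PREFIX = "2a05:9b81:2020::"


def tree_address(html_colour: str) -> str:
    """Turn an HTML colour such as '#FF0000' into the tree's ping target."""
    match = re.fullmatch(
        r"#?([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})", html_colour
    )
    if match is None:
        raise ValueError(f"not a six-digit hex colour: {html_colour!r}")
    # One 16-bit group per colour channel: red, green, blue.
    red, green, blue = (part.upper() for part in match.groups())
    return f"{TREE_PREFIX}{red}:{green}:{blue}"


if __name__ == "__main__":
    print("ping -6", tree_address("#FF0000"))
```

Pass the printed address to ping as shown above.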

Because the tree doesn’t really have 16 million addresses assigned to the IPv6 stack, it will not reply, but you can watch a video of the tree and see the results of your pings.

So enjoy the last few days of 2019, and light the IPv6 Christmas Tree in your favourite colour.

Using IPv6 Link-Local to rescue your embedded device

IPv6 to the Rescue

Your embedded device has been running great for the past few weeks, and now, all of a sudden, it can’t be found on the network. You can’t ssh in to see what the problem is; it has just disappeared.

There are lots of reasons why this may have happened: perhaps the program hit a bug and crashed or, more likely, the device has forgotten its IPv4 address. Sure, you can just “turn it off and on again” and that may fix the problem, or it could make it worse if the device was writing to the SD card at the time you pulled the power.

The real answer is to log in and find out what is really going on, but as I said, for some reason your Pi, router, or device isn’t responding. So what do you do?

IPv6 to the Rescue

If you set up your network as a dual-stack network, then your device already has not only an IPv4 address but an IPv6 address as well. And if you put the IPv6 address into your local DNS, you can just ssh to the hostname and see what is going on with your device.

But what if you do have a dual-stack network (your ISP is providing IPv6) but you haven’t really done anything with IPv6? How can you use it to rescue your device?

ssh to the IPv6 address of the device, and Bob’s your uncle.

Finding the IPv6 Address of your device

Scanning IPv6 networks is much more challenging than scanning IPv4 networks. After all, instead of looking at 254 addresses, you are now looking to scan 18,446,744,073,709,551,616 (18 quintillion) addresses. Take zmap, the fastest scanner, which claims to be able to scan the entire IPv4 internet (all 4 billion addresses) in 45 minutes. A /64 holds 2^32 times as many addresses as the whole IPv4 internet, so scanning it at that rate would still take 367,719 years! (2^32 * 45 min / 60 min / 24 hours / 365 days). And zmap doesn’t support IPv6 (and you can see why).
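The arithmetic is easy to check. This small Python sketch only restates the numbers above (45 minutes for 2^32 addresses is zmap's claim, not something measured here); the function name is made up for illustration.

```python
IPV4_ADDRESSES = 2 ** 32       # the whole IPv4 internet
SLASH64_ADDRESSES = 2 ** 64    # one IPv6 /64 prefix
IPV4_SCAN_MINUTES = 45         # zmap's claimed time for all of IPv4


def years_to_scan_slash64() -> float:
    """Scale the claimed IPv4 scan time up to a single /64."""
    scale = SLASH64_ADDRESSES // IPV4_ADDRESSES   # 2^32 times more addresses
    minutes = scale * IPV4_SCAN_MINUTES
    return minutes / 60 / 24 / 365                # minutes -> hours -> days -> years


if __name__ == "__main__":
    print(f"{int(years_to_scan_slash64()):,} years")
```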

Fortunately, there are non-brute-force solutions to the problem.

IPv6 Basics, the all-nodes address

Although there is no broadcast in IPv6, there is a specific multicast address that all nodes must listen to. This is called the all-nodes address, or ff02::1. It is possible to send a ping to the all-nodes address and get multiple responses back, much as pinging the IPv4 broadcast address will (or used to) return multiple responses.

$ ping6 -c 2 -I wlan0 ff02::1
PING ff02::1(ff02::1) from fe80::f203:8cff:fe3f:f041%wlan0 wlan0: 56 data bytes
64 bytes from fe80::f203:8cff:fe3f:f041%wlan0: icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from fe80::2ac6:8eff:fe16:19d7%wlan0: icmp_seq=1 ttl=64 time=7.32 ms (DUP!)
64 bytes from fe80::21e:6ff:fe33:e990%wlan0: icmp_seq=1 ttl=64 time=7.66 ms (DUP!)
64 bytes from fe80::216:3eff:fea2:94e8%wlan0: icmp_seq=1 ttl=64 time=8.67 ms (DUP!)
64 bytes from fe80::ba27:ebff:fe89:bc51%wlan0: icmp_seq=1 ttl=64 time=9.60 ms (DUP!)
64 bytes from fe80::4aa2:12ff:fec2:16df%wlan0: icmp_seq=1 ttl=64 time=9.73 ms (DUP!)
64 bytes from fe80::216:3eff:feff:2f9d%wlan0: icmp_seq=1 ttl=64 time=10.6 ms (DUP!)
64 bytes from fe80::f203:8cff:fe3f:f041%wlan0: icmp_seq=2 ttl=64 time=0.686 ms

--- ff02::1 ping statistics ---
2 packets transmitted, 2 received, +6 duplicates, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.140/6.814/10.696/3.840 ms

In IPv6, multicast addresses are associated with multiple interfaces (there is an all-nodes address on each interface), therefore it is necessary to specify an interface with the -I option when pinging.

OK, but how do we find the IPv6 address in my dual-stack network?

An open source utility, v6disc.sh, uses the all-nodes technique to discover the nodes on your IPv6 network in a matter of seconds, rather than years.

$ ./v6disc.sh 
WARN: avahi utis not found, skipping mDNS check 
-- Searching for interface(s) 
-- Found interface(s):  eth0 
-- INT:eth0 prefixs: 2001:470:db8:101 
-- Detecting hosts on eth0 link 

-- Discovered hosts for prefix: 2001:470:db8:101 on eth0 
2001:470:db8:101::1                      00:24:a5:f1:07:ca    Buffalo
2001:470:db8:101:203:93ff:fe67:4362      00:03:93:67:43:62    Apple
2001:470:db8:101:211:24ff:fece:f1a       00:11:24:ce:0f:1a    Apple
2001:470:db8:101:211:24ff:fee1:dbc8      00:11:24:e1:db:c8    Apple
2001:470:db8:101:226:bbff:fe1e:7e15      00:26:bb:1e:7e:15    Apple
2001:470:db8:101::303                    d4:9a:20:01:e0:a4    Apple
2001:470:db8:101:3e2a:f4ff:fe37:dac4     3c:2a:f4:37:da:c4    BrotherI
2001:470:db8:101:6a1:51ff:fea0:9339      04:a1:51:a0:93:38    Netgear
2001:470:db8:101:b41f:18a3:a97c:4a0c     10:9a:dd:54:b6:34    Apple
2001:470:db8:101::9c5                    b8:27:eb:89:bc:51    Raspberr

The utility looks up the Ethernet MAC address manufacturer and prints it in the third column.

As you can see it is easy to spot the Raspberry Pi on this network.

But wait, I don’t have a dual-stack network, now what?

So you have Shaw for an ISP, and they can’t spell IPv6, now what? Another IPv6 fact is that every device which has an IPv6 stack, must have a link-local address. The link-local address is used for all sorts of things, including Neighbour Discovery Protocol (NDP), the IPv6 equivalent of ARP. Therefore, even if your network doesn’t have an IPv6 connection to the internet, your IPv6-enabled device will have a link-local address.

Fortunately, v6disc.sh also can detect link-local addresses as fast as it detects IPv6 global addresses (in mere seconds).

$ ./v6disc.sh -i wlan0 -L
WARN: avahi utis not found, skipping mDNS check 
-- INT:wlan0    prefixs:  
-- Detecting hosts on wlan0 link 
-- Discovered hosts for prefix: fe80: on wlan0 
fe80::216:3eff:fea2:94e8                 00:16:3e:a2:94:e8    Xensourc
fe80::216:3eff:feff:2f9d                 00:16:3e:ff:2f:9d    Xensourc
fe80::21e:6ff:fe33:e990                  00:1e:06:33:e9:90    Wibrain
fe80::2ac6:8eff:fe16:19d7                28:c6:8e:16:19:d7    Netgear
fe80::4aa2:12ff:fec2:16df                48:a2:12:c2:16:df    
fe80::ba27:ebff:fe89:bc51                b8:27:eb:89:bc:51    Raspberr
fe80::f203:8cff:fe3f:f041                f0:03:8c:3f:f0:41    Azurewav
-- Pau 

Link-local addresses are not globally unique, and therefore an interface must be specified with the -i, and the -L tells v6disc.sh to only detect link-local addresses.

Again, as you can see, it is easy to pick out the Raspberry Pi link-local address on this network.
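If you know the MAC address of your device (it is often printed on a label), you can also predict its link-local address. The Python sketch below assumes the device forms its address with the modified EUI-64 method (flip the universal/local bit of the first octet and insert ff:fe in the middle). Many embedded devices do, but hosts using privacy or random interface IDs will not match. The function name is made up for illustration.

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive the modified EUI-64 link-local address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected six octets in MAC address: {mac!r}")
    octets[0] ^= 0x02                                  # flip the universal/local bit
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix followed by the 64-bit interface ID
    packed = b"\xfe\x80" + b"\x00" * 6 + iid
    return str(ipaddress.IPv6Address(packed))


if __name__ == "__main__":
    # The Raspberry Pi's MAC address from the v6disc.sh output above
    print(mac_to_link_local("b8:27:eb:89:bc:51"))
```

For the Raspberry Pi in the listing, this produces the same fe80:: address that v6disc.sh discovered.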

Now I have the IPv6 address, how do I use it?

With the global or link-local IPv6 address, all one needs to do is ssh into the lost device and find out what is going on.

If using the link-local address, the interface must also be specified with the %intf notation (e.g. <link-local_addr>%wlan0):

$ ssh cvmiller@fe80::ba27:ebff:fe89:bc51%wlan0
cvmiller@fe80::ba27:ebff:fe89:bc51%wlan0's password: 
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-1030-raspi2 armv7l)

Last login: Mon Sep 30 19:57:11 2019 from fe80::2ac6:8eff:fe16:19d7%br0
$

Log in and fix it

And now you are logged into your wayward device, and you can troubleshoot to figure out what went wrong.

IPv6 Summer Fun

IPv6 Fun

Everyone knows (by now) that an IPv6 address is 128 bits long. And most know that the least significant 64 bits are the Interface ID (IID). That is 2^64 or 18,446,744,073,709,551,616 hosts per /64 prefix (think: subnet). But no one expects we will fill networks with that many hosts per prefix. IPv6 prefixes are part of the mind-change from IPv4, where subnets are tightly packed with hosts, to sparsely populated prefixes.

Consuming IPv6 Address space

But what if a host occupied all of the addresses in a /64 prefix? We all know that IPv6 hosts typically have more than one IPv6 address per interface. But can an interface have 2^64 addresses?

Well, probably your OS won’t allow anywhere near that number of addresses. But with a relatively simple Python program, a lowly Raspberry Pi can listen on all of those addresses simultaneously.

Fun with 2^64 Addresses

This summer I discovered ipv6board, a project run out of Sweden in which a Raspberry Pi streams a short SMS-style message that anyone can write to. You can view the Raspberry Pi display at ipv6board.best-practice.se (unfortunately, the author is having a problem with the streaming video part, and it may be off-line).

From the ipv6board website, one encodes ASCII into the last 8 bytes (the IID) of an address, and pings the ipv6board. The ASCII encoded message will show up on the Raspberry Pi’s 8×3 display. Pinging 2001:6b0:1001:105:4177:6573:6f6d:6521 will print “Awesome!” to the board.

Encoding ASCII

This is all fun, but converting text to ASCII by hand becomes tiresome fairly quickly. I thought, why not have a computer do the ASCII conversion, pop those 8 bytes into an IPv6 address, and send it off to the IPv6Board?

I decided to write a conversion program in shell script, because I wanted it to run everywhere (even on my OpenWrt router). The script takes a message argument, converts it to ASCII, and then pings the ipv6board.

$ ./ipv6board.sh "IPv6 Bd"
PING 2001:6b0:1001:105:4950:7636:2042:6420(2001:6b0:1001:105:4950:7636:2042:6420) 56 data bytes

--- 2001:6b0:1001:105:4950:7636:2042:6420 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

pau

Run it multiple times (with different messages) to fill the screen of IPv6board. ipv6board.sh is open source and hosted on github.com/cvmiller/ipv6board.sh.
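The conversion itself is small. The Python sketch below is not the shell script; it just illustrates the same encoding, using the prefix from the examples above. Padding short messages with spaces is an assumption inferred from the sample output (the 7-character "IPv6 Bd" ends in 20, the ASCII code for a space), and the function name is made up for illustration.

```python
BOARD_PREFIX = "2001:6b0:1001:105"   # /64 prefix from the ipv6board examples above


def board_address(message: str) -> str:
    """Encode up to 8 ASCII characters into the interface ID of an address."""
    data = message.encode("ascii")
    if len(data) > 8:
        raise ValueError("only 8 bytes fit into the interface ID")
    data = data.ljust(8, b" ")        # assumption: pad short messages with spaces
    # Two bytes (four hex digits) per 16-bit group of the address
    groups = [f"{data[i]:02x}{data[i + 1]:02x}" for i in range(0, 8, 2)]
    return BOARD_PREFIX + ":" + ":".join(groups)


if __name__ == "__main__":
    print(board_address("Awesome!"))
    print(board_address("IPv6 Bd"))
```

The printed addresses can then be handed to ping, which is all the shell script does after the conversion.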

How much address space do we need?

Clearly, 8 bytes is too small to write tweets. Perhaps the version of IP after IPv6 will have even more bytes so we can write a haiku poem. But for now, enjoy the Summer Fun with IPv6 today.


*Fun, Creative Commons

IPvFoo helping you create IPv6-only websites

Firefox & Chrome Extension

The transition to IPv6 will be a long one. Even with Google measuring 25% utilization worldwide on the IPv6 internet, many services will be running dual-stack for some time to come.

IPv6-only

But there are those who have already moved to IPv6-only networks, most notably Facebook, and T-Mobile. They run a variety of transition mechanisms to help external IPv4-only services connect or traverse their IPv6-only networks.

But what if you just wanted to check your own servers to ensure they are ready for IPv6-only? Modern applications pull in JavaScript from many sources, and those external sources may not be available on IPv6, thus breaking your IPv6-only deployment.

There is an excellent extension to Chrome and Firefox which not only displays if the website is over IPv6, but also all the web page elements referred to on a given web page.

IPvFoo Screenshot

Looking for the Green 6

IPvFoo will put a green 6 or red 4 in the upper right corner of the browser indicating which network transport (IPv6 or IPv4 respectively) was used. In addition, a smaller 4 and/or 6 will be displayed to the right of the large 4/6 indicating referenced sites by the webpage.

Clicking on the 6 or 4 pops up a list of the referenced sites and the addresses that were used.

Looking up who owns that address

Right-clicking an address on the right side of the pop-up list offers the option Look up on bgp.he.net. Click that, and Hurricane Electric will not only display the AS (autonomous system) that announced that IP block, but clicking on the whois tab will show you who is registered for that IP block.

IPvFoo Screenshot

Creating an IPv6-only site

When creating an IPv6-only site, IPvFoo can quickly tell you not only whether your server is running IPv6, but also whether the references that your web application uses are. In an IPv6-only network, the IPv4 references will not connect (unless you are using a transition mechanism like NAT64).
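IPvFoo shows this in the browser, but a rough first pass can also be scripted. The Python sketch below only checks whether each referenced hostname resolves to an IPv6 address, which is a weaker test than what IPvFoo reports (an address in DNS does not prove the service answers over IPv6). The function names and the resolver parameter are made up for illustration; the resolver defaults to the standard socket.getaddrinfo.

```python
import socket


def has_ipv6_address(host, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one IPv6 address."""
    try:
        results = resolver(host, 443, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        # No IPv6 address (or the name does not resolve at all)
        return False
    return len(results) > 0


def ipv4_only_hosts(hosts, resolver=socket.getaddrinfo):
    """Return the hostnames that would break on an IPv6-only network."""
    return [host for host in hosts if not has_ipv6_address(host, resolver)]
```

Feed ipv4_only_hosts() the list of hostnames your pages reference (scripts, fonts, images); anything it returns is worth a closer look with IPvFoo.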

But why should you create an IPv6-only site? Frankly it is easier and faster, with only one protocol and firewall/ACLs to manage, and no transition mechanisms to traverse. If you believe the projections, the IPv6 Internet will be at 80% by 2025; that is only a little more than five years from now.

Be Ready for the Future Now

IPvFoo not only shows whether you are IPv6-only ready, but it is also interesting to see how the rest of the world is building web sites.


Originally posted at www.makikiweb.com

Free (ISP) IPv6 Deployment in France with 6rd

Some distance from Hawaii, but I thought you might enjoy seeing how other ISPs are deploying IPv6. Free.fr is an ISP in France that has been supplying IPv6 to its customers since 2007. Whether it is xDSL or Fibre, the legacy deployment is via 6rd (or Rapid Deployment) which is an IPv6-in-IPv4 encapsulation method.

What’s interesting about Free’s deployment, is that they use the customer’s globally routable IPv4 address and encode the 32 bits in the IPv6 prefix bits 28-60, leaving the customer with a /60. Typically the 0 network is used for the WAN tunnel (6rd tunnel) and the remaining 15 networks are for the customer to use at their house/small office. There is no DHCPv6-PD.
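To make the bit layout concrete, here is a small Python sketch of the embedding described above: a 28-bit 6rd prefix followed by the customer's 32-bit IPv4 address yields a /60. The 6rd prefix 2001:db0::/28 and the IPv4 address 192.0.2.1 are placeholders chosen for illustration, not Free's actual values, and the function name is made up.

```python
import ipaddress


def sixrd_delegated_prefix(sixrd_prefix: str, customer_ipv4: str) -> ipaddress.IPv6Network:
    """Embed a 32-bit IPv4 address directly after the ISP's 6rd prefix."""
    base = ipaddress.IPv6Network(sixrd_prefix)
    ipv4 = int(ipaddress.IPv4Address(customer_ipv4))
    delegated_len = base.prefixlen + 32          # 28 + 32 = 60 in this layout
    if delegated_len > 64:
        raise ValueError("no room left for at least one /64")
    shift = 128 - delegated_len                  # move the IPv4 bits into place
    value = int(base.network_address) | (ipv4 << shift)
    return ipaddress.IPv6Network((value, delegated_len))


if __name__ == "__main__":
    # Placeholder values, not a real ISP prefix or customer address
    delegated = sixrd_delegated_prefix("2001:db0::/28", "192.0.2.1")
    print(delegated)
    print(len(list(delegated.subnets(new_prefix=64))), "x /64 networks")
```

The /60 contains sixteen /64 networks: the 0 network for the tunnel and fifteen for the customer.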

Free supplies a CPE, but if you want to have more control over your routing, and features, you can use an OpenWrt router. Here’s a how-to on configuring your OpenWrt router for Free.

Bon Appetit!

 

Wireguard IPv6 VPN

Bringing the IPv6 Internet to the IPv4-only land of NAT

WireGuard bringing IPv6 to NATland

The evil of NAT (Network Address Translation) has become institutionalized. And because NAT munges the network header, it causes all sorts of problems, including preventing simple IPv6 tunneling (6in4).

But go to any Starbucks, McDonald’s, the airport or even the Library, and you will find yourself on a NATted network. How to get on the IPv6 Internet when stuck behind NAT? Enter the VPN (Virtual Private Network).

VPN tunneling IPv6

Most people use VPNs to make it appear they are in a different location, or for the extra security. But most (99%) VPN providers only support IPv4, and in fact, either disable IPv6, or ask you to do so to prevent IPv6 leakage.

But what if you could use a VPN to transport IPv6 traffic to the IPv6 Internet (now over 25% utilization)? I looked at OpenVPN for this purpose, but found all the moving parts (certs, pushing routes, lack of IPv6 examples) daunting. If you have a working OpenVPN setup, you may find it easier to tunnel IPv6 through it.

Wireguard, the easy VPN

Wireguard is getting a lot of buzz these days, as it is much easier to set up than OpenVPN. It works much like ssh keys: create a public/private key pair for each node in the VPN, tell each node the remote node’s IPv4 address, and connect! Wireguard is very good at making a complex VPN thing into a simple setup.

But the typical Wireguard VPN only has a roaming laptop at the far end. I wanted to share the IPv6 goodness with my friends, which meant that I wanted to have an entire IPv6 subnet available in IPv4-only NAT-Land.

Using OpenWrt to share IPv6 in NAT-Land

OpenWrt to the rescue. OpenWrt is open source router software that runs on hundreds of different types of routers. And Wireguard is a package that is prebuilt for each of those routers. There’s even a friendly web GUI frontend to configure Wireguard! What’s not to like?

extending your IPv6 network

The network (above) shows the high-level design. Allow IPv4 traffic to follow the usual NAT-Land path to the IPv4 Internet (via the Evil NAT Router). But push the IPv6 traffic through the Wireguard Tunnel, where there is another router which will forward it onto the IPv6 Internet. This is called a split tunnel in VPN parlance.

The advantages of this topology are:

  • IPv4 traffic follows the usual NATted path, no change there
  • End stations (to the left of R1) require no special software configuration to use it
  • Rather than just keeping the IPv6 to yourself, you can share the IPv6 goodness with anyone connecting to R1 router

The last point means you can bring IPv6 networking into the unfriendly IPv4 NAT-Land world, and show people there is a better way (like a 4K TV). Training is the obvious application, but there are other applications such as transitional networks, and better security.

Address Planning

Before we get too far, you will need an address plan. Since IPv6 will need a network for each link (almost, we’ll use link-local for the point-to-point link), we need a plan so that packets can be routed down to R1 at the far end of the WireGuard VPN from the internet.

Since I am routing the VPN tunneled IPv6 packets through my house, I will need more than a /64 from my ISP. Fortunately, I have a /48. The address plan should look something like this:

  • ::/0 The Internet
  • 2001:db8:ebbd::/48 My House Network
  • 2001:db8:ebbd:9900::/56 my DMZ Network
  • 2001:db8:ebbd:9908::/62 the LAN ports of R2, which are unused, but DHCPv6 allocates them automagically
  • 2001:db8:ebbd:990a::/64 the LAN ports of R1, which is out in NATland

Assuming you are running a DMZ, and an internal LAN, you are going to need at least 3 /64 networks. Therefore if your ISP provides a /60 you have enough, and if they give you a /56, you have plenty to do other IPv6 projects as well.
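The prefix arithmetic can be checked with Python's ipaddress module. This is only a sketch using the documentation-style prefixes from the address plan above; the function name is made up for illustration.

```python
import ipaddress


def count_slash64s(prefix: str) -> int:
    """How many /64 networks fit inside the given IPv6 prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen > 64:
        raise ValueError("prefix is smaller than a single /64")
    return 2 ** (64 - network.prefixlen)


if __name__ == "__main__":
    for example in (
        "2001:db8:ebbd:9900::/60",   # what some ISPs hand out
        "2001:db8:ebbd:9900::/56",   # the DMZ network in the plan above
        "2001:db8:ebbd::/48",        # the whole house network
    ):
        print(example, "holds", count_slash64s(example), "x /64")
```

A /60 gives 16 networks, which covers the three needed here; a /56 gives 256.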

Installing Wireguard on OpenWrt

The easiest method is to use the Web GUI to install software on the router. But it can also be done via the CLI, after ssh-ing to the router.

opkg update
opkg install luci-app-wireguard

The luci-app-wireguard package is for the web GUI, but it also pulls in kmod-wireguard, the kernel module which does most of the work, and wireguard-tools, which contains the CLI interface.

After installing wireguard, use the CLI tool wg to create a private/public key pair. This command does both in one easy line:

wg genkey | tee privatekey | wg pubkey > publickey  

Just like when using ssh private/public keys, the private key is private. It never leaves the system it was created on. Whereas the public key can go anywhere, including publishing it on the internet!

Configuring Wireguard on OpenWrt

  • Add a new interface called WGNET. This is quite easy using the OpenWrt LuCI Web GUI. Under Network->Interfaces, scroll to the bottom and click on the Add New Interface button.
  • Add Private Key and Listening Port to WGNET
  • Add a Peer, including the Peer’s public key and IPv6 Address. I used a Link-local address
  • Click Save & Apply
  • Click Connect WGNET

With any luck, WGNET will connect to the peer.

On the router, use the wg show command to show the state of the connection. It should look something like this:

root@makiki:~# wg show
interface: WGNET
  public key: E/2sXvSeg8cggZhJvDOn22z5HqV+eSDduOw46BwBzww=
  private key: (hidden)
  listening port: 29998

peer: cqrT9TuDN3yAjRXprLVWYiH0tAgPxr8Np/HDIQ21+AM=
  endpoint: 250.250.250.250:29999
  allowed ips: ::/0, fe80::/64
  latest handshake: 1 minute, 46 seconds ago
  transfer: 1.05 MiB received, 1.03 MiB sent
  persistent keepalive: every 25 seconds

The key info is the latest handshake. If that isn’t there, then the VPN isn’t up, and you will need to go back and re-check your configuration.
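If you want to automate that check (from a monitoring host, say), the output is easy to parse. The Python sketch below assumes the text layout shown above and simply reports, per peer, whether a latest handshake line is present. The function name is made up for illustration, and the layout of wg show could differ between versions.

```python
def peers_with_handshake(wg_show_output: str) -> dict:
    """Map each peer's public key to True if a handshake has been seen."""
    status = {}
    current_peer = None
    for raw_line in wg_show_output.splitlines():
        line = raw_line.strip()
        if line.startswith("peer:"):
            # Start of a new peer section; assume no handshake until one is seen
            current_peer = line.split(":", 1)[1].strip()
            status[current_peer] = False
        elif line.startswith("latest handshake:") and current_peer is not None:
            status[current_peer] = True
    return status
```

The text could come from running wg show via subprocess.run(["wg", "show"], capture_output=True, text=True).stdout, assuming Python is available on the machine where wg runs.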

The OpenWrt GUI also has this information under Status->WireGuard Status. (if it is blank, the VPN link isn’t connected)

Setting up DHCPv6-PD

But getting the WireGuard VPN link up is only half the fun. You will quickly discover that you can’t ping6 from a host connected to the R1 LAN ports (and wireless) for two reasons:

  1. The hosts on that LAN don’t have a GUA (Global Unicast Address) yet
  2. There is no return route down to the R1 LAN (see address plan above)

Advertising RAs on the LAN and creating a DHCPv6 Client

The WireGuard interface is just an interface. We must use a stacked interface to run a DHCPv6 client on top of the WGNET interface. Create yet another interface on Router R1, called LAN6.

Select DHCPv6 Client as the interface type, and WGNET as the underlying interface. Then select a Request IPv6-prefix length of 64. Click save and apply. Once the DHCPv6 client gets a Prefix Delegation from R2, the R1 LAN hosts will receive GUAs.

Setting up a DHCPv6-PD server on the upstream router

By configuring a DHCPv6-PD server on R2, not only will a PD be sent down to R1, but a route will be automagically installed on R2 pointing down to the Hosts attached to R1.

Configuring a DHCPv6-PD server on R2 is similar to the procedure on R1. In the web interface, Network -> Interfaces -> Add New Interface, create LAN_WG over the WGNET interface as a type Static Address. Then edit the following blanks:

  • IPv6 Assignment length: 63
  • IPv6 assignment hint: 990a, but you can try leaving this blank

Lower on the page, select the IPv6 Settings tab, and configure the following:

  • Router-Advertisment-service: server mode
  • DHCPv6-Service: server mode
  • NDP-Proxy: disabled
  • DHCPv6-Mode: stateless + stateful

Click save and apply.

Advantages of DHCPv6-PD

The advantage of using DHCPv6-PD is that it will automagically update the addressing if your ISP changes your prefix. All your hosts will automatically pick up the new prefix, and will still have connectivity to the IPv6 Internet.

Road Warrior

What if your remote location is not static? What if the Evil NATland router changes your port? How can you fill in the Peer IP address and port if you have unpredictable NAT changing things on you?

I am still working on it. The real solution would be to leave the Peer IP and Port info blank, and let WireGuard figure it out. But alas with OpenWrt 18.06.1 that doesn’t work.

As a work-around, I have created a script for R2 which listens for the Peer, and reconfigures the R2 WireGuard IP and Port info dynamically. The script is on github.

Limitations

Although WireGuard works quite well at tunneling IPv6 through multiple layers of NAT, it is not without its limitations.

  • Networks that block UDP. Wireguard uses UDP as its transport, and therefore won’t connect on a network that blocks UDP (there is no TCP option)
  • Network Latency. Anytime you tunnel IPv6 inside of IPv4, the network latency of IPv6 will never be less than that of IPv4. Therefore the performance of IPv6 will not be shown at its best.

IPv6 Oasis in the desert sands of NAT-Land

A tunneled IPv6 connection is always less desirable than a native one, but using WireGuard does allow one to use IPv6 when stuck in the deserts of IPv4 NAT-Land. And by using it with OpenWrt, the Oasis just got roomy enough to share with your friends.


Article (with more detail) originally appeared on www.makikiweb.com

Linux Containers with OpenWrt

Linux Containers Part 2

Virtual Network in the Palm of your hand

Although you can turn your Pi into an OpenWrt router, it never appealed to me since the Pi has so few (2) interfaces. But playing with LXD, and a transparent bridge access for the containers, it made sense that it might be useful. But after creating a server farm on a Raspberry Pi, I can see where there are those who would want to have a firewall in front of the servers to reduce the threat surface.

Docker attempts this, by fronting the containers with the dockerd daemon, but the networking is kludgy at best. If you choose to go it on your own, and use Docker’s routing, you will quickly find yourself in the 90s where everything must be manually configured (address range, gateway addresses, static routes to get into and out of the container network). The other option is to use NAT44 and NAT66, which is just wrong, and results in losing true peer-to-peer connectivity, limited server access (since only 1 can be on port 80 or 443), and the other host of brokenness of NAT.

OpenWrt is, on the other hand, a widely used open-source router software project, running on hundreds of different routers. It includes excellent IPv6 support, including DHCPv6-PD (prefix delegation for automatic addressing of the container network, plus route insertion), an easy to use Firewall web interface, and full routing protocol support (such as RIPng or OSPF) if needed.

Going Virtual

The goal is to create a virtual environment which not only has the excellent network management of LXC, but also an easy to use router/firewall via the OpenWrt web interface (called LuCI), all running on the Raspberry Pi (or any Linux machine).

Virtual router Network

Motivation

The OpenWrt project does an excellent job of creating images for hundreds of routers. I wanted to take a generic existing image and make it work on LXD without recompiling, or building OpenWrt from source.

Additionally, I wanted it to run on a Raspberry Pi (ARM processor). Most implementations of OpenWrt in virtual environments run on x86 machines.

If you would rather build OpenWrt, please see the github project https://github.com/mikma/lxd-openwrt (x86 support only)

Installing LXD on the Raspberry Pi

Unfortunately the default Raspbian image does not support namespaces or cgroups, which are used to isolate the Linux Containers. Fortunately, there is an Ubuntu 18.04 image available for the Pi which does.

If you haven’t already installed LXD on your Raspberry Pi, please look at Linux Containers on the Pi blog post.

Creating a LXD Image

NOTE: Unless otherwise stated, all commands are run on the Raspberry Pi

Using lxc image import, an image can be pulled into LXD. The steps are:

  1. Download the OpenWrt rootfs tarball
  2. Create a metadata.yaml file, and place into a tar file
  3. Import the rootfs tarball and metadata tarball to create an image

Getting OpenWrt rootfs

The OpenWrt project not only provides squashfs and ext4 images, but also simple tar.gz files of the rootfs. The (almost) current release is 18.06.1, and I recommend starting with it.

The ARM-virt rootfs tarball can be found at OpenWrt

Download the OpenWrt 18.06.1 rootfs tarball for Arm.

Create a metadata.yaml file

Although the yaml file can contain quite a bit of information the minimum requirement is architecture and creation_date. Use your favourite editor to create a file named metadata.yaml

architecture: "armhf"
creation_date: 1544922658

The creation date is the current time (in seconds) since the unix epoch (1 Jan 1970). The easiest way to get this value is to find it on the web, such as at the EpochConverter.
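If Python is installed on the Pi, the timestamp can also be generated locally instead of being looked up. A minimal sketch, assuming only the two keys shown above; the function name build_metadata is made up for illustration.

```python
import time


def build_metadata(architecture="armhf", now=None):
    """Return the minimal metadata.yaml text for an LXD image import."""
    # creation_date is whole seconds since the unix epoch (1 Jan 1970)
    creation_date = int(time.time() if now is None else now)
    return f'architecture: "{architecture}"\ncreation_date: {creation_date}\n'


if __name__ == "__main__":
    # Redirect this output into a file named metadata.yaml
    print(build_metadata(), end="")
```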

Once the metadata.yaml file is created, tar it up and name it anything that makes sense to you.

tar cvf openwrt-meta.tar metadata.yaml

Import the image into LXD

Place both tar files (metadata & rootfs) in the same directory on the Raspberry Pi. And use the following command to import the image:

lxc image import openwrt-meta.tar default-root.tar.gz  --alias openwrt_armhf

Starting up Virtual OpenWrt

Unfortunately, the imported OpenWrt image won’t boot as-is. A helper script has been developed to create the devices in /dev that OpenWrt needs in order to boot properly.

The steps to get your virtual OpenWrt up and running are:

  1. Create the container
  2. Adjust some of the parameters of the container
  3. Download init.sh script from github
  4. Copy the init.sh script to /root on the image
  5. Log into the OpenWrt container and execute sh init.sh
  6. Validate that OpenWrt has completed booting

I use router as the name of the OpenWrt container

lxc init local:openwrt_armhf router
lxc config set router security.privileged true

In order for init.sh to run the mknod command the container must run as privileged.

Download init.sh from the OpenWrt-LXD open source project

The init.sh script is open source and resides on github. To download it on your Pi, use curl (you may have to install curl)

curl https://raw.githubusercontent.com/cvmiller/openwrt-lxd/master/init.sh > init.sh

After copying the script to the container, log into the router container using the lxc exec command, and run the init.sh script.

lxc exec router sh
#
# sh init.sh

Managing the Virtual OpenWrt router

The LuCI web interface by default is blocked on the WAN interface. In order to manage the router from the outside, a firewall rule allowing web access from the WAN must be inserted. It is possible to edit the /etc/config/firewall file within the OpenWrt container and open port 80 for external management.

lxc exec router sh

# vi /etc/config/firewall

...
config rule                      
        option target 'ACCEPT'   
        option src 'wan'         
        option proto 'tcp'       
        option dest_port '80'    
        option name 'ext_web'                                   

Save the file and then restart the firewall within the OpenWrt container.

/etc/init.d/firewall restart

Now you should be able to point your web browser to the WAN address and log in; the password is blank.

Step back and admire work

Type exit to return to the Raspberry Pi prompt. By looking at some lxc output, we can see the virtual network up and running.

$ lxc ls
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
|  NAME   |  STATE  |          IPV4          |                     IPV6                      |    TYPE    | SNAPSHOTS |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
| docker1 | RUNNING | 192.168.215.220 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe58:1ac9 (eth0)  | PERSISTENT | 0         |
|         |         | 172.17.0.1 (docker0)   | fd4b:7e4:111:0:216:3eff:fe58:1ac9 (eth0)      |            |           |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe58:1ac9 (eth0)  |            |           |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
| router  | RUNNING | 192.168.215.198 (eth1) | fd6a:c19d:b07:2084::1 (br-lan)                | PERSISTENT | 1         |
|         |         | 192.168.181.1 (br-lan) | fd6a:c19d:b07:2080::8d1 (eth1)                |            |           |
|         |         |                        | fd6a:c19d:b07:2080:216:3eff:fe72:44b6 (eth1)  |            |           |
|         |         |                        | fd4b:7e4:111::1 (br-lan)                      |            |           |
|         |         |                        | fd4b:7e4:111:0:216:3eff:fe72:44b6 (eth1)      |            |           |
|         |         |                        | 2001:db8:ebbd:2084::1 (br-lan)                |            |           |
|         |         |                        | 2001:db8:ebbd:2080::8d1 (eth1)                |            |           |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe72:44b6 (eth1)  |            |           |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+
| www     | RUNNING | 192.168.181.158 (eth0) | fd6a:c19d:b07:2084:216:3eff:fe01:e0a3 (eth0)  | PERSISTENT | 0         |
|         |         |                        | fd4b:7e4:111:0:216:3eff:fe01:e0a3 (eth0)      |            |           |
|         |         |                        | fd42:dc68:dae9:28e9:216:3eff:fe01:e0a3 (eth0) |            |           |
|         |         |                        | 2001:db8:ebbd:2084:216:3eff:fe01:e0a3 (eth0)  |            |           |
+---------+---------+------------------------+-----------------------------------------------+------------+-----------+

The docker1 container is still running from Part 1, and still connected to the outside network br0. You can see this by the addressing assigned (both v4 and v6).

The router container (which is running OpenWrt) has both eth1 (aka WAN) and br-lan (aka LAN) interfaces. The br-lan interface is connected to the inside lxdbr0 virtual network. And OpenWrt routes between the two networks.

Limitations of Virtual OpenWrt

There are some limitations of the virtual OpenWrt. Please see the GitHub project for the most current list. Most notably, ssh works but needs improving.

Address Stability

Because all of this is running on LXC, there is address stability. No matter how many times you reboot the Raspberry Pi, or restart containers in a different order, the addresses remain the same. This means the addresses above can be entered into your DNS server without churn, something Docker doesn’t provide.

Running a Virtual Network

LXC is the best at container customization and virtual networking (IPv4 and IPv6). With LXC’s flexibility, it is easy to create templates to scale up multiple applications (e.g. a webserver farm). OpenWrt is one of the best open source router projects, and now it can be run virtually as well. Now you have a server farm in the palm of your hand, with excellent IPv6 support and a firewall! Perhaps the Docker folks will take note.

Article originally appeared (with more detail) on www.makikiweb.com

Linux Containers with IPv6 GUAs on the Pi

Server Farm in the Palm of your hand

I have recently been exploring Docker containers on SBCs (Single Board Computers), including the Raspberry Pi. The Docker ecosystem is impressive in the number of preconfigured containers that are available. However, as I have written before, it falls down on networking support, specifically the bolted-on-afterthought IPv6. The best one can do is NAT66 on IPv6, which just perpetuates the complexities (and evils) of NAT.

The biggest problem with the Docker IPv6 implementation is that it was an afterthought. Unfortunately, this is not uncommon. Think of adding security after the fact, and you will quickly discover the poorly implemented security model. Docker is limited by this kind of afterthought thinking.

Linux Containers

Another container technology which can also run on SBCs is Linux Containers (LXC/LXD). LXC shares the host’s kernel and is lighter weight than traditional Virtual Machines. But each LXC container is isolated via namespaces and control groups, so it appears to have its own network stack, which makes it more flexible than Docker.

What is the difference between LXC and LXD?

In this article I will treat LXC and LXD as LXC, but they are separate. LXC (or Linux Containers) existed first. Versions 1 & 2 created containers using lower-level, hyphenated commands. With version 3, LXD was added, which provides a daemon that allows easier image management, including publishing images and an API to control LXC on remote machines. LXD complements LXC by providing more features.

Qualifying the SBC OS for LXC/LXD

Unfortunately, the Raspbian kernel from raspberrypi.org doesn’t support namespaces.

Fortunately, there is an unofficial Ubuntu 18.04 image available for the Pi which does. This image is compressed and must be decompressed before being flashed to an SD Card. In Linux you can do this on the fly:

$ xzcat ubuntu-18.04-preinstalled-server-armhf+raspi3.img.xz | sudo dd  of=/dev/sdZ bs=100M

Change sdZ to the device of your SD Card. Be careful with this command: if you give it the device of your boot drive, it will happily overwrite your boot device.

If you are using Windows or a Mac, I suggest using Etcher, which makes creating bootable SD Cards easy.

Make sure you follow the steps on the Ubuntu page to set an initial password for the ubuntu user. Best practice is to set up a non-privileged user which you will use most of the time. This can be done with the adduser command. Below I have created a user craig with sudo privileges.

$ sudo adduser --ingroup sudo craig

Preparing the LXC Host (aka the Pi)

The key networking difference between Docker and LXC is that with LXC one can attach a container to any bridge on the Host. This includes a bridge on the outside interface. Via transparent bridging the container can have unfettered access to the existing IPv6 subnet, including picking up Global Unique Addresses (GUAs) without the host having to do router-like functions, such as adding routes, auto propagation of prefixes (with DHCPv6-PD), redistribution of routes, etc. Again, things which Docker doesn’t support.

Setting up an external bridge interface on the Host

Once you have the right kernel and distro, configure a bridge br0 which will in turn have the ethernet interface as a member. This is best done from the Pi itself using a keyboard and monitor, rather than ssh-ing to a headless device, because when you mess up, you are still connected to the Pi (believe me, it is easy to get disconnected with all interfaces down). Logically the bridge br0 will not only be attached to the eth0 interface, but later on, to the LXC containers as well.

External Bridge

Set up the bridge

1) Install the bridge-utils package, which provides brctl, the utility that creates and controls Linux bridges. Also install the ifupdown package, which will be used later.

sudo apt-get install bridge-utils ifupdown

2) Edit the /etc/network/interfaces file to automatically set up the bridge br0 and attach the ethernet device. Add the following lines:

iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0
iface br0 inet6 dhcp

Because Ubuntu uses systemd we must let systemd know about the bridge, or the IPv6 default route will disappear after about 5 minutes (not good).

3) Create/Edit /etc/systemd/network/br0.network file, and add the following:

[Match]
Name=br0

[Network]
DHCP=yes

Lastly, in order to make this all work when the Pi is rebooted, we have to hack at /etc/rc.local a bit to make sure the bridge is brought up and systemd is minding it at boot up time.

4) Create/Edit /etc/rc.local and add the following, and don’t forget to make it executable.

#!/bin/bash
#
## put hacks here

# fix for br0 interface
/sbin/ifup br0
# kick networkd as well
/bin/systemctl restart systemd-networkd
echo "Bridge is up"
exit 0

Make it executable:

$ sudo chmod 754 /etc/rc.local

Finally, reboot, log in and see that the Pi br0 network is up:

$ ip addr
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
    link/ether b8:27:eb:6c:02:88 brd ff:ff:ff:ff:ff:ff
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:27:eb:6c:02:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.215.141/24 brd 192.168.215.255 scope global dynamic br0
       valid_lft 1995525700sec preferred_lft 1995525700sec
    inet6 2001:db8:ebbd:2080::9c5/128 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 2001:db8:ebbd:2080:ba27:ebff:fe6c:288/64 scope global mngtmpaddr noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::ba27:ebff:fe6c:288/64 scope link 
       valid_lft forever preferred_lft forever

As you can see, br0 has all the IPv4 and IPv6 addresses which is what we want. Now you can go back to headless access (via ssh) if you are like me, and the Pi is usually just sitting on a shelf (with power and network).
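
The SLAAC addresses on br0 above are not random: their last 64 bits follow the modified EUI-64 pattern, built from the interface's MAC address. Here is a minimal Python sketch that reproduces the addresses shown in the ip addr output. The function names are my own, chosen for illustration only.

```python
import ipaddress

def eui64_interface_id(mac: str) -> int:
    """Return the modified EUI-64 interface ID (as an integer) for a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                                # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]   # insert ff:fe in the middle
    return int.from_bytes(bytes(eui64), "big")

def slaac_address(prefix: str, mac: str) -> str:
    """Combine a /64 prefix with the EUI-64 interface ID of the MAC."""
    net = ipaddress.IPv6Network(prefix)
    return str(ipaddress.IPv6Address(int(net.network_address) | eui64_interface_id(mac)))

# MAC address of br0 from the output above
print(slaac_address("fe80::/64", "b8:27:eb:6c:02:88"))
print(slaac_address("2001:db8:ebbd:2080::/64", "b8:27:eb:6c:02:88"))
```

The two printed addresses match the link-local and mngtmpaddr entries on br0.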

Installing LXC/LXD

Once the br0 interface is set up, we can install lxd and lxd-client. Linux Containers has been evolving over the years, and it is now (as I write this) up to version 3.0.2.

A note about versions

There is quite a bit on the internet about older versions of Linux Containers. If you see hyphenated commands like lxc-launch then stop and move to another page. Hyphenated commands are the older version 1 or 2 of Linux Containers.

A quick tour of LXC/LXD

Canonical has a nice Try It page, where you can run LXC/LXD in the comfort of your web browser without installing anything on your local machine. The Try It page sets up a VM which has IPv6 access to the outside world, where you can install and configure LXC/LXD, and even create Linux Containers. It is well worth the 10 minutes to run through the hands-on tutorial.

Doing the install

But wait! It is already installed on this image, although it is version 3.0.0. The easiest way to get it to the latest version is to run:

$ sudo apt-get update
$ sudo apt-get upgrade lxd lxd-client

Add yourself to the lxd group so you won’t have to type sudo all the time.

sudo usermod -aG lxd craig
newgrp lxd

LXD Init

The LXD init script sets up LXD on the machine with a set of interactive questions. It is safe to accept the defaults (just press return). In the session below, the only non-default answer is no to automatically updating stale cached images:

$ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (btrfs, dir, lvm) [default=btrfs]: 
Create a new BTRFS pool? (yes/no) [default=yes]: 
Would you like to use an existing block device? (yes/no) [default=no]: 
Size in GB of the new loop device (1GB minimum) [default=15GB]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: 
Would you like LXD to be available over the network? (yes/no) [default=no]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] no
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: no

On the Pi, LXD will take a while to think about all this, so just be patient (it might be 10 minutes or so).

Default LXD Networking

Since we took all the defaults of lxd init, it created another bridge on the system, lxdbr0. The YAML file would lead you to believe it is also bridged to the outside world, but it is not. The default config is similar to Docker, in that lxdbr0 uses NAT44 and NAT66 to connect to the outside world.

But we don’t care, because we have created a bridge br0 which is transparently bridged to the outside world. And unlike Docker, individual LXC containers can be attached to any bridge (either br0 or, if you want NAT, lxdbr0).

Create a profile for the external transparent bridge (br0)

There is one more thing we have to do before running the first Linux Container, create a profile for the br0 bridge. Edit the profile to match the info below:

lxc profile create extbridge
lxc profile edit extbridge
    config: {}
    description: bridged networking LXD profile
    devices:
      eth0:
        name: eth0
        nictype: bridged
        parent: br0
        type: nic
    name: extbridge
    used_by:

The Linux Container network is now ready to attach containers to the br0 bridge like this:

container network

You may notice the bottom LXC container with Docker, more on this later.

Running the first Linux Container

So now it is time to have fun by running the first container. I suggest Alpine Linux because it is small, and quick to load. To create and start the container type the following:

lxc launch -p default -p extbridge images:alpine/3.8 alpine

LXD will automatically download the Alpine Linux image from the Linux Containers image server, and create a container with the name alpine. We’ll use the name alpine to manage the container going forward.

Typing lxc ls will list the running containers

$ lxc ls
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
|  NAME   |  STATE  |          IPV4          |                     IPV6                     |    TYPE    | SNAPSHOTS |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| alpine  | RUNNING | 192.168.215.104 (eth0) | fd6a:c19d:b07:2080:216:3eff:fecf:bef5 (eth0) | PERSISTENT | 0         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fecf:bef5 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+

You will note that the container has not only an IPv4 address from my upstream DHCP server, but it also has an IPv6 GUA (and in this case, an additional IPv6 ULA, Unique Local Address).

YAML overlaying

The alpine container has a GUA because we used two -p (profile) parameters when creating it. The first is the default profile, which as I mentioned earlier is set up for NAT44 and NAT66. The second is the extbridge profile we set up above. The lxc launch command pulls in the YAML info from the default profile, and then overlays the extbridge profile, effectively overwriting the parts we want so that the alpine container is attached to br0 and the outside world!
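
To make the overlay idea concrete, here is a simplified Python model of applying profiles in order. It is only a sketch of the concept, not LXD's actual code. The contents of the default profile (an eth0 on lxdbr0 plus a root disk) are my assumption of what lxd init typically generates, and overlay_profiles is a hypothetical helper.

```python
def overlay_profiles(*profiles: dict) -> dict:
    """Merge profiles left to right; later profiles win on name clashes."""
    merged = {"config": {}, "devices": {}}
    for profile in profiles:
        merged["config"].update(profile.get("config", {}))
        for name, device in profile.get("devices", {}).items():
            merged["devices"][name] = dict(device)  # a same-named device is replaced
    return merged

# Assumed contents of the default profile after accepting the lxd init defaults
default = {
    "config": {},
    "devices": {
        "eth0": {"name": "eth0", "nictype": "bridged", "parent": "lxdbr0", "type": "nic"},
        "root": {"path": "/", "pool": "default", "type": "disk"},
    },
}

# The extbridge profile from this article
extbridge = {
    "config": {},
    "devices": {
        "eth0": {"name": "eth0", "nictype": "bridged", "parent": "br0", "type": "nic"},
    },
}

effective = overlay_profiles(default, extbridge)
print(effective["devices"]["eth0"]["parent"])  # the bridge the container attaches to
print(sorted(effective["devices"]))            # the root disk survives from default
```

In this model only the eth0 device is overridden; everything else in the default profile, such as the root disk, carries through.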

Stepping into Alpine

Of course, what good is starting a Linux Container if all you can do is start and stop it? A key difference from Docker is that Linux Containers are not read-only. You can install software, configure it the way you like, and then stop the container. When you start it again, all the changes you made are still there. I’ll talk about the goodness of this a little later.

But in order to do that customization one needs to get inside the container. This is done with the following command:

$ lxc exec alpine -- /bin/sh
~ # 

And now you are inside the running container as root. Here you can do anything you can do on a normal linux machine, install software, add users, start sshd, so you can ssh to it later, and so on. When you are done customizing the container type:

~ # exit
craig@pai:~$ 

And you are back on the LXC Host.

Advantages of customizing a container

A key advantage of customizing a container is that you can create a template image which can then be used to create many instances of that customized application. For example, I started with alpine, installed nginx and php7, and created a template image, which I called web_image. I used the following commands on the host, after installing the webserver with PHP inside the container:

$ lxc snapshot alpine snapshot_web                   # Make a back up of the container
$ lxc publish alpine/snapshot_web --alias web_image  # publish the back up as an image
$ lxc image list                                     # show the list of images
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+
|    ALIAS     | FINGERPRINT  | PUBLIC |             DESCRIPTION              |  ARCH  |   SIZE   |         UPLOAD DATE         |
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+
| web_image    | 84a4b1f466ad | no     |                                      | armv7l | 12.86MB  | Dec 4, 2018 at 2:46am (UTC) |
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+
|              | 49b522955166 | no     | Alpine 3.8 armhf (20181203_13:03)    | armv7l | 2.26MB   | Dec 3, 2018 at 5:11pm (UTC) |
+--------------+--------------+--------+--------------------------------------+--------+----------+-----------------------------+

Scaling up the template container

And with that webserver image, I can replicate it as many times as I have disk space and memory. I tried 10, but based on how much memory it was using, I can get to twenty on the Pi, with room for more.
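
The exact commands used to create the extra webservers are not shown here, so the following Python sketch is one way it could be done. It builds one lxc launch command per container, following the same launch syntax used elsewhere in this article. The launch_commands helper and the w2 to w10 naming loop are my own illustration.

```python
def launch_commands(image: str, first: int, last: int, prefix: str = "w"):
    """Build one `lxc launch` command per container name prefix<first>..prefix<last>."""
    return [
        ["lxc", "launch", "-p", "default", "-p", "extbridge",
         f"local:{image}", f"{prefix}{n}"]
        for n in range(first, last + 1)
    ]

commands = launch_commands("web_image", 2, 10)
for cmd in commands:
    # Dry run: print only. To really launch the containers, pass each list to
    # subprocess.run(cmd, check=True) on the LXC host instead.
    print(" ".join(cmd))
```

Printing the commands first makes it easy to check the names before anything is created.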

$ lxc ls
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
|  NAME  |  STATE  |          IPV4          |                     IPV6                     |    TYPE    | SNAPSHOTS |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| alpine | RUNNING | 192.168.215.104 (eth0) | fd6a:c19d:b07:2080:216:3eff:fecf:bef5 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fecf:bef5 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w10    | RUNNING | 192.168.215.225 (eth0) | fd6a:c19d:b07:2080:216:3eff:feb2:f03d (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:feb2:f03d (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w2     | RUNNING | 192.168.215.232 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe7f:b6a5 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe7f:b6a5 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w3     | RUNNING | 192.168.215.208 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe63:4544 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe63:4544 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w4     | RUNNING | 192.168.215.244 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe99:a784 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe99:a784 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w5     | RUNNING | 192.168.215.118 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe31:690e (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe31:690e (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w6     | RUNNING | 192.168.215.200 (eth0) | fd6a:c19d:b07:2080:216:3eff:fee2:8fc7 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fee2:8fc7 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w7     | RUNNING | 192.168.215.105 (eth0) | fd6a:c19d:b07:2080:216:3eff:feec:baf7 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:feec:baf7 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w8     | RUNNING | 192.168.215.196 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe90:10b2 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe90:10b2 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| w9     | RUNNING | 192.168.215.148 (eth0) | fd6a:c19d:b07:2080:216:3eff:fee3:e5b2 (eth0) | PERSISTENT | 0         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fee3:e5b2 (eth0) |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+
| web    | RUNNING | 192.168.215.110 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe29:7f8 (eth0)  | PERSISTENT | 1         |
|        |         |                        | 2001:db8:ebbd:2080:216:3eff:fe29:7f8 (eth0)  |            |           |
+--------+---------+------------------------+----------------------------------------------+------------+-----------+

All of the webservers have their own unique IPv6 address, and all of them are running on port 80, something that can’t be done using NAT.

LXC plays well with DNS

Unlike Docker, LXC containers retain the same IPv6 address after being stopped and started. And if you are starting multiple containers, the order of starting doesn’t change the address (as it does with Docker).

This means that you can assign names to your LXC Containers without a lot of DNS churn. Here’s a chunk from my DNS zone file:

lxcdebian   IN  AAAA    2001:db8:ebbd:2080:216:3eff:feae:a30
lxcalpine   IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe4c:4ab2
lxcweb      IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe29:7f8
lxcw2       IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe7f:b6a5
lxcdocker1  IN  AAAA    2001:db8:ebbd:2080:216:3eff:fe58:1ac9
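
Because the addresses are stable, zone file lines like these can be generated rather than typed. Below is a small Python sketch, assuming you already have each container's addresses (for example, copied from lxc ls). The aaaa_records helper and the lxc name prefix are illustrative choices that mirror the entries above.

```python
import ipaddress

# The GUA prefix used on the outside network in this article
GUA_PREFIX = ipaddress.IPv6Network("2001:db8:ebbd:2080::/64")

def aaaa_records(containers: dict, prefix=GUA_PREFIX, label_prefix: str = "lxc"):
    """Return one AAAA zone line per container address that falls inside prefix."""
    lines = []
    for name, addresses in sorted(containers.items()):
        for addr in addresses:
            if ipaddress.IPv6Address(addr) in prefix:   # skip ULAs and other prefixes
                lines.append(f"{label_prefix + name:<12}IN  AAAA    {addr}")
    return lines

containers = {
    "web": ["fd6a:c19d:b07:2080:216:3eff:fe29:7f8",
            "2001:db8:ebbd:2080:216:3eff:fe29:7f8"],
    "w2": ["fd6a:c19d:b07:2080:216:3eff:fe7f:b6a5",
           "2001:db8:ebbd:2080:216:3eff:fe7f:b6a5"],
}

records = aaaa_records(containers)
print("\n".join(records))
```

Only the GUA is published here; the ULA is filtered out by the prefix check.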

DNS is your friend when using IPv6. With DNS entries, I can point my web browser to the servers running on these containers. I can even ssh in to the container, just like any host on my network.

$ ssh -X craig@lxcdebian
craig@lxcdebian's password: **********
craig@debian:~$

Key differences between LXC and Docker

Here’s a chart to show the key differences between Docker and Linux Containers.

Docker                                         | LXC/LXD
-----------------------------------------------+---------------------------------------------------------------------------
Always Routed, and with NAT (default)          | Any Container can be routed or bridged
IP addressing depends on container start order | Addressing is stable regardless of start order, and plays well with DNS
A read-only-like Container                     | A read-write container, make it easy to customize, and templatize
No check on Architectures (x86, ARM)           | LXC automatically selects the correct architecture
Most Containers are IPv4-only                  | Containers support IPv4 & IPv6
Containers see Docker NAT address in logs      | Containers log real source addresses
Many, many containers to choose from           | By comparison, there are only a handful of pre-built containers

A key advantage of Docker is the last one: the sheer number of Docker containers is amazing. But what if you could have the best of both worlds?

Linux Containers + Docker

No limitations

Since LXC containers are customizable, and since it is easy to make a template image and replicate containers based on that template, why not install Docker inside an LXC Container, and have the best of both worlds?

Actually it is quite easy to do. Start with an image that is a bit more fully featured than Alpine Linux, like Debian. I used Debian 10 (the next version of Debian). Create the container with:

lxc launch -p default -p extbridge images:debian/10 debian

Enable the nesting feature, which allows LXC to run containers inside of containers:

lxc config set debian security.nesting true
lxc restart debian      # restart with nesting enabled

Step into the container and customize it by installing Docker

$ lxc exec debian -- /bin/bash

# make a content directory for Docker/nginx
mkdir -p /root/nginx/www
# create some content
echo "<h2> Testing </h2>" > /root/nginx/www/index.html

# install docker and start it
apt-get install docker.io
/etc/init.d/docker start

# pull down nginx Docker Container for armhf
docker create --name=nginx -v /root/nginx:/usr/share/nginx/html:ro -p 80:80 -p 443:443 armhfbuild/nginx
# start the docker container
docker restart nginx
exit
$ 

And now nginx is up and running inside Docker, inside an LXC Container, with dual-stack predictable addressing and transparent bridging.

Make a template of the LXC + Docker Container

Follow the earlier procedure to create an image which will be used to launch customized LXC + Docker containers:

lxc snapshot debian docker_base_image
lxc publish debian/docker_base_image --alias docker_image       # publish image to local:
lxc image list                                                  # see the list of images
# start a lxc/docker container called docker1
lxc launch -p default -p extbridge local:docker_image docker1
# set config to allow nesting for docker1
lxc config set docker1 security.nesting true
lxc restart docker1

Looking at the running LXC containers, it is easy to spot the one running Docker (hint: look for the Docker NAT address).

$ lxc ls
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
|  NAME   |  STATE  |          IPV4          |                     IPV6                     |    TYPE    | SNAPSHOTS |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| alpine  | RUNNING | 192.168.215.104 (eth0) | fd6a:c19d:b07:2080:216:3eff:fecf:bef5 (eth0) | PERSISTENT | 0         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fecf:bef5 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| docker1 | RUNNING | 192.168.215.220 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe58:1ac9 (eth0) | PERSISTENT | 0         |
|         |         | 172.17.0.1 (docker0)   | 2001:db8:ebbd:2080:216:3eff:fe58:1ac9 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| w2      | RUNNING | 192.168.215.232 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe7f:b6a5 (eth0) | PERSISTENT | 0         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe7f:b6a5 (eth0) |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+
| web     | RUNNING | 192.168.215.110 (eth0) | fd6a:c19d:b07:2080:216:3eff:fe29:7f8 (eth0)  | PERSISTENT | 1         |
|         |         |                        | 2001:db8:ebbd:2080:216:3eff:fe29:7f8 (eth0)  |            |           |
+---------+---------+------------------------+----------------------------------------------+------------+-----------+

After getting it started, it is easy to step into the LXC container docker1 and query Docker on its container:

lxc exec docker1 -- /bin/bash
root@docker1:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                                      NAMES
19f3cbeba6d3        linuxserver/nginx   "/init"             3 hours ago         Up 3 hours          0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   nginx
root@docker1:~#

Running multiple LXC + Docker containers

Now that there is a template image, docker_image, it is a breeze to spawn multiple LXC + Docker Containers. Don’t want them all to run nginx webservers? Easy, step into each, delete the nginx webserver and run one of the thousands of other Docker Containers.

Best of both Worlds

LXC is the best at container customization and networking (IPv4 and IPv6). Docker is the best in the sheer volume of pre-built Docker containers (assuming you select the correct architecture, armhf for the Pi). With LXC’s flexibility, it is easy to create templates to scale up multiple applications (e.g. a webserver farm running in the palm of your hand). And with LXC, it is possible to overcome many of Docker’s limitations, opening up the world of Docker Containers to the LXC world. The best of both worlds.


originally posted with more detail on www.makikiweb.com

What makes a good IPv6 implementation?


Bad IPv6 support costs Money

I have been working with Docker lately, and as cool as the container technology is, it was originally built without consideration for IPv6, and then IPv6 was bolted on later, making IPv6 support full of expensive workarounds.

But that got me thinking: what makes a good IPv6 implementation? Of course this is my opinion, and you are free to toss in other criteria, so think of this as a thought starter.

Why is this important?

With 25% of the internet carried over IPv6 as of this writing, if you are developing a product which has a lifetime of 5 to 10 years, and you aren’t giving thought as to how you will support IPv6, then one of three things will happen:

  • A) your product will fail, or
  • B) you will try to bolt IPv6 on the side, or
  • C) it will have to be completely rewritten.

All of that costs money.

A good IPv6 device implementation

There are broad areas where IPv6 should work well.

Addressing

As much as I like the simplicity of SLAAC (Stateless Address Autoconfiguration), there are certainly use cases where DHCPv6 is a better choice. A good implementation should:

  • Support both addressing methods, SLAAC, and DHCPv6
  • Be able to reestablish IPv6 GUA (Global Unique Address) once the device comes out of sleep/suspend or link down/up (systemd suffers this problem)
  • Play well with DNS. Very few of us enjoy typing IPv6 addresses, so the implementation should have a stable IPv6 address which can be entered into DNS without requiring a lot of DNS churn.

Routing

IPv6 is not IPv4 with colons. There are some things which are different for good reason.

  • Default routes are link-local addresses (Docker fails on this one big time). GUAs may change, link-locals shouldn’t.
  • Supports RA (Router Advertisement) fields, RDNSS (DNS server), and DNSSL (DNS domain search list). Not much use having an address if the host can’t resolve names
  • If the device is routing (such as Docker) then support DHCPv6-PD, and provide the option of prefix delegation into the container/downstream network.
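
The first point in the list above is easy to test for. Here is a minimal Python sketch that checks whether the next hop of an IPv6 default route is a link-local address. The sample route lines are made up for illustration, in the general shape printed by ip -6 route, and default_route_is_link_local is a hypothetical helper.

```python
import ipaddress

def default_route_is_link_local(route_line: str) -> bool:
    """Return True if the next hop of a default route entry is link-local."""
    fields = route_line.split()
    if not fields or fields[0] != "default" or "via" not in fields:
        raise ValueError("not a default route with a next hop")
    next_hop = fields[fields.index("via") + 1]
    return ipaddress.IPv6Address(next_hop).is_link_local

# Hypothetical sample lines, not captured from a real system
good = "default via fe80::1 dev br0 proto ra metric 1024 pref medium"
bad = "default via 2001:db8:ebbd:2080::1 dev eth0 metric 1024 pref medium"

print(default_route_is_link_local(good))
print(default_route_is_link_local(bad))
```

A check like this could run in a product's test suite to catch a default route that points at a GUA, which may change.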

Resiliency

Basic protection from network misconfiguration or outright attacks makes the IPv6 device better prepared for production use.

  • Rational limit on the number of IPv6 addresses an interface may have. Before systemd, the Linux kernel defaulted to 16, which seemed like a good compromise. Back in systemd v232, it was possible to exhaust memory on an IPv6 host by feeding it random RA addresses, creating a denial of service. FreeBSD v11.5 has a similar problem, where the system will add over 3000 IPv6 addresses, and the system will slow to a crawl.
  • Rational limit on the number of neighbours. IPv6 /64 networks are sparsely populated and therefore one shouldn’t have to expect to support all 18 quintillion (2^64) neighbours. Something like 1000, or even 256 should be enough.
  • Don’t assume that the Linux Stack has your back. Since systemd has become widespread, there are many IPv6 systemd bugs, which weren’t there in the pre-systemd kernel days. IPv6 is a different stack, be sure to test it.
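
As a sketch of what a rational limit on neighbours could look like, here is a small Python neighbour table that holds a fixed maximum number of entries and drops the least recently learned one when full. The NeighbourCache class, the 256 limit and the generated addresses are all illustrative assumptions. Real stacks use more involved garbage collection.

```python
from collections import OrderedDict

class NeighbourCache:
    """A neighbour table that never holds more than max_entries entries."""

    def __init__(self, max_entries: int = 256):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def learn(self, ip: str, mac: str) -> None:
        """Record or refresh a neighbour, evicting the oldest when over the limit."""
        if ip in self._entries:
            self._entries.move_to_end(ip)
        self._entries[ip] = mac
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)   # drop the least recently learned

    def lookup(self, ip: str):
        return self._entries.get(ip)

    def __len__(self) -> int:
        return len(self._entries)

cache = NeighbourCache(max_entries=256)
for host in range(1, 1001):   # pretend 1000 neighbours show up on the link
    cache.learn(f"2001:db8::{host:x}", f"00:16:3e:00:{host >> 8:02x}:{host & 0xff:02x}")

print(len(cache))   # stays at the limit, however many neighbours appear
print(2 ** 64)      # the number of possible addresses in a /64
```

The point is that memory use is bounded by the limit, not by how many addresses an attacker or a misconfigured network can invent.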

Summary

I am sure I am missing a few, but this is a start. When developing a product, the business case for supporting IPv6 well is that it will save you money in the long run, by not having to go back and try to bolt IPv6 on, or rewrite your network stack later.

P.S. I wouldn’t recommend putting Docker into production because of the severe IPv6 limitations. I’ll be looking at LXC next.

Image: Yachts colliding: Creative Commons/Mark Pilbeam

 

Hi Neighbour!


Mr. Rogers

Neighbour Discovery Protocol (NDP) is more than just an IPv6 version of ARP (Address Resolution Protocol). It really is Neighbour Discovery, handling Layer 2 MAC address resolution, but also router discovery, and even better path handling (Redirection).

ARP is a funny protocol, as it isn’t part of the IPv4 suite, but IPv4 will not work without it. The creators of IPv6 were looking for an inclusive method (e.g. part of the IPv6 protocol) to accomplish MAC Address resolution. They chose to use ICMPv6, which is a Layer 4 protocol.

But how does one use a Layer 4 (L4) protocol to resolve Layer 2 (L2) information (the MAC address) when both L2 and L3 information is needed before the packet requesting Address Resolution can be sent?

ND Addressing

To answer this question, some special L3 addresses had to be created. The first is the Link-Local address, a non-routable address limited to the scope of the link (think broadcast domain), which always starts with FE80.

The second involves multicast. Since it was decided that broadcast wouldn’t be used in IPv6 (since every end-station must listen to broadcast), a few special multicast addresses would be created.

  • FF02::1 All Nodes Address (all nodes must listen to this address on the link)
  • FF02::2 All Routers Address (all routers must listen to this address)
  • FF02::1:FFXX:YYZZ Solicited Node Address, where XX:YYZZ are the last 3 bytes of the L3 destination address (each node must listen to its own Solicited Node Multicast Address)

An example of the Solicited Node Address where the L3 address is: 2001:db8:ebbd:0:5066:64f6:547e:4872 would be ff02::1:ff7e:4872
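The mapping in that example can be sketched in a few lines of Python. This is an illustration only, and the function names are made up for this post. The second helper goes slightly beyond what is covered here: on Ethernet, an IPv6 multicast group is sent to a MAC address made of 33:33 followed by the last 4 bytes of the group address, which is how the sender gets an L2 destination without knowing the neighbour’s MAC yet.

```python
import ipaddress

def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Map a unicast IPv6 address to its Solicited Node multicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF    # last 3 bytes
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))    # fixed 104-bit prefix
    return ipaddress.IPv6Address(base | low24)

def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet destination for an IPv6 multicast group: 33:33 + last 4 bytes."""
    tail = group.packed[-4:]
    return "33:33:" + ":".join(f"{b:02x}" for b in tail)

sn = solicited_node("2001:db8:ebbd:0:5066:64f6:547e:4872")
print(sn)                  # ff02::1:ff7e:4872
print(multicast_mac(sn))   # 33:33:ff:7e:48:72
```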

ND Functions

Neighbour Discovery (ND) does more than just MAC Address Resolution. It also performs:

  • Duplicate Address Detection – DAD
  • Redirection (to a router with a better path)

In the early days of IPv4, duplicate addresses on the network were easy to create, since there was no automatic way to get an address (pre-DHCP), and addresses were manually entered. Keeping a list of used addresses was quite common; some even used MS Excel. After DHCP had been created, duplicate addresses seemed like a thing of the past. But then VMs (Virtual Machines) came along, and cloning VMs became common. A clone which included the MAC address would get the same IP address from the DHCP server as the original (one reason why DHCPv6 does not use a simple MAC address as an identifier).

Duplicate Address Detection

But back in the early days of creating IPv6, there was no DHCPv6, and the creators wanted to improve upon the situation by creating Duplicate Address Detection (DAD).

Because IPv6 nodes can create a Global Unicast Address (GUA) without contacting any servers, using SLAAC (Stateless Address Autoconfiguration), DAD becomes important in preventing duplicate addressing. When a node forms its GUA, it sends out a Neighbour Solicitation (NS) message to its own Solicited Node Address, similar to a gratuitous ARP in IPv4.

But the key difference from ARP is that if there isn’t a duplicate address on the network, no other host will hear the NS message. Why? Because no other nodes are listening to that specific multicast address, whereas ARP requires ALL nodes to listen.
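To make that difference concrete, here is a toy Python model of a link. It is a simulation under simplifying assumptions, not real DAD: the `Link` class and the host names are invented, no packets are sent, and the probing node itself is left out. It only tracks which hosts have joined which Solicited Node group, and therefore who would hear a DAD probe.

```python
import ipaddress
from collections import defaultdict

def solicited_node(addr):
    """Solicited Node group for addr: ff02::1:ff00:0 plus the last 3 bytes."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24)

class Link:
    """Toy model of one link: tracks which hosts joined which multicast group."""
    def __init__(self):
        self.groups = defaultdict(set)   # group -> {(host, address)}

    def join(self, host, addr):
        self.groups[solicited_node(addr)].add((host, ipaddress.IPv6Address(addr)))

    def dad(self, tentative):
        """Return (hosts that hear the NS, hosts that already own the address)."""
        target = ipaddress.IPv6Address(tentative)
        listeners = self.groups.get(solicited_node(tentative), set())
        hearers = sorted(h for h, _ in listeners)
        owners = sorted(h for h, a in listeners if a == target)
        return hearers, owners

link = Link()
link.join("pi", "2001:db8::5066:64f6:547e:4872")
link.join("nas", "2001:db8::1c2f:9aff:fe01:beef")

print(link.dad("2001:db8::aaaa:bbbb:cccc:dddd"))   # ([], [])
print(link.dad("2001:db8::5066:64f6:547e:4872"))   # (['pi'], ['pi'])
```

In the first probe nobody is listening, so nobody is interrupted. In the second, only the host that already owns the address hears the NS and can object.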

Redirection

Part of knowing the neighbourhood is knowing what routers exist on the link. Communication with routers will use another special multicast address, FF02::2.

As a host becomes active on the network, it will send Router Solicitations (RS) to the All Routers address. The routers on the link will then respond with Router Advertisements (RA), which include important bits of information, such as default gateway, prefixes defined on the link, even DNS servers.

On a network (shown below) where there is more than one router on the link, the default gateway router (e.g. the Production Router) may send a Redirect to a host (the DNS Server), stating that another router on the link (the Test Network Router) is a better path. The Redirect will also include the link-local address of the better-path router.

Network Diagram

Unlike IPv4, next hop addresses and default gateways are usually link-local addresses, rather than GUAs. Since link-local addresses are limited to the link, they are a good choice for next hop addresses. IPv6 Tip: resist the temptation to make all router interface link-local addresses fe80::1.

  • This will break Redirection, which will lower the resiliency of your network (how do you say “use this other address” when it is the same as my address?)
  • Should two networks get crossed (someone plugs an ethernet cable into the wrong port), you will spend a long time sniffing fe80::1 packets, wondering which router they are coming from.

Link-layer or MAC Address Resolution

Oh, and ND also does MAC Address resolution (like ARP). When Host A has the IP address, but not the MAC address, of Host B, it will send an NS message to the Solicited Node Multicast Address of Host B, using the last 3 bytes of the IPv6 address as the last 3 bytes of the Solicited Node Address.

Since on the average network only one host will be listening to the Solicited Node Multicast Address, Host B will respond with an NA which includes the destination MAC address ***.

After receiving the destination MAC address from Host B, Host A can send an IPv6 packet to Host B.

ICMPv6 and ND

ND uses the following ICMPv6 message types to discover what is happening in the neighbourhood:

  • NS (type 135)
  • NA (type 136)
  • RS (type 133)
  • RA (type 134)
  • Redirect (type 137)
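To show how little is inside one of these messages, here is a rough Python sketch that builds the bytes of a Neighbour Solicitation. Treat it as an illustration under assumptions: the field layout follows my reading of RFC 4861, the names `ND`, `checksum`, `pseudo_header` and `build_ns` are invented for this post, and nothing is actually put on the wire (a real sender would also have to set the IPv6 hop limit to 255).

```python
import ipaddress
import struct
from enum import IntEnum

class ND(IntEnum):
    """ICMPv6 message types used by Neighbour Discovery."""
    RS = 133
    RA = 134
    NS = 135
    NA = 136
    REDIRECT = 137

def checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pseudo_header(src, dst, length):
    # IPv6 pseudo-header: source, destination, upper-layer length,
    # 3 zero bytes, then next header 58 (ICMPv6).
    return src.packed + dst.packed + struct.pack("!I3xB", length, 58)

def build_ns(src: str, target: str, src_mac: str) -> bytes:
    """Build an ICMPv6 NS asking 'who has target?' (ICMPv6 payload only)."""
    src_ip = ipaddress.IPv6Address(src)
    tgt_ip = ipaddress.IPv6Address(target)
    # Destination is the target's Solicited Node multicast address.
    dst_ip = ipaddress.IPv6Address(
        int(ipaddress.IPv6Address("ff02::1:ff00:0")) | (int(tgt_ip) & 0xFFFFFF))
    # Option type 1 (Source Link-Layer Address), length 1 (in units of 8 bytes).
    option = struct.pack("!BB6s", 1, 1, bytes.fromhex(src_mac.replace(":", "")))
    # Type, code, checksum placeholder, reserved, target address.
    body = struct.pack("!BBHI16s", ND.NS, 0, 0, 0, tgt_ip.packed) + option
    csum = checksum(pseudo_header(src_ip, dst_ip, len(body)) + body)
    return body[:2] + struct.pack("!H", csum) + body[4:]

msg = build_ns("fe80::1c2f:9aff:fe01:beef",
               "2001:db8:ebbd:0:5066:64f6:547e:4872",
               "1e:2f:9a:01:be:ef")
print(len(msg), msg[0])   # 32 135
```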

Because ND uses ICMPv6 extensively, the quickest way to cut yourself off from the internet is to block all ICMPv6 traffic on your firewall. Although blocking ICMP is common in IPv4, resist this temptation. ICMPv6 messages are required to remain connected in IPv6.

Free IPv6 Book

If you want to learn how IPv6 really works, check out this free eBook (PDF). It is an excellent Cisco Press book published in 2013. It may not have the latest RFC references, like RFC 8200 (the IPv6 Standard RFC), but it is still a good read.

IPv6 Fundamentals

Download IPv6 Fundamentals

The second edition is even better (but not free)

** Fred Rogers Creative Commons

*** Of course, it is possible that the last 3 bytes of the IPv6 address will match those of another host on the network (a 1 in 16 million chance), but multicast will prevent the rest of the hosts on the network from hearing the idle chatter, unlike ARP.