"In het verleden behaalde resultaten bieden geen garanties voor de toekomst"
About this blog

These are the ramblings of Matthijs Kooijman, concerning the software he hacks on, hobbies he has and occasionally his personal life.

Most content on this site is licensed under the WTFPL, version 2 (details).

Sun Mon Tue Wed Thu Fri Sat
24 25 26
27 28 29 30 31    
Powered by Blosxom &Perl onion
(With plugins: config, extensionless, hide, tagging, Markdown, macros, breadcrumbs, calendar, directorybrowse, entries_index, feedback, flavourdir, include, interpolate_fancy, listplugins, menu, pagetype, preview, seemore, storynum, storytitle, writeback_recent, moreentries)
Valid XHTML 1.0 Strict & CSS
Bouncing packets: Kernel bridge bug or corner case?


While setting up Tika, I stumbled upon a fairly unlikely corner case in the Linux kernel networking code, that prevented some of my packets from being delivered at the right place. After quite some digging through debug logs and kernel source code, I found the cause of this problem in the way the bridge module handles netfilter and iptables.

Just in case someone else actually finds himself in this situation and actually manages to find this blogpost, I'll detail my setup, the problem and it solution here.

Tika's network setup

Tika runs Debian wheezy, with a single network interface to the internet (which is not involved in this problem). Furthermore, Tika runs a number of lxc containers, which are isolated systems sharing the same kernel, but running a complete userspace of their own. Using kernel namespaces and cgroups, these containers obtain a fair degree of separation: Each of them has its own root filesystem, a private set of mounted filesystem, separate user ids, separated network stacks, etc.

Each of these containers then connects to the outside world using a virtual ethernet device. This is sort of a named pipe, but then for ethernet. Each veth device has two ends, one inside the container, and one outside, which are connected. On the inside, it just looks like each container has a single ethernet device, which is configured normally. On the outside, all of these veth interfaces are grouped together into a bridge device, br-lxc, which allows the containers to talk amongst themselves (just as if they were connected to the same ethernet switch). The bridge device in the host is configured with an IP address as well, to allow communciation between the host and containers.

Now, I have a few port forwarding rules: when traffic comes in on my public IP address on specific ports, it gets forwarded to a specific container. There is nothing special about this, this is just like forwarding ports to LAN hosts on a NAT router.

A problem with port forwarding like this is that by default, packets coming in from the internal side cannot be properly handled. As an example, one of the containers is running a webserver, which serves a custom Debian repository on the domain. When another container tries to connect to that, DNS resolution will give it the external IP of tika, but connecting to that IP fails.

Usually, the DNAT rule used for portforwarding is configured to only process packets from the external network. But even if it would process internal packets, it would not work. The DNAT rule changes the destination address of these packets to point to my web container so they get sent to the web container. However, the source address is unchanged. Since the containers have a direct connection (through the network bridge) reply packets get sent directly through the original container - the host does not have a chance to "undo" the DNAT on the reply packets. For external connections, this is not a problem because the host is the default gateway for the containers and the replies need to through the host to reach the external ip.

The most common solution to this is split-horizon DNS - make sure that all these domains resolve to the internal address of the web container, so no port forwarding is needed. For various practical reasons, this didn't work for me, so I settled for the other solution: Apply SNAT in addition to DNAT, which causes the source address of the forwarded packets to be changed to the host's address, forcing replies to pass through the host. The Vuurmuur firewall I was using even had a special "bounce" rule for exactly this purpose (setting up a DNAT and SNAT iptables rule).

This setup worked perfectly - when connecting to the web container from other containers. However, when the web container tried to connect to itself (through the public IP address), the packets got lost. I initially thought the packets were droppped - they went through the PREROUTING chain as normal, but never showed up in the FORWARD chain. I also thought the problem was caused by the packet having the same source and destination addresses, since packets coming from other containers worked as normal. Neither of these turned out to be true, as I'll show below.

Simplifying the setup

Since reproducing the problem on a different and/or simpler setup is always a good approach in debugging, I tried to reproduce the problem on my laptop, using a (single) reguler ethernet device and applying DNAT and SNAT rules. This worked as expected, but when I added a bridge interface, containing just the ethernet interface, it broke again. Adding a second (vlan) interface to the bridge uncovered that the problem was not traffic DNATed back to its source, but rather traffic DNATed back to the same bridge port it originated from - traffic from one bridge port DNATed to the other worked normally.

Digging down into the kernel sources for the bridge module, I uncovered this piece of code, which applies some special handling for exactly DNATed packages on a bridge. It seems this is either a performance optimization, or a way to allow DNATing packets inside a bridge without having to enable full routing, though I find the exact effects of this code rather confusing.

I also found that setting the bridge device to promiscuous mode (e.g. running tcpdump) makes everything work. Setting /proc/sys/net/bridge/bridge-nf-call-iptables to 0 also makes this work. This setting is to prevent bridged packets from passing through iptables, but since this packet wasn't actually a bridged packet before PREROUTING, this actually makes the packet be processed using the normal routing code and progresses through all regular chains normally.

Here's what I think happens:

  • The packet comes in br_handle_frame
  • The frame gets dumped into the NF_BR_PRE_ROUTING netfilter chain (e.g. the bridge / ebtables version, not the ip / iptables one).
  • The ebtables rules get called
  • The br_nf_pre_routing hook for NF_BR_PRE_ROUTING gets called. This interrupts (returns NF_STOLEN) the handling of the NF_BR_PRE_ROUTING chain, and calls the NF_INET_PRE_ROUTING chain.
  • The br_nf_pre_routing_finish finish handler gets called after completing the NF_INET_PRE_ROUTING chain.
  • This handler resumes the handling of the interrupted NF_BR_PRE_ROUTING chain. However, because it detects that DNAT has happened, it sets the finish handler to br_nf_pre_routing_finish_bridge instead of the regular br_handle_frame_finish finish handler.
  • br_nf_pre_routing_finish_bridge runs, this skb->dev to the parent bridge and sets the BRNF_BRIDGED_DNAT flag which calls neigh->output(neigh, skb); which presumably resolves to one of the neigh_*output functions, each of which again calls dev_queue_xmit, which should (eventually) call br_dev_xmit.
  • br_dev_xmit sees the BRNF_BRIDGED_DNAT flag and calls br_nf_pre_routing_finish_bridge_slow instead of actually delivering the packet.
  • br_nf_pre_routing_finish_bridge_slow sets up the destination MAC address, sets skb->dev back to skb->physindev and calls br_handle_frame_finish.
  • br_handle_frame_finish calls br_forward. If the bridge device is set to promisicuous mode, this also delivers the packet up through br_pass_frame_up. Since enabling promiscuous mode fixes my problem, it seems likely that the packet manages to get all the way to here.
  • br_forward calls should_deliver, which returns false when skb->dev != p->dev (and "hairpin mode" is not enabled) causing the packet to be dropped.

This seems like a bug, or at least an unfortunate side effect. It seems there's currently two ways two work around this problem:

  • Setting /proc/sys/net/bridge/bridge-nf-call-iptables to 0, so there is no need for this DNAT + bridge stuff. The side effect of this solution is that bridge packets don't go through iptables, but that's really what I'd have expected in the first place, so this is not a problem for me.
  • Setting the bridge port to "hairpin" mode, which allows sending ports back into it. The side effect here is, AFAICS, that broadcast packets are sent back into the bridge port as well, which isn't really needed (but shouldn't really hurt either).

Next up is reporting this to a kernel mailing list to confirm if there is an actual kernel bug, or just a bug in my expectations :-)

Update: Turns out this behaviour was previously spotted, but no concensus about a fix was reached.

Related stories

0 comments -:- permalink -:- 18:40
CrashPlan: Cheap cloud backup that runs on Linux

For some time, I've been looking for a decent backup solution. Such a solution should:

  • be completely unattended,
  • do off-site backups (and possibly onsite as well)
  • be affordable (say, €5 per month max)
  • run on Linux (both desktops and headless servers)
  • offer plenty of space (couple of hundred gigabytes)

Up until now I haven't found anything that met my demands. Most backup solutions don't run on (headless Linux) and most generic cloud storage providers are way too expensive (because they offer high-availability, high-performance storage, which I don't really need).

Backblaze seemed interesting when they launched a few years ago. They just took enormous piles of COTS hard disks and crammed a couple dozen of them in a custom designed case, to get a lot of cheap storage. They offered an unlimited backup plan, for only a few euros per month. Ideal, but it only works with their own backup client (no normal FTP/DAV/whatever supported), which (still) does not run on Linux.


Crashplan logo

Recently, I had another look around and found CrashPlan, which offers an unlimited backup plan for only $5 per month (note that they advertise with $3 per month, but that is only when you pay in advance for four years of subscription, which is a bit much. Given that if you cancel beforehand, you will still get a refund of any remaining months, paying up front might still be a good idea, though). They also offer a family pack, which allows you to run CrashPlan on up to 10 computers for just over twice the price of a single license. I'll probably get one of these, to backup my laptop, Brenda's laptop and my colocated server.

The best part is that the CrashPlan software runs on Linux, and even on a headless Linux server (which is not officially supported, but CrashPlan does document the setup needed). The headless setup is possible because CrashPlan runs a daemon (as root) that takes care of all the actual work, while the GUI connects to the daemon through a TCP port. I still need to double-check what this means for the security though (especially on a multi-user system, I don't want to every user with localhost TCP access to be able to administer my backups), but it seems that CrashPlan can be configured to require the account password when the GUI connects to the daemon.

The CrashPlan software itself is free and allows you to do local backups and backups to other computers running CrashPlan (either running under your own account, or computers of friends running on separate accounts). Another cool feature is that it keeps multiple snapshots of each file in the backup, so you can even get back a previous version of a file you messed up. This part is entirely configurable, but by default it keeps up to one snapshot every 15 minutes for recent changes, and reduces that to one snapshot for every month for snapshots over a year old.

When you pay for a subscription, the software transforms into CrashPlan+ (no reinstall required) and you get extra features such as multiple backup sets, automatic software upgrades and most notably, access to the CrashPlan Central cloud storage.

I've been running the CrashPlan software for a few days now (it comes with a 30-day free trial of the unlimited subscription) and so far, I'm quite content with it. It's been backing up my homedir to a local USB disk and into the cloud automatically, I don't need to check up on it every time.

The CrashPlan runs on Java, which I doesn't usually make me particularly enthousiastic. However, the software seems to run fast and reliable so far, so I'm not complaining. Regarding the software itself, it does seem to me that it's not intended for micromanaging. For example, when my external USB disk is not mounted, the interface shows "Destination unavailable". When I then power on and mount the external disk, it takes some time for Crashplan to find out about this and in the meanwhile, there's no button in the interface to convince CrashPlan to recheck the disk. Also, I can add a list of filenames/path patterns to ignore, but there's not really any way to test these regexes.

Having said that, the software seems to do its job nicely if you just let it do its job in the background. On piece of micromanagement which I do like is that you can manually pause and resume the backups. If you pause the backups, they'll be automatically resumed after 24 hours, which is useful if the backups are somehow bothering you, without the risk that you forget to turn the backups back on.

Backing up only when docked

Of course, sending away backups is nice when I am at home and have 50Mbit fiber available, but when I'm on the road, running on some wifi or even 3G connection, I really don't want to load my connection with the sending of backup data.

Of course I can manually pause the backups, but I don't want to be doing that every time when I pick up my laptop and get moving. Since I'm using a docking station, it makes sense to simply pause backups whenever I undock and resume them when I dock again.

The obvious way to implement this would be to simply stop the CrashPlan daemon when undocking, but when I do that, the CrashPlanDesktop GUI becomes unresponsive (and does not recover when the daemon is started again).

So, I had a look at the "admin console", which offers "command line" commands, such as pause and resume. However, this command line seems to be available only inside the GUI, which is a bit hard to script (also note that not all of the commands seem to work for me, sleep and help seem to be unknown commands, which cause the console to close without an error message, just like when I just type something random).

It seems that these console commands are really just sent verbatim to the CrashPlan daemon. Googling around a bit more, I found a small script for CrashPlan PRO (the business version of their software), which allows sending commands to the daemon through a shell script. I made some modifications to this script to make it useful for me:

  • don't depend on the current working dir, hardcode /usr/local/crashplan in the script instead
  • fixed a bashism (== vs =)
  • removed -XstartOnFirstThread argument from java (MacOS only?)
  • don't store the commands to send in a separate $command but instead pass "$@" to java directly. This latter prevents bash from splitting arguments with spaces in them into multiple arguments, which causes the command "pause 9999" to be interpreted as two commands instead of one with an argument.

I have this script under /usr/local/bin/CrashPlanCommand:


if [ "x$@" == "x" ] ; then
  echo "Usage: $0 <command> [<command>...]"

echo "Connecting to $hostPort"

echo "Executing $@"

for f in `ls $BASE_DIR/lib/*.jar`; do

java -classpath $CP com.backup42.service.ui.client.ConsoleApp $hostPort "$@"

Now I can run CrashPlanCommand 'pause 9999' and CrashPlanCommand resume to pause and resume the backups (9999 is the number of minutes to pause, which is about a week, since I might be undocked more than 24 hourse, which is the default pause time).

To make this run automatically on undock, I created a simply udev rules file as /etc/udev/rules.d/10-local-crashplan-dock.rules:

ACTION=="change", ATTR{docked}=="0", ATTR{type}=="dock_station", RUN+="/usr/local/bin/CrashPlanCommand 'pause 9999'"
ACTION=="change", ATTR{docked}=="1", ATTR{type}=="dock_station", RUN+="/usr/local/bin/CrashPlanCommand resume"

And voilà! Automatica pausing and resuming on undocking/docking of my laptop!

1 comment -:- permalink -:- 17:05
Getting Screen and X (and dbus and ssh-agent and ...) to play well

When you use Screen together with Xorg, you'll recognize this: You log in to an X session, start screen and use the terminals within screen to start programs every now and then. Everything works fine so far. Then, you logout and log in again (or X crashes, or whatever). You happily re-attach the still running screen, which allows you to continue whatever you were doing.

But now, whenever you want to start a GUI program, things get wonky. You'll get errors about not being able to find configuration data, connect to gconf or DBUS, or your programs will not start at all, with the ever-informative error message "No protocol specified". You'll also recognize your ssh-agent and gpg-agent to stop working within the screen session...

What is happening here, is that all those programs are using "environment variables" to communicate. In particular, when you log in, various daemons get started (like the DBUS daemon and your ssh-agent). To allow other programs to connect to these daemons, they put their contact info in an environment variable in the login process. Whenever a process starts another process, these environment variables get transfered from the parent process to the child process. Sine these environment variables are set in the X sesssion startup process, which starts everything else, all programs should have access to them.

However, you'll notice that, after logging in a second time, the screen you re-attach to was not started by the current X session. So that means its environment variables still point to the old (no longer runnig) daemons from the previous X session. This includes any shells already running in the screen as well as new shells started within the screen (since the latter inherit the environment variables from the screen process itself).

To fix this, we would like to somehow update the environment of all processes that are already running when we login, to update them with the addresses of the new daemons. Unfortunately, we can't change the environment of other processes (unless we resort to scary stuff like using gdb or poking around in /dev/mem...). So, we'll have to convice those shells to actually update their own environments.

So, this solution has two parts: First, after login, saving the relevant variables from the environment into a file. Then, we'll need to get our shell to load those variables.

The first part is fairly easy: Just run a script after login that writes out the values to a file. I have a script called ~/bin/save-env to do exactly that. It looks like this (full version here):


# Save a bunch of environment variables. This script should be run just
# after login. The saved variables can then be sourced by every bash
# shell, so long running shells (e.g., in screen) or incoming SSH shells
# can also use these services.

# Save the DBUS sessions address on each login
if [ -n "$DBUS_SESSION_BUS_ADDRESS" ]; then

if [ -n "$SSH_AUTH_SOCK" ]; then
echo export SSH_AGENT_PID="$SSH_AGENT_PID" > ~/.env.d/ssh
echo export SSH_AUTH_SOCK="$SSH_AUTH_SOCK" >> ~/.env.d/ssh

# Save other variables here

This script fills the directory ~/.env.d with files containg environment variables, separated by application. I could probably have thrown them all into a single file, but it seemed like a good idea to separate them. Anyway, these files are created in such a way that they can be sourced by a running shell to get the new files.

If you download and install this script, don't forget to make it executable and create the ~/.env.d directory. You'll need to make sure it gets run as late as possible after login. I'm running a (stripped down) Gnome session, so I used gnome-session-properties to add it to my list of startup applications. You might call this script from your .xession, KDE's startup program list, or whatever.

For the second part, we need to set our saved variables in all of our shells. This sounds easy, just run for f in ~/.env.d/*; do source "$f"; done in every shell (Don't be tempted to do source ~/.env.d/*, since that sources just the first file with the other files as arguments!). But, of course we don't want to do this manually, but let every shell do it automatically.

For this, we'll use a tool completely unintended, but suitable enough for this job: $PROMPT_COMMAND. Whenever Bash is about to display a prompt, it evals whatever is in the variable $PROMPT_COMMAND. So it ends up evaluating that command all the time, which makes it a prefect place to load the saved variables. By setting the $PROMPT_COMMAND variable in your ~/.bashrc variable, it will become enabled in every shell you start (except for login shells, so you might want to source ~/.bashrc from your ~/.bash_profile):

# Source some variables at every prompt. This is to make stuff like
# ssh agent, dbus, etc. working in long-running shells (e.g., inside
# screen).
PROMPT_COMMAND='for f in ~/.env.d/*; do source "$f"; done'

You might need to be careful where to place this line, in case PROMPT_COMMAND already has some other value, like is default on Debian for example. Here's my full .bashrc file, note the += and starting ; in the second assignment of $PROMPT_COMMAND.

The astute reader will have noticed that this will only work for existing shells when a prompt is displayed, meaning you might need to just press enter at an existing prompt (to force a new one) after logging in the second time to get the values loaded. But that's a small enough burden, right?

So, with these two components, you'll be able to optimally use your long-running screen sessions, even when your X sessions are not so stable ;-)

Additionally, this stuff also allows you to use your faithful daemons when you SSH into the machine. I use this so I can start GUI programs from another machine (in particular, to open up attachments from my email client which runs on a server somewhere). See my recent blogpost about setting that up. However, since running a command through SSH non-interactively never shows a prompt and thus never evaluates $PROMPT_COMMAND, you'll need to manually source the variables at once in your .bashrc directly. I do this at the top of my ~/.bashrc.

Man, I need to learn how to writer shorter posts...

0 comments -:- permalink -:- 15:26
xauth breaking X11 forwarding over SSH

This morning, I was trying to enable X forwarding, to run applications on my server (where I have GHC available) to my local workstation (where I have an X server running). The standard way to do this, is to use SSH with the -X option. However, this didn't work for me:

mkooijma@ewi1246:~> ssh -X kat
Last login: Wed May 20 13:48:13 2009 from
matthijs@katherina:~$ xclock
X11 connection rejected because of wrong authentication.

Running ssh with -vvv showed me another hint:

debug2: X11 connection uses different authentication protocol.

It turned out this problem was caused by some weird entries in my .Xauthority file, which contains tokens to authenticate to X servers. The entries in the file can be queried with the xauth command:

matthijs@katherina:~$ xauth list
#ffff##:  MIT-MAGIC-COOKIE-1  00000000000000000000000000000000
#ffff##:  XDM-AUTHORIZATION-1  00000000000000000000000000000000
localhost/unix:10  MIT-MAGIC-COOKIE-1  00000000000000000000000000000000

(I replaced the actual authentication keys with zeroes here). The last entry is the useful one. It is the proxy key added by ssh when I logged in. That is the one it should send over the ssh forwarded X connection (where ssh will replace it with the actual key, this is called authentication spoofing). However, I found that for some reason X clients were sending the XDM-AUTHORIZATION-1 key instead (hence the "different authentication protocol" message), causing the connection to fail.

I've solved the issue by removing the #ffff## entries from the .Xauthority file (but since I couldn't just run xauth remove #ffff#, I turned it around by readding only the one I wanted:

matthijs@katherina:~$ rm ~/.Xauthority
matthijs@katherina:~$ xauth add localhost/unix:10  MIT-MAGIC-COOKIE-1  00000000000000000000000000000000

I'm still not sure what these #ffff## entries do or mean (I suspect xdm has added them, since I am running xdm on this machine), but I've made inquiries on the xorg list.

As a last note: If you want to use X forwarding and enable the GLX protocol extensions for OpenGL rendering, you need to disable security checks in the X forwarding, by running ssh -Y instead of ssh -X.

0 comments -:- permalink -:- 16:38
Awesome window manager


Recently, I switched Window manager again. I still haven't found the window manager that really works for me (having run Ratpoison, Enlightenment, ion3 and wmii in the past), perhaps I should write my own. Anyway, a while back a friend of mine recommended XMonad, a Haskell based window manager. Even though having a functionally programmed window manager is cool, I couldn't quite find my way around in it (especially it's DIY statusbar approach didn't really suit me).

While googling around for half-decent statusbar setups for XMonad, I came across the Awesome window manager. It's again a tiling window manager, as are most of the ones I have been using lately, but it just made this cool impression on me (in particular the clock in the statusbar, which is by default just a counter of type time_t, ie number of seconds since the epoch. Totally unusable and I replaced it already, but cool).

Awesome allows you to tag windows, possibly with multiple tags and then show one tag at a time. (which is similar to having workspaces, but slightly more powerful). But, one other feature which looks promising, is that it can show multiple tags at once. So you can quickly peak at your IRC screen, without losing focus of your webbrowser, for example. I'm not yet quite used to this setup, so I'm not sure if it is really handy, but it looks promising. I will need to fix my problems with terminal resizes and screen for this to work first, though....

0 comments -:- permalink -:- 19:09
USB Suspend breaking USB Mass storage

I've recently been fiddling around with my kernel, configuring most options as modules to enable faster dehibernation (which seems to work). Anyway, afterwards I could not longer mount my USB stick. For some reason, USB Mass storage support got broken. The USB device was properly recognized, but there was no SCSI device generated.

[   62.166775] usb 2-1: new full speed USB device using ohci_hcd and address 2
[   62.455087] scsi0 : SCSI emulation for USB Mass Storage devices
[   62.460703] usb-storage: device found at 2
[   62.460707] usb-storage: waiting for device to settle before scanning
[   65.047735] usb 2-1: reset full speed USB device using ohci_hcd and address 2
[   65.409629] usb 2-1: reset full speed USB device using ohci_hcd and address 2
[   65.769525] usb 2-1: reset full speed USB device using ohci_hcd and address 2
[   66.041867] usb-storage: device scan complete

To cut a long story (and debugging session) short: The problem was a kernel option "USB Selective suspend/resume and wakeup" (CONFIGUSBSUSPEND) that I enabled in the process (it sounded useful). Disabling this option made everything work again. Yay!

[  629.199144] usb 1-1: new full speed USB device using ohci_hcd and address 4
[  629.786207] Initializing USB Mass Storage driver...
[  629.787051] scsi0 : SCSI emulation for USB Mass Storage devices
[  629.789039] usb-storage: device found at 4
[  629.789043] usb-storage: waiting for device to settle before scanning
[  629.789055] usbcore: registered new driver usb-storage
[  629.789058] USB Mass Storage support registered.
[  632.291508]   Vendor: USB-DISK  Model: FreeDik-FlashUsb  Rev: 1.06
[  632.291519]   Type:   Direct-Access                      ANSI SCSI revision: 00
[  632.302503] SCSI device sda: 32736 512-byte hdwr sectors (17 MB)
[  632.305490] sda: Write Protect is off
[  632.305493] sda: Mode Sense: 0b 00 00 08
[  632.305496] sda: assuming drive cache: write through
[  632.312994] SCSI device sda: 32736 512-byte hdwr sectors (17 MB)
[  632.315988] sda: Write Protect is off
[  632.315992] sda: Mode Sense: 0b 00 00 08
[  632.315994] sda: assuming drive cache: write through
[  632.325982] SCSI device sda: 32736 512-byte hdwr sectors (17 MB)
[  632.328979] sda: Write Protect is off
[  632.328982] sda: Mode Sense: 0b 00 00 08
[  632.328984] sda: assuming drive cache: write through
[  632.329014]  sda: unknown partition table
[  632.334028] sd 0:0:0:0: Attached scsi removable disk sda
[  632.335561] usb-storage: device scan complete

0 comments -:- permalink -:- 19:53
Copyright by Matthijs Kooijman