"Past performance is no guarantee of future results"


About this blog

These are the ramblings of Matthijs Kooijman, concerning the software he hacks on, hobbies he has and occasionally his personal life.

Most content on this site is licensed under the WTFPL, version 2 (details).

Powered by Blosxom & Perl
(With plugins: config, extensionless, hide, tagging, Markdown, macros, breadcrumbs, calendar, directorybrowse, entries_index, feedback, flavourdir, include, interpolate_fancy, listplugins, menu, pagetype, preview, seemore, storynum, storytitle, writeback_recent, moreentries)
Bouncing packets: Kernel bridge bug or corner case?


While setting up Tika, I stumbled upon a fairly unlikely corner case in the Linux kernel networking code that prevented some of my packets from being delivered to the right place. After quite some digging through debug logs and kernel source code, I found the cause of this problem in the way the bridge module handles netfilter and iptables.

Just in case someone else finds themselves in this situation and actually manages to find this blog post, I'll detail my setup, the problem and its solution here.

Tika's network setup

Tika runs Debian wheezy, with a single network interface to the internet (which is not involved in this problem). Furthermore, Tika runs a number of LXC containers, which are isolated systems sharing the same kernel, but running a complete userspace of their own. Using kernel namespaces and cgroups, these containers obtain a fair degree of separation: each of them has its own root filesystem, a private set of mounted filesystems, separate user ids, separate network stacks, etc.

Each of these containers then connects to the outside world using a virtual ethernet (veth) device. This is sort of like a named pipe, but for Ethernet: each veth device has two ends, one inside the container and one outside, which are connected to each other. On the inside, it just looks like each container has a single Ethernet device, which is configured normally. On the outside, all of these veth interfaces are grouped together into a bridge device, br-lxc, which allows the containers to talk amongst themselves (just as if they were connected to the same Ethernet switch). The bridge device in the host is configured with an IP address as well, to allow communication between the host and the containers.
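To make this topology a bit more concrete, here is a small Python sketch that prints the iproute2 commands for an equivalent bridge-plus-veth setup. It is an illustration, not how Tika was actually configured: LXC normally creates the veth pairs and moves one end into the container by itself, and the port names veth-web / veth-web-c and the 10.0.3.1/24 address are made up.

```python
import shlex


def bridge_setup_commands(bridge, host_cidr, veth_pairs):
    """Return iproute2 commands that create a bridge and attach veth pairs to it.

    veth_pairs maps the host-side veth name to the name of its peer, which
    would later be moved into a container. All names here are illustrative.
    """
    cmds = [
        ["ip", "link", "add", bridge, "type", "bridge"],
        # Give the host an address on the bridge so it can talk to the containers
        ["ip", "addr", "add", host_cidr, "dev", bridge],
        ["ip", "link", "set", bridge, "up"],
    ]
    for host_end, container_end in veth_pairs.items():
        # A veth pair: frames sent into one end come out of the other
        cmds.append(["ip", "link", "add", host_end, "type", "veth",
                     "peer", "name", container_end])
        # Attach the host-side end to the bridge, like plugging it into a switch
        cmds.append(["ip", "link", "set", host_end, "master", bridge])
        cmds.append(["ip", "link", "set", host_end, "up"])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them
    for cmd in bridge_setup_commands("br-lxc", "10.0.3.1/24",
                                     {"veth-web": "veth-web-c"}):
        print(shlex.join(cmd))
```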

Now, I have a few port forwarding rules: when traffic comes in on my public IP address on specific ports, it gets forwarded to a specific container. There is nothing special about this, this is just like forwarding ports to LAN hosts on a NAT router.

A problem with port forwarding like this is that by default, packets coming in from the internal side cannot be properly handled. As an example, one of the containers is running a webserver, which serves a custom Debian repository on the domain. When another container tries to connect to that, DNS resolution will give it the external IP of Tika, but connecting to that IP fails.

Usually, the DNAT rule used for port forwarding is configured to only process packets from the external network. But even if it did process internal packets, it would not work. The DNAT rule changes the destination address of these packets to point to my web container, so they get sent to the web container. However, the source address is unchanged. Since the containers have a direct connection (through the network bridge), reply packets get sent directly to the original container, and the host has no chance to "undo" the DNAT on the reply packets. The original container then receives a reply from the web container's internal address, while it expects one from the public IP, so the reply does not match its connection. For external connections, this is not a problem, because the host is the default gateway for the containers and the replies need to go through the host to reach the external IP.
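The reply path problem can be shown with a tiny Python model. It is a sketch under simplifying assumptions: the addresses are hypothetical (taken from documentation ranges), the host is assumed to DNAT internal traffic too, and "directly reachable over the bridge" is approximated by a string prefix check. The client only accepts a reply from the address it connected to, which is exactly what breaks when the reply skips the host.

```python
from dataclasses import dataclass, replace

PUBLIC_IP = "198.51.100.7"   # hypothetical public address of the host
WEB_IP = "10.0.3.10"         # hypothetical address of the web container
BRIDGE_PREFIX = "10.0.3."    # crude stand-in for "directly reachable over br-lxc"


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def host_dnat(pkt):
    """Port forward: rewrite the destination of traffic aimed at the public IP."""
    if pkt.dst == PUBLIC_IP:
        return replace(pkt, dst=WEB_IP)
    return pkt


def web_reply(request):
    """The web container answers; return the reply as the client receives it."""
    reply = Packet(src=request.dst, dst=request.src)
    if reply.dst.startswith(BRIDGE_PREFIX):
        # On-link destination: sent straight across the bridge, bypassing the host
        return reply
    # Off-link destination: sent to the default gateway (the host), whose
    # connection tracking undoes the DNAT by restoring the public source address
    return replace(reply, src=PUBLIC_IP)


def connection_works(client_ip):
    request = Packet(src=client_ip, dst=PUBLIC_IP)
    reply = web_reply(host_dnat(request))
    # The client only accepts replies coming from the address it connected to
    return reply.src == request.dst


if __name__ == "__main__":
    print("external client:", connection_works("203.0.113.5"))
    print("other container:", connection_works("10.0.3.20"))
```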

The most common solution to this is split-horizon DNS - make sure that all these domains resolve to the internal address of the web container, so no port forwarding is needed. For various practical reasons, this didn't work for me, so I settled for the other solution: Apply SNAT in addition to DNAT, which causes the source address of the forwarded packets to be changed to the host's address, forcing replies to pass through the host. The Vuurmuur firewall I was using even had a special "bounce" rule for exactly this purpose (setting up a DNAT and SNAT iptables rule).
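For reference, a DNAT + SNAT pair like the one described above could look roughly like the following iptables rules, generated here by a small Python helper. This is a sketch, not the rule set Vuurmuur actually generates; the addresses, the port and the helper name are hypothetical. In POSTROUTING the destination has already been rewritten by DNAT, so the SNAT rule matches on the container's address, and the conntrack DNAT state limits it to connections that actually went through the port forward.

```python
import shlex


def bounce_rules(public_ip, port, target_ip, lan_cidr, host_lan_ip, proto="tcp"):
    """Return iptables argument lists for a DNAT + SNAT ("bounce") port forward."""
    port = str(port)
    # Send traffic for public_ip:port to the container instead
    dnat = ["iptables", "-t", "nat", "-A", "PREROUTING",
            "-d", public_ip, "-p", proto, "--dport", port,
            "-j", "DNAT", "--to-destination", f"{target_ip}:{port}"]
    # For DNATed connections coming from the internal network, also rewrite
    # the source to the host's bridge address, so replies come back via the host
    snat = ["iptables", "-t", "nat", "-A", "POSTROUTING",
            "-s", lan_cidr, "-d", target_ip, "-p", proto, "--dport", port,
            "-m", "conntrack", "--ctstate", "DNAT",
            "-j", "SNAT", "--to-source", host_lan_ip]
    return [dnat, snat]


if __name__ == "__main__":
    for rule in bounce_rules("198.51.100.7", 80, "10.0.3.10",
                             "10.0.3.0/24", "10.0.3.1"):
        print(shlex.join(rule))
```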

This setup worked perfectly when connecting to the web container from other containers. However, when the web container tried to connect to itself (through the public IP address), the packets got lost. I initially thought the packets were dropped: they went through the PREROUTING chain as normal, but never showed up in the FORWARD chain. I also thought the problem was caused by the packet having the same source and destination addresses, since packets coming from other containers worked as normal. Neither of these turned out to be true, as I'll show below.

Simplifying the setup

Since reproducing the problem on a different and/or simpler setup is always a good approach in debugging, I tried to reproduce the problem on my laptop, using a (single) regular Ethernet device and applying DNAT and SNAT rules. This worked as expected, but when I added a bridge interface containing just the Ethernet interface, it broke again. Adding a second (VLAN) interface to the bridge revealed that the problem was not traffic DNATed back to its source, but rather traffic DNATed back to the same bridge port it originated from: traffic DNATed from one bridge port to the other worked normally.

Digging into the kernel sources for the bridge module, I uncovered this piece of code, which applies special handling specifically to DNATed packets on a bridge. It seems this is either a performance optimization, or a way to allow DNATing packets inside a bridge without having to enable full routing, though I find the exact effects of this code rather confusing.

I also found that setting the bridge device to promiscuous mode (e.g. running tcpdump) makes everything work. Setting /proc/sys/net/bridge/bridge-nf-call-iptables to 0 also makes this work. This setting is to prevent bridged packets from passing through iptables, but since this packet wasn't actually a bridged packet before PREROUTING, this actually makes the packet be processed using the normal routing code and progresses through all regular chains normally.

Here's what I think happens:

  • The packet comes in through br_handle_frame
  • The frame gets dumped into the NF_BR_PRE_ROUTING netfilter chain (i.e. the bridge / ebtables version, not the IP / iptables one).
  • The ebtables rules get called
  • The br_nf_pre_routing hook for NF_BR_PRE_ROUTING gets called. This interrupts (returns NF_STOLEN) the handling of the NF_BR_PRE_ROUTING chain, and calls the NF_INET_PRE_ROUTING chain.
  • The br_nf_pre_routing_finish finish handler gets called after completing the NF_INET_PRE_ROUTING chain.
  • This handler resumes the handling of the interrupted NF_BR_PRE_ROUTING chain. However, because it detects that DNAT has happened, it sets the finish handler to br_nf_pre_routing_finish_bridge instead of the regular br_handle_frame_finish finish handler.
  • br_nf_pre_routing_finish_bridge runs, sets skb->dev to the parent bridge, sets the BRNF_BRIDGED_DNAT flag and calls neigh->output(neigh, skb), which presumably resolves to one of the neigh_*output functions, each of which in turn calls dev_queue_xmit, which should (eventually) call br_dev_xmit.
  • br_dev_xmit sees the BRNF_BRIDGED_DNAT flag and calls br_nf_pre_routing_finish_bridge_slow instead of actually delivering the packet.
  • br_nf_pre_routing_finish_bridge_slow sets up the destination MAC address, sets skb->dev back to skb->physindev and calls br_handle_frame_finish.
  • br_handle_frame_finish calls br_forward. If the bridge device is set to promiscuous mode, this also delivers the packet up through br_pass_frame_up. Since enabling promiscuous mode fixes my problem, it seems likely that the packet manages to get all the way to here.
  • br_forward calls should_deliver, which returns false when skb->dev != p->dev (and "hairpin mode" is not enabled) causing the packet to be dropped.
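To check my understanding of the last few steps, here is a toy Python model of the decision made at the end of this chain. It is not kernel code: the function names mirror the kernel functions above, but the data structures are invented, and the should_deliver condition is written down from memory of the 3.x-era net/bridge/br_forward.c (hairpin mode or a different device, and the port must be forwarding), so check it against your own kernel source. It also does not model what happens to the frame after it is passed up in promiscuous mode.

```python
from dataclasses import dataclass


@dataclass
class BridgePort:
    dev: str
    hairpin: bool = False      # models the BR_HAIRPIN_MODE flag
    forwarding: bool = True    # models port state == BR_STATE_FORWARDING


def should_deliver(port, skb_dev):
    # Deliver only if hairpin mode is on or the frame is leaving through a
    # different device than it arrived on, and the port is forwarding
    return (port.hairpin or skb_dev != port.dev) and port.forwarding


def dnated_frame_destinations(in_dev, target_port, bridge_promisc=False):
    """Where a bridged, DNATed frame ends up in this simplified model.

    in_dev: bridge port the frame arrived on (skb->physindev).
    target_port: bridge port behind which the new destination MAC lives.
    """
    # br_nf_pre_routing_finish_bridge_slow: skb->dev = skb->physindev
    skb_dev = in_dev
    destinations = []
    # br_handle_frame_finish: a promiscuous bridge also passes the frame up
    if bridge_promisc:
        destinations.append("local stack")
    # br_forward -> should_deliver
    if should_deliver(target_port, skb_dev):
        destinations.append(target_port.dev)
    return destinations
```

In this model, dnated_frame_destinations("veth-web", BridgePort("veth-web")) is the failing case: the web container connecting to itself, so the frame would have to leave through the port it arrived on, and it goes nowhere.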

This seems like a bug, or at least an unfortunate side effect. There currently seem to be two ways to work around this problem:

  • Setting /proc/sys/net/bridge/bridge-nf-call-iptables to 0, so there is no need for this DNAT + bridge stuff. The side effect of this solution is that bridged packets don't go through iptables, but that's really what I'd have expected in the first place, so this is not a problem for me.
  • Setting the bridge port to "hairpin" mode, which allows the bridge to send packets back out of the port they arrived on. The side effect here is, AFAICS, that broadcast packets are sent back into the bridge port as well, which isn't really needed (but shouldn't really hurt either).
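Both workarounds come down to writing a single value to a file under /proc or /sys. A minimal Python sketch, assuming the bridge netfilter sysctl exists (it only appears when the bridge netfilter code is loaded) and that veth-web is the (hypothetical) bridge port behind which the DNAT target lives; the root parameter only exists so the functions can be tried against a scratch directory. Neither setting survives a reboot, and the hairpin flag is lost whenever the port is recreated, e.g. when the container restarts.

```python
from pathlib import Path


def disable_bridge_nf_call_iptables(root="/"):
    """Workaround 1: stop bridged IPv4 traffic from going through iptables."""
    path = Path(root, "proc/sys/net/bridge/bridge-nf-call-iptables")
    path.write_text("0\n")
    return path


def enable_hairpin(port, root="/"):
    """Workaround 2: let the bridge send frames back out of the port they came in on."""
    path = Path(root, "sys/class/net", port, "brport/hairpin_mode")
    path.write_text("1\n")
    return path


if __name__ == "__main__":
    # Needs root; pick one of the two workarounds
    disable_bridge_nf_call_iptables()
    # enable_hairpin("veth-web")   # hypothetical port name
```

From a shell, the same can be done with sysctl -w net.bridge.bridge-nf-call-iptables=0 or brctl hairpin br-lxc veth-web on.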

Next up is reporting this to a kernel mailing list to confirm if there is an actual kernel bug, or just a bug in my expectations :-)

Update: It turns out this behaviour was spotted before, but no consensus about a fix was reached.


0 comments -:- permalink -:- 18:40
Introducing Tika


(This post has been lying around as a draft for a few years, thought I'd finish it up and publish it now that Tika has finally been put into production)

A few years back, I purchased a new server together with some friends, which we've named "Tika" (daughter of "Tita Tovenaar", both wizards from a Dutch television series from the '70s). This name combines Daenney's "wizards and magicians" naming scheme with my "television shows from my youth" naming scheme quite neatly. :-)

It's a Supermicro 5015A rack server sporting an Atom D510 dual-core processor, 4GB of RAM, 500GB of HD storage and a recently added 128GB of SSD storage. It is intended to replace Drsnuggles, my current HP DL360G2 (which has been very robust and loyal so far, but just draws too much power), as well as Daenney's Zeratul, an Apple Xserve. Both of our current machines draw around 180W, versus just around 20-30W for Tika. :-D You've got to love the Atom processor (and it probably outperforms our current hardware anyway, just by being over 5 years newer...).

Over the past three years, I've been working together with Daenney and Bas on setting up the software stack on Tika, which proved a bit more work than expected. We wanted to have a lot of cool things, like LXC containers, privilege separation for web applications, a custom LDAP schema and a custom web frontend for user (self-)management, etc. Me being the perfectionist I am, it took quite some effort to get things done, also producing quite a number of bug reports, patches and custom scripts in the process.

Last week, we finally put Tika into production. My previous server, drsnuggles, had a hardware breakdown, which forced me to wrap up Tika's configuration into something usable (which still took me a week, since I seem to be unable to compromise on perfection...). So now my e-mail, websites and IRC are working as expected on Tika, with the stuff from Bas and Daenney still needing to be migrated.

I also still have some draft postings lying around about Maroesja, the custom LDAP schema / user management setup we are using. I'll try to wrap those up in case others are interested. The user management frontend we envisioned hasn't been written yet, but we'll soon tire of manual LDAP modification and get to that, I expect :-)

0 comments -:- permalink -:- 14:10
JTAG and SPI headers for the Pinoccio Scout


The Pinoccio Scout is a wonderful Arduino-like microcontroller board that has built-in mesh networking, a small form factor and a ton of resources (at least in Arduino terms: 32K of SRAM and 256K of flash).

However, flashing a new program into the scout happens through a serial port at 115200 baud. That's perfectly fine when you only have 32K of flash or for occasional uploads. But when you upload a 100k+ program dozens of times per day, it turns out that that's actually really slow! Uploading and verifying a 104KiB sketch takes over 30 seconds, just too long to actually wait for it (so you do something else, get distracted, and gone is the productivity).
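A quick back-of-the-envelope calculation in Python shows where the time goes, assuming 8N1 serial framing (10 bit times per byte: start bit, 8 data bits, stop bit) and counting only raw wire time, without any protocol overhead. Under those assumptions, writing and reading back 104 KiB already costs about 18.5 seconds; the rest of the observed 30+ seconds is presumably protocol overhead and round trips.

```python
def serial_seconds(num_bytes, baud=115200, bits_per_byte=10):
    """Raw wire time for num_bytes on an async serial link (8N1 = 10 bits/byte)."""
    return num_bytes * bits_per_byte / baud


sketch = 104 * 1024                  # the 104 KiB sketch
upload = serial_seconds(sketch)      # writing the flash
verify = serial_seconds(sketch)      # reading it back to verify
print(f"upload {upload:.1f}s + verify {verify:.1f}s = {upload + verify:.1f}s")
# prints: upload 9.2s + verify 9.2s = 18.5s
```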


0 comments -:- permalink -:- 18:01
Using a JTAGICE3 programmer under Linux: Setting up permissions


Last week, I got a fancy new JTAGICE3 programmer / debugger. I wanted to achieve two things in my Pinoccio work: faster uploading of programs (having 256k of flash space is nice, but flashing so much code through a 115200 baud serial connection is slow...) and doing in-circuit debugging (stepping through code and dumping variables should be easier than adding serial prints and re-uploading every time).

In any case, the JTAGICE3 device is well-supported by avrdude, the open source uploader for AVR boards. However, unlike devices such as the STK500 development board, the AVR Dragon programmer/debugger and the Arduino bootloader, which use an (emulated) serial port to communicate, the JTAGICE3 uses a native USB protocol. The upside is that the data transfer rate is higher, but the downside is that the kernel doesn't know how to talk to the device, so it doesn't expose something like /dev/ttyUSB0 as it does for the other devices.

avrdude solves this by using libusb, which can talk to USB devices directly through the device files in /dev/bus/usb/. However, by default these device files are writable only by root, since the kernel has no idea what kind of devices they are and who should get access to them.

To solve this, we'll have to configure the udev daemon to create the files in /dev/bus/usb with the right permissions. I created a file called /etc/udev/rules.d/99-local-jtagice3.rules, containing just this line:

SUBSYSTEM=="usb", ATTRS{idVendor}=="03eb", ATTRS{idProduct}=="2110", GROUP="dialout"

This matches the JTAGICE3 specifically using its USB vendor and product ID (03eb:2110; use lsusb to find the ID of a given device) and changes the group for the device file to dialout (which is also used for serial devices on Debian Linux), but you might want to use another group (in any case, don't forget to add your own user to that group and log in again).
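Since the rule only needs the vendor and product ID plus a group, it can be derived from a line of lsusb output. A small Python sketch; the sample lsusb line is illustrative, and the optional MODE is my own addition, not part of the rule above: depending on your distribution's default udev rules, USB device nodes may be created with mode 0664, in which case changing only the group does not give the group write access and MODE="0660" may also be needed.

```python
import re

LSUSB_ID = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def udev_rule_from_lsusb(line, group="dialout", mode=None):
    """Build a udev rule giving `group` access to the device on one lsusb line."""
    match = LSUSB_ID.search(line)
    if match is None:
        raise ValueError(f"no vendor:product ID in {line!r}")
    vid, pid = (part.lower() for part in match.groups())
    # Doubled braces produce the literal ATTRS{...} that udev expects
    rule = (f'SUBSYSTEM=="usb", ATTRS{{idVendor}}=="{vid}", '
            f'ATTRS{{idProduct}}=="{pid}", GROUP="{group}"')
    if mode is not None:
        rule += f', MODE="{mode}"'
    return rule


if __name__ == "__main__":
    print(udev_rule_from_lsusb("Bus 002 Device 005: ID 03eb:2110 Atmel Corp."))
```

After adding a rule, running udevadm control --reload-rules and replugging the programmer should make the new permissions apply.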

0 comments -:- permalink -:- 13:57
Dynamic memory allocation debugging


While trying to track down a reset bug in the Pinoccio firmware, I suspected something was going wrong in the dynamic memory management (e.g. a double free or a buffer overflow). To check this, I wrote some code to log all malloc, realloc and free calls, as well as a Python script to analyze the output.

This didn't catch my bug, but perhaps it will be useful to someone else.

In addition to all function calls, it also logs the free memory after each call and shows the return address (i.e. where the malloc is called from) to help debugging.

It uses the linker's --wrap option, which allows replacing arbitrary functions with wrappers at link time. To use it with Arduino, you'll have to modify platform.txt to change the linker options (I hope to improve this on the Arduino side at some point, but right now this seems to be the only way to do this).
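The post does not include the code or the log format, so what follows is not the original script, just a minimal Python sketch of the analysis side under an assumed log format: one line per call, like malloc <size> -> <ptr>, realloc <old> <size> -> <new> and free <ptr>, with any extra fields (free memory, return address) ignored. For context, GNU ld's --wrap=malloc (passed as -Wl,--wrap=malloc through gcc) makes undefined references to malloc resolve to __wrap_malloc, while __real_malloc refers to the original function, which is what lets logging wrappers call through to the real allocator. The sketch flags frees of pointers that are not currently allocated (such as double frees) and reports blocks that are still live at the end.

```python
def analyze(lines):
    """Track live allocations in an (assumed) malloc/realloc/free log.

    Returns (live, problems): live maps pointer -> (size, line number of the
    call that allocated it); problems lists suspicious frees and reallocs.
    """
    live = {}
    problems = []
    for n, line in enumerate(lines, 1):
        tok = line.split()
        if not tok:
            continue
        op = tok[0]
        if op == "malloc":                      # malloc <size> -> <ptr>
            size, ptr = int(tok[1]), int(tok[3], 16)
            if ptr:                             # NULL means the allocation failed
                live[ptr] = (size, n)
        elif op == "realloc":                   # realloc <old> <size> -> <new>
            old, size, new = int(tok[1], 16), int(tok[2]), int(tok[4], 16)
            if not new:
                continue                        # assume it failed; old block stays live
            if old and live.pop(old, None) is None:
                problems.append(f"line {n}: realloc of unknown pointer {old:#x}")
            live[new] = (size, n)
        elif op == "free":                      # free <ptr>
            ptr = int(tok[1], 16)
            if ptr and live.pop(ptr, None) is None:   # free(NULL) is a no-op
                problems.append(f"line {n}: free of unknown or already freed {ptr:#x}")
    return live, problems


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:
        live, problems = analyze(log)
    for problem in problems:
        print(problem)
    for ptr, (size, n) in sorted(live.items()):
        print(f"still allocated: {ptr:#x} ({size} bytes, from line {n})")
```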

0 comments -:- permalink -:- 21:47
Changing the gdm3 (login screen) background in Gnome3


I upgraded to Gnome3 this week, and after half a day of debugging I got my (quite non-standard) setup working completely again. One of the things that got broken was my custom wallpaper on the gdm3 login screen. This used to be configured in /etc/gdm3/greeter.gconf.defaults, but apparently Gnome3 replaced gconf with this new "gsettings" thingy.

Anyway, to change the desktop background in gdm, add the following lines to /etc/gdm3/greeter.gsettings:


For reference, I also found another method, which looks a lot more complicated. I suspect it also doesn't work in Debian, which runs gdm as root, not as a separate "gdm" user. Systems that do use such a user might need the more complicated method, I guess (which probably ends up storing the settings somewhere in the home directory of the gdm user...).

0 comments -:- permalink -:- 12:19
Thinkpad X201 mute button breaking speaker output


Recently, I was having some problems with the internal speakers on my Lenovo Thinkpad X201. Three times now, the internal speakers just stopped producing sound. The headphone jack worked, it's just the speakers which were silent. Nothing helped: fiddling with volume controls, reloading alsa modules, rebooting my laptop, nothing fixed the sound...

When checking whether the speakers were physically broken, I discovered that booting into Windows actually fixed the problem and restored the sound from the speakers. It's of course a bit of a defeat to accept Windows as a fix for my problem, but I was busy with other things, so it sufficed for a while.

When migrating my laptop to my new Intel SSD, I broke my Windows installation, so when the problem occurred again, I had no choice but to actually investigate it.

I'll skip right to the conclusion here: I had broken my sound by pressing the mute button on my keyboard... Now, before you think I'm stupid, I had of course checked my volume controls and the device really was unmuted! But it turns out the mute button in Thinkpads combined with Linux is a bit weird...

This is how you would expect a mute button to be implemented: You press the mute button, it sends a keypress to the operating system, which then tells the audio driver to mute.

X201 volume buttons

This is how it works on my Thinkpad: you press the mute button, causing the EC (embedded controller) in the Thinkpad to directly mute the speakers. This is not visible from the normal volume controls in the software, since it happens at a very low level (though the thinkpad_acpi kernel module can be used to expose this special mute state through a /proc interface and a special audio device).

In addition to muting the speakers, it also sends a MUTE ACPI keypress to the operating system. This keypress then causes the audio driver to mute the audio stream (actually, it's PulseAudio that does that).

Now, here's the fun part: if you now unmute the audio stream through the software volume controls, everything looks like it should work, but the hardware is still muted! It never occurred to me to press the mute button again, since the volume wasn't muted (or at least didn't look like it).
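To see the hidden hardware mute state from Linux, the thinkpad_acpi /proc interface mentioned above can be read. A small Python sketch, assuming the file is /proc/acpi/ibm/volume and contains lines like level: 7 and mute: on (the exact contents differ between thinkpad_acpi versions, so treat the parsing as an assumption); the second function just spells out why unmuting only in software is not enough.

```python
def parse_ibm_volume(text):
    """Parse thinkpad_acpi's volume file into {'level': int, 'mute': bool}."""
    state = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "level":
            state["level"] = int(value)
        elif key == "mute":
            state["mute"] = value == "on"
    return state


def speakers_audible(hw_mute, sw_mute):
    # Two independent mutes: the EC's hardware mute and the software one
    # (ALSA / PulseAudio). Sound only comes out when neither is active.
    return not hw_mute and not sw_mute


if __name__ == "__main__":
    with open("/proc/acpi/ibm/volume") as f:
        state = parse_ibm_volume(f.read())
    print(state, "hardware muted!" if state.get("mute") else "")
```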

I originally thought that the mute button handling was even more complex, when I found some register polling code that faked keypresses, but it seems that's only for older Thinkpads (phew!).

In any case, the bottom line is: if you have a Thinkpad whose speakers suddenly stop working, try pressing the mute button!

0 comments -:- permalink -:- 00:13
Opening attachments on another machine from within mutt

For quite a few years now, I've been using Mutt as my primary email client. It's a very nice text-based email client that is permanently running on my server (named drsnuggles). This allows me to connect to my server from anywhere, attach to the running Screen session and always get exactly the same, highly customized mail interface (some people will say that their webmail interfaces allow for exactly the same, but in my experience webmail is always clumsy and slow compared to a decent, well-customized text-based client when processing a couple of hundred emails per day).

Attachment troubles

So I like my mutt / screen setup. However, there has been one particular thing that didn't work quite as efficiently: attachments. Whenever I wanted to open an email attachment, I needed to save the attachment within mutt to some place that was shared through HTTP, make the file world-readable (mutt insists on not making your saved attachments world-readable), browse to some URL on the local machine and open the attachment. Not very efficient at all.

Yesterday evening I was finally fed up with all this stuff and decided to hack up a fix. It took a bit of fiddling to get it right (and I had nearly started to spend the day coding a patch for mutt when the folks in #mutt pointed out an easier, albeit less elegant "solution"), but it works now: I can select an attachment in mutt, press "x" and it gets opened on my laptop. Coolness.

How does it work?

Just in case anyone else is interested in this solution, I'll document how it works. The big picture is as follows: When I press "x", a mutt macro is invoked that copies the attachment to my laptop and opens the attachment there. There's a bunch of different steps involved here, which I'll detail below.
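The post does not spell out the individual steps here, so this is not my actual implementation, only a hypothetical Python sketch of the copy-and-open part. It assumes the attachment has already been saved to a local file, that the server can reach the laptop over SSH under the made-up host name laptop (for example through a tunnel), and that xdg-open on the laptop can reach the running desktop session (over ssh you may need to set DISPLAY yourself). The file name is sanitized so the remote path needs no quoting on either side.

```python
import posixpath
import re


def remote_open_commands(local_path, host="laptop", remote_dir="/tmp",
                         opener="xdg-open"):
    """Return (copy, show) argument lists to open a saved attachment on `host`.

    host, remote_dir and opener are hypothetical defaults.
    """
    # Replace anything unusual in the file name, so the remote path needs no
    # quoting for scp or for the remote shell that ssh runs the command in
    name = re.sub(r"[^A-Za-z0-9._-]", "_", posixpath.basename(local_path))
    remote_path = posixpath.join(remote_dir, name)
    copy = ["scp", "-q", local_path, f"{host}:{remote_path}"]
    show = ["ssh", host, f"{opener} {remote_path}"]
    return copy, show


if __name__ == "__main__":
    import subprocess
    import sys
    copy, show = remote_open_commands(sys.argv[1])
    subprocess.run(copy, check=True)
    subprocess.run(show, check=True)
```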


0 comments -:- permalink -:- 22:38
Adobe dropped 64 bit Linux support in Flash again

Only recently, Adobe (finally) started to support 64-bit Linux with its Flash plugin. I could finally watch YouTube videos (and, more importantly, do some Flash development work for Brevidius).

However, this month Adobe announced that it is dropping support for 64-bit Linux again. Apparently they "are making significant architectural changes to the 64-bit Linux Flash Player and additional security enhancements", and they can't do that while keeping the old architecture around for stable releases.

This is particularly nasty, because the latest 10.0 version (which still has amd64 support) has a couple of dozen (!) security vulnerabilities which are fixed only in a 10.1 version (which no longer has Linux amd64 support).

So Adobe is effectively encouraging people on amd64 Linux to either not use their product, or use a version with critical security flaws. Right.

0 comments -:- permalink -:- 09:51
URL-encoding in Flash: Be careful of plus signs!


Recently I have been doing some Flash debugging for my work at Brevidius. In a video player we have been developing (based on work done by Jeroen Wijering), we needed to escape a URL parameter, since our Flash code could not be certain what would be in the value (and characters like & and = could cause problems). The obvious way to do this is of course the escape function in ActionScript. This function promises to escape all "non-alphanumeric characters", which would solve all our problems.

However, after implementing this, we found spaces magically appearing in our GET parameters. Upon investigation, it turned out that there were plus signs in our actual values (it's Base64-encoded data, which uses the plus sign). However, the escape function apparently considers the plus sign alphanumeric, since it does not escape it (note that the Flash 10 documentation does document this). That shouldn't be a problem, since a plus sign isn't special in a URL according to RFC1738:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

(Note that RFC3986 does recommend escaping plus signs, since they might be used to separate variables, but that's not the case here).

However, the URLs we generate in Flash point to PHP scripts and thus pass their variables to PHP. Unfortunately, PHP does not adhere to the RFCs strictly: it interprets plus signs in a URL as spaces. Historically, spaces in a URL were replaced by plus signs, while nowadays spaces should really be encoded as %20. There is of course a simple way to get Flash (or any other URL-generating piece of code) to work properly with PHP: simply encode plus signs in your data as %2B (which is the "official" way). This makes sure you get a real plus in your $_GET array in PHP, and the problem is resolved.

After some searching, and asking around in ##swf on Freenode, I found the encodeURIComponent function, which is similar to escape, but does encode the plus sign. If we use this function, we can again send data with plus signs to PHP! And since encoding more than needed is still fine according to the specs, there are no downsides (except that you need Flash >= 9.0).
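The difference is easy to demonstrate in Python, whose parse_qs decodes a + in a query string as a space, much like PHP does when filling $_GET. In this sketch, quote with safe="" stands in for encodeURIComponent (only roughly: encodeURIComponent also leaves a few characters such as ! and * unescaped), and quote with + marked as safe mimics escape leaving the plus sign alone. The value is a made-up Base64-style string.

```python
from urllib.parse import parse_qs, quote

value = "ab+c/d=="   # made-up Base64-style data containing a plus sign

# Leaving '+' unescaped, as ActionScript's escape does ...
naive = "data=" + quote(value, safe="+/")
# ... gets it decoded as a space by a form-style query string parser
print(parse_qs(naive)["data"])    # ['ab c/d==']

# Escaping everything outside the unreserved set turns '+' into %2B
safe = "data=" + quote(value, safe="")
print(safe)                       # data=ab%2Bc%2Fd%3D%3D
print(parse_qs(safe)["data"])     # ['ab+c/d==']
```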

So, if you're developing in Flash, please stop using escape, and use encodeURIComponent instead.

0 comments -:- permalink -:- 00:25
Copyright by Matthijs Kooijman