Glider
"In het verleden behaalde resultaten bieden geen garanties voor de toekomst"
About this blog

These are the ramblings of Matthijs Kooijman, concerning the software he hacks on, hobbies he has and occasionally his personal life.

Most content on this site is licensed under the WTFPL, version 2 (details).

Bouncing packets: Kernel bridge bug or corner case?

Tux

While setting up Tika, I stumbled upon a fairly unlikely corner case in the Linux kernel networking code that prevented some of my packets from being delivered to the right place. After quite some digging through debug logs and kernel source code, I found the cause of this problem in the way the bridge module handles netfilter and iptables.

Just in case someone else finds themselves in this situation and actually manages to find this blogpost, I'll detail my setup, the problem and its solution here.

Tika's network setup

Tika runs Debian wheezy, with a single network interface to the internet (which is not involved in this problem). Furthermore, Tika runs a number of lxc containers, which are isolated systems sharing the same kernel, but running a complete userspace of their own. Using kernel namespaces and cgroups, these containers obtain a fair degree of separation: each of them has its own root filesystem, a private set of mounted filesystems, separate user ids, a separate network stack, etc.

Each of these containers then connects to the outside world using a virtual ethernet (veth) device. This is a sort of named pipe, but for ethernet. Each veth device has two ends, one inside the container and one outside, which are connected to each other. On the inside, it just looks like each container has a single ethernet device, which is configured normally. On the outside, all of these veth interfaces are grouped together into a bridge device, br-lxc, which allows the containers to talk amongst themselves (just as if they were connected to the same ethernet switch). The bridge device in the host is configured with an IP address as well, to allow communication between the host and the containers.
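LXC sets up these veth pairs automatically based on the container configuration, but just to illustrate the idea, a manual version (with made-up interface names, using the bridge-utils and iproute2 tools) would look something like this:

$ brctl addbr br-lxc
$ ip link add veth-web type veth peer name veth-web-host
$ brctl addif br-lxc veth-web-host
$ ip link set veth-web-host up

The veth-web end would then be moved into the container's network namespace and configured there like any other ethernet device.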

Now, I have a few port forwarding rules: when traffic comes in on my public IP address on specific ports, it gets forwarded to a specific container. There is nothing special about this, this is just like forwarding ports to LAN hosts on a NAT router.

A problem with port forwarding like this is that, by default, packets coming in from the internal side cannot be handled properly. As an example, one of the containers runs a webserver, which serves a custom Debian repository on the apt.stderr.nl domain. When another container tries to connect to that, DNS resolution will give it the external IP of Tika, but connecting to that IP fails.

Usually, the DNAT rule used for port forwarding is configured to only process packets from the external network. But even if it did process internal packets, it would not work. The DNAT rule changes the destination address of these packets to point to my web container, so they get sent there. However, the source address is left unchanged. Since the containers have a direct connection (through the network bridge), reply packets get sent directly back to the original container, so the host never gets a chance to "undo" the DNAT on them. For external connections this is not a problem, because the host is the default gateway for the containers and the replies need to pass through the host to reach the external IP.

The most common solution to this is split-horizon DNS - make sure that all these domains resolve to the internal address of the web container, so no port forwarding is needed. For various practical reasons, this didn't work for me, so I settled for the other solution: Apply SNAT in addition to DNAT, which causes the source address of the forwarded packets to be changed to the host's address, forcing replies to pass through the host. The Vuurmuur firewall I was using even had a special "bounce" rule for exactly this purpose (setting up a DNAT and SNAT iptables rule).
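For reference, such a bounce rule boils down to something like the following (with made-up addresses: 192.0.2.1 as the public IP, 10.0.0.10 as the web container and 10.0.0.1 as the host's bridge address; Vuurmuur generates the actual rules, so the details may differ):

$ iptables -t nat -A PREROUTING -d 192.0.2.1 -p tcp --dport 80 -j DNAT --to-destination 10.0.0.10
$ iptables -t nat -A POSTROUTING -d 10.0.0.10 -p tcp --dport 80 -j SNAT --to-source 10.0.0.1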

This setup worked perfectly - when connecting to the web container from other containers. However, when the web container tried to connect to itself (through the public IP address), the packets got lost. I initially thought the packets were dropped - they went through the PREROUTING chain as normal, but never showed up in the FORWARD chain. I also thought the problem was caused by the packet having the same source and destination addresses, since packets coming from other containers worked as normal. Neither of these turned out to be true, as I'll show below.

Simplifying the setup

Since reproducing the problem on a different and/or simpler setup is always a good approach in debugging, I tried to reproduce the problem on my laptop, using a (single) regular ethernet device and applying DNAT and SNAT rules. This worked as expected, but when I added a bridge interface containing just the ethernet interface, it broke again. Adding a second (vlan) interface to the bridge uncovered that the problem was not traffic DNATed back to its source, but rather traffic DNATed back to the same bridge port it originated from - traffic DNATed from one bridge port to the other worked normally.

Digging down into the kernel sources for the bridge module, I uncovered this piece of code, which applies some special handling to exactly these DNATed packets on a bridge. It seems this is either a performance optimization, or a way to allow DNATing packets inside a bridge without having to enable full routing, though I find the exact effects of this code rather confusing.

I also found that setting the bridge device to promiscuous mode (e.g. by running tcpdump on it) makes everything work. Setting /proc/sys/net/bridge/bridge-nf-call-iptables to 0 also makes things work. That setting prevents bridged packets from passing through iptables, but since this packet wasn't actually a bridged packet before PREROUTING, disabling it means the packet gets processed by the normal routing code and progresses through all the regular chains normally.

Here's what I think happens:

  • The packet comes in through br_handle_frame.
  • The frame gets dumped into the NF_BR_PRE_ROUTING netfilter chain (i.e. the bridge / ebtables version, not the ip / iptables one).
  • The ebtables rules get called
  • The br_nf_pre_routing hook for NF_BR_PRE_ROUTING gets called. This interrupts (returns NF_STOLEN) the handling of the NF_BR_PRE_ROUTING chain, and calls the NF_INET_PRE_ROUTING chain.
  • The br_nf_pre_routing_finish finish handler gets called after completing the NF_INET_PRE_ROUTING chain.
  • This handler resumes the handling of the interrupted NF_BR_PRE_ROUTING chain. However, because it detects that DNAT has happened, it sets the finish handler to br_nf_pre_routing_finish_bridge instead of the regular br_handle_frame_finish finish handler.
  • br_nf_pre_routing_finish_bridge runs; it sets skb->dev to the parent bridge device, sets the BRNF_BRIDGED_DNAT flag and calls neigh->output(neigh, skb), which presumably resolves to one of the neigh_*_output functions, each of which in turn calls dev_queue_xmit, which should (eventually) call br_dev_xmit.
  • br_dev_xmit sees the BRNF_BRIDGED_DNAT flag and calls br_nf_pre_routing_finish_bridge_slow instead of actually delivering the packet.
  • br_nf_pre_routing_finish_bridge_slow sets up the destination MAC address, sets skb->dev back to skb->physindev and calls br_handle_frame_finish.
  • br_handle_frame_finish calls br_forward. If the bridge device is set to promiscuous mode, this also delivers the packet up through br_pass_frame_up. Since enabling promiscuous mode fixes my problem, it seems likely that the packet manages to get all the way to here.
  • br_forward calls should_deliver, which returns false when skb->dev != p->dev (and "hairpin mode" is not enabled), causing the packet to be dropped. The relevant check is sketched below.
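For reference, the check in should_deliver boils down to roughly the following (paraphrased from net/bridge/br_forward.c in the kernel version I was reading; the actual code may differ slightly):

static inline int should_deliver(const struct net_bridge_port *p,
                                 const struct sk_buff *skb)
{
        /* Only deliver when the packet did not come in on this same port
         * (unless the port has hairpin mode enabled) and the port is in
         * forwarding state. */
        return (((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
                p->state == BR_STATE_FORWARDING);
}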

This seems like a bug, or at least an unfortunate side effect. There are currently two ways to work around this problem:

  • Setting /proc/sys/net/bridge/bridge-nf-call-iptables to 0, so there is no need for this DNAT + bridge stuff. The side effect of this solution is that bridged packets don't go through iptables, but that's really what I'd have expected in the first place, so this is not a problem for me.
  • Setting the bridge port to "hairpin" mode, which allows sending packets back out of the port they came in on. The side effect here is, AFAICS, that broadcast packets are sent back into the originating bridge port as well, which isn't really needed (but shouldn't really hurt either). Both workarounds are shown as commands below.
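Concretely, each workaround is a single command (run as root; veth-web is just an example name for the bridge port here):

$ echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables
$ echo 1 > /sys/class/net/br-lxc/brif/veth-web/hairpin_mode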

Next up is reporting this to a kernel mailing list to confirm if there is an actual kernel bug, or just a bug in my expectations :-)

Update: Turns out this behaviour had been spotted before, but no consensus about a fix was reached.

Introducing Tika

Tika Tovenaar Supermicro 5015A

(This post has been lying around as a draft for a few years, thought I'd finish it up and publish it now that Tika has finally been put into production)

A few years back, I purchased a new server together with some friends, which we've named "Tika" (daughter of "Tita Tovenaar", both wizards from a Dutch television series from the 70's). This name combines Daenney's "wizards and magicians" naming scheme with my "television shows from my youth" naming scheme quite neatly. :-)

It's a Supermicro 5015A rack server sporting an Atom D510 dual-core processor, 4GB of RAM, 500GB of HD storage and a recently added 128GB of SSD storage. It is intended to replace Drsnuggles, my current HP DL360G2 (which has been very robust and loyal so far, but just draws too much power), as well as Daenney's Zeratul, an Apple Xserve. Both of our current machines draw around 180W, versus just 20-30W for Tika. :-D You've got to love the Atom processor (and it probably outperforms our current hardware anyway, just by being over 5 years newer...).

Over the past three years, I've been working together with Daenney and Bas on setting up the software stack on Tika, which proved a bit more work than expected. We wanted to have a lot of cool things, like LXC containers, privilege separation for web applications, a custom LDAP schema and a custom web frontend for user (self-)management, etc. Me being the perfectionist I am, it took quite some effort to get things done, also producing quite a number of bug reports, patches and custom scripts in the process.

Last week, we finally put Tika into production. My previous server, Drsnuggles, had a hardware breakdown, which forced me to wrap up Tika's configuration into something usable (which still took me a week, since I seem to be unable to compromise on perfection...). So now my e-mail, websites and IRC are working as expected on Tika, with the stuff from Bas and Daenney still needing to be migrated.

I also still have some draft postings lying around about Maroesja, the custom LDAP schema / user management setup we are using. I'll try to wrap those up in case others are interested. The user management frontend we envisioned hasn't been written yet, but we'll soon tire of manual LDAP modification and get to that, I expect :-)

 
JTAG and SPI headers for the Pinoccio Scout

Pinoccio Scout

The Pinoccio Scout is a wonderful Arduino-like microcontroller board that has built-in mesh networking, a small form factor and a ton of resources (at least in Arduino terms: 32K of SRAM and 256K of flash).

However, flashing a new program into the scout happens through a serial port at 115200 baud. That's perfectly fine when you only have 32K of flash or for occasional uploads. But when you upload a 100k+ program dozens of times per day, it turns out that that's actually really slow! Uploading and verifying a 104KiB sketch takes over 30 seconds, just too long to actually wait for it (so you do something else, get distracted, and gone is the productivity).

See more ...

 
Using a JTAGICE3 programmer under Linux: Setting up permissions

JTAGICE3

Last week, I got a fancy new JTAGICE3 programmer / debugger. I wanted to achieve two things in my Pinoccio work: Faster uploading of programs (Having 256k of flash space is nice, but flashing so much code through a 115200 baud serial connection is slow...) and doing in-circuit debugging (stepping through code and dumping variables should turn out easier than adding serial prints and re-uploading every time).

In any case, the JTAGICE3 device is well-supported by avrdude, the open-source uploader for AVR boards. However, unlike devices like the STK500 development board, the AVR Dragon programmer/debugger and the Arduino bootloader, which use an (emulated) serial port to communicate, the JTAGICE3 uses a native USB protocol. The upside is that the data transfer rate is higher, but the downside is that the kernel doesn't know how to talk to the device, so it doesn't expose something like /dev/ttyUSB0 as it does for the other devices.

avrdude solves this by using libusb, which can talk to USB devices directly, through device files under /dev/bus/usb/. However, by default these device files are writable only by root, since the kernel has no idea what kind of devices they are and who should get access to them.

To solve this, we'll have to configure the udev daemon to create the files under /dev/bus/usb with the right permissions. I created a file called /etc/udev/rules.d/99-local-jtagice3.rules, containing just this line:

SUBSYSTEM=="usb", ATTRS{idVendor}=="03eb", ATTRS{idProduct}=="2110", GROUP="dialout"

This matches the JTAGICE3 specifically, using its USB vid:pid (03eb:2110; use lsusb to find the id of a given device), and changes the group of the device file to dialout (which is also used for serial devices on Debian Linux), but you might want to use another group (don't forget to add your own user to that group and log in again, in any case).
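After adding the rule, udev needs to reload its rules and re-trigger the device (or you can simply replug it). On Debian, something like this should do:

$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
$ sudo adduser $USER dialout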

 
Current measurement helper board

Recently, I needed to do battery current draw measurements on my Pinoccio boards. Since the battery is connected using a teeny-tiny JST connector, I couldn't just use some jumper wires to redirect the current flow through my multimeter. I ended up using jumper wires combined with my Bus Pirate fanout cable, which has female connectors just small enough, to wire everything up. The result was a bit of a mess:

Messy setup

Admittedly, once I cleaned up all the other stuff around it from my desk for this picture, it was less messy than I thought, but still, jamming jumper wires into battery connectors like this is bound to wear them out.

So, I ordered some JST FSH connectors (as used by the battery) and some banana sockets and built a simple board that allows connecting a power source and a load, keeping the ground pins permanently connected but feeding the positive pins through a pair of banana sockets where a current meter can plug in. For extra flexibility, I added a few other connections, like 2.54mm header pins and sockets, a barrel jack plug and more banana sockets for the power source and load. I just realized I should also add USB connectors, so I can easily measure the current used by a USB device.

The board also features a switch (after digging in my stash, I found one old three-way switch, which is probably the first component to die in this setup). The switch allows switching between "on", "off" and "redirect through measurement pins" modes. I tried visualizing the behaviour of the pins on the top of the PCB, but I'm not too happy with the result. Oh well, as long as I know what it does :-)

Improved setup (top and bottom of the PCB)

All I need is a pretty case to put under the PCB and a μCurrent to measure small currents accurately and I'm all set!

 
Updating the Xprotolab portable firmware on Linux

Xprotolab Portable

I recently got myself an Xprotolab Portable, which is essentially a tiny, portable 1Msps scope (in hindsight I might have been better off getting the XMinilab Portable, which is essentially the same, but slightly bigger, more expensive and with a bigger display. Given the size of the cables and carrying case, the extra size of the device itself is negligible, while the extra screen size is significant).

In any case, I wanted to update the firmware of the device, but the instructions refer only to a Windows-only GUI utility from Atmel, called "FLIP". I remembered seeing a flip.c file inside the avrdude sources though, so I hoped I could also flash this device using avrdude on Linux. And it worked! Turns out it's fairly simple.

  1. Activate the device's bootloader by powering off, then pressing K1 and keeping it pressed while turning the device back on with the menu key. The red LED should light up and the screen will stay blank.
  2. Get the appropriate firmware hex files from the Xprotolab Portable page. You can find them at the "Hex" link in the top row of icons.
  3. Run avrdude, for both the application and EEPROM contents:

    sudo avrdude -c flip2 -p x32a4u -U application:w:xprotolab-p.hex:i
    sudo avrdude -c flip2 -p x32a4u -U eep:w:xprotolab-p.eep:i
    

    I'm running under sudo, since this needs raw access to the USB device. Alternatively, you can set up udev to offer access to your regular user (like I did for the JTAGICE3), but that's probably too much effort just for a one-off firmware update.

  4. Done!

Note that you have to use avrdude version 6.1 or above, older versions don't support the FLIP protocol.

 
Dynamic memory allocation debugging

Arduino Community Logo

While trying to track down a reset bug in the Pinoccio firmware, I suspected something was going wrong in the dynamic memory management (e.g. a double free, or a buffer overflow). For this, I wrote some code to log all malloc, realloc and free calls, as well as a Python script to analyze the output.

This didn't catch my bug, but perhaps it will be useful to someone else.

In addition to all function calls, it also logs the free memory after each call and shows the return address (i.e. where malloc was called from) to help debugging.

It uses the linker's --wrap option, which allows replacing arbitrary functions with wrappers at link time. To use it with Arduino, you'll have to modify platform.txt to change the linker options (I hope to improve this on the Arduino side at some point, but right now this seems to be the only way to do it).
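To give an idea of how --wrap works, here is a minimal sketch (not the actual logging code linked above, and using printf where the AVR version would print to the serial port): building with -Wl,--wrap=malloc -Wl,--wrap=free makes the linker redirect every malloc and free call to __wrap_malloc and __wrap_free, while the original functions remain reachable as __real_malloc and __real_free.

/* Minimal sketch of wrapping malloc/free with the GNU linker's --wrap
 * option; link with: -Wl,--wrap=malloc -Wl,--wrap=free */
#include <stdlib.h>
#include <stdio.h>

void *__real_malloc(size_t size);
void __real_free(void *ptr);

void *__wrap_malloc(size_t size)
{
    void *ptr = __real_malloc(size);
    /* __builtin_return_address(0) is where malloc was called from */
    printf("malloc(%zu) = %p, called from %p\n",
           size, ptr, __builtin_return_address(0));
    return ptr;
}

void __wrap_free(void *ptr)
{
    printf("free(%p), called from %p\n", ptr, __builtin_return_address(0));
    __real_free(ptr);
}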

 
Twente Mini Maker Faire

Twente Mini Maker Faire Logo

I just returned from the Twente Mini Maker Faire in Hengelo, where I saw a lot of cool makers and things. Eye-catchers were the "strandbeest" from Theo Jansen, a big walking contraption, powered by wind and made from PVC tubing, a host of different hackerspaces and fablabs, all kinds of cool technology projects for kids from the Technology Museum Heim and all kinds of cool buttonsy projects from the enthusiastic Herman Kopinga. All this in the cool industrial atmosphere of an old electrical devices factory.

Also nice to meet some old and new familiar people. I ran into Leo Simons, with whom I played theatre sports at Pro Deo years ago. He is now working with his father and brother on the Portobello, a liquid-resin-based 3D printer, which looked quite promising. I also ran into Edwin Dertien (also familiar from Pro Deo as well as the Gonnagles), who seemed to be on the organising side of some lectures during the faire. I attended one lecture by Harmen Zijp and Diana Wildschut (who form the Amersfoort-based art group "de Spullenmannen") about the overlap and interaction between art and science, which was nice. I also ran into Govert Combée, a LARPer I knew from Enschede.

Strandbeest by Theo Jansen

Overall, this was a nice place to visit. Lots of cool stuff to see and play with, lots of interesting people. There was also a nice balance of technology vs art, and electronics vs "regular" projects: Nice to see that the Maker mindset appeals to all kinds of different people!

 
Uses and requirements

Okay, so I'm gonna build a system to do administration tasks in our LARP club. But what exactly are these tasks? What should this system actually do for us? I've given this question a lot of thought and these are my notes and thoughts, hopefully structured in a useful and readable way. I've had some help from Brenda so far in writing some of these down, but I'll appreciate any comments you can think of (including "hey, wouldn't it be cool if the system could do x?", or "don't you think y is really a bad idea?"). Also, I am still open to suggestions regarding a name.

General outline

The general idea of the system is to simplify various administration tasks in a LARP club. These tasks include (but are not limited to) managing event information, player information, event subscriptions, character information, rule information (skill lists, spells, etc), etc.

This information should be manageable by different cooperating organisers and, to some extent, by the players themselves. We loosely divide the information into OC information (info centered around players) and IC information (info centered around characters). OC information is plainly editable by players or organisers, where appropriate. IC information is generally editable by organisers, while players can propose changes (but only for their own characters). These changes have to be approved by an organiser before being applied.

The information should be exportable in various (configurable and/or adaptable) formats, such as a list of subscribed players with payment info, a PDF containing character sheets to be printed, or a list of spells for the main website. Since the exact requirements of each club and/or event with regard to this output vary, there should be some easy way to change this output.

See more ...

 
Debian Squeeze on an emulated MIPS machine

In my work as a Debian Maintainer for the OpenTTD and related packages, I occasionally come across platform-specific problems. That is, compiling and running OpenTTD works fine on my own x86 and amd64 systems, but when I upload my packages to Debian, it turns out there is some problem that only occurs on more obscure platforms like MIPS, S390 or GNU Hurd.

This morning, I saw that my new grfcodec package is not working on a bunch of architectures (it seems all of the failing architectures are big endian). To find out what's wrong, I'll need to have a machine running one of those architectures so I can debug.

In the past, I've requested access to Debian's "porter" machines, which are intended for these kinds of things. But that's always a hassle, which requires other people's time to set up, so I'm using QEMU to set up a virtual machine running the MIPS architecture now.

What follows is essentially an update of this excellent tutorial by Aurélien Jarno about running Debian Etch on QEMU/MIPS(EL). It's probably best to read that tutorial as well; I'll only give the short version, updated for Squeeze. I've also looked at this tutorial on running Squeeze on QEMU/PowerPC by Uwe Hermann.

Finally, note that Aurélien also has pre-built images available for download, for a whole bunch of platforms, including Squeeze on MIPS. I only noticed this after writing this tutorial, might have saved me a bunch of work ;-p

Preparations

You'll need qemu. The version in Debian Squeeze is sufficient, so just install the qemu package:

$ aptitude install qemu

You'll need a virtual disk to install Debian Squeeze on:

$ qemu-img create -f qcow2 debian_mips.qcow2 2G

You'll need a debian-installer kernel and initrd to boot from:

$ wget http://ftp.de.debian.org/debian/dists/squeeze/main/installer-mips/current/images/malta/netboot/initrd.gz
$ wget http://ftp.de.debian.org/debian/dists/squeeze/main/installer-mips/current/images/malta/netboot/vmlinux-2.6.32-5-4kc-malta

Note that in Aurélien's tutorial, he used a "qemu" flavoured installer. It seems this is no longer available in Squeeze, just a few others (malta, r4k-ip22, r5k-ip32, sb1-bcm91250a). I just picked the first one and apparently it works on QEMU.

Also, note that Uwe's PowerPC tutorial suggests downloading an ISO CD image and booting from that. I tried that, but QEMU has no BIOS available for MIPS, so this approach didn't work. Instead, you should tell QEMU about the kernel and initrd and let it load them directly.

Booting the installer

You just run QEMU, pointing it at the installer kernel and initrd and passing some extra kernel options to keep it in text mode:

$ qemu-system-mips -hda debian_mips.qcow2 -kernel vmlinux-2.6.32-5-4kc-malta -initrd initrd.gz -append "root=/dev/ram console=ttyS0" -nographic

Now, you get a Debian installer, which you should complete normally.

As Aurélien also noted, you can ignore the error about a missing boot loader, since QEMU will be directly loading the kernel anyway.

After installation is completed and the virtual system is rebooting, terminate QEMU:

$ killall qemu-system-mips

(I haven't found another way of terminating a -nographic QEMU...)

Booting the system

Booting the system is very similar to booting the installer, but we leave out the initrd and point the kernel to the real root filesystem instead.

Note that this boots using the installer kernel. If you later upgrade the kernel inside the system, you'll need to copy the kernel out from /boot in the virtual system into the host system and use that to boot. QEMU will not look inside the virtual disk for a kernel to boot automagically.
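One way to get the kernel out (assuming the qemu-nbd tool from the qemu-utils package is available) is to mount the disk image on the host while the virtual machine is shut down:

$ sudo modprobe nbd max_part=8
$ sudo qemu-nbd -c /dev/nbd0 debian_mips.qcow2
$ sudo mount /dev/nbd0p1 /mnt
$ cp /mnt/boot/vmlinux-* .
$ sudo umount /mnt && sudo qemu-nbd -d /dev/nbd0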

$ qemu-system-mips -hda debian_mips.qcow2 -kernel vmlinux-2.6.32-5-4kc-malta -append "root=/dev/sda1 console=ttyS0" -nographic

More features

Be sure to check Aurélien's tutorial for some more features, options and details.

 