These are the ramblings of Matthijs Kooijman, concerning the software he hacks on, hobbies he has and occasionally his personal life.
Most content on this site is licensed under the WTFPL, version 2 (details).
Questions? Praise? Blame? Feel free to contact me.
My old blog (pre-2006) is also still available.
See also my Mastodon page.
For a project (building a low-power LoRaWAN gateway to be solar powered) I am looking at simple and low-power linux boards. One board that I came across is the Milk-V Duo, which looks very promising. I have been playing around with it for just a few hours, and I like the board (and its SoC) very much already - for its size, price and open approach to documentation.
The board itself is a small (21x51mm) board in Raspberry Pi Pico form factor. It is very simple - essentially only a SoC and regulators. The SoC is the CV1800B by Sophgo (a vendor I had never heard of until now; it seems they were called CVITEK before). It is based on the RISC-V architecture, which is nice. It contains two RISC-V cores (one at 1 GHz and one at 700 MHz), as well as a small 8051 core for low-level tasks. The SoC has 64MB of integrated RAM.
The SoC supports the usual things - SPI, I²C, UART. There is also a CSI (camera) connector and some AI acceleration block; it seems the chip is targeted at the AI computer vision market (but I am ignoring that for my use case). The SoC also has an ethernet controller and PHY integrated (but no ethernet connector, so you still need an external magjack to use it). My board has an SD card slot for booting; the specs suggest that there might also be a version that has on-board NAND flash instead of SD (you cannot combine both, since they use the same pins).
There are two other variants - the Duo 256M with more RAM (the board is identical except for one extra power supply; it just uses a different SoC with more RAM) and the Duo S (in what looks like Rock Pi S form factor), which adds ethernet and USB host ports. I have not tested either of these, and they use a different SoC series (SG200x) from the same chip vendor, so things I write might or might not apply to them (but the chips might actually be very similar internally; the CVx to SGx change seems to be related to the company merger, not necessarily to technical differences).
The biggest (or at least most distinguishing) selling point, to me, is that both the chip and board manufacturers seem to be going for a very open approach. In particular:
Full datasheets for the SoC are available (the datasheets could be a bit more detailed, but I am under the impression that this is still a bit of a work-in-progress, not that there is a more detailed datasheet under NDA).
The tables (e.g. pinout tables) are not in the datasheet PDF, but separately distributed as a spreadsheet, which is super convenient.
For the SG200x chips, the datasheet is created using reStructuredText (a text format a bit like markdown but more expressive), and the rst source files are available on GitHub under a BSD license. This is really awesome: it makes contributions to the documentation very easy, and (if they structure things properly when more chips are added) could make it very easy to see which peripherals are the same or different between different chips.
Sophgo seems to be actively trying to get their chips supported by mainline linux (maybe by contributing code directly, or at least by supporting the community to do so), which should make it a lot easier to work with their chips in the future.
Other vendors often just drop a heavily customized and usually older kernel or BSP out there, sometimes updating it to newer versions but not always, and relying on other people to do the work of cleaning up the code and submitting it to linux.
The second core can be used as a microcontroller and Milk-V supports running FreeRTOS on it, and provides an Arduino core for it (I have not looked at whether it is any good yet). It seems the first core then remains active running Linux, providing a way to flash the secondary core through the primary core.
All this is based on fairly quick observations, so maybe things are not as open and nice as they seem at first glance, but it looks like something cool is going on here at least.
Other things I like about the board:
There are also some (potential) downsides that might complicate matters. First, only 64MB of RAM is very limited. In practice some RAM is used for peripherals (I think) too; the default buildroot image has something like half of the RAM available to Linux. Other images configure this differently so the full RAM is available to the kernel (leaving 55M for userspace). See this forum topic for more details.
The low memory limits your options - people have reported that apt needs around 50M to work, which means it ends up using swap and becomes super slow.
The official Linux distribution from Milk-V is a buildroot-built image, which means all software is built into the image directly, with no package manager to install extra software afterwards.
The buildroot files are available, so it should be easy to build your own image with extra software, though I think this does mean compiling everything from source.
There does seem to be a lively community of people that are working on making other distributions work on these boards. In most cases this means building a custom kernel for this board (with milk-v/sophgo patches, often using buildroot) and combining it with existing RISC-V packages or a rootfs from these distributions. Sometimes with instructions or a script, sometimes someone just hand-edited an image to make it work.
Hopefully proper support can be added into the actual distributions as well, though a lot of distributions do not really have the machinery to create bootable images for specific boards (i.e. they only support building images for generic BIOS or EFI booting). One distribution that does have this is Armbian, but that is Debian/apt-based so probably needs more than 64MB RAM.
I have briefly tried the Alpine and Arch Linux images that are available. Alpine is really tiny, but like the official buildroot image it uses musl libc. This is nice and tiny, but not all software plays well with it (and in all cases I think software must be compiled specifically against musl). The main application I needed (the basicstation LoRaWAN forwarder) did not like being compiled against musl (and I did not feel like fixing that, especially since I am doubtful such changes would be merged upstream).
So I am hoping I can use the Arch image, which does use glibc and seems to run basicstation (at least it starts; I have not had the time to really set it up yet). Or maybe a Debian/Ubuntu/Armbian image after all - I have also ordered the 256M version (which was not in stock initially).
For an overview of various images created by the community, see this excellent overview page.
It is not entirely clear to me what bootloader is used and how the devicetree is managed. On most single-board linux devices I know, there is u-boot with a boot script, which can be configured to load different devicetrees and overlays to allow configuring the hardware (e.g. remapping pins as SPI or I²C pins). On the buildroot image for the Duo, I could find no evidence of any of this in /boot, but I did see u-boot being mentioned in some places, so maybe it is just configured differently.
Even though the documentation is very open, some of it is a bit hard to find and spread across different places. Here are some places I found:
The USB data pins are available externally, but only on two pads that need pogo pins or something like that to connect to them. It would have been more convenient if these had a proper pin header.
Some of the hardware setup is done with shell scripts that run on startup (for example the USB networking), some of which actually do raw register writes. This is probably something that will be fixed when kernel support improves, but can be fragile until then.
Sales are still quite limited - most of the suppliers linked from the manufacturer pages seem to be China-only, and I have not found the boards at any European webshop yet. I have now ordered from Arace Tech, a Chinese webshop that ships internationally and worked well for me (except for the to-be-expected long shipping times of a couple of weeks).
So, that was my first impression and thoughts. If I manage to get things running and use this board as part of my LoRaWAN gateway design, I'll probably post a followup with some more experiences. If you have used this board and have things to share, I'm happy to hear them!
On my server, I use LVM for managing partitions. I have one big "data" partition that is stored on an HDD, but for a bit more speed, I have an LVM cache volume linked to it, so commonly used data is cached on an SSD for faster read access.
Today, I wanted to resize the data volume:
# lvresize -L 300G tika/data
Unable to resize logical volumes of cache type.
Bummer. Googling for the error message showed me some helpful posts here and here that told me you have to remove the cache from the data volume, resize the data volume and then set up the cache again.
For this, they used lvconvert --uncache, which detaches and deletes the cache volume or cache pool completely, so you then have to recreate the entire cache (and thus figure out how you created it in the first place).
Trying to understand my own work from long ago, I looked through documentation and found lvconvert --splitcache in lvmcache(7), which detaches a cache volume or cache pool but does not delete it. This means you can resize and then just reattach the cache again, which is a lot less work (and less error prone).
As an example, here is how the relevant volumes look:
# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data tika Cwi-aoC--- 300.00g [data-cache_cvol] [data_corig] 2.77 13.11 0.00
[data-cache_cvol] tika Cwi-aoC--- 20.00g
[data_corig] tika owi-aoC--- 300.00g
Here, data is a "cache" type LV that ties together the big data_corig LV that contains the bulk data and the small data-cache_cvol that contains the cached data.
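The Attr column in the lvs output above encodes the volume roles. As a rough sketch of how to read it (the positions and letter meanings here are my own partial interpretation of the lvs(8) attribute documentation, not an authoritative decoder):

```python
# Decode a few positions of the lvs "Attr" field (e.g. "Cwi-aoC---").
# Only a hand-picked subset of letters is handled; see lvs(8) for the full list.
VOLUME_TYPE = {"C": "cache", "o": "origin", "-": "plain", "e": "metadata"}
TARGET_TYPE = {"C": "cache", "-": "linear/striped", "t": "thin", "r": "raid"}

def decode_attr(attr: str) -> dict:
    """Return a partial human-readable decoding of an lv_attr string."""
    return {
        "type": VOLUME_TYPE.get(attr[0], attr[0]),   # position 1: volume type
        "writable": attr[1] == "w",                  # position 2: permissions
        "active": attr[4] == "a",                    # position 5: state
        "open": attr[5] == "o",                      # position 6: device open
        "target": TARGET_TYPE.get(attr[6], attr[6]), # position 7: target type
    }

# The volumes from the listing above:
print(decode_attr("Cwi-aoC---"))  # data: cache-type LV, active, open, cache target
print(decode_attr("owi-aoC---"))  # data_corig: the origin LV behind the cache
```

This makes it easy to spot at a glance which LV is the cache front-end and which is the origin.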
After detaching the cache with --splitcache, this changes to:
# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data tika -wi-ao---- 300.00g
data-cache tika -wi------- 20.00g
I think the previous data cache LV was removed, data_corig was renamed to data and data-cache_cvol was renamed to data-cache again.
Armed with this knowledge, here's how the full resize works:
lvconvert --splitcache tika/data
lvresize -L 300G tika/data @hdd
lvconvert --type cache --cachevol tika/data-cache tika/data --cachemode writethrough
The last command might need some additional parameters depending on how you set up the cache in the first place. You can view the current cache parameters with e.g. lvs -a -o +cache_mode,cache_settings,cache_policy.
Note that all of this assumes using a cache volume and not a cache pool. I was originally using a cache pool setup, but it seems that a cache pool (which splits cache data and cache metadata into different volumes) is mostly useful if you want to split data and metadata over different PVs, which is not at all useful for me. So I switched to the cache volume approach, which needs fewer commands and volumes to set up.
I killed my cache pool setup with --uncache before I found out about --splitcache, so I did not actually try --splitcache with a cache pool, but I think the procedure is pretty much identical to the one described above, except that you need to replace --cachevol with --cachepool in the last command.
For reference, here's what my volumes looked like when I was still using a cache pool:
# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data tika Cwi-aoC--- 260.00g [data-cache] [data_corig] 99.99 19.39 0.00
[data-cache] tika Cwi---C--- 20.00g 99.99 19.39 0.00
[data-cache_cdata] tika Cwi-ao---- 20.00g
[data-cache_cmeta] tika ewi-ao---- 20.00m
[data_corig] tika owi-aoC--- 260.00g
This is a data volume of type cache that ties together the big data_corig LV that contains the bulk data and a data-cache LV of type cache-pool, which in turn ties together the data-cache_cdata LV with the actual cache data and data-cache_cmeta with the cache metadata.
Interesting, thanks for your post. --splitcache sounds very neat, but as far as I can tell the main advantage is speed (vs --uncache). When you run lvconvert to restore the existing cache, you are only allowed to proceed if you accept that the entire existing cache contents are wiped.
Yeah, I think the end result is the same, it's just easier to use --splitcache indeed.
A few months ago, I put up an old Atom-powered Supermicro server (SYS-5015A-PHF) again, to serve at De War to collect and display various sensor and energy data about our building.
The server turned out to have an annoying habit: every now and then it would start beeping (one continuous annoying beep), that would continue until the machine was rebooted. It happened sporadically, but kept coming back. When I used this machine before, it was located in a datacenter where nobody would care about a beep more or less (so maybe it has been beeping for years on end before I replaced the server), but now it was in a server cabinet inside our local Fablab, where there are plenty of people to become annoyed by a beeping server...
I eventually traced this back to faulty sensor readings and fixed this by disabling the faulty sensors completely in the server's IPMI unit, which will hopefully prevent the annoying beep. In this post, I'll share my steps, in case anyone needs to do the same.
At first, I noticed that there was an alarm displayed in the IPMI webinterface for one of the fans. Of course it makes sense to be notified of a faulty fan, except that the system did not have any fans connected... It did show the fan speed as 0RPM (or -2560RPM depending on where you looked) as expected, so I suspected that the BMC would start up correctly detecting that there was no fan, but would then sporadically see a bit of electrical noise on the fan speed pin, causing it to mark the fan as present and then immediately as not running, triggering the alarm. I tried to fix this by shorting the fan speed detection pins to the GND pins to make them more noise-resilient.
However, a couple of weeks later, the server started beeping again. This time I looked a bit more closely, and found that the problem was caused by a too-high temperature reading. The IPMI system event log (queried using ipmi-sel) showed:
43 | Feb-17-2023 | 09:18:58 | CPU Temp | Temperature | Upper Non-critical - going high ; Sensor Reading = 125.00 C ; Threshold = 85.00 C
44 | Feb-17-2023 | 09:18:58 | CPU Temp | Temperature | Upper Critical - going high ; Sensor Reading = 125.00 C ; Threshold = 90.00 C
45 | Feb-17-2023 | 09:18:58 | CPU Temp | Temperature | Upper Non-recoverable - going high ; Sensor Reading = 125.00 C ; Threshold = 95.00 C
46 | Feb-17-2023 | 16:26:16 | CPU Temp | Temperature | Upper Non-recoverable - going high ; Sensor Reading = 41.00 C ; Threshold = 95.00 C
47 | Feb-17-2023 | 16:26:16 | CPU Temp | Temperature | Upper Critical - going high ; Sensor Reading = 41.00 C ; Threshold = 90.00 C
48 | Feb-17-2023 | 16:26:16 | CPU Temp | Temperature | Upper Non-critical - going high ; Sensor Reading = 41.00 C ; Threshold = 85.00 C
This is a bit opaque, but the events at 9:18 show the temperature was read as 125°C - clearly indicating a faulty sensor. These are (I presume) the "asserted" events for each of the thresholds that this sensor has. Then at 16:26, the server was rebooted and the sensor read 41°C again (which I believe is still higher than realistic) and each of the thresholds emits a "deasserted" event.
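Lines like these are easy to process mechanically if you ever want to alert on implausible readings. A small Python sketch - the field layout is inferred from the log excerpt above, so treat the format (and the 100°C plausibility cutoff) as assumptions:

```python
import re

# Matches the "Sensor Reading = 125.00 C ; Threshold = 85.00 C" part of an
# ipmi-sel line, as it appears in the log excerpt above.
EVENT_RE = re.compile(
    r"Sensor Reading = (?P<reading>[\d.]+) C ; Threshold = (?P<threshold>[\d.]+) C"
)

def parse_event(line: str):
    """Return (reading, threshold) in °C, or None if the line does not match."""
    m = EVENT_RE.search(line)
    if not m:
        return None
    return float(m.group("reading")), float(m.group("threshold"))

def looks_faulty(reading: float, max_plausible: float = 100.0) -> bool:
    """Treat anything above max_plausible °C as a likely sensor glitch."""
    return reading > max_plausible

line = ("43 | Feb-17-2023 | 09:18:58 | CPU Temp | Temperature | "
        "Upper Non-critical - going high ; Sensor Reading = 125.00 C ; "
        "Threshold = 85.00 C")
reading, threshold = parse_event(line)
print(reading, threshold, looks_faulty(reading))  # 125.0 85.0 True
```

With something like this feeding a monitoring system, a glitching sensor shows up as a pattern of impossible values rather than a mystery beep.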
Looking back, I noticed that the log showed events for both fans and both temperature sensors, so it seemed all of these sensors were really wonky. I could also see the incorrect temperatures clearly in the sensor data I had been collecting from the server (using telegraf, collected using lm-sensors from within the linux system itself, but clearly reading from the same sensor as IPMI):
Note that the graph above shows three sensors, while IPMI only reads two, so I am not sure what the third one is. The alarm from the IPMI log shows up clearly as a sudden jump of the temp2 purple line (jumping back down when the server was rebooted). But also note an unexplained second jump down a few hours later, and note that the next day temp1 dives down to -53°C for some reason, which also matches what IPMI reads:
$ sudo ipmitool sensor
System Temp | -53.000 | degrees C | nr | -9.000 | -7.000 | -5.000 | 75.000 | 77.000 | 79.000
CPU Temp | 27.000 | degrees C | ok | -11.000 | -8.000 | -5.000 | 85.000 | 90.000 | 95.000
CPU FAN | -2560.000 | RPM | nr | 400.000 | 585.000 | 770.000 | 29260.000 | 29815.000 | 30370.000
SYS FAN | -2560.000 | RPM | nr | 400.000 | 585.000 | 770.000 | 29260.000 | 29815.000 | 30370.000
CPU Vcore | 1.160 | Volts | ok | 0.640 | 0.664 | 0.688 | 1.344 | 1.408 | 1.472
Vichcore | 1.040 | Volts | ok | 0.808 | 0.824 | 0.840 | 1.160 | 1.176 | 1.192
+3.3VCC | 3.280 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
VDIMM | 1.824 | Volts | ok | 1.448 | 1.480 | 1.512 | 1.960 | 1.992 | 2.024
+5 V | 5.056 | Volts | ok | 4.096 | 4.320 | 4.576 | 5.344 | 5.600 | 5.632
+12 V | 11.904 | Volts | ok | 10.368 | 10.496 | 10.752 | 12.928 | 13.056 | 13.312
+3.3VSB | 3.296 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
VBAT | 2.912 | Volts | ok | 2.560 | 2.624 | 2.688 | 3.328 | 3.392 | 3.456
Chassis Intru | 0x0 | discrete | 0x0000| na | na | na | na | na | na
PS Status | 0x1 | discrete | 0x01ff| na | na | na | na | na | na
Note that the voltage sensors show readings that do make sense, and looking at the history, they show no sudden jumps, so those are probably still reliable (even though they are read from the same sensor chip according to lm-sensors).
It seems you can disable generation of events when a threshold is crossed, and even disable reading the sensor entirely. Hopefully this will also prevent the BMC from beeping on weird sensor values.
To disable things, I used ipmi-sensors-config (from the freeipmi-tools Debian package).
First I queried the current sensor configuration:
sudo ipmi-sensors-config --checkout > ipmi-sensors-config.txt
Then I edited the generated file, setting Enable_All_Event_Messages and Enable_Scanning_On_This_Sensor to No. I also had to set the hysteresis values for the fans to None, since the -2375 value generated by --checkout was refused when writing back the values in the next step.
Then I committed the changes with:
sudo ipmi-sensors-config --commit --filename ipmi-sensors-config.txt
I suspect that modifying Enable_All_Event_Messages still allows the sensor to be read, but prevents the thresholds from being checked and generating events (especially since this setting seems to just clear the corresponding setting for each available threshold, so it seems you can also use this to disable some of the thresholds and keep others). However, it is not entirely clear to me whether this just prevents these events from showing up in the event log, or whether it actually prevents the system from beeping (when does the system beep? On any event? Only specific events? This is not clear to me).
For good measure, I decided to also modify Enable_Scanning_On_This_Sensor, which I believe prevents the sensor from being read at all by the BMC, so that should really prevent alarms. This also causes ipmitool sensor to display the value and status as na for these sensors. The sensors command (from the lm-sensors package) can still read the sensors without issues, though the values are not very useful anyway...
Note that apparently these settings are not always persistent across reboots and power cycles, so make sure you test that. For this particular server, the settings survive a reboot; I have not tested a hard power cycle yet.
I cannot yet tell for sure whether this has fixed the problem (I only applied the changes today), but I'm pretty confident that this will indeed keep the people in our Fablab happy (and if not - I'll just solder the beeper off the motherboard, but let's hope I will not have to resort to such extreme measures...).
When sorting out some stuff today I came across an "Ecobutton". When you attach it through USB to your computer and press the button, your computer goes to sleep (at least that is the intention).
The idea is that it makes things more sustainable because you can more easily put your computer to sleep when you walk away from it. As this tweakers poster (Dutch) eloquently argues, having more plastic and electronics produced in China, shipped to Europe and sold here for €18 or so probably does not have a net positive effect on the environment or your wallet, but well, given this button found its way to me, I might as well see if I can make it do something useful.
I had previously started a project to make a "Next" button for Spotify that you could carry around and that would (wirelessly - with an ESP8266 inside) skip to the next song using the Spotify API whenever you pressed it. I had a basic prototype working, but then the project got stalled on figuring out an enclosure and finding sufficiently low-power addressable RGB LEDs (documentation about this is lacking, so I resorted to testing two dozen different types of LEDs and creating a website to collect specs and test results for addressable LEDs, which then ended up in the big collection of other yak-shaving projects waiting for this magical moment where I suddenly have a lot of free time).
In any case, it seemed interesting to see whether this Ecobutton could be used as a poor man's Spotify next button. Not super useful, but at least now I can keep the button around knowing I can actually use it for something in the future. I also produced some useful (and not readily available) documentation about remapping keys with hwdb in the process, so it was at least not a complete waste of time... Anyway, on to the technical details...
I expected this to be a USB device that advertises itself as a keyboard and, whenever you press the button, sends the "sleep" key to put the computer to sleep.
Turns out the first part was correct, but instead of sending a sleep keypress, it sends Meta-R, "ecobutton" (it types each of these letters after one another), and Enter. Apparently you are expected to install a tool on your PC, which the button then launches via the Windows-key+R run shortcut. Pragmatic, but quite ugly, especially given that a sleep key exists... But maybe Windows does not implement that key (or maybe the tool also changes some settings for deeper sleep, at least that is what was suggested in the tweakers post linked above).
I considered I could maybe replace the firmware to make the device send whatever keystroke I wanted, but writing firmware from scratch for existing hardware is not the easiest project (even for a simple device like this). After opening the device I decided this was not a feasible route.
The (I presume) microcontroller in there is hidden in a blob, so no indication as to its type, pin connections, programming ports (if it actually has flash and is not ROM only).
I did notice some solder jumpers that I figured could influence behavior (maybe the same PCB is used for differently branded buttons), but shorting S1 or S5 did not seem to change behavior (maybe I should have unsoldered S3, but well...).
The next alternative is to remap keys on the computer. Running Linux, this should certainly be possible in a couple dozen ways. It does need to be device-specific remapping, so my normal keyboard still works as normal, but if I can unmap all keys except for the Meta key that the button presses first, and map that to something like KEY_NEXTSONG (which is handled by Spotify and/or Gnome already), that might work.
I first saw some remapping solutions for X, but those probably will not work - I'm running Wayland, and I prefer something more low-level anyway. I also found cool remapping daemons (like keyd) that grab events from a keyboard and then synthesise new events on a virtual keyboard, allowing cool things like multiple layers or tapping shift and then another key to get uppercase instead of having to hold shift and the key together, but that is way more complicated than what I need here.
Then I found that udev has a thing called "hwdb", which allows putting files in /etc/udev/hwdb.d that match specific input devices and can specify arbitrary scancode-to-keycode mappings for them. Exactly what I needed - works out of the box, just drop a file into /etc.
The challenge turned out to be figuring out how to match against my specific keyboard identifier, what scancodes and keycodes to use, and in general how the ecosystem around this works. (In short: when a device is plugged in, the udev rules consult the hwdb for extra device properties, which a udev builtin keyboard command then uses to apply key remappings in the kernel using an ioctl on the /dev/input/eventxx device.) In case you're wondering - this means you do not strictly need hwdb, you can also apply this from udev rules directly, but then you need a bit more care.
I've written down everything I figured out about hwdb in a post on Unix stackexchange, so I'll not repeat everything here.
Using what I had learnt, getting the button to play nice was a matter of creating /etc/udev/hwdb.d/99-ecobutton.hwdb containing:
evdev:input:b????v3412p7856e*
KEYBOARD_KEY_700e3=nextsong # LEFTMETA
KEYBOARD_KEY_70015=reserved # R
KEYBOARD_KEY_70008=reserved # E
KEYBOARD_KEY_70006=reserved # C
KEYBOARD_KEY_70012=reserved # O
KEYBOARD_KEY_70005=reserved # B
KEYBOARD_KEY_70018=reserved # U
KEYBOARD_KEY_70017=reserved # T
KEYBOARD_KEY_70011=reserved # N
KEYBOARD_KEY_70028=reserved # ENTER
This matches the keyboard based on its USB vendor and product ID (3412:7856) and then disables all keys that are used by the button, except for the first, which it remaps to KEY_NEXTSONG.
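The scancodes in this file are USB HID usages: 0x700e3 is usage page 0x07 (keyboard/keypad) combined with usage ID 0xe3 (Left GUI, i.e. the Meta key). As a small sketch of how these codes break down (the name table below is a hand-picked subset of the HID usage tables, just the keys used here):

```python
# hwdb keyboard scancodes like 700e3 are (usage_page << 16) | usage_id,
# i.e. HID usage page 0x07 (keyboard/keypad) plus the key's usage ID.
HID_KEYBOARD_USAGES = {  # small subset of the HID keyboard usage table
    0x05: "B", 0x06: "C", 0x08: "E", 0x11: "N", 0x12: "O",
    0x15: "R", 0x17: "T", 0x18: "U", 0x28: "ENTER", 0xE3: "LEFTMETA",
}

def decode_scancode(code: str):
    """Split a hwdb scancode like '700e3' into (usage page, key name)."""
    value = int(code, 16)
    page, usage = value >> 16, value & 0xFFFF
    return page, HID_KEYBOARD_USAGES.get(usage, hex(usage))

print(decode_scancode("700e3"))  # (7, 'LEFTMETA')
print(decode_scancode("70015"))  # (7, 'R')
```

This is also a handy way to double-check a hwdb file against the MSC_SCAN values that evtest reports.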
To apply this new file, run sudo systemd-hwdb update to recompile the database and then replug the button to apply it (you can also re-apply with udevadm trigger, but it seems Gnome does not pick up the change then; I suspect because the gnome-settings media-keys module checks only once whether a keyboard supports media keys at all and ignores it otherwise).
With that done, it now produces KEY_NEXTSONG events as expected:
$ sudo evtest --grab /dev/input/by-id/usb-3412_7856-event-if00
Input driver version is 1.0.1
Input device ID: bus 0x3 vendor 0x3412 product 0x7856 version 0x100
Input device name: "HID 3412:7856"
[ ... Snip more output ...]
Event: time 1675514344.256255, type 4 (EV_MSC), code 4 (MSC_SCAN), value 700e3
Event: time 1675514344.256255, type 1 (EV_KEY), code 163 (KEY_NEXTSONG), value 1
Event: time 1675514344.256255, -------------- SYN_REPORT ------------
Event: time 1675514344.264251, type 4 (EV_MSC), code 4 (MSC_SCAN), value 700e3
Event: time 1675514344.264251, type 1 (EV_KEY), code 163 (KEY_NEXTSONG), value 0
Event: time 1675514344.264251, -------------- SYN_REPORT ------------
More importantly, I can now skip annoying songs (or duplicate songs - Spotify really messes this up) with a quick button press!
Maybe you missed the fact that it is also possible to keep the button pressed in order to open a website. This could be used for another function by mapping a key that is only included in that link like /.
To map out the other url keys, my file now looks like this:
evdev:input:b????v3412p7856e*
KEYBOARD_KEY_700e3=nextsong # LEFTMETA
KEYBOARD_KEY_70015=reserved # R
KEYBOARD_KEY_70008=reserved # E
KEYBOARD_KEY_70006=reserved # C
KEYBOARD_KEY_70012=reserved # O
KEYBOARD_KEY_70005=reserved # B
KEYBOARD_KEY_70018=reserved # U
KEYBOARD_KEY_70017=reserved # T
KEYBOARD_KEY_70011=reserved # N
KEYBOARD_KEY_7000b=reserved # H
KEYBOARD_KEY_70013=reserved # P
KEYBOARD_KEY_700e1=reserved # LEFTSHIFT
KEYBOARD_KEY_70054=reserved # /
KEYBOARD_KEY_7001a=reserved # W
KEYBOARD_KEY_70037=reserved # .
KEYBOARD_KEY_7002d=reserved # -
KEYBOARD_KEY_70010=reserved # M
KEYBOARD_KEY_70016=reserved # S
KEYBOARD_KEY_70033=reserved # ;
KEYBOARD_KEY_70028=reserved # ENTER
When the button is pressed for 3 seconds, the same happens as when pressed shortly.
> Maybe you missed the fact that it is also possible to keep the button pressed in order to open a website. This could be used for another function by mapping a key that is only included in that link like /.
Ah, I indeed missed that. Thanks for pointing that out and the updated file :-)
Recently, a customer asked me to have a look at an external hard disk he was using with his Macbook. It would show a file listing just fine, but when trying to open actual files, it would start failing. Of course there was no backup, but the files were very precious...
This started out as a small question, but ended up in an adventure that spanned a few days and took me deep into the ddrescue recovery tool, through the HFS+ filesystem and past USB power port control. I learned a lot, discovered some interesting things and produced a pile of scripts that might be helpful to others. Since the journey seems interesting as well as the end result, I will describe the steps I took here, "ter leering ende vermaeck" (for instruction and amusement).
I started out confirming the original problem. Plugging in the disk to my Linux laptop, it showed up as expected in dmesg. I could mount the disk without problems, see the directory listing and even open up an image file stored on the disk. Opening other files didn't seem to work.
As you do with bad disks, you try to get their SMART data. Since smartctl did not support this particular USB bridge (and I wasn't game to try random settings to see if it worked on a failing disk), I gave up on SMART initially. I later opened up the case to bypass the USB-to-SATA controller (in case the problem was there, and to make SMART work), but found that this particular hard drive had the converter built into the drive itself (so the USB part was directly attached to the drive). Even later, I found some page online (I have not saved the link) that showed the disk was indeed supported by smartctl and showed the option to pass to smartctl -d to make it work. SMART confirmed that the disk was indeed failing, based on the number of reallocated sectors (2805).
Since opening up files didn't work so well, I prepared to make a sector-by-sector copy of the partition on the disk, using ddrescue. This tool has a good approach to salvaging data: it tries to copy off as much data as possible quickly, skipping data when it comes to a bad area on the disk. Since reading a bad sector on a disk often takes a lot of time (before returning failure), ddrescue tries to steer clear of these bad areas and focus on the good parts first. Later, it returns to these bad areas and, in a few passes, tries to get out as much data as possible.
At first, copying data seemed to work well, giving a decent read speed of some 70MB/s as well. But very quickly the speed dropped terribly and I suspected the disk had run into some bad sector and kept struggling with it. I reset the disk (by unplugging it), made a few more attempts and quickly discovered something weird: the disk would work just fine after plugging it in, but after a while the speed would plummet to a whopping 64Kbyte/s or less. This happened every time. Even more, it happened pretty much exactly 30 seconds after I started copying data, regardless of which part of the disk I copied data from.
So I quickly wrote a one-liner script that would start ddrescue, kill it after 45 seconds, wait for the USB device to disappear and reappear, and then start over again. That way, I spent some time replugging the USB cable about once every minute, so I could at least back up some data while I was investigating other things.
Since the speed was originally 70MB/s, I could pull a few GB worth of data every time. Since it was a 2000GB disk, I "only" had to plug the USB connector around a thousand times. Not entirely infeasible, but not quite comfortable or efficient either.
So I investigated ways to further automate this process: using `hdparm` to spin down or shut down the disk, using USB power saving to let the disk reset itself, disabling the USB subsystem completely. But nothing increased the speed again, other than completely powering down the disk by removing the USB plug.
While I was trying these things, the speed during those first 30 seconds dropped, even below 10MB/s at some point. At that point, I could salvage around 200MB with each power cycle and was looking at pulling the USB plug around 10,000 times: no way that would be happening manually.
I resolved to further automate this unplugging, planning to use an Arduino (or perhaps the GPIO of a Raspberry Pi) and something like a relay or transistor to interrupt the power line to the hard disk and "unplug" it that way.
For that, I needed my Current measuring board to easily interrupt the USB power lines, which I had to bring from home. In the meanwhile, I found `uhubctl`, a small tool that uses low-level USB commands to individually control the port power on some hubs. Most hubs don't support this (or advertise support, but simply don't have the electronics to actually switch power, apparently), but I noticed that the newer Raspberry Pis supported this (for port 2 only, but that would be enough).
Coming to the office the next day, I set up a Raspberry Pi and tried `uhubctl`. It did indeed toggle USB power, but the toggle would affect all USB ports at the same time, rather than just port 2. So I could switch power to the faulty drive, but that would also cut power to the good drive that I was storing the recovered data on, and I was not quite prepared to give the good drive 10,000 power cycles.
The next plan was to connect the recovery drive through the network, rather than directly to the Raspberry Pi. On Linux, setting up a network drive using SSHFS is easy, so that worked in a few minutes. However, somehow ddrescue insisted it could not write to the destination file and logfile, citing permission errors (but the permissions seemed just fine). I suspect it might be trying to mmap or something else that would not work across SSHFS....
The next plan was to find a powered hub - so the recovery drive could stay powered while the failing drive was power cycled. I rummaged around the office looking for USB hubs, and eventually came up with a USB-based docking station that was externally powered. When connecting it, I tried the `uhubctl` tool on it, and found that one of its six ports actually supported power toggling. So I connected the failing drive to that port, and prepared to start the backup.
When trying to mount the recovery drive, I discovered that a Raspberry Pi only supports filesystems up to 2TB (probably because it uses a 32-bit architecture). My recovery drive was 3TB, so that would not work on the Pi.
Time for a new plan: do the recovery from a regular PC. I already had one ready that I used the previous day, but now I needed to boot a proper Linux on it (previously I used a minimal Linux image from UBCD, but that didn't have a compiler installed, which I needed to build `uhubctl`). So I downloaded a Debian live image (over a mobile connection - we were still waiting for fiber to be connected) and 1.8GB and 40 minutes later, I finally had a working setup.
The `run.sh` script I used to run the backup basically does this:
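In outline, that loop could look like the following sketch, reconstructed from the description above rather than copied from the original script. The device path, hub location and port number are placeholders for my setup, and using `timeout` for the 45-second kill is an assumption:

```shell
#!/bin/sh
# Sketch of a run.sh-style recovery loop. All paths and port numbers
# below are placeholders; adjust them for your own setup.

recover_loop() {
    dev="$1"    # failing drive, e.g. /dev/sdd2
    img="$2"    # destination image on the (separate) recovery drive
    log="$3"    # ddrescue mapfile, so progress survives restarts

    while true; do
        # Run ddrescue for at most 45 seconds (the drive slows down
        # dramatically about 30 seconds after power-up).
        timeout 45 ddrescue -d "$dev" "$img" "$log"

        # Power cycle the port the failing drive is connected to.
        uhubctl -l 1-1 -p 2 -a off
        sleep 2
        uhubctl -l 1-1 -p 2 -a on

        # Give the kernel some time to re-detect the drive.
        sleep 5
    done
}

# Invoke as, for example:
# recover_loop /dev/sdd2 backup.img backup.logfile
```

Because the mapfile records progress, each iteration simply continues where the previous one was killed.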
By now, the speed of recovery had been fluctuating a bit, but was between 10MB/s and 30MB/s. That meant I was looking at somewhere between a few thousand and ten thousand power cycles, and a few days up to a week, to back up the complete disk (and more if the speed dropped further).
Realizing that there was a fair chance that the disk would indeed get slower, or even die completely due to all these power cycles, I had to assume I could not back up the complete disk.
Since I was making the backup sector by sector using `ddrescue`, this meant a risk of not getting any meaningful data at all. Files are typically fragmented, so they can be stored anywhere on the disk, possibly spread over multiple areas as well. If you just start copying at the start of the disk, but do not make it to the end, you will have backed up some data, but that data could belong to all kinds of different files. That means that you might have some files in a directory, but not others. Also, a lot of files might only be partially recovered, with the missing parts being read as zeroes. Finally, you would also end up backing up all the unused space on the disk, which is rather pointless.
To prevent this, I had to figure out where all kinds of stuff was stored on the disk.
The first step was to make sure the backup file could be mounted (using a loopback device). On my first attempt, I got an error about an invalid catalog.
I looked around for some documentation about the HFS+ filesystem, and found a nice introduction by infosecaddicts.com and a more detailed description at dubeiko.com. The catalog is apparently the single place where the directory structure, filenames and other metadata are stored.
This catalog is not in a fixed location (since its size can vary), but its location is noted in the so-called volume header, a fixed-size datastructure located at 1024 bytes from the start of the partition. More details (including easier to read offsets within the volume header) are provided in this example.
Looking at the volume header inside the backup gives me:
root@debian:/mnt/recover/WD backup# dd if=backup.img bs=1024 skip=1 count=1 2> /dev/null | hd
00000000 48 2b 00 04 80 00 20 00 48 46 53 4a 00 00 3a 37 |H+.... .HFSJ..:7|
00000010 d4 49 7e 38 d8 05 f9 64 00 00 00 00 d4 49 1b c8 |.I~8...d.....I..|
00000020 00 01 24 7c 00 00 4a 36 00 00 10 00 1d 1a a8 f6 |..$|..J6........|
^^^^^^^^^^^ Block size: 4096 bytes
00000030 0e c6 f7 99 14 cd 63 da 00 01 00 00 00 01 00 00 |......c.........|
00000040 00 02 ed 79 00 6e 11 d4 00 00 00 00 00 00 00 01 |...y.n..........|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 a7 f6 0c 33 80 0e fa 67 |...........3...g|
00000070 00 00 00 00 03 a3 60 00 03 a3 60 00 00 00 3a 36 |......`...`...:6|
00000080 00 00 00 01 00 00 3a 36 00 00 00 00 00 00 00 00 |......:6........|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000c0 00 00 00 00 00 e0 00 00 00 e0 00 00 00 00 0e 00 |................|
000000d0 00 00 d2 38 00 00 0e 00 00 00 00 00 00 00 00 00 |...8............|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000110 00 00 00 00 12 60 00 00 12 60 00 00 00 01 26 00 |.....`...`....&.|
00000120 00 0d 82 38 00 01 26 00 00 00 00 00 00 00 00 00 |...8..&.........|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000160 00 00 00 00 12 60 00 00 12 60 00 00 00 01 26 00 |.....`...`....&.|
00000170 00 00 e0 38 00 01 26 00 00 00 00 00 00 00 00 00 |...8..&.........|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400
00000110 00 00 00 00 12 60 00 00 12 60 00 00 00 01 26 00 |.....`...`....&.|
^^^^^^^^^^^^^^^^^^^^^^^ Catalog size, in bytes: 0x12600000
00000120 00 0d 82 38 00 01 26 00 00 00 00 00 00 00 00 00 |...8..&.........|
^^^^^^^^^^^ First extent size, in 4k blocks: 0x12600
^^^^^^^^^^^ First extent offset, in 4k blocks: 0xd8238
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
I have annotated the parts that refer to the catalog. The content of the catalog (just like that of all other files) is stored in "extents". An extent is a single, contiguous block of storage that contains (a part of) the content of a file. Each file can consist of multiple extents, to prevent having to move file content around each time things change (e.g. to allow fragmentation).
In this case, the catalog is stored in a single extent (since the subsequent extent descriptors contain only zeroes). All extent offsets and sizes are in blocks of 4k bytes, so this extent lives at 0xd8238 * 4k = byte 3626205184 (~3.4G) and is 0x12600 * 4k = 294MiB long. So I backed up the catalog by adding `-i 3626205184` to ddrescue, making it skip ahead to the location of the catalog (and then power cycled a few times until it had copied the needed 294MiB).
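For anyone checking the arithmetic, the conversion from the header values to the ddrescue offset is plain block-to-byte multiplication:

```python
# Convert the catalog extent location from 4k allocation blocks to bytes.
BLOCK_SIZE = 4096

catalog_offset_blocks = 0xd8238   # first extent offset, from the header
catalog_size_blocks = 0x12600     # first extent size, from the header

catalog_offset_bytes = catalog_offset_blocks * BLOCK_SIZE
catalog_size_bytes = catalog_size_blocks * BLOCK_SIZE

print(catalog_offset_bytes)          # 3626205184, the value passed to -i
print(catalog_size_bytes // 2**20)   # 294 (MiB)
```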
After backing up the catalog, I could mount the image file just fine and navigate the directory structure. Trying to open files would mostly fail, since most files would only read zeroes now.
I did the same for the allocation file (which tracks free blocks), the extents file (which tracks the contents of files that are more fragmented and whose extent list does not fit in the catalog) and the attributes file (not sure what that is, but I backed it up for good measure).
Afterwards, I wanted to continue copying from where I previously left off, so I tried passing `-i 0` to ddrescue, but it seems this option can only be used to skip ahead, not back. In the end, I just edited the logfile, which is just a text file, to set the current position to 0. ddrescue is smart enough to skip over blocks it already backed up (or marked as failed), so it then continued where it previously left off.
With the catalog backed up, I needed to read it to figure out what files were stored where, so I could make sure the most important files were backed up first, followed by all other files, skipping any unused space on the disk.
I considered and tried some tools for reading the catalog directly, but none of them seemed workable. I looked at hfssh from hfsutils (which crashed), hfsdebug (which is discontinued and no longer available for download) and hfsinspect (which calls itself "quite buggy").
Instead, I found the `filefrag` commandline utility, which uses a Linux filesystem syscall to figure out where the contents of a particular file are stored on disk. To coax the output of that tool into a list of extents usable by ddrescue, I wrote a one-liner shell script called `list-extents.sh`:
sudo filefrag -e "$@" | grep '^ ' |sed 's/\.\./:/g' | awk -F: '{print $4, $6}'
Given any number of filenames, it produces a list of (start, size) pairs for each extent in the listed files (in 4k blocks, which is the Linux VFS native block size).
With the backup image loopback-mounted at `/mnt/backup`, I could then generate an extent list for a given subdirectory using:
sudo find /mnt/backup/SomeDir -type f -print0 | xargs -0 -n 100 ./list-extents.sh > SomeDir.list
To turn this plain list of extents into a logfile usable by ddrescue, I wrote another small script called `post-process.sh`, which adds the appropriate header, converts from 4k blocks to 512-byte sectors, converts to hexadecimal and sets the right device size (so if you want to use this script, edit it with the right size). It is called simply like this:
./post-process.sh SomeDir.list
This produces two new files: `SomeDir.list.done`, in which all of the selected files are marked as "finished" (and all other blocks as "non-tried"), and `SomeDir.list.notdone`, which is reversed (all selected files are marked as "non-tried" and all others are marked as "finished").
Edit: Elmo pointed out that all the mapfile manipulation with ddrescuelog was not actually needed if I had known about ddrescue's `--domain-mapfile` option, which passes a second mapfile to ddrescue and makes it only process blocks that are marked as finished in that mapfile, while presumably reading and updating the regular mapfile as normal.
Armed with a couple of these logfiles for the most important files on the disk, and one for all files on the disk, I used the `ddrescuelog` tool to tell `ddrescue` what stuff to work on first. The basic idea is to mark everything that is not important as "finished", so ddrescue will skip over it and only work on the important files.
ddrescuelog backup.logfile --or-mapfile SomeDir.list.notdone | tee todo.original > todo
This uses the ddrescuelog `--or-mapfile` option, which takes my existing logfile (`backup.logfile`) and marks all bytes as finished that are marked as finished in the second file (`SomeDir.list.notdone`). In other words, it marks all bytes that are not part of `SomeDir` as done. This generates two copies (`todo` and `todo.original`) of the result; I'll explain why in a minute.
With the generated `todo` file, we can let `ddrescue` run (though I used the `run.sh` script instead):
# Then run on the todo file
sudo ddrescue -d /dev/sdd2 backup.img todo -v -v
Since the generation of the `todo` file effectively threw away information (we can no longer see from the `todo` file what parts of the non-important sectors were already copied, or had errors, etc.), we need to keep the original `backup.logfile` around too. Using the `todo.original` file, we can figure out what the last run did, and update `backup.logfile` accordingly:
ddrescuelog backup.logfile --or-mapfile <(ddrescuelog --xor-mapfile todo todo.original) > newbackup.logfile
Note that you could also use `SomeDir.list.done` here, but actually comparing `todo` and `todo.original` helps in case there were any errors in the last run (so the error sectors will not be marked as done and can be retried later).
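Conceptually, these ddrescuelog operations are just set operations on the blocks marked "finished". A toy Python model of the whole cycle (the block numbers are made up for illustration):

```python
# Model each mapfile as the set of block numbers marked "finished".
backup = {0, 1, 2}            # backup.logfile: blocks rescued so far
not_somedir = {0, 1, 5, 6}    # SomeDir.list.notdone: everything outside SomeDir

# --or-mapfile: finished in either input counts as finished.
todo = backup | not_somedir
todo_original = set(todo)     # keep a copy, like todo.original

# Suppose the next ddrescue run rescues blocks 3 and 4 (part of SomeDir):
todo |= {3, 4}

# --xor-mapfile todo todo.original: what this run actually rescued.
rescued_this_run = todo ^ todo_original
print(sorted(rescued_this_run))   # [3, 4]

# --or-mapfile: fold that back into the overall logfile.
backup |= rescued_this_run
print(sorted(backup))             # [0, 1, 2, 3, 4]
```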
With `backup.logfile` updated, I could move on to the next subdirectories, and once all of the important stuff was done, I did the same with a list of all file contents to make sure that all files were properly backed up.
Now, I had the contents of all files backed up, so the data was nearly safe. I did, however, find that the disk contained a number of hardlinks and/or symlinks, which did not work. I did not dive into the details, but it seems that some of the metadata, and perhaps even file content, is stored in a special "metadata directory" which is hidden by the Linux filesystem driver. So my `filefrag`-based "all files" method above did not back up sufficient data to actually read these link files from the backup.
I could have figured out where on disk these metadata files were stored and do a backup of that, but then I still might have missed some other special blocks that are not part of the regular structure. I could of course back up every block, but then I would be copying around 1000GB of mostly unused space, of which only a few MB or GB would actually be relevant.
Instead, I found that HFS+ keeps an "allocation file". This file contains a single bit for each block in the filesystem, storing whether the block is allocated (1) or free (0). Simply looking at this bitmap and backing up all blocks that are allocated should make sure I had all data, leaving only unused blocks behind.
The position of this allocation file is stored in the volume header, just like the catalog file. In my case, it was stored in a single extent, making it fairly easy to parse.
The volume header says:
00000070 00 00 00 00 03 a3 60 00 03 a3 60 00 00 00 3a 36 |......`...`...:6|
^^^^^^^^^^^^^^^^^^^^^^^ Allocation file size, in bytes: 0x3a36000
00000080 00 00 00 01 00 00 3a 36 00 00 00 00 00 00 00 00 |......:6........|
^^^^^^^^^^^ First extent size, in 4k blocks: 0x3a36
^^^^^^^^^^^ First extent offset, in 4k blocks: 0x1
This means the allocation file takes up 0x3a36 blocks of 4096 bytes (8 bits per byte), so it can store the status of 0x3a36 * 4k * 8 = 0x1d1b0000 blocks, which is the total size of 0x1d1aa8f6 blocks, rounded up.
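This size calculation is easy to verify:

```python
# Check that the allocation file covers the whole filesystem.
BITS_PER_BLOCK = 4096 * 8       # each 4k block of the file holds 32768 bits

alloc_file_blocks = 0x3a36      # allocation file size, in 4k blocks
total_fs_blocks = 0x1d1aa8f6    # total filesystem size, from the header

capacity = alloc_file_blocks * BITS_PER_BLOCK
print(hex(capacity))            # 0x1d1b0000, just over the total size
assert capacity >= total_fs_blocks
```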
First, I got the allocation file off the disk image (this uses bash arithmetic expansion to convert hex to decimal, you can also do this manually):
dd if=/dev/backup of=allocation bs=4096 skip=1 count=$((0x3a36))
Then, I wrote a small Python script, `parse-allocation-file.py`, to parse the allocation file and output a ddrescue mapfile. I started out in bash, but that got tricky with bit manipulation, so I quickly converted to Python. The first attempt at this script would just output a single line for each block and let `ddrescuelog` merge adjacent blocks, but that produced such a large file that I stopped it and improved the script to do the merging directly.
cat allocation | ./parse-allocation-file.py > Allocated.notdone
This produces an `Allocated.notdone` mapfile, in which all free blocks are marked as "finished", and all allocated blocks are marked as "non-tried".
As a sanity check, I verified that there was no overlap between the non-allocated areas and all files (i.e. the output of the following command showed no done/rescued blocks):
ddrescuelog AllFiles.list.done --and-mapfile Allocated.notdone | ddrescuelog --show-status -
Then, I looked at how much data was allocated, but not part of any file:
ddrescuelog AllFiles.list.done --or-mapfile Allocated.notdone | ddrescuelog --show-status -
This marked all non-allocated areas and all files as done, leaving a whopping 21GB of data that was somehow in use, but not part of any files. This size includes stuff like the volume header, catalog, the allocation file itself, but 21GB seemed a lot to me. It also includes the metadata file, so perhaps there's a bit of data in there for each file on disk, or perhaps the file content of hard linked data?
Armed with my `Allocated.notdone` file, I used the same commands as before to let `ddrescue` back up all allocated sectors, and made sure all data was safe.
For good measure, I then let `ddrescue` continue backing up the remainder of the disk (i.e. all unallocated sectors), but it seemed the disk was nearing its end now. The backup speed (even during the "fast" first 30 seconds) had dropped to under 300kB/s, so I was looking at a couple more weeks (and thousands of power cycles) for the rest of the data, assuming the speed did not drop further. Since the rest of the backup should only contain unused space, I shut down the backup and focused on the recovered data instead.
What was interesting, was that during all this time, the number of reallocated sectors (as reported by SMART) had not increased at all. So it seems unlikely that the slowness was caused by bad sectors (unless the disk firmware somehow tried to recover data from these reallocated sectors in the background and locked up itself in the process). The slowness also did not seem related to what sectors I had been reading. I'm happy that the data was recovered, but I honestly cannot tell why the disk was failing in this particular way...
In case you're in a similar position, the scripts I wrote are available for download.
So, with a few days of work, around a week of crunch time for the hard disk and about 4,000 powercycles, all 1000GB of files were safe again. Time to get back to some real work :-)
You can use the ddrescue `-m` option to provide a 'domain' mapfile that tells ddrescue to only work on that part of the disk. This avoids all the mapfile manipulation you had to go through.
Thanks for the tip, had I realized that option existed, it would have saved quite some fiddling with mapfiles. I've added a remark to my post about this!
After I recently ordered a new laptop, I have been looking for a USB-C-connected dock to be used with my new laptop. This turned out to be quite complex, given there are really a lot of different bits of technology that can be involved, with various (continuously changing, yes I'm looking at you, USB!) marketing names to further confuse things.
As I'm prone to do, rather than just picking something and seeing if it works, I dug in to figure out how things really work and interact. I learned a ton of stuff in a short time, so I really needed to write this stuff down, both for my own sanity and future self, as well as for others to benefit.
I originally posted my notes on the Framework community forum, but it seemed more appropriate to publish them on my own blog eventually (also because there's no 32,000 character limit here :-p).
There are still quite a few assumptions or unknowns below, so if you have any confirmations, corrections or additions, please let me know in a reply (either here, or in the Framework community forum topic).
Parts of this post are based on info and suggestions provided by others on the Framework community forum, so many thanks to them!
First off, I can recommend this article with a bit of overview and history of the involved USB and Thunderbolt technologies.
Then, if you're looking for a dock, like I was, the Framework community forum has a good list of docks (focused on Framework operability), and Dan S. Charlton published an overview of Thunderbolt 4 docks and an overview of USB-C DP-altmode docks (both posts with important specs summarized, and occasional updates too).
Then, into the details...
(you can use `lsusb -v` to tell).
Because in DP alt mode all four lines can be used unidirectionally (unlike USB, which is always full-duplex), this means the effective bandwidth for driving a display can be twice as much in alt mode than when tunneling DP over USB4 or using DisplayLink over USB3.
In practice though, DP-altmode on devices usually supports only DP1.4 (HBR3 = 8.1Gbps-per-line) for 4x8.1 = 32.4Gbps of total, unidirectional bandwidth, which is less than TB3/4 with its 20Gbps-per-line for 2x20 = 40Gbps of full-duplex bandwidth. This will change once devices start supporting DP-altmode 2.0 (UHBR20 = 20Gbps-per-line) for 4x20 = 80Gbps of unidirectional bandwidth.
`boltctl` (e.g. `boltctl list -a`).

`echo "module thunderbolt +p" > /sys/kernel/debug/dynamic_debug/control`, plug in your dock and check `dmesg` (which will call the USB4 controller/router in the dock a "USB4 switch" and its interfaces "Ports").

For DP:
Bandwidth allocation for DP links happens based on the maximum bandwidth for the negotiated link rate (e.g. HBR3) and seems to happen on a first-come first-served basis. For example, if the first display negotiates 4xHBR3, this takes up 25.92Gbps (after encoding) of bandwidth, leaving only 2xHBR3 or 4xHBR1 for a second display connected.
This means that the order of connecting displays can be relevant to the supported resolutions on each (on multi-output hubs with displays already connected, it seems the hub typically initializes displays in a fixed order).
If the actual bandwidth for the resolution used is less than the allocated bandwidth, the extra bandwidth is not lost, but can still be used for other traffic (like bulk USB traffic, which does not need allocated / reserved bandwidth). [source]
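The 25.92Gbps figure follows from DP's 8b/10b encoding (used up to HBR3): the raw 8.1Gbps line rate carries 8 data bits per 10 line bits, over 4 lanes:

```python
# Effective bandwidth of a 4-lane HBR3 DP link after 8b/10b encoding.
raw_per_lane = 8.1            # Gbps, HBR3 line rate
encoding_efficiency = 8 / 10  # 8b/10b: 8 data bits per 10 line bits
lanes = 4

effective = raw_per_lane * encoding_efficiency * lanes
print(round(effective, 2))    # 25.92 Gbps allocated for such a display
```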
As mentioned, different protocols use different bitrates and different encodings. Here's an overview of these, with the effective transfer rate (additional protocol overhead like packet headers, error correction, etc. still needs to be accounted for, so e.g. the effective transfer rate for a mass-storage device will be lower still). Again, these are single-line bandwidths, so total bandwidth is x4 (unidirectional) for DP and x2 (full-duplex) for TB/USB3.2/USB4.
Note that for TB3, the rounded 10/20Gbps speeds are after encoding, so the raw bitrate is actually slightly higher. Thunderbolt 4 is not listed as (AFAICT) this just uses USB4 transfer modes.
That's it for now. Kudos if you made it this far! As always, feel free to leave a comment if this has been helpful to you, or if you have any suggestions or questions!
Update 2022-03-29: Added "Understanding tunneling and bandwidth limitations" section and some other small updates.
Update 2022-06-10: Mention `boltctl` as a tool for getting device info.
This has dramatically improved my understanding of USB, which is even more complicated than I thought. Excellent research!
> This has dramatically improved my understanding of USB, which is even more complicated than I thought. Excellent research!
Good to hear! And then I did not even dive into the details of how USB itself works, with all its descriptors, packet types, transfer modes, and all other complex stuff that is USB (at least 1/2, I haven't looked at USB3 in that much detail yet...).
Hi there are 4 corrupted links with a slipped ")". Just search for ")]".
Ah, I copied the markdown from my original post on Discourse, and it seems my own markdown renderer is slightly different when there are closing parenthesis inside a URL. Urlencoding them fixed it. Thanks!
Thank you for sharing! Which dock (hub?) did you end up buying?
Hey Jens, good question!
I ended up getting the Caldigit TS4, which is probably one of the most expensive ones, but I wanted to get at least TB4 (since USB4 has public specifications, and supports wake-from-sleep, unlike TB3) and this one has a good selection of ports, the host port on the back side (which I think ruled out 70% of the available products or so - I used this great list to make the initial selection).
See also this post for my initial experiences with that dock.
> It is a bit unclear to me how these signals are multiplexed. Apparently TB2 combines all four lines into a single full-duplex channel, suggesting that on TB1 there are two separate channels, but does that mean that one channel carries PCIe and one carries DP on TB1? Or each device connected is assigned to (a part of) either channel?
No. Each Thunderbolt controller contains a switch. Protocol adapters for PCIe and DP attach to the switch and translate between that protocol and the low-level Thunderbolt transport protocol. The protocols are tunneled transparently over the switching fabric and packets for different protocols may be transmitted on the same lane in parallel.
> TB1/2 are backwards compatible with DP, so you can plug in a non-TB DP display into a TB1/2 port. I suspect these ports start out as DP ports and there is some negotiation over the aux channel to switch into TB mode, but I could not find any documentation about this.
When Apple integrated the very first Copper-based Thunderbolt controller, Light Ridge, into their products, they put a mux and redrivers on the mainboard which switches the plug's signals between the Thunderbolt controller and the DP-port on the GPU. They also added a custom microcontroller which snoops on the signal lines, autodetects whether the signals are DP or Thunderbolt, and drives the mux accordingly.
The next-generation chip Cactus Ridge (still Thunderbolt 1) integrated this functionality into the Thunderbolt controller. So the signals go from the DP plug to the Thunderbolt controller, and if it detects DP signals, it routes them through to the GPU via its DP-in pins.
(Source: I've worked on Linux Thunderbolt bringup on Macs and studied their schematics extensively.)
Amazing write up. I spent so long trying to patch together an understanding of TB docks for displays and this has helped TREMENDOUSLY. Thanks again
I just "Waybaked*" your page. It's such a gem. :) Thank you for your writeup Matthijs!
*(archive.org)
Interesting read. After reading it, though, I am still none the wiser on a situation I experienced recently. I have a Moto G6 phone. I bought a USB C dock for it, a "Kapok 11-in-1-USB-C Laptop Dockingstation Dual HDMI Hub" on Amazon. It would charge, but the video out didn't work, and the Ethernet didn't work either. It did work for my Steam Deck, though. How come? I understand some USB C hubs work for android phones, some don't, and I don't know how that works. How does one find a dock that will work with Android?
@Lukas, you replied:
> No. Each Thunderbolt controller contains a switch. Protocol adapters for PCIe and DP attach to the switch and translate between that protocol and the low-level Thunderbolt transport protocol. The protocols are tunneled transparently over the switching fabric and packets for different protocols may be transmitted on the same lane in parallel.
I understand this, but the questions in my post that you are responding to were prompted because the wikipedia page on TB2 ( en.wikipedia.org/wiki/Thunderbolt(interface)#Thunderbolt2 ) says:
> The data-rate of 20 Gbit/s is made possible by joining the two existing 10 Gbit/s-channels, which does not change the maximum bandwidth, but makes using it more flexible.
Which is confusing to me. I could imagine this refers to the TB controller having one adapter that supports 20Gbps DP or PCIe instead of 2 adapters that support 10Gbps each, but TB2 only supports DP1.2 (so only 4.32Gbps) and PCIe bandwidths seem to be multiples of 8Gbps/GTps (but I do see PCIe 2.0 that has 5GTps, so maybe TB1 supported two PCIe 2.0 x2 ports for 10Gbps each and TB2 (also?) supports one PCIe 2.0 x4 port for 20Gbps?)
@Someone, thanks for clarifying how DP vs TB1/2 negotiation worked, I'll edit my post accordingly.
@MikeNap, I feel your pain, great to hear my post has been helpful :-D
@Omar, cool!
@cheater, my first thought about video is that your dock might be of the DisplayLink (or some other video-over-USB) kind and your Android phone might not have the needed support/drivers. A quick google suggests this might need an app from the Play Store.
Or maybe the reverse is true and the dock only supports DP alt mode (over USB C) to get the display signal and your Android does not support this (but given it is a dual HDMI dock without thunderbolt, this seems unlikely, since the only way to get multiple display outputs over DP alt mode is to use MST and I think not many docks support that - though if the dock specs state that dual display output is not supported on Mac, that might be a hint you have an MST dock after all, since MacOS does not support MST).
As for ethernet, that might also be a driver issue?
Hello,
Thanks for this instructive post :)
I spotted some typos if it may help:
"can be combied" -> "can be combined"
"USB3.2 allows use of a all" -> "USB3.2 allows use of all"
"but a also" -> " but also"
"up to 20Gps-per-line" -> "up to 20Gbps-per-line"
"an USB3 10Gpbs" -> "an USB3 10Gbps"
"leaving ony" -> "leaving only"
"reverse-engineerd" -> "reverse-engineered"
"a single hub or dock with and a thunderbolt or USB4" -> "a single hub or dock with a thunderbolt or USB4"
"certifified" -> "certified"
Best regards, Laurent Lyaudet
@Laurent, thanks for your comments, I've fixed all of the typos :-D Especially "certifified" is nice, that should have been the correct word, sounds much nicer ;-p
Hello :)
Google does understand (incomplete URL because of SPAM protection):
/search?q=certifified+meme&source=lnms&tbm=isch&sa=X
But there is not yet an exact match for "certifified meme" ;P
Thanks you very much for your article!
I am interested in buying the Caldigit TS4. But when I kindly emailed caldigit to find out ways to work around these following issues:
By taking care to inform them of the expectations regarding the docks of the users of the laptop framework. And giving them your blog link and others.
I got this response from caldigit: " Currently our products are not supported in a Linux environment and so we do not run any tests or able to offer any troubleshooting for Linux ."
Finally, Matthijs could you tell us, please, why you didn't choose the lenovo dock? It is MST compatible, and supports vlan ID at rj45 level and seems officially supported under linux?
Thanks in advance !
@Al, thanks for your response, and showing us Caldigits responses to your question.
As for why I chose the Caldigit instead of the Lenovo - I think it was mostly a slightly better selection of ports (and a card reader). As for Linux support - usually nobody officially supports Linux, but I've read some successful reports of people using the TS4 under Linux (and I have been using it without problems under Linux as well - everything just works).
As for MST and VLANs - I do not think I really realized that these were not supported, but I won't really miss them anyway (though VLAN support might have come in handy - good to know it is not supported. Is this a hardware limitation or driver? If the latter, is it really not supported under Linux, then? It's just an intel network card IIRC?).
As for black-box firmware - that's something I do care about, but I do not expect there will be any TB dock available that has open firmware, or does Lenovo publish sources?
Thank you for your reply! As for the VLANs, it seems this has been tested, because the site indicated in 2019: www.caldigit.com/do-caldigit-docks-support-vlan-feature
Indeed, I don't know of any free firmware for Lenovo, I was trying to find out their feeling towards libre software things... Because it's always annoying to be secure everywhere except at one place in the chain: 4G stack, Intel wifi firmware, and dock...
Hi Matthijs;
Thanks for the amazing writeup, very useful. I just got my Framework laptop and I bought the TS4 largely because of your recommendation (no pressure! :D ) but also because of all the work you put in, it seemed the 'most documented'.
I have a question regarding usage of USB ports. I've got my framework laptop plugged into it with the thunderbolt cable. I have a 10-port StarTech USB 3 hub and I plug most everything into that, then that hub is plugged into one of the ports on the TS4 (the lower unpowered one, specifically -- The 10 port hub has its own power).
The 10 port hub is about half full. Keyboard (USB3), mouse (USB 1? 2? no idea, it's reasonably new but cheap), a scanner which is usually turned off (USB1), a Blu-ray player that takes two ports (USB2), and a printer that is usually turned off (USB2).
Most of the time, everything seems to work okay, but now and then I get a weirdness. Like, I left my computer sitting for a while (30 min ~ 1 hour) and came back to it. The ethernet had stopped working out of the blue -- my laptop didn't sleep/suspend or anything, I have basically all the power saving stuff off because in my experience that just causes weird chaos in Linux.
I unplugged/replugged the TS4, and ethernet came back, but then my mouse stopped working (everything else seemed okay). I unplugged the mouse from my 10 port hub and put it in the second TS4 unpowered port and now it's back.
Is there a power saving thing on the TS4 that I can turn off? Or is this something else? Or have you not encountered this one in your usage? So far, nothing has happened like this while I'm at the computer actively using it.
If you have any ideas, I'd really appreciate hearing them (though I understand totally if all this is just weird and you have no idea ...!)
Thanks again!!
@Steven, thanks for your comment!
As for your issues with ethernet or other USB devices that stop working after some time - I have no such experience - so far the dock has been rather flawless for me. I have seen some issues with USB devices, but I think those were only triggered by suspend/resume and/or plugging/unplugging the TS4, and those occurred very rarely (and I think haven't happened in the last few months at all).
For reference, I'm running Ubuntu 22.10 on my laptop. I also have a number of USB hubs behind the TS4 (one in my keyboard, one in my monitor and two more 7-port self-powered multi-TT hubs), so that is similar to your setup. I have not modified any (power save or other) settings on the TS4 - I'm not even sure if there are settings to tweak other than the Linux powersaving settings (which I haven't specifically looked at or changed).
So I don't think I have anything for you, except wishing you good luck with your issues. If you ever figure them out, I'd be curious to hear what you've found :-)
For a while, I've been considering replacing Grubby, my trusty workhorse laptop, a Thinkpad X201 that I've been using for the past 11 years. These thinkpads are known to last, and indeed mine still worked nicely, but over the years lost Bluetooth functionality, speaker output, one of its USB ports (I literally lost part of the connector), some small bits of the casing (dropped something heavy on it), the fan sometimes made a grinding noise, and it was getting a little slow at times (but still fine for work). I had been postponing getting a replacement, though, since having to figure out what to get, comparing models, reading reviews is always a hassle (especially for me...).
Then, when I first saw the Framework laptop last year, I was immediately sold. It's a laptop that aims to be modular, in the sense that it can be easily repaired and upgraded. To be honest, this did not seem all that special to me at first, but apparently in the 11 years since I last bought a laptop, manufacturers have increasingly been using glue rather than screws, and solder rather than sockets, a trend that Framework hopes to reverse.
In addition to the modularity, I like the fact that they make repairability and upgradability an explicit goal, in an attempt to make the electronics ecosystem more sustainable (they remind me of Fairphone in that sense). On top of that, it seems that this is also a really well-made laptop, with a lot of attention to detail, explicit support for Linux, open-source where possible (e.g. the code for the embedded controller is open), flexible expansion ports using replaceable modules, encouraging third parties to build and market their own expansion cards and addons (with open-source reference designs available), a mainboard that can be used standalone too (makes for a nice SBC after a mainboard upgrade), a decent keyboard, etc.
The only things that I'm less enthusiastic about are the reflective screen (I had that on my previous laptop and I remember liking the switch to a matte screen, but I guess I'll get used to that), having just four expansion ports (the only fixed port is an audio jack, everything else - USB, displays, card reader - has to go through expansion modules, so we'll see if I can get by with four ports) and the lack of an ethernet port (apparently there is an ethernet expansion module in the works, but I'll probably have to get a USB-to-ethernet module in the meanwhile).
Unfortunately, when I found the Framework laptop a few months ago, they were not actually being sold yet, though they expected to open up pre-orders in December. I really hoped Grubby would last long enough so I could get a Framework laptop. Then pre-orders opened only for US and Canada, with shipping to the EU announced for Q1 this year. Then they opened up orders for Germany, France and the UK, and I still had to wait...
So when they opened up pre-orders in the Netherlands last month, I immediately placed my order. They are using a batched shipping system and my batch is supposed to ship "in March" (part of the batch has already been shipped), so I'm hoping to get the new laptop somewhere in the coming weeks.
I suspect that Grubby took notice, because last Friday, with a small sputter, he powered off unexpectedly and has refused to power back on. I've tried some CPR, but no luck so far, so I'm afraid it's the end for Grubby. I'm happy that I already got my Framework order in, since now I just borrowed another laptop as a temporary solution rather than having to panic and buy something else instead.
So, I'm eager for my Framework laptop to be delivered. Now, I just need to pick a new name, and figure out which Thunderbolt dock I want... (I had an old-skool docking station for my Thinkpad, which worked great, but with USB-C and Thunderbolt's single cable for power, display, usb and ethernet, there is now a lot more choice in docks, but more on that in my next post...).
Hey Matthijs! I'm currently looking into replacing my work machine and my eye also caught the Framework.. I didn't see any further posts on your Framework, is that a good thing or a bad thing? :-D
Are you happy with it/would you recommend it? One of the concerns I have is dust piling up in the gaps between the components (for instance between the screen and the bezel). Is that something you're experiencing? Looking forward to hearing your thoughts!
W00ps, your comment got stuck in my moderation queue, I only approved it just now.
Making another post about my experiences has been on my list for a while, but busy, busy, busy... The TL;DR is that I'm quite happy with the laptop, it works really well. I had an issue with the webcam, but that was replaced under warranty (and a breeze to replace).
I do have a few gripes with it: Limited battery time (especially high drain in suspend mode), the glossy screen and somewhat limited brightness, but AFAIU all of these have improved in the newer revisions of the laptop.
> Are you happy with it/would you recommend it
Yes, definitely. In addition to being happy with the hardware, I'm also happy with the way Framework has been following up with new designs that are compatible with the older generations (allowing upgrades), as well as a lot of openness about their designs (within limits of their own NDAs) and willingness to collaborate with third parties (e.g. they recently announced a RISC-V motherboard produced by someone else).
> One of the concerns I have is dust piling up in the gaps between the components (for instance between the screen and the bezel). Is that something you're experiencing? Looking forward to hearing your thoughts!
Nope, not at all (it hasn't created any problems, and I just looked behind the bezel: no dust there). It closes quite neatly.
Recently, I've been working with STM32 chips for a few different projects and customers. These chips are quite flexible in their pin assignments, usually most peripherals (i.e. an SPI or UART block) can be mapped onto two or often even more pins. This gives great flexibility (both during board design for single-purpose boards and later for a more general purpose board), but also makes it harder to decide and document the pinout of a design.
ST offers STM32CubeMX, a software tool that helps designing around an STM32 MCU, including deciding on pinouts, and generating relevant code for the system as well. It is probably a powerful tool, but it is a bit heavy to install and AFAICS does not really support general purpose boards (where you would choose between different supported pinouts at runtime or compiletime) well.
So in the past, I've used a trusted tool to support this process: A spreadsheet that lists all pins and all their supported functions, where you can easily annotate each pin with all the data you want and use colors and formatting to mark functions as needed to create some structure in the complexity.
However, generating such a pinout spreadsheet wasn't particularly easy. The tables from the datasheet cannot be easily copy-pasted (and the datasheet has the alternate and additional functions in two separate tables), and the STM32CubeMX software seems to only be able to export a pinout table with alternate functions, not additional functions. So we previously ended up using the CubeMX-generated table and then adding the additional functions manually, which is annoying and error-prone.
So I dug around in the CubeMX data files a bit, and found that it has an XML file for each STM32 chip that lists all pins with all their functions (both alternate and additional). So I wrote a quick Python script that parses such an XML file and generates a CSV file. The script just needs Python 3 and has no additional dependencies.
To run this script, you will need the XML file for the MCU you are interested in from inside the CubeMX installation. Currently, these only seem to be distributed by ST as part of CubeMX. I did find one third-party github repo with the same data, but that hasn't been updated in nearly two years. However, once you generate the pin listing and publish it (e.g. in a spreadsheet), others can of course work with it without needing CubeMX or this script anymore.
For example, you can run this script as follows:
$ ./stm32pinout.py /usr/local/cubemx/db/mcu/STM32F103CBUx.xml
name,pin,type
VBAT,1,Power
PC13-TAMPER-RTC,2,I/O,GPIO,EXTI,EVENTOUT,RTC_OUT,RTC_TAMPER
PC14-OSC32_IN,3,I/O,GPIO,EXTI,EVENTOUT,RCC_OSC32_IN
PC15-OSC32_OUT,4,I/O,GPIO,EXTI,ADC1_EXTI15,ADC2_EXTI15,EVENTOUT,RCC_OSC32_OUT
PD0-OSC_IN,5,I/O,GPIO,EXTI,RCC_OSC_IN
(... more output truncated ...)
The script is not perfect yet (it does not tell you which functions correspond to which AF numbers and the ordering of functions could be improved, see TODO comments in the code), but it gets the basic job done well.
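For reference, the core of such a conversion can be sketched as follows. This is a simplified sketch, not the actual script: the element and attribute names (`Pin`, `Signal`, `Name`, `Position`, `Type`) follow the CubeMX MCU XML format as I understand it, the namespace handling is an assumption, and `pins_to_csv` is a hypothetical helper name.

```python
import csv
import xml.etree.ElementTree as ET

def pins_to_csv(xml_source, output):
    """Parse a CubeMX-style MCU XML file and write one CSV row per pin:
    name, pin position, type, followed by all signal (function) names."""
    tree = ET.parse(xml_source)
    root = tree.getroot()

    # The MCU XML files use a default namespace; strip it so we can
    # match on bare tag names regardless of the exact namespace URI.
    def tag(elem):
        return elem.tag.split('}')[-1]

    writer = csv.writer(output)
    writer.writerow(['name', 'pin', 'type'])
    for pin in root.iter():
        if tag(pin) != 'Pin':
            continue
        row = [pin.get('Name'), pin.get('Position'), pin.get('Type')]
        # Each Signal child is one alternate or additional function.
        row.extend(sig.get('Name') for sig in pin if tag(sig) == 'Signal')
        writer.writerow(row)
```

The actual script handles more details (e.g. the ordering of functions), but the parsing approach is the same.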
You can find the script in my "scripts" repository on github.
Update: It seems the XML files are now also available separately on github: https://github.com/STMicroelectronics/STM32_open_pin_data, and some of the TODOs in my script might be solvable.
For this blog, I wanted to include some nicely-formatted formulas. An easy way to do so, is to use MathJax, a javascript-based math processor where you can write formulas using (among others) the often-used Tex math syntax.
However, I use Markdown to write my blogposts and including formulas directly in the text can be problematic because Markdown might interpret part of my math expressions as Markdown and transform them before MathJax has had a chance to look at them. In this post, I present a customized MathJax configuration that solves this problem in a reasonably elegant way.
An obvious solution is to put the math expression in Markdown code blocks (or inline code using backticks), but by default MathJax does not process these. MathJax can be reconfigured to also typeset the contents of `<code>` and/or `<pre>` elements, but since actual code will likely contain parts that look like math expressions, this will likely cause your code to be messed up.
This problem was described in more detail by Yihui Xie in a blogpost, along with a solution that preprocesses the DOM to look for `<code>` tags that start and end with a math expression start and end marker, and if so strips away the `<code>` tag so that MathJax will process the expression later. Additionally, he translates any expression contained in single dollar signs (which is the traditional Tex way to specify inline math) to an expression wrapped in `\(` and `\)`, which is the only way to specify inline math in MathJax (single dollars are disabled since they would be too likely to cause false positives).
I considered using his solution, but it explicitly excludes code blocks (which are rendered as a `<pre>` tag containing a `<code>` tag in Markdown), and I wanted to use code blocks for centered math expressions (since that looks better without the backticks in my Markdown source). Also, I did not really like that the script modifies the DOM and has a bunch of regexes that hardcode what a math formula looks like.
So I made an alternative implementation that configures MathJax to behave as intended. This is done by overriding the normal automatic typesetting in the `pageReady` function and instead explicitly typesetting all code tags that contain exactly one math expression.
Unlike the solution by Yihui Xie, this:

- Does not typeset math outside of `<code>` elements (e.g. no formulas in normal text), because the default typesetting is replaced.
- Also processes `<code>` elements inside `<pre>` elements (but this can be easily changed using the parent tag check from Yihui Xie's code).
- Runs during the MathJax `pageReady` event, so the script does not have to be at the end of the HTML page.

You can find the MathJax configuration for this inline at the end of this post. To use it, just put the script tag in your HTML before the MathJax script tag (or see the MathJax docs for other ways).
To use it, just use the normal Tex math syntax (using single or double `$` signs) inside a code block (using backticks or an indented block) in any combination. Typically, you would use single `$` delimiters together with backticks for inline math. You'll have to make sure that the code block contains exactly a single MathJax expression (and maybe some whitespace), but nothing else. E.g. this Markdown:
Formulas *can* be inline: `$z = x + y$`.
Renders as: Formulas can be inline: `$z = x + y$`.
The double `$$` delimiter produces a centered math expression. This works within backticks (like Yihui shows) but I think it looks better in the Markdown if you use an indented block (which Yihui's code does not support). So for example this Markdown (note the indent):
$$a^2 + b^2 = c^2$$
Renders as:
$$a^2 + b^2 = c^2$$
Then you can also use more complex, multiline expressions. This indented block of Markdown:
$$
\begin{vmatrix}
a & b\\
c & d
\end{vmatrix}
=ad-bc
$$
Renders as:
$$
\begin{vmatrix}
a & b\\
c & d
\end{vmatrix}
=ad-bc
$$
Note that to get Markdown to display the above example blocks, i.e. code blocks that start and end with `$$`, without having MathJax process them, I used some literal HTML in my Markdown source. For example, in my blog's Markdown source, the first block above literally looks like this:
<pre><code><span></span> $$a^2 + b^2 = c^2$$</code></pre>
Markdown leaves the HTML tags alone, and the empty span ensures that the script below does not process the contents of the code block (since it only processes code blocks where the full contents of the block are valid MathJax code).
So, here is the script that I am now using on this blog:
<script type="text/javascript">
MathJax = {
options: {
// Remove <code> tags from the blacklist. Even though we pass an
// explicit list of elements to process, this blacklist is still
// applied.
skipHtmlTags: { '[-]': ['code'] },
},
tex: {
// By default, only \( is enabled for inline math, to prevent false
// positives. Since we already only process code blocks that contain
// exactly one math expression and nothing else, it is also fine to
// use the nicer $...$ construct for inline math.
inlineMath: { '[+]': [['$', '$']] },
},
startup: {
// This is called on page ready and replaces the default MathJax
// "typeset entire document" code.
pageReady: function() {
var codes = document.getElementsByTagName('code');
var to_typeset = [];
for (var i = 0; i < codes.length; i++) {
var code = codes[i];
// Only allow code elements that just contain text, no subelements
if (code.childElementCount === 0) {
var text = code.textContent.trim();
var inputs = MathJax.startup.getInputJax();
// For each of the configured input processors, see if the
// text contains a single math expression that encompasses the
// entire text. If so, typeset it.
for (var j = 0; j < inputs.length; j++) {
// Only use string input processors (e.g. tex, as opposed to
// node processors e.g. mml that are more tricky to use).
if (inputs[j].processStrings) {
var matches = inputs[j].findMath([text]);
if (matches.length == 1 && matches[0].start.n == 0 && matches[0].end.n == text.length) {
// Trim off any trailing newline, which otherwise stays around, adding empty visual space below
code.textContent = text;
to_typeset.push(code);
code.classList.add("math");
if (code.parentNode.tagName == "PRE")
code.parentNode.classList.add("math");
break;
}
}
}
}
}
// Code blocks to replace are collected and then typeset in one go, asynchronously in the background
MathJax.typesetPromise(to_typeset);
},
},
};
</script>
Update 2020-08-05: Script updated to run typesetting only once, and
use typesetPromise
to run it asynchronously, as suggested by Raymond
Zhao in the comments below.
Update 2020-08-20: Added some Markdown examples (the same ones Yihui Xie used), as suggested by Troy.
Update 2021-09-03: Clarified how the script decides which code blocks to process and which to leave alone.
Hey, this script works great! Just one thing: performance isn't the greatest. I noticed that upon every call to MathJax.typeset, MathJax renders the whole document. It's meant to be passed an array of all the elements, not called individually.
So what I did was I put all of the code elements into an array, and then called MathJax.typesetPromise (better than just typeset) on that array at the end. This runs much faster, especially with lots of LaTeX expressions on one page.
Hey Raymond, excellent suggestion. I've updated the script to make these changes, works perfect. Thanks!
What a great article! Congratulations :)
Can you please add a typical math snippet from one of your .md files? (Maybe the same as the one Yihui Xie uses in his post.)
I would like to see how you handle inline/display math in your markdown.
Hey Troy, good point, examples would really clarify the post. I've added some (the ones from Yihui Xie indeed) that show how to use this from Markdown. Hope this helps!
Hi, this code looks pretty great! One thing I'm not sure about is how you differentiate a latex code block from a normal code block so that they won't be rendered in the same style?
Hi Xiao, thanks for your comment. I'm not sure I understand your question completely, but what happens is that both the math/latex block and a regular code block are processed by Markdown into a `<pre><code>...</code></pre>` block. Then the script shown above picks out all `<code>` blocks, and passes the content of each to MathJax for processing.

Normally MathJax finds any valid math expression (delimited by e.g. `$$` or `$`) and processes it, but my script has some extra checks to only apply MathJax processing if the entire `<code>` block is a single MathJax block (in other words, if it starts and ends with `$$` or `$`).
This means that regular code blocks will not be MathJax processed and stay regular code blocks. One exception is when a code block starts and ends with e.g. `$$` but you still do not want it processed (like the Markdown version of the examples I show above), but I applied a little hack with literal HTML tags and an empty `<span>` for that (see above, I've updated the post to show how I did this).
Or maybe your question is more about actually styling regular code blocks vs math blocks? For that, the script adds a `math` class to the `<code>` and `<pre>` tags, which I then use in my CSS to slightly modify the styling (just remove the grey background for math blocks, all other styling is handled by MathJax already it seems).
Does that answer your question?
Every now and then I work on some complex C++ code (mostly stuff running on Arduino nowadays) so I can write up some code in a nice, concise and abstracted manner. This almost always involves classes, constructors and templates, which serve their purpose in the abstraction, but once you actually call them, the compiler should optimize all of them away as much as possible.
This usually works nicely, but there was one thing that kept bugging me. No matter how simple your constructors are, initializing using constructors always results in some code running at runtime.
In contrast, when you initialize a normal integer variable, or a struct variable using aggregate initialization, the compiler can do the initialization completely at compiletime. E.g. this code:
struct Foo {uint8_t a; bool b; uint16_t c;};
Foo x = {0x12, false, 0x3456};
Would result in four bytes (0x12, 0x00, 0x34, 0x56, assuming no padding and big-endian) in the data section of the resulting object file. This data section is loaded into memory using a simple loop, which is about as efficient as things get.
Now, if I write the above code using a constructor:
struct Foo {
  uint8_t a; bool b; uint16_t c;
  Foo(uint8_t a, bool b, uint16_t c) : a(a), b(b), c(c) {}
};
Foo x = Foo(0x12, false, 0x3456);
This will result in those four bytes being allocated in the bss section (which is zero-initialized), with the constructor code being executed at startup. The actual call to the constructor is inlined of course, but this still means there is code that loads every byte into a register, loads the address in a register, and stores the byte to memory (assuming an 8-bit architecture, other architectures will do more bytes at a time).
This doesn't matter much if it's just a few bytes, but for larger objects, or multiple small objects, having the loading code intermixed with the data like this easily requires 3 to 4 times as much code as having it loaded from the data section. I don't think CPU time will be much different (though first zeroing memory and then loading actual data is probably slower), but on embedded systems like Arduino, code size is often limited, so not having the compiler just resolve this at compiletime has always frustrated me.
Today I learned about a new feature in C++11: Constant initialization. This means that any global variables that are initialized to a constant expression will be resolved at compiletime and initialized before any (user) code (including constructors) starts to actually run.
A constant expression is essentially an expression that the compiler can guarantee can be evaluated at compiletime. They are required for e.g. array sizes and non-type template parameters. Originally, constant expressions included just simple (arithmetic) expressions, but since C++11 you can also use functions and even constructors as part of a constant expression. For this, you mark a function using the `constexpr` keyword, which essentially means that if all parameters to the function are compiletime constants, the result of the function will also be (additionally, there are some limitations on what a constexpr function can do).
So essentially, this means that if you add `constexpr` to all constructors and functions involved in the initialization of a variable, the compiler will evaluate them all at compiletime.
(On a related note - I'm not sure why the compiler doesn't deduce `constexpr` automatically. If it can verify that it's allowed to use `constexpr`, why not add it? Might be too resource-intensive perhaps?)
Note that constant initialization does not mean the variable has to be declared `const` (e.g. immutable) - it's just that the initial value has to be a constant expression (which are really different concepts - it's perfectly possible for a `const` variable to have a non-constant expression as its value. This means that the value is set by normal constructor calls or whatnot at runtime, possibly with side-effects, without allowing any further changes to the value after that).
Anyway, so much for the introduction of this post, which turned out longer than I planned :-). I learned about this feature from this great post by Andrzej Krzemieński. He also writes that it is not really possible to enforce that a variable is constant-initialized:
It is difficult to assert that the initialization of globals really took place at compile-time. You can inspect the binary, but it only gives you the guarantee for this binary and is not a guarantee for the program, in case you target for multiple platforms, or use various compilation modes (like debug and retail). The compiler may not help you with that. There is no way (no syntax) to require a verification by the compiler that a given global is const-initialized.
If you accidentally forget constexpr on one function involved, or some other requirement is not fulfilled, the compiler will happily fall back to less efficient runtime initialization instead of notifying you so you can fix this.
This smelled like a challenge, so I set out to investigate if I could figure out some way to implement this anyway. I thought of using a non-type template argument (which are required to be constant expressions by C++), but those only allow a limited set of types to be passed. I tried using `__builtin_constant_p`, a non-standard gcc construct, but that doesn't seem to recognize class-typed constant expressions.
Using static_assert
It seems that using the (also introduced in C++11) `static_assert` statement is a reasonable (though not perfect) option. The first argument to `static_assert` is a boolean that must be a constant expression. So, if we pass it an expression that is not a constant expression, it triggers an error. For testing, I'm using this code:
class Foo {
public:
  constexpr Foo(int x) { }
  Foo(long x) { }
};

Foo a = Foo(1);
Foo b = Foo(1L);
We define a `Foo` class, which has two constructors: one accepts an `int` and is `constexpr`, and one accepts a `long` and is not `constexpr`. Above, this means that `a` will be const-initialized, while `b` is not.
To use `static_assert`, we cannot just pass `a` or `b` as the condition, since the condition must return a bool type. Using the comma operator helps here (the comma accepts two operands, evaluates both and then discards the first to return the second):
static_assert((a, true), "a not const-initialized"); // OK
static_assert((b, true), "b not const-initialized"); // OK :-(
However, this doesn't quite work: neither of these results in an error. I was actually surprised here - I would have expected them both to fail, since neither `a` nor `b` is a constant expression. In any case, this doesn't work. What we can do, is simply copy the initializer used for both into the `static_assert`:
:
static_assert((Foo(1), true), "a not const-initialized"); // OK
static_assert((Foo(1L), true), "b not const-initialized"); // Error
This works as expected: The `int` version is ok, the `long` version throws an error. It doesn't trigger the assertion, but recent gcc versions show the line with the error, so it's good enough:
test.cpp:14:1: error: non-constant condition for static assertion
static_assert((Foo(1L), true), "b not const-initialized"); // Error
^
test.cpp:14:1: error: call to non-constexpr function ‘Foo::Foo(long int)’
This isn't very pretty though - the comma operator doesn't make it very clear what we're doing here. Better is to use a simple inline function, to effectively do the same:
template <typename T>
constexpr bool ensure_const_init(T t) { return true; }
static_assert(ensure_const_init(Foo(1)), "a not const-initialized"); // OK
static_assert(ensure_const_init(Foo(1L)), "b not const-initialized"); // Error
This achieves the same result, but looks nicer (though the `ensure_const_init` function does not actually enforce anything, it's the context in which it's used, but that's a matter of documentation).
Note that I'm not sure if this will actually catch all cases: I'm not entirely sure if the stuff involved with passing an expression to `static_assert` (optionally through the `ensure_const_init` function) is exactly the same stuff that's involved with initializing a variable with that expression (e.g. similar to the copy constructor issue below).
The function itself isn't perfect either - it doesn't handle (const) (rvalue) references, so it might not work in all cases and might need some fixing.
Also, having to duplicate the initializer in the assert statement is a big downside - If I now change the variable initializer, but forget to update the assert statement, all bets are off...
Using a constexpr constant

As Andrzej pointed out in his post, you can mark variables with `constexpr`, which requires them to be constant initialized. However, this also makes the variable `const`, meaning it cannot be changed after initialization, which we do not want. However, we can still leverage this using a two-step initialization:
constexpr Foo c_init = Foo(1); // OK
Foo c = c_init;
constexpr Foo d_init = Foo(1L); // Error
Foo d = d_init;
This isn't very pretty either, but at least the initializer is only defined once. This does introduce an extra copy of the object. With the default (implicit) copy constructor this copy will be optimized out and constant initialization still happens as expected, so no problem there.
However, with user-defined copy constructors, things are different:
class Foo2 {
public:
constexpr Foo2(int x) { }
Foo2(long x) { }
Foo2(const Foo2&) { }
};
constexpr Foo2 e_init = Foo2(1); // OK
Foo2 e = e_init; // Not constant initialized but no error!
Here, a user-defined copy constructor is present that is not declared with `constexpr`. This results in `e` being not constant-initialized, even though `e_init` is (this is actually slightly weird - I would expect the initialization syntax I used to also call the copy constructor when initializing `e_init`, but perhaps that one is optimized out by gcc in an even earlier stage).
We can use our earlier `ensure_const_init` function here:
constexpr Foo f_init = Foo(1);
Foo f = f_init;
static_assert(ensure_const_init(f_init), "f not const-initialized"); // OK
constexpr Foo2 g_init = Foo2(1);
Foo2 g = g_init;
static_assert(ensure_const_init(g_init), "g not const-initialized"); // Error
This code is actually a bit silly - of course `f_init` and `g_init` are const-initialized, they are declared `constexpr`. I initially tried this separate init variable approach before I realized I could (need to, actually) add `constexpr` to the init variables. However, this silly code does catch our problem with the copy constructor. This is just a side effect of the fact that the copy constructor is called when the init variables are passed to the `ensure_const_init` function.
One variant of the above would be to simply define two objects: the one you want, and an identical constexpr version:
Foo h = Foo(1);
constexpr Foo h_const = Foo(1);
It should be reasonable to assume that if h_const can be const-initialized, and h uses the same constructor and arguments, that h will be const-initialized as well (though again, no real guarantee). This assumes that the h_const object, being unused, will be optimized away. Since it is constexpr, we can also be sure that there are no constructor side effects that will linger, so at worst this wastes a bit of memory if the compiler does not optimize it.
Again, this requires duplication of the constructor arguments, which can be error-prone.
There are two significant problems left:
- None of these approaches actually guarantees that const-initialization happens. It seems they catch the most common problem (having a non-constexpr function or constructor involved), but inside the C++ minefield that is (copy) constructors, implicit conversions, half a dozen initialization methods, etc., I'm pretty confident that there are other caveats we're missing here.
- None of these approaches is very pretty. Ideally, you'd just write something like:
constinit Foo f = Foo(1);
or, slightly worse:
Foo f = constinit(Foo(1));
Implementing the second syntax seems to be impossible using a function - function parameters cannot be used in a constant expression (they could be non-const). You can't mark parameters as constexpr either.
I considered using a preprocessor macro to implement this. A macro can easily take care of duplicating the initialization value (and since we're enforcing constant initialization, there are no side effects to worry about). It's tricky, though, since you can't just put a static_assert statement, or an additional constexpr variable declaration, inside a variable initialization. I considered using a C++11 lambda expression for that, but those can only contain a single return statement and nothing else (unless they return void) and cannot be declared constexpr...
Perhaps a macro that completely generates the variable declaration and initialization could work, but a single macro that generates multiple statements is still messy (and the usual do {...} while(0) approach doesn't work in global scope). It's also not very nice...
Any other suggestions?
Update 2020-11-06: It seems that C++20 has introduced a new keyword, constinit, to do exactly this: require that a variable is constant-initialized, without also making it const like constexpr does. See https://en.cppreference.com/w/cpp/language/constinit