[Post in English so it might benefit somebody out there]
In the last few weeks we (the Sysadmin committee of Inter-Actief) have been working on filling our brand new 19" rack cabinet (a free leftover from a deserted building :-).
Even though we have only one rackmountable server casing so far, we've been stacking normal ATX cases on top of each other in the cabinet, with a nice rackmountable switch and patch panel to top the stack.
With all that hardware jammed in there, the temperature inside the cabinet was on the rise. Not yet worryingly hot, but still warm enough to make one a little cautious. Having somewhat mixed hardware inside the cabinet, we decided to start by installing Motherboard Monitor on the only Windows PC in there, since MBM generally has good support for just about any hardware. No luck there, since installing from within a remote desktop connection was not supported.
Later tonight, I decided to give it a go on our FreeBSD AMD64 boxes. Some quick googling and make searching revealed three potential tools: lmmon, healthd and xmbmon. It quickly became clear that the first two did not support my hardware. Unfortunately xmbmon did not support it either, but it still looked like a promising candidate to coax into supporting the temperature sensors on my Asus K8N-Deluxe. The main reason for this was that the other two tools hinted that they used some kind of standard kernel interface, which apparently did not support my hardware, while xmbmon seemed to have its own list of supported hardware that it could interface with directly.
A common access method for sensor chips is through an SMBus controller. Since xmbmon did not detect any SMBus controllers, while the output of "pciconf -lv" did show one, I tried investigating this.
Some googling showed that my nForce3 chipset has an SMBus controller that operates nearly identically to the nForce2 one. Since xmbmon supports nForce2, it seemed doable to make it support the nForce3 SMBus too.
For some reason I cannot remember, I ended up on the original xmbmon site. There I found a note saying there was a patch enabling support for an A7N8X series motherboard. I somehow recognised that type number, even though it is not the board I was currently concerned with. Perhaps we have it in use in some of the other workstations....
Considering that it never hurts to try (how very untrue that is, see further ahead), I dropped the patch into /usr/ports/sysutils/xmbmon/files and reinstalled xmbmon. And actually against my expectation, it worked.
matthijs@geldpakhuis:~$ mbmon
Temp.= 30.0, 128.0, 31.0; Rot.= 3276, 0, 0
Vcore = 1.54, 4.08; Volt. = 3.25, 4.95, 11.13, -14.19, -6.14
matthijs@geldpakhuis:~$ mbmon -d
Using ISA-IO access method!!
* Int.Tec.Exp. Chip IT8705F/IT8712F or SIS950 found.
Apparently this patch did not enable the nForce3 SMBus controller (which makes sense, since afterwards I found that the mainboard the patch was originally meant for was an nForce2 mobo). What I think it did do was help xmbmon find the IT sensor chip through the ISA bus. Or something.
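For anyone wanting to do something with these numbers, here is a minimal Python sketch of how the two lines of mbmon output shown above could be picked apart. It is only an illustration: the function names and the 5 to 100 degree plausibility range are my own assumptions, and I can only guess that a reading like 128.0 comes from a sensor input that is not hooked up.

```python
# Sample text copied from the mbmon output shown above.
SAMPLE = """\
Temp.= 30.0, 128.0, 31.0; Rot.= 3276, 0, 0
Vcore = 1.54, 4.08; Volt. = 3.25, 4.95, 11.13, -14.19, -6.14
"""


def parse_mbmon(text):
    """Map each label (e.g. 'Temp.') to its list of numeric readings."""
    readings = {}
    for line in text.splitlines():
        # Each line holds one or more 'label = v1, v2, ...' groups
        # separated by semicolons.
        for field in line.split(";"):
            if "=" not in field:
                continue
            label, values = field.split("=", 1)
            readings[label.strip()] = [float(v) for v in values.split(",")]
    return readings


def plausible_temps(temps, low=5.0, high=100.0):
    """Keep readings inside an (assumed, arbitrary) believable range."""
    return [t for t in temps if low <= t <= high]


if __name__ == "__main__":
    data = parse_mbmon(SAMPLE)
    print(plausible_temps(data["Temp."]))
```

With the sample above, the filter drops the 128.0 reading and keeps the other two.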
Still not fully convinced that any one of 30, 128 or 31 was an actual temperature reading, I found a small util called "testsmb", which appeared to find valid SMBus controllers and print some info about them. Thinking this would be a nice place to start hacking the nForce3 SMBus controller in, I ran the utility.
It instantly froze my console (not even the carriage return used to confirm the command was echoed back). The first thing to do after that is the dreaded ping test: it came out negative. I had fully frozen our production webserver. Oops. Good thing our website isn't too busy during the night :-)
After some thought I decided to get over to Inter-Actief and fix stuff right away, instead of waiting for the first people to arrive at 9 tomorrow and leaving instructions for them to fix my mess.
Hooking up a monitor to see what had happened showed absolutely nothing. Everything looked normal, so I found myself a keyboard too, which only confirmed that I had completely locked up the system. A hard reset was the only way left open, and I took the opportunity to ask the BIOS for its opinion about temperatures. To my surprise it agreed with xmbmon, so my quest was completed.
Apparently, the case temperature currently lies around 30 degrees, while it was around 26/27 before I locked up the system. Quick deduction showed why: while I was at Inter-Actief, I not only recovered the system, but also collected my keys, which I had left in the (open) back door of the patch cabinet. Obviously I closed and locked the door in the process, which apparently resulted in a temperature rise of around 4 degrees.
Oh well, 30 degrees is perfectly doable I guess, so I can go to sleep safely now, knowing that my hardware won't overheat itself. Next up is setting up our Nagios monitoring application to look at server temperatures too. Maybe tomorrow ;-)
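As a note to self for that Nagios step, a rough Python sketch of what a check could look like. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the "text | perfdata" output line are the usual Nagios plugin conventions; everything else is an assumption on my part: the function names, the 35/45 degree thresholds, and the idea that the first value on the "Temp." line is the case temperature. The mbmon output is passed in as text, so how mbmon gets invoked is left open.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def first_temperature(text):
    """Return the first value on the 'Temp.' line, or None if there is none."""
    for line in text.splitlines():
        if line.startswith("Temp."):
            values = line.split(";")[0].split("=", 1)[1]
            return float(values.split(",")[0])
    return None


def check(temp, warn=35.0, crit=45.0):
    """Return (exit_code, status_line); thresholds are hypothetical."""
    if temp is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading found"
    if temp >= crit:
        state, word = CRITICAL, "CRITICAL"
    elif temp >= warn:
        state, word = WARNING, "WARNING"
    else:
        state, word = OK, "OK"
    # Human-readable text, then performance data after the pipe.
    return state, f"TEMP {word} - {temp:.1f} C | temp={temp:.1f};{warn};{crit}"


def run(text):
    """Print the status line and return the exit code for the given output."""
    code, message = check(first_temperature(text))
    print(message)
    return code

# To use this as a plugin, end the script with:
#     sys.exit(run(sys.stdin.read()))
# and pipe the mbmon output into it.
```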
Good night.
Posted by matthijs at December 15, 2005 04:38 AM