Monday, March 14, 2005

Beat the clock

The aforementioned clock problem was noticed early on while running 2.6.11 x86_64 kernels on this new machine. The clock ran 3x faster than it should have, so a minute only took 20 seconds in earth time. I searched all over and found several other people having similar issues, all of them with some type of Athlon, usually the 3000+ just like mine. Most of them had laptop systems. I'm using a microATX board, the MSI RS480M2-IL. It will take a while to get to the solution, but I did find one.
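To put a number on a fast clock, here is a minimal sketch in Python. It assumes you time the same interval twice, once by the machine's own clock and once by something you trust (a wristwatch, another computer). The function names are made up for illustration and are not from any tool I used.

```python
def clock_speed_ratio(system_elapsed_s, reference_elapsed_s):
    """Return how many system-clock seconds pass per real second.

    system_elapsed_s:    interval as measured by the suspect machine
    reference_elapsed_s: the same interval measured by a trusted clock
    """
    if reference_elapsed_s <= 0:
        raise ValueError("reference interval must be positive")
    return system_elapsed_s / reference_elapsed_s


def real_seconds_for(system_seconds, ratio):
    """Convert a span on the suspect clock back to real seconds."""
    return system_seconds / ratio


# The numbers from this post: the machine counted off a full minute
# while only 20 real seconds went by.
ratio = clock_speed_ratio(60.0, 20.0)
print("clock runs", ratio, "x real time")
print("a system minute lasts", real_seconds_for(60.0, ratio), "real seconds")
```

A ratio of 1.0 means the clock is healthy. Anything that comes out as a clean integer multiple hints at a systematic problem rather than ordinary drift.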

Nobody had any solutions posted. I started out trying some different BIOS settings. One of these settings gave me a no-boot situation where the machine stopped mid-POST (after CPU, but before memory tests). It was the A20 setting, which had options of Fast or Off. I didn't know what A20 was, but I thought, "hey, if it's fast now, maybe turning it off would slow the clock down."

So now the machine doesn't boot.
Step 1: Clear the CMOS. I tried using the jumper for this purpose. No dice. Took the battery out and unplugged power. Nope.
Step 2: Remove stuff... PCI cards, memory, disks. I even reseated the CPU. Nope.
Step 3: What a pain... I'm actually considering the possibility that I need a new BIOS or a new board. I investigate the BIOS route, including how to build an EEPROM flasher. Or buying a new BIOS online. I bet someone at work somewhere has an EEPROM flasher; after all, I do work in Silicon Valley now.
Step 4: Try the BIOS recovery procedure from MSI's tech support site. This requires finding a floppy drive and, even rarer, a floppy disk. I dissect one of my Dells for the floppy drive, and wipe one of the driver disks that came with the mobo. After all that, the BIOS recovery doesn't even kick off like it was supposed to.
Step 5: It stops at the memory test... maybe if I try completely new memory. I pull the memory out of my Dell, and as I am putting it in, I realize the light on the optical mouse I've been using is still lit up.

Eureka! All the power is disconnected from the machine; however, it's connected to a KVM, which is also connected to another Dell which is powered up. So I unplug it from the KVM, pull out the battery, and voilà! The next boot worked. So that CMOS clearing jumper either doesn't work at all, or doesn't work when there's a power source other than the battery on the motherboard.

Ok, after alllllll that.... I'm back to a bootable machine which runs 3x faster by the clock.

I tried booting an i386 kernel from the Fedora CD, and the clock speed was normal there, so I started looking into the kernel timer routines. I spent a few hours studying the code and understanding more of it than I cared to. Nothing obvious stuck out. I wondered if maybe the time source was different from what the code expected. I learned all kinds of stuff about 8253 PITs and HPETs and stuff no Java programmer should ever care about. It was fun.
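Here is a toy model, in Python, of why a mismatched time source makes a clock run fast. It is only an illustration of the general idea, not a diagnosis of what was actually wrong on this board. The PIT input frequency of 1,193,182 Hz is the standard figure for the 8253/8254; the tick rate of 1000 per second and all the function names are assumptions for the sketch.

```python
PIT_INPUT_HZ = 1_193_182  # input clock of the 8253/8254 timer chip
HZ = 1000                 # assumed kernel tick rate (ticks per second)


def pit_divisor(pit_hz=PIT_INPUT_HZ, hz=HZ):
    """Divisor to program into the PIT so it fires about `hz` times a second
    (rounded to the nearest integer)."""
    return (pit_hz + hz // 2) // hz


def kernel_seconds(ticks_received, hz=HZ):
    """A tick-counting clock: every tick is assumed to be worth 1/hz seconds."""
    return ticks_received / hz


def simulate(real_seconds, actual_ticks_per_second, hz=HZ):
    """Time the kernel believes has passed after `real_seconds`, given the
    rate at which timer ticks really arrive."""
    ticks = int(real_seconds * actual_ticks_per_second)
    return kernel_seconds(ticks, hz)


print("PIT divisor for", HZ, "ticks/s:", pit_divisor())
# Ticks arrive at the expected rate: 20 real seconds read as 20.
print(simulate(20, 1000))
# Ticks arrive three times too often: 20 real seconds read as a full minute.
print(simulate(20, 3000))
```

The point is that a clock built on counting ticks is only as good as its assumption about how often those ticks arrive. If the hardware delivers them at a different rate than the code expects, the error shows up as a constant multiplier, which is exactly what a "3x fast" clock looks like.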

I noticed how similar the x86_64 and i386 timer code were, but there were some differences. It was obvious the 64-bit code had been forked from the 386 code some time back. So I started looking around on the Linux kernel mailing list. I stumbled upon this patch by John Stultz, which he just happened to post a new version of hours before I started this kernel investigation: http://lkml.org/lkml/2005/3/11/309

I started trying to apply John's patches. They required some minor tweaking to get them to compile, but in the end I had a kernel RPM with the patch, and booted it up and
WoooHOOOO! A minute takes a minute now. Pretty important thing for a video recorder that schedules recording tv shows that come on at certain times. :-)
