Part II. Blind alleys
Not everything goes smoothly in the passage of life. One of the numerous false starts I ran into stemmed in a false assumption. First off, my program is written in Java.1 I had reached the performance level of the current record holder, which was also written in Java, and I was ready to move on. Beating a Java program in terms of speed is kind of like winning the Special Olympics. I mean, it's kind of an accomplishment, but you know you're not facing off against the best the world has to offer. My point is that no matter what Sun tries to tell you, Java is slow. Slower than the guy who offed himself in Full Metal Jacket. So in order to face off against the fastest program in the world, my next target would have to be the C program that is now the second runner-up for the Queens record.2
Which as easier said than done. Where the Java world-record holder took 70 seconds to solve the 16-queens problem, the C program took 8 seconds.3 That's almost ten times faster. Ain't that a peach? Or maybe "peach" isn't the right word.
I decided to try something completely new with my approach, and here was my reasoning for this experiment. In C, recursion is about ten times slower than iteration.4 Which is why that fast runner-up program used iteration. And why I decided to use it too. Here's my code. Be warned, it's ugly. I'll explain the ugliness further down.
private static void findSolutions(
int index,
int row,
long down,
long ld,
long rd,
int maxRowIndex,
SolutionResult res) {
int startIndex = index;
int startRow = row;
int[] indexStack = new int[maxRowIndex];
long mask;
long tries = res.tries;
long solutionCount = res.solutionCount;
// Main (critical) loop.
main_loop:
for (;;) {
mask = index;
if ((down & mask) == 0L &&
(ld & (mask << row)) == 0L &&
(rd & (mask << (32 - row))) == 0L) {
tries += 1;
if (row == maxRowIndex) {
solutionCount += 1;
// This optimization disables rectangular boards.
index = 1; // Triggers exit condition below.
} else {
down |= mask;
ld |= mask << row;
rd |= mask << (32 - row);
indexStack[row] = index;
++row;
index = startIndex;
continue;
}
}
while (index == 1) {
if (row == startRow) {
break main_loop;
}
--row;
index = indexStack[row];
mask = index;
down &= ~mask;
ld &= ~(mask << row);
rd &= ~(mask << (32 - row));
}
index >>= 1;
} // End critical for loop
res.tries = tries;
res.solutionCount = solutionCount;
}
Why is it ugly? Well, the problem we're trying to solve is inherently recursive. Transforming it into an iterative solution forced us to make certain concessions. These concessions are twofold. First, size: there's a lot more code. More code means more chance for bugs and should always be avoided if possible. Secondly, we have to maintain a custom stack in an array since we can't take advantage of the function stack.5 These two problems jam together to make this snippet awkward.
So right now, I should take a moment to explain exactly why I chose to try something ugly and awkward. It's the mentality that the ends justify the means. Just ask the Chinese: what's the childhood of one little girl compared to winning a gold medal in gymnastics? Well worth it, they'll tell you. In terms of cars, this solution is totally the pimped-out Civic that can do 300 HP. It may look like ass, but it goes fast. And that's worth it, right?
Right, if it went fast. But unfortunately, it didn't. What? That's kind of like going out with a girl who's both ugly and dumb. There's just no point. A total waste of effort. This solution wound up taking over 80 seconds. The assumption that an iterative solution would be fast in Java just because it's fast in C was wrong.
The only thing we can do is learn from our mistake. This thing was supposed to save our necks, but instead it totally bombed. How could that have happened? The mystery deepens.
The answer lies in the loops.6 Loops in Java are very slow. Kind of like the rest of Java, but inordinately so. The presence of a loop seems to incur much more overhead than a method invocation. That's an important clue as to where we should go next. Loops are slow, so if we want to go fast, we should avoid loops as much as possible.
This experiment failed, but from this failure we have obtained a critical hint about our goal. I kind of liken it to dating. Sometimes it's only through trials and failures that we learn what doesn't work. And the lesson is best learned through experience. It can be painful, even heartbreaking, to do your best and not be good enough. But instead of giving up and succumbing to the life of a spinster, we can take away nuggets of wisdom that leave us better equipped in the future. And I think that's the lesson learned here: our failures are much more valuable than our successes. Maybe I'm making too much a deal out of something so small, but then again programming makes me philosophize.
Footnotes
(1) Java 1.5 on Mac OS X, client VM.
(2) Solution for the 24 x 24 case by Jeff Somers. His site has several links to other Queens sites. And yes, this program is significantly faster than the current world record holder.
(3) Somers' site has some timings that give 23 seconds for the solution of the 16 x 16 case. These were taken on an older machine. All of my timings are taken on the same computer, a 2.6GHz Mac Pro. I compiled Somers' code with gcc 4.0 with -O3 optimization (the fastest). The resulting program solved the 16 x 16 case in 8 seconds.
(4) Recursion and iteration are two differnt ways of approaching a problem that can be broken down into smaller problems. For a classic example of recursion, see the Towers of Hanoi problem, a personal favorite interview question of mine. Iteration is somewhat simpler. If you've ever taken a programming class, a simple "for" loop is an example of iteration. It allows you to step sequentially through a range of values.
Although both approaches are powerful tools for the programmer, there are certain classes of problems that lend themselves naturally to one approach over the other. Binary search, for example, can be implemented using either method but most computer science courses teach it using recursion because it's easier to understand that way.
Although both approaches are powerful tools for the programmer, there are certain classes of problems that lend themselves naturally to one approach over the other. Binary search, for example, can be implemented using either method but most computer science courses teach it using recursion because it's easier to understand that way.
(5) When one function calls another, the data from the new function is pushed onto a part of memory called the function stack or call stack. Think of it like putting a new, blank piece of paper on top of a pile when you start something new. Switching to iteration is like trying to use a whole bunch of Post-Its stuck all over your desk instead of a neat stack of paper.
(6) I determined this by using Apple's Shark profiling tool. It's free with OS X's developer tools and easily beats other profiling software that costs hundreds more. If you develop in Java, Shark alone is worth switching your environment to Mac OS X. It's really that good. And believe me, when a cynic like me says that, it really means something.
Incidentally, all of my speed analyses in the Queens project are done using Shark.
Incidentally, all of my speed analyses in the Queens project are done using Shark.
