Sudoku, Part Two

Solving Sudoku algorithmically is actually a bit harder than it looks – as a previous commenter pointed out. But I'm with Richard Feynman on this one: If you want to learn something, make a mess of it yourself first, then read up on the solution.

Let's restrict ourselves to 3×3 sudoku for the purposes of this series, with the numbers 1-9. My Perl program works the same way for 4×4 grids, so we're not losing anything. OK, each cell in the grid, and there are 9×9 of them, can contain the numbers 1-9. Now we are given certain cells, so we can immediately remove all other numbers in the given cells.

The given cells in turn mean that unknown cells can't contain given numbers. If we are given 1 in the top left corner, then the top left square cannot contain a 1 in any other cell, so we can remove 1 from the list of possibilities for the other unknown cells.

So this suggests are very simple approach – for each “solved” cell (a cell with only one possible number left), check all cells in the same horizontal, and all cells in the same vertical, and all cells in the same home square. If the number occurs in the list of possible numbers for a cell that we are checking, remove it. In this way we can reduce the list of possibilities in each unknown cell.

How does this help us? Well if we keep repeating this operation, we eventually reduce the list of possibilities in some cells to one number, and that means we have solved those cells. These cells then used to solve further cells and so on. At first glance you'd think that would be it. I sure did! In fact this simple algorithm did solve a number of 4×4 sudokus all by itself (I was not drawn out of the hat on those ones, and did not win anything…).

But it can't solve all sudokus. Sometimes the set of initial given numbers does not give enough information to exclude all possibilities, and further loops of the algoritm fail to produce any changes in the sudoku data structure. In keeping with this simple minded approach, see if you can think up anything to reduce the list of possibilities in each cell further. We'll talk about that in the next post.

Now for some code. The data structure that I used to store the sudoku puzzle is a two-dimensional array of associative arrays. The associative arrays don't actually associate anything – they are just bins for the list of possibilities. Initially they contain the pairs { 1=>1, 2=>2, 3=>3, … }. This setup makes is easy to test if a given number is in a cell or not.

Here's the main loop. This just keeps running the number exclusion logic until no more changes are seen:

sub solve {
  my @cell = @{shift()};
  my $pass = 0;
  my $changes = 1;

  while( $changes ) {
    $pass++;
    $changes = 0;
    my $rowI = 0;
    for my $row ( @cell ) {
      my $colI = 0;
      for my $col (@{$row}) {
        my $origsize = keys(%{$col});
        if( 1 < $origsize ) {
          excludeHome(@cell,$rowI,$colI);
          excludeVertical(@cell,$rowI,$colI);
          excludeHorizontal(@cell,$rowI,$colI);
        }
        my $size = keys(%{$col});
        $changes = $changes 
                   || ($origsize > $size);
        $colI++;
      }
      $rowI++;
    }
  }
  print "passes:$passn";
}

I have made no attempt to prettify this code – this is exactly how I hacked it up. Since it uses Perl references, here's a quick decompile: @cell = reference to entire sudoku, @{$row} = list of cells for given row, %{$col} = number bin for a given cell.

Let's look at what happens inside the exclude subroutines. We'll use excludeVertical as an example.

sub excludeVertical {
  my @cell = @{shift()};
  my $prow = shift;
  my $pcol = shift;

  my $valbin = $cell[$prow][$pcol];
  if( 1 == keys(%{$valbin}) ) {
    return;
  }

  for( my $r = 0; $r < ($sqsize*$sqsize); $r++ ) {
    if( $r != $prow ) {
      my $testbin = $cell[$r][$pcol];
        if( 1 == keys(%{$testbin}) ) {
        my $remove = (keys(%{$testbin}))[0];
        if( 1 > keys(%{$valbin}) ) {
          delete(${$valbin}{ $remove });
        }
      }
    }
  }
}

We do a quick check to make sure that there's still more than one number possibility (if there is only one number left, then this cell is solved), and then we run through all the cells in the same vertical as our current cell (indicated by $prow, $pcol). $sqsize is 3 for a 3×3 grid, and 4 for a 4×4 grid, and so on. We also make sure to ignore the current cell ( if( $r != $prow ) ).

This subroutine inverts the informal algorithm described above. It searches for all cells in the vertical that contain only one number, and removes all such numbers from the current cell. This is equivalent to removing the current cell from all others if it only contains one number. The keys(%{$testbin}) expression tells you how many numbers are still in the bin to be tested. We delete the solved number from the current bin using the Perl delete function, which has the rather strange idiom of using the value of the key, rather than the key itself. Oh well, it's Perl.

I know, I know, the code is ugly and messy, but I was trying to win €150 as quickly as possible before I got better and had to go back to real work.

This post is part of a series on a Perl Suduko solver.




This entry was posted in Perl. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *