
A Basic Sudoku Detector Application – Part 1: Problem Outline


Some months ago, I was asked to teach an undergrad computer vision course at my university. It was my first teaching gig and… anyway, the course was a fairly basic introduction to image processing and artificial intelligence. The final project was to develop a basic Sudoku detector.

I’ll be reviewing this application as a small tutorial. I’ll focus on the general outline, the image processing operations needed and just the basic ideas of approaching the problem. I’ll be giving out some tips here and there. The tutorial is Matlab-based, but the approach should be the same no matter the development platform.

The first part will be used to introduce the problem and present possible solutions. My proposed method is just one of many possible, but it should give you some pointers if you want to craft your own solution.


The application’s main goal is to analyze input images of a Sudoku puzzle and to output a matrix containing only the numbers present in the puzzle. Figure 1 shows the app’s real input and desired output. Input images are not easy to process; they contain noisy information that must be filtered out before attempting to recognize the numerical characters. Check out the sample input image: illumination is crappy, there are some shapes that do not contribute to the actual puzzle (e.g., those clocks depicting the Sudoku’s difficulty) and there’s a nasty perspective distortion going on there.


Figure 1. Sudoku detector input and output. Check out the outrageous perspective distortion in the input image.


The heart of the system will deal with character recognition. We will implement a basic minimum distance classifier. The classifier will work with a database (or template) of previously labeled numbers and will receive new (unlabeled) data gathered from test input images. The core concept here is the likeness metric we will employ: the distance from the labeled data to the unlabeled data will be used as an index of similarity, thus, the minimum distance will (theoretically) correspond to the maximum similarity.


Divide and Conquer

In order to get the best classification results, we need to clean the image as much as possible. There are two main sources of noise that we will focus on: symbols outside the “puzzle board” (the rectangle encompassing the puzzle numbers) and the distorted perspective. The latter is the most critical of the two problems, so let’s focus on it for a bit.

The puzzle is a 9 x 9 square grid with a known number of cells. Each cell contains either a “blank” (no number) or a numerical character. The board area is delimited by the four corners of the board. Each cell should be as straight as possible – that is, we do not want any distorted number inside a cell. If a number is distorted, the match operation could yield an incorrect answer. Figure 2 depicts the overall idea.


Figure 2. A distorted input could produce a mismatched result (not good!).

Summarizing: we will remove everything we don’t need from the image and correct its perspective. Perspective correction is somewhat difficult, but it will substantially boost the number-matching quality, provide some robustness and simplify other filtering operations.


Application Outline

  • Perspective Correction

Now that we have some idea of what we are trying to do, let’s go over the first part of the application’s processing pipeline. Have a look at Figure 3.


Figure 3. Basic processing chain for board rectification.

No matter how you implement the perspective correction operation (actually, a projective transformation), you need four input points – in the distorted plane – and four output points – in the transformed plane. The transformation will yield a re-mapping of all input pixels inside the board area. The input points will be, of course, the four corners of the puzzle board. The only restriction that we will impose on the four output points is that they must form a square (or at least a rectangle).
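The tutorial is Matlab-based (there, functions like cp2tform and imtransform do the heavy lifting), but to make the math concrete, here is a rough NumPy sketch of the four-point mapping. The function names are mine; the sketch solves the standard 8x8 linear system for the homography coefficients, with the last coefficient fixed to 1:

```python
import numpy as np

def find_homography(src, dst):
    """Solve for the 3x3 projective transform mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each point correspondence contributes two rows of the 8x8 system.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # h22 is fixed to 1

def warp_point(H, pt):
    """Re-map a single (x, y) point through the homography."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)
```

With the four detected corners as src and the corners of a straight square as dst, every pixel inside the board can be re-mapped through H (an actual image warp would also interpolate, which the toolbox handles for us).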

The tricky part of this process is corner detection. What criteria should we use to locate the four corners of the board? Worry not, as we will cover this problem in the next part of the tutorial. For now, examine the input image and try to come up with a solution. Yeah. Check that image out… see anything that could help? Mmhh… Interesting, the board rectangle seems to be the object with the largest area in the image… that should take us somewhere, right? LOL!

  • Board segmentation

After image rectification we will attempt to filter out all the shapes that are not part of the target characters. Refer to Figure 4. This task is a little easier, as we already know – or should know – the exact position of the board. A simple crop operation will do. Next, check out the information left after some thresholding and area filtering. We are left with a nice black and white image that contains just the information that we will feed into the classifier. And the numbers seem to be undistorted and pretty clear!
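To make the segmentation step concrete (in Matlab you would reach for im2bw/graythresh and bwareaopen), here is a small Python sketch with made-up thresholds; the area filter keeps only connected blobs big enough to be characters:

```python
import numpy as np

def threshold(gray, t):
    """Binarize a grayscale image: dark strokes (characters) become foreground."""
    return (gray < t).astype(np.uint8)

def area_filter(binary, min_area):
    """Keep only connected components with at least min_area pixels (4-connectivity)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not seen[i, j]:
                seen[i, j] = True
                stack, comp = [(i, j)], [(i, j)]
                while stack:  # flood-fill one blob
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                            comp.append((ny, nx))
                if len(comp) >= min_area:  # keep blobs big enough to be digits
                    for y, x in comp:
                        out[y, x] = 1
    return out
```

The min_area value is something you would tune by eye against your input images.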


Figure 4. Board cropping and segmentation.

  • Number Classification

The next step is to design the classifier itself. The theory we will implement is known in artificial intelligence as supervised learning classification. Here, the classifier starts with known metrics attributed to each class. Our example has nine known classes (numbers 1 to 9); we need to compute some metrics (attributes) that optimally describe (label) each individual class. This is how we build our attribute database of known classes. Now, imagine the classifier receives new, unlabeled attributes. As suggested earlier, a possible way to classify new information is to compute the distance between the unlabeled data and the already labeled data, and search for the minimum distance that will place the new information in a known cluster. The complete classifying procedure is depicted in Figure 5.


Figure 5. Supervised learning classification.

This step is not easy. Typically, more than one attribute is needed to describe each class, thus, attribute vectors are used. These vectors are multivariate, and so a multivariate distance metric is needed. As we work in a multidimensional attribute space, there’s also more than one definition of distance that we can apply. Euclidean distance is the easiest of the bunch; some others also weigh in statistical information such as the dispersion of the data inside each class cluster. Here we begin to delve into some nice statistical theory. I’ll limit things to the multivariate Euclidean distance, computed against the mean of each class cluster. The template that will be used in this tutorial is shown in Figure 6.


Figure 6. Classifier train template.


Here’s a summary of the operations needed in the number classification stage:

  1. Choose the best attributes that optimally segment each class.
  2. Compute attributes and cluster mean for each known class using a labeled template (Classifier Training).
  3. Receive new data.
  4. Compute attributes and cluster mean for the new data.
  5. Compute distance between each labeled cluster and the unlabeled cluster.
  6. Assign closest class to the unlabeled cluster (Classifier Testing).
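The steps above can be mocked up quickly. This is a minimal Python sketch of a nearest-mean classifier (the attribute vectors here are placeholders; choosing attributes that actually separate the digit classes is the hard part):

```python
import numpy as np

def train(templates):
    """templates maps each class label (1..9) to a list of attribute vectors
    measured on the labeled template image. Returns one cluster mean per class."""
    return {label: np.mean(vectors, axis=0) for label, vectors in templates.items()}

def classify(means, sample):
    """Assign the class whose cluster mean lies closest (Euclidean) to the sample."""
    return min(means, key=lambda label: np.linalg.norm(means[label] - np.asarray(sample)))
```

Training amounts to averaging the attribute vectors of each class; testing is one distance computation per class followed by an argmin.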


  • Final Numerical Matrix

The classification phase is carried out for each number in the input image. Of course, the attributes estimated from the template image need only be computed once. After processing all the numbers, it is fairly straightforward to assemble a numerical matrix containing all output classes. We can estimate the width and height of the puzzle board; we know there’s a total of 9 x 9 cells. Additionally, we have access to the position of each number on the grid. That’s enough information to index a 9 x 9 matrix with all the detected numbers.
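A possible sketch of this indexing step, assuming we already have the rectified board size and one centroid per recognized digit (all names here are mine):

```python
import numpy as np

def build_grid(board_w, board_h, detections):
    """detections is a list of (cx, cy, digit) tuples: one centroid per recognized
    number, in rectified-board coordinates. Blank cells stay 0."""
    grid = np.zeros((9, 9), dtype=int)
    cell_w, cell_h = board_w / 9.0, board_h / 9.0
    for cx, cy, digit in detections:
        # Integer division of the centroid by the cell size yields the cell index.
        row, col = int(cy // cell_h), int(cx // cell_w)
        grid[row, col] = digit
    return grid
```

This works precisely because the board was rectified first: on the distorted image, cell boundaries would not fall on a regular grid.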

Well, for now we (hopefully!) have at least an outline of our attack plan. The tutorial will cover each of the previous points in detail, along with some example Matlab code. We will discuss the first part, board detection, in the next post.

Control Signals For A Successful Data Exchange Between Two Different Hardware Components

For the project I’m working on, I had to interface at least one custom co-processor with a hard CPU. The problem is straightforward: two components with different clocks must be able to share information back and forth. In the absence of a common reference, control signals must be implemented to synchronize two (or more) components that work at different frequencies.

In my particular setup, the co-processor executed some data operations and dumped the results to a data bus. The CPU then performed periodic reads to the data bus in search of new results. The problem here is that the CPU wasn’t really sure if the data that was available on the bus was truly new. The CPU could read the results before the information was actually updated. Even though the co-processor was really fast, the CPU could read much faster. The situation looked like this:


Fig 1. Functional waveform of the periodic output of a custom co-processor.

As you can see in the image above, the co-processor produces periodic results through the ‘data’ bus. ‘newData’ is an internal signal and is set every time the data bus is refreshed. ‘clk01’ is the co-processor’s clock reference. Note that there’s no way for the CPU to tell if the information in the data bus has been previously read.

How can we address this issue? It took some hours to come up with a satisfactory solution. Actually, the fix came in two parts. First, it is really helpful to implement ‘write request’ and ‘read request’ type signals inside the custom co-processor. The goal of these signals is to identify the moment when an external component is performing data writes or data reads. Knowing this enables us to take further action.

Now, every time the CPU performed a read, it first had to issue a read request to the co-processor. The second part is to use this information to “tell” the CPU whether the data it is reading right now is really new. In my case, the CPU always read a 32-bit word; of those 32 bits, I was only using 24 for data results. I decided to use one of the spare bits (the most significant) to flag previously read data.

How does it work? The co-processor sets the flag bit every time fresh results are written to the data bus, and clears it shortly after the first external read is performed. Whenever the CPU sampled a word, it could now clearly tell whether it held new or old data just by looking at that bit. The results looked like this:


Fig 2. Functional waveform of the periodic output of a custom co-processor after fix.

Take some time to check out the previous image. Here, I added a new read request signal (readRq). Note that this signal is asynchronous. If readRq is set, then the flag bit of the word on the data bus will be cleared for the next clock cycle.

In VHDL, the fix looks like this (assuming you already have a read-request signal declared in your architecture; lastState is a process variable, the rest are signals). The following snippet goes inside a synchronous process:

if (lastState = '0' and newData = '1') then -- rising edge of newData
  dataReg(23 downto 0) <= dataResults;      -- co-processor results (24 bits)
  dataReg(30 downto 24) <= (others => '0'); -- unused bits
  dataReg(31) <= '1';                       -- new-data flag
end if;
lastState := newData;
-- delay the read ACK: you want the flag cleared AFTER the read has been performed
if (readRq = '1') then
  tempReg <= '1';
end if;
if (tempReg = '1') then
  dataReg(31) <= '0'; -- data counts as old one cycle after the first read
  tempReg <= '0';
end if;
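If VHDL is not your thing, here is a cycle-by-cycle Python model of the same flag logic (my own toy model, not production code): the flag goes high one cycle after newData rises, and the clear is deliberately delayed past the read request so the read itself still sees fresh data.

```python
def simulate(cycles, new_data_at, read_at):
    """Toy register-level model of the new-data flag. new_data_at / read_at are
    sets of cycle indices where newData / readRq are high."""
    flag = temp = last_new = 0
    history = []
    for t in range(cycles):
        new_data = int(t in new_data_at)
        read_rq = int(t in read_at)
        history.append(flag)  # flag value the CPU would sample this cycle
        nxt_flag, nxt_temp = flag, temp
        if last_new == 0 and new_data == 1:  # rising edge of newData
            nxt_flag = 1                     # fresh results hit the bus
        if read_rq:
            nxt_temp = 1                     # remember that a read happened
        if temp:
            nxt_flag, nxt_temp = 0, 0        # mark the data as old afterwards
        flag, temp, last_new = nxt_flag, nxt_temp, new_data
    return history
```

Running it with a data refresh at cycle 1 and a read at cycle 3 shows the flag rising at cycle 2 and dropping again after the read, mirroring the waveform in Fig 2.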

Hope you find this neat tip useful!


A 3 DOF Robotic Arm Simulation

Here’s something I have been working on for a larger undisclosed project. It is a 3 Degrees-of-Freedom robotic manipulator simulation; it uses inverse kinematics to estimate the angular positions of all three links given a point inside the robot’s working area. It only requires the length of all three links and their starting angular positions.
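For the curious, the closed-form math behind this kind of simulation can be sketched as follows. This is a standard planar 3R solution (not necessarily the exact one my simulation uses): fix the end-effector orientation, step back to the wrist, and solve the remaining 2-link problem.

```python
import math

def ik_3link(x, y, phi, l1, l2, l3):
    """Closed-form IK for a planar 3R arm: reach point (x, y) with the
    end-effector oriented at angle phi (elbow-down branch)."""
    # Step back from the target along the effector orientation to find the wrist.
    xw, yw = x - l3 * math.cos(phi), y - l3 * math.sin(phi)
    # Classic 2-link solution for the first two joints.
    c2 = (xw**2 + yw**2 - l1**2 - l2**2) / (2 * l1 * l2)
    t2 = math.acos(max(-1.0, min(1.0, c2)))
    t1 = math.atan2(yw, xw) - math.atan2(l2 * math.sin(t2), l1 + l2 * math.cos(t2))
    t3 = phi - t1 - t2  # the absolute link orientations must sum to phi
    return t1, t2, t3

def fk_3link(t1, t2, t3, l1, l2, l3):
    """Forward kinematics, handy for checking the IK answer."""
    a1, a2, a3 = t1, t1 + t2, t1 + t2 + t3
    x = l1 * math.cos(a1) + l2 * math.cos(a2) + l3 * math.cos(a3)
    y = l1 * math.sin(a1) + l2 * math.sin(a2) + l3 * math.sin(a3)
    return x, y
```

Since a 3-DOF planar arm is redundant for a 2D point, the extra degree of freedom is resolved here by also prescribing the effector angle phi.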

Here’s a gif of the simulation. The effector is rendered in green, the work area is represented as everything inside the gray circle.


Building a custom 8-bit CPU Part 1: The basics

I Introduction

So, as promised, this new series of posts will be meatier than the stuff I previously wrote. Brace yourself, we’re in for a bumpy ride.

Today we will discuss the design and development of a custom, basic-to-intermediate Central Processing Unit. Our little CPU will operate in 4 stages (fetch, decode, execute and write back) and will have a fairly basic architecture. The core of it all will be an Arithmetic Logic Unit (ALU) capable of performing additions, subtractions, multiplications and divisions of 8-bit integers, along with the usual logical ops (comparisons, mainly).

For the moment, we will avoid branching, so instructions of the “jump and call” type will not be supported. That is, our CPU won’t decode IF-ELSE and loop statements. (Sorry guys, but we want to keep things simple for the time being.)

All right, the goal of this exercise is to develop a custom CPU for implementation in your favorite reconfigurable hardware device (or if you just want to keep it to simulation, that’s ok, but our design should be fully synthesizable).

Hopefully, at the end of this project we’ll be able to provide a program for our CPU to execute. Of course, the program will be written in our basic, custom, machine code. For extra fun, we could write an “assembly” language parser, how nice is that?

II Devising and setting up our CPU architecture

Our CPU will handle 8-bit wide operands. We will set the instruction width to 2 bytes: the upper byte will encode the actual instruction and the lower byte will represent the operand. The following operations will be available (presented in assembly format):

  • MOV a, b – Move op. “a” is a destination register and “b” is a numerical constant.
  • ADD a, b – Arithmetic addition. “a” is any register and “b” can be either a register or a numerical constant.
  • SUB a, b – Arithmetic subtraction. “a” is any register and “b” must be a numerical constant.
  • MUL a, b – Arithmetic multiplication. “a” is any register and “b” must be a numerical constant.
  • DIV a, b – Arithmetic division. “a” is any register and “b” must be a numerical constant.

These are the basic ops supported by our CPU; however, more ops could be added fairly easily in the future.

Let’s code up our instructions. We will use the first 6 bits for op encoding. Suppose we have 3 registers for internal computations; then the last 2 bits (from 00 to 10) will encode the destination register supported by some of our instructions.

For instance, let’s encode the move operation. Say we need to move the number 4 to register two. We will use the opcode 000001. We also need to include where we are moving the operand – register two is 01 (register 1 = 00, register 2 = 01, register 3 = 10) – and, finally, the operand we’re moving is the number 4: 00000100.

The complete machine-code op is: 0000010100000100 — MOV R2, 00000100b

The supported ops will be encoded as follows:

  • 000000 NOP (reserved)
  • 000001 move op
  • 000010 register and register addition
  • 000011 register and integer addition
  • 000100 register and integer subtraction
  • 000101 register and integer multiplication
  • 000110 register and integer division

Let’s also consider that the ALU will always write back to register 1 (00). Some op-encoding examples (actually, this is our very first program. The CPU should be able to run it without any problems):

     0   :   0000010100000100; -- MOV R2, 00000100b
     1   :   0000010000000111; -- MOV R1, 00000111b
     2   :   0000100000000001; -- ADD R1, R2
     3   :   0000011000000101; -- MOV R3, 00000101b
     4   :   0001001000000111; -- SUB R3, 00000111b
     5   :   0001010100000100; -- MUL R2, 00000100b
     6   :   0001100000000011; -- DIV R1, 00000011b
     7   :   0000000000000000; -- NOP

Nice stuff, eh?
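Before touching any VHDL, it is handy to sanity-check the encoding with a quick software model. This Python sketch is my own toy interpreter, not the CPU itself; in particular, the convention that the register-register ADD carries the source register index in its operand byte is inferred from the listing above, and 8-bit overflow is ignored.

```python
def assemble(opcode, reg, operand):
    """Pack one 16-bit word: 6-bit opcode, 2-bit destination register, 8-bit operand."""
    return (opcode << 10) | (reg << 8) | (operand & 0xFF)

def run(program):
    """Execute a list of 16-bit words; ALU results always land in R1 (index 0)."""
    regs = [0, 0, 0]
    for word in program:
        op, reg, k = word >> 10, (word >> 8) & 0b11, word & 0xFF
        if op == 0b000001:   regs[reg] = k                  # MOV reg, k
        elif op == 0b000010: regs[0] = regs[reg] + regs[k]  # ADD reg, reg (k = reg index)
        elif op == 0b000011: regs[0] = regs[reg] + k        # ADD reg, k
        elif op == 0b000100: regs[0] = regs[reg] - k        # SUB reg, k
        elif op == 0b000101: regs[0] = regs[reg] * k        # MUL reg, k
        elif op == 0b000110: regs[0] = regs[reg] // k       # DIV reg, k
        # op 0b000000 is NOP
    return regs
```

Feeding it the sample program above and inspecting the final register bank is a cheap way to build the reference results we will later compare the hardware against.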

We will build our CPU under a pseudo Von Neumann architecture. We will store instructions and data on the same memory storage device. (For the moment, we will store instructions on a read-only memory and we will store results on CPU registers.) We will break up the CPU’s main functionality into individual blocks: ALU, Control Unit (Instruction Fetch and Decode), Registers and Memory.

This is the basic CPU block diagram:

Figure 1. Custom CPU Architecture.

Let’s go over what we have here. A ROM block will store the program. The fetch unit will poll the ROM block periodically (we will return to this later), extract the corresponding instruction and send it to the decoder for further processing. Inside the decoder, each instruction will be divided into its atoms, parsed, and executed accordingly. The decoder unit controls the register bank (for data storage), the ALU (for data computation) and a pair of multiplexers (for feeding operands to the ALU).

III CPU pseudo state machine progression

CPUs (scalar architectures, at least) are sequential by nature. Superscalar architectures kinda cheat on this, but the very basic CPU always executes only one instruction per instruction cycle. The following is a list of the CPU operation cycle:

1 – Fetch instruction from ROM, then halt fetching (we don’t want instruction overlapping!).
2 – Send instruction to decode unit.
3 – Decode instruction.
4 – Send operands to ALU/Bank register.
5 – Write back instruction result (in any case, a number must be written to a register).
6 – Instruction execution done, go to 1.

You gotta watch out for instruction overlapping; that is, we must ensure that every instruction cycle starts with an instruction fetch and ends with a result write back, otherwise instructions could overlap and some write backs could be skipped.

That is all for now. So far we have an idea of what we’re trying to accomplish. In the following part of this series we will deal with the ALU, the core of the CPU. We will try to implement actual arithmetic circuitry for additions, multiplications and divisions. Remember that subtractions can be performed by the same addition circuitry!

More to follow soon.

Tool’s hell part I: setting up ModelSim with Altera’s libraries under Win 7. Quartus II ver 9.x and up.

I frequently find myself deep in tool’s hell trying to set up a program for my engineering needs. It’s a common and necessary step in the design cycle, and that’s reasonable. Unfortunately, the amount of time I invest in such trivialities is, more often than not, just ridiculous.

If you are into hardware design, you’ll have a good grasp of the kind of tools we, hardware designers, must deal with on a daily basis. I’ve a strong opinion about these tools. I’m aware that they get the work done, but c’mon, most of them look ugly, have an awkward graphical interface and love to install the “hard way”, let’s address that right away and leave the opinions for later.


ModelSim, nice, advanced verification, eh?

This first post will be a mini guide on how to (quickly!) set up everyone’s favorite HDL simulator, ModelSim. If you’re having trouble with Mentor Graphics’ finest, follow this guide and you should be investing countless hours in waveform debugging in no time.

First Circle: The basics

This is the scenario I faced while trying to install this thing: I’m working with Altera Quartus II 9.1 Web Edition. I tried installing Quartus II versions 9 through 11, and version 9.1 was the only one I could get working under Windows 7 (64-bit).


ModelSim console (FATAL) error. With bonus ambiguous signal value warning.

Now, you may be aware that Altera has its own ModelSim “edition”, called ModelSim-Altera. The main advantage of this edition is that it has all of Altera’s libraries built in right out of the box, so there’s no need to mess with that. But if you’re out of luck (like myself), installing a ModelSim-Altera version compatible with Quartus II ver 9.1 will surely fail for whatever reason.

The only solution left is to use a stand-alone version of ModelSim (whichever version you hate the least) and manually compile Altera’s libraries and link them to ModelSim’s configuration files.

This is what we’re going to do:

  1. Install ModelSim.
  2. Find out which Altera libraries we need.
  3. Modify a ModelSim script to compile the libraries that we need
  4. Link the libraries to ModelSim’s configuration files.

Second Circle: Getting our hands dirty

Let’s get this show started. After successfully installing ModelSim, browse to the directory path_to_quartus/eda/sim_lib and take note of its contents. You should identify the FPGA/device of your interest and note the exact filename, because this is the library we will include in our automated script later (the extension will be either .vhd for VHDL or .v for Verilog). You can also compile for all the devices available. I’m working with VHDL, so I identify all the files with the .vhd extension; the procedure is the same for Verilog files.

Next, download this handy script and open it with your favorite text editor (please, do not use Notepad). We will run this script from ModelSim’s console to compile all our libraries. The script must include the exact same paths of the files you identified in the previous step, so be sure to replace path_to_quartus with your actual Quartus installation path.

Now, as an example, in the /eda/sim_lib directory, I’ve identified the cyclone family as a potential must-include FPGA library on ModelSim, so I’ve added the following lines to the script:

vlib cyclone
vmap cyclone cyclone
vcom -work cyclone -2002 -explicit $path_to_quartus/eda/sim_lib/cyclone_atoms.vhd
vcom -work cyclone -2002 -explicit $path_to_quartus/eda/sim_lib/cyclone_components.vhd

You can manually type or uncomment the line of your interest inside the script. Remember that # is the comment identifier on ModelSim scripts. Be sure to include stuff that actually exists in the /eda/sim_lib  directory, otherwise your compilation will fail, and that will probably make you sad for a while, but worry not, you can always try again!

Once ready, head to ModelSim’s installation directory (in my case rootDrive/modeltech_6.5b). Inside, create the subdirectories /altera/vhdl; this is where we’ll place our nifty little script.

Fire up ModelSim and change directory to where the script is located. Once there, simply type do vhdl-library-setup.tcl in the console and the script should start executing. It will take a while, depending on how many libraries you’ve included inside the script.

After ModelSim is done, close it and browse to your rootDrive/modeltech_x.x/ directory. We’re looking for ModelSim’s configuration file, modelsim.ini, which we will modify. It might be set to read-only, so be sure to clear that file attribute beforehand.

It may be wise to make a backup copy of modelsim.ini, just in case things get hairy after our modifications. OK, now take a look inside the ini file and identify the [Library] configuration block; we will add some lines after sv_std = $MODEL_TECH/../sv_std and before the [vcom] block.

Now we will link each library that was created in the /altera/vhdl path with a variable. This is easy; for the script provided, I’ve included the following lines:

lpm = $MODEL_TECH/../altera/vhdl/lpm

altera_mf = $MODEL_TECH/../altera/vhdl/altera_mf
altera = $MODEL_TECH/../altera/vhdl/altera
sgate = $MODEL_TECH/../altera/vhdl/sgate
altgxb = $MODEL_TECH/../altera/vhdl/altgxb
cyclone = $MODEL_TECH/../altera/vhdl/cyclone
cycloneii = $MODEL_TECH/../altera/vhdl/cycloneii
cycloneiii = $MODEL_TECH/../altera/vhdl/cycloneiii
max = $MODEL_TECH/../altera/vhdl/max
maxii = $MODEL_TECH/../altera/vhdl/maxii
stratix = $MODEL_TECH/../altera/vhdl/stratix
stratixii = $MODEL_TECH/../altera/vhdl/stratixii
stratixiii = $MODEL_TECH/../altera/vhdl/stratixiii
stratixgx_gxb = $MODEL_TECH/../altera/vhdl/stratixgx_gxb
stratixiigx = $MODEL_TECH/../altera/vhdl/stratixiigx
stratixiigx_hssi = $MODEL_TECH/../altera/vhdl/stratixiigx_hssi
arriagx = $MODEL_TECH/../altera/vhdl/arriagx
arriagx_gxb = $MODEL_TECH/../altera/vhdl/arriagx_gxb

We’re simply mapping each library name to the path where it was compiled. Again, make sure you’re adding actual directories you created in the rootDrive/modeltech_x.x/altera/vhdl path, and make sure not to repeat variable names.

Third Circle: Wrapping it up

Once done, save the file and start ModelSim; you should see our newly compiled libraries right in ModelSim’s main library tab. Success! You should be able to painfully debug Altera components now.


ModelSim simulating a 10 hour-long run. Note the ugly traditional wave editor.

This is all for now. Tool configuration is a pain in the arse, but it is something that must be done; hopefully this guide will help you get it over with as quickly as possible. I promise a more “action-oriented” post next time.

Blog updates


During the last couple of days, I’ve been thinking of using this blog as a kind of “idea bin” where I’d dump any thoughts about the projects I’m currently working on. The main reason is that I need a record/archive of my own progress that I can query and read at any time, anywhere.

So, for the following months, I’ll describe some ideas heavily involved with the following themes:

Computer Science and Digital Electronics

Those are the two main areas I’m constantly moving in. Of course, all this is related to one main field, which basically involves the design of a computer system, but I’d like to point out that the areas I deal with range from hard implementation (actual circuitry) to soft abstraction (programming).

This is all for now. During the following weeks I’ll be updating this blog with random thoughts, inner monologues and reflections about the stuff I’m currently developing. I hope this proves useful for anyone with similar interests, or just with a little bit of curiosity about how some things (may) work.

Rockband Stage Kit Controller for PC

“Stage Kit Controller” (SKC) is an application written in C#, free for you to use and distribute under the GNU General Public License. Its primary objective is to let you control your Rock Band Stage Kit peripheral from a personal computer via a USB connection.


Main features:

  1. Generates random patterns.
  2. Supports user-written scripts.
  3. Winamp synchronization.

The SKC is based on the Rock Band Stage Kit API originally developed by brantestes. To run it properly you will need the xinput1_3.dll dynamic library. If your computer does not have this library, you will need to install the DirectX redistributable, SDK, or web installer.

Quick Start:

  1. Unzip the file contents.
  2. xinput1_3.dll needs to be in the same directory as the main executable file.
  3. Run SKC(BETA).Exe

Please note: this is a beta release, so a couple of bugs are expected, but the main functionality of the application has been tested to work. If you have any feedback, feel free to post it down here.

Tested under Windows XP 32 bit and Windows 7 64 bit.

Download links: (SKC-BETA.rar – English version)

Site 1

Update: Spanish version available.

A Spanish-language version of the program is available.

“Stage Kit Controller” (SKC) is an application written in C#, free to use and distribute under the GNU General Public License. The application’s main objective is to control the Rock Band Stage Kit from a PC via a USB connection.

Main features:

  • Random pattern generation.
  • User-made script files.
  • Winamp synchronization.

Download ( – Spanish version):

