Mihai's page

Computing the binary expression for isDigit

In the previous article we built a program that would use genetic algorithms to determine the best expression to tell if a character is a digit using only bitwise operations. We got so close, only 2 inputs were providing invalid results. However, this is not something that can be deployed, so today we will look at how to do this exactly using bits of logic.

Matching a single binary pattern 🔗

To begin with, let’s first write a simple function that would return true if the input is equal to a specific value, and false otherwise:

bool Matches(char target, char value) {
    return target == value;
}

Well, this is trivial, but we also cheated. Let’s replace equality with bitwise operations. We need some operation that given two bits \(b_1\) and \(b_2\), returns \(1\) if the bits are equal and \(0\) otherwise. This is XNOR, the negation of the exclusive or.

The XNOR logic table
\(b_1\) \(b_2\) \(x = XNOR(b_1, b_2)\)
0 0 1
0 1 0
1 0 0
1 1 1

For our function, we can combine the xor operator ^ with the negation operator ! to obtain:

bool MatchesBits(char target, char value) {
    return !(target ^ value);
}

Note that we have to use ! instead of ~ because we want to return false as soon as one bit does not match, but using ~ will still return non-zero (that is, true-ish) values.

Minimizing and matching all 10 digits 🔗

The naive way to match all 10 digits is to call MatchesBits for all 10 values, returning true if at least one of them returns true (that is, using the || operator, to also benefit from short-circuiting properties).

But this is going to be a very large expression, so let’s try to first minimize it. For problems with small number of values involved, we could use the Karnaugh map to identify adjacent blocks with the same value. However, here we have 8 bits in the input, so this method is not too practical.

Instead, we can use the Quine-McCluskey algorithm: we list all possible values together with the desired output and collapse states that differ in only a single input bit but have the same output. The differing bit will has a special “don’t care” value. In our case, we don’t even need to go over all 256 inputs, since the digits are clustered together in the ASCII table. So, we can start with a few collapsed states:

Initial input to Quine-McCluskey algorithm
x isDigit(x)
0000---- 0
0001---- 0
0010---- 0
00110000 1
00110001 1
00110010 1
00110011 1
00110100 1
00110101 1
00110110 1
00110111 1
00111000 1
00111001 1
00111010 0
00111011 0
00111100 0
00111101 0
00111110 0
00111111 0
01------ 0
1------- 0

If we group the patterns together, according to the algorithm, we reach the following result:

Final result of the Quine-McCluskey algorithm
x isDigit(x)
0000---- 0
0001---- 0
0010---- 0
00110--- 1
0011100- 1
0011101- 0
001111-- 0
01------ 0
1------- 0

We have two rows that produce the true output, we just need to combine them using ||, just like we were discussing about combining MatchesBits. But first, we still have one question to solve: how do we handle the “don’t care” bits?

We have (at least) two options, with equal results in terms of quality but different resulting expressions: we can set them all to 0 using the bitwise & operator or we can set them all to 1 using the | operator. Then, we just require this result to match a specific mask.

Let’s do the case where we use |, the other one is left as an exercise to the reader. For the first pattern, 00110---, we have to set the last 3 bits to 1, so we can do x | 0x07. The corresponding mask is thus 00110111, or 0x37. Similarly, the other branch means doing a bitwise or with 0x01 and using 0x39 as the mask. Hence:

bool isDigit(char x) {
    return !((x | 0x07) ^ 0x37) || !((x | 0x01) ^ 0x39);
}

This is the expression we represented in the first tree in the last article, the expression we wanted the genetic algorithm to discover (or an equivalent form of it). In a future article, we will look into why this was not the case in our previous exploration.


Comments:

There are 0 comments (add more):