When implementing the DES S-boxes it was very important to minimize the number of logic gates in the design. The scenario is slightly different from traditional hardware design due to the type of gates available. In hardware, tri- and quad-input gates are relatively cheap, inversion is sometimes free, and XOR gates are fairly expensive. However, in software, the only gates available are NOT and dual-input AND, OR, and XOR, and they all execute at the same speed.

Basically, the technique I used was to take two of the S-box input bits, expand them to all 16 possible functions of two variables, and use the remaining four S-box inputs to select from those 16 functions. However, the details are slightly more complicated.

Think of an S-box output bit as being a function of 6 boolean variables. This can be implemented by using one of the variables to select between two functions of the remaining 5 variables. To generate a function of 5 variables, use one of those variables to select between two functions of 4 variables. And so on, down to functions of 2 variables. There are only 16 functions of 2 variables, and they can all be generated with total of 12 gates (assuming they are all needed).

The selection between two functions **Fa** and **Fb**, if you use
standard multiplexing, requires 3 gates (not counting the inversion of
the selection bit, which only has to be done once per S-box)

However, using the XOR operator, you can carry out a modified form of selection with two gates

where **Fc** = **Fa** XOR **Fb**

Without optimization, the number of gates required to implement all 4 output bits of an S-box is 132 ; 12 to generate all functions of two inputs, plus 4 x 2 x (8 + 4 + 2 + 1) gates for the selections. However, with optimization this can be significantly reduced.

The first form of optimization occurs when a function of *N* variables
is a constant 1 or a constant 0. For example, if Fc has a value of 1
regardless of its inputs, the selection function can be reduced to

Similarly, other reductions are possible when **Fc** = 0,
**Fb** = 0, or **Fb** = 1.

The second form of optimization involves the elimination of common
subexpressions. In other words, before a function of *N* variables
is built you first check if it has already been generated.
If it exists, you re-use it, and no extra gates are required.

To obtain the minimum number of gates, it was not obvious which input variables should be used as selection bits at the various stages, so all 6! (720) orderings were used to generate optimized designs, and the one with fewest gates was chosen. As shown below, the resulting S-boxes required an average of just under 88 gates

S-box | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 |
---|---|---|---|---|---|---|---|---|

Gates | 95 | 84 | 89 | 77 | 96 | 87 | 86 | 88 |