Bitslice DES

Fast DES implementation

I have implemented some bitslice DES software, similar to that described by Eli Biham at FSE'97. Essentially this involves breaking DES into its component logic gates, and executing these gates as machine instructions. On 64-bit machines, DES is executed 64 times in parallel, and according to Dr Biham, this is 3 times faster than the previous best implementation. Even on 32-bit machines it is still the fastest technique, although only on RISC machines with lots of registers.

I have made a number of improvements, most of which involve specializing the routine to perform exhaustive key search (where the plaintext and ciphertext are known).

In addition, the average number of gates required to implement an S-box has been reduced from 100 to 88.

S-box S1 S2 S3 S4 S5 S6 S7 S8
Gates 95 84 89 77 96 87 86 88

As a result, it runs about 1.5 times faster than Eli's version. At least, it would if you could get your hands on a machine with over 150 registers. Because this doesn't seem to happen in real life, and because compilers have a lot of trouble doing efficient register allocation for 12,500 line functions, I have written another version where each S-box is a separate function. Although this means gates aren't collapsed, and ciphertext output bits can only be checked four at a time, at least it compiles efficiently, and the executable is much smaller too.


The source code is available for use in the RSA DES challenge. Frankly, it isn't much use for anything else. However, if you do use this source to compete, and you win, please give credit where it is due. And, if you feel so inclined, a share of the $10,000 prizemoney would be much appreciated.

If you do find this software useful, and decide to use it to compete in the RSA challenge, please let me know so that I can (a) keep you informed of any improvements, and (b) gloat to my friends.

The following distributions contain the source for the bitslice DES function, rudimentary functions to carry out an exhaustive search, plus a README file with a fairly comprehensive description of the code.

The separate S-box version is pretty much guaranteed to compile on all machines, and run pretty fast. The other versions will, in theory, run even faster, but only if you have a machine with lots and lots and lots of registers.

Source files

Document written by Matthew Kwan, 25 April 1997
Please send any comments or corrections to