Fast DES implementation
I have implemented some bitslice DES software, similar to that
described by Eli Biham
at FSE'97. Essentially this involves breaking DES into its component logic
gates, and executing these gates as machine instructions. On 64-bit
machines, DES is executed 64 times in parallel, and according to Dr Biham,
this is 3 times faster than the previous best implementation. Even on 32-bit
machines it is still the fastest technique, although only on RISC machines
with lots of registers.
I have made a number of improvements, most of which involve specializing
the routine to perform exhaustive key search (where the plaintext and
ciphertext are known).
- Collapsed gates whose inputs are known (i.e. gates taking their
input directly from the plaintext).
- Don't actually calculate the ciphertext, but rather compare output
bits with their expected output as they are calculated. The earliest
ciphertext bits are known from the S-box outputs in round 15, and
thus most keys can be rejected after only 90% of the evaluation.
- On 64-bit machines, 64 keys are evaluated simultaneously. This
corresponds to all combinations of 6 key bits. If the 6 key bits
are all XORed with the input to the same S-box in the first round,
then that S-box will evaluate to a constant, removing another 100
gates or so.
In addition, the average number of gates required to implement an S-box
has been reduced from 100 to 88.
| S-box || S1 || S2 || S3 || S4 || S5 || S6 || S7 || S8
| Gates || 95 || 84 || 89 || 77 || 96 || 87 || 86 || 88
As a result, it runs about 1.5 times faster than Eli's version. At least,
it would if you could get your hands on a machine with over 150 registers.
Because this doesn't seem to happen in real life, and because compilers have
a lot of trouble doing efficient register allocation for 12,500 line
functions, I have written another version where each S-box is a separate
function. Although this means gates aren't collapsed, and ciphertext
output bits can only be checked four at a time, at least it compiles
efficiently, and the executable is much smaller too.
The source code is available for use in the RSA DES challenge.
Frankly, it isn't much use for anything else. However, if you do use this
source to compete, and you win, please give credit where it is due.
And, if you feel so inclined, a share of the $10,000 prizemoney
would be much appreciated.
If you do find this software useful, and decide to use it to compete
in the RSA challenge, please let me know so that I can (a) keep you
informed of any improvements, and (b) gloat to my friends.
The following distributions contain the source for the bitslice
DES function, rudimentary functions to carry out an exhaustive search,
plus a README file with a fairly comprehensive description of the code.
The separate S-box version is pretty much guaranteed to compile on all
machines, and run pretty fast. The other versions will, in theory, run
even faster, but only if you have a machine with lots and lots and lots
- Separate S-box version
- 64-bit version
- 32-bit version
Document written by Matthew Kwan, 25 April 1997
Please send any comments or corrections to