This document gives a description of the encoding scheme used by gifshuffle.
Steganography is the science of concealing messages in a covering medium. Some historical techniques have involved invisible ink, subtle indentations in paper, and even tattooing messages under the hair of messengers. In this digital age, steganography provides means for hiding messages in digital audio files, in some kinds of images, and even for generating pseudo-English text which encodes the message.
Ideally, the original message is not noticeably degraded by the presence of a hidden message. As a result, the most effective techniques tend to make use of data that contains a lot of redundancy, such as raw audio and image files. Steganography works much less effectively, if at all, with efficient compressed formats such as JPEG and MPEG.
Unfortunately, sending large amounts of raw audio and image data can arouse suspicion, and the pseudo-English encoding schemes are not sophisticated enough to fool a human observer.
The encoding scheme used by gifshuffle relies on the fact that the ordering of the colours in the colourmap of a GIF does not affect the way an image is displayed. Two images with a different colourmap ordering are visibly identical.
The gifshuffle program runs in two modes - message concealment, and message extraction. During concealment, the following steps are taken.
Extraction reverses the process.
Each of the steps are described in detail below.
The compression scheme used by gifshuffle is a fairly rudimentary Huffman encoding scheme, where the tables are optimised for English text. This was chosen because the colourmap encoding scheme provides very limited storage space in some situations, and a compression algorithm with low overhead was needed. In other words, short messages had to compress to even shorter data. Depending on the text, you can usually get 25 - 40% compression.
If you want to compress a long message, or one not containing standard text, you would be better off compressing the message externally with a specialized compression program, and bypassing gifshuffle's optional compression step. This usually results in a better compression ratio.
The encryption algorithm built in to gifshuffle is ICE, a 64-bit block cipher also designed by the author of gifshuffle. It runs in 1-bit cipher-feedback (CFB) mode, which although inefficient (requiring a full 64-bit encryption for each bit of output), provides the best possible security when different messages are encrypted with the same password. Although using the same password many times is theoretically a big no-no, in the real world it often can't be avoided.
The lower 7 bits of each character in the password are packed into an array, which is used to set the encryption key. The ICE encryption algorithm can operate at different levels, with higher levels using longer keys and providing more security. The ICE level appropriate for the password length is used.
CFB mode makes use of an initialization vector (IV), which is initially set to the first 64 bits of the key encrypted by itself. Each time a bit is encrypted, the IV is encrypted, and the leftmost bit of the encrypted IV is XORed with the bit. The IV is then shifted left one bit, and the ciphertext bit is added to the right. Decryption reverses this process.
Consider a pack of 52 cards. There are 52 factorial ways to sort the pack, which means that any particular ordering of the cards can represent a number in the range [0, 52!-1]. In other words, given n cards, you can store approximately log2(n!) bits of information based on their ordering.
GIF images contain a colourmap with up to 256 entries, resulting in a maximum storage capacity of 1683 bits. The image itself consists of a compressed array of indices into this colourmap. To conceal a message within a GIF image the following steps take place.
Extracting a hidden message follows a similar procedure, but in reverse. The ordering of the colourmap is used to construct a binary number, which is then optionally decrypted and uncompressed before being output.