How gifshuffle works

This document gives a description of the encoding scheme used by gifshuffle.

The Nature of Steganography

Steganography is the science of concealing messages in a covering medium. Some historical techniques have involved invisible ink, subtle indentations in paper, and even tattooing messages under the hair of messengers. In this digital age, steganography provides means for hiding messages in digital audio files, in some kinds of images, and even for generating pseudo-English text which encodes the message.

Ideally, the original message is not noticeably degraded by the presence of a hidden message. As a result, the most effective techniques tend to make use of data that contains a lot of redundancy, such as raw audio and image files. Steganography works much less effectively, if at all, with efficient compressed formats such as JPEG and MPEG.

Unfortunately, sending large amounts of raw audio and image data can arouse suspicion, and the pseudo-English encoding schemes are not sophisticated enough to fool a human observer.

GIF Colourmap Steganography

The encoding scheme used by gifshuffle relies on the fact that the ordering of the colours in the colourmap of a GIF does not affect the way an image is displayed. Two images with a different colourmap ordering are visibly identical.

The gifshuffle program runs in two modes - message concealment, and message extraction. During concealment, the following steps are taken.

Message -> optional compression -> optional encryption -> concealment in image

Extraction reverses the process.

Extract data from image -> optional decryption -> optional uncompression -> message

Each of the steps are described in detail below.

Compression

The compression scheme used by gifshuffle is a fairly rudimentary Huffman encoding scheme, where the tables are optimised for English text. This was chosen because the colourmap encoding scheme provides very limited storage space in some situations, and a compression algorithm with low overhead was needed. In other words, short messages had to compress to even shorter data. Depending on the text, you can usually get 25 - 40% compression.

If you want to compress a long message, or one not containing standard text, you would be better off compressing the message externally with a specialized compression program, and bypassing gifshuffle's optional compression step. This usually results in a better compression ratio.

Encryption

The encryption algorithm built in to gifshuffle is ICE, a 64-bit block cipher also designed by the author of gifshuffle. It runs in 1-bit cipher-feedback (CFB) mode, which although inefficient (requiring a full 64-bit encryption for each bit of output), provides the best possible security when different messages are encrypted with the same password. Although using the same password many times is theoretically a big no-no, in the real world it often can't be avoided.

The lower 7 bits of each character in the password are packed into an array, which is used to set the encryption key. The ICE encryption algorithm can operate at different levels, with higher levels using longer keys and providing more security. The ICE level appropriate for the password length is used.

CFB mode makes use of an initialization vector (IV), which is initially set to the first 64 bits of the key encrypted by itself. Each time a bit is encrypted, the IV is encrypted, and the leftmost bit of the encrypted IV is XORed with the bit. The IV is then shifted left one bit, and the ciphertext bit is added to the right. Decryption reverses this process.

The Encoding Scheme

Consider a pack of 52 cards. There are 52 factorial ways to sort the pack, which means that any particular ordering of the cards can represent a number in the range [0, 52!-1]. In other words, given n cards, you can store approximately log2(n!) bits of information based on their ordering.

GIF images contain a colourmap with up to 256 entries, resulting in a maximum storage capacity of 1683 bits. The image itself consists of a compressed array of indices into this colourmap. To conceal a message within a GIF image the following steps take place.

  1. Start with the message you want to conceal, specified on the command line or in a file. Optionally compress and/or encrypt this message. You are then left with a sequence of 1's and 0's.
  2. Prepend a 1 to this sequence, giving you a binary number m (probably quite large).
  3. Next take a look at the GIF image that you want to conceal the message in. Count the number of unique colours in the image, and call the value n. If m > n!-1 then the message is too large, and the procedure will be aborted.
  4. The colours in the colourmap are first sorted into their "natural" order. Each RGB colour is assigned the value (red * 65536 + green * 256 + blue), and the colours are sorted according to these values, except when encryption is being used in version 2 or higher, in which case the colours will be sorted by the encrypted ciphertexts of these values. Any duplicate colours are removed.
  5. Iterate i through the values 1 .. n. Each colour n-i is allocated a target position (m mod i), then m is divided by i.
  6. Each colour (n-1) .. 0 is then in turn inserted into a new colourmap at its target position. Colours previously occupying the target position and above are moved up one place.
  7. If the size of the colourmap is greater than the number of unique colours, then the colourmap will be padded with the last colour from the original colourmap.
  8. The image component of the GIF is then uncompressed, the colour indices are re-mapped to the new colourmap, and the image is re-compressed. For animated GIFs this is repeated for each image.

Extracting a hidden message follows a similar procedure, but in reverse. The ordering of the colourmap is used to construct a binary number, which is then optionally decrypted and uncompressed before being output.


Document last modified by Matthew Kwan, 7 December 2010
Please send any comments or corrections to mkwan@darkside.com.au