Steganography using permutations. Traditional steganography hides information in noisy spaces in data, such as the low bits of a sound sample. Permutation steganography hides information in the ordering of elements of a set. For examples, the orders of entries in a database, in which pre-arranged messages are sent, of the headers of a mail message, of books on a shelf, of cards in a deck, of people posing in a picture, of cars parked in a parking lot, or anything else. The number of possible permutations of a set of n elements is n!. In order to be conveniently used for steganography, it is necessary to uniquely map each possible permutation to a number from 0 to n! - 1. That number is the message being sent. API --- A permutation is a bijective mapping from a set to itself. We represent permutations with lists of natural numbers. For example, [2, 1, 3, 4, 0] is a permutation of [4, 3, 2, 1, 0]. The programmer converts between numeric "permutations" and the particular set being used for communication. The example below demonstrates that this is not difficult in Python. permutation_to_nat(permutation) and nat_to_permutation(number, set_size) convert between permutations and natural numbers. (It is necessary to specify set_size. Only the minimum set size can be determined from the number itself.) >>> PStego.permutation_to_nat([2, 1, 3, 4, 0]) 22 >>> PStego.nat_to_permutation(22, 5) # Set has five elements [2, 1, 3, 4, 0] permutation_to_int(permutation) and int_to_permutation(number, set_size) convert between permutations and integers. Almost identical to the above functions, except integers can be negative. permuted_set_size(number) is used to see how big a set is needed to store non-negative number. factorial(set_size) - 1 gives the largest natural number representable by a set of set_size elements. string_to_nat(string) and nat_to_string(number) make it convenient to convert between numbers and strings. >>> PStego.string_to_nat("G. H. Hardy") 301496875671042235064921671L >>> PStego.nat_to_string(301496875671042235064921671L) 'G. H. Hardy' All exceptions raised are of class PStegoError, which inherits from Exception. Example ------- This code fragment demonstrates encoding a large integer message into a particular ordering of a set of cards and back. --------------- #!/usr/bin/env python import PStego suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades'] vals = [str(val) for val in range(2,11) + ['Jack', 'Queen', 'King', 'Ace']] deck = [ val + ' of ' + suit for suit in suits for val in vals] # Encode msg = 71753659453958351496053366505932007005969038565701322956079243329L bynumber = PStego.nat_to_permutation(msg, len(deck)) arranged = [deck[i] for i in bynumber] # Decode bynumber_II = [deck.index(card) for card in arranged] msg_II = PStego.permutation_to_nat(bynumber_II) # Output print "Original Message:", msg print "Decoded Message:", msg_II print print "Index\tRaw No.\tCard" print "-----\t-------\t----" for i in xrange(len(deck)): print "%d:\t%d\t%s" %(i, bynumber[i], arranged[i]) Application Notes ----------------- Care should be taken to ensure that permutations are equally probable if they are to be difficult to identify. This can be inconvenient since most messages are in binary form. Nearly all crypto algorithms are designed for binary messages. One approach is to use permutations as one time pads. If the set has n objects, then the key should be a randomly chosen value from 0 to n! - 1. The message falls into the same range. Encryption is performed by adding the key and the message mod n!. If the key is a randomly chosen value throughout the domain of possible numbers, then it doesn't matter if the message is not evenly distributed. This means that at least half of the message space can be used to represent a string of bits, depending on the choice of set size. The above approach could also be used with a keyed pseudo-random number generator. It would be necessary to agree in advance on how to use it to generate evenly distributed numbers in the unusual ranges required. In effect this would be an XORing stream cipher - so be careful! Don't reuse the key. ;-) A less efficient approach is to use the prime factors of value 2 to store the binary data and mask the other factors with randomly chosen numbers. For example, in a system using ten objects we have 10! = 3,628,800 possible messages. 3,628,800 = 2^8 * 3^4 * 5^2 * 7 = 256 * 14,175, so we can store 8 bits. Let M be the message value in the range 0-255, random from the point of view of the eavesdropper. Let R be a random value in the range 0-14174. These values can be used in two ways: a. c1 = R * 256 + M which is decrypted with M = c1 mod 256 b. c2 = M * 14175 + R which is decrypted with M = c2 // 14175 8bpqsY0U3M1SLzaNXJho9jV2ftBGO5rIvclQu76Eg4CiZmnWyPFHKATRwxeDdk@ Behind the Curtain ------------------ The mapping between the set of integers and the set of permutations is accomplished by creating an alternate number system. Cartesian Products can be used when clearly describing number systems. It's important to use a precise specification because it is easy to get confused otherwise. A cartesian product is the set of ordered tuples of sets. For example, if set S = {a, b}, then a cartesian product S X S = {(a,a), (a,b), (b,a), (b,b)}. Let T = {c, d, e}. Then, S X T = {(a, c), (a, d), (a, e), (b, c), (b, d), (b, e)}. Let D be the set of decimal digits, 0 through 9. The set of three place decimal numbers could be seen as the cartesian product D X D X D. We write "123" for the element (1,2,3). Leading zeroes are traditionally dropped, so we would write "37" for (0,3,7). Let B be the set of binary digits: {0, 1}. The set of four place binary numbers could seen as the cartesian product B X B X B X B. We are most comfortable working with decimal numbers. Therefore, we will use decimal numbers to describe numbers and the digits of numbers in other bases. For example, let H be the set of hexadecimal digits. The set of four place numbers could be described by the cartesian product H X H X H X H. The number 0xBABE would be represented as (11,10,11,14). This becomes convenient when we want to convert the number to decimal - a format which is second nature to most people. In the case of 0xBABE, it is 11 * 16^3 + 10 * 16^2 + 11 * 16^1 + 14 = 47,806. Traditional number systems use the same digits in each column. For example, in base 10 the digits used are 0 through 9 and the columns are increasing integral powers of 10: ones, tens, hundreds, and so on, or <...,100,10,1>. In hexadecimal the columns are ones, sixteens, 256s, etc., or <...,256,16,1>. Note that the value of each column is actually the number of possible numbers that can be represented by the remaining digits. In the decimal system the third column is the "hundreds" because there are exactly one hundred different decimal two digit numbers: 00 - 99. If a number system does not have this property, it will either have more than one representation for certain numbers or it will be unable to represent some numbers. Neither possibility is what we want. We are used to number systems which use the same digits for each column, but to make the above points clear let's consider a strange five place number system described by this cartesian product: {0,1,2,3,4} X {0} X {0,1,2,3} X {0} X {0,1,2} What values should we assign to each column? The rightmost must be ones. The ones column has three possibilities, so the second (from the right) column is the threes. Since the second column has only one possible digit, the first two columns can express only three two place numbers, so the third column will be the threes, too. The first through third columns have 12 possible combinations, so the fourth column is the twelves. And since the fourth column has only one possible digit, the last column must be the twelves, too: <12,12,3,3,1>. The numbers from 0 through 59 can be represented by this system. For example, the number 45 is represented by (3,0,3,0,0). The system we want to use is tame by comparison. Each additional column gets one additional digit. Five place numbers are described by this cartesian product: {0,1,2,3,4,5} X {0,1,2,3,4} X {0,1,2,3} X {0,1,2} X {0,1} The columns will have values <120,24,6,2,1>, or <5!,4!,3!,2!,1!>. The numbers from 0 through 6! - 1 can be represented. The number 219 is (1,4,0,1,1). In other words, 219 = 1*5! + 4*4! + 0*3! + 1*2! + 1*1! = 120 + 96 + 2 + 1. We call these numbers "permutation numbers". Permutation numbers are useful because an n digit number can represent (n + 1)! different values, which also happens to be the number of permutations of a set of n + 1 elements. It is easy to map a permutation number to a permutation and vice versa. Let's represent 219 in terms of a permutation of a set of six objects: {0,1,2,3,4,5}. As described above, the permutation number is (1,4,0,1,1). The little puzzle is how to unambiguously choose each element from the set. The solution is to order the remaining elements of the set and reassign their values each time a digit is chosen. We start with this mapping: 0 1 2 3 4 5 {0,1,2,3,4,5} The first digit of (1,4,0,1,1) is 1, so we select the element "1" from the set, making the interim result (1). The remaining elements of the set are assigned these values: 0 1 2 3 4 {0,2,3,4,5} As the second digit of (1,4,0,1,1) is 4, we select the element "5" from the set, making the interim result (1,5). The remaining elements then look like this: 0 1 2 3 {0,2,3,4} Since the third digit is 0, we select the element "0" from the set, and so forth. The final result is (1,5,0,3,4,2). There is a cultural inconsistency between the order in which we write the digits of a number and the order in which Python displays the elements of a list; that is, when a number is written the highest valued digits are written first, but when an array is printed, the lowest indexed elements are printed first. In the code below, permutation numbers are represented in a manner natural to Python. perm_number[0] holds the least significant digit. For example, the number 3301 which can be represented by the permutation (4,3,2,2,0,1), is stored as the list [1, 0, 2, 2, 3, 4] in the Python code. (perm_number_repr(perm_number) is available to get a printable representation of permutation numbers.) References: These Wikipedia pages have a more complete discussion of these numbering systems: http://en.wikipedia.org/wiki/Permutation#Numbering_permutations http://en.wikipedia.org/wiki/Mixed_radix http://en.wikipedia.org/wiki/Factorial_number_system We do not implement a combinatorial number system, but it may be of interest all the same: http://en.wikipedia.org/wiki/Combinatorial_number_system Georg Cantor worked with mixed radix number systems: http://en.wikipedia.org/wiki/Georg_Cantor D.N. Lehmer and D.H. Lehmer both worked on permutation numbering systems: http://en.wikipedia.org/wiki/Derrick_Norman_Lehmer http://en.wikipedia.org/wiki/Derrick_Henry_Lehmer