One-Time Pad Strikes Once Never Twice

One-time pad, also called Vernam cipher, is often considered as the strongest cryptography as long as you use the key only once. In this article the properties of a one-time pad will be analysed as well as attacks against it.


What is a one-time pad

Also call Vernam cipher in honour of one of its creators, the one-time pad is a substitution cipher with a key as long as the clear text. It was mostly used during the gold age of spies (a.k.a. the cold war) by the agents of both sides. Very simple, it has the advantages to be easily used with only pen and paper. Absolutely no need of a computer.

The one-time pad is a substitution cipher. Every character of the clear text is changed following the corresponding character of the key.

It can be defined in a more mathematical way as follows:

Given an alphabet A of n characters (represented by their position in the alphabet, e.g.: 0 the 1st letter, 1 the 2nd letter, ...).
Given a clear message M of length l, a key K of length l and C the ciphered message resulting by the encryption of M by K.
M, K, C are composed of characters of A
Let defines C(k), M(k), K(k) respectively the k-th character of the ciphered message, the clear message and the key.
The ciphered message can be obtained by : C(k) = (M(k)+K(k)) % n
The clear message can be obtained by : M(k) = (C(k)-K(k)) % n

Let’s give an example with :

  • A the Latin alphabet with all letters from A to Z
  • M = WHAT IS THE KEY
  • K = VERNAM CIPHER

In this case we shift the letter as many times as the value indicated by the letter in the key. For example, the letter B becomes C if key is B, A if key is Z, or B if key is A

cleartext  W  H  A  T  I  S  T  H  E  K  E  Y
clearcode  22 7  0  19 8  18 19 7  4  10 4  24
key        V  E  R  N  A  M  C  I  P  H  E  R
key code   21 4  17 13 0  12 2  8  15 7  4  17
ciphercode 17 11 17 6  8  4  21 15 19 17 8  15
ciphertext R  L  R  G  I  E  V  P  T  R  I  P

Nowadays, the operation is done on ASCII character by applying a xor operation on cleartext characters and key characters. The xor operation is the same as an addition modulo two when ciphering, and it's a subtraction modulo two when deciphering.

Let's define in python a simple onetimpad function that will be used on the other part

(NB: the codes in this article are only here for POC purpose.)

latin = 'abcdefghijklmnopqrstuvwxyz'

def onetimepad(text, key, action, alphabet=latin):
    n = len(alphabet)
    textcode = [alphabet.index(c) for c in text]
    keycode = [alphabet.index(c) for c in key]
    rescode = []
    for i in range(min(len(text), len(key))):
        if action == 'cipher':
            rescode.append((textcode[i] + keycode[i])%n)
        elif action == 'decipher':
            rescode.append((textcode[i] - keycode[i])%n)
        else:
            raise ValueError("%s is not a valid action" % action)
    res = ''.join([alphabet[x] for x in rescode])
    return res

We can easily verify that the output of the function corresponds to the previous example and that it can easily decipher the code:

>>> onetimepad('whatisthekey', 'vernamcipher', 'cipher')
'rlrgievptrip'
>>> onetimepad('rlrgievptrip', 'vernamcipher', 'decipher')
'whatisthekey'

An unbreakable cryptography ?

The Vernam cipher is often considered as a cryptography that can't be broken (no one can know the message content without the key). Let's first give a simple definition for perfect secrecy:

Given a key randomly generated, a cipher is perfectly secret if no information except message length can be extracted from the ciphertext. In other words, such a cipher is unbreakable. And this is the case with one-time pad. Even if we find the key, there is no possibility to know that it's the real key.

Let's see why with an example:

Today a cipher arrived at the National Spy Agency the following cipher text is:

rlrgievptrip

The first agent said: "The key, it is: 'vernamcipher' and the message is 'whatisthekey'"
The second agent said: "The key, it is: 'whatisthekey' and the message is 'vernamcipher'"
The third agent said: "The key, it is: 'zltzetkbadwl' and the message is 'sayhellotome'"
The fourth agent said: "The key, it is: 'fhnnbwjwfoir' and the message is 'meethimtoday'"

We can verify with the python output

>>> onetimepad('rlrgievptrip', 'vernamcipher', 'decipher')
'whatisthekey'
>>> onetimepad('rlrgievptrip', 'whatisthekey', 'decipher')
'vernamcipher'
>>> onetimepad('rlrgievptrip', 'zltzetkbadwl', 'decipher')
'sayhellotome'
>>> onetimepad('rlrgievptrip', 'fhnnbwjwfoir', 'decipher')
'meethimtoday'

We can see that we have multiple messages with an understandable output. What is the correct message then? This is impossible to know if the key is generated randomly because any messages of the length of the cipher can be a valid message (and therefore any key of the same length).

Attack against two (or more) time pad

The one-time pad is unbreakable if the key is randomly generated and use once. However, if the key is used twice, it becomes possible to start some cryptanalysis and reveal partially or completely the content of the messages.

The principle is easy, if we know a part of the first message, we can retrieve the corresponding part of the key then the part of second message. To break the cipher, a dictionary of words that might appear in first and second message (day of the message, sender name, receiver name, common words, ...) will be applied respectively on message one and two. If the corresponding cleartext of messages two and one make sense, it's probably a good guess. This is done for every word of the dictionary and at the same time, the words that are partially recovered can also be guessed or completed. Of course there won't be 100% chance that both messages are the original one, however, if they both make sense within a certain context they can be considered as correct.

This is called the crib technique.

Let's make a small python script help us to do the crib.

def crib(cipher1, cipher2, word_in_clear2, fix_part_of_clear2=None):
    '''
    :param cipher1 str: Cipher we want to decipher
    :param cipher2 str: Cipher were we already know some content
    :param word_in_clear2 str: Word we want to crib against cipher1
    :param fix_part_of_clear2 str: clear content we know of the message 2.
    Unknown letters are represeted by an underscore
    '''
    len_min = min(len(cipher1), len(cipher2))
    if fix_part_of_clear2 in (None, ''):
        fix_part_of_clear2 = '_'*len_min
    key = word_in_clear2
    len_key = len(key)
    diff = onetimepad(cipher1, cipher2, 'decipher')
    fix_part_of_clear1 = ''
    for i, c in enumerate(fix_part_of_clear2):
        if c != '_':
            fix_part_of_clear1 += onetimepad(diff[i], c, 'cipher')
        else:
            fix_part_of_clear1 += '_'
    for i in range(len_min):
        if fix_part_of_clear2[i:i+len_key] == '_'*len_key:
            crib = onetimepad(diff[i:i+len_key], key, 'cipher')
            clear = fix_part_of_clear1[:i] + crib
            clear += fix_part_of_clear1[i+len_key:]
            clear += '_'*(len_min-len(clear))
            print(clear)

Let's apply it on a small example with Alice, Bob and Eve.

Eve could intercept some message exchanged between Alice and Bob.

The message sends to Alice is:

>>> print(enc1)
'pypkxcidqulgiwpnwajkvsolanbu'

The message sends to Bob is:

>>> print(enc2)
'pcfnkkymcawssnlhwzaxjohazjdqw'

Eve knows that they both used the same key for ciphering their message. Eve is really happy, Alice and Bob messages can be revealed. Let’s see how she proceeds.

>>> crib(enc2, enc1, 'alice', '____________________________')
apyfr_______________________
_eblpm______________________
__qovkc_____________________
___dyqaq____________________
____ntgou___________________
_____ijust__________________ # seems to have some meaning: I just ? let's try it
______yxyrm_________________
_______mbxko________________
________qaqmo_______________
_________ptsmc______________
__________ivsae_____________
___________kvgct____________
____________kjirs___________
_____________ylxqa__________
______________aawyk_________
_______________pzeiu________
________________ohosk_______
_________________wryif______
__________________gbodm_____
___________________qrjkx____
____________________gmqvv___
_____________________btbtl__
______________________iezjr_
_______________________tcppd 
>>> crib(enc1, enc2, 'bob', '_____ijust__________________')
__lloalice__________________
# We founds this one. Could it be 'hello Alice' ?
>>> crib(enc2, enc1, '',    'helloalice_______________')
hibobijust__________________
# Seem that the guess was quite good.
# let's crib with the word the which is very common in English
>>> crib(enc2, enc1, 'the',    'helloalice_______________')
hibobijustbro_______________
# Hi bob I just broke something ? let's also crib with 'the'
>>> crib(enc1, enc2, 'the', 'hibobijustbroke_____________')
helloalicethemeeti__________
# Wow we find a lot
>>> crib(enc2, enc1, '', 'helloalicethemeeting________')
hibobijustbrokethetw________
>>> crib(enc1, enc2, '', 'hibobijustbrokethetwo_______')
helloalicethemeetingi_______
>>> crib(enc2, enc1, '', 'helloalicethemeetingisat____')
hibobijustbrokethetwotim____
# two time pad maybe ??
>>> crib(enc1, enc2, '', 'hibobijustbrokethetwotimepad')
helloalicethemeetingisatnine

# The message send to Alice is probably
'helloalicethemeetingisatnine'
# And the one send to Bob
'hibobibroketodaythetwotimepad'
# The key is 
'iuezjcxvoqszekljdswenaosnfoqt'
# which mean we can break the next message if they use it

With this small example, you could see that it's not too difficult to find the content of the message ciphered with Vernam cipher and using the same key. Of course with more complex file, for example ciphered pdf or word documents it may be more complicated but by using the same principle partly or all the information can be retrieved.

This is not only a small example that could be used to solve CTF, but it's really used in practice. During cold war, some Russian messages were broken during the Venona project which leads to find spies that was selling nuclear secrets to the USSR. More info about it on the links at the end.

You should never reuse a one-time pad key !!!

Other problems

In previous section you learned that Vernam cipher is unbreakable if the keys aren't reused. You will then probably ask, "If I don't reuse the keys, I can use Vernam cipher for all my messages, right?" Here are indeed some problems that make it difficult to use in everyday life. They won't be explained in detail in this article but they might be explained later and I will then add the links on this page.

First of all, the key exchange is problematic. How can you easily send keys that must be at least as long as the messages send ? It might be easy when you can directly meet the person or when you can use embassy power, but to communicate with your penpal on the other side of the world it might be troublesome. And keys can't be reused as we saw previously.

Another problem is generating truly random keys and explaining how to do it and why it's complicated would probably need a few articles.

Conclusion

Vernam cipher is a simple cipher and unbreakable as long as the keys are used only once, for this reason it's also called one-time pad. Using a key multiple time can lead to reveal messages content by using the crib method. Moreover, the difficulty to send keys that need to be at least as long as the messages make this cipher not practical for everyday use.

Thank you for reading this article. Hope to see you soon.

Sinople.

Some Links