CRC is Not Cryptography

CRC is a very common error detection algorithm for data transmission. It was never designed for cryptography. However it is sometimes used in protocols such as WEP. When used for cryptographic purpose, it can lead to compromising the security of the exchanges. In this article one of possible attack against use of CRC in cryptography will be detailed.


All the code used in this article (and more) can be found in my github or here.

What is a CRC

Cyclic Redundancy Check (CRC) is an error-detecting code based on polynomial division modulo two. CRCs are used for detecting errors in a data transmission, for example in Bluetooth transmission, but it should not be used for cryptographic purposes.

A CRC (there are many CRC definition depending protocols) is defined by:

  • a polynomial P (which is \(x^8+x^7+x^6+x^4+x^2+1\) in case of CRC-8-Bluetooth)
  • an operation before and after executing the division by P

The calculation of CRC is done in a polynomial space. Therefore, the data is considered as polynomial.

For example, \("a" = 97 = 0b01100001 = x^6 + x^5 + 1\).

To generalize a series of bits \(b(n)b(n-1)...b(1)b(0)\) can be compared to the polynomial \(B = b(n)X^n + b(n-1)X^(n-1) + ... + b(1)X + b(0)\)

And the CRC will be calculated as :

  x = (op_start(B)) mod[P]
  CRC(B) = op_end(x)

With op_start and op_end two operations on polynomials.

Polynomial representation

In CRC documentation, different polynomial representation is used. The normal representation, the reversed, the reciprocal and the reversed reciprocal. The representation used depends on specification and will impact the algorithm used to compute the CRC but will not modify the result. Details on the representation are as follows:

For CRC8 that uses the polynomial \(x^8 + x^7 + x^6 + x^4 + x^2 + 1\), the different representation is as follows:

Type Equivalent to Hex Value
Normal \(x^8 + (x^7 + x^6 + x^4 + x^2 + 1)\) 0xD5
Reversed \(1 + (x^2 + x^4 + x^4 + x^6 + x^7 + x^8)\) 0xAB
Reciprocal \((1 + x^2 + x^4 + x^4 + x^6 + x^7) + x^8\) 0x57
Reversed Reciprocal \((x^8 + x^7 + x^6 + x^4 + x^2) + 1\) 0xEA

A list of the polynomial for different CRCs can be found here

Calculating a CRC

In this section I present how to compute in python CRC with polynomial of length multiple of 8. The code is inspired by the one presented in RosettaCode

class CRC:
    '''
    This CRC class works to generate crc of length multiple of 8.
    :param polynome int: polynome in reciprocal form
    :param init_xor int: first data xoring to apply on the data
    :param end_xor int: final data xoring to apply on the data
    '''
    def __init__(self, polynome, init_xor, end_xor):
        self.polynome = polynome
        self.crc_len = (len(hex(polynome)[2:])-1)//2
        self.init_xor = init_xor
        self.end_xor = end_xor
        self.crc_table = self.create_table()
        self.inv_crc_table = self.create_inv_table()

    def create_table(self):
        table = []
        for i in range(256): # for all byte value
            k = i
            for j in range(8): # compute modulus
                if k & 1:
                    k ^= self.polynome
                k >>= 1
            table.append(k)
        return table

    def crc(self, bytestring, crc_val=0):
        crc_val ^= self.init_xor
        for byte in bytestring:
            # Compute crc recursively
            crc_val = (crc_val >> 8) ^ self.crc_table[(crc_val & 0xff) ^ byte]
        return crc_val ^ self.end_xor

To compute the classical CRC32 the parameter is as follows:

polynome =  0x1db710640
init_xor = int('FF'*4, 16)
end_xor = int('FF'*4, 16)

Some properties of a CRC

CRCs have interesting properties that will be used later on. However, demonstrations of this property is left to reader attention.

  • CRC(A ^ B ^ C) = CRC(A) ^ CRC(B) ^ CRC(C), for len(A) = len(B) = len(C)

  • CRCs can be computed recursively (see code in the previous section):

    • CRC(0) = c0
    • CRC(i+1) = (CRC(i) >> 8) ^ T[(CRC(i) & 0xFF) ^ M[i]] with T a table of constant (see create_table)
  • The recursivity of CRC add some properties:
    • CRC('', val) = val
    • CRC(M, val) = CRC(M[1:], CRC(M[0], val))
    • CRC(M) = CRC(M, 0)

Reversing CRC

CRC doesn't have cryptographic purpose and it's an easy task to find two sets of data which have same CRC. In this part different method to generate different data with same CRC will be described.

Brute force approach

The first approach is by using brute force. For CRC of length l and a message M=M[0]M[1]...M[n-1] of n bits, another message with same CRC can be found by testing all possible set of values for M[n-l]..M[n-1].

This method is easy to implement but very slow. If the calculation time can be acceptable for a CRC32 it becomes very annoying for CRC64 or when you need to compute many at a time.

Non-naive approach

It is possible to do better than the naive approach. In order to do so, the CRC computation algorithm needs to be reversed.

To simplify, as CRC of 32 bits will be considered but same method can be applied for any length

As previously seen, CRC can be computed recursively with this condition * CRC(0) = c0 * CRC(i+1) = (CRC(i) >> 8) ^ T[(CRC(i) & 0xFF) ^ M[i]]

In the case of four-byte representation:

Element Bits \(B_0\) \(B_1\) \(B_2\) \(B_3\)
(CRC(i) >> 8) = B \(b_0\) \(b_1\) \(b_2\)
T[(CRC(i) & 0xFF) ^ M[i]] = T \(t_0\) \(t_1\) \(t_2\) \(t_3\)
CRC(i+1) = A \(a_0\) \(a_1\) \(a_2\) \(a_3\)
CRC(i+1) \(t_0\) \(t_1\) ^ \(b_0\) \(t_2 xor b_1\) \(t_3 xor b_2\)

The first thing to realize is that if we know CRC(i+1) we know the 8 bits on the left of T[(CRC(i) & 0xFF) ^ M[i]].

All the T values have unique values for their 8 bits on the left (T[i] >> (crc_len - 8 or \(t_0\) on the table). A reversed table T_inv can be created.

It has the following properties: * T[T_inv[i]] = i * T_inv[T[i] >> (crc_len - 8)] = i

And then the value of CRC(i) except the first 8 bits to the right (\(b_0 b_1 b_2\)) can then be known by:

(CRC(i) >> 8) = CRC(i+1) ^ T[T_inv[CRC(i+1) >> (crc_len - 8)]]

This operation can be done for all CRC(i).

Now, the values of M need to be found.

But we also know another intermediate value (for example CRC(0) which is 0 or -1 (0xFF..FF) or it could be the CRC of the N-4 bits) and we can then compute the value M[i] because we know the corresponding CRC and T Recursively we can reconstruct a message value from this point.

For the case of CRC32:

CRC(i+1) = T[(CRC(i) & 0xFF) ^ M[i]] ^
           (
            (T[(CRC(i-1) & 0xFF) ^ M[i-1]] ^
             (
              (T[(CRC(i-2) & 0xFF) ^ M[i-2]] ^
               (
                (T[(CRC(i-3) & 0xFF) ^ M[i-3]] ^
                 (
                  (T[(CRC(i-4) & 0xFF) ^ M[i-3]] ^
                  (CRC(i-4) >> 8)
                  ) >> 8
                 )
                ) >> 8
               )
              ) >> 8
             )
            ) >> 8
           )
         = T[(CRC(i) & 0xFF) ^ M[i]] ^
           (T[(CRC(i-1) & 0xFF) ^ M[i-1]] >> 8) ^
           (T[(CRC(i-2) & 0xFF) ^ M[i-2]] >> 16) ^
           (T[(CRC(i-3) & 0xFF) ^ M[i-3]] >> 24) ^
           (T[(CRC(i-4) & 0xFF) ^ M[i-3]] >> 32) ^
           (CRC(i-4) >> 40)
        = T[(CRC(i) & 0xFF) ^ M[i]] ^
           (T[(CRC(i-1) & 0xFF) ^ M[i-1]] >> 8) ^
           (T[(CRC(i-2) & 0xFF) ^ M[i-2]] >> 16) ^
           (T[(CRC(i-3) & 0xFF) ^ M[i-3]] >> 24) ^
           (T[(CRC(i-4) & 0xFF) ^ M[i-3]] >> 32)

And we can also easily prove for CRC of length n*8

CRC(i+1) = T[(CRC(i) & 0xFF) ^ M[i]] ^
           (T[(CRC(i-1) & 0xFF) ^ M[i-1]] >> 8) ^
           (T[(CRC(i-2) & 0xFF) ^ M[i-2]] >> 16) ^
           ...
           (T[(CRC(i-(n-1)) & 0xFF) ^ M[i-(n-1)]] >> (n*8))

The CRCs value at step i+1 only depends on the initial CRC value at the step i-(n-1) (which is known as it is the crc of the part of the message M[i-(n-1):i+1] that will not be modified).

To modify a text to have a specific CRC, I wrote the following code:

    def create_inv_table(self):
        inv_table = [0] * 256
        l = self.crc_len*8 - 8
        for i, elem in enumerate(self.crc_table):
            val = elem >> l
            inv_table[val] = i
        return inv_table

    def crc_inv(self, data, crc_target):
        original_crc = self.crc(data, 0)
        prev_crc = crc_target ^ self.end_xor
        index_table = []
        crc_list = [prev_crc]
        new_data = []
        l = self.crc_len*8 - 8
        for i in range(self.crc_len):
            crc_head = prev_crc >> l
            index = self.inv_crc_table[crc_head]
            index_table = [index] + index_table
            prev_crc = (prev_crc ^ self.crc_table[index]) << 8
            crc_list = [prev_crc] + crc_list
        crc_list = [original_crc ^ self.init_xor] + crc_list
        for i in range(self.crc_len):
            new_data.append((crc_list[i] ^ index_table[i]) & 0xFF)
            crc_list[i+1] = (crc_list[i] >> 8) ^ (self.crc_table[index_table[i]])
        return data + bytes(new_data)

By using the code:

>>> import crc
>>> crc = crc.CRC(0x1db710640, int('FF'*4, 16), int('FF'*4, 16))
>>> crc.crc_inv(b'', 0xDEADBEEF)
b'\xc3\xd8$\x06'
>>>> hex(crc.crc(b'\xc3\xd8$\x06'))
'0xdeadbeef'

CRC and cryptography

Even though CRC was never made for cryptographic purpose, it is sometimes used inside custom crypto (which should never be done). I could see it few times in embedded systems that I reviewed. In this part I will explain through one example why it should never be used in crypto systems.

Stream cipher

A stream cipher works in a similar way than a one-time pad (see article) A one-time pad key is generated through a cryptographic algorithm and this key is xored to the data to cipher it.

The key like that : Key = E(IV, K)|E(IV, K+1)|....|E(IV, K+N)|...

And XOR this key with the message : C = E(M, K1|K2|...|KN) = E(M, K1)|E(C, K2)...|E(C, KN)

Which can also be noted as C = E(M, K) = M ^ K

CRC should not be used for integrity

Now, let’s see with one example why a CRC should not be used for integrity check in cryptographic context. In the case of using CRC as an integrity algorithm when using stream cipher, it is possible to modify a ciphered message without modifying its CRC and therefore without being detected.

From a previous part CRC(A ^ B ^ C) = CRC(A) ^ CRC(B) ^ CRC(C), for len(A) = len(B) = len(C) Therefore for the one-time pad : C = M ^ K with len(C) = len(M) = len(K)

By calculating the CRC of the cipher and using previous relations: CRC(C) = CRC(M ^ K ^ 0) = CRC(M) ^ CRC(K) ^ CRC(0), for len(A) = len(B) = len(C)

Goal: Modify C in C2 such that CRC(C2 ^ K) = CRC(M2) = CRC(M)

C2 is obtained by modifying C xoring it with K2

CRC(C2 ^ K) = CRC(C ^ K2 ^ K)
            = CRC(C) ^ CRC(K2) ^ CRC(K)

The goal is to have:

CRC(C2 ^ K) = CRC(M2) = CRC(M)

Therefore

    CRC(M) = CRC(C) ^ CRC(K2) ^ CRC(K)
    CRC(K2) = CRC(M) ^ CRC(K) ^ CRC(C)
            = CRC(M) ^ CRC(K) ^ CRC(0) ^ CRC(0) ^ CRC(C)
            = CRC(M ^ K ^ 0) ^ CRC(0) ^ CRC(C)
            = CRC(C) ^ CRC(0) ^ CRC(C)
            = CRC(0)

To modify the cipher without modifying the cleartext CRCs a key K2 can be generated and it should verify CRC(K2) = CRC(0), which can be calculated with the algorithm described in previous part.

You can check with this example (by adding to the previous code):

cipher = lambda x,k:bytes([a^b for a,b in zip(x, k)])

crc = crc.CRC(0x1db710640, int('FF'*4, 16), int('FF'*4, 16))

message = b'CRC is so fun to break'
key     = b'Sha1 is better for crypto'
c = cipher(message, key)
#out: b'\x10:"\x12I\x1aSS\rE\x12\x01\x0bRT\tO\x10R\x06\x13\x12'
crc_m = crc.crc(message)
# '0x601784a9'
key_2 = b'\x42' * (len(message) - 4)
key_2 = crc.crc_inv(key_2, crc.crc(b'\x00'*len(message)))
# b'BBBBBBBBBBBBBBBBBB\xe9P\xc0x'
modified_c = cipher(c, key_2)
# b'Rx`P\x0bX\x11\x11O\x07PCI\x10\x16K\rR\xbbV\xd3j'
modified_m = cipher(modified_c, key)
# b'\x01\x10\x01b+1b1-b$7,b6-b \x9b5\xa1\x13'
crc.crc(modified_m)
# 0x601784a9 same as crc_m = crc.crc(message)
# We achieved our goal

In this part was proved that CRC doesn't provide any integrity in case of ciphering data with a stream cipher. The use of CRC with other cipher types wasn't detailed, however, it should not be used.

Conclusion

CRC is a easy method to detect modification in data. However, it is very easy to create other data with same CRC even if the data is ciphered and that the original data CRC is unknown. CRC should never be use to ensure data integrity in cryptographic context.

Thank you for reading this article. Hope to see you soon.

Sinople

Some Links

  • The wikipedia page on CRC
  • Code in python and many other language of CRC
  • A complete website to understang CRC.
  • Link to my github with the full project (and a bit improved) (or you can download it here but there will not be potential updates)