Your resource for web content, online publishing
and the distribution of digital products.
S M T W T F S
 
 
 
 
 
 
1
 
2
 
3
 
4
 
5
 
6
 
7
 
8
 
9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
23
 
24
 
25
 
26
 
27
 
28
 
 

Build (And Then Crack) Your First Message Encryption Protocol With This Guide

DATE POSTED:February 17, 2025

Since ancient times, message encryption has sparked interest, particularly in the military sphere, ranging from the simple Caesar cipher to the German Enigma machine. As the number and complexity of ciphers increased, so did the interest in decryption, which, in turn, drove the creation of even more sophisticated encryption methods.

With the advent of computers, encryption became widespread, as users also sought to maintain the privacy of their data. While in the past encryption relied on letter substitution, transposition, anagrams, and pseudo-languages, today, information is transmitted using binary code consisting of zeros and ones.

One of the simplest ways to encrypt your message is to perform the XOR operation on your message in binary form, also known as exclusive OR.

The XOR operation takes two input bits (a = 0 || 1, b = 0 || 1) and returns 1 if the arguments are different and 0 if they are the same:

| a/b | 0 | 1 | |:---:|:---:|:---:| | 0 | 0 | 1 | | 1 | 1 | 0 |

\ How does XOR encrypt our message? Let's take the word "cat" — in binary representation, this word looks like:

| 01100011 | 01100001 | 01110100 | |:---:|:---:|:---:| | c | a | t |

\ We will use the letter "k" (01101011) as the key. Applying XOR byte by byte, repeating the key cyclically:

| 01100011 | 01100001 | 01110100 | |:---:|:---:|:---:| | 01101011 | 01101011 | 01101011 | | 00001000 | 00001010 | 00011111 |

\ 00001000 00001010 00011111 – the resulting encrypted message, or ciphertext.

One of the main advantages of XOR is its reversibility. That is, if we apply XOR between the ciphertext and the key, we will retrieve the original message:

| 00001000 | 00001010 | 00011111 | |:---:|:---:|:---:| | 01101011 | 01101011 | 01101011 | | 01100011 | 01100001 | 01110100 |

\ Additionally, XOR supports encryption with multiple keys due to its associativity:

\ m = Dk2​(Dk1​(Ek2​(Ek1​(m))))

\ In the application of XOR, let's consider the following example:

We take the previously obtained ciphertext and apply XOR with a key consisting only of ones.

| 00001000 | 00001010 | 00011111 | |:---:|:---:|:---:| | 11111111 | 11111111 | 11111111 | | 11110111 | 11110101 | 11100000 |

\ Now, we apply XOR between the original key k and the key consisting of all ones.

| 01101011 | 01101011 | 01101011 | |:---:|:---:|:---:| | 11111111 | 11111111 | 11111111 | | 10010100  | 10010100 | 10010100 |

\ Now, we apply XOR between the new key and the ciphertext to restore the original message.

| 11110111 | 11110101 | 11100000 | |:---:|:---:|:---:| | 10010100 | 10010100 | 10010100 | | 01100011 | 01100001 | 01110100 |

\ As a result, we retrieve the original message "cat" once again. Due to this property, XOR is widely used in various cryptographic protocols. But why not use XOR instead of popular encryption algorithms? Let's analyze the security of XOR-encrypted text.

\n Analysis of XOR-Encrypted Text

In this example, we will examine one of the methods for breaking XOR encryption if the key length is known.

Suppose we have an encrypted text (ciphertext):

00011111 00001101 00010000 00011000 01000101 00010000 00011000 01000101 00010011 00011110 00010110 00001101   01001011 00000100 01011001 00011111 00000000 00001010 00011111 01000101 00001101 00001110 00011101 00001101 01001011 00001000 00011100 00011000 00010110 00011000  00001100 00000000 01011001 00001101 00001010 00001011    01001011 00000000 00010111 00001000 00001010 00011101 00001110 01000101 00010110 00011001 01000101 00011101 00001110 00000110 00010110 00001111 00000000

Assuming that the original message consists only of lowercase Latin letters (a-z) and spaces, and that the key length is 3 bytes.

Knowing this, we can split our ciphertext into three groups, meaning the ciphertext corresponding to each byte of the key:

\

  • The first group (i % 3 == 0):

00011111 00011000 00011000 00011110 01001011 00011111 00011111 00001110 01001011 00011000 00001100 00001101 01001011 00001000 00001110 00011001 00001110 00001111

  • The second group (i % 3 == 1):

00001101 01000101 01000101 00010110 00000100 00000000 01000101 00011101 00001000 00010110 00000000 00001010 00000000 00001010 01000101 01000101 00000110 00000000

  • And the third group (i % 3 == 2):

00010000 00010000 00010011 00001101 01011001 00001010 00001101 00001101 00011100 00011000 01011001 00001011 00010111 00011101 00010110 00011101 00010110

Since Latin letters and spaces are encoded in a single byte, and each byte can take one of 256 values, we can brute-force the possible key values.

If the decrypted group contains only a-z characters and spaces, we can assume that the corresponding key byte is found.

Python Example for Finding the Key: \n

def binaryToChar(binary_str): return chr(int(binary_str, 2)) def xorWithByte(binaryStr, keyByte): result = "" for i in range(8): bit = int(binaryStr[i]) keyBit = (keyByte >> (7 - i)) & 1 result += str(bit ^ keyBit) return result def isCorrectByte(binaryGroup, keyByte): for byteAsString in binaryGroup.split(): resultBinary = xorWithByte(byteAsString, keyByte) resultChar = binaryToChar(resultBinary) if resultChar != ' ' and not ('a' <= resultChar <= 'z'): return False return True binaryGroup1 = "00011111 00011000 00011000 00011110 01001011 00011111 00011111 00001110 01001011 00011000 00001100 00001101 01001011 00001000 00001110 00011001 00001110 00001111" binaryGroup2 = "00001101 01000101 01000101 00010110 00000100 00000000 01000101 00011101 00001000 00010110 00000000 00001010 00000000 00001010 01000101 01000101 00000110 00000000" binaryGroup3 = "00010000 00010000 00010011 00001101 01011001 00001010 00001101 00001101 00011100 00011000 01011001 00001011 00010111 00011101 00010110 00011101 00010110" key = "" for binaryGroup in [binaryGroup1, binaryGroup2, binaryGroup3]: neededKeyByte = 0 for keyByte in range(256): if isCorrectByte(binaryGroup, keyByte): neededKeyByte = keyByte break keyBitAsString = '' for i in range(8): keyBit = (neededKeyByte >> (7 - i)) & 1 keyBitAsString += str(keyBit) key += keyBitAsString + ' ' print(key)

\ As a result, we obtain the key 01101011 01100101 01111001, which corresponds to the string "key".

After applying XOR to the ciphertext using this key, we retrieve the original message:

this is just a test text message for encode or decode

\n Given that in ASCII encoding, one byte corresponds to one character, brute-force attacks are relatively easy. However, what if the original message is in UTF-8 or UTF-32 and contains emojis or Cyrillic characters?

When encrypting text in encodings other than ASCII (e.g., UTF-8 or UTF-32), brute-force attacks become more complex, since a single character can consist of multiple bytes.

One way to analyze such encrypted text is by creating a byte dictionary for each letter. For example, the Cyrillic letter "Ш" in UTF-8 is encoded as 11010010 10101000.

If, during decryption, we obtain these bytes (or parts of them), we can hypothesize that decryption is proceeding in the right direction. However, the final confirmation is possible only after analyzing the entire character sequence.

If the encrypted data represents a file, its structure can assist in decryption. Many file formats (e.g., PNG, PDF, ZIP) contain distinctive headers, which help identify the beginning of the file and possible patterns in the ciphertext. This makes decryption significantly easier if the key was reused.

However, if the key length matches the original message length and was used only once, decryption becomes impossible. For example, if the word "cat" was encrypted using the key "dog":

| 01100011 | 01100001 | 01110100 | |:---:|:---:|:---:| | 01100100 | 01101111 | 01100111 | | 00000111 | 00001110 | 00010011 |

Since the key is never reused, an attacker cannot determine the original message — it could be "cat", "yes", "run", or any other combination of the same length. The only information that can be inferred is the message length (24 bits). However, if the key is used more than once, cryptanalysis becomes possible, which is why a one-time pad (OTP) remains secure only when strictly following its usage rules. The encryption method where the key is equal in length to the message and used only once is known as the one-time pad. This method was first proposed by Gilbert Vernam and Joseph Mauborgne in 1917.

\ As we can see, knowing the key length allows us to break the ciphertext. But the question arises — how can we determine this length?

Of course, we can still rely on brute force, iterating through key lengths from 1 to 100 bytes, but there are more elegant methods. Let's explore key length detection using Hamming distance.

Hamming distance is a measure of difference between two strings of equal length, indicating how many characters in one string need to be changed to obtain the other.

Simply put, it represents the number of positions where the strings differ.

For example, the Hamming distance between the strings 10101010 and 10001110 is 3, as they differ in three bits.

How does this help? When encrypting with a repeating byte, patterns emerge, causing similar symbols to have short Hamming distances.

The algorithm works as follows:

  1. Select a hypothetical key length n (e.g., from 1 to 40).
  2. Divide the ciphertext into blocks according to the chosen key length.
  3. Calculate the Hamming distance between blocks and compute the average distance.
  4. To compare different key lengths, normalize the Hamming distance by dividing it by the block size.
  5. The key length for which the normalized Hamming distance is minimal is the most probable key length.

However, the Hamming distance method does not always provide an accurate result. Let's look at a code snippet that calculates Hamming distances for various key lengths:

strBinary = '00011111 00001101…00000000' binaryByByte = strBinary.split() distance_map = {} def hammingDistance(byte1, byte2): return sum(bit1 != bit2 for bit1, bit2 in zip(byte1, byte2)) for keyLen in range(1, 40): count = 0 totalDist = 0 for i in range(len(binaryByByte)): for j in range(i + 1, len(binaryByByte)): if (j - i) % keyLen == 0: totalDist += hammingDistance(binaryByByte[i], binaryByByte[j]) count += 1 distance_map[keyLen] = totalDist / count if count else 0 sortedMap = {k: v for k, v in sorted(distance_map.items(), key=lambda item: item[1])} print(sortedMap)

\n wherestrBinary is the ciphertext described earlier.

The output will look like this:

{39: 2.0, 36: 2.411764705882353, 21: 2.441860465116279, 18: 2.4423076923076925, 12: 2.4456521739130435, 6: 2.519230769230769, 24: 2.5294117647058822, 3: 2.5361990950226243, 27: 2.5384615384615383, 9: 2.576923076923077, 23: 2.6216216216216215, …}

As we know, the actual key length key is neither 39 nor 36, and the eighth position from the minimum value does not correspond to the true minimum Hamming distance. So, what is happening here?

To understand this, let's first consider an "ideal" ciphertext for Hamming distance analysis.

We encrypt the string: dogdogdogdogdogdogdogdogdogdogdogdogdogdog using the key key.

The output of the program for calculating the Hamming distance of this ciphertext will look like this:

{3: 0.0, 6: 0.0, 9: 0.0, 12: 0.0, 15: 0.0, 18: 0.0, 21: 0.0, 24: 0.0, 27: 0.0, 30: 0.0, 33: 0.0, 36: 0.0, 39: 0.0, 1: 1.36 …

\ Indeed, the key key is identical to keykey, keykeykey, and so on (following the XOR encryption algorithm). The Hamming distance in this case will be 0, because with a repeating key, each bit of the key XORs with the same bit of the plaintext every time.

With arbitrary text, each byte of the key XORs with different bytes of the ciphertext, causing small variations in the Hamming distance. However, "pseudo-keys" (the key repeated nnn times) will still produce minimal Hamming distances, even if they are not the true key length. Solution: Using the Greatest Common Divisor (GCD). To eliminate the influence of pseudo-keys, we can compute the Greatest Common Divisor (GCD) of the first few minimal Hamming distances. If the ciphertext is long enough, the true key length will be a divisor of the values with minimal distances. In our case, the GCD of 39, 36, 21, etc. is 3, which matches the actual key length.

\ The algorithm is effective when the ciphertext length is significantly greater than the key length. However, if the text is short, the probability of errors increases, and the GCD may result in 1, making it difficult to determine the key length. This occurs because the number of key repetitions decreases, reducing the patterns available for analysis, thus making key length detection more challenging.

\ Other methods for determining the key length also rely on the property of XOR encryption with a repeating key. Therefore, the only truly secure way to encrypt data using XOR is by employing a one-time pad (OTP). Modern cryptographic algorithms, although they use XOR, avoid key repetitions, making them resistant to such attacks. Examples include AES, ChaCha20, and others. The primary techniques to prevent key repetition are:

  • In block encryption (AES) — different keys are used for each block, which are derived from the base key.
  • In stream encryption (ChaCha20) — a unique key stream is generated, also based on the base key.

A detailed description of these algorithms is beyond the scope of this article.

\ Thus, XOR remains an important tool in cryptography, but its security depends on proper usage. Modern encryption algorithms account for its weaknesses and implement more complex encryption schemes.

\