Project 1: Cryptography

Deadline: Thursday, September 19 by 11:59PM.

Before you start, review the course syllabus for the Lateness, Collaboration, and Ethical Use policies.

You may optionally work alone, or in teams of at most two and submit one project per team. If you have difficulties forming a team, post on Piazza’s Search for Teammates forum. Note that the final exam will cover project material, so you and your partner should collaborate on each part.

The code and other answers your group submits must be entirely your own work, and you are bound by the University’s Student Code. You may consult with other students about the conceptualization of the project and the meaning of the questions, but you may not look at any part of someone else’s solution or collaborate with anyone outside your group. You may consult published references, provided that you appropriately cite them (e.g., in your code comments). Don't risk your grade and degree by cheating!

Complete your work in the CS 4440 VM—we will use this same environment for grading. You may not use any external dependencies. Use only default Python 3 libraries and/or modules we provide you.

Helpful Resources

Introduction

In this project, you'll investigate vulnerable applications of cryptography, inspired by security problems found in many real-world implementations. In Part 1, you'll use a cutting-edge tool to generate MD5 hash collisions, and you’ll investigate how hash collisions can be exploited to conceal malicious behavior in software. In Part 2, we’ll guide you through attacking the authentication capability of an imaginary server API by exploiting the length-extension vulnerability of hash functions in the MD5 and SHA families. In Part 3, you'll perform cryptanalysis on a historically popular encryption cipher to recover its secret keys. In Part 4, you’ll exploit vulnerable RSA padding to forge a digital signature.

Objectives

Understand how to apply basic cryptographic integrity and authentication primitives.
Investigate how cryptographic failures can compromise message and system security.
Appreciate why you should use HMAC-SHA256 as a substitute for common hash functions.
Understand why padding schemes are integral to cryptographic security.

Start by reading this!

Before you begin, please carefully read through the following sections for important setup information and guidelines about this project.

Working in the VM

Subtle differences in programming environments (e.g., Python version) can cause major issues for reproducing your attacks. To remedy this, we distribute a Linux-based VM with all relevant dependencies pre-installed. For all course projects, we require that your code is developed and tested within the CS 4440 VM. Working outside the VM (e.g., on your own system) can—and likely will—lead to broken or incorrect code that will cause your team to lose points!

Before continuing, be sure to complete the Course VM Setup instructions located on the Wiki. If you encounter any difficulties (and have found no success with the troubleshooting suggestions), visit office hours ASAP to get help from the course staff. It is your responsibility to get your VM working well ahead of the project deadline!

Your submitted solutions must work within the CS 4440 VM, as we will use this same environment for grading. We recommend getting your VM set up as soon as possible, ideally within the first week of the course!

Testing your Solutions

For each project component, we provide several example tests to help you assess the correctness of your code (see the What to Submit sections per task). We will also evaluate your code via several other tests not provided to you, so be sure to consider potential edge-case inputs to your code—and how to handle them accordingly!

Part 1: Hash Collision Attacks

MD5 and SHA-1 were once the most widely used cryptographic hash functions, but today they are considered dangerously insecure. This is because cryptographers have discovered efficient algorithms for finding collisions—pairs of messages with the same output values for these functions. In this exercise, you'll perform collision attacks against the vulnerable MD5 hash function.

Prelude: Replicating a Real-world Hash Collision

This component is intended as practice. You don’t need to submit anything.

The first known MD5 collisions were announced on August 17, 2004, by Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu. Here’s one pair of colliding messages they published:

Message 1 (save as 1.hex):

d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70

Message 2 (save as 2.hex):

d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70

While both strings appear to be identical, they are in fact different! For instance, notice this mismatch in the 20th byte: 2fcab587 and 2fcab507. Can you spot any others?

Let's use these strings to demonstrate a hash collision attack!

In your CS 4440 VM, save the above messages as 1.hex and 2.hex, respectively.

Then convert them to binary files with the following commands: $ xxd -r -p 1.hex > 1.bin and $ xxd -r -p 2.hex > 2.bin.

Now, compute the MD5 hashes of both binary files: $ openssl dgst -md5 1.bin 2.bin.
Are they the same? If so, you've successfully replicated this collision on MD5!

Finally, compute the SHA-256 hashes of both files: $ openssl dgst -sha256 1.bin 2.bin.
Are they different? If so, you've seen that this MD5 collision does not carry over to SHA-256!
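The same comparison can be sketched with Python's built-in hashlib module instead of xxd and openssl. This is only an illustrative sketch: the variable names are our own, and the hex strings are the two messages listed above.

```python
import hashlib

# The two colliding messages from above (bytes.fromhex ignores the spaces).
MSG1_HEX = (
    "d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89 "
    "55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b "
    "d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0 "
    "e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70"
)
MSG2_HEX = (
    "d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89 "
    "55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b "
    "d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0 "
    "e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70"
)

m1 = bytes.fromhex(MSG1_HEX)
m2 = bytes.fromhex(MSG2_HEX)

# 1-indexed positions of the bytes that differ between the two messages.
diff_positions = [i + 1 for i, (a, b) in enumerate(zip(m1, m2)) if a != b]
print("differing bytes:", diff_positions)

same_md5 = hashlib.md5(m1).hexdigest() == hashlib.md5(m2).hexdigest()
same_sha256 = hashlib.sha256(m1).hexdigest() == hashlib.sha256(m2).hexdigest()
print("same MD5:", same_md5)
print("same SHA-256:", same_sha256)
```

The list of differing positions also answers the "can you spot any others?" question without squinting at the hex.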

Prelude: Generating and Manipulating Colliding Strings

This component is intended as practice. You don’t need to submit anything.

In 2004, Wang’s method took more than 5 hours to find a collision on a desktop PC. Since then, researchers have introduced vastly more efficient collision finding algorithms. You can compute your own MD5 collisions using a tool written by Marc Stevens that uses a more advanced technique.

To make things easy, we've pre-installed this tool in your CS 4440 VM. To generate two colliding binaries, run the following command in your terminal: $ fastcoll -o file1 file2. On completion, you should notice that two files (file1 and file2) have been generated.

As before, get the MD5 hashes of both: $ openssl dgst -md5 file1 file2. Verify they're the same!

Now, get their SHA-256 hashes: $ openssl dgst -sha256 file1 file2. Verify they're different!

Suffixes: Here's an interesting property about MD5: if we were to append the same suffix to both colliding strings, then these new, longer strings will also collide!

Try this out for yourself: create a file suffix containing a random word of your choice. Then, concatenate this to each colliding file: $ cat file1 suffix > suf1; cat file2 suffix > suf2. Re-compute your MD5 and SHA-256 hashes on new files suf1 and suf2—what do you observe?
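If you'd like to see the suffix property without running fastcoll, here is a minimal sketch that reuses the Wang et al. pair from the first prelude as the colliding inputs (an assumption of this sketch; your own file1 and file2 should behave the same way). The suffix word is arbitrary.

```python
import hashlib

# Colliding pair from the first prelude (bytes.fromhex ignores the spaces).
m1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89 "
    "55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b "
    "d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0 "
    "e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70"
)
m2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89 "
    "55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b "
    "d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0 "
    "e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70"
)

suffix = b"pineapple"  # any suffix works, as long as it is the same for both
suf1, suf2 = m1 + suffix, m2 + suffix

md5_match = hashlib.md5(suf1).digest() == hashlib.md5(suf2).digest()
sha256_match = hashlib.sha256(suf1).digest() == hashlib.sha256(suf2).digest()
print("MD5 still collides:", md5_match)
print("SHA-256 collides:", sha256_match)
```

This works because both messages have the same length and leave MD5 in the same internal state, so every block processed afterwards is handled identically.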

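Prefixes: You can also generate colliding blobs that contain identical prefixes. However, simply prepending a prefix to two existing colliding files will not work here: the collision was found for a particular internal MD5 state, and a prefix changes that state (and the block alignment) before the colliding blocks are processed!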

Try this out for yourself: create another file prefix containing a random word. Then, generate two colliding blobs with this prefix: $ fastcoll -p prefix -o pre1 pre2. Re-compute your MD5 and SHA-256 hashes on new files pre1 and pre2—what do you observe?

Exploiting Hash Collisions (25pts)

Recall that your operating system (e.g., Windows, iOS, macOS, Android, etc.) has many sensitive files critical to the functionality of your computer. Computer viruses often try to overwrite these files with their own malicious code. To detect such tampering, your system hashes these critical files and compares the results against known-good values. What could go wrong if the hash function is vulnerable to collisions?

In this attack, you'll create two programs with identical MD5 hashes but wildly different behaviors. Start by putting these four lines (note the empty fourth line) of Python 3 code into a file called prefix:

#!/usr/bin/env python3
# coding: latin-1
MSG = bytes(r"""

Then, put the following four lines (note the empty first line) into a file suffix:


""", "latin-1")
from hashlib import sha256
print(sha256(MSG).hexdigest())

Now, generate two files with the same MD5 hash and prefix: $ fastcoll -p prefix -o col1 col2.

Then, append the suffix to both: $ cat col1 suffix > file1.py; cat col2 suffix > file2.py. Verify that file1.py and file2.py have the same MD5 hash but print different outputs.

If your programs print identical SHA-256 hash outputs, fear not—just re-run fastcoll and the above steps (several runs may be needed to get a working result). This occasionally happens when fastcoll-generated bytes interfere with the triple-quoted Python string initialized in the prefix above.

Do not open these files in a code or text editor—work with them purely using terminal commands. Because the blobs are random bytes in the middle of a source code file, an IDE may attempt to "fix" them automatically. However, this may cause your MD5 hashes to diverge.

Your task: Extend this technique to produce another pair of programs, good.py and evil.py, that also share the same MD5 hash but produce the following unique outputs:

good.py should execute a benign payload: print("I come in peace.").
evil.py should execute a pretend malicious payload: print("Prepare to be destroyed!").

Note that we may rename these programs before grading them. It is acceptable if your programs also print their SHA-256 hashes in addition to their required output.

What to Submit:

Two Python 3 scripts named good.py and evil.py that have the same MD5 hash, have different SHA-256 hashes, and print the specified messages.

Part 2: Length Extension Attacks

In most applications, you should use Message Authentication Codes (MACs) such as HMAC-SHA256 instead of plain cryptographic hash functions (e.g. MD5, SHA-1, or SHA-256). This is because hashes, also known as digests, fail to match our intuitive security expectations. What we really want is something that behaves like a pseudorandom function, which HMACs seem to approximate and hash functions do not. One difference between hash functions and pseudorandom functions is that many hashes are subject to length extension.

All the hash functions we’ve discussed use a design called the Merkle-Damgård construction. Each is built around a compression function f and maintains an internal state s, which is initialized to a fixed constant. Messages are processed in fixed-sized blocks by applying the compression function to the current state and current block to compute an updated internal state, i.e., s_{i+1} = f(s_i, b_i). The result of the final application of the compression function becomes the output of the hash function.

A consequence of this design is that if we know the hash of an n-block message, we can find the hash of longer messages by applying the compression function for each block b_{n+1}, b_{n+2}, ... that we want to add. This process is called length extension, and in this exercise, you'll leverage it to exploit Merkle-Damgård-constructed hash functions!
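To make the construction concrete, here is a toy Merkle-Damgård hash and a length extension against it. Everything in this sketch is made up for illustration: the 16-byte block size, the all-zero initial state, and the compression function f (a truncated SHA-256 call) are not how MD5 or any real hash is defined.

```python
import hashlib
import struct

BLOCK = 16            # toy block size in bytes (MD5 uses 64)
IV = b"\x00" * 8      # toy fixed initial state

def f(state: bytes, block: bytes) -> bytes:
    """Toy compression function (illustration only, not a real design)."""
    return hashlib.sha256(state + block).digest()[:8]

def pad(msg_len: int) -> bytes:
    """A 1 bit (0x80), zero bytes, then the 8-byte bit length of the message."""
    zeros = (BLOCK - (msg_len + 1 + 8) % BLOCK) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", msg_len * 8)

def toy_hash(msg: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    """Apply s_{i+1} = f(s_i, b_i) to each block of the padded message.

    `state` and `prior_len` let a caller resume from an earlier digest.
    """
    data = msg + pad(prior_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = f(state, data[i:i + BLOCK])
    return state

m = b"a message the attacker never sees"
digest = toy_hash(m)          # the attacker learns only this digest...
glue = pad(len(m))            # ...and the length of m
suffix = b"extra blocks"

# Resume hashing from the known digest instead of the fixed initial state.
extended = toy_hash(suffix, state=digest, prior_len=len(m) + len(glue))
print(extended == toy_hash(m + glue + suffix))
```

The final comparison holds because the digest of m is exactly the internal state after the blocks of m plus its padding, so hashing can pick up from there.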

Prelude: Length Extension in Merkle-Damgård Hashes

This component is intended as practice. You don’t need to submit anything.

To experiment with this idea, we’ll use a Python implementation of the MD5 hash function, though SHA-1 and SHA-256 are vulnerable to length extension too. You can download the PyMD5 module at cs4440.eng.utah.edu/files/project1/pymd5.py and learn how to use it by viewing its Wiki page. To follow along with these examples, run Python in interactive mode ($ python3 -i) and run the command from pymd5 import *.

Consider the string "Use HMAC, not hashes". We can compute its MD5 hash by running:

>>> m = "Use HMAC, not hashes"
>>> h1 = md5()
>>> h1.update(m)
>>> print(h1.hexdigest())

Or more compactly: print(md5(m).hexdigest()).
The output should be: 3ecc68efa1871751ea9b0b1a5b25004d.

MD5 processes messages in 512-bit blocks, so, internally, the hash function pads m to a multiple of that length. This padding consists of the bit 1, followed by as many 0 bits as necessary, followed by a 64-bit count of the number of bits in the unpadded message. (If the 1 and count won’t fit in the current block, then an additional block will be added.)

You can use the function padding(count) in the PyMD5 module to compute the padding that will be added to a count-bit message.
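For intuition, the padding rule described above can be sketched in a few lines. The function md5_padding_sketch below is a hypothetical stand-in written for this explanation; it is not part of PyMD5, and we are assuming that PyMD5's padding(count) follows the same standard MD5 rule (which stores the bit count in little-endian order).

```python
import struct

def md5_padding_sketch(bit_count: int) -> bytes:
    """Padding for a message of `bit_count` bits (whole bytes only)."""
    byte_count = bit_count // 8
    # 0x80 is the single 1 bit followed by seven 0 bits.
    # Add zero bytes until 8 bytes remain in the 64-byte block.
    zeros = (56 - (byte_count + 1) % 64) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", bit_count)

m = "Use HMAC, not hashes"          # 20 characters, i.e., 160 bits
pad = md5_padding_sketch(len(m) * 8)
padded_bits = (len(m) + len(pad)) * 8
print(len(pad), padded_bits)        # 44 512
```

A 56-byte message shows the "additional block" case: the 1 bit and the count no longer fit, so the padding grows to 72 bytes and the padded message spans two blocks.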

Even if we don't know message m, we could compute hashes of longer messages of the general form m + padding(len(m)*8) + suffix by setting our MD5 function's internal state to md5(m), instead of the default initialization value, and setting the function’s message length counter to the size of m plus the padding (a multiple of the block size). To find this padded message length, find m's length (this can be guessed by an attacker!) and run padded_m_len = (len(m) + len(padding(len(m) * 8)))*8.

Hint: The above expressions m + padding(len(m)*8) + suffix and (len(m) + len(padding(len(m) * 8)))*8 both use the + symbol, yet they result in a string and an integer, respectively. This is because Python's + operator depends on its operand types: it concatenates strings (or bytes) and adds integers. You may find the type() function helpful (see the Wiki's Python Cheat Sheet).

The pymd5 module lets you specify these parameters as additional arguments to the MD5 object:

>>> h2 = md5(
        state = "3ecc68efa1871751ea9b0b1a5b25004d", 
        count = padded_m_len
    )

Now, you can use length extension to find the hash of a longer string containing suffix "Good advice":

>>> x = "Good advice"
>>> h2.update(x)
>>> print(h2.hexdigest())

The above will execute the compression function over x and output the resulting hash. Verify that it equals the MD5 hash of m + padding(len(m)*8) + x:

>>> h3 = md5(m + padding(len(m)*8) + x)
>>> print(h3.hexdigest())

Notice that, due to the length-extension property of MD5, we didn’t need to know the value of m to compute the hash of the longer string—all we needed to know was m’s length and its MD5 hash!

Exploiting Length Extension (25pts)

Length extension attacks can cause serious vulnerabilities when people mistakenly try to construct something like an HMAC by using hash(secret || message) (note that || here just represents string concatenation).

The Central Bank of CS 4440, which is not up-to-date on its security practices, hosts an API that allows its client-side applications to perform actions on behalf of a user by loading URLs of the form:

https://cs4440.eng.utah.edu/project1/api?token=token&user=username&command1=command1&command2=command2&...

Bank administrators authorize actions in advance by computing a valid token using a secret 8-byte password. Upon receiving a URL request, the server checks that token is equal to:

md5(password || user=... [rest of the URL from "user=" and ending with the last command])

Assume this password is transmitted secretly—it's not contained in the URL, yet the server knows it internally and will prepend it to the user's command string when computing the MD5 digest of the message (as shown above).
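As a rough sketch of what the server's check amounts to, the snippet below computes a token with the flawed md5(password || message) construction and, for contrast, with HMAC-SHA256 from Python's standard hmac module. The password hunter22 and both function names are hypothetical; the real server's code and password are not shown in this handout.

```python
import hashlib
import hmac

def naive_token(password: bytes, message: bytes) -> str:
    """The flawed construction: md5(password || message)."""
    return hashlib.md5(password + message).hexdigest()

def hmac_token(password: bytes, message: bytes) -> str:
    """A sounder alternative: HMAC-SHA256 keyed with the password."""
    return hmac.new(password, message, hashlib.sha256).hexdigest()

password = b"hunter22"   # hypothetical 8-byte password
message = b"user=admin&command1=ListFiles&command2=NoOp"

print(naive_token(password, message))
print(hmac_token(password, message))
```

Only the first construction exposes the hash's internal state as the token, which is what makes length extension possible.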

Your task: Using the techniques that you learned in the previous section and without guessing the password, apply length extension to create a URL ending with &command#=UnlockAllSafes (where # is replaced with a number that indicates it is the new last command to be executed) that is treated as valid by the server. You have permission to use our server to check whether your command is accepted. A successful attack will receive message "all safes are open" from the server. To help get you started, we provide the following template:

#!/usr/bin/python3
import http.client as httplib
from urllib.parse import urlparse, quote
import sys, re
from pymd5 import *
url = sys.argv[1]

#--------------------------------------------
# TODO: Your code to modify `url` goes here!
#--------------------------------------------

parsedUrl = urlparse(url)
conn = httplib.HTTPConnection(parsedUrl.hostname,parsedUrl.port)
conn.request("GET", parsedUrl.path + "?" + parsedUrl.query)
print(conn.getresponse().read())

Hint: You might want to use the quote() function from Python’s urllib.parse module to put raw bytes into the URL. If you’re still puzzled about raw bytes, it may be useful to make a diagram of the Merkle-Damgård construction during length extension.
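As a small illustration of the hint, here is how quote() handles raw bytes. The byte values are arbitrary examples, not the padding you will need.

```python
from urllib.parse import quote, unquote_to_bytes

raw = b"\x80\x00\x00\xa0"        # arbitrary non-printable bytes
encoded = quote(raw)
print(encoded)                   # %80%00%00%A0

# Percent-decoding gives back the original bytes.
print(unquote_to_bytes(encoded) == raw)

# By default, quote() also escapes reserved characters such as & and =.
print(quote("&command3=X"))      # %26command3%3DX
```

Because of that last behavior, you will probably want to apply quote() only to the raw bytes and not to the parts of the URL that must keep their literal & and = separators.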

You may use the following URL to test your code:

https://cs4440.eng.utah.edu/project1/api?token=402a574d265dc212ee64970f159575d0&user=admin&command1=ListFiles&command2=NoOp

You should make the following assumptions:

URLs will have the same form as the sample above (one token, one user, and any number of command strings). These values may be of substantially different lengths than in the provided sample.
The input URL may be for a user with a different password, but password lengths will always be 8.
The server’s output might not exactly match what you see during testing.

What to Submit:

A Python 3 program named extend.py that:

Accepts a double-quoted URL in the same form as the one above as a command line argument (e.g., python3 extend.py "https://cs4440.eng.utah.edu/project1/api?token=...").
Modifies the URL so that it will execute the UnlockAllSafes command as the user.
Successfully performs the command on the server and prints the server’s response.

Part 3: Cipher Cryptanalysis

Before public-key cryptography, ciphers were the primary mechanism for secretly encoding messages. In lecture, you learned about two types of ciphers: transposition and substitution. In this exercise, you'll demonstrate just how easy it is to break one of history's best-known substitution ciphers: the Vigenère.

Prelude: Understanding Vigenère Ciphers

This component is intended as practice. You don’t need to submit anything.

The Vigenère cipher, named after Blaise de Vigenère, is a substitution cipher that gained significant popularity in the late 16th century. Recall from lecture that a substitution cipher (e.g., Caesar) operates by shifting plaintext letters, with the shifts defined by the key's letters, to create the ciphertext.

For example, assume that encrypting with the key letter A results in a shift of zero (thus, no change); encrypting with B results in an increment by one place in the alphabet (e.g., X -> Y); and encrypting with C results in an increment by two places (e.g., X -> Z), and so on.

Recall from lecture that a Vigenère cipher performs shifts with the key as a repeating word. For example, for plaintext AAAAAAAA and key BCD, you can perform this cipher yourself with a pen and paper:

plain  = AAAAAAAA
key    = BCDBCDBC
shift  = 12312312
-----------------
cipher = BCDBCDBC
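The pen-and-paper procedure above can be sketched in Python as follows. The function name vigenere and its decrypt flag are our own choices for this illustration, and the sketch assumes uppercase A-Z input only.

```python
from itertools import cycle

A = ord("A")

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter of `text` by the matching letter of the repeating key."""
    sign = -1 if decrypt else 1
    out = []
    for ch, k in zip(text, cycle(key)):
        shift = sign * (ord(k) - A)          # A -> 0, B -> 1, C -> 2, ...
        out.append(chr((ord(ch) - A + shift) % 26 + A))
    return "".join(out)

print(vigenere("AAAAAAAA", "BCD"))                 # BCDBCDBC
print(vigenere("BCDBCDBC", "BCD", decrypt=True))   # AAAAAAAA
```

Decryption simply applies the same shifts in the opposite direction, so anyone who knows the key can recover the plaintext.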

Vigenère Cipher Cryptanalysis (25pts)

Unfortunately, it's also really easy to break Vigenère ciphers. While 19th-century cryptanalysts performed this via pen and paper, you will instead show how to do this programmatically!

Your task: Write a Python 3 program (decipher.py) that accepts a ciphertext string as its first and only argument, and prints the encryption key as a single word. We will only encrypt the plaintext with a key of length 8. Thus, your returned key must also be 8 characters long. Every plaintext message will be a sequence of fully capitalized English sentences concatenated together (no punctuation or spaces). Thus, only the correct key should produce intelligible English on decryption.

To help you get started, we provide the following template code with several useful functions as well as the dictionary of English-language letter frequencies:

#!/usr/bin/python3
import sys
ciphertext = sys.argv[1]

# Dictionary of English-language letter frequencies.
f = {"A": .08167, "B": .01492, "C": .02782, "D": .04253, "E": .12702, "F": .02228,
     "G": .02015, "H": .06094, "I": .06966, "J": .00153, "K": .00772, "L": .04025,
     "M": .02406, "N": .06749, "O": .07507, "P": .01929, "Q": .00095, "R": .05987,
     "S": .06327, "T": .09056, "U": .02758, "V": .00978, "W": .02360, "X": .00150,
     "Y": .01974, "Z": .00074}

# Returns index for a given letter.
def index(letter):
    return sorted(list(f.keys())).index(letter)

# Returns letter for a given index.
def letter(index):
    if index > 25: index = index-26
    return sorted(list(f.keys()))[index]

#------------------------------------------
# TODO: your cryptanalysis code goes here!
#------------------------------------------

key = ""
print(key)

We will evaluate your program using a variety of ciphertexts, but you may use the following example ciphertexts and their provided keys to test the correctness of your code:

Example 1 (key = LEBOWSKI):

ELFRQVOUPIUGQHGQELIWOTYEWMOUPWKULXUVADYKLPBZHWIIYHUOHCCBZXISISLWFXIWONSWWIOHAFMWFRUSNOKTEISGKTMPLOSSWUDAHMUVWFQMCEORRWXOPEOQAGXPTWNWJVYNEIOGLWKSTRHCBZSAEMNSOWBDPHJBRAOBYENHKJOTLXFHKLRMTWTIAKVWHAJHPWNBSIPRKJOLZRBZZVYVYCLSNSLIEWPGKXDMYIOHAJSVRGPBRWBALXJCJKRIWJXOULRZZYHVLAZMDMOPQLSAAVPALLVGESMRUGEICIPIPGPGZYSSHWWMYXCMSSVBPV

Example 2 (key = OBROTHER):

MPLGXLORUSVOMMSIHVESRVYKVSVSPOSRFFECPPRTVBZBLFSLKJCZYPRUOGFFMBRVHIFIZOMKKJCZGVXSSUYSHUIPCVJSXRFLHGZFLAJZFTKMHBQLGUKFTCICOMFBZHRURJWTBJYCHSFOWHVFOEWFTBKYHXZHAWIIWMDAATQPCVJVTSPJSFKVTUKJKPERXYJLZUFHXSPPCVJVTSPJSFROVVAFBUYSKVSWCGRQHAXFBIFILLLROOUCAZSDOOPGMHVKZFDSGAWZQBEBHAXVZMPCNOSNZPEUMOMJFPRRLOECZCVPNAJVOSECMALVCCJHTJPVGJEMHBVGOUYTHYJRHFYOLCSLQIJOYLHPCVIFXDEIRUYCNNLKVFICTKQRMXZBWFIRMPLFALEIHTXFHDAVOSPGMPPCGIRZEFIWCMCCPALVAFMSGBRKCZFIKZECJBKWHU

Extra Credit: Recovering Arbitrary-length Keys (15pts)

While we previously tested your code on only keys of length 8, real-world Vigenère ciphers can certainly use keys that are longer or shorter.

Your task: Extend your decipher.py to support keys of arbitrary lengths. Be sure not to break your original code or else you will lose points!

What to Submit:

A Python 3 program decipher.py that accepts a ciphertext string as its first argument, and retrieves and prints the key used to encrypt it. Example usage and output:

$ python3 decipher.py ELFRQVOUPIUGQHGQELIWOTYEWMOUPWKULXUVADYKLPBZHWIIYHUOHCCBZXISISLWFXIWONSWWIOHAFMWFRUSNOKTEISGKTMPLOSSWUDAHMUVWFQMCEORRWXOPEOQAGXPTWNWJVYNEIOGLWKSTRHCBZSAEMNSOWBDPHJBRAOBYENHKJOTLXFHKLRMTWTIAKVWHAJHPWNBSIPRKJOLZRBZZVYVYCLSNSLIEWPGKXDMYIOHAJSVRGPBRWBALXJCJKRIWJXOULRZZYHVLAZMDMOPQLSAAVPALLVGESMRUGEICIPIPGPGZYSSHWWMYXCMSSVBPV

LEBOWSKI

Part 4: RSA Signature Forgery

A secure implementation of RSA encryption or digital signatures requires a proper padding scheme. RSA without padding, also known as textbook RSA, has several undesirable properties. One property is that it is trivial for an attacker with only an RSA public key (n, e) to produce a mathematically valid (message, signature) pair by choosing an s and returning (s^e mod n, s).
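A tiny numeric sketch of this forgery is shown below. The key is a toy one chosen for this illustration (n = 61 * 53, e = 17) and is far too small for real use; textbook_verify is a hypothetical name for unpadded verification.

```python
# Toy RSA public key (illustration only).
n = 61 * 53     # 3233
e = 17

def textbook_verify(message: int, signature: int) -> bool:
    """Unpadded RSA verification: accept if signature^e mod n equals the message."""
    return pow(signature, e, n) == message

s = 1234                # the attacker picks any "signature" first...
m = pow(s, e, n)        # ...and derives the "message" that it signs

print(textbook_verify(m, s))   # True, and no private key was needed
```

The attacker cannot choose m here, which is why the forged message is usually meaningless; padding schemes exist to rule out even this kind of forgery.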

To prevent attackers from being able to forge valid signatures in this way, RSA implementations use a padding scheme to provide structure to the values that are encrypted or signed. The most commonly used padding scheme in practice is defined by the PKCS #1 v1.5 standard, which defines, among other things, the format of RSA keys and signatures and the procedures for generating and validating RSA signatures. In this exercise, you'll show how a flawed padding scheme is vulnerable to signature forgery.

Prelude: Validating RSA Signatures

This component is intended as practice. You don’t need to submit anything.

You can experiment with validating RSA signatures yourself using the OpenSSL toolkit (which we've pre-installed in your VM). Create a text file called key.pub that contains the following RSA public key:

-----BEGIN PUBLIC KEY-----
MFowDQYJKoZIhvcNAQEBBQADSQAwRgJBAMvIv9XDmDGSjBYvwCUNFL7p4Fw/0Br1
MXNkZFrPs9cVTlX8CbyWs4+PdK2kzpkT8lk51/99Xubt6risHEXa43UCAQM=
-----END PUBLIC KEY-----

Confirm that the key has a 512-bit modulus with an exponent of 3. You can view the modulus and public exponent of this key by running: $ openssl rsa -in key.pub -pubin -text -noout.

Next, create a file containing only "CS 4440 rul3z!" ($ echo -n 'CS 4440 rul3z!' > myfile). Here is a base64-encoded signature of the file, made using the private key corresponding to the public key above:

c1cT6r1wX2xhqjKd5j2of5DbMuTRXty53bKJgzl14Ta0E6EdpQbE9
nxKyM6/4b5P496XsMQTwivHnngp+Z1SIg==

Copy the base64-encoded signature to a file named sig.b64. Now, convert the file from base64 to raw bytes ($ base64 --decode -i sig.b64 > sig). Verify the signature against the file you originally created: $ openssl dgst -sha1 -verify key.pub -signature sig myfile.

We can also use basic math operations in Python to explore this signature further. Remember, RSA ciphertexts, plaintexts, exponents, moduli, and signatures are actually all integers.

Usually, you would use a cryptography library to import a public key. However, for the purposes of this part of the assignment, you can just manually assign the modulus and exponent as integers in Python based on the earlier output from OpenSSL. You may find the following command useful: $ openssl rsa -in key.pub -text -noout -pubin | egrep '^ ' | tr -d ' :\n'

Launch Python in interactive mode ($ python3 -i) and assign the modulus and the exponent to integer variables:

# n is the modulus from the key.
# You can just assign it as a hexadecimal literal--remember to start with 0x
# It will look something like:
>>> n = 0x00cbc8bfd5c3983192 ... 1c45dae375

# e is the exponent from the key
>>> e = 3

We can also load the signature into Python. Like the modulus and the exponent, we’ll convert the signature to an integer:

>>> import base64
>>> signature = open('sig.b64').read()
# decode the base64 text and convert the resulting bytes to an integer
>>> signature = int.from_bytes(base64.b64decode(signature), byteorder="big")

Now reverse the RSA signing operation by computing:

>>> pkcs = pow(signature, e, n)

You can print the resulting value as a 64-byte (512-bit) integer in hex:

>>> f'{pkcs:0128x}'

Python tip: This uses Python’s formatted string literal notation to return pkcs as a zero-padded (0), 128-character-long hex (x) integer. Pretty neat!

You should see something like: 0001fffff…35bf1ba974a916891f05.

Verify that the last 20 bytes of this value match the SHA-1 hash of your file:

>>> import hashlib
>>> m = hashlib.sha1()
>>> m.update(b"CS 4440 rul3z!")
>>> m.hexdigest()

The hash has been padded using the PKCS #1 v1.5 signature scheme to produce the pkcs variable you computed earlier. The signature scheme specifies that, for a SHA-1 hash with a k-bit RSA key, the value to be signed, and later verified against, will contain the following bytes:

00 01  FF FF FF ... FF  00  30 21 30 09 06 05 2B 0E 03 02 1A 05 00 04 14  XX XX XX XX ... XX
      |_______________|    |____________________________________________||__________________|
        k/8 - 38 bytes      ASN.1 "magic" bytes denoting hash algorithm  20-byte SHA-1 digest

The number of FF bytes varies such that the size of the result is equal to the size of the RSA key. For the 512-bit practice key above (k = 512), that is 512/8 - 38 = 26 FF bytes; for a 2048-bit key like the one used later in this project (k = 2048), we can expect 2048/8 - 38 = 218 FF bytes. Confirm that the value of pkcs you computed above matches this format.

Remember that pkcs is a result of a signature padding scheme applied to the message "CS 4440 rul3z!", following the pattern of bytes described directly above. It is used as an intermediate value before computing the signature and is not a signature in itself.
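The byte layout above can be assembled directly in Python. This is a sketch for checking your understanding of the format: the function name pkcs1_v15_sha1 is our own, and the ASN.1 bytes are copied from the diagram above.

```python
import hashlib

# ASN.1 "magic" bytes for SHA-1, as listed in the diagram above.
ASN1_SHA1 = bytes.fromhex("3021300906052b0e03021a05000414")

def pkcs1_v15_sha1(message: bytes, key_bits: int) -> bytes:
    """Build the padded value 00 01 FF..FF 00 <ASN.1> <SHA-1 digest>."""
    digest = hashlib.sha1(message).digest()
    ff_count = key_bits // 8 - 38
    return b"\x00\x01" + b"\xff" * ff_count + b"\x00" + ASN1_SHA1 + digest

padded = pkcs1_v15_sha1(b"CS 4440 rul3z!", 512)
print(len(padded))      # 64 bytes for a 512-bit key
print(padded.hex())
```

If your signature verified earlier, the hex string printed here should be the same as the f'{pkcs:0128x}' output from the steps above.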

It is crucial for implementations to verify that every bit is exactly as it should be, but sometimes developers can be lazy...

Prelude: Bleichenbacher's Attack

This component is intended as practice. You don’t need to submit anything.

It’s tempting for a programmer to validate the signature padding as follows: (1) confirm that the total length equals the key size; (2) strip off the bytes 00 01, followed by any number of FF bytes, then 00; (3) parse the ASN.1 bytes; (4) verify that the next 20 bytes are the correct SHA-1 digest.

This procedure does not check the length of the FF bytes, nor does it verify that the hash is in the least significant (rightmost) bytes of the string. As a result, it will accept malformed values that have "garbage" bytes following the digest, like this example, which has only one FF:

00 01 FF 00  30 21 30 09 06 05 2B 0E 03 02 1A 05 00 04 14  XX XX XX XX ... XX  YY YY YY ... YY
            |____________________________________________||__________________||_______________|
             ASN.1 "magic" bytes denoting hash algorithm  20-byte SHA-1 digest  k/8 - 39 bytes

Convince yourself that this value would be accepted by the incorrect implementation described above, and that the bytes at the end labeled YY would be ignored. When an implementation uses this lenient, incorrect parsing, an attacker can easily create forged signatures that it will accept!
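To help convince yourself, here is a sketch of the lenient validation steps (1) through (4) described above. The function lenient_check is hypothetical and is not the bank's actual code; in particular, requiring at least one FF byte is an assumption of this sketch.

```python
import hashlib

ASN1_SHA1 = bytes.fromhex("3021300906052b0e03021a05000414")

def lenient_check(value: bytes, message: bytes, key_bytes: int) -> bool:
    """Flawed padding validation; do not use this in practice."""
    if len(value) != key_bytes:                    # (1) total length
        return False
    if value[:2] != b"\x00\x01":                   # (2) leading 00 01 ...
        return False
    i = 2
    while i < len(value) and value[i] == 0xFF:     #     ... some FF bytes ...
        i += 1
    if i == 2 or i >= len(value) or value[i] != 0x00:   # ... then 00
        return False
    i += 1
    if value[i:i + len(ASN1_SHA1)] != ASN1_SHA1:   # (3) ASN.1 bytes
        return False
    i += len(ASN1_SHA1)
    digest = hashlib.sha1(message).digest()
    return value[i:i + 20] == digest               # (4) digest; the rest is ignored

msg = b"CS 4440 rul3z!"
digest = hashlib.sha1(msg).digest()
k = 256   # number of bytes in a 2048-bit key

proper = b"\x00\x01" + b"\xff" * (k - 38) + b"\x00" + ASN1_SHA1 + digest
malformed = b"\x00\x01\xff\x00" + ASN1_SHA1 + digest + b"\x5a" * (k - 39)

print(lenient_check(proper, msg, k), lenient_check(malformed, msg, k))
```

Both values pass, even though the second one ends in 217 bytes that the check never looks at. A strict validator would instead rebuild the expected padded value and compare every byte.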

This is particularly troubling when RSA is used with a small exponent: e = 3. Consider the case with RSA encryption: if we encrypt an unpadded message m that's much shorter than k bits, then m³ < n. Thus, the "encrypted" message does not "wrap around" the modulus n. In this case, RSA doesn’t provide good security, since an attacker can just take the normal cube root of the ciphertext to find the plaintext: m = c^(1/3). It’s easy to reverse normal exponentiation, as opposed to modular exponentiation!
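Here is a minimal sketch of that cube-root recovery using exact integer arithmetic (floating-point roots lose precision at these sizes). The helper icbrt and the stand-in modulus n are made up for this illustration; n is simply some 2048-bit number, since its factorization plays no role when m³ < n.

```python
def icbrt(x: int) -> int:
    """Largest integer r with r**3 <= x, found by binary search."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

n = (1 << 2047) + 1      # stand-in 2048-bit modulus (illustration only)
e = 3

plaintext = b"attack at dawn"
m = int.from_bytes(plaintext, "big")

c = pow(m, e, n)         # "encryption" without padding
print(c == m ** 3)       # True: the result never wrapped around n

recovered = icbrt(c)
print(recovered.to_bytes(len(plaintext), "big"))
```

The attacker needed only the ciphertext and the knowledge that e = 3; no private key was involved.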

Now recall that RSA signature validation is analogous to RSA encryption. If the signature uses e = 3, the validator calculates s^e mod n = s³ mod n and checks that the result is the correct PKCS-padded digest of the signed message.

Here comes the attack: for a 2048-bit key, a correctly padded value for an RSA signature using a SHA-1 hash should have k/8 - 38 = 2048/8 - 38 = 218 bytes of FFs. But what if there were only one FF as in the example shown above? This would leave space for 217 arbitrary bytes at the end of the value. The weak implementation described above would ignore these bytes!

To forge a signature that would pass this bad implementation, an attacker must find a number x such that x³ < n, and where x³ matches the format of the malformed example shown above. To do this, construct an integer whose most significant bytes have the correct format—including the digest of the target message—and set the last 217 bytes to 00. Then, take the cube root and round as necessary.

Constructing Forged Signatures (25pts)

The Central Bank of CS 4440 has a website at cs4440.eng.utah.edu/project1 for performing wire transfers between bank accounts. To authenticate each transfer request, the control panel requires a signature from a particular 2048-bit RSA key that is listed on the website’s home page. Unfortunately, this control panel is running old, unpatched software that is vulnerable to signature forgery—namely, from using (1) a small RSA exponent (i.e., e=3) as well as (2) the incomplete padding validation described above (i.e., only checking for one FF).

Your task: Using the signature forgery technique described above, write a Python 3 program that produces RSA signatures that the Central Bank of CS 4440 site accepts as valid.

You have our permission to use cs4440.eng.utah.edu/project1 to test your signatures, but when we grade your program it will not have access to the network.

We have provided a Python module with several useful functions you may wish to use in your solution: cs4440.eng.utah.edu/files/project1/pyroots.py. Learn how to use PyRoots by viewing its Wiki page. To use it, you will have to include from pyroots import *. You can start with the following template:

#!/usr/bin/python3
from pyroots import *
import hashlib
import sys
message = sys.argv[1]

#----------------------------------------------
# TODO: Your signature forgery code goes here!
#----------------------------------------------

forged_sig = ""
print(integer_to_base64(forged_sig))

Hint: You can just construct your initial message as a string (e.g., m = "0001FF...") and then convert it to an integer (e.g., int(m,16)). It's much easier this way than representing it as bytes!

What to Submit:

A Python 3 program called bleichenbacher.py that:

Accepts a double-quoted string (e.g., "cs4440+jdoe+1.23") as a command-line argument.
Prints a base64-encoded forged signature of the input string (e.g., b'MsOGI/y2cA/CW...').

Test your signatures via the CS 4440 Bank Website (be sure to set up the same transaction parameters). Note that the RSA Signature box expects only the signature itself (i.e., don't paste the starting/ending b' and ').

Submission Instructions

Upload to Canvas a tarball (.tar.gz) named project1.uid1.uid2.tar.gz, replacing your team's UIDs accordingly (if working alone, provide only your UID once). Each UID must be in u####### format. Your tarball must contain only the files listed below. These will be autograded, so make sure that your solutions conform to the expected filenames, formatting, and behaviors.

Failure to follow assignment instructions (e.g., submitting a corrupted tarball; wrong, missing, or broken code; improper formatting; etc.) will be ineligible for regrades. External dependencies are prohibited. You may use only default Python 3 libraries and/or modules we provide you. Your solutions must work as-is in the CS 4440 VM. Make sure to thoroughly test your code before submitting!

Generate the tarball in your VM terminal using this command (be sure to first cd to the directory that contains your files):

tar -zcf project1.uid1.uid2.tar.gz good.py evil.py extend.py decipher.py bleichenbacher.py

To aid in formatting, we provide the following reference template for you to fill in:

https://cs4440.eng.utah.edu/files/project1/project1.uid1.uid2.tar.gz
