Hash Function Sha-1 Is Broken

One iteration within the SHA-1 compression function — A team of cryptanalysts and computer scientists in February hacked an important cryptographic tool, the hash function SHA-1.

‘SHA-1 is dead!’

The team of cryptanalysts and computer scientists that, in February, hacked an important cryptographic tool, the hash function SHA-1 (Secure Hash Algorithm 1), didn’t pull its punches in a comment on what they have achieved. The comment can be found in a pair of PDF documents that should not exist, because both PDFs have the same hash. It is a trick that enables forged authentication certificates for websites and other hacks.

Two files having the same hash is called a collision, and it is like two people having the exact same fingerprints (which has never been observed): it opens the door to all sorts of hacks and fraud. The team that ‘killed’ SHA-1, led by Marc Stevens of Centrum Wiskunde & Informatica (CWI) in the Netherlands, and Elie Bursztein, who leads Google’s anti-abuse research team, used a combination of smart algorithms and massive computing power to create this collision.

The colliding files have a specific format, P| M1| M2| S and P| M1’| M2’| S, in which the prefix P and the suffix S can be freely chosen. The actual collision is caused by M1/M1′ and M2/M2′, which are unique. Both are blocks of 256 bits that differ in 124 places. First, M1 and M1′ had to be calculated, which took about 6,500 years of computing time on hundreds of thousands of central processing units (CPUs) running in parallel.

Given M1/M1′, it became possible to calculate M2/M2′, which could be executed on graphics processing units (GPUs) that were more efficient for this task, squeezing 110 years of computing time into a mere eight days. In a joint Feb. 23 press release by CWI and Google, Bursztein explains, “We used the same infrastructure that powers many Google AI projects, including Alpha Go (which beat the world Go champion in 2016) and Google Photo, as well as Google Cloud.”

The collision found by the CWI/Google team can be built into a pair of pdf files
to show completely different content, while having the same hash.
Credit: Hector Martin

Digital fingerprint

A hash function takes a file of arbitrary length and number-crunches it into a short string of bits, the hash (160 bits in the case of SHA-1). It can be used as a digital fingerprint, safeguarding the integrity of a file. A good digital fingerprint is specific to that file, but without giving away any information about the content. A hash can be published when a private file is created, and later serve as a check that nothing in the file has been changed.

In theory, every hash function has countless collisions, because files can arbitrarily contain many characters, while the hash has a short, fixed length. So the number of hashes is limited, and the number of files is not. But the security of a hash function is in the difficulty of actually finding a collision.

A brute force attack on SHA-1—hashing random files and checking if two have the same hash—would require calculating about 2⁸⁰ (~10²⁴) hashes, a practical impossibility. Calculating M1/M1′ and M2/M2′ was equivalent to calculating 2⁶³ hashes, about 100,000 times less.

The attack is not a bolt from the blue; in 2015, Stevens warned that finding a collision might cost no more than $100,000 of rented computer time. Already in 2011, the National Institute for Standards and Technology (NIST) deprecated SHA-1 because of more theoretical safety concerns.

Much safer hash functions are available; for instance, SHA-2. Yet, according to Stevens, “Many applications still use SHA-1. Our result proves that the deprecation by a large part of the industry has been too slow and that migration to safer standards should happen as soon as possible.”

Tailor-made PDFs

In general, a hash collision is like a needle in a haystack, but because of the special structure of the file P|M1|M2|S, this needle becomes a tool for tailor-made pairs of documents.

Explains Stevens, “To turn these into meaningful files, with different visible content, we make use of the PDF and JPEG formats.” By putting the relatively small difference (248 bits) between the M-blocks in the space reserved for comment in a JPEG-file, they can make two PDFs who show almost arbitrary and different content, while their hash is the same.

He adds, “This is now also possible with our line tool. However, its possibilities are much more limited than the tool we’ll make public later.”

In accordance with good practice, cryptanalysts give users 90 days after publication of an attack to perform the necessary fixes. Within that time frame, certain technical details are withheld to keep hackers in the dark. Anyone who suspects he has a file forged with this collision can already do an on line check on the group’s website SHAttered.

Forging certificates

In particular need of an update are authentication certificates that still rely on SHA-1. Every secure (https) website comes with a certificate issued by a certification authority. With a clever combination of public key cryptography and hashing, browsers can check that the certificate is genuine and has not been tampered with in any way.

The exception, however, is if a hacker provides your browser with a forged certificate having the same hash; this makes it possible to direct you to a fake website where, thinking you are logging in to your bank account, you are giving away your user name and password.

Google’s browser Chrome had already disabled the use of SHA-1 certificates, warning users that the ‘secure’ connection to such a website may be hacked. On Feb. 24, one day after the CWI/Google team published their collision, Mozilla also disabled SHA-1 for its Firefox browser.

Website certificates stay valid for one or two years, so quite a few based on SHA-1 are still out there. In fact, at least one certification authority, Cloudflare, is still advertising and selling certificates based on SHA-1. Cloudflare could not be reached for comment on why it continues to do so, six years after NIST officially advised the industry to stop using SHA-1.

Arnout Jaspers is a freelance science writer based in Leiden, the Netherlands.