Industrial Security Part 3: About Hashes and Signatures
If you have ever programmed serial transmission, you have probably already used a hash without knowing it. When sending data over a serial line, you want to be sure that the data you receive is the data that was sent. EMI (electromagnetic interference) and other disturbances might change the bits on their way. Even the loss of a single bit could change the message dramatically and cause fatal consequences. So to secure the message against environmental disturbances, you calculate a kind of checksum of the message. Counting the 1-bits of each byte and sending their sum modulo 2 as an extra bit gives you what is called the “parity bit”. It is a speedy method to check the integrity of a single byte transferred over a serial line, and checking the parity can be done in hardware without CPU calculations. But it is not very secure; it only tells you whether the byte has an even or an odd count of 1-bits. If two bits have flipped, you get a correct parity bit although the byte is wrong.
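Here is a minimal Python sketch of even parity, including the two-bit-flip blind spot described above (the byte values are made up for illustration):

```python
# Even-parity checking, as used on serial lines: the parity bit
# makes the total count of 1-bits even.

def parity_bit(byte: int) -> int:
    """Return the even-parity bit for one byte (0..255)."""
    return bin(byte).count("1") % 2

data = 0b1011_0010                      # four 1-bits -> parity 0
assert parity_bit(data) == 0

# A single flipped bit is detected:
single_flip = data ^ 0b0000_0001
assert parity_bit(single_flip) != parity_bit(data)

# ...but two flipped bits slip through unnoticed:
double_flip = data ^ 0b0000_0011
assert parity_bit(double_flip) == parity_bit(data)
```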
So talented engineers invented the CRC (“cyclic redundancy check”), which delivers a check value to verify the bytes over which it has been calculated. If you would like to play around with CRC, you can follow this link to an online CRC calculator. On that webpage, you will also find a link to the author’s explanation of how to implement the CRC calculation. For all of you who don’t like maths, just treat the CRC algorithms as a black box.
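You don't even need the online calculator to try it: Python's standard library ships a CRC-32 implementation, so you can treat it as exactly the kind of black box meant here (the message bytes are made up):

```python
import zlib

# CRC-32 from Python's standard library: the black box digests any
# byte sequence into a single 32-bit check value.
message = b"Hello, serial line!"
crc = zlib.crc32(message)
print(f"CRC-32: {crc:#010x}")

# The receiver recomputes the CRC over the received bytes and compares:
assert zlib.crc32(b"Hello, serial line!") == crc

# Even a tiny change yields a different check value:
tampered = b"Hello, serial lane!"
assert zlib.crc32(tampered) != crc
```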
This black box is an example of a hash calculator. You can think of it as a “digest algorithm”: it digests the message, and at the end, you get a much smaller value (in the case of CRC just a single number) which represents the original message. One of the most commonly used hash algorithms is called “SHA” (Secure Hash Algorithm).
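SHA is also in Python's standard library, so the digest idea can be tried in two lines (the input strings are made up):

```python
import hashlib

# SHA-256 digests a message of any length into a fixed 32-byte value.
digest = hashlib.sha256(b"a message of any length").hexdigest()
print(digest)                  # 64 hex characters = 256 bits
assert len(digest) == 64

# The tiniest change produces a completely different digest
# (the "avalanche effect"):
other = hashlib.sha256(b"a message of any length.").hexdigest()
assert digest != other
```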
I don’t know if cowpats are unique, but a perfect hash will avoid producing the same output for different inputs (that’s called a “collision”). Strictly speaking, this is impossible whenever the number of different inputs is bigger than the number of possible outputs (the pigeonhole principle). But in reality, you never see all possible inputs. Take a huge book: it will not contain every possible 1000-letter sequence. The number of 1000-letter sequences actually occurring will be far below the theoretical maximum of N raised to the power of 1000 (where N is the number of allowed letters). And therefore it is possible to reduce any 1000-letter sequence of a book to a small number of digits which uniquely represents the sequence. Such a hash algorithm is always a one-way street: a specific message will always result in the same hash, but there is no way to calculate the message from the hash.
Let me give you a very visual example of this principle: there are no two humans on earth who have the same fingerprint. Even monozygotic twins do not have identical fingerprints. So the fingerprint is a kind of hash of an individual: you can check a person's identity by comparing fingerprints. On the other hand, you cannot take a fingerprint and calculate the face of the person it belongs to. The fingerprint does not contain the complete data of the individual.
If you want to check that a received message is identical to the sent message, just ask the sender to send the hash together with the message. When you receive the message with its hash, you can calculate the hash from the message yourself. If your calculated hash is identical to the hash that came with the message, you can be very sure that the message sent is the same as the message you have received and checked. Of course, this does not protect you from someone replacing both the original message and the original hash with a forged pair (a fake message and the matching hash for it). I’ll show you a way to achieve this protection in part 4 of this series, where we will have a look at digital signatures and certificates. Please do not hesitate to ask questions in the comment section. This security stuff is tough, and I know that we all prefer to get things running instead of torturing our brains with cryptography.
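The whole send-and-verify procedure fits in a few lines; this is a sketch with made-up function names and messages, not a real protocol:

```python
import hashlib

def send(message: bytes):
    """Sender: transmit the message together with its hash."""
    return message, hashlib.sha256(message).digest()

def check(message: bytes, received_hash: bytes) -> bool:
    """Receiver: recompute the hash and compare."""
    return hashlib.sha256(message).digest() == received_hash

msg, h = send(b"meet me at noon")
assert check(msg, h)                          # intact transmission
assert not check(b"meet me at midnight", h)   # corruption detected

# Note: an attacker who replaces BOTH the message and the hash goes
# undetected -- closing that gap is what digital signatures are for.
```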
Let’s get back to asymmetric cryptography. Remember the love story from part 2? By using asymmetric keys, Helen’s love letter was very close to the concept of a digital signature. Only Helen would be able to encrypt her message with her private key, and Pete would know that Helen sent him the mail because he can decrypt it with the public key he received from Helen. But wait a moment: what if Susan manipulated the encrypted message and sent the manipulated version to Pete? Pete would still be able to decrypt that letter. He would likely get something strange, not even proper words. But he could not know whether Helen had learned a new language or someone had manipulated the letter.
A digital signature has to prove the authenticity of a message
So something is missing in this scenario. This time it’s not another fancy black box we need but a straightforward trick: Helen sends her original message together with the encrypted version. When Pete receives the letter, he decrypts the encrypted part with Helen's public key and compares the result to the original message attached to it. If both match, Pete can be very sure that no one manipulated this letter on its way and that only Helen can have written it. He can prove the “integrity” as well as the “authenticity” of the message.
A digital signature process always includes these steps:
- The sender generates an asymmetric key pair
- The sender produces the signature using his message and his private key
- The sender “signs” his message by adding the digital signature and sends it to the receiver
- The receiver decrypts the signature with the sender’s public key
- The receiver verifies that the signature matches the message
The DSA (Digital Signature Algorithm) scheme defines this process and is a worldwide standard.
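The five steps above can be made concrete with a toy sketch. Note the hedges: this uses RSA-style modular exponentiation (not the actual DSA algorithm, which works differently), and the textbook-sized primes p = 61 and q = 53 are hopelessly insecure; they only keep the numbers readable.

```python
# Step 1: the sender generates an asymmetric key pair.
p, q = 61, 53                  # toy primes -- never use sizes like this
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+)

# Step 2: the sender produces the signature with the private key.
message = 65                   # a (tiny) numeric message
signature = pow(message, d, n)

# Step 3: the sender sends (message, signature) to the receiver.

# Steps 4 and 5: the receiver "decrypts" the signature with the
# sender's PUBLIC key and compares the result with the message.
recovered = pow(signature, e, n)
assert recovered == message            # signature is valid

# A manipulated signature fails the check:
assert pow(signature + 1, e, n) != message
```

Only the holder of `d` can produce a `signature` that survives step 5; anyone holding the public pair `(e, n)` can check it.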
Using the hash instead of the message
Remember the hash from the beginning of this article? In part 2, I also mentioned that asymmetric encryption is very time- and energy-consuming, as it needs enormous computing power when the message gets long. So in most signature procedures, the sender does not produce the signature from the complete message but from the much shorter hash of the message. The receiver then decrypts the signature with the sender’s public key. He also calculates the hash of the received message (using the identical hash algorithm) and verifies the authenticity by comparing the two hashes. As an ideal hash is like a fingerprint, it can only have been derived from the original message.
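A sketch of this "sign the hash, not the message" idea, again with insecure textbook RSA numbers (p = 61, q = 53) purely for readability; with such a tiny modulus the hash must be reduced mod n, which a real scheme would never do:

```python
import hashlib

# Toy key pair (see the caveats above -- illustration only).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)

message = b"a long letter from Helen to Pete ..." * 100

# Sender: hash the long message, then sign only the short hash.
h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
signature = pow(h, d, n)

# Receiver: recompute the hash with the SAME algorithm, "decrypt"
# the signature with the sender's public key, and compare.
h2 = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
assert pow(signature, e, n) == h2      # authentic and intact

# Tampering changes the hash, so verification would fail:
tampered = message + b"!"
assert hashlib.sha256(tampered).digest() != hashlib.sha256(message).digest()
```

The expensive modular exponentiation now runs over a 32-byte digest instead of the full letter, no matter how long the letter gets.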
Attacking the signature
If you spotted the weak point, you can call yourself a security expert: if a hacker finds a way to produce collisions in the hash algorithm, he can replace the original message with a colliding message (one which produces the identical hash). Please forget Helen's and Pete's love story for a moment. Signatures are also used in other scenarios (which will be explained in the next part of our series). In such a scenario, Helen would send a document A to Sir Barclay. As an authority, he reads the document's content and confirms its authenticity and correctness. Then he sends Helen a signature encrypted with his private key. Helen can now send document A together with the signature to any person. The receiver would take the public key of Sir Barclay to verify that the signature and document A come from Helen and contain correct information. Now imagine Helen were a fraud. Before sending document A to Sir Barclay, she has tried hard and succeeded in finding a document B (which contains false information) with a hash identical to A's. When she gets the signature, she swaps A for B and sends it (together with the signature) to people. The receivers cannot detect that Sir Barclay has never seen or agreed to document B; they think he approved B to be correct and signed it.
So security needs hash algorithms which make finding collisions practically impossible with the equipment available to attackers.
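To see why this matters, here is a deliberately terrible "hash" for which Helen's attack is trivial. The function and document texts are made up; a real cryptographic hash must make exactly this padding trick infeasible:

```python
# A deliberately weak "hash": sum of all bytes modulo 256.
def weak_hash(data: bytes) -> int:
    return sum(data) % 256

document_a = b"Pay Helen 100 dollars"   # what Sir Barclay actually saw
document_b = b"Pay Helen 999 dollars"   # the forged document

# Craft a collision: append one padding byte so B's sum matches A's.
diff = (weak_hash(document_a) - weak_hash(document_b)) % 256
document_b += bytes([diff])

assert weak_hash(document_a) == weak_hash(document_b)
# Any signature computed over weak_hash(document_a) now also
# "verifies" document_b -- the swap goes undetected.
```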
Remember the “chosen-plaintext attack” I explained as a weak point of the pure RSA concept? The good news is that modern signature procedures use random elements to block this and other vulnerabilities.
There have been standard procedures defined for signatures, like DSA (or its “elliptic” pendant ECDSA, which I will explain later). They all include a random element. But random elements are only as good as their random number generator (“RNG”). It might look easy, but it is impossible to create a “true” random number by software alone; therefore such RNGs are called PRNGs (Pseudo Random Number Generators). The perfect way is to use hardware which involves random processes in nature (e.g. “random noise”). Such devices are called TRNGs (True Random Number Generators).
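You can see the PRNG problem directly in Python: a seeded PRNG is perfectly reproducible, which is exactly what an attacker wants. For security-relevant randomness, the `secrets` module draws from the operating system's entropy pool instead (which the OS feeds from hardware events):

```python
import random
import secrets

# A PRNG is deterministic: the same seed reproduces the same
# "random" sequence -- fatal if an attacker learns the seed.
random.seed(42)
first = [random.randint(0, 255) for _ in range(4)]
random.seed(42)
second = [random.randint(0, 255) for _ in range(4)]
assert first == second                 # fully reproducible

# Security code should use the OS entropy pool instead:
key_material = secrets.token_bytes(16)
assert len(key_material) == 16
```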
Trusted Platform Modules
You may have heard of “Trusted Platform Modules” (“TPM”). A TPM is an electronic module capable of performing several security procedures, and it needs a TRNG for some of them. The standard for TPMs not only defines their abilities and interfaces but also several details like the specification of the RNG. The specification names examples of “entropy sources” for the RNG: thermal noise or clock jitter.
In part 4, I will show you how these concepts are used in secure internet protocols. We will talk about certificates and CAs.
Now on to Part 4: Digital Certificates, CAs and Elliptic Curves