Digital Signatures Explained
It’s fairly easy to get digital signatures working with web services. Just pull up the docs for your web service stack and follow the directions. Some configuration here and keystores there and you’re good to go.
But just what is happening under the covers? Digitally signing something might seem like magic but it’s rather simple conceptually even though it builds on some pretty heavy theory (mostly math, ugh!). However, in this post I’m going to talk about the concepts and leave the math to someone else.
What’s the Purpose of a Digital Signature?
Before we start decomposing the mechanics of signing data, let’s first consider what we want to use a signature for in the first place.
A digital signature allows the receiver to check if the data has been altered since the sender signed it. This function is performed via a cryptographic hash which I’ll talk about later.
Verification of the Sender
Digital signatures use private and public keys. An entity (person or process) signs the message with his/its private key and then the recipient can use the entity’s public key to verify the source.
Hey, you got your Digital Signature in my SSL!
Do these two functions seem a little like what SSL does? You’re right! SSL provides those features for data in transit while a digital signature does the same thing at the message level. SSL and digital signatures don’t work in the exact same way but they do perform similar high-level functions. One interesting difference between the two is that the digital signature stays with the message even if it’s sitting in a queue or on disk somewhere (assuming that it’s not intentionally stripped at some point).
Try that, SSL!
I’d like to mention one more thing about SSL and signatures before digging out of this SSL rabbit hole: You can use signatures and SSL at the same time. Why might you want to do this? There are several reasons:
- You want to encrypt your message over the wire. That’s SSL’s sweet spot.
- Your message gets processed by multiple machines and you want each to verify the original sender of the message. SSL can’t do this past the first recipient since it’s a point-to-point protocol. The digital signature, on the other hand, travels with the message wherever it may go.
How Digital Signatures Work
Creating signed data is a two step process. The first step is to hash the data and the second step is to sign the hash. Both of these steps are cryptographic operations but neither actually encrypts the data. Fortunately, the Java API provides classes for doing these operations so we don’t have to write any of that complex stuff. We’ll see these APIs in action in a bit.
Let’s Hash it Out
There are several algorithms for generating one-way cryptographic hashes. You’ve probably heard of MD5 and SHA-1. These algorithms take any amount of data and convert it to a fixed length byte array that can’t be reversed. That is, given the hash, you can’t determine what the input was. Additionally, two different inputs will never generate the same hash.
NOTE: Technically, both of the previous assertions are not absolutely true. The time and computing power required for reversing a hash make it unlikely. The more likely case for "reversing" a hash is to leverage a pre-computed hash dictionary which I’ll discuss briefly later. Finally, there are so called "collisions" where two different inputs can create the same hash but this situation is extremely rare).
Because of these features, hash algorithms are often used for storing passwords. Take the user’s password, hash it, and then store it in LDAP or a database. You can’t guess the password from the hash so the stored passwords are reasonably secure from prying eyes. But when the user logs in, you hash the newly supplied password and compare it against the hash on file. If they match, the user is authenticated.
I bet you already knew that stuff. The cool thing is that’s the first half of generating a signature. Before we move on to the second half, let’s have a look at how to generate a hash using the Java APIs.
MessageDigest md = MessageDigest.getInstance("SHA-1"); md.update("Corned beef hash".getBytes()); // Create the hash from the message byte hash = md.digest();
The code above leverages the MessageDigest class for hashing data. "Digest" is another word for "hash." We tell the MessageDigest object that we want to use the SHA-1 algorithm and then feed it the data using the update() method. You can call that method repeatedly until all of your data is included in the hash. Then, simply call digest() to get the fixed-length hash.
Notice that the hash is actually a byte array which would create non-printable characters. So, to show you the hash for the input data I’ll first encode the hash like this:
String encodedHash = new sun.misc.BASE64Encoder().encode(hash);
I know, I know. Don’t use the sun.* classes. I’m just saving you the trouble of not having to download something like Commons Codec in order to try this out. Just don’t use sun.* classes for production.
Anyway, now that the hash has been encoded, I can tell you that the hashed version of "Corned beef hash" is
Notice that the length of the hash is longer than the input. Let’s try hashing "Corned beef hash is not dog food"
Notice that the length of the hash is still the same. It would even be the same length for a megabyte worth of text.
Before we move on to signing data, I’d like to mention one more thing about hashing. The one-way nature of a cryptographic hash is very useful but it can bite you. Since the same input always generates the same hash for a given algorithm, a bad guy who can get hold of your hashed data might be able to use a precomputed hash dictionary to determine your original text. It’s sort of like a reverse-lookup of the hash. For example, a hash dictionary will have the hashes for common passwords such as "Password" or "ABC123" and the bad guy can just query the hash to get the corresponding input.
The remedy is to add some "salt" to the hash. A salt is just a bit of data that you’ll add to your input text when you compute the hash. This simply equates to another call to update() method. Only you know the value of this salt which acts like a simple pass key or password and negates the ability for a bad guy to determine the input data.
For example, the hash for "Corned beef hash" is TARd8ciquglqtzCGlhl/Ano8+kE= and always will be. The hash of "Corned beef hash" with a salt of "Pinch of Salt" is rEN7xxJPqyY7pkspLL902NkmJn0=. Obviously, the phrase "Pinch of Salt" would have to be kept secret.
Conceptually, we would now just sign the hash. However, with the Java API, the hashing is done for us as part of the signing process so we wouldn’t actually perform the steps above to generate the signature. Instead, we’d just use the Signature class.
Using the Signature class is a little more involved than hashing because we need a private key to actually do the signing. The corresponding public key would be used by the recipient to verify the signature later. Ideally, you would have your keys in a keystore and use them with the Signature class. For demonstration purposes I’m going to generate the keypair on the fly. Yes, I’m lazy but it also makes the pertinent signing machinery stand out better. Call it artistic license. 😉
Here’s the sample code:
// Generate a keypair which // contains the private/public keys KeyPairGenerator keyGen = KeyPairGenerator.getInstance("DSA"); keyGen.initialize(1024, new SecureRandom()); KeyPair keyPair = keyGen.generateKeyPair(); // Sign some data Signature sig = Signature.getInstance("DSA"); sig.initSign(keyPair.getPrivate()); sig.update("Sign on the dotted line".getBytes()); byte signedData = sig.sign();
The first group of code generates a sample keypair that, as shown here, will only live until it goes out of scope. It’s good enough for our purposes, though. The second group is where the action is. We tell the Signature object that we want to use the DSA (Digital Signature Algorithm) and then we load it with the private key to use as the signer. Add the text via update() just like we did for hashing and then call sign(). Just like before, we get back a byte array which is unprintable. After encoding the signed data, I can tell you that the signature for "Sign on the dotted line" looks like this:
Now, the results here are a little trickier than when hashing. If you run this code, you’ll get a different encoded string than what’s shown here. It’ll be different for two reasons:
- You’re using a different private key
- You’re running it at a different time on different hardware
When I run it again with the same private key I get
So, even though the lengths are the same the output is different. By the way, the length will be the same regardless of the input size.
Pretty cool, isn’t it? There’s a lot of stuff going on in those few lines of code. Now we have signed data but how does the recipient verify it?
Verifying Signed Data
Let’s say I send you an email that contains two lines. The first line is
Sign on the dotted line
and the second line is
Obviously, these lines represent the message and the signed message, respectively.
To see if I REALLY said "Sign on the dotted line" you would mash together my public key, the message, and the signed message to see if they align. That’s imprecise language for the process of determining if, given the message, the private key associated with the public key would produce the signed message. It’s sort of equivalent to the process of checking passwords using a hash as described above except the keys have been added to the mix.
Here’s the code for doing just that:
Signature sig2 = Signature.getInstance("DSA"); sig2.initVerify(keyPair.getPublic()); sig2.update("Sign on the dotted line".getBytes()); boolean verified = sig2.verify( "MCwCFA1+YXhgSu0xCP6lhKVO9QH5DYbcAhRQ/V5i8czHiMxL7SnyLtafZNoL9A==" .getBytes());
As before, the first line tells the Signature object which signing algorithm to use and it has to match the algorithm that was used originally to sign the message. The second line loads the public key that matches the private key used to sign the data. (Important: The recipient does NOT and should not have your private key!)
We then load the message with the update() method. Finally, we pass the signed data to the verify() method. If it returns true, the signature is verified. Changing even one character in the message or signed data will cause verification to fail which is what you want. And obviously, specifying a public key that does not match the signer’s private key will fail, too.
To sum up verification, if the message is modified in any way, verification will fail. Somebody monkeyed with the data and you were able to detect it. That’s data integrity checking. If verification fails because a mismatched public key was used, then you know that someone other than who you expected signed the message. To my knowledge, you can’t distinguish between the two causes of verification failure.
Knowing the fundamentals of digital signatures will help you to understand things that build on them such as the XML Digital Signature specification or the Java implementation of it. Here’s a re-cap and a few other take-aways:
- Signing cannot prevent changes to the message, but changes can be detected
- You can detect change, but you can’t tell WHAT changed
- The signed message can travel with the data or can be separate (for example, in the email scenario above, I could have sent you the signed data in a separate email and told you it went with the first message)
- Encryption and signatures are not the same. With encryption, you intend for the original message to be recoverable at some point.
- You can think of a signature as a very fancy checksum
- In PKI, private and public keys are mathematically related. The private key is only needed for signing and the public key is only needed for verification.
- A salt can be a constant (but still secret) value or it can be generated randomly with each use. In both cases the verifier must have access to the salt.