Signing Is Not Trivial
Thursday, 15 Dec 2022
I hear something akin to the following a lot: “This problem is simply solved by (cryptographic) signing.”
Over the last half-decade, I’ve always responded in the moment with a laundry list of all the things around signing that need to be built at whatever scale is in play (corporate, community, Internet). I decided to collect my thoughts here so I can reference them later.
What follows is not really about cryptography at all. I would apologize in advance for this, but you really don’t want me writing about cryptography. My point of view is that of a builder, not a cryptographer. Taking cryptographic signatures as a primitive, this post talks about what kinds of things are needed to build robust, usable systems that use signing to provide authenticity, integrity, and authorization controls.
Cryptographic Signatures
Of course, modulo the mathematics involved, and ensuring the cryptographic implementations are coded soundly, signing things and verifying the signatures is extraordinarily simple. If I have a public-key cryptosystem key pair of some sort, I can use the private part of the key—known only to me—to produce a signature over the cryptographic digest of some data (any old bag of bits). Once I have shared my public key, anyone who can retrieve it is able to verify that the signature was produced with the private part of the key. What could be simpler?
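To make the simple model concrete, here is a minimal sketch in Python using the cryptography package and Ed25519. Both the library and the algorithm are my choices for illustration; Ed25519 takes care of digesting the message internally.

    # Minimal sign/verify sketch; the key pair and message are illustrative.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()  # known only to me
    public_key = private_key.public_key()       # shared with the world

    message = b"any old bag of bits"
    signature = private_key.sign(message)

    # Anyone holding the public key can check the signature; verify()
    # raises InvalidSignature rather than returning False.
    try:
        public_key.verify(signature, message)
        print("signature is valid")
    except InvalidSignature:
        print("signature is NOT valid")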
I suspect this is the mental model of most who assert that “this problem is simply solved by signing.” And it’s more or less correct. But it is far from complete. Let’s explore what other problems need to be solved in support of signing, especially if the problem is many-to-many and/or at scale.
Private key, known only to me
Almost every cryptographic signature mechanism depends upon the security of the private key. The private key is a chunk of binary data which needs to be present, at the time of signing, in the signing environment. In the simple model above, disclosure of the key allows an attacker to produce signatures that appear to be valid and to come from the legitimate owner of the key. Since the key is just data, which can be duplicated ad infinitum without diminishing any single copy, there is nothing to make it obvious, either to the key owner or to other stakeholders, that this has happened.
So the very first “feature” that we owe in a secure signing system is the security of the signing environment. What makes this particularly challenging is that we usually want to make our system available to a diverse set of users on various operating systems and tools. The generation and management of key material is commonly left up to the users of the system, so we may need to consider building protections against even the legitimate users of the system, because it isn’t always possible to govern how key material is managed.
Uncertainty about the safety of key material is one of the reasons you may see keys that expire after a certain period of time, limiting the blast radius of a key exposure. However, if you want to consider a signature to be valid after the key that produced it has expired, you need to embed authenticated time information into the signature, which involves more keys and signatures, and compounds every other problem laid out in what follows. Also, if the security of your system depends on knowing what time it is, you’ve got more problems.
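A toy sketch of the difference, with hypothetical record and function names; the point is only where the clock is consulted.

    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class KeyRecord:
        key_id: str
        not_before: datetime
        not_after: datetime

    def key_valid_at(record: KeyRecord, when: datetime) -> bool:
        return record.not_before <= when <= record.not_after

    def naive_check(record: KeyRecord) -> bool:
        # Every signature made with this key stops verifying at not_after,
        # which limits blast radius but also kills legitimate old signatures.
        return key_valid_at(record, datetime.now(timezone.utc))

    def timestamped_check(record: KeyRecord, attested_time: datetime) -> bool:
        # Accept an old signature if an authenticated timestamp attests the
        # key was valid when it was made.  The timestamp itself must be
        # signed: more keys, more signatures, more of everything below.
        return key_valid_at(record, attested_time)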
One key-security strategy is hardware security modules, which at least make it more obvious when the user no longer has control over the key material. Another is single-use keys, which are created upon a proper authentication by the user into a centralized system, and where the keys or signatures are countersigned by that system. In both of these strategies, the signing user need never be exposed to the key material at all, and so cannot mishandle it.
Verifying Signatures
In order to verify a signature, I need to obtain the public key of the signer. When it’s just Alice and Bob, this is straightforward. In a larger many-party system, some mechanism of discovering public key information is needed, not merely as a convenience, but as a security measure; an attacker might fool me into thinking something is signed by a legitimate party, by replacing the public key I am using for verification with a false one.
A common strategy for this is to provide a directory of public key information, called a key server. Verifying parties will need to trust this key server to provide the legitimate key material. The key-server strategy requires the verifying environment to have connectivity, which is not always viable (think IoT devices or air-gapped environments).
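A minimal sketch of directory-based discovery, with hypothetical names; notice what the verifier is now trusting: the directory itself, and the channel used to reach it.

    # Stands in for a networked key directory.
    class KeyServer:
        def __init__(self):
            self._keys = {}  # identity -> public key object

        def publish(self, identity, public_key):
            # A real server must authenticate the publisher first
            # (more on that below).
            self._keys[identity] = public_key

        def lookup(self, identity):
            return self._keys[identity]

    def verify_via_directory(server, identity, signature, message):
        public_key = server.lookup(identity)    # requires connectivity
        public_key.verify(signature, message)   # raises InvalidSignature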
Public key certificates, as used in a public key infrastructure (PKI), are one way to manage key material such that signatures can be verified in air-gapped environments. The public key material is sent along with the signature, accompanied by a chain of certificates leading up to one or more root certificates, which are pre-loaded into the verification environment. As long as the signature chains all the way to a root that you trust, you have some assurance that the correct public key is present for the stated identity of the signer.
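Here is a deliberately toy chain walk, just to show the shape of offline verification against a pre-loaded root. It is not real X.509: it omits validity periods, revocation, name constraints, and everything else that makes PKI hard.

    from dataclasses import dataclass
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    @dataclass
    class ToyCert:
        subject: str
        public_key_bytes: bytes  # raw Ed25519 public key
        signature: bytes         # issuer's signature over subject + key

    def verify_chain(chain, trusted_root):
        # chain[0] is the leaf; chain[-1] must be signed by trusted_root,
        # which the verifier shipped with (no connectivity required).
        issuers = [Ed25519PublicKey.from_public_bytes(c.public_key_bytes)
                   for c in chain[1:]] + [trusted_root]
        for cert, issuer in zip(chain, issuers):
            # Raises InvalidSignature if any link in the chain is bad.
            issuer.verify(cert.signature,
                          cert.subject.encode() + cert.public_key_bytes)
        # The leaf key is now trustworthy for the leaf's stated identity.
        return Ed25519PublicKey.from_public_bytes(chain[0].public_key_bytes)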
Depending upon what is at stake (per the threat model of the system; you do have a threat model, right?), any strategy for distributing public key material must authenticate users before they can publish key material to the key server. Examples include paying for the key-server entry with a credit card (like you might do for your Apple developer account), or, in the case of web-server certificates, demonstrating that you have control over a domain by placing an entry in that domain’s DNS records, or placing some data in a web page in that domain.
Once the secure distribution of public keys is solved, there is still the fact that verification is done in diverse environments, where it is not possible to govern the security and hygiene of the public key material. If an attacker is able to manipulate local stores of public key material, they can trick a verifier into trusting a signature it should not. PKI certificate trust stores are an example of such an attack vector.
Authorization
The foregoing section was primarily about making sure you have an authentic public key with which to verify that an identity produced a signature; that’s essentially an authentication operation. The next question I might want to answer is: why do I believe that the authenticated identity was the right one to have produced this signature? This depends on the semantics of the signature. If it’s an email from Alice to Bob, Bob can visually verify that the signature was produced by Alice.
Consider a situation where it is repeatedly proposed that signing is a solution: open source package repositories. It is suggested that many of the attacks where malicious code is substituted for genuine code in these repositories could have been prevented by signing the code. For a software project of any complexity, the dependency tree may comprise hundreds or thousands of packages from multiple repositories. Not only would obtaining the public key material and checking all of those signatures be cumbersome, but something is also needed to help the verifier understand that the signatures were produced by those authorized to do so (i.e., the package maintainers). As far as I am aware, highly ramified authorization like this is an unsolved problem, but that may be due to my prejudice against trying to apply blockchain to every problem I happen across.
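A hypothetical sketch of that missing layer: verifying the signature answers “did alice sign this?”, while a separate maintainers index must answer “is alice allowed to sign this package?”. Maintaining and trusting that index is exactly the unsolved part.

    AUTHORIZED_SIGNERS = {
        # package name -> identities allowed to sign it (hypothetical data)
        "left-padder": {"alice"},
        "http-client": {"bob", "carol"},
    }

    def check_package(package_name, signer, signature, payload, keys):
        # Authentication: the signature really was made by `signer`.
        keys[signer].verify(signature, payload)
        # Authorization: `signer` is supposed to be signing this package.
        if signer not in AUTHORIZED_SIGNERS.get(package_name, set()):
            raise PermissionError(
                f"{signer} is not a maintainer of {package_name}")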
When Things Go Wrong
Of course, all of the above is just getting the happy path to work. Keys will get compromised. Individuals will lose trust and need to be de-authorized from producing signatures for some or all of the capabilities granted by their ability to sign. Registries and key servers will have defects. And so, systems need to be in place to manage the key material and other configurations, and the mechanisms need to be understood by systems running in signing environments as well as verification environments.
Blast radius
Some of the protective mechanisms can be somewhat automatic. By constraining the scope of how and when key material can be used, compromises can self-heal after some time, or only expose a subset of the problem space.
The easiest example to draw on is the validity period of PKI certificates. Certificates are chained in hierarchies from a heavily protected root, down to the leaf-node certificates given to the unwashed masses to use. These leaf certificates are often valid for very short periods of time (hundreds of days) compared to the roots (decades), so if one is compromised it will stop being a problem eventually. Using PKI certificates to sign digital assets is complicated by this, because I generally want the signature on my asset to be valid beyond the validity period of the key itself (meaning I cannot produce any more signatures with that key, but I want to continue to trust signatures that were produced while the key itself was valid). Signing digital assets with PKI can employ a technique called timestamping, where a trusted third party countersigns a signature with an authenticated timestamp, attesting that the signature was submitted to the timestamper while the key was valid.
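A sketch of what a verifier does with such a countersignature. The structure and names here are mine, not those of any real timestamping protocol (RFC 3161 is the usual reference).

    from datetime import datetime

    def verify_timestamped(asset_sig: bytes, asset_msg: bytes,
                           signer_key, signer_not_after: datetime,
                           ts_sig: bytes, ts_time: datetime, ts_key) -> None:
        # 1. The timestamper countersigned the asset signature plus a time.
        ts_key.verify(ts_sig, asset_sig + ts_time.isoformat().encode())
        # 2. The asset signature itself must verify.
        signer_key.verify(asset_sig, asset_msg)
        # 3. The signing key must have been valid at the attested time,
        #    even if it has expired since.
        if ts_time > signer_not_after:
            raise ValueError("key had already expired at the attested time")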
Expiration dates on certificates cause a lot of problems. It is said that everyone has a cryptographically signed timestamp for when their website or service will stop working, embedded in a certificate somewhere. Most people get a certificate, then stop worrying about it until 180 days later when stuff stops working. They scramble about, get another certificate, then stop worrying for another 180 days. Even though the glue on sticky notes continues to improve, this is a perennial problem.
It’s even worse when a custodian of public key infrastructure (like a certificate authority, ISP, or cloud provider) has an accidental expiration. The blast radius of these events can take half the internet down.
Explicit controls
Often, blast radius measures aren’t enough, and a key needs to be revoked immediately. This requires a communications channel between the key distribution mechanism and verifying systems (which may cache keys), once again requiring connectivity in order to make sure you have the most up-to-date information about which keys are valid and which are not.
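A sketch of an explicit revocation check against a hypothetical endpoint. The interesting part is the policy decision forced on you when the check cannot be made: fail closed (and suffer outages) or fail open (and accept possibly revoked keys).

    import json
    import urllib.request

    REVOCATION_URL = "https://keys.example.com/revoked"  # hypothetical

    def is_revoked(key_id: str) -> bool:
        try:
            with urllib.request.urlopen(REVOCATION_URL, timeout=5) as resp:
                revoked = set(json.load(resp))
        except OSError:
            # Policy decision: we fail closed, treating the key as
            # untrustworthy whenever we cannot check.
            return True
        return key_id in revoked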
Expect signature verification to fail
Another thing that can be overlooked is what happens when a signature fails to verify. This really should be among the in-band use cases, but the nuances of failed verification are often discovered only during root-cause analysis of failures due to invalid or missing signatures. Signatures can fail to verify for a variety of reasons: missing signatures, invalid key material, corrupted data, maliciously modified data, and infrastructure or configuration errors. It is almost impossible to distinguish between benign and malicious causes in an automated way.
Even within a single use case or ecosystem (consider, again, the software package repository example), the stakes of accepting or rejecting assets with invalid signatures vary. For example, there’s a difference between failing at build time, when signatures are checked as packages are bound to the build, and failing later: if you’re unfortunate enough to have a system that binds dependencies at deployment, invalid signatures at that stage could cause some chaos without careful thought.
Any sufficiently complex signing system should give the verifier enough detailed information, tools, or hooks to make decisions about what happens when signatures fail. This decision-making process needs to be flexible enough that it is easy to configure across multiple verification environments, where the stakes or semantics of invalid signatures may differ.
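A sketch of what such hooks might look like, with hypothetical failure categories and per-environment policies.

    from enum import Enum, auto

    class Failure(Enum):
        MISSING_SIGNATURE = auto()
        UNKNOWN_KEY = auto()
        EXPIRED_KEY = auto()
        BAD_SIGNATURE = auto()  # corrupted or maliciously modified data

    # Build environment: strict about everything.
    BUILD_POLICY = {f: "reject" for f in Failure}
    # Deployment environment: warn on expired keys rather than halting
    # the fleet; everything else still rejects.
    DEPLOY_POLICY = {**BUILD_POLICY, Failure.EXPIRED_KEY: "warn"}

    def handle(failure: Failure, policy: dict) -> None:
        action = policy.get(failure, "reject")
        if action == "reject":
            raise SystemExit(f"verification failed: {failure.name}")
        print(f"WARNING: {failure.name}; proceeding per policy")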
Conclusion
Not every problem where signing is employed needs all of the features described here. When bringing signing to bear, it is worth considering how simple the system can be made while still providing the usability and security desired. PKI attempts to address many of these problems, and falls pretty far short. Running a large-scale PKI is a major undertaking.
App signing ecosystems like Authenticode or Apple’s App Store have added their own controls on top of PKI, and those systems are fairly robust, though not easy for developers to learn or use. Even so, Apple has had issues with things like infrastructure key material being distrusted.