~3 minutes reading

Wed, 26 August 2015


Signing collection to ensure data integrity



Translated from the French by Rémy Hubscher.

Within the scope of the Go Faster project, we want to distribute Firefox updates more often than just with major releases (which happen once every six weeks).

There are multiple kinds of data that we will need to update on the client; for example, we would like to update the SSL Certificate Revocation List (named OneCRL).

In that specific case, it is obviously very important to make sure that all the retrieved data come from Mozilla and that no records are missing. We can then be confident that nobody has added a valid certificate to the list (or removed one that has been revoked).

A cryptographic signature can give us the guarantee that all the records were fetched, but it cannot stop someone from blocking access to the service altogether (the Great Firewall of China, for example).

This mechanism works in our specific use case of the OneCRL update project, but it can also be useful to any future project that needs to make sure that all records in a collection have been correctly synced. We will probably need to re-use it to update other parts of Firefox (or Fennec), but you may also want to use it for your projects.

We plan to use Kinto in order to distribute the data (or meta-data) associated with the files that need to be updated. It's a good fit because it can be cached easily behind a CDN.

That said, we don't want our users to trust either the CDN or the Kinto server itself without checking. Somebody could attack the CDN or the Kinto server, add or remove records, and so alter the CRL. If you think about it, that is a horrible scenario.

Consider the following work-flow:

  • The person responsible for updating the CRL, the updater, holds a private key (or, even better, an HSM) which enables them to sign a hash of the collection records.
  • The corresponding public key ships with the client (Firefox or Fennec).
  • Hashing and signature generation are done on the client side to prevent certain attack vectors (if somebody gains access to the Kinto server, for example).

Hashing is a one-way operation that always yields the same result for the same input.
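To make that property concrete, here is a minimal sketch using SHA-256 from Python's standard library. The helper name `sha256_hex` and the shortened sample payloads are ours, for illustration only.

```python
import hashlib

def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()

# Illustrative payloads: two identical inputs and one altered input.
first = sha256_hex(b'{"fingerprint": "11:D5"}')
second = sha256_hex(b'{"fingerprint": "11:D5"}')
tampered = sha256_hex(b'{"fingerprint": "11:D6"}')

print(first == second)    # same input, same digest
print(first == tampered)  # one changed character, different digest
```

Because the digest changes as soon as a single byte changes, signing the digest is enough to cover the whole collection.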

First data issuance on Kinto

All the data are fetched from a secure source and converted into a JSON collection. Each record is assigned a unique ID generated on the client side.
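As a rough sketch of that conversion step, assuming random UUIDs are an acceptable ID scheme (the article does not say how IDs are generated) and using a hypothetical helper called `to_records`:

```python
import uuid

def to_records(fingerprints):
    """Turn raw fingerprints into records with locally generated IDs."""
    # Assumption: a random UUID (version 4) per record; no server round trip needed.
    return [{"id": str(uuid.uuid4()), "fingerprint": fp} for fp in fingerprints]

records = to_records([
    "11:D5:D2:0A:9A:F8:D9:FC:23:6E:5C:5C:30:EC:AF:68:F5:68:FB:A3",
    "33:6E:5C:5C:30:EC:AF:68:F5:68:FB:A3:11:D5:D2:0A:9A:F8:D9:FC",
])
for record in records:
    print(record)
```

Generating the ID locally means the full record, ID included, exists before anything is sent to the server, so it can be part of what gets hashed and signed.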

For instance, we could have the following record:

{"id": "b7dded96-8df0-8af8-449a-8bc47f71b4c4",
 "fingerprint": "11:D5:D2:0A:9A:F8:D9:FC:23:6E:5C:5C:30:EC:AF:68:F5:68:FB:A3"}

The collection hash is then computed, signed, and sent to the Kinto server. (See below for details.)

The signing process is offloaded to a dedicated service running in a sandbox. This protects the certificate, whose security is crucial to the process.

How to validate data integrity?

First, we need to fetch the collection records as well as the hash and the signature.

Then, we can verify the signature of the hash to make sure it was generated by a trusted source.

Finally, we can serialise our local collection and compute its hash to make sure it matches the signed one.
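The three steps above could be sketched as follows. This is only an outline under assumptions: the hash is SHA-256 over a sorted JSON serialisation, and signature verification is passed in as a callable (`verify_signature`, a hypothetical name) because the signature scheme is not decided here. The `demo_verify` function is a stand-in for the demo and is not real cryptography.

```python
import hashlib
import hmac
import json

def compute_hash(records):
    """Hash a canonical serialisation: records sorted by id, keys sorted."""
    ordered = sorted(records, key=lambda record: record["id"])
    payload = json.dumps(ordered, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def is_collection_valid(records, signed_hash, signature, verify_signature):
    """Check the signature first, then compare the local hash to the signed one."""
    # The signature must vouch for the published hash.
    if not verify_signature(signed_hash, signature):
        return False
    # The local records must hash to the signed value.
    return hmac.compare_digest(compute_hash(records), signed_hash)

# Stand-in verifier for the demo only: NOT real cryptography.
def demo_verify(hash_value, signature):
    return signature == "signed:" + hash_value

records = [{"id": "b", "fingerprint": "33:6E"}, {"id": "a", "fingerprint": "11:D5"}]
good_hash = compute_hash(records)
good_signature = "signed:" + good_hash

# Complete collection, then a collection with one record missing.
print(is_collection_valid(records, good_hash, good_signature, demo_verify))
print(is_collection_valid(records[:1], good_hash, good_signature, demo_verify))
```

In a real client, `verify_signature` would check the signature against the public key shipped with Firefox or Fennec.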

Update the collection data

Before creating, reading, updating or deleting records in the collection, the client needs to make sure that the local collection records match those of the remote server, and that they are valid.

Once we are confident that the collection update is valid, the client can compute the new collection hash and sign it.
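A possible shape for that update step, again as a sketch under assumptions: the hash is SHA-256 over a sorted JSON serialisation, the remote hash has already had its signature verified, and signing is delegated to a callable named `sign` (hypothetical, standing in for the signing service).

```python
import hashlib
import json

def compute_hash(records):
    """Hash a canonical serialisation: records sorted by id, keys sorted."""
    ordered = sorted(records, key=lambda record: record["id"])
    payload = json.dumps(ordered, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def update_collection(local_records, remote_hash, new_record, sign):
    """Add or replace new_record; return (records, new_hash, new_signature)."""
    # Refuse to build on a local copy that diverges from the signed remote state.
    if compute_hash(local_records) != remote_hash:
        raise ValueError("local collection does not match the signed remote hash")
    by_id = {record["id"]: record for record in local_records}
    by_id[new_record["id"]] = new_record
    updated = list(by_id.values())
    new_hash = compute_hash(updated)
    return updated, new_hash, sign(new_hash)

# Demo with a placeholder signer: NOT real cryptography.
current = [{"id": "a", "fingerprint": "11:D5"}]
current_hash = compute_hash(current)
updated, new_hash, new_signature = update_collection(
    current, current_hash, {"id": "b", "fingerprint": "33:6E"},
    sign=lambda value: "signed:" + value)
print(len(updated), new_hash != current_hash)
```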

How to compute the collection hash?

To compute the collection hash you need a reproducible algorithm.

For instance, one could be:

  1. Sort records by their ids.
  2. Serialise the fields of each record, ordered by key.
  3. Compute the hash of the serialised list of records.

We do not yet know the exact algorithm that we will be using.

An interesting candidate could be the JSON Web Signature standard. Meanwhile, a naive Python implementation could look like this:

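import hashlib
import json

data = [
   {"id": "b7dded96-8df0-8af8-449a-8bc47f71b4c4",
    "fingerprint": "11:D5:D2:0A:9A:F8:D9:FC:23:6E:5C:5C:30:EC:AF:68:F5:68:FB:A3"},
   {"id": "dded96b7-8f0d-8f8a-49a4-7f771b4c4bc4",
    "fingerprint": "33:6E:5C:5C:30:EC:AF:68:F5:68:FB:A3:11:D5:D2:0A:9A:F8:D9:FC"}]

# Step 1: sort records by id so that the input order does not matter.
records = sorted(data, key=lambda record: record["id"])
# Step 2: serialise with sorted keys; hashlib needs bytes, hence the encoding.
serialized = json.dumps(records, sort_keys=True).encode("utf-8")
# Step 3: hash the serialised list of records.
collection_hash = hashlib.sha256(serialized).hexdigest()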
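A quick way to check that a hashing routine of this kind is reproducible is to feed it the same records in a different order, with their keys in a different order, and compare the results. The sketch below wraps the three steps in a function of our own naming (`hash_collection`) and uses shortened sample data.

```python
import hashlib
import json

def hash_collection(records):
    """Sort records by id, serialise with sorted keys, then hash with SHA-256."""
    ordered = sorted(records, key=lambda record: record["id"])
    payload = json.dumps(ordered, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

sample = [
    {"id": "b7dded96", "fingerprint": "11:D5:D2:0A"},
    {"id": "dded96b7", "fingerprint": "33:6E:5C:5C"},
]

# Same records, but with both the record order and the key order reversed.
reordered = [dict(reversed(list(record.items()))) for record in reversed(sample)]

print(hash_collection(sample) == hash_collection(reordered))
```

Note that the output of `json.dumps` (separators, escaping of non-ASCII characters) is specific to Python's defaults. A client written in another language would have to produce exactly the same bytes, which is one reason to look at a standard serialisation instead.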

Here is a little sketch to summarise:

Summary schema of the collection signing flow.