Mihai's page

New hindent release comes with SLSA guarantees

Through a series of events some years ago, I became the maintainer of hindent, one of the first pretty printers for Haskell. It is an autoformatter, similar to tools found in other language ecosystems (e.g., Black for Python).

Earlier this week, I released hindent-6.0.0. The major version bump is mostly due to switching the parser from haskell-src-exts to ghc-lib-parser. This makes hindent more reliable as GHC evolves, since ghc-lib-parser is decoupled from the compiler.

While I will write a longer post about hindent later, this post is about another feature of this release, one that I intend to add to every package I maintain: the release also provides SLSA attestations for the source distribution and the binary.

This blog post explains what all of this means (at a very high level, as I’m planning a series of deep dives in future articles), why we need it, and how to use the attestations to validate the supply chain involved in building the release.

What are the SLSA attestations?

So, let’s begin. First, let’s look at what is being offered. If you go to the release page, you will find 7 assets listed at the bottom of the page, as shown below:

The 7 release assets on the latest hindent release

Going from the bottom, two of the artifacts, Source code (zip) and Source code (tar.gz), are generated automatically by GitHub, using git archive, when the release tag is created. They are supposed to contain the code as it was at the tagged commit. For a long time, these archives were assumed to be stable, but recent events have shown that this assumption is not valid. Since we can make no guarantees about them, let’s ignore them for the rest of the post.

Two other tarballs are added to the release page as part of the release process: the tarball with the source code (hindent-6.0.0.tar.gz) and the generated documentation (hindent-6.0.0-docs.tar.gz). These are stable and will never change (barring manual intervention). The first tarball is generated by cabal sdist and is also uploaded to the Hackage page of the release via cabal upload. It differs from the tarball provided by GitHub (mentioned in the previous paragraph): it only contains the files listed in the project’s Cabal file, as these are the only ones that get uploaded to Hackage.

The docs tarball is optional, but recommended. Although Hackage’s infrastructure can generate the documentation for most packages, it is sometimes recommended to generate it manually and upload the corresponding tarball. Since I did that for hindent, I also uploaded it to the release page.

Next, there is one binary file, hindent, the project’s executable. In general, it is compiled via cabal build (or stack build, etc.) and is tied to the operating system and platform it was built on. However, since hindent is a code formatter, I envision creating a GitHub action that formats, or checks the formatting of, code in PRs for any other project. In that case, instead of waiting for the tool to be compiled from scratch, a workflow could reuse this executable by specifying the same runs-on: value. This is future work; for now, we just have the binary.

Finally, there are 2 jsonl files: the SLSA attestations for the tarballs (source-distribution.intoto.jsonl) and for the binary (executables.intoto.jsonl). These are the main focus of this article. We could have generated a single attestation covering both the binary and the tarballs, but, as we will see later in the article, there is a difference in what can be determined about these artifacts in the absence of SLSA attestations. In other words, the two files provide different amounts of security uplift, even though they have similar structures.

Before going deeper into this, let’s first look at what these files contain.

What do the attestations contain?

The attestations are JSON files, so we can inspect them using jq. The root document is a DSSE envelope containing 3 keys:

$ cat source-distribution.intoto.jsonl | jq keys
[
  "payload",
  "payloadType",
  "signatures"
]

Let’s defer the signatures field until later in the article, and also skip over payloadType (its value is just "application/vnd.in-toto+json", and I’m not planning to discuss in-toto in this article).

The payload field is a base64-encoded JSON document with 4 keys:

$ cat source-distribution.intoto.jsonl | jq -r .payload | base64 -d | jq keys
[
  "_type",
  "predicate",
  "predicateType",
  "subject"
]
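The same decoding can be done in a script. The Python sketch below is a minimal illustration, not part of hindent or of any SLSA tool: decode_statement is a hypothetical helper that mirrors jq -r .payload | base64 -d, assuming each line of an .intoto.jsonl file holds one DSSE envelope whose payload uses the standard base64 alphabet.

```python
import base64
import json


def decode_statement(envelope_line: str) -> dict:
    """Decode the statement carried in one line of an .intoto.jsonl file.

    Hypothetical helper: equivalent to `jq -r .payload | base64 -d`.
    The DSSE envelope stores the statement as base64-encoded JSON
    in its "payload" field (standard base64 alphabet assumed).
    """
    envelope = json.loads(envelope_line)
    payload_bytes = base64.b64decode(envelope["payload"])
    return json.loads(payload_bytes)


if __name__ == "__main__":
    # A hand-built, illustrative envelope (not the real hindent one).
    statement = {"_type": "t", "predicateType": "p", "subject": [], "predicate": {}}
    line = json.dumps({
        "payloadType": "application/vnd.in-toto+json",
        "payload": base64.b64encode(json.dumps(statement).encode()).decode(),
        "signatures": [],
    })
    print(sorted(decode_statement(line)))
    # ['_type', 'predicate', 'predicateType', 'subject']
```

With a real file, one would pass the first line of source-distribution.intoto.jsonl to decode_statement and get back the same four keys that jq lists.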

The _type and predicateType fields are just simple constants that specify that this document is a SLSA attestation, so we’ll skip over them.

The subject field documents what the attestation talks about:

$ cat source-distribution.intoto.jsonl | jq -r .payload | base64 -d | jq .subject
[
  {
    "name": "hindent-6.0.0-docs.tar.gz",
    "digest": {
      "sha256": "0f19dc56cb10447bb396caf28a3f369a4b1e88e6987901e904d1a97804a4b7c0"
    }
  },
  {
    "name": "hindent-6.0.0.tar.gz",
    "digest": {
      "sha256": "7fa9eb4ad8f767fe9608f1e01c0ba7a90a999c8efc0f6ed7d8dfe24f965cf39e"
    }
  }
]
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .subject
[
  {
    "name": ".execs/hindent",
    "digest": {
      "sha256": "49413fe4e6b71cde476464fbfd9f2ebd43ab2fa2838193a848d7c7878346ac86"
    }
  }
]

We see the 2 tarballs in the source-distribution attestation and the executable in the executables one. Each subject is paired with its digest, here computed with the SHA-256 algorithm (we can check that these are correct by running sha256sum on the corresponding files).
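The sha256sum check can also be scripted. In this minimal Python sketch, sha256_file and subject_for are hypothetical helpers, and statement is assumed to be the decoded payload as a dict. The lookup is by digest only and ignores the subject's name, which is consistent with the Hackage copy of the tarball passing verification later in the article despite its different file name.

```python
import hashlib


def sha256_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, like `sha256sum`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large artifacts need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def subject_for(statement: dict, path: str):
    """Return the subject whose sha256 digest matches the file, or None.

    Hypothetical helper: matching is by digest, not by name, so a
    renamed copy of an attested file still matches.
    """
    digest = sha256_file(path)
    for subject in statement.get("subject", []):
        if subject.get("digest", {}).get("sha256") == digest:
            return subject
    return None
```

A None result corresponds to the "artifact hash does not match provenance subject" situation: the file on disk is not one of the attested artifacts.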

The important part of all these attestations is stored in the predicate field. For SLSA, the contents of the predicates are specified in the SLSA specification. In our case, we have:

$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate | jq keys
[
  "buildType",
  "builder",
  "invocation",
  "materials",
  "metadata"
]

The builder and buildType fields identify that the attestation was generated by the SLSA GitHub Generator, using version v1.4.0 of the generator:

$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.buildType
"https://github.com/slsa-framework/slsa-github-generator/generic@v1"
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.builder
{
  "id": "https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0"
}

This is a reusable workflow that cannot be modified by the owner of the repository generating the attestation. This means that I cannot control both the binary and its attestation at the same time. Keep this property in mind, as it will be useful later.
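A consumer can encode this expectation directly. The sketch below is a hypothetical Python check, not slsa-verifier’s actual logic: it accepts only builder IDs that point at the generator_generic_slsa3.yml reusable workflow pinned to a tag. On its own it proves nothing, since anyone can write any builder ID into an unsigned document; it is only meaningful after the signature over the attestation has been verified.

```python
# Hypothetical policy constant: trust only the generic SLSA3 generator
# reusable workflow, referenced by a release tag.
TRUSTED_BUILDER_PREFIX = (
    "https://github.com/slsa-framework/slsa-github-generator/"
    ".github/workflows/generator_generic_slsa3.yml@refs/tags/v"
)


def builder_is_trusted(statement: dict) -> bool:
    """Check that predicate.builder.id names the trusted reusable workflow.

    `statement` is the decoded payload as a dict. Only meaningful once
    the envelope's signature has been checked.
    """
    builder_id = statement.get("predicate", {}).get("builder", {}).get("id", "")
    return builder_id.startswith(TRUSTED_BUILDER_PREFIX)
```

A workflow defined in the hindent repository itself (for example .github/workflows/release.yaml) would fail this check, which is exactly the separation between artifact and attestation described above.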

The invocation field is quite long, so I’ll only show some relevant parts of it (the output below is trimmed):

$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.invocation
{
  "configSource": {
    "uri": "git+https://github.com/mihaimaruseac/hindent@refs/tags/v6.0.0",
    "digest": {
      "sha1": "a9c2898b9ac8f13e57092e3d5287e05b942d4539"
    },
    "entryPoint": ".github/workflows/release.yaml"
  },
  "environment": {
    "github_event_name": "release",
    "github_event_payload": {
      "release": {
        "author": {
          "id": 323199,
          "login": "mihaimaruseac",
        },
        "body": "## Major changes\r\n* The file parser ...",
        "tag_name": "v6.0.0",
        "target_commitish": "master",
      },
      "repository": {
        "description": "Haskell pretty printer",
        "git_url": "git://github.com/mihaimaruseac/hindent.git",
      },
    },
    "github_ref": "refs/tags/v6.0.0",
    "github_run_id": "4229863410",
    "github_run_number": "2",
    "github_sha1": "a9c2898b9ac8f13e57092e3d5287e05b942d4539"
  }
}

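The field attests that the build was defined by the workflow at .github/workflows/release.yaml in the github.com/mihaimaruseac/hindent repository, at tag v6.0.0 (commit a9c2898b9ac8f13e57092e3d5287e05b942d4539); that it was triggered by a release event for a release authored by mihaimaruseac; and that it ran as GitHub Actions run 4229863410 (run number 2).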
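As an illustration, here is a hypothetical Python helper that checks these source fields, assuming statement is the decoded payload as a dict. It returns the attested commit only when both configSource.uri and github_ref point at the expected tag. The URI format is copied from the output above and may differ across generator versions.

```python
def attested_commit(statement: dict, repo: str, tag: str):
    """Return the attested commit SHA-1 if the build came from `repo` at `tag`.

    Hypothetical helper. `repo` is an "owner/name" pair such as
    "mihaimaruseac/hindent". Returns None when either source field
    points somewhere else.
    """
    invocation = statement.get("predicate", {}).get("invocation", {})
    source = invocation.get("configSource", {})
    github_ref = invocation.get("environment", {}).get("github_ref", "")
    expected_uri = f"git+https://github.com/{repo}@refs/tags/{tag}"
    if source.get("uri") != expected_uri or github_ref != f"refs/tags/{tag}":
        return None
    return source.get("digest", {}).get("sha1")
```

The returned commit can then be compared with the one the tag resolves to in a local clone of the repository.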

The metadata field is short, but it still provides one essential piece of information: the tarballs and the binary were not built in a way that guarantees that rebuilding them would produce byte-for-byte identical contents:

$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.metadata
{
  "buildInvocationID": "4229863410-2",
  "completeness": {
    "parameters": true,
    "environment": false,
    "materials": false
  },
  "reproducible": false
}

As we can see, the provenance documents link the artifacts to the way they were produced, providing enough information to trace back their entire provenance and to determine properties of the whole build process.

Why are the SLSA attestations useful?

Having the provenance in a human-readable format allows people to inspect it and make policy decisions based on their level of trust and on other security policies on the user side. While likely not the case here, you can imagine a very security-conscious company allowing only software that has been built by a certain set of users, in a reproducible way, etc.

Even a regular user could take the binary or the source tarball from the release page, compute its checksum, and manually check that it matches the digest specified in the provenance file. This is generally useful for the binary, but not so much for the tarballs.

You could download the source tarball from the Hackage page and get exactly the same hash (since I uploaded the tarballs generated by the release action), but verifying hashes this way is not something that currently happens in Haskell. In general, when one runs cabal install or cabal unpack, the tarball is downloaded directly from Hackage via HTTPS, and attacking the entire Hackage infrastructure to change a single project is not feasible. Anyone interested can run cabal unpack to download and unpack the source code, and then compare it with the state of the repository at the release tag. So, an immediate question arises: maybe there is no need for a provenance file for the tarballs?

That would be true if the provenance listed only the tarball’s checksum. However, the rest of the information it contains lets one examine all the other relevant aspects of the software supply chain.

Furthermore, all of this is meant to be verified through automation. The JSON documents are machine-readable as well as human-readable, so CISOs and others can create security policies that check specific fields in the provenance and decide whether the software is allowed to be used within a company. But even casual users can benefit from this.
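For example, the hypothetical company policy mentioned earlier (only software released by certain users, built reproducibly) could be sketched in Python as follows. The function name, the allowed_authors set, and the require_reproducible switch are all assumptions for illustration; statement is the decoded payload as a dict, and the release author field is only present for builds triggered by a release event.

```python
def policy_allows(statement: dict, allowed_authors: set,
                  require_reproducible: bool) -> bool:
    """Hypothetical policy: check release author and reproducibility."""
    predicate = statement.get("predicate", {})
    # Present only when the build was triggered by a GitHub release event.
    author = (
        predicate.get("invocation", {})
        .get("environment", {})
        .get("github_event_payload", {})
        .get("release", {})
        .get("author", {})
        .get("login")
    )
    reproducible = predicate.get("metadata", {}).get("reproducible", False)
    if author not in allowed_authors:
        return False
    if require_reproducible and not reproducible:
        return False
    return True
```

Applied to the hindent provenance with require_reproducible=True, this policy would reject the release, since its metadata reports "reproducible": false.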

Enter slsa-verifier, a Go program that verifies the provenance associated with one or more artifacts. In our case, after installing slsa-verifier (cloning its repository, building, and installing the code), we can run the following commands:

$ slsa-verifier verify-artifact --provenance-path executables.intoto.jsonl --source-uri github.com/mihaimaruseac/hindent hindent
Verified signature against tlog entry index 13814736 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a649f9ebb0ca1c39cbd59e3a7db1529c74720f2cfa10a21fdfee90137fafbe5bc
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent: PASSED

PASSED: Verified SLSA provenance

$ slsa-verifier verify-artifact  --provenance-path source-distribution.intoto.jsonl --source-uri github.com/mihaimaruseac/hindent hindent*.tar.gz
Verified signature against tlog entry index 13814544 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a52f1ea2f8ba724b794edc1f0744832f91ea9660ae052059d718dabdc04657d8e
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent-6.0.0-docs.tar.gz: PASSED

Verified signature against tlog entry index 13814544 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a52f1ea2f8ba724b794edc1f0744832f91ea9660ae052059d718dabdc04657d8e
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent-6.0.0.tar.gz: PASSED

PASSED: Verified SLSA provenance

We have verified both the binary and the tarballs from the release page! We can do the same for the tarball downloaded from Hackage (saved locally as hindent-6.0.0-hackage.tar.gz) to make sure that it is exactly the one that was uploaded:

$ slsa-verifier verify-artifact  --provenance-path source-distribution.intoto.jsonl --source-uri github.com/mihaimaruseac/hindent hindent-6.0.0-hackage.tar.gz
Verified signature against tlog entry index 13814544 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a52f1ea2f8ba724b794edc1f0744832f91ea9660ae052059d718dabdc04657d8e
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent-6.0.0-hackage.tar.gz: PASSED

PASSED: Verified SLSA provenance

This check can be incorporated into any automation. As mentioned above, there is little uplift for the source tarball, but there is some for the binary. It will matter once hindent becomes available as a GitHub action that runs on another repository’s presubmits without first recompiling the code, as mentioned earlier in the article. Without a way to verify the binary’s provenance automatically, anyone with access to the hindent repository could replace the binary at any time and thus take control of every GitHub Actions workflow that uses it!
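A minimal Python sketch of such automation wraps the exact slsa-verifier invocation used in this article. The helper names are hypothetical, slsa-verifier is assumed to be on PATH, and the wrapper assumes that a non-zero exit status means verification failed (the failing run shown below ends with exit status 1).

```python
import subprocess


def verify_command(provenance: str, source_uri: str, artifacts: list) -> list:
    """Build the slsa-verifier command line used in this article."""
    return [
        "slsa-verifier", "verify-artifact",
        "--provenance-path", provenance,
        "--source-uri", source_uri,
        *artifacts,
    ]


def verify(provenance: str, source_uri: str, artifacts: list) -> bool:
    """Run slsa-verifier; assumes a non-zero exit status means failure."""
    result = subprocess.run(verify_command(provenance, source_uri, artifacts))
    return result.returncode == 0
```

For example, verify("executables.intoto.jsonl", "github.com/mihaimaruseac/hindent", ["hindent"]) could gate a workflow step so that the binary is only used when its provenance checks out.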

To see how this could be prevented, consider the case of this test release, which I altered some weeks after publishing it. In the meantime, the SLSA provenance changed slightly, so the slsa-verifier invocation below needs a different value for the --source-uri flag. Downloading all the artifacts and the provenance files and running slsa-verifier results in the following:

$ slsa-verifier verify-artifact  --provenance-path built-with-stack-attestation.intoto.jsonl --source-uri git+https://github.com/mihaimaruseac/slsa-lvl3-generic-provenance-in-haskell-example built-with-stack
Verified signature against tlog entry index 9712595 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a5b9492fe82102c6fa4d5805e5fc35b6f15d0f424ab307f488efd1aecfcc893de
Verifying artifact built-with-stack: FAILED: expected hash '09129e587b40e52db87e1df1bc75989a3f272df290d49b7349fcd4ea6afacb0f' not found: artifact hash does not match provenance subject

FAILED: SLSA verification failed: expected hash '09129e587b40e52db87e1df1bc75989a3f272df290d49b7349fcd4ea6afacb0f' not found: artifact hash does not match provenance subject
exit status 1

Trying to be sneaky failed here. When I changed the built-with-stack binary on the release page (by deleting the existing one and uploading built-with-cabal under the same name), I could not regenerate the provenance file. This is by design and it is what guarantees that the provenance is unfalsifiable.

How does this work?

There is one field of the provenance that we have not looked at yet: the signatures field of the DSSE envelope:

$  cat source-distribution.intoto.jsonl | jq .signatures
[
  {
    "keyid": "",
    "sig": "MEYCIQDfY8cD3xVVcCcbJ/TIuSEEcvRE0pr8jKqLZ/kpLaZuUgIhAPkAQJWbFKqZZb/WkCgSaUyKx97RicWBZ4ZDYN4mdxIT",
    "cert": "-----BEGIN CERTIFICATE-----\nMIIDvTCCA0OgAwIBAgIUOb8+vATMILKl0HfomkAFriOjp7swCgYIKoZIzj0EAwMw\nNzEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MR4wHAYDVQQDExVzaWdzdG9yZS1pbnRl\ncm1lZGlhdGUwHhcNMjMwMjIxMDY0MzQ4WhcNMjMwMjIxMDY1MzQ4WjAAMFkwEwYH\nKoZIzj0CAQYIKoZIzj0DAQcDQgAE8KXd+1JbMe24zB95FUChMr+FxhsAM4h5WBUk\nMZORdto2rMT9dO90N+clEh2iTUW9LAVDH7xMIspe/yQQKGCi5qOCAmIwggJeMA4G\nA1UdDwEB/wQEAwIHgDATBgNVHSUEDDAKBggrBgEFBQcDAzAdBgNVHQ4EFgQU2WGb\n1fCHuiMBri6rqXPU6Sgw0wswHwYDVR0jBBgwFoAU39Ppz1YkEZb5qNjpKFWixi4Y\nZD8wgYQGA1UdEQEB/wR6MHiGdmh0dHBzOi8vZ2l0aHViLmNvbS9zbHNhLWZyYW1l\nd29yay9zbHNhLWdpdGh1Yi1nZW5lcmF0b3IvLmdpdGh1Yi93b3JrZmxvd3MvZ2Vu\nZXJhdG9yX2dlbmVyaWNfc2xzYTMueW1sQHJlZnMvdGFncy92MS40LjAwOQYKKwYB\nBAGDvzABAQQraHR0cHM6Ly90b2tlbi5hY3Rpb25zLmdpdGh1YnVzZXJjb250ZW50\nLmNvbTAVBgorBgEEAYO/MAECBAdyZWxlYXNlMDYGCisGAQQBg78wAQMEKGE5YzI4\nOThiOWFjOGYxM2U1NzA5MmUzZDUyODdlMDViOTQyZDQ1MzkwFQYKKwYBBAGDvzAB\nBAQHUmVsZWFzZTAjBgorBgEEAYO/MAEFBBVtaWhhaW1hcnVzZWFjL2hpbmRlbnQw\nHgYKKwYBBAGDvzABBgQQcmVmcy90YWdzL3Y2LjAuMDCBiQYKKwYBBAHWeQIEAgR7\nBHkAdwB1AN09MGrGxxEyYxkeHJlnNwKiSl643jyt/4eKcoAvKe6OAAABhnK2zZYA\nAAQDAEYwRAIgP+rdz8fCEUnPD/LuSzKYcCPpIP8m52WDJGrIU0sV7u0CIH3DOXyl\nfxDrWgge0FjB3Tc6wO3Ttfe6S8RHPh8AnzyVMAoGCCqGSM49BAMDA2gAMGUCMQDF\nPDjXGxOZlOWatkjXhnFWScQCeOHTI5DeF076KYBNrrfDjsC+7nNwGtdlbq/cLPQC\nMHCVeWACAolUxnB/SEmwTvptptHnmZoqojF3cgy5hlbyVm8kaK8UH03JSP9eVnAK\nPQ==\n-----END CERTIFICATE-----\n"
  }
]

You may also have noticed mentions of signature verification in the slsa-verifier output above, for example:

Verified signature against tlog entry index 13814544 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a52f1ea2f8ba724b794edc1f0744832f91ea9660ae052059d718dabdc04657d8e

All these attestations are signed using a short-lived certificate generated by Sigstore. These ephemeral certificates are stored in a transparency log (Rekor, the "tlog" mentioned in the slsa-verifier output), so other projects and people can observe when new ones get added, how long they are valid for, etc. This process is very similar to how Let’s Encrypt handles generating SSL certificates for most HTTPS sites in the world, including this one.
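One detail is worth spelling out: in DSSE, the signature does not cover the JSON envelope itself, but the Pre-Authentication Encoding (PAE) of payloadType and the decoded payload bytes. The Python sketch below computes those signed bytes; the helper names are hypothetical, and checking the ECDSA signature against the certificate’s public key would additionally require a cryptography library, which is omitted here.

```python
import base64
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 Pre-Authentication Encoding: the bytes that get signed.

    Format: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    where LEN is the decimal byte length.
    """
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def signed_bytes(envelope_line: str) -> bytes:
    """Hypothetical helper: PAE bytes for one .intoto.jsonl envelope."""
    env = json.loads(envelope_line)
    return pae(env["payloadType"], base64.b64decode(env["payload"]))


if __name__ == "__main__":
    print(pae("http://example.com/HelloWorld", b"hello world"))
    # b'DSSEv1 29 http://example.com/HelloWorld 11 hello world'
```

Binding payloadType into the signed bytes means an attacker cannot reinterpret a signed payload as a different kind of document without invalidating the signature.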

I am planning to go into more detail about Sigstore and the broader security concerns around the software supply chain in future articles. Stay tuned.

