New hindent release comes with SLSA guarantees
Through a series of events some years ago, I became the maintainer of hindent, one of the first pretty printers for Haskell. It is an autoformatter, similar to tools found in other language ecosystems (e.g., Black for Python).
Earlier this week, I released hindent-6.0.0. The new release bumps the major version mostly due to the change of parser from haskell-src-exts to ghc-lib-parser. This makes hindent more resilient to GHC changes, since ghc-lib-parser is decoupled from the compiler.
While I will write a longer post about hindent at a later time, this post is about another feature added to this release, one that I intend to add to every package I maintain: the release also provides SLSA attestations for the source distribution and the binary. This blog post explains, at a very high level (I am planning a series of deep dives in future articles), what all of this means, why we need it, and how to use the attestations to validate the supply chain involved in building the release.
What are the SLSA attestations?
So, let’s begin. First, let’s look at what is being offered. If you go to the release page, you will find 7 assets listed at the bottom of the page, as shown below:
Going from the bottom, two of the artifacts, Source code (zip) and Source code (tar.gz), are generated automatically by GitHub upon creating the release tag. These are supposed to contain the code as it was at the tag reference, using git archive. For a long time, it was assumed that these are stable archives, but recent events have shown that this assumption is not valid. Since we can make no guarantees about them, let's ignore them for the rest of the post.
There are 2 other tarballs added to the release page as part of the release process. These are the tarball with the source code (that is, hindent-6.0.0.tar.gz) and the generated documentation (i.e., hindent-6.0.0-docs.tar.gz). These are stable and will never change (barring manual intervention). The first tarball is generated by cabal sdist and is also uploaded to the Hackage page of the release via cabal upload. It is different from the tarball provided by GitHub (mentioned in the previous paragraph): it only contains the files listed in the project's Cabal file, as these are the only ones that will get uploaded to Hackage.
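For reference, the commands involved are roughly the following (a minimal sketch; the actual release workflow automates these steps, and the exact output paths and flags are worth double-checking against the cabal documentation):
$ cabal sdist                                                       # writes dist-newstyle/sdist/hindent-6.0.0.tar.gz
$ cabal upload --publish dist-newstyle/sdist/hindent-6.0.0.tar.gz   # without --publish, cabal uploads a package candidate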
The docs tarball is optional, but recommended. Although the Hackage infrastructure can generate the documentation for most packages, it is sometimes recommended to build it manually and upload the corresponding tarball. Since I did that for hindent, I also uploaded it to the release page.
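A rough sketch of how such a documentation tarball can be produced and uploaded (again, flag names and output locations should be verified against the cabal-install version in use):
$ cabal haddock --haddock-for-hackage        # builds hindent-6.0.0-docs.tar.gz under dist-newstyle/
$ cabal upload --documentation --publish dist-newstyle/hindent-6.0.0-docs.tar.gz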
Next, we have one binary file, hindent, the executable associated with the project. In general, this is compiled via cabal build (or stack build, etc.) and is tied to the operating system and the platform it is being built on. However, since hindent is a code formatter, I envision creating a GitHub action to automatically format / check code formatting on PRs for any other project. In that case, instead of waiting for the entire compilation process to finish for the tool, we can reuse the same executable by specifying the same runs-on: in the corresponding workflows. This is future work; for now we just have the binary.
Finally, there are 2 jsonl files. These represent the SLSA attestations for the tarballs (i.e., source-distribution.intoto.jsonl) and for the binary (i.e., executables.intoto.jsonl). These are the main focus of this article. We could have generated a single attestation to cover both the binary and the tarballs, but we will see later in the article that there is a difference in what can be determined about these in the absence of SLSA attestations. In other words, the two files generate different amounts of security uplift, even though they have similar structures. Before going deeper into this, let's first look at the contents of these files.
What do the attestations contain?
The attestations are JSON files, so we can inspect them using jq. The root document is a DSSE envelope containing 3 keys:
$ cat source-distribution.intoto.jsonl | jq keys
[
"payload",
"payloadType",
"signatures"
]
Let's defer discussing the signatures section until later in the article and also skip over the payloadType contents (which is just "application/vnd.in-toto+json"; I don't plan to discuss in-toto in this article).
The payload field is actually a base64-encoded JSON document with 4 keys:
$ cat source-distribution.intoto.jsonl | jq -r .payload | base64 -d | jq keys
[
"_type",
"predicate",
"predicateType",
"subject"
]
The _type and predicateType fields are just simple constants that specify that this document is a SLSA attestation, so we'll skip over them.
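For completeness, they can be displayed the same way as the other fields (the values are versioned URIs identifying the in-toto statement format and the SLSA provenance predicate; I'm omitting the exact output here):
$ cat source-distribution.intoto.jsonl | jq -r .payload | base64 -d | jq '._type, .predicateType'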
The subject field documents what the attestation talks about:
$ cat source-distribution.intoto.jsonl | jq -r .payload | base64 -d | jq .subject
[
{"name": "hindent-6.0.0-docs.tar.gz",
"digest": {
"sha256": "0f19dc56cb10447bb396caf28a3f369a4b1e88e6987901e904d1a97804a4b7c0"
}
},
{
"name": "hindent-6.0.0.tar.gz",
"digest": {
"sha256": "7fa9eb4ad8f767fe9608f1e01c0ba7a90a999c8efc0f6ed7d8dfe24f965cf39e"
}
}
]
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .subject
[
{"name": ".execs/hindent",
"digest": {
"sha256": "49413fe4e6b71cde476464fbfd9f2ebd43ab2fa2838193a848d7c7878346ac86"
}
}
]
We see the 2 tarballs in the source-distribution attestation and the executable in the executables one. Each subject is also paired with its digest, in this case using the SHA-256 algorithm (and we can verify that these are correct by running sha256sum over the corresponding files).
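For example, a quick manual check on the source tarball (a minimal sketch, assuming the tarball and the attestation are in the current directory; the expected digest is the one listed in the subject above):
$ cat source-distribution.intoto.jsonl | jq -r .payload | base64 -d | jq -r '.subject[] | select(.name == "hindent-6.0.0.tar.gz") | .digest.sha256'
7fa9eb4ad8f767fe9608f1e01c0ba7a90a999c8efc0f6ed7d8dfe24f965cf39e
$ sha256sum hindent-6.0.0.tar.gz
7fa9eb4ad8f767fe9608f1e01c0ba7a90a999c8efc0f6ed7d8dfe24f965cf39e  hindent-6.0.0.tar.gz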
The important part of all these attestations is stored in the predicate field. For SLSA, the contents of the predicates are specified in the SLSA specification. In our case, we have:
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate | jq keys
[
"buildType",
"builder",
"invocation",
"materials",
"metadata"
]
The builder and buildType identify that the attestation has been generated via the SLSA GitHub Generator action, using the v1.4.0 version of the generator.
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.buildType
"https://github.com/slsa-framework/slsa-github-generator/generic@v1"
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.builder
{
"id": "https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0"
}
This is a reusable workflow that cannot be modified by the owner of the repository generating the attestation. This means I am not able to control both the binary and the attestation at the same time. Keep this property in mind, as it will be useful later.
The invocation field is quite long, so I'll only show some relevant parts of it:
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.invocation
{
"configSource": {
"uri": "git+https://github.com/mihaimaruseac/hindent@refs/tags/v6.0.0",
"digest": {
"sha1": "a9c2898b9ac8f13e57092e3d5287e05b942d4539"
},
"entryPoint": ".github/workflows/release.yaml"
},
"environment": {
"github_event_name": "release",
"github_event_payload": {
"release": {
"author": {
"id": 323199,
"login": "mihaimaruseac",
},
"body": "## Major changes\r\n* The file parser ...",
"tag_name": "v6.0.0",
"target_commitish": "master",
},
"repository": {
"description": "Haskell pretty printer",
"git_url": "git://github.com/mihaimaruseac/hindent.git",
},
},
"github_ref": "refs/tags/v6.0.0",
"github_run_id": "4229863410",
"github_run_number": "2",
"github_sha1": "a9c2898b9ac8f13e57092e3d5287e05b942d4539"
}
}
The field attests to the following (a small jq sketch for extracting some of these values follows the list):
- the artifact has been created and signed in a GitHub Actions run triggered by a release event, with the workflow defined in .github/workflows/release.yaml
- the release has tag v6.0.0, created from what the master branch was at the time, tagging commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
- the author triggering the release workflow is myself; the provenance file stores both the username in human-readable form and the id field associated with it, which allows tracing the author even if I were to change my username now
- the provenance file has been generated in the second run of the workflow, documenting the fact that the first run flaked and I retriggered the build
- the text of the release is also present in the body field, so changing it might also invalidate the provenance
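As promised, here is a minimal jq sketch for pulling a few of these values out of the provenance (the field names are the ones visible in the invocation dump above):
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq -r '.predicate.invocation.environment | .github_ref, .github_run_number, .github_event_payload.release.author.login'
refs/tags/v6.0.0
2
mihaimaruseac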
The metadata field is short but still provides one essential piece of information: the tarballs and the binary were not built in a way that guarantees that rebuilding them would result in the exact same contents, byte for byte:
$ cat executables.intoto.jsonl | jq -r .payload | base64 -d | jq .predicate.metadata
{
"buildInvocationID": "4229863410-2",
"completeness": {
"parameters": true,
"environment": false,
"materials": false
},
"reproducible": false
}
As we can see, the provenance documents link the artifacts to the way they were produced, providing enough information to trace the entire build process and determine its properties.
Why are the SLSA attestations useful?
Having this provenance in a human-readable format allows humans to look at it and make policy decisions based on the amount of trust and the security policies existing on the user side. While likely not the case here, you can imagine a very security-conscious company saying that it only allows running software that has been built by a certain set of users, in a reproducible way, etc.
Even a normal user could take the binary or the source tarball from the release page, compute its checksum, and check manually that it matches the digest specified in the provenance file. This is in general useful for the binary, but not so much for the tarballs.
Although you could download the source tarball from the Hackage page and get exactly the same hash (since I uploaded the tarballs generated by the release action), this is not something that currently happens in Haskell. In general, when one runs cabal install or cabal unpack, the tarball is downloaded directly from Hackage via HTTPS. Attacking the entire Hackage infrastructure to change a single project is not feasible. Interested persons can run cabal unpack to download and unpack the source code and then compare it with the state of the repository at the tagged commit. So, an immediate question arises: maybe there is no need for a provenance file for the tarballs?
That could have been true if the provenance only listed the checksum of the tarball and nothing else. The presence of all the other information allows one to examine every other relevant aspect of the software supply chain.
Furthermore, all of this is supposed to be verified via automation. The JSON documents are both human readable and machine readable, so CISOs and the like could create security policies that look at specific fields in the provenance and decide whether the software is allowed to be used within a company. But even casual users can benefit from this.
Enter slsa-verifier. It is a Go program that verifies the provenance associated with a (set of) path(s). For our case, we first need the tool itself, either by cloning its repository, building, and installing the code, or by installing it directly with the Go toolchain.
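A minimal sketch of the direct installation (the module path is the one the project documents for its v2 releases, but it is worth double-checking against the slsa-verifier README):
$ go install github.com/slsa-framework/slsa-verifier/v2/cli/slsa-verifier@latest
With the tool installed and the release artifacts and provenance files downloaded locally, we can run the following commands: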
$ slsa-verifier verify-artifact --provenance-path executables.intoto.jsonl --source-uri github.com/mihaimaruseac/hindent hindent
Verified signature against tlog entry index 13814736 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a649f9ebb0ca1c39cbd59e3a7db1529c74720f2cfa10a21fdfee90137fafbe5bc
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent: PASSED
PASSED: Verified SLSA provenance
$ slsa-verifier verify-artifact --provenance-path source-distribution.intoto.jsonl --source-uri github.com/mihaimaruseac/hindent hindent*.tar.gz
Verified signature against tlog entry index 13814544 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a52f1ea2f8ba724b794edc1f0744832f91ea9660ae052059d718dabdc04657d8e
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent-6.0.0-docs.tar.gz: PASSED
Verified signature against tlog entry index 13814544 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a52f1ea2f8ba724b794edc1f0744832f91ea9660ae052059d718dabdc04657d8e
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent-6.0.0.tar.gz: PASSED
PASSED: Verified SLSA provenance
We have verified both the binary and the tarballs from the release page! We can do the same for the tarball downloaded from Hackage, if we want to make sure it is exactly the one that got uploaded:
$ slsa-verifier verify-artifact --provenance-path source-distribution.intoto.jsonl --source-uri github.com/mihaimaruseac/hindent hindent-6.0.0-hackage.tar.gz
Verified signature against tlog entry index 13814544 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a52f1ea2f8ba724b794edc1f0744832f91ea9660ae052059d718dabdc04657d8e
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.4.0 at commit a9c2898b9ac8f13e57092e3d5287e05b942d4539
Verifying artifact hindent-6.0.0-hackage.tar.gz: PASSED
PASSED: Verified SLSA provenance
This could be incorporated into automation anywhere. While, as mentioned above, there is not much uplift for the source tarball, there is real uplift for the binary: it will matter once hindent becomes available as a GitHub action that can run on another repository's presubmit without needing to compile the code first, as mentioned earlier in the article. If there were no way to verify the provenance of the binary in an automated way, someone with access to the hindent repository could have replaced the binary at any moment in time and thus taken control over all GitHub Actions workflows that use the binary!
To see how this could be prevented, consider the case of this test release, which I altered some weeks after publishing it. (In the meantime, the SLSA provenance changed slightly, so the slsa-verifier invocation below needs a different value for the source-uri flag.) Downloading all the artifacts and the provenance files and running slsa-verifier results in the following:
$ slsa-verifier verify-artifact --provenance-path built-with-stack-attestation.intoto.jsonl --source-uri git+https://github.com/mihaimaruseac/slsa-lvl3-generic-provenance-in-haskell-example built-with-stack
Verified signature against tlog entry index 9712595 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77a5b9492fe82102c6fa4d5805e5fc35b6f15d0f424ab307f488efd1aecfcc893de
Verifying artifact built-with-stack: FAILED: expected hash '09129e587b40e52db87e1df1bc75989a3f272df290d49b7349fcd4ea6afacb0f' not found: artifact hash does not match provenance subject
FAILED: SLSA verification failed: expected hash '09129e587b40e52db87e1df1bc75989a3f272df290d49b7349fcd4ea6afacb0f' not found: artifact hash does not match provenance subject
exit status 1
Trying to be sneaky failed here. When I changed the built-with-stack binary on the release page (by deleting the existing one and uploading built-with-cabal under the same name), I could not regenerate the provenance file. This is by design, and it is what guarantees that the provenance is unfalsifiable.
How does this work?
There was one field of the provenance that we have not looked at yet. This is the signatures field of the DSSE envelope:
$ cat source-distribution.intoto.jsonl | jq .signatures
[
{"keyid": "",
"sig": "MEYCIQDfY8cD3xVVcCcbJ/TIuSEEcvRE0pr8jKqLZ/kpLaZuUgIhAPkAQJWbFKqZZb/WkCgSaUyKx97RicWBZ4ZDYN4mdxIT",
"cert": "-----BEGIN CERTIFICATE-----\nMIIDvTCCA0OgAwIBAgIUOb8+vATMILKl0HfomkAFriOjp7swCgYIKoZIzj0EAwMw\nNzEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MR4wHAYDVQQDExVzaWdzdG9yZS1pbnRl\ncm1lZGlhdGUwHhcNMjMwMjIxMDY0MzQ4WhcNMjMwMjIxMDY1MzQ4WjAAMFkwEwYH\nKoZIzj0CAQYIKoZIzj0DAQcDQgAE8KXd+1JbMe24zB95FUChMr+FxhsAM4h5WBUk\nMZORdto2rMT9dO90N+clEh2iTUW9LAVDH7xMIspe/yQQKGCi5qOCAmIwggJeMA4G\nA1UdDwEB/wQEAwIHgDATBgNVHSUEDDAKBggrBgEFBQcDAzAdBgNVHQ4EFgQU2WGb\n1fCHuiMBri6rqXPU6Sgw0wswHwYDVR0jBBgwFoAU39Ppz1YkEZb5qNjpKFWixi4Y\nZD8wgYQGA1UdEQEB/wR6MHiGdmh0dHBzOi8vZ2l0aHViLmNvbS9zbHNhLWZyYW1l\nd29yay9zbHNhLWdpdGh1Yi1nZW5lcmF0b3IvLmdpdGh1Yi93b3JrZmxvd3MvZ2Vu\nZXJhdG9yX2dlbmVyaWNfc2xzYTMueW1sQHJlZnMvdGFncy92MS40LjAwOQYKKwYB\nBAGDvzABAQQraHR0cHM6Ly90b2tlbi5hY3Rpb25zLmdpdGh1YnVzZXJjb250ZW50\nLmNvbTAVBgorBgEEAYO/MAECBAdyZWxlYXNlMDYGCisGAQQBg78wAQMEKGE5YzI4\nOThiOWFjOGYxM2U1NzA5MmUzZDUyODdlMDViOTQyZDQ1MzkwFQYKKwYBBAGDvzAB\nBAQHUmVsZWFzZTAjBgorBgEEAYO/MAEFBBVtaWhhaW1hcnVzZWFjL2hpbmRlbnQw\nHgYKKwYBBAGDvzABBgQQcmVmcy90YWdzL3Y2LjAuMDCBiQYKKwYBBAHWeQIEAgR7\nBHkAdwB1AN09MGrGxxEyYxkeHJlnNwKiSl643jyt/4eKcoAvKe6OAAABhnK2zZYA\nAAQDAEYwRAIgP+rdz8fCEUnPD/LuSzKYcCPpIP8m52WDJGrIU0sV7u0CIH3DOXyl\nfxDrWgge0FjB3Tc6wO3Ttfe6S8RHPh8AnzyVMAoGCCqGSM49BAMDA2gAMGUCMQDF\nPDjXGxOZlOWatkjXhnFWScQCeOHTI5DeF076KYBNrrfDjsC+7nNwGtdlbq/cLPQC\nMHCVeWACAolUxnB/SEmwTvptptHnmZoqojF3cgy5hlbyVm8kaK8UH03JSP9eVnAK\nPQ==\n-----END CERTIFICATE-----\n"
}
]
You can also see a mention of signature verification in the slsa-verifier output above, for example:
All these attestations are signed using a short-lived certificate generated by Sigstore. These ephemeral certificates are stored in a transparency log so that other projects and people can observe when new ones get added, how long they are valid for, etc. This process is very similar to how Let's Encrypt handles generating SSL certificates for most HTTPS sites in the world, including this one.
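As an illustration, the certificate embedded in the DSSE envelope can be inspected directly (a quick sketch, assuming openssl is available; the certificate is the one shown in the signatures field above). Among its extensions you can spot the identity of the reusable workflow that performed the signing:
$ cat source-distribution.intoto.jsonl | jq -r '.signatures[0].cert' | openssl x509 -noout -text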
I am planning to go into more detail about Sigstore and the broader security concerns regarding the software supply chain in future articles. Stay tuned.