Supply chain attacks are a known issue, and lately there has also been a discussion around the relevance of reproducible builds. Looking in comparison at an average IT org doing something with the internet, I believe the pressing problem is neither supply chain attacks nor a lack of reproducible builds. The real problem is the sheer amount of prefabricated binaries, supplied by someone else and created in an unknown build environment with unknown tools, that the average IT org requires to do anything.

The Mess the World Runs on

By chance I had an opportunity to look at what some other people I know use, and here is the list I could compile by just scratching the surface:

  • 80% of what HashiCorp releases: Vagrant, packer, nomad, terraform, just all of it. In the case of terraform of course with a bunch of providers, and for Vagrant with machine images from the official registry.
  • Lots of ansible use cases, usually retrieved via pip.
  • Jenkins + a myriad of plugins from the Jenkins plugin registry.
  • All the tools/SDKs of the cloud provider du jour to interface with the cloud. Mostly via a 3rd-party Debian repository.
  • docker (the repo for dockerd) and DockerHub
  • Mobile SDKs.
  • Kafka fetched somewhere from apache.org.
  • Binary downloads from github. Many. Go and Rust make it possible.
  • Elastic, more or less the whole stack they offer via their Debian repo.
  • Postgres + the tools around it from the apt.postgresql.org Debian repo.
  • archive.debian.org because it's hard to keep up at times.
  • Maven Central.

Of course all the script language repos - Python, Ruby, Node/Typescript - are around as well.

Looking at myself, working in a different IT org but with a similar focus, I have the following lingering around on my work laptop, retrieved as binaries from a 3rd party:

  • dockerd from the docker repo
  • vscode from the microsoft repo
  • vivaldi from the vivaldi repo
  • Google Cloud SDK from the google repo
  • terraform + all the providers from hashicorp
  • govc from github
  • containerdiff from github (yes, by now included in Debian main)
  • github gh cli tool from github
  • wtfutil from github

Yes, some of that is even non-free and might contain spyw^Wtelemetry.

Takeaway I

Guessing based on the Pareto Principle, probably 80% of the software mentioned above is also open source software. But, and here we leave Pareto behind, close to none of it is built by the average IT org from source.

Why should the average IT org care about advanced issues like supply chain attacks on source code and mitigations, when it already gets into very hot water the day DockerHub closes down, HashiCorp moves from open core to full proprietary or Elastic decides to no longer offer free binary builds?

The reality out there seems to be that the infrastructure of "modern" IT orgs is managed similarly to the Windows 95 installation of my childhood: you just grab binaries from somewhere and run them. The main difference seems to be that you no longer have the inconvenience of downloading a .xls from geocities that you have to rename to .rar, and that it's legal.

Takeaway II

In the end, the binary supply is like a drug for the user, and somehow the Debian project is also just another dealer / middleman in this setup. There are probably a lot of open questions to think about in that context.

Are we the better dealer because we care about signed sources we retrieve from upstream and because we engage in reproducible build projects?

Are our own means of distributing binaries any better than a binary download from github via https with a manual checksum verification, or the Debian repo at download.docker.com?
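To make that comparison concrete, here is a minimal sketch in Python of what such a manual checksum verification usually boils down to. The release URLs, the asset name and the SHA256SUMS layout are hypothetical placeholders, not a real project.

    #!/usr/bin/env python3
    """Minimal sketch of a 'manual checksum verification' for a release
    binary downloaded via https. All URLs and file names are hypothetical."""

    import hashlib
    import sys
    import urllib.request

    # Hypothetical release asset and its published SHA256SUMS file.
    BINARY_URL = "https://github.com/example/tool/releases/download/v1.0.0/tool_linux_amd64"
    SUMS_URL = "https://github.com/example/tool/releases/download/v1.0.0/SHA256SUMS"
    ASSET_NAME = "tool_linux_amd64"

    def fetch(url: str) -> bytes:
        """Download a URL over https; TLS is the only integrity guarantee here."""
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def main() -> int:
        binary = fetch(BINARY_URL)
        sums = fetch(SUMS_URL).decode()

        # Digest of what we actually downloaded.
        local_digest = hashlib.sha256(binary).hexdigest()

        # Look up the expected digest in the SHA256SUMS file,
        # assuming the common "<hexdigest>  <filename>" line format.
        expected = None
        for line in sums.splitlines():
            digest, _, name = line.partition("  ")
            if name.strip() == ASSET_NAME:
                expected = digest.strip()
                break

        if expected is None:
            print("asset not listed in SHA256SUMS", file=sys.stderr)
            return 1
        if local_digest != expected:
            print("checksum mismatch, refusing to install", file=sys.stderr)
            return 1

        # Both files came from the same host over the same channel, so this
        # only catches a corrupted download, not a compromised release.
        with open(ASSET_NAME, "wb") as f:
            f.write(binary)
        print("checksum ok")
        return 0

    if __name__ == "__main__":
        sys.exit(main())

As the last comment hints, the binary and its checksum travel over the same https channel from the same host, so this mostly guards against transfer corruption rather than a malicious release, which is the kind of gap a signed repository is supposed to close.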

Is the approach of the BSD/Gentoo ports, where you have to compile at least some software from source, the better one?

Do I really want to know how some of the software is actually built?

Or some more candid ones, like: is gnutls a good choice for the https support in apt, and how solid is the gnupg code base? Update: Regarding apt there seems to be some movement.