| 1 | +* **RFC:** 2 |
| 2 | +* **Title:** Binary Distribution Format |
| 3 | +* **Slug:** `binary-distribution-format` |
| 4 | +* **Start Date:** 2024-06-18 |
| 5 | +* **Status:** Proposed Standard |
| 6 | +* **Pull Request:** [pgxn/rfcs#2](https://github.com/pgxn/rfcs/pull/2) |
| 7 | +* **Implementation Issue:** TBD |
| 8 | + |
| 9 | +# RFC--2 --- Binary Distribution Format |
| 10 | + |
| 11 | +## Abstract |
| 12 | + |
| 13 | +This RFC specifies the binary distribution format for [PGXN] packages, also |
| 14 | +called the trunk format.[^wheel] A trunk is a ZIP-format archive with a |
| 15 | +specially formatted file name and the `.trunk` extension. It contains a single |
| 16 | +distribution nearly as it would be installed by [PGXS]. Although a specialized |
| 17 | +installer is recommended, a trunk file may be installed by simply copying |
| 18 | +directories of files to destinations defined by [pg_config]. |
| 19 | + |
| 20 | +## Introduction |
| 21 | + |
| 22 | +Currently [PGXN] distributes only source code packages. Users wishing to |
| 23 | +install and use PGXN distributions must install build tools, including `make`, |
| 24 | +a compiler, and PostgreSQL development packages; then download, compile, and |
| 25 | +install the distribution. Many users do not have the expertise to follow these |
| 26 | +steps. Those wishing to use extensions in a production environment may not wish |
| 27 | +to include a compiler and tooling, let alone perform compilation, on a |
| 28 | +production host, and so must find an appropriate binary package or else create |
| 29 | +their own. |
| 30 | + |
| 31 | +The proposed binary distribution format, or "trunk", aims to provide |
| 32 | +pre-compiled PGXN distributions in a format that's straightforward to download |
| 33 | +and install in directories defined by [pg_config]. This format will serve as a |
| 34 | +building block for comprehensive extension packaging for multiple |
| 35 | +versions of PostgreSQL, CPU architectures, and, unlike other packaging |
| 36 | +systems, a diversity of operating systems, including Linux, macOS, various |
| 37 | +BSDs, and Windows. |
| 38 | + |
| 39 | +## Guide-level explanation |
| 40 | + |
| 41 | +TODO. |
| 42 | + |
| 43 | +<!-- |
| 44 | +Explain the proposal as if it already existed and you were teaching it to |
| 45 | +another extension developer. That generally means: |
| 46 | + |
| 47 | +* Introducing new named concepts. |
| 48 | +* Explaining the feature largely in terms of examples. |
| 49 | +* Explaining how extension programmers should *think* about the feature, and |
| 50 | + how it should impact the way they use PGXN. It should explain the impact |
| 51 | + as concretely as possible. |
| 52 | +* If applicable, provide sample error messages, deprecation warnings, or |
| 53 | + migration guidance. |
| 54 | +* If applicable, describe the differences between teaching this to existing |
| 55 | + extension programmers and new extension programmers. |
| 56 | +* Discuss how this impacts the ability to develop, build and distribute |
| 57 | + extension packages. |
| 58 | + --> |
| 59 | + |
| 60 | +## File Format |
| 61 | + |
| 62 | +### File name convention |
| 63 | + |
| 64 | +The trunk filename is: |
| 65 | + |
| 66 | +``` |
| 67 | +{package}-{version}+{pg}-{platform}.trunk |
| 68 | +``` |
| 69 | + |
| 70 | +Definition of variables: |
| 71 | + |
| 72 | +* `package`: Package name, e.g. `pgmq`, `postgis`, `pgAdmin`, `pg_top`. |
| 73 | +* `version`: Distribution version in [SemVer] format without build metadata, |
| 74 | + e.g., `0.8.6` or `1.0.0-beta`. |
| 75 | +* `pg`: Major version of Postgres the binary was built against, e.g., |
| 76 | + `pg15`, `pg16`. |
| 77 | +* `platform`: The platform the binary was built for. Will be made up of one |
| 78 | + to three hyphen-delimited[^hyphen] values for the OS, version |
| 79 | + information[^PEPs], and CPU architecture. Examples: `any`, |
| 80 | + `gnulinux-amd64`, `darwin-23.5.0-arm64`, `musllinux-1.2-amd64v3`. The |
| 81 | + allowed values will be defined in one or more separate RFCs. |
| 82 | + |
| 83 | +#### Examples: |
| 84 | + |
| 85 | +* `pgtap-1.0.1+pg15-any.trunk` packages `pgtap` version 1.0.1, compatible |
| 86 | + with Postgres 15 (any minor release) on any platform. |
| 87 | +* `pair-0.32.1+pg16-gnulinux-amd64.trunk` packages `pair` version 0.32.1, |
| 88 | + compatible with Postgres 16 (any minor release) on GNU libc-based Linux |
| 89 | + for amd64 CPUs. |
| 90 | +* `pair-0.32.1+pg16-darwin-23.5.0-arm64.trunk` packages `pair` version |
| 91 | + 0.32.1, compatible with Postgres 16 (any minor release) on Darwin version |
| 92 | + 23.5.0 (macOS) for arm64 CPUs. |
| 93 | + |
| 94 | +#### Escaping and Unicode |
| 95 | + |
| 96 | +The `+` in the file name indicates the division between the package name and |
| 97 | +version and the package metadata. The package name and version must not |
| 98 | +include a `+`. This allows the file name, without the `.trunk` extension, to |
| 99 | +also function as a valid [SemVer]. |
| 100 | + |
| 101 | +Tools producing trunks should verify that the filename components do not |
| 102 | +contain `+`, as the resulting file may not be processed correctly if it does. |
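
The check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification: the function name `trunk_filename` and its parameters are hypothetical, and the sketch validates only the `+` rule, leaving [SemVer] and platform validation to other code.

```python
def trunk_filename(package: str, version: str, pg: str, platform: str = "any") -> str:
    """Assemble a trunk file name, rejecting any component that contains '+'.

    Hypothetical helper: the RFC defines the name layout, not this function.
    """
    components = {
        "package": package,
        "version": version,
        "pg": pg,
        "platform": platform,
    }
    for label, value in components.items():
        if not value:
            raise ValueError(f"{label} must not be empty")
        if "+" in value:
            # A second '+' would make the split between name-version and
            # the Postgres/platform part ambiguous.
            raise ValueError(f"{label} must not contain '+': {value!r}")
    return f"{package}-{version}+{pg}-{platform}.trunk"
```

For example, `trunk_filename("pair", "0.32.1", "pg16", "gnulinux-amd64")` produces the second file name listed in the examples above.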
| 103 | + |
| 104 | +The package name should be lowercase. |
| 105 | + |
| 106 | +The file name components should all be UTF-8. |
| 107 | + |
| 108 | +The filenames *inside* the archive are encoded as UTF-8. Although some ZIP |
| 109 | +clients in common use do not properly display UTF-8 filenames, the encoding is |
| 110 | +supported by the ZIP specification. |
| 111 | + |
| 112 | +#### Parsing |
| 113 | + |
| 114 | +Parsing of the file name takes place in four parts: |
| 115 | + |
| 116 | +1. For the file name, remove the `.trunk` extension. If working with the |
| 117 | + directory name (prefix) extracted from the archive, there will be no |
| 118 | + `.trunk` extension. |
| 119 | + |
| 120 | +2. Split the name into two parts at the `+` sign. The left part is the |
| 121 | + package name and [SemVer]. The right part is the Postgres version and platform specification. |
| 122 | + |
| 123 | +3. For the left part, split on the right-most dash. If the string to the |
| 124 | + right of the dash is a valid [SemVer], then the left part is the package |
| 125 | + name. If the right string is not a valid [SemVer], try again at the second |
| 126 | + right-most dash and check again. Continue until a valid SemVer is produced |
| 127 | + or else fail. |
| 128 | + |
| 129 | +4. Split the right string on dashes. There will be between two and four |
| 130 | + values as follows: |
| 131 | + |
| 132 | + * Two: the postgres version (`pg16`) and `any`. |
| 133 | + * Three: the postgres version (`pg16`), the OS (`gnulinux`, `darwin`, |
| 134 | + etc.), and the architecture (`amd64`, `arm64`, etc.) |
| 135 | + * Four: the postgres version (`pg16`), the OS (`gnulinux`, `darwin`, |
| 136 | + etc.), the OS version (`23.5.0`) and the architecture (`amd64`, |
| 137 | + `arm64`, etc.) |
| 138 | + |
| 139 | +##### Examples: |
| 140 | + |
| 141 | +* `pgtap-1.0.1+pg15-any` |
| 142 | + * Package: `pgtap` |
| 143 | + * Version: `1.0.1` |
| 144 | + * Postgres: `pg15` |
| 145 | + * Platform: `any` |
| 146 | +* `pair-0.32.1-beta1+pg16-gnulinux-amd64` |
| 147 | + * Package: `pair` |
| 148 | + * Version: `0.32.1-beta1` |
| 149 | + * Postgres: `pg16` |
| 150 | + * OS: `gnulinux` |
| 151 | + * Architecture: `amd64` |
| 152 | +* `pair-0.32.1+pg16-darwin-23.5.0-arm64` |
| 153 | + * Package: `pair` |
| 154 | + * Version: `0.32.1` |
| 155 | + * Postgres: `pg16` |
| 156 | + * OS: `darwin` |
| 157 | + * OS Version: `23.5.0` |
| 158 | + * Architecture: `arm64` |
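
The four parsing steps can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than a reference parser: the function name `parse_trunk_name` and the dictionary keys are hypothetical, and the regular expression is adapted from the pattern suggested on semver.org with the build-metadata portion removed, since trunk versions carry none.

```python
import re

# SemVer 2.0.0 without build metadata (adapted from the semver.org pattern).
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)?"
)

# Field names for step 4, keyed by the number of dash-separated values.
FIELDS = {
    2: ("postgres", "platform"),
    3: ("postgres", "os", "arch"),
    4: ("postgres", "os", "os_version", "arch"),
}


def parse_trunk_name(name: str) -> dict:
    # Step 1: drop the extension, if present.
    stem = name[: -len(".trunk")] if name.endswith(".trunk") else name

    # Step 2: split at the single '+'.
    left, sep, right = stem.partition("+")
    if not sep or "+" in right:
        raise ValueError(f"expected exactly one '+' in {name!r}")

    # Step 3: walk dashes from the right until the tail is a valid SemVer.
    end = len(left)
    while True:
        dash = left.rfind("-", 0, end)
        if dash <= 0:
            raise ValueError(f"no valid SemVer found in {left!r}")
        if SEMVER.fullmatch(left[dash + 1:]):
            package, version = left[:dash], left[dash + 1:]
            break
        end = dash

    # Step 4: split the right part into two to four values.
    parts = right.split("-")
    fields = FIELDS.get(len(parts))
    if fields is None or (len(parts) == 2 and parts[1] != "any"):
        raise ValueError(f"unrecognized platform specification {right!r}")

    return {"package": package, "version": version, **dict(zip(fields, parts))}
```

Walking the dashes from the right is what lets a pre-release version such as `0.32.1-beta1` parse correctly: `beta1` alone is not a valid SemVer, so the sketch retries at the next dash to the left.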
| 159 | + |
| 160 | +### File contents |
| 161 | + |
| 162 | +The contents of a trunk file should unpack into a directory with the same name |
| 163 | +as the file, but without the `.trunk` extension. The contents of the directory |
| 164 | +are: |
| 165 | + |
| 166 | +* `trunk.json` contains metadata necessary to install the extension. The |
| 167 | + format will be subject to a future RFC, but at a minimum will include the |
| 168 | + trunk format version, package version, dependencies, license, language and |
| 169 | + runtime (e.g., libc implementation and version), platform metadata, and |
| 170 | + Postgres version and build configuration. A trunk installer should warn if |
| 171 | + the trunk version is greater than the version it supports, and must fail |
| 172 | + if the trunk version has a greater major version than the version it |
| 173 | + supports. |
| 174 | + |
| 175 | +* `digests` contains a list of (almost) all the files in the trunk and their |
| 176 | + secure hashes. Each line lists a single file and its checksum in the [BSD |
| 177 | + digest format]: `{algorithm} ({filename}) = {checksum}`. Every file except |
| 178 | + `digests` --- which cannot contain a hash of itself --- must be listed in |
| 179 | + this file. The cryptographic hash algorithm must be [SHA-256] or better; |
| 180 | + specifically, MD5 and SHA-1 are not permitted, as signed trunk files rely |
| 181 | + on the strong hashes in `digests` to validate the integrity of the |
| 182 | + archive. |
| 183 | + |
| 184 | +* The `pgsql` directory contains one or more subdirectories named for |
| 185 | + `pg_config` directory configurations: `bin`, `doc`, `html`, `include`, |
| 186 | + `pkginclude`, `lib`, `pkglib`, `locale`, `man`, `share`, and `sysconf`. |
| 187 | + Each contains the files to be installed in the corresponding `pg_config` |
| 188 | + directory. |
| 189 | + |
| 190 | +* Dynamic language scripts must appear in `pgsql/bin` and begin with exactly |
| 191 | + `#!{cmd}`, where `cmd` is the name of the interpreter, in order to enjoy |
| 192 | + script wrapper generation and shebang rewriting at install time. They must |
| 193 | + have no extension. The list of supported interpreters will depend on the |
| 194 | + features of the installer, but one can reasonably expect support for |
| 195 | + [Perl], [Python], and [Ruby]. If no appropriate instance of the given |
| 196 | + interpreter is present, the installer may abort the installation. |
| 197 | + |
| 198 | +* `README`, `LICENSE`, and `CHANGELOG` may optionally be in the directory. |
| 199 | + Each must be plain text or Markdown-formatted. In the latter case, they |
| 200 | + may use the extension `.md`. |
| 201 | + |
| 202 | +* Trunk, being an installation format intended to install pre-compiled |
| 203 | + binaries and supporting files, does not include a `Makefile`, `configure` |
| 204 | + file or any other tool for building the package contents. |
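
The trunk format version rule in the `trunk.json` item above (warn on a newer version, fail on a newer major version) might be implemented along these lines. The sketch assumes dotted numeric version strings such as `1.2`, which this RFC does not specify, and the name `check_trunk_version` is hypothetical.

```python
import warnings


def check_trunk_version(trunk_version: str, supported: str) -> None:
    """Warn if the trunk is newer than supported; fail on a newer major version.

    Assumes dotted numeric versions (e.g. "1.2"); the actual version syntax
    is left to a future RFC.
    """
    found = tuple(int(part) for part in trunk_version.split("."))
    ours = tuple(int(part) for part in supported.split("."))
    if found[0] > ours[0]:
        raise ValueError(
            f"trunk format {trunk_version} is not supported (installer supports {supported})"
        )
    if found > ours:
        warnings.warn(
            f"trunk format {trunk_version} is newer than supported version {supported}"
        )
```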
| 205 | + |
| 206 | +During extraction, trunk installers verify all the hashes in `digests` against |
| 207 | +the file contents. Apart from `digests` and its signatures, installation will |
| 208 | +fail if any file in the archive is not both mentioned and correctly hashed in |
| 209 | +`digests`. |
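
A minimal Python sketch of this verification follows. Several details are assumptions, because the RFC does not fix them: the algorithm labels (`SHA256`, `SHA384`, `SHA512`), file paths in `digests` being relative to the unpacked directory with `/` separators, and the function name `verify_digests`. Signature files are not handled here.

```python
import hashlib
import re
from pathlib import Path

# Assumed BSD-style labels mapped to hashlib names; MD5 and SHA-1 are excluded.
ALGORITHMS = {"SHA256": "sha256", "SHA384": "sha384", "SHA512": "sha512"}
DIGEST_LINE = re.compile(r"^(\S+) \((.+)\) = ([0-9a-fA-F]+)$")


def verify_digests(root: Path) -> None:
    """Raise ValueError unless every file under root matches its digests entry."""
    listed = {}
    for line in (root / "digests").read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        match = DIGEST_LINE.match(line)
        if match is None:
            raise ValueError(f"malformed digests line: {line!r}")
        label, filename, checksum = match.groups()
        if label not in ALGORITHMS:
            raise ValueError(f"unsupported hash algorithm: {label}")
        listed[filename] = (ALGORITHMS[label], checksum.lower())

    # Every file except digests itself must be listed, and vice versa.
    present = {
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    } - {"digests"}
    unlisted = present - set(listed)
    missing = set(listed) - present
    if unlisted:
        raise ValueError(f"files not listed in digests: {sorted(unlisted)}")
    if missing:
        raise ValueError(f"files listed in digests but absent: {sorted(missing)}")

    for filename, (algorithm, expected) in listed.items():
        actual = hashlib.new(algorithm, (root / filename).read_bytes()).hexdigest()
        if actual != expected:
            raise ValueError(f"digest mismatch for {filename}")
```

Checking the file sets before hashing means an unexpected or missing file is reported without reading every file first.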
| 210 | + |
| 211 | +## Details |
| 212 | + |
| 213 | +### Installing a Trunk |
| 214 | + |
| 215 | +The following descriptions will use a trunk file named |
| 216 | +`pair-0.32.1+pg16-gnulinux-amd64.trunk`. Trunk installation notionally |
| 217 | +consists of two phases: |
| 218 | + |
| 219 | +1. Unpack |
| 220 | + * Validate digests. Ensure that every file, aside from `digests` itself, |
| 221 | + is listed in `digests` along with its valid hash digest. If any file is |
| 222 | + missing or has an invalid digest, installation should fail. If a file |
| 223 | + listed in `digests` is not present, installation should fail. |
| 224 | + * Parse the `trunk.json` file. Check that the distribution is compatible |
| 225 | + with: |
| 226 | + * The trunk format version |
| 227 | + * The platform (OS, OS version, and architecture); `any` is allowed |
| 228 | + for any platform |
| 229 | + * The PostgreSQL version |
| 230 | +2. Install |
| 231 | + * If applicable, update scripts starting with `#!{cmd}` to point to the |
| 232 | + correct interpreter. Fail if no such interpreter is present. |
| 233 | + * Iterate over each subdirectory of the `pgsql` directory. |
| 234 | + * If the directory corresponds to a directory configuration from |
| 235 | + [pg_config], install its contents in that target directory. |
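
The copy portion of the install phase can be sketched in Python as below. The sketch assumes each `pgsql` subdirectory name maps to the `pg_config` option `--{name}dir` (for example `lib` to `--libdir`), which matches the options `pg_config` provides for the directories listed earlier. The function names are hypothetical, and shebang rewriting and digest validation are out of scope for this sketch.

```python
import shutil
import subprocess
from pathlib import Path

# Subdirectory names listed in the File contents section.
PG_CONFIG_DIRS = (
    "bin", "doc", "html", "include", "pkginclude", "lib",
    "pkglib", "locale", "man", "share", "sysconf",
)


def pg_config_dir(name: str, pg_config: str = "pg_config") -> Path:
    """Ask pg_config for a target directory, e.g. 'lib' -> `pg_config --libdir`."""
    result = subprocess.run(
        [pg_config, f"--{name}dir"], check=True, capture_output=True, text=True
    )
    return Path(result.stdout.strip())


def install_pgsql(trunk_root: Path, resolve=pg_config_dir) -> list:
    """Copy each recognized pgsql subdirectory into its pg_config destination.

    `resolve` is injectable so the copy logic can be exercised without a
    PostgreSQL installation.
    """
    installed = []
    for sub in sorted((trunk_root / "pgsql").iterdir()):
        if not sub.is_dir() or sub.name not in PG_CONFIG_DIRS:
            continue  # skip anything that is not a pg_config directory
        destination = resolve(sub.name)
        shutil.copytree(sub, destination, dirs_exist_ok=True)
        installed.append(destination)
    return installed
```

Passing a different `resolve` callable, such as one that returns staging directories, allows a dry run before touching the real PostgreSQL installation.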
| 236 | + |
| 237 | +## Drawbacks |
| 238 | + |
| 239 | +Many PostgreSQL extensions and applications are already distributed via |
| 240 | +well-tested and -maintained packaging systems, including the community [Yum] |
| 241 | +and [Apt] repositories. |
| 242 | + |
| 243 | +However, these systems serve a limited number of OSes; macOS and Windows, |
| 244 | +while served by their own packaging systems ([Homebrew] and [Chocolatey], |
| 245 | +among others), have access to fewer packages and are less integrated into |
| 246 | +community package distribution. |
| 247 | + |
| 248 | +[PGXN] aims to be the canonical repository for all publicly-available |
| 249 | +extensions, and to provide as many of them as possible in the same binary |
| 250 | +format to a variety of OSes. The trunk format is a key component for realizing |
| 251 | +that vision. |
| 252 | + |
| 253 | +## Rationale and alternatives |
| 254 | + |
| 255 | +This design is ideally suited to PostgreSQL extensions because it's built |
| 256 | +around the installation and configuration options provided by [pg_config]. |
| 257 | +This short list of directories into which to install appropriate distribution |
| 258 | +files is universal across OSes, and therefore suitable for distributing |
| 259 | +binaries for, ultimately, every OS supported by PostgreSQL itself. |
| 260 | + |
| 261 | +The alternatives available today include: |
| 262 | + |
| 263 | +* The community [Yum] and [Apt] repositories, which serve only Linux |
| 264 | + systems and require separate packages tied to the file layouts of those |
| 265 | + systems. The trunk format is OS-agnostic and provides files for any Linux |
| 266 | + distribution, regardless of the location of the PostgreSQL |
| 267 | + installation(s) on the file system. |
| 268 | +* [PGXMan] supports only Debian and Ubuntu Linux systems, and being |
| 269 | + downstream of the community [Apt] packages, is also dependent on their file |
| 270 | + layouts. Plans for macOS support have been promised, but the project |
| 271 | + has seen [little activity] in 2024. |
| 272 | +* [Trunk] inspired the design documented here and lent it its |
| 273 | + name. That format is limited to a few file types, and lacks support for |
| 274 | + multiple OSes and architectures. This RFC may be considered an evolution |
| 275 | + of that format. |
| 276 | +* [StackBuilder] has little visibility or penetration beyond [EDB] Windows |
| 277 | + customers. I am unable to find a public list of available extensions or a |
| 278 | + description of the packaging format or how to contribute to it. |
| 279 | + |
| 280 | +Without the trunk binary distribution format, it will be difficult to build |
| 281 | +and deliver cross-platform binary distribution of all the packages on PGXN. |
| 282 | + |
| 283 | +## Prior art |
| 284 | + |
| 285 | +The design of the trunk binary distribution format is inspired by the original |
| 286 | +[Trunk] format, which demonstrated a pattern for distributing extensions |
| 287 | +agnostic of file locations. This design may be considered an evolution of the |
| 288 | +[Trunk] registry format. |
| 289 | + |
| 290 | +The design was also heavily inspired by the [Python wheel] format. Although |
| 291 | +locations for installable files in the trunk format relate directly to |
| 292 | +[pg_config] directories, most of the other aspects of the design were borrowed |
| 293 | +from wheel, including the `digests` file and the `trunk.json` metadata file. |
| 294 | + |
| 295 | +## Unresolved questions |
| 296 | + |
| 297 | +* Should the archive format be Zip or tarball? PGXN has traditionally used |
| 298 | + Zip, since it's supported everywhere, including Windows. So does the |
| 299 | + [Python Wheel] format. But many other packaging systems use tarballs, |
| 300 | + including [Homebrew] and [OCI]. The emerging idea to [distribute trunks |
| 301 | + via OCI registries] may favor tarballs. |
| 302 | +* The list of platforms to support and the strings to indicate them, |
| 303 | + including CPU alternatives, will be defined in a forthcoming RFC. |
| 304 | + |
| 305 | +## Future possibilities |
| 306 | + |
| 307 | +Some other ideas for the format, in either the short or long term: |
| 308 | + |
| 309 | +* Adopt the [Python wheel signing pattern] |
| 310 | +* Include an [SPDX SBOM](https://spdx.dev)? |
| 311 | +* Allow non-postgres libraries to be included, such as OS dependencies, |
| 312 | + either in the appropriate `pgsql` subdirectory or perhaps in a separate |
| 313 | + `sys` directory |
| 314 | + |
| 315 | +## References |
| 316 | + |
| 317 | +* [Python Binary distribution format][Python wheel] |
| 318 | +* [trunk POC] |
| 319 | +* [Previous discussion] |
| 320 | + |
| 321 | + [^wheel]: With much inspiration from and gratitude to the [Python wheel] |
| 322 | + format. |
| 323 | + [^hyphen]: Why hyphens? They allow the entire file name, between the package |
| 324 | + name and the `.trunk` extension, to be a valid [SemVer]. |
| 325 | + [^PEPs]: See for example [PEP 600] defining Python wheel tags for different |
| 326 | + versions of GNU libc and [PEP 656] defining tags for different versions of |
| 327 | + musl libc. See also how [Homebrew] uses [macOS version names] in file |
| 328 | + names for its packages. |
| 329 | + |
| 330 | + [PGXN]: https://pgxn.org "PostgreSQL Extension Network" |
| 331 | + [PGXS]: https://www.postgresql.org/docs/current/extend-pgxs.html |
| 332 | + "PostgreSQL Docs: Extension Building Infrastructure" |
| 333 | + [pg_config]: https://www.postgresql.org/docs/current/app-pgconfig.html |
| 334 | + "PostgreSQL Docs: pg_config" |
| 335 | + [Python wheel]: https://packaging.python.org/en/latest/specifications/binary-distribution-format/ |
| 336 | + [SemVer]: https://semver.org "Semantic Versioning 2.0.0" |
| 337 | + [PEP 600]: https://peps.python.org/pep-0600/ |
| 338 | + "PEP 600 – Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions" |
| 339 | + [PEP 656]: https://peps.python.org/pep-0656/ |
| 340 | + "PEP 656 – Platform Tag for Linux Distributions Using Musl" |
| 341 | + [Homebrew]: https://brew.sh "Homebrew: The Missing Package Manager for macOS (or Linux)" |
| 342 | + [macOS version names]: https://github.com/oras-project/oras/issues/237#issuecomment-815250008 |
| 343 | + "oras-project/oras#237 Comment from sjackman" |
| 344 | + [BSD digest format]: https://stackoverflow.com/q/1299833/79202 |
| 345 | + [SHA-256]: https://en.wikipedia.org/wiki/SHA-2 "Wikipedia: SHA-2" |
| 346 | + [Perl]: https://perl.org "The Perl Programming Language" |
| 347 | + [Python]: https://python.org "The Python Programming Language" |
| 348 | + [Ruby]: https://ruby-lang.org/en/ "The Ruby Programming Language" |
| 349 | + [Yum]: https://yum.postgresql.org "PostgreSQL Yum Repository" |
| 350 | + [Apt]: https://wiki.postgresql.org/wiki/Apt "PostgreSQL packages for Debian and Ubuntu" |
| 352 | + [Chocolatey]: https://chocolatey.org "The Package Manager for Windows" |
| 353 | + [PGXMan]: https://pgxman.com "npm for PostgreSQL" |
| 354 | + [little activity]: https://github.com/pgxman/buildkit/commits/main/?since=2024-01-01&until=2024-07-11 |
| 355 | + [Trunk]: https://pgt.dev "Trunk is an open-source package installer and registry for PostgreSQL extensions" |
| 356 | + [StackBuilder]: https://www.enterprisedb.com/docs/supported-open-source/postgresql/installing/using_stackbuilder/ |
| 357 | + [EDB]: https://www.enterprisedb.com "Enterprise DB" |
| 358 | + [OCI]: https://github.com/opencontainers/image-spec/blob/main/media-types.md |
| 359 | + "OCI Image Media Types" |
| 360 | + [distribute trunks via OCI registries]: https://justatheory.com/2024/06/trunk-oci-poc/ |
| 361 | + "POC: Distributing Trunk Binaries via OCI" |
| 362 | + [Python wheel signing pattern]: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#signed-wheel-files |
| 363 | + "Python Binary distribution format: Signed wheel files" |
| 364 | + [trunk POC]: https://gist.github.com/theory/7dc164e5772cae652d838a1c508972ae |
| 365 | + "trunk POC using PGXS, bash, tar, shasum, and jq" |
| 366 | + [Previous discussion]: https://github.com/orgs/pgxn/discussions/2 |
| 367 | + "Proposal: Binary Distribution Format" |