Commit ccde190

committed
RFC: Binary Distribution Format
Add a new RFC describing the proposed trunk binary distribution format for PGXN packages. Inspired by Python wheel and pgt.dev, aiming to support binaries for every OS and architecture supported by PostgreSQL itself, as well as many versions of PostgreSQL.
1 parent 13310e2 commit ccde190

1 file changed

+367
-0
lines changed

@@ -0,0 +1,367 @@
1+
* **RFC:** 2
2+
* **Title:** Binary Distribution Format
3+
* **Slug:** `binary-distribution-format`
4+
* **Start Date:** 2024-06-18
5+
* **Status:** Proposed Standard
6+
* **Pull Request:** [pgxn/rfcs#2](https://github.com/pgxn/rfcs/pull/2)
7+
* **Implementation Issue:** TBD
8+
9+
# RFC--2 --- Binary Distribution Format
10+
11+
## Abstract
12+
13+
This RFC specifies the binary distribution format for [PGXN] packages, also
14+
called the trunk format.[^wheel] A trunk is a ZIP-format archive with a
15+
specially formatted file name and the `.trunk` extension. It contains a single
16+
distribution nearly as it would be installed by [PGXS]. Although a specialized
17+
installer is recommended, a trunk file may be installed by simply copying
18+
directories of files to destinations defined by [pg_config].
19+
20+
## Introduction
21+
22+
Currently [PGXN] distributes only source code packages. Users wishing to
23+
install and use PGXN distributions must install build tools, including `make`,
24+
a compiler, and PostgreSQL development packages; then download, compile, and
25+
install the distribution. Many users do not have the expertise to follow these
26+
steps. Those wishing to use extensions in a production environment may not wish
27+
to include a compiler and tooling, let alone perform compilation, on a
28+
production host, and so must find an appropriate binary package or else create
29+
their own.
30+
31+
The proposed binary distribution format, or "trunk", aims to provide
32+
pre-compiled PGXN distributions in a format that's straightforward to download
33+
and install in directories defined by [pg_config]. This format will serve as a
34+
building block for comprehensive extension packaging for multiple
35+
versions of PostgreSQL, CPU architectures, and --- unlike other packaging
36+
systems --- a diversity of operating systems, including Linux, macOS, various
37+
BSDs, and Windows.
38+
39+
## Guide-level explanation
40+
41+
TODO.
42+
43+
<!--
44+
Explain the proposal as if it already existed and you were teaching it to
45+
another extension developer. That generally means:
46+
47+
* Introducing new named concepts.
48+
* Explaining the feature largely in terms of examples.
49+
* Explaining how extension programmers should *think* about the feature, and
50+
how it should impact the way they use PGXN. It should explain the impact
51+
as concretely as possible.
52+
* If applicable, provide sample error messages, deprecation warnings, or
53+
migration guidance.
54+
* If applicable, describe the differences between teaching this to existing
55+
extension programmers and new extension programmers.
56+
* Discuss how this impacts the ability to develop, build and distribute
57+
extension packages.
58+
-->
59+
60+
## File Format
61+
62+
### File name convention
63+
64+
The trunk filename is:
65+
66+
```
67+
{package}-{version}+{pg}-{platform}.trunk
68+
```
69+
70+
Definition of variables:
71+
72+
* `package`: Package name, e.g. `pgmq`, `postgis`, `pgAdmin`, `pg_top`.
73+
* `version`: Distribution version in [SemVer] format without build metadata,
74+
e.g., `0.8.6` or `1.0.0-beta`.
75+
* `pg`: Major version of Postgres the binary was built against, e.g.,
76+
`pg15`, `pg16`.
77+
* `platform`: The platform the binary was built for. Will be made up of one
78+
to three hyphen-delimited[^hyphen] values for the OS, version
79+
information[^PEPs], and CPU architecture. Examples: `any`,
80+
`gnulinux-amd64`, `darwin-23.5.0-arm64`, `musllinux-1.2-amd64v3`. The
81+
allowed values will be defined in one or more separate RFCs.
82+
83+
#### Examples:
84+
85+
* `pgtap-1.0.1+pg15-any.trunk` packages `pgtap` version 1.0.1, compatible
86+
with Postgres 15 (any minor release) on any platform.
87+
* `pair-0.32.1+pg16-gnulinux-amd64.trunk` packages `pair` version 0.32.1,
88+
compatible with Postgres 16 (any minor release) on GNU libc-based Linux
89+
for amd64 CPUs.
90+
* `pair-0.32.1+pg16-darwin-23.5.0-arm64.trunk` packages `pair` version
91+
0.32.1, compatible with Postgres 16 (any minor release) on Darwin version
92+
23.5.0 (macOS) for arm64 CPUs.
93+
94+
#### Escaping and Unicode
95+
96+
The `+` in the file name indicates the division between the package name and
97+
version and the package metadata. The package name and version must not
98+
include a `+`. This allows the file name, without the `.trunk` extension, to
99+
also function as a valid [SemVer].
100+
101+
Tools producing trunks should verify that the filename components do not
102+
contain `+`, as the resulting file may not be processed correctly if it does.
103+
104+
The package name should be lowercase.
105+
106+
The file name components should all be UTF-8.
107+
108+
The filenames *inside* the archive are encoded as UTF-8. Although some ZIP
109+
clients in common use do not properly display UTF-8 filenames, the encoding is
110+
supported by the ZIP specification.
111+
112+
#### Parsing
113+
114+
Parsing of the file name takes place in four parts:
115+
116+
1. For the file name, remove the `.trunk` extension. If working with the
117+
directory name (prefix) extracted from the archive, there will be no
118+
`.trunk` extension.
119+
120+
2. Split the name into two parts at the `+` sign. The left part is the
121+
package name and [SemVer]. The right part is the Postgres version and platform specification.
122+
123+
3. For the left part, split on the right-most dash. If the string to the
124+
right of the dash is a valid [SemVer], then the left part is the package
125+
name. If the right string is not a valid [SemVer], try again at the second
126+
right-most dash and check again. Continue until a valid SemVer is produced
127+
or else fail.
128+
129+
4. Split the right string on dashes. There will be between two and four
130+
values as follows:
131+
132+
* Two: the postgres version (`pg16`) and `any`.
133+
* Three: the postgres version (`pg16`), the OS (`gnulinux`, `darwin`,
134+
etc.), and the architecture (`amd64`, `arm64`, etc.)
135+
* Four: the postgres version (`pg16`), the OS (`gnulinux`, `darwin`,
136+
etc.), the OS version (`23.5.0`) and the architecture (`amd64`,
137+
`arm64`, etc.)
138+
139+
##### Examples:
140+
141+
* `pgtap-1.0.1+pg15-any`
142+
* Package: `pgtap`
143+
* Version: `1.0.1`
144+
* Postgres: `pg15`
145+
* Platform: `any`
146+
* `pair-0.32.1-beta1+pg16-gnulinux-amd64`
147+
* Package: `pair`
148+
* Version: `0.32.1-beta1`
149+
* Postgres: `pg16`
150+
* OS: `gnulinux`
151+
* Architecture: `amd64`
152+
* `pair-0.32.1+pg16-darwin-23.5.0-arm64`
153+
* Package: `pair`
154+
* Version: `0.32.1`
155+
* Postgres: `pg16`
156+
* OS: `darwin`
157+
* OS Version: `23.5.0`
158+
* Architecture: `arm64`
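
The four parsing steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `TrunkName` and `parse_trunk_name` are hypothetical, and the pattern is the regular expression suggested by the SemVer specification with the build-metadata portion dropped, since trunk versions omit it.

```python
import re
from typing import NamedTuple, Optional

# SemVer 2.0.0 pattern without build metadata (assumption: trunk versions
# never carry build metadata, because "+" separates the two name halves).
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
)


class TrunkName(NamedTuple):  # hypothetical container for the parsed parts
    package: str
    version: str
    postgres: str
    os: str
    os_version: Optional[str]
    arch: Optional[str]


def parse_trunk_name(name: str) -> TrunkName:
    # Step 1: drop the extension, if present.
    if name.endswith(".trunk"):
        name = name[: -len(".trunk")]

    # Step 2: split at the single "+" sign.
    left, sep, right = name.partition("+")
    if not sep or "+" in right:
        raise ValueError("expected exactly one '+' in %r" % name)

    # Step 3: try each dash, right to left, until the tail is a valid SemVer.
    idx = len(left)
    while True:
        idx = left.rfind("-", 0, idx)
        if idx <= 0:
            raise ValueError("no valid SemVer found in %r" % left)
        if SEMVER.fullmatch(left[idx + 1:]):
            package, version = left[:idx], left[idx + 1:]
            break

    # Step 4: split the right part into two, three, or four values.
    parts = right.split("-")
    if len(parts) == 2 and parts[1] == "any":
        return TrunkName(package, version, parts[0], "any", None, None)
    if len(parts) == 3:
        return TrunkName(package, version, parts[0], parts[1], None, parts[2])
    if len(parts) == 4:
        return TrunkName(package, version, *parts)
    raise ValueError("unrecognized platform specification %r" % right)
```

Calling `parse_trunk_name("pair-0.32.1-beta1+pg16-gnulinux-amd64")` first rejects `beta1` as a version, then accepts `0.32.1-beta1` at the second right-most dash, which is the retry behavior step 3 describes.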
159+
160+
### File contents
161+
162+
The contents of a trunk file should unpack into a directory with the same name
163+
as the file, but without the `.trunk` extension. The contents of the directory
164+
are:
165+
166+
* `trunk.json` contains metadata necessary to install the extension. The
167+
format will be subject to a future RFC, but at a minimum will include the
168+
trunk format version, package version, dependencies, license, language and
169+
runtime (e.g., libc implementation and version), platform metadata, and
170+
Postgres version and build configuration. Trunk installers should warn if
171+
the trunk version is greater than the version it supports, and must fail
172+
if the trunk version has a greater major version than the version it
173+
supports.
174+
175+
* `digests` contains a list of (almost) all the files in the trunk and their
176+
secure hashes. Each line lists a single file and its checksum in the [BSD
177+
digest format]: `{algorithm} ({filename}) = {checksum}`. Every file except
178+
`digests` --- which cannot contain a hash of itself --- must be listed in
179+
this file. The cryptographic hash algorithm must be [SHA-256] or better;
180+
specifically, MD5 and SHA-1 are not permitted, as signed trunk files rely
181+
on the strong hashes in `digests` to validate the integrity of the
182+
archive.
183+
184+
* The `pgsql` directory contains one or more subdirectories named for
185+
`pg_config` directory configurations: `bin`, `doc`, `html`, `include`,
186+
`pkginclude`, `lib`, `pkglib`, `locale`, `man`, `share`, and `sysconf`.
187+
Each contains the files to be installed in the corresponding `pg_config`
188+
directory.
189+
190+
* Dynamic language scripts must appear in `pgsql/bin` and begin with exactly
191+
`#!{cmd}`, where `cmd` is the name of the interpreter, in order to enjoy
192+
script wrapper generation and shebang rewriting at install time. They must
193+
have no extension. The list of supported interpreters will depend on the
194+
features of the installer, but one can reasonably expect support for
195+
[Perl], [Python], and [Ruby]. If no appropriate instance of the given
196+
interpreter is present, the installer may abort the installation.
197+
198+
* `README`, `LICENSE`, and `CHANGELOG` may optionally be in the directory.
199+
Each must be plain text or Markdown-formatted. In the latter case, they
200+
may use the extension `.md`.
201+
202+
* Trunk, being an installation format intended to install pre-compiled
203+
binaries and supporting files, does not include a `Makefile`, `configure`
204+
file or any other tool for building the package contents.
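
The warn-or-fail rule for the trunk format version described under `trunk.json` might look like the sketch below. Because the `trunk.json` format is left to a future RFC, this assumes the format version is a SemVer-style string; the function names and the `"ok"`/`"warn"` return values are hypothetical.

```python
from typing import Tuple


def version_core(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch), ignoring any pre-release suffix."""
    core = version.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def check_format_version(found: str, supported: str) -> str:
    """Compare a trunk's format version with the installer's.

    Returns "ok" when the trunk is no newer than the installer, "warn"
    when it is newer within the same major version, and raises when the
    trunk's major version is greater.
    """
    found_v, supported_v = version_core(found), version_core(supported)
    if found_v[0] > supported_v[0]:
        raise RuntimeError(
            "trunk format %s is not supported (installer supports %s)"
            % (found, supported)
        )
    if found_v > supported_v:
        return "warn"
    return "ok"
```

An installer supporting format `1.2.0` would accept `1.0.0`, warn on `1.3.0`, and refuse `2.0.0`.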
205+
206+
During extraction, trunk installers verify all the hashes in `digests` against
207+
the file contents. Apart from `digests` and its signatures, installation will
208+
fail if any file in the archive is not both mentioned and correctly hashed in
209+
`digests`.
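
As a rough illustration of this check, the following sketch parses a `digests` file in the BSD format and verifies an unpacked trunk directory against it. The function names are hypothetical, the set of accepted algorithm labels (`SHA256`, `SHA384`, `SHA512`) is an assumption about what "SHA-256 or better" would permit, and signature files are not handled because this RFC does not yet define them.

```python
import hashlib
import os
import re

# BSD digest line: "{algorithm} ({filename}) = {checksum}"
DIGEST_LINE = re.compile(
    r"^(?P<algo>[A-Za-z0-9-]+) \((?P<file>.+)\) = (?P<sum>[0-9a-fA-F]+)$"
)
# Assumed mapping from digest labels to hashlib names; MD5 and SHA-1 are
# deliberately absent.
ALLOWED = {"SHA256": "sha256", "SHA384": "sha384", "SHA512": "sha512"}


def parse_digests(text):
    """Return {relative path: (hashlib name, lowercase hex checksum)}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        match = DIGEST_LINE.match(line)
        if match is None:
            raise ValueError("malformed digest line: %r" % line)
        algo = match.group("algo").upper()
        if algo not in ALLOWED:
            raise ValueError("hash algorithm not permitted: %s" % algo)
        entries[match.group("file")] = (ALLOWED[algo], match.group("sum").lower())
    return entries


def verify_tree(root):
    """Fail unless every file under root (except digests) is listed and matches."""
    with open(os.path.join(root, "digests"), encoding="utf-8") as fh:
        expected = parse_digests(fh.read())
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            path = os.path.join(dirpath, filename)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            if rel == "digests":
                continue  # digests cannot contain a hash of itself
            if rel not in expected:
                raise ValueError("file not listed in digests: %s" % rel)
            algo, want = expected[rel]
            with open(path, "rb") as fh:
                got = hashlib.new(algo, fh.read()).hexdigest()
            if got != want:
                raise ValueError("checksum mismatch: %s" % rel)
            seen.add(rel)
    missing = set(expected) - seen
    if missing:
        raise ValueError("listed in digests but absent: %s" % sorted(missing))
```

Reading each file fully into memory keeps the sketch short; an installer handling large shared libraries would more likely hash in chunks.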
210+
211+
## Details
212+
213+
### Installing a Trunk
214+
215+
The following descriptions will use a trunk file named
216+
`pair-0.32.1+pg16-gnulinux-amd64.trunk`. Trunk installation notionally
217+
consists of two phases:
218+
219+
1. Unpack
220+
* Validate digests. Ensure that every file, aside from `digests` itself,
221+
is listed in `digests` along with its valid hash digest. If any file is
222+
missing or has an invalid digest, installation should fail. If a file
223+
listed in `digests` is not present, installation should fail.
224+
* Parse the `trunk.json` file. Check that the distribution is compatible
225+
with:
226+
* The trunk format version
227+
* The platform (OS, OS version, and architecture); `any` is allowed
228+
for any platform
229+
* The PostgreSQL version
230+
2. Install
231+
* If applicable, update scripts starting with `#!{cmd}` to point to the
232+
correct interpreter. Fail if no such interpreter is present.
233+
* Iterate over each subdirectory of the `pgsql` directory.
234+
* If the directory corresponds to a directory configuration from
235+
[pg_config], install its contents in that target directory.
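
The install phase can be sketched as a mapping from each `pgsql` subdirectory to the `pg_config` option of the same name (`--bindir`, `--sharedir`, and so on), followed by a recursive copy. The helper names below are hypothetical, shebang rewriting is omitted, and the `resolve` parameter is an assumption added here so the destination lookup can be swapped out, for example when installing into a staging directory.

```python
import os
import shutil
import subprocess

# pgsql subdirectories named in this RFC; each maps to `pg_config --{name}dir`.
PG_CONFIG_DIRS = (
    "bin", "doc", "html", "include", "pkginclude", "lib",
    "pkglib", "locale", "man", "share", "sysconf",
)


def pg_config_dir(name, pg_config="pg_config"):
    """Ask pg_config for a directory, e.g. name="share" runs --sharedir."""
    result = subprocess.run(
        [pg_config, "--%sdir" % name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def plan_install(trunk_root, resolve=pg_config_dir):
    """Return (source, destination) pairs for every installable file."""
    plan = []
    pgsql = os.path.join(trunk_root, "pgsql")
    for name in sorted(os.listdir(pgsql)):
        src_dir = os.path.join(pgsql, name)
        # Skip anything that is not a known pg_config directory.
        if name not in PG_CONFIG_DIRS or not os.path.isdir(src_dir):
            continue
        target = resolve(name)
        for dirpath, _dirs, files in os.walk(src_dir):
            for filename in sorted(files):
                src = os.path.join(dirpath, filename)
                rel = os.path.relpath(src, src_dir)
                plan.append((src, os.path.join(target, rel)))
    return plan


def install(plan):
    """Copy each planned file, creating destination directories as needed."""
    for src, dst in plan:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
```

Separating planning from copying lets an installer report, or dry-run, exactly what it would write before touching the PostgreSQL installation.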
236+
237+
## Drawbacks
238+
239+
Many PostgreSQL extensions and applications are already distributed via
240+
well-tested and -maintained packaging systems, including the community [Yum]
241+
and [Apt] repositories.
242+
243+
However, these systems serve a limited number of OSes; macOS and Windows,
244+
while served by their own packaging systems ([Homebrew] and [Chocolatey],
245+
among others), have access to fewer packages and are less integrated into
246+
community package distribution.
247+
248+
[PGXN] aims to be the canonical repository for all publicly-available
249+
extensions, and to provide as many of them as possible in the same binary
250+
format to a variety of OSes. The trunk format is a key component for realizing
251+
that vision.
252+
253+
## Rationale and alternatives
254+
255+
This design is ideally suited to PostgreSQL extensions because it's built
256+
around the installation and configuration options provided by [pg_config].
257+
This short list of directories into which to install appropriate distribution
258+
files is universal across OSes, and therefore suitable for distributing
259+
binaries for, ultimately, every OS supported by PostgreSQL itself.
260+
261+
The alternatives available today include:
262+
263+
* The community [Yum] and [Apt] repositories, which serve only Linux
264+
systems and require separate packages tied to the file layouts of those
265+
systems. The trunk format is OS-agnostic and provides files for any Linux
266+
distribution, regardless of the location of the PostgreSQL
267+
installation(s) on the file system.
268+
* [PGXMan] supports only Debian and Ubuntu Linux systems, and being
269+
downstream of the community [Apt] packages, is also dependent on its file
270+
layouts. macOS support has been promised, but the project
271+
has seen [little activity] in 2024.
272+
* [Trunk] inspired the design documented here, which also borrows its
273+
name. That format is limited to a few file types, and lacks support for
274+
multiple OSes and architectures. This RFC may be considered an evolution
275+
of that format.
276+
* [StackBuilder] has little visibility or penetration beyond [EDB] Windows
277+
customers. I am unable to find a public list of available extensions or a
278+
description of the packaging format or how to contribute to it.
279+
280+
Without the trunk binary distribution format, it will be difficult to build
281+
and deliver cross-platform binary distribution of all the packages on PGXN.
282+
283+
## Prior art
284+
285+
The design of the trunk binary distribution format is inspired by the original
286+
[Trunk] format, which demonstrated a pattern for distributing extensions
287+
agnostic of file locations. This design may be considered an evolution of the
288+
[Trunk] registry format.
289+
290+
The design was also heavily inspired by the [Python wheel] format. Although
291+
locations for installable files in the trunk format relate directly to
292+
[pg_config] directories, most of the other aspects of the design were borrowed
293+
from wheel, including the `digests` file and the `trunk.json` metadata file.
294+
295+
## Unresolved questions
296+
297+
* Should the archive format be Zip or tarball? PGXN has traditionally used
298+
Zip, since it's supported everywhere, including Windows. So does the
299+
[Python Wheel] format. But many other packaging systems use tarballs,
300+
including [Homebrew] and [OCI]. The emerging idea to [distribute trunks
301+
via OCI registries] may favor tarballs.
302+
* The list of platforms to support and the strings to indicate them,
303+
including CPU alternatives, will be defined in a forthcoming RFC.
304+
305+
## Future possibilities
306+
307+
Some other ideas for the format, in either the short or long term:
308+
309+
* Adopt the [Python wheel signing pattern]
310+
* Include an [SPDX SBOM](https://spdx.dev)?
311+
* Allow non-postgres libraries to be included, such as OS dependencies,
312+
either in the appropriate `pgsql` subdirectory or perhaps in a separate
313+
`sys` directory
314+
315+
## References
316+
317+
* [Python Binary distribution format][Python wheel]
318+
* [trunk POC]
319+
* [Previous discussion]
320+
321+
[^wheel]: With much inspiration from and gratitude to the [Python wheel]
322+
format.
323+
[^hyphen]: Why hyphens? They allow the entire file name, between the package
324+
name and the `.trunk` extension, to be a valid [SemVer].
325+
[^PEPs]: See for example [PEP 600] defining Python wheel tags for different
326+
versions of GNU libc and [PEP 656] defining tags for different versions of
327+
musl libc. See also how [Homebrew] uses [macOS version names] in file
328+
names for its packages.
329+
330+
[PGXN]: https://pgxn.org "PostgreSQL Extension Network"
331+
[PGXS]: https://www.postgresql.org/docs/current/extend-pgxs.html
332+
"PostgreSQL Docs: Extension Building Infrastructure"
333+
[pg_config]: https://www.postgresql.org/docs/current/app-pgconfig.html
334+
"PostgreSQL Docs: pg_config"
335+
[Python wheel]: https://packaging.python.org/en/latest/specifications/binary-distribution-format/
336+
[SemVer]: https://semver.org "Semantic Versioning 2.0.0"
337+
[PEP 600]: https://peps.python.org/pep-0600/
338+
"PEP 600 – Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions"
339+
[PEP 656]: https://peps.python.org/pep-0656/
340+
"PEP 656 – Platform Tag for Linux Distributions Using Musl"
341+
[Homebrew]: https://brew.sh "Homebrew: The Missing Package Manager for macOS (or Linux)"
342+
[macOS version names]: https://github.com/oras-project/oras/issues/237#issuecomment-815250008
343+
"oras-project/oras#237 Comment from sjackman"
344+
[BSD digest format]: https://stackoverflow.com/q/1299833/79202
345+
[SHA-256]: https://en.wikipedia.org/wiki/SHA-2 "Wikipedia: SHA-2"
346+
[Perl]: https://perl.org "The Perl Programming Language"
347+
[Python]: https://python.org "The Python Programming Language"
348+
[Ruby]: https://ruby-lang.org/en/ "The Ruby Programming Language"
349+
[Yum]: https://yum.postgresql.org "PostgreSQL Yum Repository"
350+
[Apt]: https://wiki.postgresql.org/wiki/Apt "PostgreSQL packages for Debian and Ubuntu"
351+
[Homebrew]: https://brew.sh "The Missing Package Manager for macOS (or Linux)"
352+
[Chocolatey]: https://chocolatey.org "The Package Manager for Windows"
353+
[PGXMan]: https://pgxman.com "npm for PostgreSQL"
354+
[little activity]: https://github.com/pgxman/buildkit/commits/main/?since=2024-01-01&until=2024-07-11
355+
[Trunk]: https://pgt.dev "Trunk is an open-source package installer and registry for PostgreSQL extensions"
356+
[StackBuilder]: https://www.enterprisedb.com/docs/supported-open-source/postgresql/installing/using_stackbuilder/
357+
[EDB]: https://www.enterprisedb.com "Enterprise DB"
358+
[OCI]: https://github.com/opencontainers/image-spec/blob/main/media-types.md
359+
"OCI Image Media Types"
360+
[distribute trunks via OCI registries]: https://justatheory.com/2024/06/trunk-oci-poc/
361+
"POC: Distributing Trunk Binaries via OCI"
362+
[Python wheel signing pattern]: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#signed-wheel-files
363+
"Python Binary distribution format: Signed wheel files"
364+
[trunk POC]: https://gist.github.com/theory/7dc164e5772cae652d838a1c508972ae
365+
"trunk POC using PGXS, bash, tar, shasum, and jq"
366+
[Previous discussion]: https://github.com/orgs/pgxn/discussions/2
367+
"Proposal: Binary Distribution Format"
