---
category: advanced
path: /advanced/pnp-spec
title: "PnP Specification"
description: In-depth documentation of the PnP spec.
---
```toc
# This code block gets replaced with the Table of Contents
```
## About this document
To make interoperability easier for third-party projects, this document describes the specification we follow when installing files on disk under the [Plug'n'Play install strategy](/features/pnp). It also means:
- any change we make to this document will follow semver rules
- we'll do our best to preserve backward compatibility
- new features will be designed to degrade gracefully
## High-level idea
Plug'n'Play works by keeping in memory a table of all the packages that are part of the dependency tree, in such a way that we can easily answer two different questions:
- Given a path, what package does it belong to?
- Given a package, where are the dependencies it can access?
Resolving a package import thus becomes a matter of interlacing those two operations:
- First, locate which package is requesting the resolution
- Then retrieve its dependencies, check if the requested package is amongst them
- If it is, then retrieve the dependency information, and return its location
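
The three steps above can be sketched as follows. This is a minimal illustration rather than Yarn's implementation: the `PackageInfo` shape, the `packages` table, the function names, and the sample paths are all hypothetical, and locators are flattened into plain string keys for brevity.

```typescript
// Hypothetical in-memory table; the names and shapes are illustrative only.
interface PackageInfo {
  location: string;                  // folder of the package, with a trailing slash
  dependencies: Map<string, string>; // dependency name -> key of the resolved package
}

const packages = new Map<string, PackageInfo>();
packages.set("app@workspace:.", {
  location: "/project/",
  dependencies: new Map<string, string>([["dep", "dep@npm:1.0.0"]]),
});
packages.set("dep@npm:1.0.0", {
  location: "/project/.yarn/cache/dep/",
  dependencies: new Map<string, string>(),
});

// Question 1: given a path, what package does it belong to?
// The package with the longest matching location wins.
function findOwner(path: string): string | null {
  let best: string | null = null;
  let bestLength = 0;
  for (const [key, pkg] of packages) {
    if (path.startsWith(pkg.location) && pkg.location.length > bestLength) {
      best = key;
      bestLength = pkg.location.length;
    }
  }
  return best;
}

// Question 2, then the final lookup: find the request among the issuer's
// dependencies and return the location of the package it points to.
function resolveLocation(request: string, issuerPath: string): string {
  const owner = findOwner(issuerPath);
  if (owner === null) throw new Error(`No package owns ${issuerPath}`);
  const target = packages.get(owner)!.dependencies.get(request);
  if (target === undefined) throw new Error(`${owner} doesn't list ${request}`);
  return packages.get(target)!.location;
}
```
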
Extra features can then be designed, but are optional. For example, Yarn leverages the information it knows about the project to throw semantic errors when a dependency cannot be resolved: since we know the state of the whole dependency tree, we also know why a package may be missing.
## Basic concepts
All packages are uniquely referenced by **locators**. A locator is a combination of a **package ident**, which includes its scope if relevant, and a **package reference**, which can be seen as a unique ID used to distinguish different instances (or versions) of the same package. Package references should be treated as opaque values: it doesn't matter from a resolution algorithm perspective that they start with `workspace:`, `virtual:`, `npm:`, or any other protocol.
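
As a rough illustration, a locator could be modeled as below. The `Locator` interface and both helpers are hypothetical names chosen for this sketch; the only point they make is that references are compared, never parsed.

```typescript
// Hypothetical shape: the spec only says a locator pairs an ident with a reference.
interface Locator {
  ident: string;     // package name, scope included when there is one
  reference: string; // opaque: compared for equality, never interpreted
}

// Two locators denote the same package instance only if both parts are identical.
function sameLocator(a: Locator, b: Locator): boolean {
  return a.ident === b.ident && a.reference === b.reference;
}

// A string key usable in a Map; JSON encoding keeps the two parts unambiguous.
function locatorKey(locator: Locator): string {
  return JSON.stringify([locator.ident, locator.reference]);
}
```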
## Fallback
For improved compatibility with legacy codebases, Plug'n'Play supports a feature we call "fallback". The fallback triggers when a package makes a resolution request for a dependency it doesn't list in its dependencies. In normal circumstances the resolver would throw, but when the fallback is enabled the resolver should first try to find the requested package among the dependencies of a set of special packages. If it finds it, it returns it transparently.
In a sense, the fallback can be seen as a limited and safer form of hoisting. While hoisting allows unconstrained access through multiple levels of dependencies, the fallback requires explicitly defining a fallback package, usually the top-level one.
## Package locations
While the Plug'n'Play specification doesn't by itself require runtimes to support anything other than the regular filesystem when accessing package files, producers may rely on more complex data storage mechanisms. For instance, Yarn itself requires the following two extensions, which we strongly recommend supporting:
### Zip access
Files named `*.zip` must be treated as folders for the purpose of file access. For instance, accessing `/foo/bar.zip/package.json` means reading the `package.json` file located within the `/foo/bar.zip` zip archive.
If writing a JS tool, the [`@yarnpkg/fslib`](https://yarnpkg.com/package/@yarnpkg/fslib) package may be of assistance, providing a zip-aware filesystem layer called `ZipOpenFS`.
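
A runtime that implements zip access itself first has to split such a path into the archive and the entry inside it. The sketch below shows one way to do that; `ZipPath` and `splitZipPath` are hypothetical names, and the function only inspects the string, so it assumes that any path segment ending in `.zip` is an archive and never checks what exists on disk.

```typescript
interface ZipPath {
  archivePath: string; // path of the zip archive on the real filesystem
  subpath: string;     // path of the entry inside the archive ("" for its root)
}

// Splits a path at the first segment ending in ".zip".
// Returns null when the path doesn't go through an archive.
function splitZipPath(path: string): ZipPath | null {
  const match = /^(.*?\.zip)(?:\/(.*))?$/.exec(path);
  if (match === null) return null;
  return {archivePath: match[1], subpath: match[2] ?? ""};
}
```

Reading the entry itself still requires a zip library, which is outside the scope of this sketch.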
### Virtual folders
In order to properly represent packages listing peer dependencies, Yarn relies on a concept called [Virtual Packages](/advanced/lexicon#virtual-package). Their most notable property is that they all have different paths (so that Node.js instantiates them as many times as needed), while still being backed by the same concrete folder on disk.
This is done by adding path support for the following scheme:
```
/path/to/some/folder/__virtual__/<hash>/<n>/subpath/to/file.dat
```
When this pattern is found, the `__virtual__/<hash>/<n>` part must be removed, the `hash` ignored, and the `dirname` operation applied `n` times to the `/path/to/some/folder` part. Some examples:
```
/path/to/some/folder/__virtual__/a0b1c2d3/0/subpath/to/file.dat
/path/to/some/folder/subpath/to/file.dat

/path/to/some/folder/__virtual__/e4f5a0b1/0/subpath/to/file.dat
/path/to/some/folder/subpath/to/file.dat (different hash, same result)

/path/to/some/folder/__virtual__/a0b1c2d3/1/subpath/to/file.dat
/path/to/some/subpath/to/file.dat

/path/to/some/folder/__virtual__/a0b1c2d3/3/subpath/to/file.dat
/path/subpath/to/file.dat
```
If writing a JS tool, the [`@yarnpkg/fslib`](https://yarnpkg.com/package/@yarnpkg/fslib) package may be of assistance, providing a virtual-aware filesystem layer called `VirtualFS`.
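
For tools that can't depend on it, the transformation can be sketched as a pure string function. This is a minimal sketch, assuming absolute POSIX-style paths and at most one `__virtual__` segment per path; `resolveVirtual` is a hypothetical name, not part of any Yarn API.

```typescript
// Maps a virtual path back to the concrete path it stands for.
// Paths without a __virtual__/<hash>/<n> segment are returned unchanged.
function resolveVirtual(path: string): string {
  const match = /^(.*)\/__virtual__\/[^\/]+\/(\d+)(\/.*)?$/.exec(path);
  if (match === null) return path;

  let base = match[1];
  const depth = Number(match[2]);
  const subpath = match[3] ?? "";

  // Apply `dirname` to the base folder `n` times; the hash is ignored.
  for (let i = 0; i < depth; i++) {
    const slash = base.lastIndexOf("/");
    base = slash <= 0 ? "/" : base.slice(0, slash);
  }

  if (base === "/") return subpath === "" ? "/" : subpath;
  return base + subpath;
}
```

Applied to the examples above, the function returns the second line of each pair.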
## Manifest reference
When [`pnpEnableInlining`](/configuration/yarnrc#pnpEnableInlining) is explicitly set to `false`, Yarn will generate an additional `.pnp.data.json` file containing the following fields.
This document only covers the data file itself - you should define your own in-memory data structures, populated at runtime with the information from the manifest. For example, Yarn turns the `packageRegistryData` table into two separate memory tables: one that maps a path to a package, and another that maps a package to a path.
> **Note:** You may notice that various places use arrays of tuples in place of maps. This is mostly intended to make it easier to hydrate ES6 maps, but also sometimes to have non-string keys (for instance `packageRegistryData` will have a `null` key in one particular case).
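
To illustrate the note above, here is a small sketch of how nested arrays of tuples can be hydrated into ES6 maps. The `RegistryData` shape is an assumption made for this example (it keeps only `packageLocation`), and `hydrateRegistry` is a hypothetical helper; refer to the field reference for the actual layout.

```typescript
// Assumed shape: [ident, [[reference, packageInfo], ...]] tuples.
interface PackageEntry {
  packageLocation: string;
}
type RegistryData = Array<[string | null, Array<[string | null, PackageEntry]>]>;
type Registry = Map<string | null, Map<string | null, PackageEntry>>;

// The Map constructor accepts an array of [key, value] tuples directly,
// and `null` is a perfectly valid map key.
function hydrateRegistry(data: RegistryData): Registry {
  const registry: Registry = new Map();
  for (const [ident, references] of data)
    registry.set(ident, new Map(references));
  return registry;
}

// Illustrative data only: one entry stored under a `null` key, one regular package.
const sampleData: RegistryData = [
  [null, [[null, {packageLocation: "./"}]]],
  ["dep", [["npm:1.0.0", {packageLocation: "./.yarn/cache/dep/"}]]],
];
const sampleRegistry = hydrateRegistry(sampleData);
```
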
import pnpSchema from '@yarnpkg/gatsby/static/configuration/pnp.json';
import theme from 'prism-react-renderer/themes/vsDark';
import {JsonDoc} from 'react-json-doc';
<JsonDoc theme={theme} extraTheme={{
container: {borderRadius: `var(--ifm-code-border-radius)`},
inactiveHeader: {},
activeHeader: {borderRadius: `var(--ifm-code-border-radius)`, background: `#3d437c`},
annotation: {borderRadius: `var(--ifm-code-border-radius)`, background: `#383944`, color: `#ffffff`},
anchor: {scrollMarginTop: 60},
section: {fontFamily: `SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace`},
}} data={pnpSchema}/>
## Resolution algorithm
> **Note:** for simplicity, this algorithm doesn't mention all the Node.js features that allow mapping a module to another, such as [`imports`](https://nodejs.org/api/packages.html#imports), [`exports`](https://nodejs.org/api/packages.html#exports), or other vendor-specific features.
### NM_RESOLVE(`specifier`, `parentURL`)
1. This function is specified in the [Node.js documentation](https://nodejs.org/api/esm.html#resolver-algorithm-specification)
### PNP_RESOLVE(`specifier`, `parentURL`)
1. Let `resolved` be **undefined**
2. If `specifier` is a Node.js builtin, then
    1. Set `resolved` to `specifier` itself and return it
3. Otherwise, if `specifier` starts with "/", "./", or "../", then
    1. Set `resolved` to **NM_RESOLVE**(`specifier`, `parentURL`) and return it
4. Otherwise,
    1. Note: `specifier` is now a bare identifier
    2. Let `unqualified` be **RESOLVE_TO_UNQUALIFIED**(`specifier`, `parentURL`)
    3. Set `resolved` to **NM_RESOLVE**(`unqualified`, `parentURL`) and return it
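
The three-way dispatch in this function can be sketched as a small classifier. `SpecifierKind` and `classifySpecifier` are hypothetical names, and the list of builtins is deliberately left to the caller (in Node.js it could be built from `module.builtinModules`).

```typescript
type SpecifierKind = "builtin" | "path" | "bare";

// Decides which branch of PNP_RESOLVE applies to a specifier.
// `builtins` is supplied by the caller; this sketch doesn't hardcode any list.
function classifySpecifier(specifier: string, builtins: Set<string>): SpecifierKind {
  if (builtins.has(specifier)) return "builtin";
  if (specifier.startsWith("/") || specifier.startsWith("./") || specifier.startsWith("../"))
    return "path";
  return "bare";
}
```

Only the `bare` case goes through the Plug'n'Play tables; the other two are returned as-is or handed to **NM_RESOLVE**.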
### RESOLVE_TO_UNQUALIFIED(`specifier`, `parentURL`)
1. Let `resolved` be **undefined**
2. Let `ident` and `modulePath` be the result of **PARSE_BARE_IDENTIFIER**(`specifier`)
3. Let `manifest` be **FIND_PNP_MANIFEST**(`parentURL`)
4. If `manifest` is null, then
    1. Set `resolved` to **NM_RESOLVE**(`specifier`, `parentURL`) and return it
5. Let `parentLocator` be **FIND_LOCATOR**(`manifest`, `parentURL`)
6. If `parentLocator` is null, then
    1. Set `resolved` to **NM_RESOLVE**(`specifier`, `parentURL`) and return it
7. Let `parentPkg` be **GET_PACKAGE**(`manifest`, `parentLocator`)
8. Let `referenceOrAlias` be the entry from `parentPkg.packageDependencies` referenced by `ident`
9. If `referenceOrAlias` is **undefined**, then
    1. If `manifest.enableTopLevelFallback` is **true**, then
        1. If `parentLocator` **isn't** in `manifest.fallbackExclusionList`, then
            1. Set `referenceOrAlias` to **RESOLVE_VIA_FALLBACK**(`manifest`, `ident`)
10. If `referenceOrAlias` is still **undefined**, then
    1. Throw a resolution error
11. If `referenceOrAlias` is **null**, then
    1. Note: It means that `parentPkg` has an unfulfilled peer dependency on `ident`
    2. Throw a resolution error
12. Otherwise, if `referenceOrAlias` is an array, then
    1. Let `alias` be `referenceOrAlias`
    2. Let `dependencyPkg` be **GET_PACKAGE**(`manifest`, `alias`)
    3. Return `dependencyPkg.packageLocation` concatenated with `modulePath`
13. Otherwise,
    1. Let `reference` be `referenceOrAlias`
    2. Let `dependencyPkg` be **GET_PACKAGE**(`manifest`, {`ident`, `reference`})
    3. Return `dependencyPkg.packageLocation` concatenated with `modulePath`
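
Steps 7 to 13 can be sketched as below, with the fallback of step 9 left out for brevity. The types are assumptions made for this example: the manifest is taken to be already hydrated into maps, and an alias is taken to be an `[ident, reference]` pair. All names are hypothetical.

```typescript
type Locator = {ident: string | null; reference: string | null};
// A dependency entry: a reference, an [ident, reference] alias,
// or null for an unfulfilled peer dependency.
type Target = string | [string, string] | null;
interface Pkg {
  packageLocation: string;
  packageDependencies: Map<string, Target>;
}
type Registry = Map<string | null, Map<string | null, Pkg>>;

// GET_PACKAGE: every locator found in the data tables must have an entry.
function getPackage(registry: Registry, locator: Locator): Pkg {
  const pkg = registry.get(locator.ident)?.get(locator.reference);
  if (pkg === undefined) throw new Error("Corrupted manifest: missing registry entry");
  return pkg;
}

function resolveToUnqualified(
  registry: Registry,
  parentLocator: Locator,
  ident: string,
  modulePath: string,
): string {
  const parentPkg = getPackage(registry, parentLocator);
  const target = parentPkg.packageDependencies.get(ident);

  // Step 10 (this sketch skips the fallback lookup of step 9).
  if (target === undefined)
    throw new Error(`${ident} isn't listed in the dependencies of the parent package`);
  // Step 11: the parent has an unfulfilled peer dependency on `ident`.
  if (target === null)
    throw new Error(`Unfulfilled peer dependency on ${ident}`);

  // Steps 12 and 13: an alias names another package; a reference keeps `ident`.
  const locator: Locator = Array.isArray(target)
    ? {ident: target[0], reference: target[1]}
    : {ident, reference: target};
  return getPackage(registry, locator).packageLocation + modulePath;
}
```

The result is a plain string concatenation, as in the steps above; a real implementation would likely normalize the resulting path.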
### GET_PACKAGE(`manifest`, `locator`)
1. Let `referenceMap` be the entry from `manifest.packageRegistryData` referenced by `locator.ident`
2. Let `pkg` be the entry from `referenceMap` referenced by `locator.reference`
3. Return `pkg`
    1. Note: `pkg` cannot be **undefined** here; all packages referenced in any of the Plug'n'Play data tables **MUST** have a corresponding entry inside `packageRegistryData`.
### FIND_LOCATOR(`manifest`, `moduleUrl`)
Note: The algorithm described here is quite inefficient. When reading the manifest, you should prepare data structures better suited to this task.
1. Let `bestLength` be **0**
2. Let `bestLocator` be **null**
3. Let `relativeUrl` be the relative path between `manifest` and `moduleUrl`
    1. Note: Make sure it always starts with a `./` or `../`
4. If `relativeUrl` matches `manifest.ignorePatternData`, then
    1. Return **null**
5. For each `referenceMap` value in `manifest.packageRegistryData`
    1. For each `registryPkg` value in `referenceMap`
        1. If `registryPkg.discardFromLookup` **isn't true**, then
            1. If `registryPkg.packageLocation.length` is greater than `bestLength`, then
                1. If `relativeUrl` starts with `registryPkg.packageLocation`, then
                    1. Set `bestLength` to `registryPkg.packageLocation.length`
                    2. Set `bestLocator` to the current `registryPkg` locator
6. Return `bestLocator`
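
A direct transcription of these steps could look like the sketch below. It assumes the registry was hydrated into nested maps and that the ignore pattern was compiled into a `RegExp` beforehand; the type and function names are hypothetical, and computing `relativeUrl` (step 3) is left to the caller.

```typescript
type Locator = {ident: string | null; reference: string | null};
interface RegistryPkg {
  packageLocation: string;
  discardFromLookup?: boolean;
}
type Registry = Map<string | null, Map<string | null, RegistryPkg>>;

// Finds the package whose location is the longest prefix of `relativeUrl`.
function findLocator(
  registry: Registry,
  relativeUrl: string,          // must start with "./" or "../"
  ignorePattern: RegExp | null, // compiled by the caller, null when unset
): Locator | null {
  if (ignorePattern !== null && ignorePattern.test(relativeUrl)) return null;

  let bestLength = 0;
  let bestLocator: Locator | null = null;

  for (const [ident, referenceMap] of registry) {
    for (const [reference, pkg] of referenceMap) {
      if (pkg.discardFromLookup === true) continue;
      if (pkg.packageLocation.length <= bestLength) continue;
      if (!relativeUrl.startsWith(pkg.packageLocation)) continue;
      bestLength = pkg.packageLocation.length;
      bestLocator = {ident, reference};
    }
  }
  return bestLocator;
}
```

Like the steps it mirrors, this scans every package on each call; an index keyed by location would avoid that.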
### RESOLVE_VIA_FALLBACK(`manifest`, `ident`)
1. Let `topLevelPkg` be **GET_PACKAGE**(`manifest`, {**null**, **null**})
2. Let `referenceOrAlias` be the entry from `topLevelPkg.packageDependencies` referenced by `ident`
3. If `referenceOrAlias` is defined, then
    1. Return it immediately
4. Otherwise,
    1. Let `referenceOrAlias` be the entry from `manifest.fallbackPool` referenced by `ident`
    2. Return it immediately, whether it's defined or not
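
The lookup order can be sketched as follows. Both tables are assumed to be already hydrated into maps, "defined" is read here as "not `undefined`", and `resolveViaFallback` is a hypothetical name.

```typescript
// A dependency entry: a reference, an [ident, reference] alias, or null.
type Target = string | [string, string] | null;

// Looks the ident up in the top-level package first, then in the fallback pool.
// Returns undefined when neither table knows about it.
function resolveViaFallback(
  topLevelDependencies: Map<string, Target>,
  fallbackPool: Map<string, Target>,
  ident: string,
): Target | undefined {
  const fromTopLevel = topLevelDependencies.get(ident);
  if (fromTopLevel !== undefined) return fromTopLevel;
  return fallbackPool.get(ident);
}
```

The caller remains responsible for throwing when the result is still `undefined`.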
### FIND_PNP_MANIFEST(`url`)
Finding the right PnP manifest to use for a resolution isn't always trivial. There are two main options:
- Assume that there is a single PnP manifest covering the whole project. This is the most common case, as even when referencing third-party projects (for example via the [`portal:` protocol](/features/protocols#whats-the-difference-between-link-and-portal)) their dependency trees are stored in the same manifest as the main project.
To do that, call **FIND_CLOSEST_PNP_MANIFEST**(`require.main.filename`) once at the start of the process, cache its result, and return it for each call to **FIND_PNP_MANIFEST** (if you're running in Node.js, you can even use `require.resolve('pnpapi')` which will do this work for you).
- Try to operate within a multi-project world. **This is rarely required**. We support it inside the Node.js PnP loader, but only because of "project generator" tools like `create-react-app` which are run via `yarn create react-app` and require two different projects (the generator one *and* the generated one) to cooperate within the same Node.js process.
Supporting this use case is difficult, as it requires a bookkeeping mechanism to track the manifests used to access modules, reusing them as much as possible and only looking for a new one when the chain breaks.
### FIND_CLOSEST_PNP_MANIFEST(`url`)
1. Let `manifest` be **null**
2. Let `directoryPath` be the directory for `url`
3. Let `pnpPath` be `directoryPath` concatenated with `/.pnp.cjs`
4. If `pnpPath` exists on the filesystem, then
    1. Let `pnpDataPath` be `directoryPath` concatenated with `/.pnp.data.json`
    2. Set `manifest` to `JSON.parse(readFile(pnpDataPath))` and return it
5. Otherwise, if `directoryPath` is `/`, then
    1. Return **null**
6. Otherwise,
    1. Return **FIND_CLOSEST_PNP_MANIFEST**(`directoryPath`)
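
The upward walk can be written as a loop rather than a recursion. In the sketch below, file access is injected through a hypothetical `FileAccess` interface so the example stays independent of any particular runtime; paths are assumed to be absolute and POSIX-style, and all names are illustrative.

```typescript
// Hypothetical abstraction over the filesystem, supplied by the caller.
interface FileAccess {
  exists(path: string): boolean;
  readFile(path: string): string;
}

// Minimal POSIX dirname for absolute paths.
function parentDirectory(path: string): string {
  const slash = path.lastIndexOf("/");
  return slash <= 0 ? "/" : path.slice(0, slash);
}

// Walks up from the directory of `url` until a folder containing `.pnp.cjs`
// is found, then parses the `.pnp.data.json` file stored next to it.
function findClosestPnpManifest(url: string, fs: FileAccess): unknown {
  let directoryPath = parentDirectory(url);
  while (true) {
    const prefix = directoryPath === "/" ? "" : directoryPath;
    if (fs.exists(`${prefix}/.pnp.cjs`))
      return JSON.parse(fs.readFile(`${prefix}/.pnp.data.json`));
    if (directoryPath === "/") return null;
    directoryPath = parentDirectory(directoryPath);
  }
}
```

In the single-manifest setup described earlier, the result would be computed once and cached for the lifetime of the process.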
### PARSE_BARE_IDENTIFIER(`specifier`)
1. If `specifier` starts with "@", then
    1. If `specifier` doesn't contain a "/" separator, then
        1. Throw an error
    2. Otherwise,
        1. Set `ident` to the substring of `specifier` until the second "/" separator or the end of string, whichever comes first
2. Otherwise,
    1. Set `ident` to the substring of `specifier` until the first "/" separator or the end of string, whichever comes first
3. Set `modulePath` to the substring of `specifier` starting from `ident.length`
4. Return {`ident`, `modulePath`}
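
These steps translate almost line for line into code. The sketch below uses a hypothetical `parseBareIdentifier` function; note that, as in the steps above, `modulePath` keeps its leading "/" when it isn't empty.

```typescript
// Splits a bare specifier into the package ident and the path inside the package.
function parseBareIdentifier(specifier: string): {ident: string; modulePath: string} {
  const firstSlash = specifier.indexOf("/");
  let end: number;

  if (specifier.startsWith("@")) {
    // Scoped packages need at least "@scope/name".
    if (firstSlash === -1) throw new Error(`Invalid scoped specifier: ${specifier}`);
    end = specifier.indexOf("/", firstSlash + 1);
  } else {
    end = firstSlash;
  }

  // No further separator: the whole specifier is the ident.
  const ident = end === -1 ? specifier : specifier.slice(0, end);
  return {ident, modulePath: specifier.slice(ident.length)};
}
```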