This repository was archived by the owner on Jul 3, 2019. It is now read-only.

Commit b1e731f

feat(integrity): subresource integrity support (#78)
BREAKING CHANGE: The entire API has been overhauled to use SRI hashes instead of digest/hashAlgorithm pairs. SRI hashes follow the Subresource Integrity standard and support strings and objects compatible with [`ssri`](https://npm.im/ssri).

* This change bumps the index version, which will invalidate all previous index entries. Content entries will remain intact, and existing caches will automatically reuse any content from before this breaking change.
* `cacache.get.info()`, `cacache.ls()`, and `cacache.ls.stream()` will now return objects that look like this:

```
{
  key: String,
  integrity: '<algorithm>-<base64hash>',
  path: ContentPath,
  time: Date<ms>,
  metadata: Any
}
```

* `opts.digest` and `opts.hashAlgorithm` are obsolete for any API calls that used them.
* Anywhere `opts.digest` was accepted, `opts.integrity` is now an option. Any valid SRI hash is accepted here -- multiple hash entries will be resolved according to the standard: first, the "strongest" hash algorithm will be picked, and then each of the entries for that algorithm will be matched against the content. Content will be validated if *any* of the entries match (so, a single integrity string can be used for multiple "versions" of the same document/data).
* `put.byDigest()`, `put.stream.byDigest()`, `get.byDigest()` and `get.stream.byDigest()` now expect an SRI instead of a `digest` + `opts.hashAlgorithm` pairing.
* `get.hasContent()` now expects an integrity hash instead of a digest. If content exists, it will return the specific single integrity hash that was found in the cache.
* `verify()` has learned to handle integrity-based caches, and forgotten how to handle old-style cache indices due to the format change.
* `cacache.rm.content()` now expects an integrity hash instead of a hex digest.
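
The resolution rule for multi-entry integrity strings described above can be sketched with nothing but Node's `crypto` module. This is an illustration, not the code cacache or `ssri` actually runs: `STRENGTH`, `parseIntegrity`, and `checkData` are hypothetical names, the strength ranking is an assumption, and SRI option suffixes (`?opt`) are ignored.

```javascript
const crypto = require('crypto')

// Assumed ranking, weakest to strongest. Algorithms missing from this list
// rank below every listed one. The ordering is an illustrative assumption.
const STRENGTH = ['md5', 'sha1', 'sha256', 'sha384', 'sha512']

// Split 'sha1-abc= sha512-def==' into { algorithm, digest } entries.
function parseIntegrity (sri) {
  return sri.trim().split(/\s+/).map(entry => {
    const dash = entry.indexOf('-')
    return { algorithm: entry.slice(0, dash), digest: entry.slice(dash + 1) }
  })
}

// Returns the matching '<algorithm>-<digest>' entry, or null if none match.
function checkData (data, sri) {
  const entries = parseIntegrity(sri)
  // Step 1: pick the strongest algorithm present in the string.
  const strongest = entries
    .map(e => e.algorithm)
    .sort((a, b) => STRENGTH.indexOf(b) - STRENGTH.indexOf(a))[0]
  // Step 2: hash once with it, then accept if any entry for it matches.
  const actual = crypto.createHash(strongest).update(data).digest('base64')
  const match = entries.find(
    e => e.algorithm === strongest && e.digest === actual
  )
  return match ? `${match.algorithm}-${match.digest}` : null
}

// One integrity string covering two "versions" of the same document.
const sha512 = s =>
  'sha512-' + crypto.createHash('sha512').update(s).digest('base64')
const sri = `${sha512('version one')} ${sha512('version two')}`
console.log(checkData('version two', sri) === sha512('version two')) // true
console.log(checkData('version three', sri)) // null
```

Note that in this sketch a correct entry for a weaker algorithm does not rescue the check when a stronger algorithm is present and fails to match, because only the strongest algorithm's entries are consulted.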
1 parent 82a977e commit b1e731f

29 files changed, with 593 additions and 594 deletions

README.md

+86 -47
@@ -35,6 +35,8 @@ can just as easily be used on its own
3535
* [`clearMemoized`](#clear-memoized)
3636
* [`tmp.mkdir`](#tmp-mkdir)
3737
* [`tmp.withTmp`](#with-tmp)
38+
* Integrity
39+
* [Subresource Integrity](#integrity)
3840
* [`verify`](#verify)
3941
* [`verify.lastRun`](#verify-last-run)
4042

@@ -49,7 +51,7 @@ const cachePath = '/tmp/my-toy-cache'
4951
const key = 'my-unique-key-1234'
5052

5153
// Cache it! Use `cachePath` as the root of the content cache
52-
cacache.put(cachePath, key, '10293801983029384').then(digest => {
54+
cacache.put(cachePath, key, '10293801983029384').then(integrity => {
5355
console.log(`Saved content to ${cachePath}.`)
5456
})
5557

@@ -66,7 +68,7 @@ cacache.get.stream(
6668
})
6769

6870
// The same thing, but skip the key index.
69-
cacache.get.byDigest(cachePath, tarballSha512).then(data => {
71+
cacache.get.byDigest(cachePath, integrityHash).then(data => {
7072
fs.writeFile(destination, data, err => {
7173
console.log('tarball data fetched based on its sha512sum and written out!')
7274
})
@@ -107,8 +109,7 @@ cacache.ls(cachePath).then(console.log)
107109
{
108110
'my-thing': {
109111
key: 'my-thing',
110-
digest: 'deadbeef',
111-
hashAlgorithm: 'sha512',
112+
integrity: 'sha512-BaSe64/EnCoDED+HAsh==',
112113
path: '.testcache/content/deadbeef', // joined with `cachePath`
113114
time: 12345698490,
114115
metadata: {
@@ -119,8 +120,7 @@ cacache.ls(cachePath).then(console.log)
119120
},
120121
'other-thing': {
121122
key: 'other-thing',
122-
digest: 'bada55',
123-
hashAlgorithm: 'whirlpool',
123+
integrity: 'sha1-ANothER+hasH=',
124124
path: '.testcache/content/bada55',
125125
time: 11992309289
126126
}
@@ -141,8 +141,7 @@ cacache.ls.stream(cachePath).on('data', console.log)
141141
// Output
142142
{
143143
key: 'my-thing',
144-
digest: 'deadbeef',
145-
hashAlgorithm: 'sha512',
144+
integrity: 'sha512-BaSe64HaSh',
146145
path: '.testcache/content/deadbeef', // joined with `cachePath`
147146
time: 12345698490,
148147
metadata: {
@@ -154,8 +153,7 @@ cacache.ls.stream(cachePath).on('data', console.log)
154153

155154
{
156155
key: 'other-thing',
157-
digest: 'bada55',
158-
hashAlgorithm: 'whirlpool',
156+
integrity: 'whirlpool-WoWSoMuchSupport',
159157
path: '.testcache/content/bada55',
160158
time: 11992309289
161159
}
@@ -165,20 +163,23 @@ cacache.ls.stream(cachePath).on('data', console.log)
165163
}
166164
```
167165

168-
#### <a name="get-data"></a> `> cacache.get(cache, key, [opts]) -> Promise({data, metadata, digest, hashAlgorithm})`
166+
#### <a name="get-data"></a> `> cacache.get(cache, key, [opts]) -> Promise({data, metadata, integrity})`
169167

170168
Returns an object with the cached data, integrity, and metadata identified by
171169
`key`. The `data` property of this object will be a `Buffer` instance that
172170
presumably holds some data that means something to you. I'm sure you know what
173-
to do with it! cacache just won't care. `hashAlgorithm` is the algorithm used
174-
to calculate the `digest` of the content. This algorithm must be used if you
175-
fetch later with `get.byDigest`.
171+
to do with it! cacache just won't care.
172+
173+
`integrity` is a [Subresource
174+
Integrity](#integrity)
175+
string. That is, a string that can be used to verify `data`, which looks like
176+
`<hash-algorithm>-<base64-integrity-hash>`.
176177

177178
If there is no content identified by `key`, or if the locally-stored data does
178179
not pass the validity checksum, the promise will be rejected.
179180

180181
A sub-function, `get.byDigest` may be used for identical behavior, except lookup
181-
will happen by content digest, bypassing the index entirely. This version of the
182+
will happen by integrity hash, bypassing the index entirely. This version of the
182183
function *only* returns `data` itself, without any wrapper.
183184

184185
##### Note
@@ -197,15 +198,12 @@ cache.get(cachePath, 'my-thing').then(console.log)
197198
metadata: {
198199
thingName: 'my'
199200
},
200-
digest: 'deadbeef',
201-
hashAlgorithm: 'sha512'
201+
integrity: 'sha512-BaSe64HaSh',
202202
data: Buffer#<deadbeef>
203203
}
204204

205205
// Look up by digest
206-
cache.get.byDigest(cachePath, 'deadbeef', {
207-
hashAlgorithm: 'sha512'
208-
}).then(console.log)
206+
cache.get.byDigest(cachePath, 'sha512-BaSe64HaSh').then(console.log)
209207
// Output:
210208
Buffer#<deadbeef>
211209
```
@@ -217,12 +215,12 @@ Returns a [Readable Stream](https://nodejs.org/api/stream.html#stream_readable_s
217215
If there is no content identified by `key`, or if the locally-stored data does
218216
not pass the validity checksum, an error will be emitted.
219217

220-
`metadata` and `digest` events will be emitted before the stream closes, if
218+
`metadata` and `integrity` events will be emitted before the stream closes, if
221219
you need to collect that extra data about the cached entry.
222220

223221
A sub-function, `get.stream.byDigest` may be used for identical behavior,
224-
except lookup will happen by content digest, bypassing the index entirely. This
225-
version does not emit the `metadata` and `digest` events at all.
222+
except lookup will happen by integrity hash, bypassing the index entirely. This
223+
version does not emit the `metadata` and `integrity` events at all.
226224

227225
##### Example
228226

@@ -232,21 +230,18 @@ cache.get.stream(
232230
cachePath, 'my-thing'
233231
).on('metadata', metadata => {
234232
console.log('metadata:', metadata)
235-
}).on('hashAlgorithm', algo => {
236-
console.log('hashAlgorithm:', algo)
237-
}).on('digest', digest => {
238-
console.log('digest:', digest)
233+
}).on('integrity', integrity => {
234+
console.log('integrity:', integrity)
239235
}).pipe(
240236
fs.createWriteStream('./x.tgz')
241237
)
242238
// Outputs:
243239
metadata: { ... }
244-
hashAlgorithm: 'sha512'
245-
digest: deadbeef
240+
integrity: 'sha512-SoMeDIGest+64=='
246241

247242
// Look up by digest
248243
cache.get.stream.byDigest(
249-
cachePath, 'deadbeef', { hashAlgorithm: 'sha512' }
244+
cachePath, 'sha512-SoMeDIGest+64=='
250245
).pipe(
251246
fs.createWriteStream('./x.tgz')
252247
)
@@ -260,8 +255,7 @@ one exists.
260255
##### Fields
261256

262257
* `key` - Key the entry was looked up under. Matches the `key` argument.
263-
* `digest` - Content digest the entry refers to.
264-
* `hashAlgorithm` - Hashing algorithm used to generate `digest`.
258+
* `integrity` - [Subresource Integrity hash](#integrity) for the content this entry refers to.
265259
* `path` - Filesystem path relative to `cache` argument where content is stored.
266260
* `time` - Timestamp the entry was first added on.
267261
* `metadata` - User-assigned metadata associated with the entry/content.
@@ -274,7 +268,7 @@ cacache.get.info(cachePath, 'my-thing').then(console.log)
274268
// Output
275269
{
276270
key: 'my-thing',
277-
digest: 'deadbeef',
271+
integrity: 'sha256-MUSTVERIFY+ALL/THINGS==',
278272
path: '.testcache/content/deadbeef',
279273
time: 12345698490,
280274
metadata: {
@@ -298,16 +292,16 @@ fetch(
298292
'https://registry.npmjs.org/cacache/-/cacache-1.0.0.tgz'
299293
).then(data => {
300294
return cacache.put(cachePath, 'registry.npmjs.org|cacache@1.0.0', data)
301-
}).then(digest => {
302-
console.log('digest is', digest)
295+
}).then(integrity => {
296+
console.log('integrity hash is', integrity)
303297
})
304298
```
305299

306300
#### <a name="put-stream"></a> `> cacache.put.stream(cache, key, [opts]) -> Writable`
307301

308302
Returns a [Writable
309303
Stream](https://nodejs.org/api/stream.html#stream_writable_streams) that inserts
310-
data written to it into the cache. Emits a `digest` event with the digest of
304+
data written to it into the cache. Emits an `integrity` event with the digest of
311305
written contents when it succeeds.
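
As a rough illustration of how an integrity string can be built up while data arrives in pieces, the sketch below feeds chunks into a single hash object and formats the result at the end. It assumes nothing about cacache's internals; `integrityOfChunks` is a hypothetical helper, and the default of `sha512` simply mirrors the documented `hashAlgorithm` default.

```javascript
const crypto = require('crypto')

// Hypothetical helper: hash chunks one at a time, then format an SRI string.
function integrityOfChunks (chunks, algorithm = 'sha512') {
  const hash = crypto.createHash(algorithm)
  for (const chunk of chunks) {
    hash.update(chunk) // incremental: no need to buffer the whole payload
  }
  return `${algorithm}-${hash.digest('base64')}`
}

// Chunk boundaries do not affect the result.
const chunked = integrityOfChunks(['foo', 'bar', 'baz'])
const whole = integrityOfChunks(['foobarbaz'])
console.log(chunked === whole) // true
```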
312306

313307
##### Example
@@ -318,7 +312,7 @@ request.get(
318312
).pipe(
319313
cacache.put.stream(
320314
cachePath, 'registry.npmjs.org|cacache@1.0.0'
321-
).on('digest', d => console.log('digest is ${d}'))
315+
).on('integrity', d => console.log(`integrity digest is ${d}`))
322316
)
323317
```
324318

@@ -336,22 +330,23 @@ If provided, the data stream will be verified to check that enough data was
336330
passed through. If there's more or less data than expected, insertion will fail
337331
with an `EBADSIZE` error.
338332

339-
##### `digest`
333+
##### `integrity`
340334

341335
If present, the pre-calculated digest for the inserted content. If this option
342336
is provided and does not match the post-insertion digest, insertion will fail
343337
with an `EBADCHECKSUM` error.
344338

345-
To control the hashing algorithm, use `opts.hashAlgorithm`.
339+
`hashAlgorithm` has no effect if this option is present.
346340

347341
##### `hashAlgorithm`
348342

349343
Default: 'sha512'
350344

351-
Hashing algorithm to use when calculating the digest for inserted data. Can use
352-
any algorithm listed in `crypto.getHashes()` or `'omakase'`/`'お任せします'` to
353-
pick a random hash algorithm on each insertion. You may also use any anagram of
354-
`'modnar'` to use this feature.
345+
Hashing algorithm to use when calculating the [subresource integrity
346+
digest](#integrity)
347+
for inserted data. Can use any algorithm listed in `crypto.getHashes()` or
348+
`'omakase'`/`'お任せします'` to pick a random hash algorithm on each insertion. You
349+
may also use any anagram of `'modnar'` to use this feature.
355350

356351
##### `uid`/`gid`
357352

@@ -395,6 +390,10 @@ Alias: `cacache.rm`
395390
Removes the index entry for `key`. Content will still be accessible if
396391
requested directly by content address ([`get.stream.byDigest`](#get-stream)).
397392

393+
To remove the content itself (which might still be used by other entries), use
394+
[`rm.content`](#rm-content). Or, to safely vacuum any unused content, use
395+
[`verify`](#verify).
396+
398397
##### Example
399398

400399
```javascript
@@ -403,16 +402,16 @@ cacache.rm.entry(cachePath, 'my-thing').then(() => {
403402
})
404403
```
405404

406-
#### <a name="rm-content"></a> `> cacache.rm.content(cache, digest) -> Promise`
405+
#### <a name="rm-content"></a> `> cacache.rm.content(cache, integrity) -> Promise`
407406

408-
Removes the content identified by `digest`. Any index entries referring to it
407+
Removes the content identified by `integrity`. Any index entries referring to it
409408
will not be usable again until the content is re-added to the cache with an
410409
identical digest.
411410

412411
##### Example
413412

414413
```javascript
415-
cacache.rm.content(cachePath, 'deadbeef').then(() => {
414+
cacache.rm.content(cachePath, 'sha512-SoMeDIGest/IN+BaSE64==').then(() => {
416415
console.log('data for my-thing is gone!')
417416
})
418417
```
@@ -462,14 +461,54 @@ cacache.tmp.withTmp(cache, dir => {
462461
})
463462
```
464463

464+
#### <a name="integrity"></a> Subresource Integrity Digests
465+
466+
For content verification and addressing, cacache uses strings following the
467+
[Subresource
468+
Integrity](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity)
469+
spec. That is, any time cacache expects an `integrity` argument or option, it
470+
should be in the format `<hashAlgorithm>-<base64-hash>`.
471+
472+
One deviation from the current spec is that cacache will support any hash
473+
algorithms supported by the underlying Node.js process. You can use
474+
`crypto.getHashes()` to see which ones you can use.
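
A quick way to check whether a given algorithm name is usable in the current process might look like the following. `isSupportedAlgorithm` is a hypothetical helper written for this example, not part of cacache.

```javascript
const crypto = require('crypto')

// Hypothetical helper: true when this Node.js build can compute `algorithm`.
function isSupportedAlgorithm (algorithm) {
  return crypto.getHashes().includes(algorithm)
}

console.log(isSupportedAlgorithm('sha512'))
console.log(isSupportedAlgorithm('not-a-real-hash'))
```

The list returned by `crypto.getHashes()` depends on how Node.js and its crypto backend were built, so results for less common algorithms can differ between machines.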
475+
476+
##### Generating Digests Yourself
477+
478+
Existing content shasums are generally formatted as a
479+
hexadecimal string (that is, a sha1 would look like:
480+
`5f5513f8822fdbe5145af33b64d8d970dcf95c6e`). In order to be compatible with
481+
cacache, you'll need to convert this to an equivalent subresource integrity
482+
string. For this example, the corresponding hash would be:
483+
`sha1-X1UT+IIv2+UUWvM7ZNjZcNz5XG4=`.
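
The hex-to-SRI conversion amounts to re-encoding the same bytes in base64 and prefixing the algorithm name. A minimal sketch using only Node's `Buffer`, with `hexToIntegrity` and `integrityToHex` as hypothetical helper names:

```javascript
// Hypothetical helper: re-encode a hex digest as '<algorithm>-<base64>'.
function hexToIntegrity (hexDigest, algorithm) {
  return `${algorithm}-${Buffer.from(hexDigest, 'hex').toString('base64')}`
}

// Hypothetical helper: recover the hex digest from a single-entry SRI string.
function integrityToHex (integrity) {
  const base64 = integrity.slice(integrity.indexOf('-') + 1)
  return Buffer.from(base64, 'base64').toString('hex')
}

const hex = '5f5513f8822fdbe5145af33b64d8d970dcf95c6e'
console.log(hexToIntegrity(hex, 'sha1'))
// sha1-X1UT+IIv2+UUWvM7ZNjZcNz5XG4=
console.log(integrityToHex('sha1-X1UT+IIv2+UUWvM7ZNjZcNz5XG4=') === hex)
// true
```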
484+
485+
If you want to generate an integrity string yourself for existing data, you can
486+
use something like this:
487+
488+
```javascript
489+
const crypto = require('crypto')
490+
const hashAlgorithm = 'sha512'
491+
const data = 'foobarbaz'
492+
493+
const integrity = (
494+
hashAlgorithm +
495+
'-' +
496+
crypto.createHash(hashAlgorithm).update(data).digest('base64')
497+
)
498+
```
499+
500+
You can also use [`ssri`](https://npm.im/ssri) to have a richer set of functionality
501+
around SRI strings, including generation, parsing, and translating from existing
502+
hex-formatted strings.
503+
465504
#### <a name="verify"></a> `> cacache.verify(cache, opts) -> Promise`
466505

467506
Checks out and fixes up your cache:
468507

469508
* Cleans up corrupted or invalid index entries.
470509
* Custom entry filtering options.
471510
* Garbage collects any content entries not referenced by the index.
472-
* Checks digests for all content entries and removes invalid content.
511+
* Checks integrity for all content entries and removes invalid content.
473512
* Fixes cache ownership.
474513
* Removes the `tmp` directory in the cache and all its contents.
475514
