This repository has been archived by the owner on Aug 14, 2020. It is now read-only.

lib/repository2: fix parallel pull synchronisation
The parallel pull refactoring in 47f2cb9 introduced an unfortunate and
subtle race condition. Because i) we were asynchronously starting the
fetching of each layer [1] (which implies that the fetch is added to
the progress bar asynchronously, via getLayerV2 [2] [3]), and ii) the
progressutil package has no guards against calling PrintAndWait and
AddCopy in indeterminate order [4], it was possible for us to enter the
PrintAndWait loop [5] [6] before all of the layer fetches had actually
been added to it. Then, in the event that the first layer was
particularly fast, the CopyProgressPrinter could actually think it was
done [7] before the other layer(s) had finished. In this situation,
docker2aci would happily continue forward and try to generate an ACI
from each layer [8], and for any layer(s) that had not actually
finished downloading, the GenerateACI->writeACI->tarball.Walk call [9]
would be operating on a partially written file and hence result in the
errors we've been seeing ("unexpected EOF", "corrupt input before
offset 45274", and so forth).
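
To make the failure mode concrete, here is a minimal sketch of the
ordering problem. The `tracker` type and its methods are hypothetical
stand-ins and are not the progressutil API; the only assumption is that
the printer considers itself done once every copy registered so far has
finished. The interleaving is written out sequentially so that it is
deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical stand-in for a progress printer. It only knows
// about the copies that have been registered with it so far.
type tracker struct {
	mu      sync.Mutex
	pending map[string]bool // layer name -> still copying?
}

func newTracker() *tracker {
	return &tracker{pending: map[string]bool{}}
}

// add registers a copy that is now in flight.
func (t *tracker) add(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[name] = true
}

// finish marks a registered copy as complete.
func (t *tracker) finish(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[name] = false
}

// allDone reports whether every *registered* copy has finished. It cannot
// account for copies that have not been registered yet.
func (t *tracker) allDone() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, stillCopying := range t.pending {
		if stillCopying {
			return false
		}
	}
	return true
}

func main() {
	t := newTracker()

	// The small layer registers and completes almost immediately.
	t.add("small")
	t.finish("small")

	// The large layer's goroutine has not reached its add call yet, so a
	// waiter polling now would conclude that everything is finished.
	fmt.Println(t.allDone()) // prints "true"

	// Only now does the large layer register; too late for that waiter.
	t.add("large")
	fmt.Println(t.allDone()) // prints "false"
}
```

The first check returns true even though the large layer has not
started, which is the window in which the conversion step would read a
partially written file.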

A great case to reproduce this is the `docker:busybox` image because of
its particular characteristics: it has two layers, one very small (32B)
and one relatively large (676KB). Typical output looks like the
following:

```
% $(exit 0); while [ $? -eq 0 ]; do ./bin/docker2aci docker://busybox; done

Downloading sha256:385e281300c [===============================] 676 KB / 676 KB
Downloading sha256:a3ed95caeb0 [===============================]     32 B / 32 B

Generated ACI(s):
library-busybox-latest.aci

Downloading sha256:a3ed95caeb0 [==============================]      32 B / 32 B
Downloading sha256:385e281300c [==============================]  676 KB / 676 KB

Generated ACI(s):
library-busybox-latest.aci

Downloading sha256:a3ed95caeb0 [===================================] 32 B / 32 B
Error: conversion error: error generating ACI: error writing ACI: Error reading tar entry: unexpected EOF

```

Note that i) the order in which the layers are registered with the
progress printer is indeterminate, and ii) in every failure observed so
far, the small layer is retrieved first and the output contains no
progress at all for the other layer. This indeed suggests that the
progress printer returns before realising the second layer is still
being retrieved.

Tragically, @dgonyeo's test case [10] probably gives a false sense of
security (i.e. it cannot reproduce this issue), most likely because the
`dgonyeo/test` image contains a number of layers (5) of varying sizes,
which I suspect makes this particular race much less likely to occur.

Almost certainly fixes #166 - with this patch I'm unable to reproduce.

[1]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L89
[2]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L115
[3]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L335
[4]: coreos/pkg#63
[5]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L123
[6]: https://github.com/coreos/pkg/blob/master/progressutil/iocopy.go#L115
[7]: https://github.com/coreos/pkg/blob/master/progressutil/iocopy.go#L144
[8]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L149
[9]: https://github.com/appc/docker2aci/blob/4e051449c0079ba8df59a51c14b7d310de1830b8/lib/internal/internal.go#L427
[10]: coreos/pkg#61 (comment)
jonboulle committed May 29, 2016
1 parent ba503aa commit 08bd690
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion lib/internal/backend/repository/repository2.go
@@ -26,6 +26,7 @@ import (
 	"regexp"
 	"strconv"
 	"strings"
+	"sync"
 	"time"

 	"github.com/appc/docker2aci/lib/common"
@@ -80,13 +81,17 @@ func (rb *RepositoryBackend) buildACIV2(layerIDs []string, dockerURL *types.Pars

 	var errChannels []chan error
 	closers := make([]io.ReadCloser, len(layerIDs))
+	var wg sync.WaitGroup
 	for i, layerID := range layerIDs {
-		errChan := make(chan error)
+		wg.Add(1)
+		errChan := make(chan error, 1)
 		errChannels = append(errChannels, errChan)
 		// https://github.com/golang/go/wiki/CommonMistakes
 		i := i // golang--
 		layerID := layerID
 		go func() {
+			defer wg.Done()
+
 			manifest := rb.imageManifests[*dockerURL]

 			layerIndex, err := getLayerIndex(layerID, manifest)
@@ -120,6 +125,8 @@ func (rb *RepositoryBackend) buildACIV2(layerIDs []string, dockerURL *types.Pars
 			errChan <- nil
 		}()
 	}
+	// Need to wait for all of the readers to be added to the copier (which happens during rb.getLayerV2)
+	wg.Wait()
 	err = copier.PrintAndWait(os.Stderr, 500*time.Millisecond, nil)
 	if err != nil {
 		return nil, nil, err
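
The pattern applied by this change can be sketched in isolation as
follows. This is a simplified illustration, not the code in
repository2.go: `registry`, `add`, `count` and `startAll` are
hypothetical names, and `registry` merely stands in for the copier.
The sketch also shows why the error channel presumably had to become
buffered: the deferred `wg.Done()` only runs after the goroutine's send
completes, so with an unbuffered channel and no receiver until after
`wg.Wait()`, the send would block and `wg.Wait()` would never return.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for the copier: goroutines register
// their work with it, and a later step waits on whatever was registered.
type registry struct {
	mu    sync.Mutex
	names []string
}

// add records one registered layer.
func (r *registry) add(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.names = append(r.names, name)
}

// count returns how many layers have been registered.
func (r *registry) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.names)
}

// startAll launches one goroutine per layer and returns only once every
// goroutine has registered itself and reported its result.
func startAll(r *registry, layers []string) []chan error {
	var wg sync.WaitGroup
	errChans := make([]chan error, 0, len(layers))
	for _, layer := range layers {
		wg.Add(1)
		// Buffered with capacity 1 so the goroutine can send its result
		// and return (running wg.Done) before anyone receives from it.
		errChan := make(chan error, 1)
		errChans = append(errChans, errChan)
		layer := layer // capture the per-iteration value
		go func() {
			defer wg.Done()
			r.add(layer)
			errChan <- nil
		}()
	}
	// Block until all registrations have happened; only then is it safe
	// to start waiting on the registry's contents.
	wg.Wait()
	return errChans
}

func main() {
	r := &registry{}
	errChans := startAll(r, []string{"small", "large"})
	fmt.Println(r.count()) // prints "2"
	for _, c := range errChans {
		if err := <-c; err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

Because `wg.Wait()` returns only after both goroutines have called
`add`, the count is always 2 by the time the caller inspects the
registry, regardless of goroutine scheduling.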
