-
Notifications
You must be signed in to change notification settings - Fork 899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GODRIVER-2550 Add fuzzer to bson packages #1077
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits and a few questions. Looking great, though. Thanks for all that super helpful info in the PR description, too.
done | ||
fi | ||
|
||
go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not care about the output of go test
(see the || true
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will still pipe the output, but if the go test
panics/errors due to a crash (which is good in the case) we don't want the bash script to terminate, rather continue the generated corpus processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an indicator that will let us know the fuzz test caused a panic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewdale Yes, this should still output the results of the test. If I added a panic
to the fuzz test and then added checkpoints to the script like this:
echo "before"
go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true
echo "after"
The output would be
Fuzzing "FuzzDecode" in "./bson/fuzz_test.go"
before
fuzz: elapsed: 0s, gathering baseline coverage: 0/460 completed
failure while testing seed corpus entry: FuzzDecode/seed#0
fuzz: elapsed: 0s, gathering baseline coverage: 0/460 completed
--- FAIL: FuzzDecode (0.08s)
--- FAIL: FuzzDecode (0.00s)
testing.go:1356: panic: test
goroutine 60 [running]:
runtime/debug.Stack()
/opt/homebrew/Cellar/go/1.19.1/libexec/src/runtime/debug/stack.go:24 +0x104
testing.tRunner.func1()
/opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/testing.go:1356 +0x258
panic({0x104d9f300, 0x104e0eac0})
/opt/homebrew/Cellar/go/1.19.1/libexec/src/runtime/panic.go:884 +0x204
go.mongodb.org/mongo-driver/bson.FuzzDecode.func1(0x1400013d718?, {0x10476bb28?, 0x0?, 0x0?})
/Users/preston.vasquez/Developer/mongo-go-driver/bson/fuzz_test.go:19 +0xb8
reflect.Value.call({0x104da5400?, 0x104e0d9c0?, 0x13?}, {0x104cc889d, 0x4}, {0x140002d46c0, 0x2, 0x2?})
/opt/homebrew/Cellar/go/1.19.1/libexec/src/reflect/value.go:584 +0x688
reflect.Value.Call({0x104da5400?, 0x104e0d9c0?, 0x1400008c820?}, {0x140002d46c0?, 0x0?, 0x1400032e978?})
/opt/homebrew/Cellar/go/1.19.1/libexec/src/reflect/value.go:368 +0x90
testing.(*F).Fuzz.func1.1(0x1400007eba0?)
/opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/fuzz.go:337 +0x1dc
testing.tRunner(0x14000221520, 0x140002d8480)
/opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/testing.go:1446 +0x10c
created by testing.(*F).Fuzz.func1
/opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/fuzz.go:324 +0x4c4
FAIL
exit status 1
FAIL go.mongodb.org/mongo-driver/bson 0.298s
after
seedBSONCorpus(f) | ||
|
||
f.Fuzz(func(t *testing.T, data []byte) { | ||
for _, typ := range []func() interface{}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not loop over []interface{}
? Is there a reason I'm not seeing for using these constructors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a practice I took from the Go team. I think the idea is to make the type instantiation more realistic, i.e. something like this:
x := interface{} // or in our case typ()
// unmarshal into &x
And not like
for _, x := range []interface{ ... } {
// unmarshal into &x
}
t.Fatal("failed to marshal", err) | ||
} | ||
|
||
if err := Unmarshal(encoded, i); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting that we unmarshal, marshal and unmarshal again. What's the reasoning there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first unmarshal is to check the validity of the extended JSON. We only want this to generate corpus data on a panic/crash. If the value is not valid BSON and we don't crash, then this subtest if over.
The first marshal is to check that we can encode valid data structures into BSON, it seems unnecessary to fuzz the encoding of invalid data structures.
The last unmarshal is where we check that we can decode valid BSON, if this fails/panics/crashes then we want to know about it.
@@ -0,0 +1,2 @@ | |||
go test fuzz v1 | |||
[]byte("\x10\x00\x00\x00\v\x00\x00\x00\b\x00\x00\v\x00\x00\x00\x00") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these testdata/fuzz/FuzzDecode
files separate from the bson-corpus
seeds? How are these added into the seed corpus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data is different from testdata/bson-corpus. Three of these were interesting cases generated by running the fuzzer. One is BSON that encapsulates all types for an initial maximum code coverage in the style of the encoding/json fuzz test.
From the documentation:
seed corpus: A user-provided corpus for a fuzz test which can be used to guide the fuzzing engine. It is composed of the corpus entries provided by f.Add calls within the fuzz test, and the files in the testdata/fuzz/{FuzzTestName} directory within the package. These entries are run by default with go test, whether fuzzing or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Great work 🧑🔧
done | ||
fi | ||
|
||
go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks
# Move the file to the directory. | ||
mv $CORPUS_FILE $PROJECT_DIRECTORY/fuzz/$FUNC | ||
|
||
echo "Moved $CORPUS_FILE to $PROJECT_DIRECTORY/fuzz/$FUNC" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, sounds good.
bson/bson_corpus_spec_test.go
Outdated
func seedBSONCorpus(f *testing.F) { | ||
fileNames, err := findJSONFilesInDir(dataDir) | ||
if err != nil { | ||
f.Fatalf("failed to find JSON files in directory %q: %v", dataDir, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f.Fatalf("failed to find JSON files in directory %q: %v", dataDir, err) | |
f.Fatalf("failed to find JSON files in directory %q: %v", dataDir, err) |
@@ -0,0 +1,2 @@ | |||
go test fuzz v1 | |||
[]byte("\x10\x00\x00\x00\v\x00\x00\x00\b\x00\x00\v\x00\x00\x00\x00") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for the explanation.
done | ||
fi | ||
|
||
go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an indicator that will let us know the fuzz test caused a panic?
.evergreen/run-fuzz.sh
Outdated
|
||
# Clean testing cache. | ||
go clean -testcache | ||
go clean -fuzzcache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is cleaning the testcache and fuzzcache necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewdale I don't think it's necessary. I will remove these lines, I typically clean them locally to see how code coverage compares between runs but it doesn't seem like something we'd care about in CI. I will remove these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 👍
GODRIVER-2550
This ticket seeks to
test-fuzz
task that will serially run any fuzzer in our repository for 10 minutes, streaming the result to Task Log.GODRIVER-2561 is a followup ticket to add the
test-fuzz
task to a periodic CI build.Background and Theory
The goal of fuzz testing is to generate random inputs for a program until that program crashes, revealing a bug. From the Go documentation:
Go's Fuzz Library
The initial implementation of Go's first class Fuzzing library was developed by Dmitry Vyukov as a third party library. Issue #19109 was filed on behalf of Vyukov resulting in a subsequent design draft here which was accepted under Issue #44551.
American Fuzzy Lop (AFL)
It is important to note that the Go library does not directly use AFL, rather
Radamsa, ZZUF
In the "Fuzzing support for Go" proposal it is noted that the blind mutaiton algorithm will "apply random mutations to user-provided corpus of representative inputs (ZZUF, Radamsa)."
Algorithm
Vyukov's design implements this control loop internally:
References