GODRIVER-2550 Add fuzzer to bson packages #1077

prestonvasquez · 2022-09-15T20:37:43Z

This ticket seeks to:

Fuzz the BSON encoder and decoder methods using Go's native fuzzing, specifically in the style of the encoding/json package.
Seed the fuzz corpus with interesting use cases via the testdata/fuzz/{fuzzer} design.
Seed the fuzz corpus with all the extended JSON data from the specifications/source/bson-corpus.
Add a new fuzz matrix to the Evergreen CI along with a test-fuzz task that will serially run every fuzzer in our repository for 10 minutes, streaming the results to the Task Log.

GODRIVER-2561 is a follow-up ticket to add the test-fuzz task to a periodic CI build.

Background and Theory

The goal of fuzz testing is to generate random inputs for a program until that program crashes, revealing a bug. From the Go documentation:

While fuzzing is in progress, the fuzzing engine generates new inputs and runs them against the provided fuzz target. By default, it continues to run until a failing input is found, or the user cancels the process (e.g. with Ctrl^C)

Go's Fuzz Library

The initial implementation of Go's first-class fuzzing library was developed by Dmitry Vyukov as a third-party library. Issue #19109 was filed on behalf of Vyukov, resulting in a subsequent design draft that was accepted under Issue #44551.

American Fuzzy Lop (AFL)

American fuzzy lop is a security-oriented fuzzer that employs a novel type of compile-time instrumentation and genetic algorithms to automatically discover clean, interesting test cases that trigger new internal states in the targeted binary. This substantially improves the functional coverage for the fuzzed code. The compact synthesized corpora produced by the tool are also useful for seeding other, more labor- or resource-intensive testing regimes down the road.

It is important to note that the Go library does not use AFL directly. Rather:

Vyukov's go-fuzz tool operates in a similar way to AFL, but is written specifically for Go.

Radamsa, ZZUF

Radamsa is a test case generator for robustness testing, a.k.a. a fuzzer. It is typically used to test how well a program can withstand malformed and potentially malicious inputs. It works by reading sample files of valid data and generating interestingly different outputs from them. The main selling points of radamsa are that it has already found a slew of bugs in programs that actually matter, it is easily scriptable, and it is easy to get up and running.

zzuf is a transparent application input fuzzer. Its purpose is to find bugs in applications by corrupting their user-contributed data (which more than often comes from untrusted sources on the Internet). It works by intercepting file and network operations and changing random bits in the program’s input. zzuf’s behaviour is deterministic, making it easier to reproduce bugs.

In the "Fuzzing support for Go" proposal, it is noted that the blind mutation algorithm will "apply random mutations to user-provided corpus of representative inputs (ZZUF, Radamsa)."

Algorithm

Vyukov's design implements this control loop internally:

start with some (potentially empty) corpus of inputs
for {
    choose a random input from the corpus
    mutate the input
    execute the mutated input and collect code coverage
    if the input gives new coverage, add it to the corpus
}

In order to add coverage recording to a Go program, a developer first runs the go-fuzz-build command (instead of go build), which uses the standard library's go/ast package to add instrumentation to each block in the source code, and then sends the result through the regular Go compiler. Once the instrumented binary has been built, the go-fuzz command runs it over and over on multiple CPU cores with randomly mutated inputs, recording any crashes (along with their stack traces and the inputs that caused them) as it goes.


benjirewis

Some nits and a few questions. Looking great, though. Thanks for all that super helpful info in the PR description, too.

.evergreen/config.yml

.evergreen/run-fuzz.sh

benjirewis · 2022-10-04T23:21:38Z

.evergreen/run-fuzz.sh

+			done
+		fi
+
+		go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true


Do we not care about the output of go test (see the || true)?

This will still pipe the output, but if go test panics or errors because of a crash (which is what we want in this case), we don't want the bash script to terminate; instead, it should continue processing the generated corpus.

Is there an indicator that will let us know the fuzz test caused a panic?

Sounds good, thanks

@matthewdale Yes, this should still output the results of the test. If I added a panic to the fuzz test and then added checkpoints to the script like this:

echo "before"
go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true
echo "after"

The output would be:

Fuzzing "FuzzDecode" in "./bson/fuzz_test.go"
before
fuzz: elapsed: 0s, gathering baseline coverage: 0/460 completed
failure while testing seed corpus entry: FuzzDecode/seed#0
fuzz: elapsed: 0s, gathering baseline coverage: 0/460 completed
--- FAIL: FuzzDecode (0.08s)
    --- FAIL: FuzzDecode (0.00s)
        testing.go:1356: panic: test
            goroutine 60 [running]:
            runtime/debug.Stack()
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/runtime/debug/stack.go:24 +0x104
            testing.tRunner.func1()
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/testing.go:1356 +0x258
            panic({0x104d9f300, 0x104e0eac0})
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/runtime/panic.go:884 +0x204
            go.mongodb.org/mongo-driver/bson.FuzzDecode.func1(0x1400013d718?, {0x10476bb28?, 0x0?, 0x0?})
                /Users/preston.vasquez/Developer/mongo-go-driver/bson/fuzz_test.go:19 +0xb8
            reflect.Value.call({0x104da5400?, 0x104e0d9c0?, 0x13?}, {0x104cc889d, 0x4}, {0x140002d46c0, 0x2, 0x2?})
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/reflect/value.go:584 +0x688
            reflect.Value.Call({0x104da5400?, 0x104e0d9c0?, 0x1400008c820?}, {0x140002d46c0?, 0x0?, 0x1400032e978?})
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/reflect/value.go:368 +0x90
            testing.(*F).Fuzz.func1.1(0x1400007eba0?)
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/fuzz.go:337 +0x1dc
            testing.tRunner(0x14000221520, 0x140002d8480)
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/testing.go:1446 +0x10c
            created by testing.(*F).Fuzz.func1
                /opt/homebrew/Cellar/go/1.19.1/libexec/src/testing/fuzz.go:324 +0x4c4
FAIL
exit status 1
FAIL go.mongodb.org/mongo-driver/bson 0.298s
after

bson/bson_corpus_spec_test.go

benjirewis · 2022-10-04T23:35:04Z

bson/fuzz_test.go

+	seedBSONCorpus(f)
+
+	f.Fuzz(func(t *testing.T, data []byte) {
+		for _, typ := range []func() interface{}{


Why not loop over []interface{}? Is there a reason I'm not seeing for using these constructors?

It's a practice I took from the Go team. I think the idea is to make the type instantiation more realistic, i.e. something like this:

var x interface{} // or, in our case, x := typ()
// unmarshal into &x

And not like:

for _, x := range []interface{}{ ... } {
    // unmarshal into &x
}
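The thread does not spell out further why constructors help. One plausible effect, shown here with the standard library's encoding/json as a stand-in for the bson package (an assumption so the example runs on its own; `sharedTargets` and `freshTargets` are hypothetical names), is that pre-built pointers that get reused across decodes carry state from one input to the next, whereas a constructor yields a fresh zero value every time.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inputs stand in for successive fuzz iterations.
var inputs = [][]byte{[]byte(`{"a":1}`), []byte(`{"b":2}`)}

// sharedTargets builds the []interface{} once, so the same map pointer is
// decoded into on every iteration.
func sharedTargets() int {
	targets := []interface{}{&map[string]interface{}{}}
	for _, in := range inputs {
		for _, x := range targets {
			_ = json.Unmarshal(in, x) // a non-nil map keeps its existing entries
		}
	}
	return len(*targets[0].(*map[string]interface{}))
}

// freshTargets calls a constructor per iteration, as the fuzz test does.
func freshTargets() int {
	typs := []func() interface{}{
		func() interface{} { return new(map[string]interface{}) },
	}
	n := 0
	for _, in := range inputs {
		for _, typ := range typs {
			x := typ() // pointer to a nil map; Unmarshal allocates a new one
			_ = json.Unmarshal(in, x)
			n = len(*x.(*map[string]interface{}))
		}
	}
	return n
}

func main() {
	fmt.Println("shared:", sharedTargets(), "fresh:", freshTargets())
}
```

The shared target ends up holding keys from both inputs, so a crash on the second input could depend on leftovers from the first; the constructor version only ever reflects the current input, which keeps a failing corpus entry reproducible on its own.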

benjirewis · 2022-10-04T23:35:27Z

bson/fuzz_test.go

+				t.Fatal("failed to marshal", err)
+			}
+
+			if err := Unmarshal(encoded, i); err != nil {


Ah interesting that we unmarshal, marshal and unmarshal again. What's the reasoning there?

The first unmarshal checks the validity of the extended JSON. We only want this step to generate corpus data on a panic/crash. If the value is not valid BSON and we don't crash, then this subtest is over.

The first marshal checks that we can encode valid data structures into BSON; it seems unnecessary to fuzz the encoding of invalid data structures.

The last unmarshal is where we check that we can decode valid BSON; if this fails, panics, or crashes, then we want to know about it.
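The three steps can be sketched with the standard library's encoding/json standing in for the driver's bson package (an assumption so the example runs without the driver; `checkRoundTrip` is a hypothetical name). Returning nil for undecodable input corresponds to the early return in the fuzz callback, and returning an error corresponds to calling t.Fatal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkRoundTrip mirrors the unmarshal -> marshal -> unmarshal sequence.
func checkRoundTrip(data []byte, newTarget func() interface{}) error {
	v := newTarget()
	// Step 1: invalid input that merely errors (no panic) is uninteresting.
	if err := json.Unmarshal(data, v); err != nil {
		return nil
	}
	// Step 2: a value we just decoded should always re-encode.
	encoded, err := json.Marshal(v)
	if err != nil {
		return fmt.Errorf("failed to marshal: %w", err)
	}
	// Step 3: bytes we produced ourselves should always decode.
	if err := json.Unmarshal(encoded, v); err != nil {
		return fmt.Errorf("failed to unmarshal: %w", err)
	}
	return nil
}

func main() {
	newAny := func() interface{} { return new(interface{}) }
	fmt.Println(checkRoundTrip([]byte(`{"a":[1,2]}`), newAny))
	fmt.Println(checkRoundTrip([]byte(`{not json`), newAny))
}
```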

benjirewis · 2022-10-04T23:37:18Z

bson/testdata/fuzz/FuzzDecode/002ae7d43f636100116fede772a03d07726ed75c3c3b83da865fe9b718adf8ae

@@ -0,0 +1,2 @@
+go test fuzz v1
+[]byte("\x10\x00\x00\x00\v\x00\x00\x00\b\x00\x00\v\x00\x00\x00\x00")


Are these testdata/fuzz/FuzzDecode files separate from the bson-corpus seeds? How are these added into the seed corpus?

This data is different from testdata/bson-corpus. Three of these files are interesting cases generated by running the fuzzer. The other is a BSON document that encapsulates all types, to maximize initial code coverage, in the style of the encoding/json fuzz test.

From the documentation:

seed corpus: A user-provided corpus for a fuzz test which can be used to guide the fuzzing engine. It is composed of the corpus entries provided by f.Add calls within the fuzz test, and the files in the testdata/fuzz/{FuzzTestName} directory within the package. These entries are run by default with go test, whether fuzzing or not.

Great, thanks for the explanation.

benjirewis

Looks good! Great work 🧑‍🔧

benjirewis · 2022-10-05T16:57:17Z

.evergreen/run-fuzz.sh

+			done
+		fi
+
+		go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true


Sounds good, thanks

benjirewis · 2022-10-05T16:57:28Z

.evergreen/run-fuzz.sh

+					# Move the file to the directory.
+					mv $CORPUS_FILE $PROJECT_DIRECTORY/fuzz/$FUNC
+
+					echo "Moved $CORPUS_FILE to $PROJECT_DIRECTORY/fuzz/$FUNC"


Fair, sounds good.

benjirewis · 2022-10-05T16:58:00Z

bson/bson_corpus_spec_test.go

+func seedBSONCorpus(f *testing.F) {
+	fileNames, err := findJSONFilesInDir(dataDir)
+	if err != nil {
+		f.Fatalf("failed to find  JSON files in directory %q: %v", dataDir, err)


Suggested change

-		f.Fatalf("failed to find  JSON files in directory %q: %v", dataDir, err)
+		f.Fatalf("failed to find JSON files in directory %q: %v", dataDir, err)

benjirewis · 2022-10-05T16:58:54Z

bson/testdata/fuzz/FuzzDecode/002ae7d43f636100116fede772a03d07726ed75c3c3b83da865fe9b718adf8ae

@@ -0,0 +1,2 @@
+go test fuzz v1
+[]byte("\x10\x00\x00\x00\v\x00\x00\x00\b\x00\x00\v\x00\x00\x00\x00")


Great, thanks for the explanation.

matthewdale · 2022-10-05T05:12:16Z

.evergreen/run-fuzz.sh

+			done
+		fi
+
+		go test ${PARENTDIR} -run=${FUNC} -fuzz=${FUNC} -fuzztime=${FUZZTIME} || true


Is there an indicator that will let us know the fuzz test caused a panic?

matthewdale · 2022-10-05T05:20:04Z

.evergreen/run-fuzz.sh

+
+	# Clean testing cache.
+	go clean -testcache
+	go clean -fuzzcache


Why is cleaning the testcache and fuzzcache necessary?

@matthewdale I don't think it's necessary. I typically clean them locally to see how code coverage compares between runs, but it doesn't seem like something we'd care about in CI. I will remove these lines.

matthewdale

Looks good! 👍

prestonvasquez added 13 commits September 15, 2022 14:37

GODRIVER-2550 Add fuzzer to bson packages

daf51bc

GODRIVER-2550 update comment

bb4f333

GODRIVER-2550 add known crashers and new decoder targets

00be7df

GODRIVER-2550 unify the binary

58a2afd

GODRIVER-2550 organizing case types

839da00

GODRIVER-2550 update coverage

ee9ff08

GODRIVER-2550 remove debuggers

0cfb457

GODRIVER-2550 setup EG workflow

7337330

remove mongo orchestration

41efed0

remove fuzz tags

8a0d0e8

Merge branch 'master' into GODRIVER-2550

c4af531

give fuzz it's own matrix

f2398f3

add function to seed the bson corpus

86aae74

prestonvasquez requested review from matthewdale and benjirewis September 29, 2022 20:53

prestonvasquez marked this pull request as ready for review September 29, 2022 20:55

prestonvasquez added 8 commits September 29, 2022 15:00

abstract ext json seeding logic

e4ab0f6

clean up logic

c6d97d5

remove debug tools

55dac80

remove empty lines at the end of EG config

e210fb1

remove empty lines at the end of run-fuzz

4a322c1

decouple test case fuzz

1820a5e

rename rbytes to jbytes

045dc76

GODRIVER-2550 fix time (s) to (h) conversion typo

aec25d1

benjirewis reviewed Oct 4, 2022


GODRIVER-2550 review updates

0ee3576

prestonvasquez requested a review from benjirewis October 5, 2022 16:31

benjirewis approved these changes Oct 5, 2022


matthewdale reviewed Oct 5, 2022


matthewdale approved these changes Oct 5, 2022


GODRIVER-2550 remove cache cleanup, fix spacing typo

26b3710

prestonvasquez merged commit 6f84f7e into mongodb:master Oct 10, 2022

prestonvasquez deleted the GODRIVER-2550 branch October 10, 2022 16:17

Julien-Beezeelinx pushed a commit to Julien-Beezeelinx/mongo-go-driver that referenced this pull request Oct 20, 2022

GODRIVER-2550 Add fuzzer to bson packages (mongodb#1077)

4b3f23e
