Property testing with quickcheck #1159

shnewto · 2017-11-23T02:30:45Z

This PR represents an attempt to address issue #970. It also represents a portion of the meta issue for fuzzing #972.

The code base reflected here uses quickcheck to generate C headers that
include a variety of types including basic types, structs, unions,
function prototypes and function pointers. The headers generated by quickcheck
are passed to the csmith-fuzzing/predicate.py script. Examples of headers
generated by this iteration of the tooling can be viewed at
https://gist.github.com/snewt/03ce934f35c5b085807d2d5cf11d1d5c.

At the top of each header are two simple struct definitions,
whitelistable and blacklistable. Those types are included in the vector of
otherwise primitive types that the generator draws from. They are a naive
way of exposing custom types without having to guess generated type names like
struct_21_8, though no actual whitelisting logic is implemented
here.

Test success is measured by the success of the
csmith-fuzzing/predicate.py
script. This means that for a test to pass the following must be true:

bindgen doesn't panic
the resulting bindings compile
the resulting bindings layout tests pass

Usage

cd tests/property_test
cargo test

Some things I'm unsure of:

Where should this feature live?

At the moment it lives in tests/property_test but isn't run when
cargo test is invoked from bindgen's cargo manifest directory.

What's an acceptable amount of time for these tests to take?

At this point, the source is generated in ~1 second but the files are
large enough that it takes the predicate.py script ~30 seconds to run
through each one. In order for the tests to run in under a minute only 2 are
generated by quickcheck by default. This can be changed in the test_bindgen
function of the tests/property_test/tests/fuzzed-c-headers.rs file.

How do we expose the generated code for easy inspection?

For now the run_predicate_script function in the
tests/property_test/tests/fuzzed-c-headers.rs file contains a
commented block that will copy generated source in the tests/property_test/tests
directory. Should it be easier?

Special casing

There is some logic in the fuzzer that disallows zero-sized arrays because
tests will regularly fail due to issues documented in #684 and #1153. Should
this be special-cased?

Does the fuzzer warrant its own crate?

After whatever iterations the reviewers consider necessary to make
this a functional testing tool, should/could the fuzzing library be made into
its own crate? I didn't move in that direction yet because having it all in one
place seemed like the best way to figure out what works and what doesn't, but I'm
interested in whether it might be useful as a standalone library.

What does it look like to expose more useful functionality?

I'm looking forward to feedback on how to make this a more useful tool
and one that provides the right configurability.

Thanks!

r? @fitzgen

highfive · 2017-11-23T02:30:49Z

Thanks for the pull request, and welcome! The Servo team is excited to review your changes, and you should hear from @fitzgen (or someone else) soon.

fitzgen · 2017-11-27T17:57:27Z

@Snewt thanks very much for this PR! And thanks for your patience -- I've been on vacation the last week, which is why you haven't heard back from me.

I'll try to take a look at the code and answer your open questions today, but it might get pushed back to tomorrow.

fitzgen

Overall, this looks awesome -- thank you! Very close to what I was hoping for.

I left a lot of nitpick-y comments on the PR. They are not intended as harsh
criticism, just an attempt to bring us towards our code ideals.

I am very excited for this!

At the top of each header are two simple struct definitions,
whitelistable and blacklistable. Those types are included in the vector of
otherwise primitive types that the generator draws from. They are a naive
way of exposing custom types without having to guess generated type names like
struct_21_8, though no actual whitelisting logic is implemented
here.

I think instead of generating types named whitelistable or blacklistable, we
should track the names of types and variables that we generated and then
randomly choose some to whitelist (or none).

We should avoid blacklisting (at least for now) because it intentionally creates
bindings that won't compile, and it is the users' responsibility to provide
alternative definitions instead.

We should however also randomly mark types as opaque.

As far as tracking the names that we've generated, we can either do that:

On-the-fly as we generate types. Perhaps by introducing an
ArbitraryWithCurrentScope trait that is the same as quickcheck::Arbitrary
but with a "current scope" parameter containing all the names of things we've
defined thus far, and implementing that for everything rather than Arbitrary
directly.
Or we can do it in a secondary pass over the generated AST, a la MakeUnique.
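
A rough sketch of the on-the-fly option, under some loud assumptions: `Lcg` is a tiny stand-in for quickcheck's generator so the example needs only the standard library, and `Scope`, `StructDecl`, `choose_whitelist` and the trait's method name are all invented here for illustration. None of this is the PR's code.

```rust
use std::collections::BTreeSet;

/// Deterministic stand-in for quickcheck's generator (assumption of this sketch).
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    pub fn next_u32(&mut self) -> u32 {
        // Linear congruential step; the high bits are the better-mixed ones.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Names of everything defined so far in the header being generated.
#[derive(Default)]
pub struct Scope {
    names: Vec<String>,
}

impl Scope {
    /// Reserve a fresh name and record it; the running index keeps names unique.
    pub fn fresh(&mut self, prefix: &str) -> String {
        let name = format!("{}_{}", prefix, self.names.len());
        self.names.push(name.clone());
        name
    }

    pub fn names(&self) -> &[String] {
        &self.names
    }
}

/// Hypothetical counterpart to `quickcheck::Arbitrary` that threads a scope through.
pub trait ArbitraryWithCurrentScope: Sized {
    fn arbitrary_with_scope(rng: &mut Lcg, scope: &mut Scope) -> Self;
}

/// Minimal placeholder for a generated struct declaration.
pub struct StructDecl {
    pub name: String,
    pub field_count: u32,
}

impl ArbitraryWithCurrentScope for StructDecl {
    fn arbitrary_with_scope(rng: &mut Lcg, scope: &mut Scope) -> Self {
        StructDecl {
            name: scope.fresh("struct"),
            field_count: rng.next_u32() % 8,
        }
    }
}

/// Pick a random, possibly empty, subset of the recorded names to whitelist.
pub fn choose_whitelist(rng: &mut Lcg, scope: &Scope) -> BTreeSet<String> {
    scope
        .names()
        .iter()
        .filter(|_| rng.next_u32() % 2 == 0)
        .cloned()
        .collect()
}
```

Because every generated item registers its name in the scope, the whitelist can only ever contain names that really exist in the header, and the same scope could later drive the choice of which types to mark opaque.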

We can do all of this whitelisting and scope tracking in a follow up PR, so we
don't need to dig in too deep right now.

Test success is measured by the success of the
csmith-fuzzing/predicate.py
script. This means that for a test to pass the following must be true:

bindgen doesn't panic

the resulting bindings compile

the resulting bindings layout tests pass

Perfect!

Where should this feature live?

At the moment it lives in tests/property_test but isn't run when
cargo test is invoked from bindgen's cargo manifest directory.

It's fine for this to be a separate crate that is invoked separately. To get
really nitpick-y, I'd probably name this crate "quickchecking" rather than
"property_test".

We should add a new CI job that checks that this crate continues to build, at
minimum. Once bindgen is pretty reliably passing the property tests we can
start running them in CI as well.

To add the new CI job:

First, add a row to the env.matrix in .travis.yml:

- LLVM_VERSION="4.0.0" BINDGEN_JOB="quickchecking"

Second, add a new case to ci/script.sh:

"quickchecking")
    cd ./tests/quickchecking
    # TODO: Actually run quickchecks once `bindgen` is reliable enough.
    cargo check
    ;;

What's an acceptable amount of time for these tests to take?

At this point, the source is generated in ~1 second but the files are
large enough that it takes the predicate.py script ~30 seconds to run
through each one. In order for the tests to run in under a minute only 2 are
generated by quickcheck by default. This can be changed in the test_bindgen
function of the tests/property_test/tests/fuzzed-c-headers.rs file.

This is probably because predicate.py does cargo run with bindgen, so the
first time it's called, it needs to build bindgen. We can explicitly do a
cargo build of bindgen before we begin quickchecking. Maybe there is another
workaround as well.
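
The explicit build step could look roughly like this. It is a sketch only: `prebuild_command` is a made-up helper, the directory passed in is whatever path leads to bindgen's manifest, and it assumes predicate.py uses the same build profile as a plain `cargo build`, which I have not checked.

```rust
use std::path::Path;
use std::process::Command;

/// Build the command that compiles `bindgen` ahead of time, so the first
/// `predicate.py` run does not pay for the build.
/// Assumes a plain `cargo build` matches the profile `predicate.py` uses.
pub fn prebuild_command(bindgen_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build").current_dir(bindgen_dir);
    cmd
}
```

The test setup would call `prebuild_command(dir).status()` once and bail out if the exit status is not a success, before any header is generated.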

Backing up a bit: it would be kind of cool if this whole quickchecking crate was
a [[bin]] target rather than (or in addition to?) a [lib] target. Then we
could punt all these questions to CLI flags, and allow people to fuzz overnight,
for example.
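
To make the CLI idea concrete, here is a minimal standard-library sketch of the argument handling. The `--count` and `--size` flags, the `Config` struct and the default size of 32 are all hypothetical; only the default of 2 generated headers comes from the PR description.

```rust
/// Settings a hypothetical `quickchecking` binary might accept.
#[derive(Debug, PartialEq)]
pub struct Config {
    /// Number of headers to generate and check.
    pub count: u64,
    /// Upper bound handed to the generator to limit header size.
    pub size: usize,
}

/// Parse `--count N` and `--size N` pairs; anything else is an error.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Config, String> {
    // Defaults: 2 headers as in the current test, size is an arbitrary pick.
    let mut config = Config { count: 2, size: 32 };
    let mut args = args.into_iter();
    while let Some(flag) = args.next() {
        let value = args
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        match flag.as_str() {
            "--count" => {
                config.count = value
                    .parse::<u64>()
                    .map_err(|e| format!("{}: {}", flag, e))?
            }
            "--size" => {
                config.size = value
                    .parse::<usize>()
                    .map_err(|e| format!("{}: {}", flag, e))?
            }
            _ => return Err(format!("unknown flag {}", flag)),
        }
    }
    Ok(config)
}
```

A `main` would call `parse_args(std::env::args().skip(1))` and loop `count` times, so an overnight run is just a large `--count`.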

How do we expose the generated code for easy inspection?

For now the run_predicate_script function in the
tests/property_test/tests/fuzzed-c-headers.rs file contains a
commented block that will copy generated source in the tests/property_test/tests
directory. Should it be easier?

Sounds good, we just need to make sure we don't clobber any existing failing
test case in that directory. We don't want to lose valuable test cases!
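
One way to avoid clobbering is to let the filesystem refuse to overwrite. The sketch below uses `OpenOptions::create_new`, which fails if the file already exists; the `failing-<n>.h` naming scheme and the function name are invented for this example.

```rust
use std::fs::OpenOptions;
use std::io::{self, ErrorKind, Write};
use std::path::{Path, PathBuf};

/// Write `header` into `dir` under the first unused name of the form
/// `failing-<n>.h` (a naming scheme made up for this sketch).
pub fn save_without_clobbering(dir: &Path, header: &str) -> io::Result<PathBuf> {
    for n in 0u32.. {
        let path = dir.join(format!("failing-{}.h", n));
        // `create_new` returns `AlreadyExists` instead of truncating the file.
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(mut file) => {
                file.write_all(header.as_bytes())?;
                return Ok(path);
            }
            Err(e) if e.kind() == ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(
        ErrorKind::Other,
        "ran out of candidate file names",
    ))
}
```

Checking and creating in a single call also avoids the race that a separate "does it exist?" check would have if two test threads save at the same time.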

Special casing

There is some logic in the fuzzer that disallows zero-sized arrays because
tests will regularly fail due to issues documented in #684 and #1153. Should
this be special-cased?

This is the pragmatic approach.

It would be kind of nice to have cargo features control this, and by default we
would not generate code that is already known to be super problematic.
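
As a sketch of the feature-gated version: the feature name `zero-sized-arrays` and both helper functions are hypothetical, and the feature would need to be declared under `[features]` in the crate's Cargo.toml. With the feature off, which would be the default, zero-length arrays are never produced.

```rust
/// Smallest array length the generator may emit.
/// The `zero-sized-arrays` feature name is made up for this sketch.
pub fn min_array_len() -> usize {
    if cfg!(feature = "zero-sized-arrays") {
        0
    } else {
        1
    }
}

/// Map an arbitrary raw number into the allowed range `min_array_len()..=max`.
pub fn clamp_array_len(raw: usize, max: usize) -> usize {
    let min = min_array_len();
    assert!(max >= min, "max must not be below the minimum length");
    min + raw % (max - min + 1)
}
```

Running the quickchecks with `--features zero-sized-arrays` would then opt back in to the known-problematic inputs, for example when working on #684 or #1153.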

Does the fuzzer warrant its own crate?

After whatever iterations the reviewers consider necessary to make
this a functional testing tool, should/could the fuzzing library be made into
its own crate? I didn't move in that direction yet because having it all in one
place seemed like the best way to figure out what works and what doesn't, but I'm
interested in whether it might be useful as a standalone library.

Maybe eventually?

What does it look like to expose more useful functionality?

I'm looking forward to feedback on how to make this a more useful tool
and one that provides the right configurability.

The big next piece is whitelisting and marking types opaque.

Also bitfields.

Generating some C++ and templates and inheritance further down the line.

In general, if you go through the issue tracker (particularly I-bogus-codegen
issues) you can try and get a sense of what kinds of interactions and constructs
are tripping up bindgen and then think about how to add support for generating
those things and similar interactions.

Thanks!

Thank you!

Very excited to see the next iteration of this PR!

fitzgen · 2017-11-28T16:39:14Z