Property testing with quickcheck #1159

Merged
2 commits merged on Nov 30, 2017

@shnewto shnewto commented Nov 23, 2017

This PR is an attempt to address issue #970, and it covers a portion of the fuzzing meta issue, #972.

The code here uses quickcheck to generate C headers containing a variety of
declarations: basic types, structs, unions, function prototypes, and function
pointers. The generated headers are passed to the csmith-fuzzing/predicate.py
script. Examples of headers generated by this iteration of the tooling can be
viewed here: https://gist.github.com/snewt/03ce934f35c5b085807d2d5cf11d1d5c

At the top of each header are two simple struct definitions, whitelistable and
blacklistable. Both names also appear in the vector of otherwise primitive type
names that the generator draws from. They are a naive way to expose custom types
without having to guess generated type names like struct_21_8, though no actual
whitelisting logic is implemented here.

Test success is measured by the success of the csmith-fuzzing/predicate.py
script. For a test to pass, all of the following must be true:

  • bindgen doesn't panic
  • the resulting bindings compile
  • the resulting bindings' layout tests pass

Usage

cd tests/property_test
cargo test

Some things I'm unsure of:

Where should this feature live?

At the moment it lives in tests/property_test but isn't run when
cargo test is invoked from bindgen's cargo manifest directory.

What's an acceptable amount of time for these tests to take?

At this point, the source is generated in ~1 second, but the files are
large enough that the predicate.py script takes ~30 seconds to run
through each one. To keep the test run under a minute, quickcheck generates
only 2 headers by default. This can be changed in the test_bindgen
function of the tests/property_test/tests/fuzzed-c-headers.rs file.

How do we expose the generated code for easy inspection?

For now, the run_predicate_script function in the
tests/property_test/tests/fuzzed-c-headers.rs file contains a
commented-out block that copies the generated source into the
tests/property_test/tests directory. Should it be easier?

Special casing

There is some logic in the fuzzer that disallows zero-sized arrays, because
tests regularly fail on them due to the issues documented in #684 and #1153.
Should this be special-cased?

Does the fuzzer warrant its own crate?

Once the iterations the reviewers want have made this a functional testing
tool, should (or could) the fuzzing library be made into its own crate? I didn't
move in that direction yet because having it all in one place seemed like the
best way to figure out what works and what doesn't, but I'm interested in
whether it might be useful as a standalone library.

What does it look like to expose more useful functionality?

I'm looking forward to feedback on how to make this a more useful tool
and one that provides the right configurability.

Thanks!

r? @fitzgen

@highfive

Thanks for the pull request, and welcome! The Servo team is excited to review your changes, and you should hear from @fitzgen (or someone else) soon.

fitzgen commented Nov 27, 2017

@Snewt thanks very much for this PR! And thanks for your patience -- I've been on vacation the last week, which is why you haven't heard back from me.

I'll try to take a look at the code and answer your open questions today, but it might get pushed back to tomorrow.

@fitzgen fitzgen left a comment

Overall, this looks awesome -- thank you! Very close to what I was hoping for.

I left a lot of nitpick-y comments on the PR. They are not intended as harsh
criticism, just an attempt to bring us towards our code ideals.

I am very excited for this!


> At the top of each header are two simple struct definitions,
> whitelistable and blacklistable. Those types are present in the vector that
> represents otherwise primitive types used to generate. They represent a naive
> approach to exposing custom types without having to intuit generated type names like
> struct_21_8 though any actual whitelisting logic isn't implemented here.

I think instead of generating types named whitelistable or blacklistable, we
should track the names of types and variables that we generated and then
randomly choose some to whitelist (or none).

We should avoid blacklisting (at least for now) because it intentionally creates
bindings that won't compile, and it is the users' responsibility to provide
alternative definitions instead.

We should however also randomly mark types as opaque.

As far as tracking the names that we've generated, we can either do that:

  • On-the-fly as we generate types, perhaps by introducing an
    ArbitraryWithCurrentScope trait that is the same as quickcheck::Arbitrary
    but with a "current scope" parameter containing all the names of things we've
    defined thus far, and implementing that for everything rather than Arbitrary
    directly.

  • Or in a secondary pass over the generated AST, a la MakeUnique.

We can do all of this whitelisting and scope tracking in a follow up PR, so we
don't need to dig in too deep right now.
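
The on-the-fly option could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the standard library alone, `TinyRng` stands in for quickcheck's generator, and `Scope`, `StructDecl`, `arbitrary_in`, and `choose_whitelist` are hypothetical names that do not exist in this PR.

```rust
/// Names defined so far while generating a header (hypothetical helper).
#[derive(Debug, Default)]
pub struct Scope {
    names: Vec<String>,
}

impl Scope {
    /// Record a freshly generated name and hand it back to the caller.
    pub fn define(&mut self, prefix: &str) -> String {
        let name = format!("{}_{}", prefix, self.names.len());
        self.names.push(name.clone());
        name
    }

    /// All names generated so far, in definition order.
    pub fn names(&self) -> &[String] {
        &self.names
    }
}

/// Stand-in for quickcheck's generator: a tiny xorshift PRNG (assumption).
/// The seed must be non-zero.
pub struct TinyRng(pub u64);

impl TinyRng {
    pub fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Like `Arbitrary`, but threaded through the current scope.
pub trait ArbitraryWithCurrentScope {
    fn arbitrary_in(rng: &mut TinyRng, scope: &mut Scope) -> Self;
}

/// Simplified struct declaration used only for this sketch.
#[derive(Debug)]
pub struct StructDecl {
    pub name: String,
    pub field_count: u64,
}

impl ArbitraryWithCurrentScope for StructDecl {
    fn arbitrary_in(rng: &mut TinyRng, scope: &mut Scope) -> Self {
        StructDecl {
            // Registering the name here is what makes it whitelistable later.
            name: scope.define("struct"),
            field_count: rng.next() % 4,
        }
    }
}

/// Pick a random subset of the generated names to whitelist (possibly none).
pub fn choose_whitelist(rng: &mut TinyRng, scope: &Scope) -> Vec<String> {
    scope
        .names()
        .iter()
        .filter(|_| rng.next() % 2 == 0)
        .cloned()
        .collect()
}
```

Because every generated name passes through `Scope::define`, the whitelist can only ever contain names that really exist in the header, which is the property the secondary-pass approach would have to re-establish by walking the AST.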

> Test success is measured by the success of the csmith-fuzzing/predicate.py
> script. This means that for a test to pass the following must be true:
>
>   • bindgen doesn't panic
>   • the resulting bindings compile
>   • the resulting bindings layout tests pass

Perfect!

> Where should this feature live?
>
> At the moment it lives in tests/property_test but isn't run when
> cargo test is invoked from bindgen's cargo manifest directory.

It's fine for this to be a separate crate that is invoked separately. To get
really nitpick-y, I'd probably name this crate "quickchecking" rather than
"property_test".

We should add a new CI job that checks that this crate continues to build, at
minimum. Once bindgen is pretty reliably passing the property tests we can
start running them in CI as well.

To add the new CI job:

  • First, add a row to the env.matrix in .travis.yml:

        - LLVM_VERSION="4.0.0" BINDGEN_JOB="quickchecking"

  • Second, add a new case to ci/script.sh:

        "quickchecking")
            cd ./tests/quickchecking
            # TODO: Actually run quickchecks once `bindgen` is reliable enough.
            cargo check
            ;;

> What's an acceptable amount of time for these tests to take?
>
> At this point, the source is generated in ~1 second but the files are
> large enough that it takes the predicate.py script ~30 seconds to run
> through each one. In order for the tests to run in under a minute only 2 are
> generated by quickcheck by default. This can be changed in the test_bindgen
> function of the tests/property_test/tests/fuzzed-c-headers.rs file.

This is probably because predicate.py does cargo run with bindgen, so the
first time it's called, it needs to build bindgen. We can explicitly do a
cargo build of bindgen before we begin quickchecking. Maybe there is another
workaround as well.

Backing up a bit: it would be kind of cool if this whole quickchecking crate was
a [[bin]] target rather than (or in addition to?) a [lib] target. Then we
could punt all these questions to CLI flags, and allow people to fuzz overnight,
for example.
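
As a rough illustration of punting the "how many headers" question to a CLI flag, here is a minimal standard-library sketch. The `--count` flag, the `Options` struct, and `parse_options` are hypothetical and not part of this PR; the default of 2 mirrors the current test default mentioned above.

```rust
/// Settings a hypothetical quickchecking binary might accept.
#[derive(Debug, PartialEq)]
pub struct Options {
    /// Number of headers to generate and test.
    pub count: usize,
}

/// Parse arguments (without the program name) into `Options`.
pub fn parse_options<I>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = String>,
{
    // Default to 2, matching the number of tests generated today.
    let mut options = Options { count: 2 };
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--count" => {
                let value = args
                    .next()
                    .ok_or_else(|| "--count needs a value".to_string())?;
                options.count = value
                    .parse::<usize>()
                    .map_err(|e| format!("invalid --count value {:?}: {}", value, e))?;
            }
            other => return Err(format!("unknown argument: {}", other)),
        }
    }
    Ok(options)
}
```

A binary's `main` could call `parse_options(std::env::args().skip(1))` and then hand `count` to the test runner, so an overnight fuzzing run becomes a matter of passing a large number.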

> How do we expose the generated code for easy inspection?
>
> For now the run_predicate_script function in the
> tests/property_test/tests/fuzzed-c-headers.rs file contains a
> commented block that will copy generated source in the tests/property_test/tests
> directory. Should it be easier?

Sounds good, we just need to make sure we don't clobber any existing failing
test case in that directory. We don't want to lose valuable test cases!
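
One way to avoid clobbering is to pick a destination name that does not exist yet before copying. The sketch below assumes only the standard library; `unclobbered_path` is a hypothetical helper, not something in this PR.

```rust
use std::path::{Path, PathBuf};

/// Return a path under `dest_dir` that does not exist yet. The plain `name`
/// is preferred; otherwise a numeric suffix is appended until a free name is
/// found.
pub fn unclobbered_path(dest_dir: &Path, name: &str) -> PathBuf {
    let mut candidate = dest_dir.join(name);
    let mut suffix = 1u32;
    while candidate.exists() {
        candidate = dest_dir.join(format!("{}.{}", name, suffix));
        suffix += 1;
    }
    candidate
}
```

There is still a window between the existence check and the copy, so this only protects against sequential runs, not against two fuzzing processes writing to the same directory at once.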

> Special casing
>
> There is some logic in the fuzzer that disallows 0 sized arrays because
> tests will regularly fail due to issues documented in #684 and #1153. Should
> this be special casing?

This is the pragmatic approach.

It would be kind of nice to have cargo features control this, and by default we
would not generate code that is already known to be super problematic.

> Does the fuzzer warrant its own crate?
>
> After any iterations the reviewers are interested in required to make
> this a functional testing tool, should/could the fuzzing library be made into
> its own crate? I didn't move in that direction yet because having it all in one
> place seemed like the best way to figure out what works and doesn't but I'm
> interested in whether it might be useful as a standalone library.

Maybe eventually?

> What does it look like to expose more useful functionality?
>
> I'm looking forward to feedback on how to make this a more useful tool
> and one that provides the right configurability.

The big next piece is whitelisting and marking types opaque.

Also bitfields.

Generating some C++ and templates and inheritance further down the line.

In general, if you go through the issue tracker (particularly I-bogus-codegen
issues) you can try and get a sense of what kinds of interactions and constructs
are tripping up bindgen and then think about how to add support for generating
those things and similar interactions.

> Thanks!

Thank you!

Very excited to see the next iteration of this PR!

1 => DeclarationC::FunctionPtrDecl(FunctionPointerDeclarationC::arbitrary(g)),
2 => DeclarationC::StructDecl(StructDeclarationC::arbitrary(g)),
3 => DeclarationC::UnionDecl(UnionDeclarationC::arbitrary(g)),
_ => DeclarationC::VariableDecl(BasicTypeDeclarationC::arbitrary(g)),

To make our intent with this match clearer, let's do:

4 => DeclarationC::VariableDecl(...),
_ => unreachable!(),
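
Spelled out in full, the suggested shape might look like the sketch below. `DeclKind` and `decl_kind_for` are simplified, hypothetical stand-ins for `DeclarationC` and its `arbitrary` body, and the `0` arm is an assumption since the diff context does not show it.

```rust
/// Simplified stand-in for the variants of `DeclarationC`.
#[derive(Debug, PartialEq)]
pub enum DeclKind {
    Function,
    FunctionPtr,
    Struct,
    Union,
    Variable,
}

/// Map an index in `0..5` to a declaration kind. Every valid index gets its
/// own arm, so an out-of-range value is reported as a bug rather than being
/// silently treated as a variable declaration.
pub fn decl_kind_for(index: usize) -> DeclKind {
    match index {
        0 => DeclKind::Function, // assumed: this arm is not visible in the diff
        1 => DeclKind::FunctionPtr,
        2 => DeclKind::Struct,
        3 => DeclKind::Union,
        4 => DeclKind::Variable,
        _ => unreachable!("index must be in 0..5, got {}", index),
    }
}
```

With a catch-all `_ => VariableDecl` arm, a mistake in the range computation would only skew the distribution; with `unreachable!()` it panics immediately and points at the real problem.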

}

trait MakeUnique {
    fn make_unique(&mut self, stamp: usize);

I'm not clear on what abstraction this trait represents. In general, some doc comments in this file would be super useful. I like to turn on #![deny(missing_docs)] as a forcing function for myself, and I think that would be beneficial here as well.

impl MakeUnique for DeclarationC {
    fn make_unique(&mut self, stamp: usize) {
        match self {
            &mut DeclarationC::FunctionDecl(ref mut d) => d.make_unique(stamp),

It is slightly more idiomatic (and slightly shorter) to do

match *self {
    DeclarationC::FunctionDecl(ref mut d) => ...,
    ...
}
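
A complete, compilable version of that pattern is sketched here with simplified, hypothetical types (`Ident` and `Decl` stand in for the PR's declaration types), along with the kind of doc comment on `MakeUnique` that the review asks for.

```rust
/// Makes a generated declaration's identifier unique by appending a stamp,
/// so that two declarations that drew the same random name do not collide.
pub trait MakeUnique {
    /// Append `stamp` to the identifier(s) owned by `self`.
    fn make_unique(&mut self, stamp: usize);
}

/// Hypothetical identifier wrapper used only for this sketch.
#[derive(Debug)]
pub struct Ident {
    pub name: String,
}

impl MakeUnique for Ident {
    fn make_unique(&mut self, stamp: usize) {
        self.name += &format!("_{}", stamp);
    }
}

/// Hypothetical, simplified declaration enum.
#[derive(Debug)]
pub enum Decl {
    Function(Ident),
    Variable(Ident),
}

impl MakeUnique for Decl {
    fn make_unique(&mut self, stamp: usize) {
        // Dereference once in the scrutinee instead of writing `&mut` in
        // every arm; `ref mut` borrows the payload mutably in place.
        match *self {
            Decl::Function(ref mut d) => d.make_unique(stamp),
            Decl::Variable(ref mut d) => d.make_unique(stamp),
        }
    }
}
```

Later Rust releases also accept `match self` with plain `Decl::Function(d)` patterns, but the `match *self` form shown here compiles on the toolchains of this PR's era as well.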


impl Arbitrary for DeclarationC {
    fn arbitrary<G: Gen>(g: &mut G) -> DeclarationC {
        match usize::arbitrary(g) % 5 {

https://docs.rs/quickcheck/0.4.2/quickcheck/trait.Rng.html#method.gen_range is going to have a proper uniform distribution, whereas mod won't unless n evenly divides the number of possible usize values (usize::MAX + 1), which 5 doesn't.
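
The bias is easy to see at a small scale. The sketch below is an illustration, not code from the PR: it enumerates every `u8` value instead of `usize` so the counts can be checked exhaustively, and `residue_counts` is a hypothetical helper.

```rust
/// Count how often each residue appears when every `u8` value is reduced
/// modulo `n` (which must be non-zero). A uniform mapping would give every
/// residue the same count.
pub fn residue_counts(n: u8) -> Vec<u32> {
    let mut counts = vec![0u32; n as usize];
    for value in 0..=u8::MAX {
        counts[(value % n) as usize] += 1;
    }
    counts
}
```

For `n = 5` the 256 possible inputs cannot be split evenly, so residue 0 is hit 52 times and the others 51 times, while `n = 4` gives 64 each. The same shape holds for a 64-bit `usize`: 2^64 leaves remainder 1 when divided by 5, so residue 0 is very slightly favored. The skew is tiny at that width, which is why this is a correctness nit rather than a practical problem.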

"whitelistable",
"blacklistable",
];
match base_type.iter().nth(usize::arbitrary(g) % base_type.len()) {

for _ in 1..dimensions {
    def += &format!("[{}]", (usize::arbitrary(g) % 15) + 1);
}
ArrayDimensionC { def: def }

Any time we have foo: foo inside a struct literal, we can use this shorthand:

ArrayDimensionC { def }
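
In context, the shorthand reads as in this small sketch. `ArrayDimension` and `array_dimension` are hypothetical, simplified versions of the PR's `ArrayDimensionC` code, taking the sizes as a slice instead of drawing them from a generator.

```rust
/// Simplified stand-in for `ArrayDimensionC`.
#[derive(Debug, PartialEq)]
pub struct ArrayDimension {
    pub def: String,
}

/// Build the `[n][m]...` suffix for an array declaration.
pub fn array_dimension(sizes: &[usize]) -> ArrayDimension {
    let mut def = String::new();
    for size in sizes {
        def += &format!("[{}]", size);
    }
    // Field init shorthand: the local `def` has the same name as the field.
    ArrayDimension { def }
}
```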


impl MakeUnique for BasicTypeDeclarationC {
    fn make_unique(&mut self, stamp: usize) {
        self.ident_id += &format!("_{}", stamp);

Ahhh ok so this trait is to make unique identifiers! Yeah, doc comments on the trait's definition would have been super helpful to me as a reader.

impl Arbitrary for BasicTypeDeclarationC {
    fn arbitrary<G: Gen>(g: &mut G) -> BasicTypeDeclarationC {
        BasicTypeDeclarationC {
            type_qualifier: TypeQualifierC::arbitrary(g).def,

Is there some intention behind pulling out the strings early, rather than making BasicTypeDeclarationC's type_qualifier be a TypeQualifierC, etc? Was it just easiest? Perhaps to break type recursion?

What I would naively expect would be having structured types instead of strings, well placed Boxes to break type recursion, and the Display implementations handling all stringification. For example, I'd expect a StructDeclarationC to have a Vec of StructFieldC rather than a string concatenation of its fields, and with something like PointerLevelC, I wouldn't expect it to contain a String, but instead just a number, and then its Display would do

for _ in 0..self.level {
    write!(f, "*")?;
}

In general, leveraging types and the type system as far as it will go and as long as we can before devolving into string concatenation (bash programming amirite?) is a good rule of thumb. It gets the compiler to double check that we don't do things like provide a PointerLevelC where we expect a TypeQualifierC; with strings the compiler can't help us here.

So, unless there is some underlying reason why we can't abide by this rule in this case, I think we should move to more structured types instead of strings for all of the various BlahC definitions.
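
A minimal sketch of that structured approach follows. The type names `PointerLevelC`, `StructFieldC`, and `StructDeclarationC` come from the review, but their fields and the exact output format here are assumptions made for illustration.

```rust
use std::fmt;

/// Number of `*`s in a declarator, stored as data rather than as a string.
pub struct PointerLevelC {
    pub level: usize,
}

impl fmt::Display for PointerLevelC {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for _ in 0..self.level {
            write!(f, "*")?;
        }
        Ok(())
    }
}

/// One field of a struct (assumed layout).
pub struct StructFieldC {
    pub type_name: String,
    pub pointer_level: PointerLevelC,
    pub ident: String,
}

impl fmt::Display for StructFieldC {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {}{};", self.type_name, self.pointer_level, self.ident)
    }
}

/// A struct declaration holding structured fields (assumed layout).
pub struct StructDeclarationC {
    pub ident: String,
    pub fields: Vec<StructFieldC>,
}

impl fmt::Display for StructDeclarationC {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Stringification happens only here, at the very end.
        write!(f, "struct {} {{ ", self.ident)?;
        for field in &self.fields {
            write!(f, "{} ", field)?;
        }
        write!(f, "}};")
    }
}
```

With this shape, passing a `PointerLevelC` where a `StructFieldC` is expected is a compile error, and passes such as `MakeUnique` can rewrite `ident` fields directly instead of patching concatenated strings.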

fn arbitrary<G: Gen>(g: &mut G) -> StructDeclarationC {
    let mut fields_string = String::new();
    // reduce generator size as a method of putting a bound on recursion.
    // when size < 1 the empty list is generated.

Capitalize the first word in a sentence.

.output()?)

// omit close, from tempdir crate's docs:
// "Closing the directory is actually optional, as it would be done on drop."

Can remove this comment.

shnewto commented Nov 28, 2017

@fitzgen thanks for the awesome and thorough feedback, those are exactly the sort of things I hoped / needed to hear. I'll keep you posted on progress or questions. Ha, also I'm all about "nitpick-y" comments, I really appreciate getting familiar with and having the opportunity to live up to this project's code ideals!

shnewto commented Nov 30, 2017

Address requested changes to quickchecking crate.

  • Remove whitelistable and blacklistable types.
  • Rename test crate directory from property_test to quickchecking.
  • Add new CI job that checks that this crate continues to build.
  • Revise matching logic to be more idiomatic.
  • Phase out modular arithmetic in favor of gen_range.
  • Incorporate unreachable! into match statements.
  • Revise logic for accessing random element of vector, favor choose over nth.
  • Proper punctuation and capitalization in comments.
  • Using actual structures rather than converting everything to strings in order to leverage type system.
  • Add #![deny(missing_docs)] and fill in the documentation required for the project to build again.
  • Add special-case logic so we don't generate structs with long double fields, as they will cause tests to fail until issue #550 (Generate sensible code for long double / stuff that has more than 64-bit alignment) is resolved.

Note on making sure we don't lose test cases we're interested in preserving:
We're copying the directories TempDir makes so we get things like this:

├── bindgen_prop.1WYe3F5HZU1c
│   └── prop_test.h
├── bindgen_prop.H4SLI1JX0jd8
│   └── prop_test.h

I'm not sure that TempDir makes any claims about uniqueness, so collisions
probably aren't impossible. I'm up for any suggestions on a more bulletproof
solution.

Tasks not addressed by this PR:

  • TODO: Add cargo features logic to allow generating problematic code.
  • TODO: Make a [bin] target with CLI to manage test settings.
  • TODO: Whitelisting and opaque types.
  • TODO: Generate bitfields, C++, I-bogus-codegen cases.

Figured this would be a good point to update the PR but if any of the above TODO
items should be incorporated before moving forward I'm up for it!

Thanks for taking another look!

r? @fitzgen

shnewto commented Nov 30, 2017

For reference, code that was generated after PR change requests were addressed is here.

fitzgen commented Nov 30, 2017

I'm not sure that TempDir makes any claims about uniqueness, so collisions
probably aren't impossible. I'm up for any suggestions on a more bulletproof
solution.

This seems good enough to me.

@fitzgen fitzgen left a comment

🎉 🎉 🎉

This is awesome!!

Care to file follow up issues for the remaining TODO items?

Thanks 💯 @Snewt !

fitzgen commented Nov 30, 2017

@bors-servo r+

@bors-servo

📌 Commit 2aa9b1d has been approved by fitzgen

@bors-servo

⌛ Testing commit 2aa9b1d with merge cd920a3...

bors-servo pushed a commit that referenced this pull request Nov 30, 2017
Property testing with quickcheck

shnewto commented Nov 30, 2017

@fitzgen ah awesome! yeah I can definitely file the follow up todos.

Thanks!!

@bors-servo

☀️ Test successful - status-travis
Approved by: fitzgen
Pushing cd920a3 to master...

@bors-servo bors-servo merged commit 2aa9b1d into rust-lang:master Nov 30, 2017
@shnewto shnewto deleted the feat/970-quickcheck-fuzzing branch November 30, 2017 22:31