Skip to content
This repository has been archived by the owner on Jan 12, 2020. It is now read-only.

Thoughts on README in no particular order #2

Open
Shnatsel opened this issue Aug 21, 2018 · 26 comments
Open

Thoughts on README in no particular order #2

Shnatsel opened this issue Aug 21, 2018 · 26 comments

Comments

@Shnatsel
Copy link

Opening an issue because github doesn't provide a better communication mechanism. Hopefully some of these points are actionable.

Downside is, the std data structures we're testing don't have any sanitizers turned on etc on account of the project is run against the usual Rust releases.

libdislocator can catch many (but not all) memory errors for black-box binaries, assuming they use the system allocator.

Rust sanitizer integration lists steps to rebuild stdlib with sanitizer support, but calls it "better backtraces". We might want to get in touch and check how exactly sanitizers can be enabled for stdlib primitives.

This project will happily take fuzz code. Or, if someone could contrive to combine a fuzzer with a shrinking step this project will jump on that in a heartbeat.

Both AFL and libfuzzer shrink the inputs as they go along. They have Rust integration as afl.rs and cargo-fuzz respectively. Both have a one-shot testcase minimization command, which doesn't usually do a great job. However, AFL also comes with a crash exploration mode lets you explore alternative crashing inputs and minimize the testcase to your heart's content.

Also, Rust QuickCheck README mentions proptest as an alternative to QuickCheck that improves on shrinking and links to some comparisons of the two.

@Eh2406
Copy link

Eh2406 commented Aug 22, 2018

I was not sure how to give this feedback, but now that we have a issue open for random feedback...
A big +1 for proptest!

@Shnatsel
Copy link
Author

@Eh2406 could you elaborate on what advantages it has over QuickCheck?

@blt
Copy link
Owner

blt commented Aug 22, 2018

Opening an issue because github doesn't provide a better communication mechanism.

@Shnatsel This works for me.

libdislocator can catch many (but not all) memory errors for black-box binaries, assuming they use the system allocator.

Ah, now there's a thought. I'm not sure what allocator is built-in to the published releases but, at the same time, I'm also not sure how far this project will get without making custom builds of the standard library itself.

Rust sanitizer integration lists steps to rebuild stdlib with sanitizer support, but calls it "better backtraces". We might want to get in touch and check how exactly sanitizers can be enabled for stdlib primitives.

Agreed. I'll put together some better labels for the project and make an 'investigation' ticket out of this.

This project will happily take fuzz code. Or, if someone could contrive to combine a fuzzer with a shrinking step this project will jump on that in a heartbeat.

Both AFL and libfuzzer shrink the inputs as they go along. They have Rust integration as afl.rs and cargo-fuzz respectively. Both have a one-shot testcase minimization command, which doesn't usually do a great job. However, AFL also comes with a crash exploration mode lets you explore alternative crashing inputs and minimize the testcase to your heart's content.

I dashed this comment off over lunch and should have put more thought into it. What I had in mind is a hybrid of fuzzing and QuickCheck. In the 'exploration' phase the tool acts as a fuzzer, exploring the branch space et al of the system under test. On error, the tool enters a 'shrink' phase, using the shrink on any generated Arbitrary instances to find a smaller example set. In 'exploration' phase the tool works in terms of a big byte blob, in 'shrink' phase it works in terms of types.

Speaking of which, I need to define shrink for Op<K,V> in the hashmap model.

Also, Rust QuickCheck README mentions proptest as an alternative to QuickCheck that improves on shrinking and links to some comparisons of the two.

You know, I've never experimented much with proptest in Rust. I have used hypothesis in Python to good effect. Effective shrinking has been less of a problem than fast generation, at least in my experience QC'ing Rust projects. That said, @Eh2406 I'd be very interested to see some work done in proptest, if you had time to donate. I'm not married to QuickCheck.

Somewhat as an aside, these are features I think would be extremely valuable in a QC tool for this project:

  • test executions limited by time and not test loops
  • collection of property violations on-disk, allowing for all-night runs without stopping

@Eh2406
Copy link

Eh2406 commented Aug 22, 2018

So I am not experienced enough with this stuff to really speak from authority. The linked discussion express the soundbites I have heard about QuickCheck/Proptest better than I can. My enthusiasm comes from having repeatedly read the documentation for both and having some sense of how to get started with Proptest. To quote @AltSysrq "To be honest, I've always had a "Am I holding it wrong?" sort of feeling with quickcheck"
Edit: I here that the interop betean the two is getting better soon. So if QuickCheck makes more sense to you, then we may soon be able to mix and match.

collection of property violations on-disk, allowing for all-night runs without stopping

proptest has the saving to disk part.
https://docs.rs/proptest/*/proptest/#failure-persistence
The not stopping could probably be done with a wrapper script or a PR would probably be considered.

@Shnatsel
Copy link
Author

I've experimented with this a bit. Turns out the allocator used for stdlib Vec is whatever your binary is using. My experiments were geared towards detecting uses of uninitialized memory, but they should be informative for this project as well.

https://gist.github.com/Shnatsel/0c024a51b64c6e0b6c6e66f991904816 - this is a trivial testcase with obvious use of uninitialized memory.

I have tested three configurations, all in release mode: jemalloc (i.e. Rust's default), system allocator, and system allocator + libdislocator injected via LD_PRELOAD.

Turns out in release mode with system allocator the values inside the vector vary between runs of the entire process, but are always the same until you restart the process. With jemalloc and libdislocator they do not vary at all.

I have also made a patch for libdislocator that makes it clobber every newly allocated buffer with a different value. This allows detecting use of uninitialized memory simply by running an operation twice in the same process and comparing the output.

@Shnatsel
Copy link
Author

For running a job for a specific time there's the Unix command timeout. Just set an absurdly high number of iterations and run timeout 8h cargo test to run QuickCheck for 8 hours.

@Eh2406
Copy link

Eh2406 commented Aug 22, 2018

Just FYI, proptest is working on new functionality for testing data structures:
proptest-rs/proptest#84

@blt
Copy link
Owner

blt commented Aug 22, 2018

For running a job for a specific time there's the Unix command timeout. Just set an absurdly high number of iterations and run timeout 8h cargo test to run QuickCheck for 8 hours.

Hey now, that's useful!

@blt
Copy link
Owner

blt commented Aug 22, 2018

Just FYI, proptest is working on new functionality for testing data structures:
proptest-rs/proptest#84

@Eh2406 interesting. We're using a stateful approach already, building up a vector of operations and interpreting them against the data structure and its model. I'd be real interested to see a re-implementation of what we have in hash_map.rs compared to the stateful work in proptest.

@Shnatsel
Copy link
Author

The way I see a hybrid of a feedback-driven fuzzer and property testing is that the feedback-driven fuzzer is used to generate lots of inputs really fast and records the ones that have resulted in new execution paths. AFL achieves at about a billion executions a day on a single machine for reasonably fast targets (such as decoding ~1kb images or audio files). The resulting corpus of interesting inputs can be then fed to actual property testing to discover correctness issues. Absence of memory errors could be also checked by the fuzzer seeded with the generated corpus and combined with address sanitizer or libdislocate.

The only tricky part I can think of is representing series of operations in a fuzzer-accessible way and making that dense enough to not interfere with useful mutations and simple/stupid enough not to be a source of bugs by itself. Which actually should not be that hard.

I might or might not get around to experimenting with this, so feel free to run with the idea.

@blt
Copy link
Owner

blt commented Aug 22, 2018

The way I see a hybrid of a feedback-driven fuzzer and property testing is that the feedback-driven fuzzer is used to generate lots of inputs really fast and records the ones that have resulted in new execution paths. AFL achieves at about a billion executions a day on a single machine for reasonably fast targets (such as decoding ~1kb images or audio files). The resulting corpus of interesting inputs can be then fed to actual property testing to discover correctness issues. Absence of memory errors could be also checked by the fuzzer seeded with the generated corpus and combined with address sanitizer or libdislocate.

This is pretty well what I have in mind, which is a good sign for the idea, I think.

The only tricky part I can think of is representing series of operations in a fuzzer-accessible way and making that dense enough to not interfere with useful mutations and simple/stupid enough not to be a source of bugs by itself. Which actually should not be that hard.

Agreed. This fuzz target implements my best thoughts on this idea right now. The Arbitrary instance is generated from the fuzzer's blob directly. That is, the fuzzer side of things only has to understand execution paths and the QuickCheck side of things generating instances of Arbitrary, performing shrinking when directed.

I might or might not get around to experimenting with this, so feel free to run with the idea.

It's a toy that's on deep back-burner for me. Please feel free.

@gilescope
Copy link

I think these kind of efforts are great to test HashMap, but wouldn't it be awesome if we could test every Map?

The eclectic crate has traits for collections (as num_traits does for numbers) and I've been playing around with getting the standard library tests to run against the traits (E.g. https://github.com/gilescope/eclectic/blob/master/src/tests.rs ).
(It's the idea not the crummy implementation of trait_tests that matters)

There's a lot of if / but / maybies, but I would really like a world where I could rock up with my own implementation of a hashmap and be able to run all the standard (and security) tests for free.

Wouldn't that be a safer world?

@blt
Copy link
Owner

blt commented Aug 24, 2018

@gilescope eclectic is an exciting project. Some of the QC generators will be a little odd with an aim to trigger known bad states but I for sure see a lot of utility in eclectic shipping with property testing.

@Shnatsel
Copy link
Author

Shnatsel commented Aug 25, 2018

https://www.reddit.com/r/rust/comments/9a6qf5/blog_easy_proc_macro_derives_with_synstructure/ this lets you derive Arbitrary for custom structures, might be relevant to this project. Nevermind, that's #6

@blt
Copy link
Owner

blt commented Oct 10, 2018

I'm going to close this conversation out. Thanks everyone!

@Eh2406
Copy link

Eh2406 commented Feb 11, 2019

hybrid of fuzzing and QuickCheck.

proptest just added the ability to pass through a source of randomness. I think this would allow generators written for proptest to be run via an external fuzzer. I don't know if it is useful but if it is not, I'd love to here why not or how it could be improved.

@blt
Copy link
Owner

blt commented Feb 11, 2019 via email

@Shnatsel
Copy link
Author

Thanks for pointing this out! I have emulated this with QuickCheck before, see https://gist.github.com/Shnatsel/4a907d44d6429de93d63d6e7c4d1361e

I've also prototyped a way to generate such fuzzing harnesses automatically based on parsing Rust source code. I've got a pass that parses Rust source code from a given file and extracts function names and types into an array of structs. It was trivial to implement based on syn visitor example. I don't have the generator part yet, but that also looks dead simple.

Sadly my employer would not let me release the code without a CLA attached, and I'm not thrilled about the prospect. If someone else kicks off the project and writes the same 20 LoC parser independently, I will be able to join in afterwards and we won't have to deal with copyright assignment.

@blt blt reopened this Feb 11, 2019
@Eh2406
Copy link

Eh2406 commented Feb 12, 2019

I think that the TestRunner will still support shrinking, even when made with pass through mode.

How much progress does someone need to make before you can contribute? Did you have a name in mind for the project?

@Shnatsel
Copy link
Author

Here's what the code I've written at the company does:

  1. Extract function and method definitions from code parsed with syn based on syn visitor example, and put names and types from them into a collection of custom structs
  2. Pretty-print custom structs by implementing Display for them. This was mostly for debugging and was not producing code that would actually compile.

These are the only parts I've written and can't release without a CLA, so once that's squared away I should be able to contribute freely.

No clue about the name, sorry.

@blt
Copy link
Owner

blt commented Feb 12, 2019

@Eh2406 being able to pass through randomness from the fuzzer is extremely valuable. I'd be interested to see what happens when we adapt an existing test. I bet it will go well. I look to March to have time to do this myself.

There's a few tricks to running under a fuzzer. Most fuzzers aren't just a dumb pool of random bytes, they're also trying to intelligently manipulate that pool. So, if proptest is in charge of shrinking that might give the fuzzer an aberrant signal. What I was doing with blt/rqc was experiment with a fuzzer+QC where the fuzzer was responsible for shrinking by shrinking the byte pool, leaving the QC aspect a framework for coercing bytes and reporting on the result of said coercion. Please be aware that blt/rqc is extremely bare-bones, more a research spike than anything.

I don't know if proptest does this or not, but quickcheck will quit a test as soon as a failure has been found (and shrunk). For long runs, especially once #14 is a thing, we'll want to collect tests over the specified test run time. This is something that Quviq's EQC does and it's very valuable. Failure collection, categorization is common to fuzz tools and I was going to have the 'fuzz' side of rqc manage collection, categorization.

We need to be exit(0) safe. Because allocation failure results in panics -- modulo some recent work -- we prefer the use of blt/bh_alloc in our tests to cause an exit(0) on allocation failure, which requires a fuzzer that doesn't mind if the executable cleanly exits under it. I don't know how this interacts with proptest but I thought it worth calling out.

The Angora paper gave me the sense that a fuzz+QC integration would work very well if the QC side of things were responsible solely for coercing bytes into a set of test data, signaling if this failed, and the fuzzer were responsible for interpreting signals up from the QC side of things, shrinking / mutating bytes and collecting errors. You can see that notion embedded here in the way the QC test signals back to the runner what's happened for a given byte payload.

@Eh2406
Copy link

Eh2406 commented Feb 13, 2019

So I'd imagine that we would disable shrinking, when running under a intelligent fuzz harness. Then have some kind of file persistence, ether proptest, or a hand coded, or one provided by the fuzzer. Then shrinking can be enabled only after the bug has been found.

As to the other issues, we are way past my experience.

@blt
Copy link
Owner

blt commented Feb 14, 2019

It's also worth noting that this project should be a collection of two kinds of tests:

  • straight ahead fuzz tests
  • model-checking QC tests driven by a fuzzer

I've spent most of my time so far on the last variety because there are, to my mind, some interesting questions on how to achieve that as an engineering effort. The first point was limited by infrastructure, neatly solved by clusterfuzz being open-sourced. Come early March I'm going to start focusing on increasing the scope for straight-ahead fuzzing, standing up a project clusterfuzz. @Shnatsel I'll take a crack at producing the harness code you've described at that time, if someone hasn't got there before me, at that time.

@Shnatsel
Copy link
Author

Could you describe what you want to achieve with straight-ahead fuzz tests and how is it different from the current state of affairs? I'm not sure I follow.

Also, the project for automatically generating fuzzing harnesses from function types got kicked off at https://github.com/Eh2406/auto-fuzz-test and has independently reimplemented everything that I've coded while at the company, so I'm now able to contribute to it. This should get us fuzz coverage of libraries with large API surfaces for cheap.

@blt
Copy link
Owner

blt commented Feb 18, 2019

Could you describe what you want to achieve with straight-ahead fuzz tests and how is it different from the current state of affairs? I'm not sure I follow.

Sure thing. I did a talk at work about this very subject; I'll try and get it cleaned up and published here in the next couple of weeks. But, my basic argument is that the model-checking quickcheck tests are suitable for determining for some inputs:

  • does the program crash / violate memory safety
  • do the model and the system-under-test agree, implying a defect in one if not

Building the model and running it against the SUT is not cheap. If you're solely interested in the first point that's where a "straight-ahead" fuzz test shines; they don't spend any CPU time setting up a model and feeding it input. The hash map test aligns with my notion of a model-checking test, the str::repeat test aligns with my notion of a straight-ahead fuzz test (though there is a little bit of logic checking).

I've begun to think that there's probably more utility to be wrung out of establishing a base of fuzz tests and the infrastructure to run them well, then moving on to model-checking.

@Shnatsel
Copy link
Author

https://github.com/rust-fuzz/targets provides a number of fuzzing targets already, although some are probably outdated. Also many projects have fuzzing harnesses in-tree, but not running on CI. https://github.com/Eh2406/auto-fuzz-test would also be of use.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants