This repository provides the source code for OGHarn: a prototype framework for automated generation of fuzzing harnesses for C library APIs.
This work is presented in our paper "No Harness, No Problem: Oracle-guided Harnessing for Auto-generating C API Fuzzing Harnesses", appearing in the 2025 International Conference on Software Engineering (ICSE'25).
Citing this repository:
@inproceedings{sherman:ogharn,
  title = {No Harness, No Problem: Oracle-guided Harnessing for Auto-generating C API Fuzzing Harnesses},
  author = {Gabriel Sherman and Stefan Nagy},
  booktitle = {{IEEE/ACM} {International} {Conference} on {Software Engineering} ({ICSE})},
  year = {2025},
}
Developers: Gabriel Sherman (gabe.sherman@utah.edu) and Stefan Nagy (snagy@cs.utah.edu)
License: MIT License
Disclaimer: This software is provided as-is with no warranty.
Our original paper-version artifact is located at: https://zenodo.org/records/14727592.
OGHarn relies on the following tools, both of which must be installed to the /extras directory:
- Multiplier: a code indexer used in OGHarn's static analysis of the library under test.
- AFL++: a grey-box fuzzer used in OGHarn's harness testing and code coverage analysis.
To install them, either run /extras/install_dependencies.sh or use our available Dockerfile.
Once Multiplier and AFL++ are installed to /extras, run /extras/set_env.sh to:
- Activate Multiplier's Python virtual environment and allow access to its API.
- Add AFL++, Multiplier, and OGHarn to your system's PATH.
- Install OGHarn's other Python dependencies (PyYAML, cfile v0.2.0).
The following details the necessary steps for configuring harness generation for new libraries. We recommend reviewing the demos folder for more detailed examples.
- Library: Harnesses generated by OGHarn are expected to be compiled with a dynamically-linked library instrumented with AFL++ and off-the-shelf sanitizers. Any of the Makefile targets titled lib in the demos folder provide examples of building a library for use by OGHarn.
- Index: OGHarn requires a .db file generated by Multiplier containing the index of the library. Any of the Makefile targets titled run_mx provide examples of indexing a library using Multiplier. Refer to the Multiplier Docs for additional guidance.
OGHarn relies on a user-supplied Makefile to define various flags and commands for compiling the harnesses targeting the library under test.
It must begin with the following:
- CC_FUZZ and CXX_FUZZ: AFL++'s compilers (afl-clang-fast and afl-clang-fast++, respectively).
- CFLAGS_ASAN and CXXFLAGS_ASAN: Compiler flags for ASAN and UBSan instrumentation.
- DEPS: Compile-time inclusion and linking commands for the eventual harnesses (e.g., -I ... -L ...).
- DEPS_STC: Compile-time library flags for statically-linked dependencies.
- DEPS_DYN: Compile-time library flags for dynamically-linked dependencies.
- DEPS_LDD: Path to the directory containing the target's compiled shared library.
As an example, below are the associated Makefile definitions for libMagic:
# Compilers and instrumentation (leave as-is).
CC_FUZZ = afl-clang-fast
CXX_FUZZ = afl-clang-fast++
CFLAGS_ASAN = -fsanitize=address,undefined
CXXFLAGS_ASAN = -fsanitize=address,undefined
# Library-specific settings (update accordingly).
DEPS = -I library/src/ \
-L library/src/.libs
DEPS_STC = -l:libmagic.a -llzma -lbz2 -lz -lzstd
DEPS_DYN = -l magic
DEPS_LDD = library/src/.libs/
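Before running OGHarn, it can help to confirm that a Makefile defines every variable listed above. The following is a minimal sketch of such a check, not part of OGHarn itself: the function name missing_makefile_vars is hypothetical, and the check only looks for simple top-level assignments rather than parsing full Makefile syntax.

```python
import re

# Variables that this README says the user-supplied Makefile must define.
REQUIRED_VARS = [
    "CC_FUZZ", "CXX_FUZZ", "CFLAGS_ASAN", "CXXFLAGS_ASAN",
    "DEPS", "DEPS_STC", "DEPS_DYN", "DEPS_LDD",
]


def missing_makefile_vars(makefile_text: str) -> list[str]:
    """Return the required variable names that have no top-level assignment.

    Hypothetical helper (not an OGHarn API). It recognizes only lines of the
    form NAME =, NAME :=, NAME ?=, or NAME += that start in column zero.
    """
    defined = set()
    for line in makefile_text.splitlines():
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)\s*(?::|\?|\+)?=", line)
        if match:
            defined.add(match.group(1))
    return [name for name in REQUIRED_VARS if name not in defined]
```

Calling missing_makefile_vars(open("in_dir/Makefile").read()) on a complete Makefile should return an empty list; any names it returns still need a definition.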
The Makefile must also define the following targets:
- harness: Command to compile the harness using CC_FUZZ and DEPS_DYN (dynamic linking).
- showmap: Command to execute the harness using afl-showmap. Set DEPS_LDD accordingly.
If harnesses must be linked statically, the following targets must also be supplied. They must be used in conjunction with OGHarn's --execute_both argument (see Optional Arguments and Experimental Modes).
- harness_stc: Same as harness, but with DEPS_STC (static linking).
- showmap_stc: Same as showmap, but without setting LD_LIBRARY_PATH.
See the following example. Make sure that $(OUT) and $(SEED) are present as shown below, as these will be populated and used during OGHarn's harness generation.
harness: # Command for compiling individual harnesses.
	$(CC_FUZZ) -o $(OUT)/harness.out $(OUT)/harness.c $(DEPS) $(DEPS_DYN) $(CFLAGS_ASAN)
showmap: # Command for collecting harness code coverage.
	LD_LIBRARY_PATH=$(DEPS_LDD) afl-showmap -o $(OUT)/tempfile -- $(OUT)/harness.out $(SEED)
These targets will generally be the same across libraries unless the library under test requires extra environment configuration (e.g., setting the MAGIC environment variable during execution of harnesses for magic).
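Because OGHarn fills in $(OUT) and $(SEED) itself, a Makefile that omits them is an easy mistake to make. The sketch below is one possible sanity check, written under assumptions: check_targets is a hypothetical helper rather than an OGHarn function, and it treats a recipe as the run of indented lines directly following a target header.

```python
def check_targets(makefile_text: str, targets=("harness", "showmap")) -> dict:
    """Map each target name to a list of problems found (empty if none).

    Hypothetical helper (not an OGHarn API). A recipe is taken to be the
    indented lines immediately after a line starting with "<target>:".
    """
    lines = makefile_text.splitlines()
    problems = {}
    for target in targets:
        issues = []
        # Locate the rule header, e.g. "harness:" at the start of a line.
        idx = next(
            (i for i, line in enumerate(lines) if line.startswith(target + ":")),
            None,
        )
        if idx is None:
            issues.append("target not defined")
        else:
            recipe = []
            for line in lines[idx + 1:]:
                if not line.startswith(("\t", " ")):
                    break  # First non-indented line ends the recipe.
                recipe.append(line)
            body = "\n".join(recipe)
            if "$(OUT)" not in body:
                issues.append("recipe does not reference $(OUT)")
            # Only the showmap-style targets take a seed file as input.
            if target.startswith("showmap") and "$(SEED)" not in body:
                issues.append("recipe does not reference $(SEED)")
        problems[target] = issues
    return problems
```

When static linking is needed, the same check can be run with targets=("harness_stc", "showmap_stc").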
OGHarn relies on two user-supplied seed file directories:
- seeds_valid: Files that will be accepted by the target library (e.g., valid JSONs for cJSON).
- seeds_invalid: Random, perturbed seeds that will be rejected by the library.
OGHarn's seeds are no different than those used by fuzzers. For best results, we recommend using a varied corpus of valid and invalid seeds spanning a wide range of file sizes. Example seeds for various formats are available here: https://github.com/FuturesLab/fuzzing-seeds.
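If no invalid corpus is at hand, one starting point is to perturb copies of the valid seeds. The sketch below illustrates that idea only; the helper names perturb and write_perturbed_seeds are hypothetical, and random byte overwrites do not guarantee that the library rejects the result, so each generated file should still be checked against the target.

```python
import random
from pathlib import Path


def perturb(data: bytes, flips: int = 8, seed: int = 0) -> bytes:
    """Return a copy of data with up to `flips` bytes overwritten at random positions."""
    rng = random.Random(seed)  # Seeded so the same input yields the same output.
    out = bytearray(data)
    for _ in range(min(flips, len(out))):
        pos = rng.randrange(len(out))
        out[pos] = rng.randrange(256)
    return bytes(out)


def write_perturbed_seeds(valid_dir, invalid_dir, flips: int = 8) -> int:
    """Write one perturbed copy of each file in valid_dir into invalid_dir.

    Returns the number of files written. Hypothetical helper, not part of OGHarn.
    """
    src, dst = Path(valid_dir), Path(invalid_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for index, path in enumerate(sorted(src.iterdir())):
        if path.is_file():
            mutated = perturb(path.read_bytes(), flips=flips, seed=index)
            (dst / f"perturbed_{path.name}").write_bytes(mutated)
            count += 1
    return count
```

For example, write_perturbed_seeds("in_dir/seeds_valid", "in_dir/seeds_invalid") would populate the invalid directory from the valid one.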
OGHarn allows for extra control over harnessing via a user-supplied config.yaml. Available settings include:
- blacklist: List of functions to avoid during harnessing.
- add_preamble: Library function to be called before fuzz data injection in any harness.
- add_argument: Hardcoded argument value for a particular function.
- add_defines: Any #define statements to include in each harness.
See demos/sqlite, demos/magic, and demos/pcre2 for relevant examples.
Before continuing, ensure that your Makefile as well as the seeds_valid and seeds_invalid directories are contained in the same folder, e.g.:
in_dir/
/seeds_valid/
/seeds_invalid/
/Makefile
/config.yaml (optional)
This directory (e.g., in_dir above) will serve as your input directory to OGHarn (passed via --input).
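The layout above can be verified with a few lines of Python before launching a harnessing run. This is a minimal sketch, assuming only the file and directory names stated in this README; check_input_dir is a hypothetical helper, and config.yaml is not checked because it is optional.

```python
from pathlib import Path


def check_input_dir(in_dir) -> list[str]:
    """Return a list of problems with the input directory layout (empty if none).

    Hypothetical helper, not part of OGHarn.
    """
    root = Path(in_dir)
    problems = []
    if not (root / "Makefile").is_file():
        problems.append("missing Makefile")
    for name in ("seeds_valid", "seeds_invalid"):
        seed_dir = root / name
        if not seed_dir.is_dir():
            problems.append(f"missing {name}/")
        elif not any(p.is_file() for p in seed_dir.iterdir()):
            problems.append(f"{name}/ contains no files")
    return problems
```

An empty return value from check_input_dir("in_dir") means the required pieces are in place.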
To generate harnesses, run ogharn.py with the following arguments, updated for your library. As an example, OGHarn's command for harnessing cJSON is:
ogharn.py -i in -o out -n 3 -m in/lib.db -h cJSON.h -r b
See the run_ogharn.sh scripts in each demos sub-directory for more examples.
- --input (-i): Path to directory housing the user-provided Makefile and both seeds_ dirs.
- --output (-o): Path to the output directory where OGHarn's artifacts will be stored.
- --numfuncs (-n): Maximum functions to call per harness following "data entrypoint" routines.
- --mxdb (-m): Path to Multiplier's generated .db database file.
- --headers (-h): Library headers to target, to be injected via #include in each harness.
- --readhow (-r): Controls how the harnesses will read fuzzer-generated data:
  - buf (b): Via buffer (e.g., foo(char* buffer)).
  - file (p): Via file name/path (e.g., bar(char* filename)).
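When scripting several runs, the command can be assembled programmatically. The sketch below builds an argument list from the long-form flags documented above; build_ogharn_command is a hypothetical wrapper, it accepts a single header only (how multiple headers are passed is not covered here), and it assumes ogharn.py is reachable on PATH as set up by set_env.sh.

```python
import shlex


def build_ogharn_command(in_dir, out_dir, numfuncs, mxdb, header, readhow, config=None):
    """Build the ogharn.py argument list from the flags documented in this README.

    Hypothetical wrapper, not part of OGHarn. Values are passed through
    unchanged; no validation of paths or of the readhow value is performed.
    """
    cmd = [
        "ogharn.py",
        "--input", str(in_dir),
        "--output", str(out_dir),
        "--numfuncs", str(numfuncs),
        "--mxdb", str(mxdb),
        "--headers", str(header),
        "--readhow", str(readhow),
    ]
    if config is not None:
        cmd += ["--config", str(config)]  # Optional config.yaml path.
    return cmd


if __name__ == "__main__":
    # Mirrors the cJSON example above; print it rather than executing it.
    print(shlex.join(build_ogharn_command("in", "out", 3, "in/lib.db", "cJSON.h", "b")))
```

The resulting list can be handed to subprocess.run once the input directory and Multiplier database exist.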
- --config (-c): Path to optional config.yaml (see Optional Configuration).
- --debug (-d): Report the following information from the harnessing campaign:
  - Failed harnesses and why they failed.
  - Successfully-generated harnesses.
  - Inferred function-to-function dependencies.
  - Multiplier-found declarations, typedef aliases, function pointers, enums, and macros.
  - Functions that were successfully harnessed.
  - Statistics about the harness generation campaign.
  - If applicable, values extracted from function call site parameter tracking.
- --execute_both (-e): Run harnesses both dynamically and statically linked. Useful for linker-related crashes.
- --recurse_headers (-x): Recursively parse all headers. Useful if definitions are spread across multiple files.
- --fast_mode (-f): Work faster by disabling exhaustive argument search, keeping only the first successful one.
- --target_func (-t): Attempt harnessing to reach only the specified function. Useful for targeted fuzzing.
- --allow_stderr (-as): Keep harnesses where stderr output is seen. Useful if stderr is valid API behavior.
- --allow_lincov (-al): Keep harnesses with linear code coverage deltas. Useful for low input-dependent logic.
- --allow_consts (-ac): Consider const args from one function as potential non-const args for others.
- --allow_deepaux (-ad): Resolve args via deeper auxiliary sequences. Adds significant cost to harnessing.
- --allow_pvalret (-ap): Try to retrieve concrete parameter values via Multiplier's callsite analysis.
Below, we detail several enhancements to and limitations of OGHarn. We refer readers to our paper for full details.
The following enhancements were not present in our paper's version of OGHarn.
- Targeted Harnessing: OGHarn previously only supported library-wide harnessing, but now supports function-specific harnessing (via --target_func). This helped us find bugs like pganalyze/libpg_query#254, htacg/tidy-html5#1120, and OpenPrinting/cups#1026. This is still considered an experimental feature.
- Struct Member Population: OGHarn previously avoided structs entirely, but now attempts limited population of their members. This is still considered an experimental feature. We anticipate that OGHarn's current implementation is not well-suited to many struct-based libraries (e.g., libPNG, libVPX, etc.).
- Deeper Argument Resolution Sequences: OGHarn previously only resolved API-specific types (e.g., cJSON *) by injecting calls to single "auxiliary functions". By enabling --allow_deepaux, OGHarn will attempt multi-function sequences for resolving such arguments. This is still considered an experimental feature.
- Concrete Parameter Value Retrieval: OGHarn previously operated only on targeted header files, but now optionally invokes Multiplier's aggressive library-wide callsite analysis to learn and consider possible concrete values in its mutation of function parameters. This helped us find bugs like pantoniou/libfyaml#121 and pantoniou/libfyaml#122. Enable this mode via --allow_pvalret.
If harnessing appears to be failing, we recommend using the --debug flag and inspecting OGHarn's reported information in the output/debug directory. Two important debugging artifacts are:
- log_failed.txt: Each failing harness and why it was discarded by OGHarn.
- log_multiplier.txt: Any available information about the library from Multiplier.
- C Libraries: OGHarn currently supports only C-based libraries.
- Multiplier: OGHarn will not work in cases where Multiplier fails. We have not yet determined how widespread such failures are.
- Struct Population: As described above, we are only at the beginning of testing out struct population. We leave refining this to future work.
- Server-Client APIs: OGHarn does not support functions requiring complex server-client-style setup.
We are pleased that OGHarn helped uncover the following software bugs and security vulnerabilities:
If you find any other bugs using OGHarn, please let us know!
This material is based upon work supported by the National Science Foundation under Grant No. 2419798: CICI: TCR: Practical, Systematic Fuzz Testing for Securing Scientific Software.