I have an IoT system with a command-line-based interactive shell that is used to configure the system. While examining the disassembly/decompilation, I realized that the CLI contains a lot of functionality and many possible logical paths. I have not yet identified any memory corruption vulnerabilities, but I suspect there may be edge cases that could trigger a bug. This is where I would normally apply fuzzing to bolster my coverage.

However, I am having trouble identifying an approach to creating a suitable input corpus to fuzz with. The CLI supports a number of commands, and some of them even spawn their own interactive CLI with many levels of namespaces. It may take several commands to reach certain parts of the program.

I have two thoughts on how to go about this:

  1. Create a comprehensive corpus that includes a large number of commands and possible paths. This would be tedious to construct, and it is impossible to cover everything.
  2. Use no input corpus and rely entirely on feedback-driven fuzzing (if that is even possible in this case). This seems very inefficient, as the fuzzer would have to learn many paths on its own.

I am able to run the binary through the fuzzer and I believe the fuzzer is passing input to it correctly, so that's not an issue. I was planning on using honggfuzz for this, but I don't think that really matters for the question. I don't have source code, so this will be black box and un-instrumented fuzzing.

My question is, how should I approach creating an input corpus to fuzz a black-box program that has many possible inputs?

multithr3at3d

1 Answer

Thanks to @julian's comment, I was able to search for more relevant terms.

For this particular case, I decided to use AFL's dictionary mode, where you can give it a list of words that make up the target application's accepted syntax.

For example, let's pretend the target application is an interactive calculator, which supports all basic mathematical operators, e.g. 4 + 5 or 500 / 2. For this, I would create a dictionary file with the following contents:

"+"
"-"
"*"
"/"
"^"
...

Alongside a typical set of seed inputs, this file (or a directory of files, each containing one valid piece of syntax) is passed to AFL with the -x option. AFL then inserts these tokens into its mutated inputs, which makes it more likely to produce syntactically valid commands and improves fuzzing coverage.
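
For a CLI with many commands, writing the dictionary by hand gets tedious, so it can be generated from a list of keywords (for instance, strings pulled out of the binary). The following is only a rough sketch under assumptions: the function names and the sample tokens are hypothetical, and it escapes quotes, backslashes, and non-printable bytes as \xNN, which AFL's dictionary format accepts.

```python
def escape_token(token: bytes) -> str:
    """Escape one token for use inside a quoted AFL dictionary entry."""
    out = []
    for b in token:
        ch = chr(b)
        # Quotes, backslashes, and non-printable bytes are written as \xNN.
        if ch in ('\\', '"') or not (32 <= b < 127):
            out.append(f"\\x{b:02x}")
        else:
            out.append(ch)
    return "".join(out)


def build_dictionary(tokens) -> str:
    """Return dictionary text with one quoted token per line, duplicates dropped."""
    seen = set()
    lines = []
    for token in tokens:
        if not token or token in seen:
            continue
        seen.add(token)
        lines.append(f'"{escape_token(token)}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical tokens: the calculator operators from the example above.
    sample_tokens = [b"+", b"-", b"*", b"/", b"^"]
    print(build_dictionary(sample_tokens), end="")
```

The output can be saved to a file and passed along with the seed directory, e.g. afl-fuzz -i seeds -o findings -x cli.dict -- ./target, where the directory, file, and binary names are placeholders. For a binary without instrumentation, AFL's QEMU mode (-Q) would presumably be needed as well.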

multithr3at3d