GramTest: a tool for grammar-based test case generation

In a series of previous articles, we learnt about automated unit test generation using search-based and property-based methods. We also looked at Pathgrind, a tool for dynamic symbolic execution that can be used for automated fuzzing of binaries. Continuing on the same theme, in this article we will look at how grammar-based test case generation works in practice. We also present a new tool - Gramtest. Gramtest allows you to generate test cases based on arbitrary user defined grammars. Potential applications of the tool include automated fuzzing and testing.

Several programs (like parsers, interpreters and compilers) that work on structured input can be tested using grammars. These applications process their input in different stages like tokenizing, building parse tree, converting to AST and evaluating the AST. For such applications, due to the large number of control flow paths in the early processing, random fuzzing does not yield good test cases. Generating tests that exploit the structured nature of the input can provide better results. The simplest way to provide specify the input is in from of context-free grammars.

Context-free grammar

A context-free grammar or CFG is a set of recursive rewriting rules (also called productions) used to generate patterns of strings. As an example, consider the following CFG for arithmetic expressions.

<expression>  ::=   <term> <addOps> <expression> | <term>
<term>        ::=   <factor> <multOps> <term> | <factor>
<addOps>      ::=   + | -
<multOps>     ::=   * | /
<factor>      ::=   "(" <expression> ")" | <constant>
<constant>    ::=   0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The above grammar captures the language of all strings using four operators (+,-,*,/) brackets ((,)) and numbers (0-9). The set of symbols that can appear in the strings generated by the grammar are called terminals. We can generate all the strings in the grammar by following the production rules. E.g. for generating the string (1 + 2) * 3, we can apply the following rules:

<expression> ::= <term>
    <term> ::= <factor> <multiOps> <term>
        <factor> ::= "(" <expression> ")"
            <expression> ::= <term> <addOps> <expression>
                <term>    ::= <factor>
                    <factor> ::=    <constant>
                        <constant> ::= 1
                <addOps> ::= +
                <expression> ::= <term>
                    <term>    ::= <factor>
                        <factor> ::= <constant>
                            <constant> ::= 2
        <multiOps> ::= *
        <term> ::= <factor>
            <factor> ::= <constant>
                <constant> ::= 3

Each string in the grammar starts at the first symbol and then follows the production rules till it reaches a terminal symbol. The rules of the grammar as given above are said to be in Backus-Naur Form (or BNF). It is one of two main notation techniques used for representing context-free grammars. Once we have specified the input to a program in BNF, we can do test case generation by exhaustively applying all the production rules to generate strings. We present Gramtest a tool written in Java that can be used for this purpose.

Gramtest

Gramtest is implemented using the ANTLR4 parser generator. To specify the structure of the inputs used to generate tests we use the BNF grammar available from the ANTLR repository. In addition, there are some useful Maven plugins that we use for our development while working with the grammars. The BNF grammar allows us to recognize any language given in the BNF format. The syntax for BNF can itself be represented with a BNF as follows:

 <syntax>         ::= <rule> | <rule> <syntax>
 <rule>           ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace>
                                 "::=" <opt-whitespace> <expression> <line-end>
 <opt-whitespace> ::= " " <opt-whitespace> | ""
 <expression>     ::= <list> | <list> "|" <expression>
 <line-end>       ::= <opt-whitespace> <EOL> | <line-end> <line-end>
 <list>           ::= <term> | <term> <opt-whitespace> <list>
 <term>           ::= <literal> | "<" <rule-name> ">"
 <literal>        ::= '"' <text> '"' | "'" <text> "'"

The grammar for arithmetic expressions given in previous section fits in this syntax. To generate tests from a given BNF grammar we need to exhaustively enumerate all the strings in the grammar. The Gramtest tool makes it easy to do just that. To run the tool and generate test cases for the arithmetic expressions grammar, we just run the following on the command line:

Asankhayas-MacBook-Pro:target asankhaya$ java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/arithexp.bnf

Generating tests ...
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+0
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+1
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+2
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+3
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+4
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+5
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+6
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+7
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+8
...

The "-file" command tells Gramtest to look for the input grammar in the file "arithexp.bnf". By default the generated tests are printed on the screen. In case you want to save them to a folder to use with your program you can use the "-tests" option as follows:

Asankhayas-MacBook-Pro:target asankhaya$ java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/arithexp.bnf -tests generated-tests

Generating tests ...
All tests have been saved in the generated-tests folder!

This will save all the test cases in the "generated-tests" folder.

Asankhayas-MacBook-Pro:target asankhaya$ ls generated-tests/
1.txt    17.txt    25.txt    33.txt    41.txt    5.txt    58.txt    66.txt    74.txt    82.txt    90.txt    99.txt
10.txt    18.txt    26.txt    34.txt    42.txt    50.txt    59.txt    67.txt    75.txt    83.txt    91.txt
100.txt    19.txt    27.txt    35.txt    43.txt    51.txt    6.txt    68.txt    76.txt    84.txt    92.txt
11.txt    2.txt    28.txt    36.txt    44.txt    52.txt    60.txt    69.txt    77.txt    85.txt    93.txt
...

The test cases can then be run with the target program for fuzzing and automated testing. As an another example, lets consider a BNF grammar for generating all strings that have the word "main" in them.

<program>   ::=   <letter*> m a i n <letter*>
<letter*>   ::=   { <letter> <letter*> }
<letter>    ::=   A | B | C | D | E | F | G | H | I | J | K | L | M | N |
                  O | P | Q | R | S | T | U | V | W | X | Y | Z |
                  a | b | c | d | e | f | g | h | i | j | k | l | m | n |
                  o | p | q | r | s | t | u | v | w | x | y | z

The above BNF grammar uses the curly brackets ("{", "}") construct in BNF to apply the production rule "\", zero or more times. Running Gramtest with this grammar as input produces strings that contain the world "main" somewhere in them.

Asankhayas-MacBook-Pro:target asankhaya$ java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/main.bnf

Generating tests ...
AAAmainAAA
AAAmainAAB
AAAmainAAC
AAAmainAAD
AAAmainAAE
AAAmainAAF
AAAmainAAG
AAAmainAAH
AAAmainAAI
AAAmainAAJ
...

In addition to the special curly bracket symbols, Gramtest also supports the square brackets ("[", "]") for specifying an optional production rule. While, the parentheses ("(", ")") are used for repeating the rule one or more times. For details on the syntax support please refer to the BNF ANTLR4 grammar that is included with the sources of Gramtest. Hopefully, by now you are convinced that Gramtest is a useful tool to generate test cases from arbitrary user defined grammars.

We will look at some of the details behind the implementation of the tool in a future article. Meanwhile, do let us know your comments on grammar-based testing and please feel free to contribute to the tool by forking it on Github.

Now, we will examine some practical tips to keep in mind while implementing grammar-based test case generation. These guidelines are based on the experience of implementing Gramtest - a Java tool that allows you to generate test cases based on arbitrary user defined grammars. If you are curious about what is grammar-based test case generation, I suggest you look at our previous article on the topic. Let's jump right in on how we implemented Gramtest.

Implementation

The key aspect of the grammar-based test case generation algorithm in Gramtest is to follow all the production rules of the given BNF grammar and then generate strings that conform to the grammar. The production rules themselves form a tree, the root of the tree is the starting rule for generating all the strings in the grammar. For example, consider the following BNF grammar describing all the course codes at a university:

<coursecode>   ::= <acadunit> <coursenumber>
<acadunit>     ::= <letter> <letter> <letter>
<coursenumber> ::= <year> <semesters> <digit> <digit>
<year>         ::= <ugrad> | <grad>
<ugrad>        ::= 0 | 1 | 2 | 3 | 4
<grad>         ::= 5 | 6 | 7 | 9
<semesters>    ::= <onesemester> | <twosemesters>
<onesemester>  ::= <frenchone> | <englishone> | <bilingual>
<frenchone>    ::= 5 | 7
<englishone>   ::= 1 | 3
<bilingual>    ::= 9
<twosemesters> ::= <frenchtwo> | <englishtwo>
<frenchtwo>    ::= 6 | 8
<englishtwo>   ::= 2 | 4
<digit>        ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<letter>       ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N |
                   O | P | Q | R | S | T | U | V | W | X | Y | Z

In this grammar the rule <coursecode> ::= <acadunit> <coursenumber> is at the root. In order to generate the strings in this grammar, we follow all the rules starting from the root (going from top to bottom) to a terminal. When we reach a terminal, we generate a string corresponding to that terminal. For rules that contain alternatives we need to follow all the alternate branches generating strings in an exhaustive manner. Thus, when we run Gramtest on this input it generates the following strings:

Asankhayas-MacBook-Pro:target asankhaya$
java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/coursecodes.bnf
Generating tests ...
ZZX0989
ZZW0989
ZZW0988
ZZV0988
ZZV0987
ZZU0987
ZZU0986
ZZT0986
ZZT0985
ZZS0985
ZZS0984
ZZR0984
ZZR0983
ZZQ0983
ZZQ0982
ZZP0982
ZZP0981
...

This simple algorithm based on exhaustive search over the production rules guarantees that we will generate all possible strings in the grammar. However, it may not be feasible to do so all the time. Let us look at some of the challenges with this approach that make it difficult to use it for practical test case generation.

Challenges

In general, a given BNF grammar can contain infinitely many strings due to the recursive nature of the production rules. Recall the following grammar for arithmetic expressions from our previous article:

<expression>  ::=   <term> <addOps> <expression> | <term>
<term>        ::=   <factor> <multOps> <term> | <factor>
<addOps>      ::=   + | -
<multOps>     ::=   * | /
<factor>      ::=   "(" <expression> ")" | <constant>
<constant>    ::=   0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

This grammar captures all possible arithmetic expressions and thus if we blindly follow the rules and generate strings, the test case generation will never finish. It is also possible for a BNF grammar without recursive rules to have an unbounded number of strings if the grammar uses the repetition operator. Due to all these cases we need to find a way to terminate the test-case generation algorithm early, otherwise Gramtest would not be very useful for automated fuzzing and testing.

Practical Tips

We look at three useful ideas that improve on the simple naive exhaustive test case generation and provide a mechanism to address the challenges described in the previous section. All the following three tips are implemented in Gramtest, and if you are curious you can also have a look at the source code.

Tip 1: Restrict the number of tests to be generated

The easiest way to fix the problem is to just restrict the maximum number of test cases that can be generated. In Gramtest, this can be done by using the -num switch. This will ensure that the test case generation algorithm stops after generating the specified number of tests. For example we can generate 10 test cases from the BNF grammar of arithmetic expressions by setting -num 10 as shown below:

Asankhayas-MacBook-Pro:target asankhaya$
java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/arithexp.bnf -num 10
Generating tests ...
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+0
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+1
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+2
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+3
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+4
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+5
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+6
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+7
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+8
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+9

Tip 2: Bound the depth of recursive rules

The first tip, though useful, will unfortunately not work for a grammar with recursive rules. While generating the test cases for a recursive rule, we can end up applying the rule again and again (due to recursion) and thus it is possible that the algorithm will not terminate even while generating a single string. To handle such cases we propose bounding the depth of the recursive rule. In Gramtest it can be done by setting the -dep parameter as shown below:

Asankhayas-MacBook-Pro:target asankhaya$
java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/arithexp.bnf -num 10 -dep 1
Generating tests ...
(0)*0*0*0+0*0*0+0*0+0
(0)*0*0*0+0*0*0+0*0+1
(0)*0*0*0+0*0*0+0*0+2
(0)*0*0*0+0*0*0+0*0+3
(0)*0*0*0+0*0*0+0*0+4
(0)*0*0*0+0*0*0+0*0+5
(0)*0*0*0+0*0*0+0*0+6
(0)*0*0*0+0*0*0+0*0+7
(0)*0*0*0+0*0*0+0*0+8
(0)*0*0*0+0*0*0+0*0+9

By setting -dep 1 above, we ensure that when Gramtest sees a recursive rule it will apply the rule only once (follow the rule only once). Typically, we use this parameter in conjunction with restriction on the maximum number of test cases to ensure that the algorithm terminates. The -dep parameter also implicitly controls the length of the generated strings. If we compare the output above with the one under the previous tip where the default value of -dep (2) was used, it is clear that the length of the strings generated in this case are smaller.

Tip 3: Use a minimal sentence generator

If you have a careful look at the strings that are generated above, you will notice that they all exercise only one part of the grammar and they are very similar to each other. For good test case generation we want the generated tests to be more diverse so that they can exercise different paths in the program that is being tested. The quality of the test cases is usually measured using coverage criteria like statement coverage (percentage of statements in the program that are executed by the tests), branch coverage (percentage of conditional branches that are executed by the tests) etc. For grammar-based test case generation, a useful metric is the production coverage. Production coverage refers to the percentage of production rules in the grammar that are exercised by the test cases.

For achieving production coverage, we can also use a minimal sentence generator. A minimal sentence generator creates a string with the minimum length that is required for the given production rule. Paul Purdom presented a minimal sentence generator in his classical paper on testing parsers. Although the paper presents the parsers for simple LR(1) grammars, the same ideas can be extended and applied to other grammars. In my paper on Building Extensible Parsers using Camlp4 I describe one such variation of Purdom's algorithm that can be used to test the extensible grammars supported by Camlp4. Gramtest uses a similar variation for generating minimal sentences for BNF grammars.

The minimal sentence generator can be set using the -mingen flag as follows:

Asankhayas-MacBook-Pro:target asankhaya$
java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/arithexp.bnf
-num 10 -dep 2 -mingen true
Generating tests ...
(2*0+9)*(1)+(0)/9
(2*0+9)*(1)+(0)*0
(3+0)-(9)*0
(3+0)-4*1
(3+0)+4*1
(3+0)+4/2
4*(3)+4/2
(3-2)*(2)+(2)*3
(3-1)*(3)-(2)*3
(3+1)/(2)-(1)

Looking at the output we see that the generated tests are much more diverse and cover different alternatives in the grammar using smaller sentences.

By using all the three tips we get a tool that is more useful and has practical applications. The default value of the options used in Gramtest are -num 100 -dep 2 -mingen true, but please go ahead and have a look at the source code or play around with the other options. For a given BNF grammar you may get better results with a different set of options. If you have any further tips based on your experience or have any other suggestions on improving Gramtest, do let us know in the comments.

Continuous fuzzing of Java projects with GramTest

Next we will see how you can use GramTest to generate continuous tests that can in-turn be used to fuzz Java libraries and applications.

<url> ::=   <httpaddress> | <ftpaddress> | <newsaddress> | <nntpaddress> |
      <prosperoaddress> | <telnetaddress> | <gopheraddress> | <waisaddress> | 
      <mailtoaddress>

<httpaddress> ::= h t t p : / / <hostport> [/ <path>] [? <search>]

<ftpaddress> ::=  f t p : / / <login> / <path> [; <ftptype>]

<newsaddress> ::= n e w s : <groupart>

<nntpaddress> ::=   n n t p : <group> / <digits>

<telnetaddress> ::=  t e l n e t : / / <login>

<gopheraddress> ::=  g o p h e r : / / <hostport> [/ <gtype> [<gcommand>]]

<mailtoaddress> ::=  m a i l t o : <xalphas> @ <hostname>

<waisaddress> ::= <waisindex> | <waisdoc>

<waisindex> ::=   w a i s : / / <hostport> / <database> [? <search>]

<waisdoc> ::=   w a i s : / / <hostport> / <database> / <wtype> / <wpath>

<wpath>   ::=   <digits> = <path> ; [<wpath>]

As an example we will use the grammar for URLs as defined in rfc1738. Part of the grammar is shown above and as you can see, it if fairly complex. If you directly run GramTest from command line using this grammar as input you will get some interesting test cases:

Generating tests ...
mailto:5ol@S*7
prospero://E:4/%Dd
http://c7
nntp:N.p/00
telnet://
news:O2
ftp://+n@aF5:21/
wais://B.U/*u/,4_/82=;
gopher://T53

This is good for test case generation but not ideal if you want to run a long fuzzing session with some library or application. For doing continuous fuzzing you can use GramTest as a library very easily. The TestRunner class provided in GramTest makes it easy to integrate with any application for fuzzing. You can even implement it as part of a test case:

/**
 * Test with url grammar
 * @throws java.io.IOException
 */
@Test
@Ignore("Non terminating test case")
public void testQueueGenerator() throws IOException, InterruptedException {
  final BlockingQueue<String> queue = new SynchronousQueue<>();
  TestRunner continuousRunner = new TestRunner(getClass().getResourceAsStream("/url.bnf"), queue, 10, 8, 32);
  new Thread(continuousRunner).start();
  consumeTests(queue);
}

private void consumeTests(BlockingQueue<String> queue) throws InterruptedException {
  while (true) {
    String testCase = queue.take();
    try {
      URL.parse(testCase);
    } catch (URLParseException e) {
      System.out.println(testCase);
    }
  }
}

Just pass the BNF grammar file as input and a BlockingQueue to read the generated tests. The queue just makes it easy to add multiple consumers that can each run in their own thread in parallel. This will allow you to run long fuzzing sessions against a target Java library or application. In fact with this exact set up and the given URL input grammar, GramTest found a bug in the Apache Commons URL validator.

If you use GramTest and find new bugs using it, please let us know. Until next time, happy fuzzing!

GramTest: a tool for grammar-based test case generation

GramTest: a tool for grammar-based test case generation

Context-free grammar

Gramtest

Implementation

Challenges

Practical Tips

Tip 1: Restrict the number of tests to be generated

Tip 2: Bound the depth of recursive rules

Tip 3: Use a minimal sentence generator

Continuous fuzzing of Java projects with GramTest

results matching ""

No results matching ""