The Oracle (tm) Users' Co-Operative FAQ

What is a Parse Tree?


Author's name: Carel-Jan Engel

Author's Email: cjpengel.dbalert@xs4all.nl

Date written: Mar 24, 2005

Oracle version(s): N/A

In documentation about tuning SQL, I see references to parse trees. What is a parse tree ?

Back to index of questions


A parse-tree is an internal structure, created by the compiler or interpreter while parsing some language construction. Parsing is also known as 'syntax analysis'.

An example (slightly adapted version of the example found at page 6 of the famous 'Dragon Book', Compilers: principles, techniques and tools, by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Published by Addison Wesley. My copy is from 1986) will illustrate a parse tree. Rather than dealing with the complexities of a SQL statement, let's take a rather simple language construction: The assignment of the result of an expression to a variable:

result := salary + bonus * 1.10

When the compiler analyzes this statement the resulting parse-tree will look like this :

                   assignment

           ________ statement ____

          /             |         \

         /             :=          \

    identifier              ___ expression _______

        |                  /         |            \

      result              /          +             \

                     expression              __ expression ___

                         |                  /       |         \

                     identifier            /        *          \

                         |            expression            expression

                       salary             |                     |

                                      identifier              number

                                          |                     |

                                        bonus                  1.10

The picture is an upside-down representation of a tree. The language elements in this small simple assignment are: identifiers (result, salary, bonus), operators (:=, +, *), and a number (1.10). 'Identifier' is the language element that names a variable, function or procedure. 'Operator' is the language element that represents some action to be taken, upon the operands at either end of the operator. Number is a constant, 1.10 in this statement. The syntax rules (' grammar') will specify which 'sentences' are valid.

After successfully decomposing the statement into its internal representation, the compiler or interpreter can 'walk the tree' to create the executable code for the construction. An interpreter will not generate code for the execution, but will invoke built-in executing functions by itself. Let's take the interpreter for the rest of the explanation, execution of the steps is easier to explain than the code-generation of a compiler. For the example I assume the bonus to be 100, and the salary to be 1000. The tree-walk will start at the root of the tree, the assignment statement. The rule for the assignment will tell the interpreter that the right hand has to be evaluated first. This evaluation is also known as 'reduction'. The right hand side of the assignment needs to be reduced to a value, the result of the expression, before it can be assigned to the variable at the left hand side of the statement.

The first node at the right-hand side of the statement contains an expression with a '+' operator. The right hand side of the '+' operator needs to be assigned to the left hand side. So the walk goes on to the next node at the right hand side. There the interpreter will detect the expression with the '*' operator. The left hand side of this operator needs to be multiplied with the right hand side. The interpreter goes on to the right hand side, and detects an expression that consists of a single number: 1.10. This side is fully reduced, the result can be stored and the interpreter walks the tree back up to the '*' operator, and starts evaluating its left hand side. This is an expression that consists of one single identifier, representing a variable, 'bonus'., The memory location represented by this variable is read and it's contents (100) will be multiplied by the right hand side result, 1.10. This expression has been fully reduced to the result 110 now. The interpreter walks up, to the '+' operator, and starts evaluating its left hand side. There it will again detect an identifier, 'salary'. Its location is read (1000) and the expression is reduced to a number, 1000. The right and left hand side will be added, resulting in 1,110. Now the expression at the right hand side of the assignment is fully reduced, and the interpreter walks up the tree, finds the assignment operator ':='. This instructs the interpreter to copy the result of the expression to the left hand side. The left hand side contains an identifier, 'result'. The memory location represented by 'result' will be filled with the result of the expression, 1,110.

It is just a simplified explanation of how an interpreter or compiler uses a parse tree. It's out of scope of this answer to create a complete introduction to compiler building practices. However, it might be clear that creating a parse-tree consumes some resources. Before the language elements can be recognized they must be read character by character, type checking and possible conversion needs to be done, identifiers (tables, columns etc.) need to be identified and checked in the data dictionary, and so on. After this 'hard parse' the parse tree is composed, and is a far cheaper form to use to execute a statement than doing all this analysis over and over again. Therefore, storing the parse-tree in the SQL-area for future use can save quite some time during the processing of SQL-statements that have come across before.


Further reading: If you are interested in compiler building techniques, consider reading The Dragon Book. It can be found at: http://www.amazon.co.uk/exec/obidos/ASIN/0201101947/026-7499645-2696457


Back to top