Overview of YACC
One of the oldest and most respected parser generators available to developers
is YACC. Like "vi", "grep", and "awk", this software is
considered the de facto standard in the UNIX world.
When developing a parser using YACC, the grammar description and the C source code are
combined in a single file, with C actions attached to the grammar rules. Special notation
is used to define the data type that stores the values of reductions and to mark where the
C code links to the grammar. For instance, "$1" refers to the value of the first symbol on
the right-hand side of the rule being reduced.
Naturally this source code, containing both YACC grammar information and normal C,
cannot be compiled directly. Instead, YACC is used to analyze this source code and create
a new C program with the compiled parser tables and engine
"hard-wired" into it. After this process is complete, the new C program can be
compiled and executed. This type of approach is known as a "compiler-compiler".
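To make the "$1" notation more concrete, the following is a minimal sketch in C of what a reduction does with the parser's value stack. It assumes a simple array-based stack; the names ValueStack, push, dollar, and reduce_add are hypothetical and are not the identifiers that YACC itself generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value stack; real YACC output uses its own names and layout. */
#define STACK_MAX 64

typedef struct {
    int values[STACK_MAX];
    size_t top;              /* number of values currently on the stack */
} ValueStack;

static void push(ValueStack *s, int v) {
    assert(s->top < STACK_MAX);
    s->values[s->top++] = v;
}

/* Value of the n-th symbol (1-based) of a rule whose right-hand side has
 * rhs_len symbols. This is the role "$n" plays inside a YACC action. */
static int dollar(const ValueStack *s, size_t rhs_len, size_t n) {
    assert(n >= 1 && n <= rhs_len && rhs_len <= s->top);
    return s->values[s->top - rhs_len + (n - 1)];
}

/* Reduce by  expr : expr '+' expr  with the action  $$ = $1 + $3;  */
static void reduce_add(ValueStack *s) {
    int result = dollar(s, 3, 1) + dollar(s, 3, 3);  /* $$ = $1 + $3 */
    s->top -= 3;          /* pop the three right-hand-side symbols */
    push(s, result);      /* push $$ as the value of the left-hand side */
}
```

Pushing the values 2, '+' and 5 and then calling reduce_add leaves a single value, 7, on the stack. In a generated parser this bookkeeping is written out directly in the emitted C code, which is why the grammar file cannot be compiled until YACC has processed it.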
For developers using C or C++ on the UNIX platform, YACC is an ideal tool. However, the
approach YACC uses has several drawbacks that limit its usefulness for developing modern
interpreters and compilers. The most notable drawback is the limitation on the programming language
used for development. There are several versions of YACC for different programming
languages, but in each case the grammar definition is not directly portable. Another major
drawback lies in the overall approach of compiler-compilers.
Comparison Between GOLD & YACC
Essentially, GOLD differs from the classic YACC parser generator in four important ways.

Rules are specified in Backus-Naur Form (non-enhanced at this time).
Terminals are represented through regular expressions.
Character sets are represented through set notation (more or less).

One of the key differences between YACC and GOLD is how the grammar is used in
conjunction with the developer's source code. Unlike YACC, which combines C source
code and grammar constructs, GOLD operationally separates the parse engine and the code
that derives the table information.
Instead of integrating the grammar description into C source code, GOLD reads the
grammar from a separate file and then generates the tables. Afterwards, the derived tables
can be saved to a separate binary file which can be used at a later time.
This allows developers to build a grammar on one platform and then use this binary
file to develop the interpreter or compiler on another. For instance, the developer can use
GOLD Builder to analyze a grammar on an x86 machine and then use that binary file on a
UNIX, Mac, or Linux system. Since the DFA and LALR algorithms are simple, creating a parsing
engine on other platforms can be accomplished with a minimum of coding.

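As a rough illustration of why a table-driven engine needs little code, here is a minimal sketch of a DFA tokenizer in C. The table layout, the character classes, and the names DfaState, char_class, and next_token are assumptions made for this example; they do not describe the actual format of a GOLD compiled grammar file. The table is hard-coded here, whereas a real engine would fill it from the loaded file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { CLASS_LETTER, CLASS_DIGIT, CLASS_OTHER, CLASS_COUNT };
enum { TOKEN_NONE = 0, TOKEN_IDENTIFIER, TOKEN_INTEGER };
#define NO_EDGE (-1)

typedef struct {
    int next[CLASS_COUNT];  /* target state per character class, or NO_EDGE */
    int accept;             /* token kind accepted in this state, or TOKEN_NONE */
} DfaState;

/* Hypothetical table: pure data, independent of the engine code below. */
static const DfaState dfa[] = {
    /* 0: start      */ { { 1,       2, NO_EDGE }, TOKEN_NONE },
    /* 1: identifier */ { { 1,       1, NO_EDGE }, TOKEN_IDENTIFIER },
    /* 2: integer    */ { { NO_EDGE, 2, NO_EDGE }, TOKEN_INTEGER },
};

static int char_class(char c) {
    unsigned char u = (unsigned char)c;
    if (isalpha(u) || c == '_') return CLASS_LETTER;
    if (isdigit(u)) return CLASS_DIGIT;
    return CLASS_OTHER;
}

/* Longest-match scan from the start of text. Returns the token kind and
 * stores the number of characters matched in *length (0 if no match). */
static int next_token(const char *text, size_t *length) {
    int state = 0;
    int kind = TOKEN_NONE;
    size_t i = 0;

    assert(text != NULL && length != NULL);
    *length = 0;
    while (text[i] != '\0') {
        int target = dfa[state].next[char_class(text[i])];
        if (target == NO_EDGE) break;   /* no transition: stop scanning */
        state = target;
        i++;
        if (dfa[state].accept != TOKEN_NONE) {
            kind = dfa[state].accept;   /* remember the longest match so far */
            *length = i;
        }
    }
    return kind;
}
```

The engine itself is only the loop in next_token. Everything specific to a grammar lives in the dfa array, so the same loop can be rewritten in another language and fed the same table data.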
When a grammar is compiled and saved to a file by the GOLD Builder, the data exists
independent of any particular programming language. As a result, the parsing engine
that loads this file can be implemented in languages such as C, C++, Java, C#, Visual
Basic, and Eiffel.
There is an ActiveX DLL provided with GOLD Builder that reads and parses the
information stored in a Compiled Grammar Table file, but you can also create your own parser engine.

YACC provides a mechanism for providing operator precedence. The developer can specify
the order of operator precedence and whether each operator associates left-to-right or
right-to-left.
The GOLD Builder does not provide such a mechanism for an important reason. Operator
precedence actually consists of a series of rules. In the case of YACC, the extra rules
needed to implement the proper logic are created "behind the scenes". This makes
sense for YACC since the additional rules can be hidden from the programmer and the
special logic needed for the parser engine is already implemented.
However, if these "hidden" rules were saved to the Compiled Grammar Table
file, reductions would take place in the parser engine that the grammar designer would
neither expect nor plan for. Essentially, this would introduce ambiguity into the parsing
process and defeat the purpose of a generalized parser generator such as GOLD.
Fortunately, implementing operator precedence is easy, though often tedious.
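The idea that operator precedence is just a series of rules can be sketched as follows. The comment shows one conventional way to layer the rules in plain BNF, and the C code evaluates expressions by following those layers. Note that this sketch uses a small hand-written recursive-descent evaluator rather than an LALR engine, with the left-recursive rules turned into loops; the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Precedence expressed as layered rules (plain BNF):
 *   <Expr>   ::= <Expr> '+' <Term> | <Expr> '-' <Term> | <Term>
 *   <Term>   ::= <Term> '*' <Factor> | <Factor>
 *   <Factor> ::= Integer | '(' <Expr> ')'
 * '*' sits one layer below '+' and '-', so it binds more tightly.
 */

static const char *cursor;   /* current position in the input text */

static void skip_spaces(void) {
    while (*cursor == ' ') cursor++;
}

static long parse_expr(void);

/* <Factor> ::= Integer | '(' <Expr> ')' */
static long parse_factor(void) {
    long value = 0;
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        value = parse_expr();
        skip_spaces();
        assert(*cursor == ')');
        cursor++;
        return value;
    }
    assert(isdigit((unsigned char)*cursor));
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    return value;
}

/* <Term> ::= <Term> '*' <Factor> | <Factor> */
static long parse_term(void) {
    long value = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cursor != '*') return value;
        cursor++;
        value *= parse_factor();
    }
}

/* <Expr> ::= <Expr> '+' <Term> | <Expr> '-' <Term> | <Term> */
static long parse_expr(void) {
    long value = parse_term();
    for (;;) {
        skip_spaces();
        if (*cursor == '+') { cursor++; value += parse_term(); }
        else if (*cursor == '-') { cursor++; value -= parse_term(); }
        else return value;
    }
}

static long evaluate(const char *text) {
    cursor = text;
    return parse_expr();
}
```

With these rules, evaluate("2 + 3 * 4") multiplies before adding, and evaluate("10 - 4 - 3") groups from the left, without any separate precedence or associativity declarations. Each additional precedence level costs one more nonterminal, which is where the tedium comes from.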