The majority of the grammar that you write will be used to specify the syntactic structure of the language. When an input string (such as the user's program) is parsed, it is stored in a tree structure that follows the syntactic structure of the language. There are several ways of specifying the structure of a grammar, and different parsing systems use different notations and formats.
Regardless of the parser, practically all of them use a variation of a notation known as Backus-Naur Form (or BNF for short). GOLD uses BNF to describe the syntax of the grammar and attempts to stay close to the original notation.
Backus-Naur Form consists of two different types of symbols: terminals and nonterminals. Terminals represent the pieces of text that make up a valid input string (such as a user's program). When an input string is parsed, terminals will be the "leaves" of the parse tree. Nonterminals represent other syntactic structures defined in the grammar. These will be the interior nodes of the parse tree.
Terminals are left without special formatting or are delimited by single quotes. Examples include: if, while, '=' and identifier. Typically, nonterminals are delimited by angle brackets. Examples include <statement> and <exp>. Both terminals and nonterminals are referred to generically as "symbols".
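The leaf and node roles described above can be sketched as a small tree data structure. This is only an illustration in Python, not GOLD's own representation: the class names `Terminal` and `Nonterminal`, the helper `leaves`, and the `Assignment` example tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Terminal:
    """A leaf of the parse tree: a piece of text taken from the input."""
    name: str  # symbol name, e.g. "Id"
    text: str  # the input text that was matched


@dataclass
class Nonterminal:
    """An interior node: a syntactic structure made of child symbols."""
    name: str
    children: List[Union["Terminal", "Nonterminal"]] = field(default_factory=list)


def leaves(node):
    """Return the text of every terminal under node, left to right."""
    if isinstance(node, Terminal):
        return [node.text]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Hypothetical tree for the input "x = 1".
tree = Nonterminal("Assignment", [
    Terminal("Id", "x"),
    Terminal("=", "="),
    Nonterminal("Expression", [Terminal("Number", "1")]),
])

print(leaves(tree))  # ['x', '=', '1']
```

Reading the leaves from left to right gives back the pieces of the original input, which is why terminals always end up at the bottom of the tree.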
The syntax of the grammar is defined using a series of "productions". Each production consists of a single nonterminal, called the "head", followed by the sequence of symbols that defines it. That sequence of symbols is the production's "handle".
For instance, the following production defines a generic If-Statement syntax. The symbols 'if', 'then', and 'end' are terminals. The symbols <Expression> and <Statements> are nonterminals. The first nonterminal, before the ::= symbol, is the "head". The terminals and nonterminals after the ::= symbol are called the "handle". The head and handle, together, are called a "production".
<Statement> ::= if <Expression> then <Statements> end
The production above defines a <Statement> as an 'if' terminal, an <Expression> defined elsewhere, a 'then' terminal, <Statements> defined elsewhere, and, finally, an 'end' terminal.
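As a rough illustration of the head and handle terminology, a production can be modeled as a pair of a head and an ordered sequence of symbols. The `Production` class and `is_nonterminal` helper below are hypothetical names for this sketch and are not part of GOLD.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Production:
    head: str                # the nonterminal being defined
    handle: Tuple[str, ...]  # the symbols that make up the definition

    def __str__(self):
        return f"{self.head} ::= {' '.join(self.handle)}"


def is_nonterminal(symbol: str) -> bool:
    """In this sketch, nonterminals are the symbols in angle brackets."""
    return symbol.startswith("<") and symbol.endswith(">")


if_statement = Production(
    head="<Statement>",
    handle=("if", "<Expression>", "then", "<Statements>", "end"),
)

print(if_statement)
# <Statement> ::= if <Expression> then <Statements> end
print([s for s in if_statement.handle if is_nonterminal(s)])
# ['<Expression>', '<Statements>']
```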
If you are declaring a series of productions that derive the same nonterminal (they have the same head), you can use pipe characters '|' to create a series of different handles. The pipe symbol means "or". So, in the example below, <Statement> is defined using three different handles, and a <Statement> can match any one of them.
<Statement> ::= if <Expression> then <Statements> end
              | while <Expression> do <Statements> end
              | for Id = <Range> do <Statements> end
Internally, the notation above will create three different productions:
<Statement> ::= if <Expression> then <Statements> end
<Statement> ::= while <Expression> do <Statements> end
<Statement> ::= for Id = <Range> do <Statements> end
However, to prevent typos, GOLD only permits the single definition: all the handles for a given head must be declared together. In GOLD terminology, a series of related productions with the same head is called a "Rule".
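The expansion of one rule into several productions can be sketched in a few lines of Python. The function `expand_rule` is a hypothetical helper, not GOLD's implementation, and it makes a simplifying assumption: the rule text contains no quoted '|' terminal, so every pipe character separates two handles.

```python
def expand_rule(rule_text):
    """Split 'head ::= handle | handle | ...' into (head, handle) pairs."""
    head, body = rule_text.split("::=", 1)
    head = head.strip()
    productions = []
    for alternative in body.split("|"):
        # Collapse line breaks and repeated spaces inside the handle.
        handle = " ".join(alternative.split())
        productions.append((head, handle))
    return productions


rule = """
<Statement> ::= if <Expression> then <Statements> end
              | while <Expression> do <Statements> end
              | for Id = <Range> do <Statements> end
"""

for head, handle in expand_rule(rule):
    print(head, "::=", handle)
```

Each printed line is one production, and all three share the head <Statement>.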
A rule can also be declared as "nullable". This means that the nonterminal the rule represents can contain zero terminals. For all intents and purposes, the nonterminal can be seen as optional. In grammars, nullable rules are often used for optional clauses on statements or for lists that contain zero or more items.
A null production (and hence a nullable rule) is declared by simply creating a production that contains no symbols.
<Optional Keyword> ::= Keyword
                     |
The second handle in the definition contains no symbols. The rule is therefore nullable. GOLD version 5 also permits an alternative notation. The text <>, used by itself, will declare a null production.
<Optional Keyword> ::= Keyword
                     | <>
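One common way to find which nonterminals are nullable is to repeat a simple check until nothing changes: a head is nullable if one of its handles is empty or consists only of nullable symbols. The Python sketch below illustrates that idea and is not taken from GOLD. The function name `nullable_nonterminals` and the <Wrapper> and <Statement> productions are hypothetical, added only to show how nullability propagates.

```python
def nullable_nonterminals(productions):
    """Return the set of heads that can derive an empty sequence of terminals."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, handle in productions:
            # An empty handle satisfies all() trivially; terminals are
            # never added to the set, so they block nullability.
            if head not in nullable and all(sym in nullable for sym in handle):
                nullable.add(head)
                changed = True
    return nullable


productions = [
    ("<Optional Keyword>", ["Keyword"]),
    ("<Optional Keyword>", []),                     # the null production
    ("<Wrapper>", ["<Optional Keyword>"]),          # hypothetical
    ("<Statement>", ["<Optional Keyword>", "Id"]),  # hypothetical
]

print(sorted(nullable_nonterminals(productions)))
# ['<Optional Keyword>', '<Wrapper>']
```

<Statement> is not nullable because its handle always contains the Id terminal.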
The following defines a rule called <Value>, which can contain either an Identifier terminal or the contents of another rule called <Literal>.
<Value> ::= Identifier
          | <Literal>

<Literal> ::= Number
            | String
The <Literal> can contain either a Number or String terminal. As a result of this definition, a <Value> can contain an Identifier, Number or String.
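The way <Value> picks up the terminals of <Literal> can be checked with a short recursive lookup. This Python sketch assumes every handle is a single symbol and that the rules are not recursive, which holds for this example but not for grammars in general. The names `rules` and `possible_terminals` are hypothetical.

```python
# Each rule maps a head to its alternatives; every handle here is one symbol.
rules = {
    "<Value>": ["Identifier", "<Literal>"],
    "<Literal>": ["Number", "String"],
}


def possible_terminals(symbol, rules):
    """List the terminals a symbol can stand for, following rule references."""
    if symbol not in rules:
        return [symbol]  # not defined by a rule, so it is a terminal
    found = []
    for alternative in rules[symbol]:
        found.extend(possible_terminals(alternative, rules))
    return found


print(possible_terminals("<Value>", rules))
# ['Identifier', 'Number', 'String']
```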
The three clauses in the C-style For Statement are optional. The <Opt Exp> rule defines an optional expression.
<For Statement> ::= for '(' <Opt Exp> ';' <Opt Exp> ';' <Opt Exp> ')' <Statements>

<Opt Exp> ::= <Expression>
            |
The following rule uses two productions to define a comma-delimited list of Identifiers.
<List> ::= <List> ',' Identifier
         | Identifier
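Because the <List> rule refers to itself on the left, every additional item wraps the list built so far, and the resulting tree leans to the left. The Python sketch below builds that shape with a hand-written loop over a list of tokens. It is an illustration only and not the algorithm GOLD uses. The function name `parse_list` and the tuple layout are hypothetical.

```python
def parse_list(tokens):
    """Build a nested tuple mirroring <List> ::= <List> ',' Identifier | Identifier."""
    if not tokens or tokens[0] == ",":
        raise ValueError("expected an Identifier")
    # Base case: <List> ::= Identifier
    tree = ("List", tokens[0])
    pos = 1
    while pos < len(tokens):
        has_item = pos + 1 < len(tokens) and tokens[pos + 1] != ","
        if tokens[pos] != "," or not has_item:
            raise ValueError("expected ',' followed by an Identifier")
        # Recursive case: <List> ::= <List> ',' Identifier
        tree = ("List", tree, ",", tokens[pos + 1])
        pos += 2
    return tree


print(parse_list(["a", ",", "b", ",", "c"]))
# ('List', ('List', ('List', 'a'), ',', 'b'), ',', 'c')
```

The innermost <List> holds the first identifier and the outermost holds the last one.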
Operator precedence is an important aspect of most programming languages. The following rules define the common arithmetic operators.
<Expression> ::=
<Expression> '+' <Mult Exp> | <Expression> '-' <Mult Exp> | <Mult Exp> <Mult Exp> ::= <Mult Exp> '*' <Negate
Exp> <Negate Exp> ::= '-' <Value> <Value> ::= ID |
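The effect of layering the rules this way can be seen by evaluating a few token sequences. The Python sketch below uses one function per rule and replaces each left-recursive rule with a loop. It is a hand-written evaluator for illustration, not the parser GOLD generates. It assumes that each level also accepts the next level by itself (an <Expression> may be a lone <Mult Exp>, and so on), and the name `evaluate` and the variable table `env` are hypothetical.

```python
def evaluate(tokens, variables):
    """Evaluate a token list using the layered expression rules."""
    pos = 0

    def value():  # <Value> ::= ID
        nonlocal pos
        name = tokens[pos]
        pos += 1
        return variables[name]

    def negate_exp():  # <Negate Exp> ::= '-' <Value> | <Value>
        nonlocal pos
        if tokens[pos] == "-":
            pos += 1
            return -value()
        return value()

    def mult_exp():  # <Mult Exp> ::= <Mult Exp> '*' <Negate Exp> | <Negate Exp>
        nonlocal pos
        result = negate_exp()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            result *= negate_exp()
        return result

    def expression():  # <Expression> ::= <Expression> ('+' or '-') <Mult Exp> | <Mult Exp>
        nonlocal pos
        result = mult_exp()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            operator = tokens[pos]
            pos += 1
            right = mult_exp()
            result = result + right if operator == "+" else result - right
        return result

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


env = {"a": 2, "b": 3, "c": 4}
print(evaluate(["a", "+", "b", "*", "c"], env))  # 14
print(evaluate(["a", "-", "b", "-", "c"], env))  # -5
print(evaluate(["a", "*", "-", "b"], env))       # -6
```

Multiplication binds tighter than addition because <Mult Exp> sits one level below <Expression>, and the left recursion makes a - b - c group as (a - b) - c.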