Glossary

Backus-Naur Form Backus-Naur Form is a notation used to describe the syntax of programming languages. In particular is it used to define productions.
Configuration In parsing terms, a configuration is a production in the process of being completed. Configurations play a major role in the construction of LR parse tables.
Deterministic Finite
Automata
A Deterministic Finite Automaton is often used to analyze a series of characters.  Often it is implemented using state driven transition graph. Please see theinline-arrow-r.gif (99 bytes)Deterministic Finite Automata page for more information.
Grammar Please see theinline-arrow-r.gif (99 bytes)Grammar and Backus-Naur Form page
LALR Parsing LALR Parsing, or "Lookahead LR parsing", is a variant of LR Parsing that combines different "configurations" to limit the size of the parse tables. As a result, the algorithm is slightly less powerful than the LR Parsing. Grammars that can be parsed by a LR parser, might be found to be "ambiguous" by LALR. However, this is very rarely the case and real-world examples are few.

The number of states eliminated by LALR are sometimes huge. The C programming language, for instance, has over 10,000 LR states. LALR drops this number to slightly more than 200.

LR Parsing

LR Parsing, or Left-to-right Right-derivative parsing, uses tables to determine when a rule is complete and when additional tokens are needed to be read from the source.The LR Parser does very little "thinking" at runtime. All decisions are based on the content of the parse tables. The construction of these tables where all the "thinking" takes place. LR parser generators, such as YACC and GOLD, construct these tables by analyzing the grammar and determining all the possible "states" the system can be in when parsing.

Each state represents a point in the parse process where a number of tokens have been read from the source and rules are in different states of completion. Each production in a state of completion is called a "configuration" and each state is really a configuration set.

LR parse tables can be huge, and, as a result, often a variant of LR Parsing is used. For more information, please see theinline-arrow-r.gif (99 bytes)LALR Algorithm page.

Nonterminal A nonterminal is a symbol used in Backus-Naur form represent a syntactic structures defined in the grammar.

Please see theinline-arrow-r.gif (99 bytes)Define Rules page
Nullable Rule

A nullable rule is a type of rule which is optional, or in other words, can contain no symbols. In the GOLD Meta-Language, a nullable rule can be specified by adding a blank entry:

<Optional ID> ::= Identifier
               |
Parser A parser is software, such as a procedure or library, that organizes text into a set of logical units used by a programming language.
Parser Generator A parser generator is a generic term that refers to any software program that helps develop a working parser. Examples include the GOLD, YACC and ANTLR.
Production Please see theinline-arrow-r.gif (99 bytes)Define Rules page.
Reduce-Reduce
Conflict
A Reduce-Reduce Conflict is a caused when a grammar allows two or more different rules to be reduced at the same time, for the same token. When this happens, the grammar becomes ambiguous since a program can be interpreted more than one way.

For instance, assume you have the following grammar:

<S> ::= <A> | <B>

<A> ::= Identifier
<B> ::= Identifier

When the system reads an identifier, it cannot determine if it has completed <A> or<B>.

When a LALR parser generator is analyzing a grammar and constructing the parse tables, these conflicts are located immediately.

Regular Expression A Regular Expression is a notation used describe patterns of characters. In programming language theory, they are often used to describe a languages' terminals. For more information, please see theinline-arrow-r.gif (99 bytes)Define Terminals page
Rule Please see theinline-arrow-r.gif (99 bytes)Define Rules page.
Semantics The semantics of a programming language refers to how the actual statements, constructs, etc... are interpreted. This is quite different from the syntax of a programming language which refers to how different symbols and reserved words are arranged.

For instance, in both Visual Basic and C++, the following is a valid expression:

a & b

Even though the syntax is the same between the two languages, the semantics are quite different. In C++, this expression is interpreted as a binary-and of "a" and "b". In Visual Basic, however, this expression returns the concatenation of two strings.

Symbol In parsing terms, a symbol is the building block of a grammar and can be either a terminal or nonterminal. Essentially, the term "symbol" is used to refer to either of its two forms.
Syntax The term "syntax" refers to the structure of a programming language, in particular, the different series of symbols and words that make up the basic parts of the language. This is quite different from the programming language's semantics - the actual meaning of those different parts.
Shift A "shift" is an action performed by a parsing engine when it reads a token that is valid in the current state. Lookahead parsers maintain a list of terminals which are expected to be read from each state, given that the syntax of the program is correct. If the token is not the expected, a syntax error occurs.

In the bottom-up parser engines, such as GOLD and YACC, a shift pushes the token onto an internal stack that is used to hold tokens that are being used to construct completed productions.

Shift-Reduce
Conflict
The Shift-Reduce Conflict is the most common type of conflict found in grammars. It is caused when the grammar allows a production to be reduced  for particular token, but, at the same time, allowing another rule to be shifted for that same token.

As a result, the grammar is ambiguous since a program can be interpreted more than one way.

This error is often caused by recursive grammar definitions where the system cannot determine when one rule is complete and another is just started. The Builder documentation contains an inline-arrow-r.gif (99 bytes)example of the common if-then-else grammar problem and how to fit it.

Terminal In Backus-Naur Form, a terminal is used to denote a programming language's reserved words and symbols. Please see thePlease see theinline-arrow-r.gif (99 bytes)Define Rules page.