|
Backus-Naur Form is a notation used to describe the syntax of
programming languages. For more information, please see theGrammar and Backus-Naur Form page |
|
Big Endian Byte Ordering is a binary storage format where the
most significant byte is stored in lower memory. This is the format used by the Motorola
family of processors. For more information, please see theByte Ordering page. |
|
In parsing terms, a configuration is a rule production in the
process of being completed. Configurations play a major role in the construction of LR parse tables. |
|
Please see theGrammar and Backus-Naur
Form page |
|
A Deterministic Finite Automaton is often used to analyze a
series of characters. Often it is implemented using state driven transition graph.
Please see theDeterministic Finite Automata page for more information. |
|
Please see theGrammar and Backus-Naur
Form page |
|
LALR Parsing, or "Lookahead LR
parsing", is a variant of LR Parsing that
combines different "configurations" to limit the size of the parse tables. As a
result, the algorithm is slightly less powerful than the LR Parsing. Grammars that can be
parsed by a LR parser, might be found to be "ambiguous" by LALR. However, this
is very rarely the case and real-world examples are few. The number of states
eliminated by LALR are sometimes huge. The C programming language, for instance, has over
10,000 LR states. LALR drops this number to slightly more than 200.
For more information, please see theLALR Algorithm page. |
|
Little Endian Byte Ordering is a binary storage format where
the least significant byte is stored in lower memory. This is the format used by the Intel
family of processors. For more information, please see theLittle Endian Byte Ordering page. |
|
LL Parsing, or Left-to-right Left-derivative
parsing, refers to a class of parsers that analyze text using a top-down algorithm. LL
Parsers have the advantage of being very simple in the design - at least conceptually.
Unlike LR Parsers, the system does not need to generate tables ahead of time. Instead,
only a set of rules and lookahead data is needed.
However, LL Parsers are not as powerful as LR parsers.
Rules cannot have left-recursion since it causes problems with the recursive decent
algorithm. In addition, LL Parsers are known as memory intensive and slow. Both of these
are due to the "search" nature of the algorithm. Optimization is possible, but
the complexity of the LL Parser runtime engine increases dramatically. |
|
LR Parsing, or Left-to-right Right-derivative
parsing, uses tables to determine when a rule is complete and when additional tokens are
needed to be read from the source. Unlike LL Parsing,
the LR Parser does very little "thinking" at runtime. All decisions are based on
the content of the parse tables. As a result, LR Parsing is faster and more memory
efficient than LL Parsing. The construction of these tables where all the
"thinking" takes place. LR parser generators, such as YACC and GOLD, construct
these tables by analyzing the grammar and determining all the possible "states"
the system can be in when parsing.
Each state represents a point in the parse process where a number of tokens have been
read from the source and rules are in different states of completion. Each production in a
state of completion is called a "configuration" and each state is really a
configuration set.
LR parse tables can be huge, and, as a result, often a variant of LR Parsing is used.
For more information, please see theLALR Algorithm page. |
|
Please see theGrammar and Backus-Naur
Form page |
|
A nullable rule is a type of rule
which is optional, or in other words, can contain no symbols. In the GOLD Meta-Language, a
nullable rule can be specified by adding a blank entry:
<Optional
ID> ::= Identifier
| |
|
|
A parser is software, such as a procedure or library, that
organizes text into a set of logical units used by a programming language. Please see theWhat is a Parser page for more information. |
|
A parser generator is a generic term that refers to any
software program that helps develop a working parser. Examples include the GOLD, YACC and
ANTLR. |
|
Please see theGrammar and Backus-Naur
Form page |
|
A Reduce-Reduce Conflict is a caused when a grammar allows two
or more different rules to be reduced at the same time,
for the same token. When this happens, the grammar becomes ambiguous since a program can
be interpreted more than one way. For instance, assume you have the following grammar:
<S>
::= <A> | <B>
<A> ::= Identifier
<B> ::= Identifier |
When the system reads an identifier, it cannot determine if it has completed <A> or<B> .
When a LALR parser generator is analyzing a
grammar and constructing the parse tables, these conflicts are located immediately. |
|
A Regular Expression is a notation used describe patterns of
characters. In programming language theory, they are often used to describe a languages'
terminals. For more information, please see theRegular
Expression page |
|
Please see theGrammar and Backus-Naur
Form page |
|
The semantics of a programming language refers to how the
actual statements, constructs, etc... are interpreted. This is quite different from the syntax of a programming language which refers to how
different symbols and reserved words are arranged. For instance, in both Visual Basic
and C++, the following is a valid expression:
Even though the syntax is the same between the two languages, the semantics are quite
different. In C++, this expression is interpreted as a binary-and of "a" and
"b". In Visual Basic, however, this expression returns the concatenation of two
strings. |
|
In parsing terms, a symbol is the building block of a grammar
and can be either a terminal or nonterminal. Essentially, the term "symbol"
is used to refer to either of its two forms. |
|
The term "syntax" refers to the structure of a
programming language, in particular, the different series of symbols and words that make
up the basic parts of the language. This is quite different from the programming
language's semantics - the actual meaning of those
different parts. |
|
A "shift" is an action performed by a parsing engine
when it reads a token that is valid in the current state. Lookahead parsers maintain a list of terminals which are expected to be read from each state,
given that the syntax of the program is correct. If the token is not the expected, a
syntax error occurs. In the bottom-up parser engines, such as GOLD and YACC, a shift
pushes the token onto an internal stack that is used to hold tokens that are being used to
construct completed rules. |
|
The Shift-Reduce Conflict is the most common type of conflict
found in grammars. It is caused when the grammar allows a rule
to be reduced for particular token, but, at the
same time, allowing another rule to be shifted for that
same token. As a result, the grammar is ambiguous since a program can be interpreted
more than one way.
This error is often caused by recursive grammar definitions where the system cannot
determine when one rule is complete and another is just started. The Builder documentation
contains an example of
the common if-then-else grammar problem and how to fit it. |
|
In Backus-Naur Form, a terminal is used to denote a programming
language's reserved words and symbols. Please see theGrammar and Backus-Naur Form page |
|
Please see theUnicode
page |