Example: Line-Based Grammar

Overview

The following grammars implement line-based programming languages. This type of grammar does not ignore the end of a line but instead uses it as an essential part of the language. Real-world examples include Visual Basic and many scripting languages.

To accomplish this, the grammar must recognize the Newline as a terminal rather than treating it as whitespace. The characters used to represent a Newline differ between computer platforms. The Windows operating system uses a Carriage Return followed by a Line Feed; UNIX, on the other hand, uses a lone Line Feed, and classic Mac OS used a lone Carriage Return. The definition of a Newline terminal must take this into account.

In the grammars below, the NewLine terminal is declared with two alternatives: a Carriage Return followed by a Line Feed, or a Carriage Return alone. Neither alternative matches a solitary Line Feed, so it is advisable to add {LF} as a third alternative when the input may come from UNIX-style files.

The Whitespace terminal must also be declared so that it does not accept the Newline characters as whitespace. Below, Whitespace is declared as one or more of the normal whitespace characters, excluding the Carriage Return and Line Feed.
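
As a rough illustration of the two declarations described above, the following Python sketch tokenizes input so that line breaks come out as NewLine tokens while all other whitespace is discarded. It is not how the grammar's scanner is actually built; the token names Word and Symbol, the function tokenize, and the acceptance of a lone Line Feed are assumptions made for this example.

```python
import re

# Hypothetical token table. NewLine and Whitespace mirror the terminals
# described in the text; Word and Symbol are placeholders for illustration.
TOKEN_RE = re.compile(r"""
    (?P<NewLine>\r\n|\r|\n)              # CR LF, lone CR, or lone LF (assumed)
  | (?P<Whitespace>[ \t\f\v]+)           # whitespace without CR and LF
  | (?P<Word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<Symbol><>|[()=])
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, lexeme) pairs, dropping Whitespace but keeping NewLine."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "Whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Because the CR LF alternative is listed first, a Windows line ending produces one NewLine token rather than two.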

Solution #1 - NewLine Terminal

In this example, a NewLine is added to the end of each statement. So that the grammar allows blank lines, the <Statement> rule also contains an alternative consisting of a lone NewLine.

"Start Symbol" = <Program>

{WS} = {Whitespace} - {CR} - {LF}

Whitespace = {WS}+
NewLine = {CR}{LF}|{CR}

<Program> ::= <Statements>

<Statement> ::= If Identifier Then NewLine <Statements> End NewLine
              | Print '(' Identifier ')' NewLine
              | Read '(' Identifier ')' NewLine
              | NewLine   !Allow blank lines

<Statements> ::= <Statement> <Statements>
               |

<Exp> ::= Identifier '=' Identifier
        | Identifier '<>' Identifier
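
To see how these rules treat line breaks, the grammar can be mirrored by a small hand-written recognizer. The Python sketch below is only an illustration, not the parser generated from the grammar; it assumes the scanner has already produced a list of token names, and the helpers expect, parse_statement, parse_statements, and accepts are hypothetical. The <Exp> rule is omitted because no other rule refers to it.

```python
class ParseError(Exception):
    """Raised when the token names do not match the grammar."""


def expect(tokens, pos, kind):
    # Consume one token of the given kind or fail.
    if pos < len(tokens) and tokens[pos] == kind:
        return pos + 1
    found = tokens[pos] if pos < len(tokens) else "end of input"
    raise ParseError(f"expected {kind}, found {found} at token {pos}")


def parse_statements(tokens, pos):
    # <Statements> ::= <Statement> <Statements> | (empty)
    while pos < len(tokens) and tokens[pos] in ("If", "Print", "Read", "NewLine"):
        pos = parse_statement(tokens, pos)
    return pos


def parse_statement(tokens, pos):
    kind = tokens[pos]
    if kind == "NewLine":
        # A blank line counts as a statement of its own.
        return pos + 1
    if kind == "If":
        for expected in ("If", "Identifier", "Then", "NewLine"):
            pos = expect(tokens, pos, expected)
        pos = parse_statements(tokens, pos)
        for expected in ("End", "NewLine"):
            pos = expect(tokens, pos, expected)
        return pos
    # Print '(' Identifier ')' NewLine, or the same shape with Read
    for expected in (kind, "(", "Identifier", ")", "NewLine"):
        pos = expect(tokens, pos, expected)
    return pos


def accepts(tokens):
    """Return True if the whole token list matches <Program>."""
    try:
        return parse_statements(tokens, 0) == len(tokens)
    except ParseError:
        return False
```

A blank line is accepted only where a statement may start, so a Print statement that is not followed by a NewLine token is rejected.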

Solution #2 - Using a NewLine rule and terminal

Although the solution above works for simple line-based grammars, it does not scale well to more complex ones. For grammars where the constructs can be quite complex, such as case statements, this solution becomes difficult to write. For instance, assume you have the following Visual Basic Select-Case statement:

1 Select Case Value
2
3    Case 1, -1
4       Name = "True"
5
6    Case 0
7       Name = "False"
8    Case Else
9       Name = "Error"
10 End Select

Lines #2 and #5 are blank and, as a result, the grammar must explicitly allow a NewLine at each of those positions. The developer could manually declare every place where optional newlines are permitted, but this approach is tedious and error-prone.

A better solution is to use a rule that accepts NewLines rather than using the NewLine terminal at the end of each statement.

The following solution replaces each NewLine with a new rule called <nl> (for NewLines). The <nl> rule is designed to accept one or more NewLine tokens. This makes it far easier to write complex line-based grammars: each line is now logically followed by one or more NewLines rather than exactly one, and the rule that accepted a blank line as a statement is no longer needed.

However, since NewLine tokens are only accepted after a statement, any blank lines before the start of the program must be consumed separately. In the grammar below, the <nl Opt> rule absorbs any NewLines that precede the first actual line.

"Start Symbol" = <Program>

{WS} = {Whitespace} - {CR} - {LF}

Whitespace = {WS}+
NewLine = {CR}{LF}|{CR}

<nl>     ::= NewLine <nl>          !One or more
           | NewLine

<nl Opt> ::= NewLine <nl Opt>      !Zero or more
           |

! <nl Opt> removes blank lines before the first statement

<Program> ::= <nl Opt> <Statements>

<Statement> ::= If Identifier Then <nl> <Statements> End <nl>
              | Print '(' Identifier ')' <nl>
              | Read '(' Identifier ')' <nl>

<Statements> ::= <Statement> <Statements>
               |
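
The effect of the <nl> and <nl Opt> rules can be sketched in the same hand-written style. The Python code below is an illustration under assumptions, not the generated parser: it takes a list of token names as input, and the function names nl, nl_opt, expect, statement, statements, and accepts are hypothetical.

```python
class ParseError(Exception):
    """Raised when the token names do not match the grammar."""


def nl_opt(tokens, pos):
    # <nl Opt>: zero or more NewLine tokens
    while pos < len(tokens) and tokens[pos] == "NewLine":
        pos += 1
    return pos


def nl(tokens, pos):
    # <nl>: one or more NewLine tokens
    if pos >= len(tokens) or tokens[pos] != "NewLine":
        raise ParseError(f"expected NewLine at token {pos}")
    return nl_opt(tokens, pos + 1)


def expect(tokens, pos, kind):
    # Consume one token of the given kind or fail.
    if pos < len(tokens) and tokens[pos] == kind:
        return pos + 1
    raise ParseError(f"expected {kind} at token {pos}")


def statements(tokens, pos):
    # <Statements> ::= <Statement> <Statements> | (empty)
    while pos < len(tokens) and tokens[pos] in ("If", "Print", "Read"):
        pos = statement(tokens, pos)
    return pos


def statement(tokens, pos):
    kind = tokens[pos]
    if kind == "If":
        for expected in ("If", "Identifier", "Then"):
            pos = expect(tokens, pos, expected)
        pos = nl(tokens, pos)
        pos = statements(tokens, pos)
        pos = expect(tokens, pos, "End")
        return nl(tokens, pos)
    # Print '(' Identifier ')' <nl>, or the same shape with Read
    for expected in (kind, "(", "Identifier", ")"):
        pos = expect(tokens, pos, expected)
    return nl(tokens, pos)


def accepts(tokens):
    """Return True if the whole token list matches <Program>."""
    try:
        pos = nl_opt(tokens, 0)  # blank lines before the first statement
        return statements(tokens, pos) == len(tokens)
    except ParseError:
        return False
```

Blank lines no longer need a statement alternative of their own: any run of NewLine tokens after a statement is absorbed by nl, and leading blank lines are absorbed by nl_opt.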