Deterministic Finite Automata (DFA)

A Deterministic Finite Automata (DFA) is a type of graph used to recognize patterns. Normally when you think of a "graph" you imagine a plotted mathematical function or graphical representation of data. In computer science terms, a "graph" is simply a collection of linked nodes. In the case of a DFA, the name itself accurately describes its properties:

It is deterministic meaning that from any given node there is one and only one path for any given input. In other words, there is no ambiguity in traversing nodes.
The graph is finite; meaning there are a fixed and known number of nodes (called "states") and branches.
And, finally, the graph is an automaton. The transition from state to state is completely determined by the input. The algorithm merely follows the correct branch based on the input.

Most parser engines, including this one, use a DFA to implement the tokenizer. This part of the engine scans the input and determines when and if a series of characters can be recognized as a token.

Simple Deterministic Finite Automata that recognizes common identifiers and numbers. The figure to the right is a simple Deterministic Finite Automata that recognizes common identifiers and numbers. For instance, assume that the input contains the text "gold". From State 1 (the initial state), the DFA moves to State 2 when the "g" is read. For the next three characters, "o", "l" and "d", the DFA continues to loop to State 2.

By design, the tokenizer attempts to match the longest series of characters possible before accepting a token. For example: if the tokenizer is reading the characters "count" from the source, it can match the first character "c" as an identifier. It would not be prudent for the tokenizer to report five separate identifiers: "c", "o", "u", "n" and "t".

Each time a token is identified, it is passed to the LALR parse engine and the tokenizer restarts at the initial state.

For more information, please refer to the following:

Modern Compiler Implementation, Andrew W. Appel, Cambridge University Press, 40 West 20th Street, New York City, New York 10011-4211
Crafting a Compiler, Charles N. Fischer & Richard J. LeBlanc Jr., The Benjamin/Cummings Publishing Company Inc., 2727 Sand Hill Road, Menlo Park, California 94025
... or just search for "Deterministic Finite Automata" on the Internet.