Character Set Table

Overview

The Character Set Table is used by the DFA State Table to store the valid characters for each edge in the DFA state machine. Each character in a set is represented by an unsigned 16-bit integer, which allows a set to store any character in the Basic Multilingual Plane of the Unicode Character Set. The characters in each set are stored in sorted order.
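As a rough illustration of why sorted sets of 16-bit values are convenient for a DFA, the following Python sketch stores a set as a sorted list of code points and tests membership with a binary search. The names `make_char_set` and `contains` are hypothetical; they are not part of the Builder or of any GOLD engine.

```python
from bisect import bisect_left

MAX_CODE_POINT = 0xFFFF  # largest value an unsigned 16-bit integer can hold


def make_char_set(code_points):
    """Return a sorted, duplicate-free list of 16-bit code points."""
    values = sorted(set(code_points))
    for value in values:
        if not 0 <= value <= MAX_CODE_POINT:
            raise ValueError(f"{value} does not fit in 16 bits")
    return values


def contains(char_set, ch):
    """Binary search for the code point of ch in a sorted set."""
    code = ord(ch)
    pos = bisect_left(char_set, code)
    return pos < len(char_set) and char_set[pos] == code


digits = make_char_set(range(48, 58))  # code points of '0'..'9'
print(contains(digits, "7"))  # True
print(contains(digits, "a"))  # False
```

Because the set is sorted, each lookup takes a logarithmic number of comparisons rather than a scan of the whole set.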

The following tags were added in Version 2.5 of the Builder.

Structure

##CHAR-SET-TABLE
...
[ ##CHAR-SETS
...
[ ##CHARS
...
##END-CHARS ]
...
##END-CHAR-SETS ]
...
##END-CHAR-SET-TABLE

Options

The following tags can be used to enhance the text generated by the skeleton program.

[ ##DELIMITER Text ]
[ ##RANGE-CHARS Text ]

 

Name Description
##DELIMITER This tag specifies the characters used to separate the items in a list. It is used in the construction of rule lists, symbol lists, and character sets.
##RANGE-CHARS This tag specifies the characters that can be used to denote a range of characters. For instance, the set '1,2,3,4,10' can be represented as '1..4,10'. This tag is used primarily with the %Chars.RangeList% variable in the ##CHAR-SETS block.
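The range notation described above can be sketched in Python as follows. This is only an approximation of the idea, not the Builder's actual algorithm: the function name `to_range_list` is hypothetical, and it is an assumption that a run of just two consecutive values is also written as a range.

```python
def to_range_list(values, delimiter=",", range_chars=".."):
    """Collapse runs of consecutive integers into 'first..last' items."""
    values = sorted(values)
    parts = []
    i = 0
    while i < len(values):
        # Advance j to the last value of the run that starts at index i.
        j = i
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1
        if j > i:
            parts.append(f"{values[i]}{range_chars}{values[j]}")
        else:
            parts.append(str(values[i]))
        i = j + 1
    return delimiter.join(parts)


print(to_range_list([1, 2, 3, 4, 10]))  # 1..4,10
```

The `delimiter` and `range_chars` parameters play the roles of the ##DELIMITER and ##RANGE-CHARS options.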

Variables

CHAR-SET-TABLE Block

Name Description
%Count% This variable contains the number of character sets in the grammar's Character Set Table. It is useful for declaring arrays or setting global variables before storing each set.

CHAR-SETS Block

Name Description
%CharCount% This variable contains the number of characters in the set. It can be used to declare arrays before storing characters using the CHARS block.
%Chars.List% This variable contains the numeric values of the characters in the set, delimited by the text specified with the ##DELIMITER tag. It can be used as an alternative to the CHARS block.
%Chars.RangeList% This variable contains the numeric values of the characters in the set, with consecutive values collapsed into ranges. Use the ##RANGE-CHARS tag to specify which characters are used to denote a range.
%Chars.XML% This variable contains the characters of the set encoded for XML. Essentially, any character whose value is not between 32 and 126 is represented using the &#xxxx; format.
%Delimiter% This variable is used to create lists where a delimiter is placed between each item. The value of this variable is set with the ##DELIMITER option. For the last item in the list, the value of the Delimiter variable will be set to a number of spaces.
%Index% This variable contains the index of the character set in the table.
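The rule behind %Chars.XML% can be approximated with the short Python sketch below. The function name `to_xml_chars` is hypothetical, and it is an assumption that the xxxx in the &#xxxx; format is the decimal character code; the Builder may format the number differently. The sketch follows the description literally, so characters such as '<' and '&', which fall inside the 32 to 126 range, are emitted unchanged.

```python
def to_xml_chars(code_points):
    """Encode code points, using numeric references outside 32..126."""
    out = []
    for code in code_points:
        if 32 <= code <= 126:
            out.append(chr(code))      # printable ASCII is kept as-is
        else:
            out.append(f"&#{code};")   # assumed decimal reference
    return "".join(out)


print(to_xml_chars(range(48, 58)))  # 0123456789
print(to_xml_chars([9, 65, 233]))   # &#9;A&#233;
```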

CHARS Block

Name Description
%Delimiter% This variable is used to create lists where a delimiter is placed between each item. The value of this variable is set with the ##DELIMITER option. For the last item in the list, the value of the Delimiter variable will be set to a number of spaces.
%UnicodeIndex% Each character has a unique character code, called a "code point" in the Unicode Character Set. This variable contains that value, which is the same value stored in the Compiled Grammar Table file.

Example

The following example displays the content of the Character Set Table using formatted text.

##CHAR-SET-TABLE
Table Count: %Count%
##RANGE-CHARS '..'
##DELIMITER ','
##CHAR-SETS
   Set %Index%
      Character Count   : %CharCount%
      Characters (XML)  : %Chars.XML%
      Characters (List) : %Chars.List%
      Characters (Range): %Chars.RangeList%
      Individual Characters:
##CHARS
         %UnicodeIndex%
##END-CHARS
##END-CHAR-SETS
##END-CHAR-SET-TABLE

 

If the "Simple" example grammar is used, the program template will create the following text for Character Set #10. The sets before and after #10 were excluded for brevity.

Table Count: 49
   .
   .
   .
   Set 10
      Character Count   : 10
      Characters (XML)  : 0123456789
      Characters (List) : 48, 49, 50, 51, 52, 53, 54, 55, 56, 57
      Characters (Range): 48..57
      Individual Characters:
          48
          49
          50
          51
          52
          53
          54
          55
          56
          57
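
As a quick cross-check of the output above, the range list for Set 10 can be expanded back into its characters. The Python sketch below assumes the ',' delimiter and '..' range characters used in the example template; the function name `expand_range_list` is hypothetical.

```python
def expand_range_list(text, delimiter=",", range_chars=".."):
    """Turn a string such as '1..4,10' back into a list of code points."""
    code_points = []
    for item in text.split(delimiter):
        item = item.strip()
        if range_chars in item:
            first, last = item.split(range_chars)
            code_points.extend(range(int(first), int(last) + 1))
        else:
            code_points.append(int(item))
    return code_points


codes = expand_range_list("48..57")
print(len(codes))                      # 10
print("".join(chr(c) for c in codes))  # 0123456789
```

The count and the characters agree with the Character Count and Characters (XML) lines shown for Set 10.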