The Character Set Table is used by the DFA State Table to store the valid characters for each edge in the DFA state machine. Each character in a set is represented by an unsigned 16-bit integer, which allows a set to store any character in the Basic Multilingual Plane of the Unicode Character Set. Each set is sorted.
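As a rough illustration of the storage described above, the sketch below keeps a set as a sorted array of unsigned 16-bit values and tests membership with a binary search. The names `make_char_set` and `contains` are hypothetical and are not part of the Builder or any engine; this is one possible way to consume the data, not the Builder's own implementation.

```python
from array import array
from bisect import bisect_left


def make_char_set(code_points):
    """Store code points as a sorted array of unsigned 16-bit integers."""
    # Type code "H" is an unsigned short, which is 16 bits on common platforms.
    return array("H", sorted(set(code_points)))


def contains(char_set, ch):
    """Binary-search the sorted set for the code point of ch."""
    code = ord(ch)
    i = bisect_left(char_set, code)
    return i < len(char_set) and char_set[i] == code


# The digit set used later in this page: code points 48 through 57.
digits = make_char_set(range(48, 58))
print(contains(digits, "5"), contains(digits, "a"))
```

Because each set is sorted, a lookup needs only a logarithmic number of comparisons.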
The following tags were added in Version 2.5 of the Builder.
##CHAR-SET-TABLE |
... |
[ ##CHAR-SETS |
... |
[ ##CHARS |
... |
##END-CHARS ] |
... |
##END-CHAR-SETS ] |
... |
##END-CHAR-SET-TABLE |
The following tags can be used to enhance the text generated by the skeleton program.
[ | ##DELIMITER | Text | ] |
[ | ##RANGE-CHARS | Text | ] |
Name | Description |
---|---|
##DELIMITER | This tag specifies the characters used to separate the items in a list. It is used in the construction of rule lists, symbol lists, and character sets. |
##RANGE-CHARS | This tag specifies the characters used to denote a range of characters. For instance, the set '1,2,3,4,10' can be represented as '1..4,10'. This tag is used primarily with the %Chars.RangeList% variable in the ##CHAR-TABLE block. |
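The range notation described above can be sketched as follows. The function `to_range_list` is a hypothetical helper, and it assumes that any run of two or more consecutive values is collapsed; the Builder's exact rule for short runs is not stated here.

```python
def to_range_list(codes, delimiter=",", range_chars=".."):
    """Collapse sorted code points into runs, e.g. [1, 2, 3, 4, 10] -> '1..4,10'."""
    parts = []
    i = 0
    while i < len(codes):
        j = i
        # Extend the run while the next value is exactly one greater.
        while j + 1 < len(codes) and codes[j + 1] == codes[j] + 1:
            j += 1
        if j > i:
            parts.append(f"{codes[i]}{range_chars}{codes[j]}")
        else:
            parts.append(str(codes[i]))
        i = j + 1
    return delimiter.join(parts)


print(to_range_list([1, 2, 3, 4, 10]))
```

The `delimiter` and `range_chars` parameters play the roles of the ##DELIMITER and ##RANGE-CHARS tags.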
The following variable is available within the ##CHAR-SET-TABLE block.
Name | Description |
---|---|
%Count% | The Count variable contains the number of character sets in the grammar's Character Set Table. This variable is useful for declaring arrays or setting global variables before storing each set. |
The following variables are available within the ##CHAR-SETS block.
Name | Description |
---|---|
%CharCount% | This variable contains the number of characters in the set. It can be used to declare arrays before storing characters using the CHARS block. |
%Chars.List% | This variable contains the numeric values of the characters in the set. The values are separated by the characters specified with the ##DELIMITER tag. This value can be used as an alternative to the CHARS block. |
%Chars.RangeList% | This variable contains the numeric values of the characters in the set, written using ranges. Use the ##RANGE-CHARS tag to specify which characters will be used to denote a range. |
%Chars.XML% | This variable contains the characters of the set encoded for XML. Essentially, any character whose value is not between 32 and 126 is represented using the &#xxxx; format. |
%Delimiter% | This variable is used to create lists where a delimiter is placed between each item. The value of this variable is set with the ##DELIMITER tag. For the last item in the list, the value of the Delimiter variable is set to a number of spaces. |
%Index% | This variable contains the index of the character set in the table. |
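The %Chars.XML% rule can be sketched as below. The helper `to_xml_chars` is hypothetical, and it assumes that the &#xxxx; form carries the decimal code point; the text above does not say whether the number is decimal or hexadecimal, nor how XML-significant characters such as & and < are treated, so the sketch leaves them untouched.

```python
def to_xml_chars(codes):
    """Emit printable ASCII (32..126) literally and everything else as a numeric reference."""
    out = []
    for code in codes:
        if 32 <= code <= 126:
            out.append(chr(code))
        else:
            # Assumption: decimal numeric character reference.
            out.append(f"&#{code};")
    return "".join(out)


print(to_xml_chars(range(48, 58)))
print(to_xml_chars([9, 65, 233]))
```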
The following variables are available within the ##CHARS block.
Name | Description |
---|---|
%Delimiter% | This variable is used to create lists where a delimiter is placed between each item. The value of this variable is set with the ##DELIMITER tag. For the last item in the list, the value of the Delimiter variable is set to a number of spaces. |
%UnicodeIndex% | Each character has a unique character code, called a "code point" in the Unicode Character Set. These are the same values stored in the Compiled Grammar Table file. |
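The behaviour of %Delimiter% on the last item can be illustrated with a short sketch. `render_items` is a hypothetical function, and it assumes that the "number of spaces" equals the length of the delimiter so that the generated lines stay aligned; the exact count is not specified above.

```python
def render_items(items, delimiter=","):
    """Append the delimiter to every item except the last, which gets spaces instead."""
    lines = []
    for n, item in enumerate(items):
        is_last = n == len(items) - 1
        # Assumption: the padding is as wide as the delimiter itself.
        suffix = " " * len(delimiter) if is_last else delimiter
        lines.append(f"{item}{suffix}")
    return lines


for line in render_items([48, 49, 50]):
    print(repr(line))
```

This is what makes a line such as `%UnicodeIndex%%Delimiter%` usable for emitting array initializers without a trailing separator.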
The following example displays the content of the Character Set Table using formatted text.
##CHAR-SET-TABLE |
Table Count: %Count% |
##RANGE-CHARS '..' |
##DELIMITER ',' |
##CHAR-SETS |
Set %Index% |
Character Count : %CharCount% |
Characters (XML) : %Chars.XML% |
Characters (List) : %Chars.List% |
Characters (Range): %Chars.RangeList% |
Individual Characters: |
##CHARS |
%UnicodeIndex% |
##END-CHARS |
##END-CHAR-SETS |
##END-CHAR-SET-TABLE |
If the "Simple" example grammar is used, the program template will create the following text for Character Set #10. The sets before and after #10 were excluded for brevity.
Table Count: 49 |
. |
. |
. |
Set 10 |
Character Count : 10 |
Characters (XML) : 0123456789 |
Characters (List) : 48, 49, 50, 51, 52, 53, 54, 55, 56, 57 |
Characters (Range): 48..57 |
Individual Characters: |
48 |
49 |
50 |
51 |
52 |
53 |
54 |
55 |
56 |
57 |
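As a sanity check on the sample output, the sketch below parses the list and range forms back into code points and confirms that they describe the same characters. `parse_list` and `parse_range_list` are hypothetical helpers written for this illustration, and they assume the ',' delimiter and '..' range characters used in the example template.

```python
def parse_list(text, delimiter=","):
    """Parse a %Chars.List% style string into integer code points."""
    return [int(tok) for tok in text.split(delimiter) if tok.strip()]


def parse_range_list(text, delimiter=",", range_chars=".."):
    """Parse a %Chars.RangeList% style string, expanding each range."""
    codes = []
    for tok in text.split(delimiter):
        tok = tok.strip()
        if not tok:
            continue
        if range_chars in tok:
            low, high = tok.split(range_chars)
            codes.extend(range(int(low), int(high) + 1))
        else:
            codes.append(int(tok))
    return codes


listed = parse_list("48, 49, 50, 51, 52, 53, 54, 55, 56, 57")
ranged = parse_range_list("48..57")
print(listed == ranged, "".join(chr(c) for c in listed))
```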