File Structure Overview

Introduction

First, it is important to differentiate between a file's format and its content. When a document is saved to a file (for instance, a Microsoft Word file), it can later be retrieved and the work continued. This is possible because the application knows how the information is stored and can therefore read it reliably. The actual content of the document, on the other hand, can differ greatly from one file to the next (e.g. a report on Sacramento versus an essay on parsing techniques).

This section describes the file format. A description of the Compiled Grammar Table file's content is given separately in this documentation.

Data Structures

The file format is rather simple:

[Figure: pic-sdb-header.gif]

The first data structure in the file is the File Header, which contains a null-terminated Unicode string. This string holds the name and version of the type of information stored in the file's records. In the case of a Compiled Grammar Table file, the header contains the text:

GOLD Parser Tables/1.0

Since this string is stored in Unicode format, each character takes two bytes, giving a total size of (22+1)*2 = 46 bytes: 22 characters plus the null terminator. The header should nevertheless be read like any other null-terminated Unicode string, rather than as a fixed-size block, since its size could change depending on its contents.

Following the Unicode string that makes up the header, the file contains one or more records.
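
Reading the header as described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name read_header is hypothetical, and the sketch assumes the two-byte characters are stored little-endian (UTF-16LE), which this section does not state. It reads two bytes at a time until the null terminator, so it does not depend on the header being exactly 46 bytes.

```python
import io


def read_header(stream):
    """Read a null-terminated string of two-byte characters from a binary stream.

    Hypothetical helper. Assumes little-endian byte order (UTF-16LE).
    """
    units = bytearray()
    while True:
        pair = stream.read(2)
        if len(pair) < 2:
            raise EOFError("file ended before the header's null terminator")
        if pair == b"\x00\x00":
            break  # terminator reached; the stream now sits at the first record
        units += pair
    return units.decode("utf-16-le")


# Build a sample header in memory instead of opening a real file.
sample = "GOLD Parser Tables/1.0".encode("utf-16-le") + b"\x00\x00"

# Two placeholder bytes stand in for the records that follow the header.
stream = io.BytesIO(sample + b"\x01\x02")
header = read_header(stream)

print(header)
print(len(sample), "bytes in header")
```

After the call, the stream is positioned just past the terminator, which is where the first record begins. A reader would typically compare the returned string against the expected name and version before going any further.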

Design Considerations

The file format used to save the Compiled Grammar Tables was based on the following principles:

  1. The file will normally be written to only once (when created) and then only read from sequentially.
  2. The amount of information saved to the file will not be enormous, so data storage overhead is not an issue.
  3. The format should be easy to implement on numerous platforms. In other words, it should be structurally very simple.
  4. The file structure should allow data structures to be added or expanded as needed in the future.
  5. The file structure should allow additional types of records to be added, if needed.