INTRODUCTION

The data structures presented in this document use a C-like notation.
However, none of the flexibility inherent in the C types is intended.
A 'char' is exactly one byte, a 'short' is exactly two bytes, and a
'long' is exactly four bytes. A 'struct' has padding for alignment, as
determined by the values in the dumpfile header. All short and char
values are unsigned.

Two other types used are 'uindex' and 'sector'. They are either two
bytes or four bytes, as defined in the dumpfile header.

This document describes the dumpfile format used by DGD 1.1, which uses
dumpfile version number 2. Later DGD versions will use the same
dumpfile version number as long as they are backwards compatible, but
they may use new features not described in this document.


SECTION 0: Header

The dumpfile starts with a header that identifies various properties of
the driver that made the dump, and of that driver's compiler and machine
architecture. The size and alignment information provided in the header
can be used to deduce the layout of the other structures used in the
dumpfile.

struct header {
    /* DGD sets the dumpflag to 0 while creating the dumpfile, and
     * sets it to 1 when done. So this byte can indicate an aborted
     * state dump. */
    char dumpflag;      /* Must be 1 */
    char dumpversion;   /* Must be 2 */
    char driver;        /* Driver development branch, standard 0 */
    char typechecking;  /* 1 for global typechecking, otherwise 0 */

    /* Next two bytes are the swap sector size */
    char secsize_msb;
    char secsize_lsb;

    /* Byte order information */
    char short0;        /* offset of msb in a short */
    char short1;        /* offset of lsb in a short */
    char long0;         /* offset of msb in a long */
    char long1;         /* offset of next-to-msb in a long */
    char long2;         /* offset of next-to-lsb in a long */
    char long3;         /* offset of lsb in a long */
    /* Common examples: A dumpfile created on a little-endian
     * machine will have 1, 0, 3, 2, 1, 0. One created on a
     * big-endian machine will have 0, 1, 0, 1, 2, 3.
     */

    char uindexsz;      /* Size of uindex */
    char sectorsz;      /* Size of sector index */
    char ptrsz;         /* Size of (char *) */

    /* Alignment requirements of various types */
    char charalign;
    char shortalign;
    char longalign;
    char ptralign;
    char structalign;

    /* The next two are stored msb first, which need not be the byte
     * order of the machine that produced the dumpfile. */
    long starttime;     /* Timestamp; first boot time of this state */
    long uptime;        /* Uptime of this state */
};


SECTION 1: Swap sectors

Next comes padding, up to (sectorsize - sizeof(dump_header)). Another
way to view it is that the 0th sector contains two headers, struct
header at the start and struct dump_header at the end.

Now comes the dump_header itself:

struct dump_header {
    long sectorsize;    /* Size of a swap sector (must match value in header) */
    sector nsectors;    /* Number of sectors assigned */
    sector ssectors;    /* Number of sectors allocated in swapfile */
    sector nfree;       /* Number of free sectors */
    sector mfree;       /* Start of free list in sector map */
};

Then come the swap sectors. Their number is given as 'ssectors' in the
dump header above. Each is 'sectorsize' bytes long. Their internal
structure varies.

The swap sectors are followed by a sector map. The map has elements of
type 'sector' and is 'nsectors' elements long. It maps assigned sector
numbers, which are used in the higher-level data structures, to real
sector numbers in the swapfile.

The maps use the value SW_UNUSED as a special marker. SW_UNUSED is the
largest value that will fit in a value of type 'sector'.


SECTION 2: kfuns

This section starts right after the end of the previous section. It
begins with a dump header:

struct dump_header {
    short nbuiltin;     /* Number of builtin kfuns */
    short nkfun;        /* Number of other kfuns */
    short kfnamelen;    /* Length of kfun name list */
};

It is followed by a list of kfun names, each of which is a 0-terminated
string. There will be 'nkfun' names. The total length of this list (in
bytes) is 'kfnamelen'.
The names are in the order in which the previous driver numbered its
kfuns.

The list of builtin kfuns is fixed. Some of the builtins are paired,
such that the odd-numbered kfuns are typechecked, integer-oriented
versions of the even-numbered kfuns.

    Normal  Typechecked  Builtin
       0         1       val + val
       2         3       val++
       4         5       val & val
       6         7       val / val
       8         9       val == val
      10        11       val >= val
      12        13       val > val
      14        15       val <= val
      16        17       int << int
      18        19       val < val
      20        21       int % int
      22        23       val * val
      24        25       val != val
      26        27       ~int
      28        29       !val
      30        31       val | val
      32                 val[val..val]
      33                 val[val..]
      34                 val[..val]
      35                 val[..]
      36        37       int >> int
      38        39       val - val
      40        41       val--
      42                 (float) val
      43                 (int) val
      44        45       !!val
      46        47       -val
      48        49       val ^ val
      50                 (string) val
      51                 check val[int..int]  /* These three don't pop their */
      52                 check val[int..]     /* arguments and don't push a  */
      53                 check val[..int]     /* return value.               */
      54                 summand              /* Varargs! */


SECTION 3: Objects

This section also begins with a dump header:

struct dump_header {
    uindex free_obj;    /* Start of free object list */
    uindex nobjects;    /* Number of objects allocated (includes free list) */
    uindex nfreeobjs;   /* Number of objects in free object list */
    long onamelen;      /* Length of object name list */
};

The header is followed by the object table, which has 'nobjects'
elements. Each object is a struct whose length depends on various type
sizes and alignment requirements. There are several pointer values in
the struct. These are ignored by the restore process, and the dumpfile
does not contain enough information to make sense of them.

There is a linked list of free objects, which starts at 'free_obj' and
is threaded through the 'prev' field of the object structure,
terminated with OBJ_NONE.

The object table is followed by a table of object names, which is
described at the end of this section.
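The free object list can be checked against 'nfreeobjs' by walking it. The following is only a sketch under stated assumptions: uindex is taken to be 2 bytes (so OBJ_NONE, a uindex with all bits set, is 0xffff), and the 'prev' fields of all object table entries are assumed to have been decoded into a plain array already. The function name and the array are hypothetical and not part of DGD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: uindex is 2 bytes, so OBJ_NONE is 0xffff. */
#define OBJ_NONE 0xffffu

/*
 * Count the entries on the free object list.  'prev' holds the 'prev'
 * field of every object table entry, in table order (a hypothetical
 * pre-decoded array).  The walk stops after 'nobjects' steps so that a
 * corrupt, cyclic list cannot loop forever.
 */
static size_t free_list_length(const uint16_t *prev, size_t nobjects,
                               uint16_t free_obj)
{
    size_t n = 0;
    uint16_t i = free_obj;

    while (i != OBJ_NONE && n < nobjects) {
        assert(i < nobjects);   /* a link must stay inside the table */
        i = prev[i];
        n++;
    }
    return n;
}
```

In a consistent dumpfile the result would be expected to equal 'nfreeobjs' from the section header.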
struct object {
    struct {
        object *next;   /* Used for various disjunct lists (ignored) */
        char *name;     /* Ignored; the names are listed later */
    };
    char flags;         /* A list of flags is given below */
    char etabi;         /* External table index (ignored) */
    uindex cref;        /* Number of clone references (see below) */
    uindex prev;        /* Previous issue (used when upgrading), */
                        /* Next in free list (if count == 0) */
    uindex index;       /* Position in object table (redundant) */
    long count;         /* Unique identifier, see 'count' below. */
    long update;        /* Object version (used for upgrading) */
    long ref;           /* Reference count (if flags & O_MASTER), */
                        /* Index of master object (otherwise) */
    control *ctrl;      /* Pointer to control block (ignored) */
    dataspace *data;    /* Pointer to dataspace (ignored) */
    sector cfirst;      /* First swap sector of control block */
    sector dfirst;      /* First swap sector of data block */
};

There are five kinds of entries in the table: master objects, old
versions of master objects, destructed master objects, clones, and free
slots. The fields have slightly different meanings for all five.

When a master object upgrades, its clones are initially left untouched.
They are upgraded later, when they are referenced. This is why old
versions are kept around. They are called issue objects, to clearly
distinguish them from master objects.

Similarly, when a master object is destructed, its entry is kept around
as long as there are still clones that refer to it.

'next'
This field is used at run-time as the thread for several different
linked lists. However, none of them apply in a dumped state, and their
start points are no longer available anyway.

'name'
Each object is named, but clones construct their names from their
master objects, so they have no name of their own. The name field is
non-null for master objects and destructed master objects, and null
everywhere else (including issue objects).
Its specific value is meaningless -- see the discussion of the object
name list, at the end of this section.

'flags'
The following flags are used:

    O_MASTER      1  Object is not a clone
    O_AUTO        2  Object is the auto-inherited object
    O_DRIVER      4  Object is the driver object
    O_CREATED     8  Object has been initialized
    O_USER       16  Object was a user object (discarded)
    O_EDITOR     32  Object had an editor instance (discarded)
    O_COMPILED   64  Object is precompiled
    O_PENDIO    128  Object had pending output (discarded)

O_MASTER is set for master objects, issue objects, and destructed
master objects. Free slots have whatever flags they had when the slot
was freed up. Issue objects have O_MASTER and no other flags.

O_CREATED means that the creation function has been called in the
object. This is done immediately in a clone, or the first time a
function is called in a master object.

O_USER, O_EDITOR, and O_PENDIO have little meaning because the data
structures they refer to aren't part of the dumpfile. However, if
O_PENDIO is set, buffered data may be found in the data block. See the
description of data blocks, in the appendix.

'etabi'
This field was an index into the user or editor tables. It is
meaningless because those tables weren't dumped.

'cref'
This field is used in master objects and destructed master objects to
count the number of clones. If there are old issue objects for this
master object, the first issue object counts as one cref. Clones that
have not been upgraded yet do not count as crefs.

In old issue objects, this field is used to point to the next newer
issue. (It is possible for a master object to have a whole chain of
issue objects.)

The field is 0 for clones, and undefined for free slots.

'prev'
In master objects, destructed master objects, and old issue objects,
this field points to the next older issue. This completes the
double-linking of the issue chain. (The link in the other direction is
in the 'cref' field.)
If there is no older issue, the field contains the special OBJ_NONE
value, which is a uindex with all bits set to 1.

In free object slots, this field points to the next free slot. In
clones, this field has no defined value.

'index'
For master objects, destructed master objects, and clones, this field
is the object's position in the object table (and thus redundant). For
issue objects, this field is the index of the master object it is
associated with. For free slots, the field has no defined value.

'count'
Each master object and clone gets a unique number when it is created.
This number is stored in the 'count' field. An object's 'count' number
is set to 0 when it is destructed. Destructed master objects and old
issue objects have 0 in this field. Free slots have 0 here too, because
masters and clones cannot be freed unless they were first destructed.

'update'
Every master object has an update count, which is the total number of
times it has been upgraded. The count starts at 0.

A clone object's update field starts set to its master object's update
field and is set to its master's new update field when it is upgraded.
Clone objects are not automatically upgraded along with their master
object (this happens when the clone is referenced), so their update
fields can lag behind.

Old issue objects keep their update count; this can be used to match
non-upgraded clones to the correct issue object. Destructed master
objects also keep their update count. For free slots the field is
undefined.

'ref'
In master objects this is the reference count. The formula is

    number of inheriting objects + number of clones + upgraded + 1

The extra +1 reference is removed when a master object is destructed.
This ensures that only destructed master objects can be freed. The
'upgraded' reference is 1 if the master object has any old issue
objects still around, and otherwise 0. When an inheriting object
upgrades, its references to this object are removed immediately.
Note that some "inheriting objects" may be counted more than once, if
they inherit this object via several different inheritance paths. (See
the control block layout, described in the appendix.)

Old issue objects calculate their refs the same way, except that they
do not have the "not destructed" ref.

Clones do not use this field for reference counting; they use it to
store the index of the master object of which they are clones.

The field is not defined for free object slots.

'ctrl' and 'data'
These have no meaning because the structures they point to are not in
the dumpfile.

'cfirst'
The sector number of the first sector of the object's control block.
It is set to SW_UNUSED if the object has no control block, which is
true for clones and precompiled objects. Free object slots have no
defined value here.

Clones use the control block of their master object. If the master
object has been upgraded, they use the control block of the issue that
matches their 'update' field.

The control block of an issue object can contain a 'varmap' (see the
control block section). If it does not contain a varmap it will contain
an old control block, but the references made by that block to other
objects are no longer reliable.

'dfirst'
The sector number of the first sector of the object's data block. It
is set to SW_UNUSED if the object has no data block. Only 'created'
objects (see O_CREATED flag) have a data block. Free object slots have
no defined value here.

The layout of control and data blocks is explained in the appendix.

Telling the kinds of entries apart:

Free objects are in the 'free_obj' free list, have 'count' 0, 'name'
null, and either 'ref' 0 or !O_MASTER.
Clones have 'name' null, !O_MASTER in the flags, 'cref' 0, 'index'
pointing to itself, and 'count' > 0.
Master objects have 'name' non-null, O_MASTER in the flags, 'index'
pointing to itself, 'count' > 0, and 'ref' > 0.
Destructed master objects have 'name' non-null, O_MASTER in the flags,
'index' pointing to itself, 'count' 0, and 'ref' > 0.
Old issue objects have 'name' null, O_MASTER in the flags, 'index'
pointing to another object, 'count' 0, and 'ref' > 0.

A possible algorithm is:

    if 'count' == 0 and either 'ref' == 0 or O_MASTER not set in flags
        entry is free slot
    else if 'index' points to another object
        entry is old issue object
    else if O_MASTER not set in flags
        entry is clone
    else if 'count' == 0
        entry is destructed master object
    else
        entry is master object

After the object table comes a list of object names. The total size of
this list is given in 'onamelen'. The elements are 0-terminated
strings. There will be one name for every object that has a non-null
'name' field. DGD assumes that a null pointer has all bits 0.


SECTION 4: Precompiled objects

struct dump_header {
    uindex nprecomps;   /* Size of precompiled objects table */
    long ninherits;     /* Size of inherits table */
    long nstrings;      /* Size of strings table */
    long stringsz;      /* Size of string data */
    long nfuncdefs;     /* Size of function definitions table */
    long nvardefs;      /* Size of variable definitions table */
    long nfuncalls;     /* Size of function calls table */
};

The precompiled object data consists of seven tables of different
element types. The size of each table is given in the header above.
Each precompiled object is described by a collection of elements from
these tables, as described by its entry in the first (precompiled
objects) table.
struct dump_precomp {
    long compiletime;   /* Timestamp */
    short ninherits;    /* Number of inherited objects (includes indirect) */
    short nstrings;     /* Number of string constants */
    long stringsz;      /* Total size of string constants */
    short nfuncdefs;    /* Number of functions defined */
    short nvardefs;     /* Number of variables defined */
    uindex nfuncalls;   /* Number of function calls */
    short nvariables;   /* Number of variables (not used) */
};

The fields 'ninherits' through 'nfuncalls' all count elements in the
tables that follow. Each table contains the elements of that type for
all precompiled objects, sequentially. The structure above lists, for
each precompiled object, how many elements it uses in each table.

The field 'nvariables' contains the number of variables that comprise
the object's state. It may be higher than 'nvardefs' because it
includes inherited variables. The number can be deduced from the rest
of the precompiled objects information, and DGD does not use it when
restoring precompiled objects.

The inherits table comes next. It has elements of the following
structure:

struct dump_inherit {
    uindex oindex;      /* Index of inherited object */
    uindex funcoffset;  /* Function call offset */
    short varoffset;    /* Variable offset */
};

An object's program code refers to functions and global variables by
their indices in the object's function-call and variable tables. When
an object is inherited, its function-call and variable tables are
copied by the inheriting object. Since an object may inherit multiple
objects, it is not always possible to keep the indices the same. That
is why the "funcoffset" and "varoffset" fields exist. When an inherited
program refers to functions or variables, the correct indices in the
inheriting object's tables can be found by adding the corresponding
offset value. Note that the function-call table is not the function
definition table, and the variable table is not the variable definition
table.
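The offset arithmetic described above can be illustrated as follows. This is a minimal sketch, assuming a 2-byte uindex and an inherit entry that has already been decoded from the dumpfile; the struct and function names are hypothetical and do not come from DGD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of one inherit table entry.
 * Assumption: uindex is 2 bytes in this dumpfile. */
struct inherit_entry {
    uint16_t oindex;      /* index of inherited object */
    uint16_t funcoffset;  /* function call offset */
    uint16_t varoffset;   /* variable offset */
};

/* Map a function-call index used by the inherited program to the
 * matching index in the inheriting object's function-call table. */
static uint32_t to_fcall_index(const struct inherit_entry *inh, uint16_t local)
{
    assert(inh != 0);
    return (uint32_t) inh->funcoffset + local;
}

/* Same translation for indices into the variable table. */
static uint32_t to_var_index(const struct inherit_entry *inh, uint16_t local)
{
    assert(inh != 0);
    return (uint32_t) inh->varoffset + local;
}
```

For an entry with 'funcoffset' 10, a call that the inherited program makes through its own slot 2 would be found in slot 12 of the inheriting object's function-call table.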
The inherit table for a precompiled object lists all objects inherited
by that object, whether directly or indirectly. It also lists the
object itself. Some objects may be listed more than once, if they were
inherited via several different inheritance paths.

Next comes the table of string constants used by the precompiled
objects. It consists of structures of two fields:

struct dstrconst {
    long index;         /* Start of string in string data block */
    short len;          /* Length of string */
};

The 'index' fields point to the string data table that follows. They
are relative to the start of the string data of the precompiled object
they belong to. The strings in the string data table are not
0-terminated, and it is possible for strings to overlap.

Next comes the string data table, which is just a long string of
characters. It is indexed via the preceding table.

The function definition table contains (for each precompiled object)
one entry per function it defines. Its elements are of type dfuncdef,
which is described as part of the control block layout in the appendix.

The dfuncdef structure contains an index into the program bytecode.
Precompiled objects have rudimentary bytecode that contains just the
prototype definition for each function. These prototype definitions are
not available in the dumpfile.

The variable definition table contains (for each precompiled object)
one entry per variable it defines. Its elements are of type dvardef,
which is described as part of the control block layout in the appendix.

The function call table is used to look up local function calls. It is
indexed by values in the program bytecode. Each entry consists of two
values of type 'char'. The first value is an index to the object's
inherit table, and the second is an index to that inherited object's
function definition table. (Note that this limits the size of the
inherit table and the number of function definitions per object to 255
each. DGD enforces these limits.)
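Reading one function call table entry could look like the sketch below. It assumes the entries are packed as consecutive byte pairs (that is, 'charalign' is 1) and that 'table' already points at the first entry belonging to the object in question. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One decoded function call table entry. */
struct fcall {
    unsigned char inherit;  /* index in the object's inherit table */
    unsigned char index;    /* index in that object's function definition table */
};

/*
 * Decode entry 'i' of a function call table held as raw bytes.
 * 'nfuncalls' is the number of entries the object owns.
 * Assumption: two bytes per entry, no padding in between.
 */
static struct fcall fcall_entry(const unsigned char *table, size_t nfuncalls,
                                size_t i)
{
    struct fcall fc;

    assert(i < nfuncalls);
    fc.inherit = table[2 * i];
    fc.index = table[2 * i + 1];
    return fc;
}
```

The 'inherit' value selects an entry of the object's inherit table, and 'index' then selects a function definition of the object named there.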


SECTION 5: Callouts

Like other sections, this one starts with a header.

struct dump_header {
    uindex tablesize;   /* Size of callout table */
    uindex queuebrk;    /* Top of long-callout heap */
    uindex cycbrk;      /* Bottom of short callouts */
    uindex freelist;    /* Start of short-callout free list */
    uindex nshort;      /* Number of short-term callouts */
    uindex nlong;       /* Number of long-term callouts */
    long timestamp;     /* Time the last alarm came */
    long timediff;      /* Times in data blocks are relative to timediff */
};

The callout table contains a partially ordered heap of long-delay
callouts, which grows up from the bottom, and a collection of
short-delay callouts which grows down from the top. The short-delay
callouts are organised as many linked lists, one for each timeout time.
The start points of these lists are kept separately, in the "cyclic
buffer".

After the header come two arrays of callouts. The first is the
long-delay heap of callouts, with 'queuebrk' elements. The second is
the collection of short-delay callouts, with 'tablesize' - 'cycbrk'
elements. The unused table space between the long-delay and the
short-delay callouts is not in the dumpfile.

The value 'nshort' should be equal to 'tablesize' - 'cycbrk' minus the
length of the freelist. The value 'nlong' should be equal to
'queuebrk'.

Long-delay heap entry:

struct long_call_out {
    uindex handle;      /* callout handle */
    uindex oindex;      /* callout owner: index in object table */
    long timeout;       /* timestamp: when to call */
};

The callout handle is a number that the owning object can use to refer
to this callout. It is unique for that object, but not globally unique.
It is never zero.

The 'oindex' field identifies the owner of this callout. It can never
be a destructed object because callouts are deleted when an object is
destructed.

The 'timeout' field is the time that the callout should activate. The
downtime between dumping and restoring should be added to this timeout.
DGD uses 'timestamp' from the callout section header as the dumping
time.

The long callout table is organised such that the timeout for element i
is sooner than that for elements i*2 and i*2+1. (The first element is
number 1.) Since this holds for the entire table, element 1 must be the
callout that times out first.

Short-delay list entry:

struct short_call_out {
    uindex handle;      /* callout handle */
    uindex oindex;      /* callout owner: index in object table */
    long next;          /* index of next callout in this one's list */
};

The 'handle' and 'oindex' fields have the same meaning as for
long-delay callouts. The 'next' field is 0 if this is the last callout
in the list, otherwise it is the index of the next callout.

The timeout for a short callout is implicit, because each list starts
in the cyclic buffer (described below), which defines a timeout value,
and all callouts in one list share that timeout value.

Free short-delay entry:

struct free_short_call_out {
    uindex handle;      /* 0 to indicate that this element is free */
    uindex prev;        /* index of previous callout in the free list */
    long next;          /* index of next callout in the free list */
};

The free list for short callouts starts at 'freelist' (in the callout
section header), and is threaded through the 'next' and 'prev' fields.

The cyclic buffer which is used to address the short callouts comes
after the callout arrays. Each element has two fields as described
below, and there are CYCBUF_SIZE elements. The default value of
CYCBUF_SIZE is 128. It must be a power of two. The element for time t
is found as cycbuf[t & (CYCBUF_SIZE-1)], provided that
t >= timestamp && t < timestamp + CYCBUF_SIZE. The value 'timestamp'
can be found in the callout section dump header.

struct cbuf {
    uindex list;        /* start of call_out list for this time value */
    uindex last;        /* last node of call_out list for this time value */
};

A value of 0 means that the list is empty. Buffer position 0 is never
used for short callouts.
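The cyclic buffer lookup can be written out as a small helper. This sketch uses the default CYCBUF_SIZE of 128 mentioned above; a driver built with another value would need the constant changed. The function name and the -1 return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CYCBUF_SIZE 128  /* default from the text; must be a power of two */

/*
 * Return the cyclic buffer position for time 't', or -1 if 't' lies
 * outside the window [timestamp, timestamp + CYCBUF_SIZE) and therefore
 * has no short-delay list.
 */
static int cycbuf_slot(uint32_t t, uint32_t timestamp)
{
    /* the mask below only works for a power of two */
    assert((CYCBUF_SIZE & (CYCBUF_SIZE - 1)) == 0);

    if (t < timestamp || t - timestamp >= CYCBUF_SIZE) {
        return -1;
    }
    return (int) (t & (CYCBUF_SIZE - 1));
}
```

With 'timestamp' 1000, time 1000 maps to position 104 (1000 & 127), and time 1128 is already outside the window.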
The 'timediff' value is not used for the structures in this section,
but it is used by the additional callout information in the data
blocks. The 'oindex' and 'handle' fields are used to look up this
additional information.


APPENDIX: Swap sectors

Each swap sector is either free, or part of a control block or a data
block. Control and data blocks are not contiguous in the swapfile. The
start points of these blocks are found in the object table, fields
"cfirst" and "dfirst".

Each block starts with a header, and an array of sector numbers (of
type 'sector'). The array determines which sectors, in which order, are
part of the block. The header always fits in this first sector, because
the minimum sector size is 512 bytes. The array of sector numbers need
not fit in the first sector in its entirety, as long as at least the
first two numbers fit (which they will).


CONTROL BLOCKS:

The size of the control block header depends on the size of uindex and
on alignment requirements. Its contents are:

struct control_header {
    sector nsectors;    /* Number of sectors (length of sector array) */
    char flags;         /* Flags, as yet unused (always 0) */
    char ninherits;     /* Number of entries in inherit table */
    long compile_time;  /* Timestamp of compilation */
    long program_size;  /* Length of program bytecode */
    short nstrings;     /* Number of strings in string constant table */
    long strsize;       /* Size of string constant table */
    char nfuncdefs;     /* Nr of entries in function definition table */
    char nvardefs;      /* Nr of entries in variable definition table */
    uindex nfuncalls;   /* Number of entries in function call table */
    short nsymbols;     /* Number of entries in symbol table */
    short nvariables;   /* Number of variables */
    short nfloatdefs;   /* Number of float definitions */
    short nfloats;      /* Number of float variables */
    short varmapsize;   /* Size of variable map, or 0 for no map */
};

The array of sector numbers comes after this header. There are
'nsectors' numbers, each of type 'sector'.
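Since a block is spread over non-contiguous sectors, a reader has to turn a byte position within the block into a sector and an offset. The sketch below assumes that the block's contents are simply the listed sectors concatenated in array order, and that the sector array has already been decoded into 32-bit values. The numbers it returns are the ones stored in the array; Section 1 describes a sector map from assigned to real sector numbers, and that translation is not performed here. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Location of one byte of a control or data block. */
struct block_pos {
    uint32_t sector;   /* sector number as stored in the block's array */
    uint32_t offset;   /* byte offset inside that sector */
};

/*
 * Find where byte 'pos' of a block lives, given the block's sector
 * array.  Assumption: the block is the concatenation of its sectors,
 * in array order, each 'sectorsize' bytes long.
 */
static struct block_pos block_locate(const uint32_t *sectors, size_t nsectors,
                                     uint32_t sectorsize, uint32_t pos)
{
    struct block_pos p;
    size_t i;

    assert(sectorsize > 0);
    i = pos / sectorsize;
    assert(i < nsectors);      /* position must fall inside the block */
    p.sector = sectors[i];
    p.offset = pos % sectorsize;
    return p;
}
```

For a block made of sectors 9, 4 and 17 with 512-byte sectors, byte 600 of the block would be byte 88 of sector 4.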
If 'varmapsize' is not 0, the array of sector numbers is followed only
by the variable map, which is an array of 'varmapsize' elements of type
'short'. Otherwise, the array of sector numbers is followed by the
inherit table, the program code, the string constants, the function
definition table, the variable definition table, the function call
table, and the symbol table.

The fields 'nvariables' and 'nfloats' count variables either defined in
this program or inherited from others. The field 'nfloatdefs' counts
only variables defined in this program. All three could be calculated
from the tables.

Variable map:
This is only present if 'varmapsize' > 0. It is used by old issue
objects to map their variables to the variables of the next newer
issue. It is indexed by the index of the variable in the new issue, and
lists the index of the variable in the old issue that corresponds to
it. If the variable is new, the special value NEW_INT or NEW_FLOAT is
used to indicate that the variable should get the value 0 or 0.0.
NEW_INT is an unsigned short with all bits set to 1, and NEW_FLOAT is
NEW_INT - 1.

Note that these indices are all for the variable table in the data
block. The data block has an extra variable, not described by the
control block, at the end of its variable table. This extra variable
contains buffered I/O if the object has the O_PENDIO flag set.

Only issue objects can have a varmap. If they do not have a varmap,
they have a normal control block, except that the objects referenced by
that block are not guaranteed to exist or still be the same ones.

Inherit table:
It has 'ninherits' entries, and each entry is a structure:

struct inherit {
    uindex oindex;      /* inherited object, index in object table */
    uindex funcoffset;  /* function call offset (see below) */
    short varoffset;    /* variable offset (see below) */
};

An object's program code refers to functions and global variables by
their indices in the object's function-call and variable tables.
When an object is inherited, its function-call and variable tables are
copied by the inheriting object. Since an object may inherit multiple
objects, it is not always possible to keep the indices the same. That
is why the 'funcoffset' and 'varoffset' fields exist. When an inherited
program refers to functions or variables, the correct indices in the
inheriting object's tables can be found by adding the corresponding
offset value. Note that the function-call table is not the function
definition table, and the variable table is not the variable definition
table.

The inherit table lists all objects inherited by this object, whether
directly or indirectly. It also lists the object itself. Some objects
may be listed more than once, if they were inherited via several
different inheritance paths.

Program code:
The object's program code is 'program_size' bytes long. The format of
DGD program code will be explained in another section.

String constants:
The string constants are stored as a single block of text, of length
'strsize'. This block is preceded by a table of indices, of length
'nstrings', where each element contains an index into the string block
and the length of the string.

struct strconst {
    long index;         /* Start of string in string block */
    short len;          /* Length of string */
};

Note that the strings in the string block are not 0-terminated, and
that it is possible for strings to overlap.

Function definition table:
The table is of length 'nfuncdefs' and its elements are structs:

struct dfuncdef {
    char class;         /* See below */
    char inherit;       /* Index in inherit table (see below) */
    short index;        /* Index of function name in string constant table */
    long offset;        /* Location of function code in program bytecode */
};

The "class" field contains bit flags that indicate properties of the
function.
The following flags are used:

    C_PRIVATE        1  Function is private to its defining program
    C_STATIC         2  Function is private to this object
    C_NOMASK         4  Function cannot be redefined by inheriting objects
    C_VARARGS        8  Function may be called with excess arguments
    C_ATOMIC        16  (unused)
    C_TYPECHECKED   32  Function uses strict typechecking
    C_COMPILED      64  Function is precompiled
    C_UNDEFINED    128  Function has only a prototype declaration

The field 'index' must be looked up in the string table of the object
that contains the function name. The field 'inherit' and the inherit
table can be used to look up the right control block.

The field 'offset' points to the function prototype, which is followed
by the function body.

Variable definition table:
The table is of length 'nvardefs' and its elements are structs. There
is some similarity between the tables for variable definitions and
function definitions.

struct dvardef {
    char class;         /* See below */
    char inherit;       /* Index in inherit table (see below) */
    short index;        /* Index of variable name in string constant table */
    short type;         /* Variable type */
};

The 'class' field contains bit flags that indicate properties of the
variable. The following flags are used:

    C_PRIVATE        1  Variable is private to its defining program
    C_STATIC         2  Variable is immune to save/restore_object()

As with function definitions, the "index" field should be used with the
control block of the object indicated by the "inherit" field.

The "type" field is a short, but only its 8 least significant bits are
used. Of those, the 4 least significant bits are used for the base
type, and the 3 most significant bits are used for the number of array
levels. 0 levels means that the variable is declared as the base type,
1 level means it is an array of the base type, and so forth. The base
types are:

    T_INT       1
    T_FLOAT     2
    T_STRING    3
    T_OBJECT    4
    T_MAPPING   6
    T_MIXED     8

DGD has more types, but they should not appear as base types of
variable definitions.
See the description of variables in the data block for another list of
types, and the description of function headers in the program text for
a third.

The function call table:
This table is used to look up function calls that could not be bound at
compile time. It is indexed by opcodes in the program code, and each
element is two unsigned chars, of which the first is an index to the
current object's inherit table, and the second is an index to the
inherited object's function definition table. The function call table
is 'nfuncalls' elements long.

The symbol table:
This table is used to look up function calls by function name. It is
indexed by a hash of the name string, and there are 'nsymbols'
elements.

struct symbol {
    char inherit;       /* Object defining the function */
    char index;         /* Index in that object's function definition table */
    short next;         /* Next symbol in hash table */
};

The 'inherit' field is an index to the current object's inherit table.
The 'next' field is an index to the symbol table. It points to itself
if there is no next symbol. The hash function will not be explained in
this document.


DATA BLOCKS:

The data block also starts with a header:

struct data_header {
    sector nsectors;    /* Number of sectors (length of sector array) */
    short flags;        /* Dataspace flags, as yet unused (always 0) */
    short nvariables;   /* Number of variables */
    long narrays;       /* Number of array values */
    long eltsize;       /* Total number of array elements */
    long nstrings;      /* Number of strings */
    long strsize;       /* Total size of string space */
    uindex ncallouts;   /* Size of callout table */
    uindex fcallouts;   /* Start of free callout list */
};

The array of sector numbers comes after this header. There are
'nsectors' numbers, each of type 'sector'.

Variables:
The variables are stored as an array of values, of length 'nvariables'.
The 'nvariables' field in the data block will be one larger than the
field in the corresponding control block, because DGD reserves one
variable at the end.
If the O_PENDIO flag is set in the object, this variable will contain buffered output. Each value starts with a type field. The types used are: T_INT 1 T_FLOAT 2 T_STRING 3 T_OBJECT 4 T_ARRAY 5 T_MAPPING 6 DGD has more types, but they should not appear as value types. See the description of variable declarations in the control block for another list of types, and the description of function headers in the program text for a third. The type field determines the layout of the value struct. struct int_value { short type; /* type 1 */ uindex unused; long number; /* int value */ }; struct float_value { short type; /* type 2 */ uindex high; /* most significant 16 bits of float value */ long low; /* least significant 32 bits of float value */ }; struct string_value { short type; /* type 3 */ uindex unused; long string; /* index in this data block's string table */ }; struct object_value { short type; /* type 4 */ uindex oindex; /* index in global object table */ long count; /* unique object identifier */ }; struct aggr_value { short type; /* type 5 (array) or 6 (mapping) */ uindex unused; long array; /* index in array table of this data block */ }; A floating point number is described as a 48-bit value. The most significant bit is the sign. The next 11 bits are the exponent, and the remaining 36 bits are the mantissa. The sign bit is 1 if the number is negative. If the exponent is 0, the floating point number is 0. Otherwise, to get the number's absolute value (in binary), first prepend a 1 bit to the mantissa (using the mantissa as the remainder), then subtract 1023 from the exponent and shift that number of bits to the left. A negative value means shifting to the right. The array table: Arrays are stored in two parts. The first is a table that is indexed by the "array" fields in the variable table. Its elements contain indices for the second part, which is a single block of array elements. The first part is called the array table. 
Its length is 'narrays', and its elements are structs: struct array { long index; /* index in element list (second part) */ short size; /* number of elements */ long ref; /* reference count */ long tag; /* unique value for each array */ }; The second part is called the element list. Its length is 'eltsize', and its elements are value structs just like those in the variable table described above. The 'ref' value is the number of value structs in this data block that point to this array. Arrays are shared only within a data block. The string table: Strings are organized just like arrays. The first part is called the string table. Its length is 'nstrings', and its elements are structs: struct string { long index; /* index in string text block */ short len; /* length of string */ long ref; /* reference count */ }; The second part is called the string text block. Its length is 'strsize', and its elements are characters. It may be possible for string texts to overlap. The 'ref' value is the number of value structs in this data block that point to this string. Strings are shared only within a data block. The callout table: This table is of length 'ncallouts'. It is indexed via "handles", which the LPC layer is expected to keep track of. To find the element corresponding to a handle, first subtract 1 from the handle. The handle 0 is used as an invalid value. struct callout { long time; /* time of call (see below) */ short nargs; /* number of arguments */ svalue function_name; /* name of function to call (as string) */ svalue args[3]; /* up to 3 arguments */ }; The 'time' field is not a direct timestamp. To get a timestamp, the 'timediff' value from the callout section header should be added to it. If the 'function_name' field has a 'type' field 0 (known as T_INVALID), then the callout entry is free. DGD threads a free list which starts at 'fcallouts' and uses 'time' and 'nargs' as prev and next fields; their values are handles. 
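The free-list threading can be sketched in C as follows. This is only an illustration under stated assumptions: struct callout_view is a hypothetical, already-decoded view of a callout entry (the on-disk field widths depend on the dumpfile header), and it is assumed that 'time' holds the prev handle and 'nargs' holds the next handle, in the order the text lists them. The walk stops at handle 0.

```c
/* Hypothetical decoded view of one callout entry; not a DGD type. */
struct callout_view {
    unsigned long time;  /* 'time' field; assumed prev handle when free */
    unsigned nargs;      /* 'nargs' field; assumed next handle when free */
    unsigned fn_type;    /* 'type' of function_name; 0 (T_INVALID) = free */
};

/*
 * Count the entries on the free list that starts at handle 'fcallouts'.
 * Handles are 1-based, so entry (handle - 1) is inspected.
 * Returns -1 if a handle is out of range, an entry on the list is not
 * marked free, or the chain is longer than the table (a cycle).
 */
static long count_free_callouts(const struct callout_view *tab,
                                unsigned long ncallouts,
                                unsigned long fcallouts)
{
    unsigned long handle = fcallouts;
    long n = 0;

    while (handle != 0) {
        if (handle > ncallouts ||
            tab[handle - 1].fn_type != 0 ||
            (unsigned long) n >= ncallouts) {
            return -1;
        }
        n++;
        handle = tab[handle - 1].nargs;  /* follow the next handle */
    }
    return n;
}
```

A tool that reads dumpfiles could use such a walk as a consistency check before trusting the callout table.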
A handle of 0 indicates the end of the free list. Normally, the callout arguments are stored in the 'args' array. But if 'nargs' is greater than 3, only the first two arguments are stored directly, and args[2] is an array containing the remaining arguments. PROGRAM TEXT: The program text for a function consists of a header, followed by the bytecode for that function, followed by line number information. A program is a sequence of such functions. The starting point for each function is given by the function definition table, found in the control block. Two-byte values are stored with the most significant byte first. A function header has a length that depends on the number of arguments to that function. This number can be found in the third byte of the header. A precompiled function will not have the 'prog-size' field, but precompiled functions will not be encountered in the dumpfile. A function that has a prototype but no definition will still have a header, though without the 'depth', 'num-locals', and 'prog-size' fields. 1 function-class /* Same C_ flags as function definition table */ 1 function-type /* See below */ 1 num-arguments /* Number of arguments for this function */ 1 arg-type[num-arguments] /* Type of each argument */ 2 depth /* Number of stack elements to reserve */ 1 num-locals /* Number of local variables (not counting args) */ 2 prog-size /* Size of this function's program code */ The organisation of 'function-type' is just like that for variable declarations, except that T_VOID is also used. The 3 most significant bits are used for the number of array levels. T_INT 1 T_FLOAT 2 T_STRING 3 T_OBJECT 4 T_MAPPING 6 T_MIXED 8 T_VOID 9 DGD has more types, but they should not appear as function types. See the description of variable declarations in the control block for another list of types, and the description of variables in the data block for a third. 
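The header layout above can be read with a short C routine. The sketch below is not DGD code: struct func_header, get2 and parse_func_header are hypothetical names, and the routine assumes that a function with the C_UNDEFINED flag ends right after its argument types, as described above. Precompiled functions are not handled, since they should not occur in a dumpfile.

```c
#include <stddef.h>

#define C_UNDEFINED 128  /* flag value from the function definition table */

/* Hypothetical in-memory view of a function header. */
struct func_header {
    unsigned fclass;               /* function-class */
    unsigned ftype;                /* function-type */
    unsigned nargs;                /* num-arguments */
    const unsigned char *argtypes; /* points at arg-type[0] */
    unsigned depth;                /* 0 for prototypes */
    unsigned nlocals;              /* 0 for prototypes */
    unsigned progsize;             /* 0 for prototypes */
    size_t hdrsize;                /* bytes occupied by the header */
};

/* Read a 2-byte value stored most significant byte first. */
static unsigned get2(const unsigned char *p)
{
    return ((unsigned) p[0] << 8) | p[1];
}

/*
 * Parse the header at 'p', of which 'len' bytes are available.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int parse_func_header(const unsigned char *p, size_t len,
                             struct func_header *h)
{
    if (len < 3) {
        return -1;
    }
    h->fclass = p[0];
    h->ftype = p[1];
    h->nargs = p[2];
    h->argtypes = p + 3;
    h->hdrsize = 3 + (size_t) h->nargs;
    h->depth = h->nlocals = h->progsize = 0;
    if (len < h->hdrsize) {
        return -1;
    }
    if (h->fclass & C_UNDEFINED) {
        return 0;  /* prototype only: no depth, num-locals, prog-size */
    }
    if (len < h->hdrsize + 5) {
        return -1;
    }
    p += h->hdrsize;
    h->depth = get2(p);
    h->nlocals = p[2];
    h->progsize = get2(p + 3);
    h->hdrsize += 5;
    return 0;
}
```

With this layout, the bytecode of a defined function would start 'hdrsize' bytes after the offset given in the function definition table, and run for 'progsize' bytes.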
The organisation of 'arg-type' is just like that for variable declarations, except that the extra bit (between the 3 for array levels and the 4 for the type) is used to indicate that this argument has an "ellipsis" modifier. (That is the name of the three dots following the last variable in a varargs declaration). The program code is a sequence of opcodes with their arguments. The number of arguments is implicit for each opcode. An opcode occupies the 5 least significant bits of its byte. The 2 most significant bits are the line offset, and the bit in between is the pop bit. The pop bit is set if the top stack value should be popped after executing the opcode. For certain opcodes (noted below) it has a different meaning. There are 32 different opcodes. Opcode 0: PUSH_ZERO (Push integer 0 on stack) This opcode has no arguments. Opcode 1: PUSH_ONE (Push integer 1 on stack) This opcode has no arguments. Opcode 2: PUSH_INT1 (Push small integer on stack) This opcode has a 1-byte argument, which is a signed number to push on the stack. Opcode 3: PUSH_INT4 (Push integer on stack) This opcode has a 4-byte argument, most significant byte first, which is a signed number to push on the stack. Opcode 4: PUSH_FLOAT (Push floating-point number on stack) This opcode has a 6-byte argument, which is a floating-point number to push on the stack. The format of a floating point number is described in the dataspace section. Opcode 5: PUSH_STRING (Push constant string on stack) This opcode has a 1-byte argument, which is the index of the string in the string table of the control block that points to this program. Opcode 6: PUSH_NEAR_STRING (Push inherited constant string on stack) This opcode has two 1-byte arguments. The first is an index to the inherit table of the control block that points to this program. The second is the index of the string in the string table of the control block of that inherited program. 
Opcode 7: PUSH_FAR_STRING (Push inherited constant string on stack) This opcode has a 1-byte argument and a 2-byte argument. Their meaning is the same as for opcode 6, except that the second argument has a greater range. Opcode 8: PUSH_LOCAL (Push local variable's value on stack) This opcode has a 1-byte argument, which is the signed number of a local variable to push on the stack. Positive numbers and zero refer to function arguments, negative numbers refer to local variables. Opcode 9: PUSH_GLOBAL (Push global variable's value on stack) This opcode has a 1-byte argument, which is the index, in the variable table of the control block that points to this program, of the variable to push on the stack. Opcode 10: PUSH_FAR_GLOBAL (Push inherited global variable's value on stack) This opcode has two 1-byte arguments. The first is an index to the inherit table of the control block that points to this program. This index is reversed, and inherit entry (tablesize-index) should be used. The inherit entry is used to find the varoffset for that inherited program. The varoffset is added to the second argument to find the index in this object's data block of the variable to push on the stack. Opcode 11: PUSH_LOCAL_LVAL (Push local variable as lvalue on stack) The pop bit is not used as a pop bit for this opcode. If it is set, the opcode has two 1-byte arguments, and the second is the type of the variable. If it is not set, the opcode has just one 1-byte argument. The first argument is the index of the local variable, used the same way as PUSH_LOCAL. Opcode 12: PUSH_GLOBAL_LVAL (Push global variable as lvalue on stack) The pop bit is not used as a pop bit for this opcode. If it is set, the opcode has two 1-byte arguments, and the second is the type of the variable. If it is not set, the opcode has just one 1-byte argument. The first argument is the index of the global variable, used the same way as PUSH_GLOBAL.
Opcode 13: PUSH_FAR_GLOBAL_LVAL (Push inherited global variable as lvalue) The pop bit is not used as a pop bit for this opcode. If it is set, the opcode has three 1-byte arguments, and the third is the type of the variable. If it is not set, the opcode has just two 1-byte arguments. The first two arguments are used the same way as PUSH_FAR_GLOBAL. Opcode 14: INDEX (Pop index and aggregate, push agg[index]) This opcode has no arguments. Opcode 15: INDEX_LVAL (Pop index and aggregate, push agg[index] as lvalue) The pop bit is not used as a pop bit for this opcode. If it is set, the opcode has one 1-byte argument, otherwise it has none. The extra argument is the type of the element to be pushed as lvalue. Opcode 16: AGGREGATE (Pop elements and push as array or mapping) This opcode has a 1-byte and a 2-byte argument. The first is 0 for array aggregation or 1 for mapping aggregation. The second is the size of the aggregate to be created. Opcode 17: SPREAD (Pop array and push elements as values or lvalues) The pop bit is not used as a pop bit for this opcode. If it is set, the opcode has two 1-byte arguments, and the second is the type of the array elements. If it is not set, the opcode has just one 1-byte argument. The first argument is a signed value that indicates how many of the array's elements to push as values. The rest are pushed as lvalues. Opcode 18: CAST (Check a value's type) This opcode has one 1-byte argument, which is the type which the top stack element should match. Opcode 19: FETCH (Push lvalue's value) This opcode has no arguments. Opcode 20: STORE (Assign value to lvalue; pop both and push value) This opcode has no arguments. Opcode 21: JUMP (Jump) This opcode has one 2-byte argument, which is the (absolute) offset in the function's program code to jump to. Opcode 22: JUMP_ZERO (Jump if top of stack == 0) This opcode has one 2-byte argument, which is the (absolute) offset in the function's program code to jump to. 
Opcode 23: JUMP_NONZERO (Jump if top of stack != 0) This opcode has one 2-byte argument, which is the (absolute) offset in the function's program code to jump to. Opcode 24: SWITCH (Start of switch statement) This opcode has one 1-byte argument, which indicates the type of switch. An argument of 0 means a switch on integer values. An argument of 1 means a switch on integer values that has ranged cases. An argument of 2 means a switch on string values. The switch is followed by data. For integer switches: 2 switch-size /* number of case labels (counting default) */ 1 case-size /* size of integers in case labels */ 2 default-offset /* (absolute) offset of default branch */ It is followed by 'switch-size'-1 elements, each containing a single signed number stored in 'case-size' bytes, most significant byte first, and the 2-byte offset of the code for that case label. The entries are sorted by their case values. For integer switches with ranged cases: 2 switch-size /* number of case labels (counting default) */ 1 case-size /* size of integers in case labels */ 2 default-offset /* (absolute) offset of default branch */ It is followed by 'switch-size'-1 elements, each containing two signed numbers stored in 'case-size' bytes each, most significant byte first, and the 2-byte offset of the code for that range. The ranges are sorted by their low-end values. DGD forbids overlapping ranges. For string switches: 2 switch-size /* number of case labels (counting default) */ 2 default-offset /* (absolute) offset of default branch */ 1 zero-case /* 0 if there is a case 0, otherwise 1 */ If zero-case is 0, there is an extra 2-byte offset of the code for the 0 case. If it is present, it is counted in 'switch-size'. This is followed by 'switch-size'-1 elements ('switch-size'-2 if there is a zero case), each containing a "far string" (in the same format as used by PUSH_FAR_STRING, total of 3 bytes), and the 2-byte offset of the code for that string value. 
The entries are sorted by their string values. Opcode 25: CALL_KFUNC (Call kernel function) This opcode takes at least one 1-byte argument, the number of the kfun to be called. If that kfun has a "varargs" flag, it has a second argument, which is the number of arguments passed to the kfun. An array passed as arguments, using the ellipsis construct, is counted as one argument here. The kfun number is encoded. If the most significant bit is 1, the low bits are an index in the kfun name list. If it is 0, the low bits are the number of a builtin kfun. See the kfun section. Opcode 26: CALL_AFUNC (Call auto-object function) This opcode takes two 1-byte arguments. The first is the index in the auto object's function definition table, and the second is the number of arguments passed to the function. An array passed as arguments, using the ellipsis construct, is counted as one argument here. Opcode 27: CALL_DFUNC (Call bound function) This opcode takes three 1-byte arguments. The first is an index in the inherit table of the control block pointing to this program. This index is reversed, and inherit entry (tablesize-index) should be used. The second is an index in that inherited object's function definition table. The third is the number of arguments passed to the function. An array passed as arguments, using the ellipsis construct, is counted as one argument here. Opcode 28: CALL_FUNC (Call function) This opcode takes a 2-byte argument and a 1-byte argument. The first is an index in the function call table (see the discussion of 'funcoffset' in the description of control blocks), and the second is the number of arguments passed to the function. An array passed as arguments, using the ellipsis construct, is counted as one argument here. Opcode 29: CATCH (Catch construct) This opcode takes a 2-byte argument, which is the (absolute) offset of the location to which to jump if there is an error. 
Opcode 30: RLIMITS (Rlimits construct) This opcode takes a 1-byte argument, which is 0 if the rlimits construct still needs runtime verification, or 1 if it has been checked. Opcode 31: RETURN (Return from function, catch, or rlimits) This opcode takes no argument. It does not use the pop bit. Line number counting At the start of each function, the implicit line counter is 0. Each opcode uses the line offset to indicate how the line counter should change compared to the previous opcode. A line offset of 0, 1, or 2 is a simple offset. A line offset of 3 means that the offset should be looked up in the line number information that follows the function's program code. The line number information is a sequence of offsets. If the most significant bit of the first byte of an offset is 1, then the offset is 1 byte long, otherwise 2. In the 1-byte case, an offset in the range -64..63 is stored as (offset+64) in the low 7 bits. In the 2-byte case, an offset in the range -16384..16383 is stored as (offset+16384) in a 15-bit value with its high 7 bits in the low 7 bits of the first byte and its low 8 bits in the second byte. Offsets outside the range -16384..16383 are not supported. The line number of an opcode is looked up by adding up the offsets for all preceding opcodes plus this one. The opcodes should be taken in the order in which they appear in the code, not the order in which they would be executed.
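The line counting rules can be sketched in C as follows. The macro and function names (OP_CODE, OP_POP, OP_LINE, next_line_offset, advance_line) are hypothetical and are not taken from DGD. The sketch only covers the line counter itself; a full walk over a function's code would also need the argument length of each opcode, as given in the opcode descriptions, to find the next instruction byte.

```c
/* Fields of an instruction byte, as described for the program code. */
#define OP_CODE(b)  ((b) & 0x1f)         /* 5 least significant bits */
#define OP_POP(b)   (((b) >> 5) & 0x01)  /* the bit in between */
#define OP_LINE(b)  (((b) >> 6) & 0x03)  /* 2 most significant bits */

/*
 * Decode one entry of the line number information at *pp and
 * advance *pp past it. The caller must make sure that enough
 * bytes are available.
 */
static int next_line_offset(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    int off;

    if (p[0] & 0x80) {
        /* 1-byte form: (offset + 64) in the low 7 bits */
        off = (p[0] & 0x7f) - 64;
        *pp = p + 1;
    } else {
        /* 2-byte form: (offset + 16384) as a 15-bit value */
        off = (((p[0] & 0x7f) << 8) | p[1]) - 16384;
        *pp = p + 2;
    }
    return off;
}

/*
 * Return the line counter after the instruction byte 'instr'.
 * 'lineinfo' points at the next unread entry of the line number
 * information and is advanced only when the line offset is 3.
 */
static int advance_line(int line, unsigned char instr,
                        const unsigned char **lineinfo)
{
    unsigned lo = OP_LINE(instr);

    if (lo == 3) {
        return line + next_line_offset(lineinfo);
    }
    return line + (int) lo;
}
```

Starting with a line counter of 0 and calling advance_line for every instruction byte, in the order the opcodes appear in the code, should give the line number of each opcode.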