A byte is an 8-bit unsigned quantity.
A word is a 16-bit unsigned quantity.
Bitfields are represented as blocks of characters, with the first character representing the most significant bit of the byte in question. Multi-bit subfields are indicated by using the same character multiple times, and values of 0 or 1 indicate that these bits are always of the specified value. Therefore a bitfield described as 010abbcc cccdd111 would be a two-byte bitfield containing four subfields, a, of 1 bit, b, 2 bits, c, 5 bits, and d, 2 bits, together with a field 'hardwired' to 010 and one to 111.
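As an illustration of this notation, the sketch below unpacks the example bitfield 010abbcc cccdd111 from its two bytes. The function name and the dictionary it returns are assumptions made for this example only; nothing in the format requires them.

```python
def unpack_example(b0: int, b1: int) -> dict:
    """Split the two-byte example bitfield 010abbcc cccdd111 into subfields."""
    value = (b0 << 8) | b1            # the first byte is the more significant
    if (value >> 13) != 0b010 or (value & 0b111) != 0b111:
        raise ValueError("hardwired bits do not match 010...111")
    return {
        "a": (value >> 12) & 0b1,     # 1 bit
        "b": (value >> 10) & 0b11,    # 2 bits
        "c": (value >> 5) & 0b11111,  # 5 bits, spanning both bytes
        "d": (value >> 3) & 0b11,     # 2 bits
    }
```

For instance, the bytes 01011010 10101111 carry a=1, b=2, c=21 and d=1.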
All multi-byte numbers are stored in big-endian form: most significant byte first, then in strictly descending order of significance.
The reader is assumed to already be familiar with the Z-machine; in particular its instruction set, memory map and stack conventions.
When form type names, which are four characters long, are set in running text, they're set in bold-face with spaces replaced by underscores. Thus ____ means "four spaces" and (c)_ means three letters of the copyright notation followed by a space.
For the purposes of flexibility, the overall format will be a new IFF type. A standard core is defined, and customised information can be stored by specific interpreters in such a way that it can be easily read by others. The FORM type is IFZS.
Several chunks are defined within this document to appear in the IFZS FORM:
'IFhd'   5.4
'CMem'   3.7
'UMem'   3.8
'Stks'   4.10
'IntD'   7.8
Several chunks may also appear by convention in any IFF FORM:
'AUTH'   7.2, 7.3
'(c) '   7.2, 7.4
'ANNO'   7.2, 7.5
Since the contents of dynamic memory may be anything up to 65534 bytes, it is desirable to have some form of compression available as an option. Bryan Scattergood's port of ITF uses a method that is both elegant and effective, and this is the method adopted.
The data is compressed by exclusive-oring the current contents of dynamic memory with the original (from the original story file). The result is then compressed with a simple run-length scheme: a non-zero byte in the output represents the byte itself, but a zero byte is followed by a length byte, and the pair represent a block of n+1 zero bytes, where n is the value of the length byte.
It is not necessary to compress optimally, if to do so would be difficult. For example, an interpreter that does not store the whole of dynamic memory in physical memory may compress a single page at a time, ignoring the possibility of a run crossing a page boundary; this case can be encoded as two adjacent runs of bytes. It is required, however, that interpreters read encoded data even if it does not happen to be compressed to their particular page-boundary preferences. This is not difficult, requiring merely the maintenance of a small amount of state (namely the current run length, if any) across page boundaries on a read.
If the decoded data is shorter than the length of dynamic memory, then the missing section is assumed to be a run of zeroes (and hence equal to the original contents of that part of dynamic memory). This permits the removal of redundant runs at the end of the encoded block; again it is not necessary to implement this on writes, but it must be understood on reads.
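The writing side of the scheme can be sketched as follows. This is a minimal illustration, not a reference encoder: the function name is an assumption, both arguments are assumed to hold exactly the dynamic memory area (current and original), and the optional removal of the trailing run is done with a simple strip.

```python
def compress_cmem(current: bytes, original: bytes) -> bytes:
    """XOR dynamic memory against the story file, then run-length encode zeroes."""
    if len(current) != len(original):
        raise ValueError("both images must cover the same dynamic memory")
    diff = bytes(c ^ o for c, o in zip(current, original))
    diff = diff.rstrip(b"\x00")       # optional: drop the redundant trailing run
    out = bytearray()
    i = 0
    while i < len(diff):
        if diff[i] != 0:
            out.append(diff[i])       # non-zero bytes stand for themselves
            i += 1
            continue
        run = 0
        while i < len(diff) and diff[i] == 0 and run < 256:
            run += 1
            i += 1
        out += bytes((0, run - 1))    # zero byte, then n: a block of n+1 zeroes
    return bytes(out)
```

A run longer than 256 zero bytes is simply written as several adjacent runs, which is the same encoding a page-at-a-time compressor would produce.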
Two error cases are possible on reads: the decoded data may be larger than dynamic memory, and the encoded data may finish with an incomplete run (a zero byte without a length byte). These should be dealt with in whatever way seems appropriate to the interpreter writer.
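A reader for the same data might look like the sketch below. The function name is an assumption, `original` is assumed to be the dynamic memory area of the original story file, and raising an exception for the two error cases is only one possible choice, since the format leaves that decision to the interpreter writer.

```python
def decompress_cmem(encoded: bytes, original: bytes) -> bytes:
    """Rebuild dynamic memory from CMem data and the original story image."""
    diff = bytearray()
    i = 0
    while i < len(encoded):
        byte = encoded[i]
        i += 1
        if byte != 0:
            diff.append(byte)
        else:
            if i >= len(encoded):
                raise ValueError("incomplete run: zero byte without a length byte")
            diff += bytes(encoded[i] + 1)      # a block of n+1 zero bytes
            i += 1
    if len(diff) > len(original):
        raise ValueError("decoded data is larger than dynamic memory")
    diff += bytes(len(original) - len(diff))   # missing tail is a run of zeroes
    return bytes(d ^ o for d, o in zip(diff, original))
```

Because runs are expanded one at a time, the reader does not care where the writer chose to split them.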
Dissenting voices have suggested that compression is unnecessary in today's world of cheap storage, and so the format also includes the capability to dump the contents of dynamic memory without modification. The ability to write such files is optional; the ability to read both types is necessary. It is an error for this dump to be shorter or longer than the expected length of dynamic memory.
The IFF chunk used to contain the compressed data has type CMem. Its format is as follows:
4 bytes 'CMem' chunk ID
4 bytes n chunk length
n bytes ... compressed data as above
The chunk used to contain the uncompressed data has type UMem. It has the format:
4 bytes 'UMem' chunk ID
4 bytes n chunk length
n bytes ... simple dump of dynamic memory
One of the biggest differences between current interpreters is how they handle the Z-machine's stacks. Conceptually, there are two, but many interpreters store both in the same array. This format stores both in the same IFF chunk, which has chunk ID Stks.
The IFF format includes a length field on each chunk, so we can write only the used portion of the stacks, to save space. The least recent frames on the stacks are saved first, to ensure that the missing part appears at the end of the data in the file.
Each frame has the format:
3 bytes ... return PC (byte address)
1 byte 000pvvvv flags
1 byte ... variable number to store result
1 byte 0gfedcba arguments supplied
1 word n number of words of evaluation stack used by this call
v words ... local variables
n words ... evaluation stack for this call
The return PC is a byte offset from the start of the story file.
The p flag is set on calls made by call_xN (discard result), in which case the variable number is meaningless (and should be written as a zero).
Assigning each of the possible 7 supplied arguments a letter a-g in order, each bit is set if its respective argument is supplied. The evaluation stack count allows the reconstruction of the chain of frame pointers for all possible stack models. Words on the evaluation stack are also stored least recent first.
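The frame layout can be decoded along the following lines. This is a sketch under stated assumptions: the function name and the keys of the returned dictionary are invented for illustration, and no bounds checking beyond what `struct` performs is attempted.

```python
import struct

def parse_frame(data: bytes, pos: int = 0):
    """Decode one Stks frame starting at pos; return (frame, next_pos)."""
    return_pc = int.from_bytes(data[pos:pos + 3], "big")
    flags, result_var, arg_bits, n_eval = struct.unpack_from(">BBBH", data, pos + 3)
    n_locals = flags & 0x0F                  # vvvv
    discard = bool(flags & 0x10)             # p
    pos += 8                                 # fixed part of the frame
    words = struct.unpack_from(">%dH" % (n_locals + n_eval), data, pos)
    frame = {
        "return_pc": return_pc,
        "discard_result": discard,
        "result_var": result_var,
        "args_supplied": [bool(arg_bits >> k & 1) for k in range(7)],  # a..g
        "locals": list(words[:n_locals]),
        "eval_stack": list(words[n_locals:]),  # least recent first
    }
    return frame, pos + 2 * (n_locals + n_eval)
```

Calling it repeatedly, feeding each returned position back in until the end of the chunk data is reached, yields the frames oldest first.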
Although some interpreters may impose an arbitrary limit on the size of the stacks (such as ZIP's 1024-word total stack size), others may not, or may set larger limits. This means that a stack dump may be larger than the restoring interpreter's stack can hold. If you cannot dynamically resize your stack, you must trap this as an error.
The stack pointer itself is not stored anywhere in the save file, except implicitly, as the top frame on the stack will be the last saved.
The chunk itself is simply a sequence of frames as above:
4 bytes 'Stks' chunk ID
4 bytes n chunk length
n bytes ... frames (oldest first)
In Z-machine versions other than V6, execution starts at an address rather than at a routine, and therefore data can be pushed on the evaluation stack without anything being on the call stack. Therefore, in all versions other than V6 a dummy stack frame must be stored as the first in the file (the oldest frame).
The dummy frame has all fields set to zero except n, the amount of evaluation stack used. Note that this may also be zero if the game does not use any evaluation stack at the top level.
This frame must be written even if no evaluation stack is used at the top level, and therefore interpreters may assume its presence on savefiles for V1-5 and V7-8 games.
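Writing the dummy frame is straightforward. In the sketch below the function name is an assumption, and `eval_stack` is assumed to be the list of 16-bit words pushed at the top level, least recent first.

```python
import struct

def dummy_frame(eval_stack):
    """Build the dummy first frame for V1-5 and V7-8 save files."""
    # Return PC, flags, result variable and argument mask are all zero.
    header = bytes(6) + struct.pack(">H", len(eval_stack))
    return header + struct.pack(">%dH" % len(eval_stack), *eval_stack)
```

With an empty top-level stack this produces eight zero bytes, which must still be written.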
We now come to one of the most difficult (yet most important) parts of the format: how to find the story file associated with this save file, or the related (but easier) problem of checking whether a given save file belongs to a given story.
Considering the easier second problem first, the actual name of the story file is often not much use. Firstly, filenames are highly dependent on the operating system in use, and secondly, many original Infocom story files were called simply 'story.data' or similar.
The method most existing interpreters use is to compare the variables at offsets $2, $12, and $1C in the header (that is, the release number, the serial number and the checksum), and refuse to load if they differ. These variables are duplicated in the file (since the header will be compressed with the rest of dynamic memory).
This data will be stored in a chunk of type IFhd. This chunk must come before the [CU]Mem and Stks chunks to save interpreters the trouble of decoding these only to find that the wrong story file is loaded. The format is:
4 bytes 'IFhd' chunk ID
4 bytes 13 chunk length
1 word ... release number ($2 in header)
6 bytes ... serial number ($12 in header)
1 word ... checksum ($1C in header)
3 bytes ... initial PC on restore
If the save file belongs to an old game that does not have a checksum, it should be calculated in the normal way from the original story file when saving. It is possible that a future version of this format may have a larger IFhd chunk, but the first 13 bytes will always contain this data, and if the other chunks described herein are present they will be guaranteed to contain the data specified.
The first problem (of trying to find a story file given only a save file) cannot really be solved in an operating-system independent manner, and so there is provision for OS-dependent chunks to handle this.
It should be noted that the current state of the IFhd chunk means it has odd length (13 bytes). It should, of course, be written with a pad byte (as mentioned in 8.4.1).
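Assembling the chunk might be sketched as below. The function name is an assumption, `story` is assumed to be the original story file image, and the checksum is copied straight from the header, so the case of an old game without a checksum (where it must be calculated) is not handled here.

```python
import struct

def build_ifhd(story: bytes, pc: int) -> bytes:
    """Build an IFhd chunk from the original story file header and the PC."""
    release = story[0x02:0x04]       # release number
    serial = story[0x12:0x18]        # six-byte serial number
    checksum = story[0x1C:0x1E]      # checksum as stored in the header
    body = release + serial + checksum + pc.to_bytes(3, "big")
    if len(body) != 13:
        raise ValueError("story file is too short to contain a header")
    chunk = b"IFhd" + struct.pack(">I", len(body)) + body
    return chunk + b"\x00"           # pad byte; not counted in the length
```

The chunk length field says 13, but 22 bytes are written in total: 8 bytes of ID and length, 13 of data, and the pad byte.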
It must be specified exactly what the magic cookie returned by catch is, since this value can be stored in any random variable, on the evaluation stack, or indeed anywhere in memory.
For greatest independence of internal interpreter implementation, catch is hereby specified to return the number of frames currently on the system stack. This makes throw slightly inefficient on many interpreters (a current frame count can be maintained internally to avoid problems with catch), but this is unavoidable without using two stacks and a fixed-size activation record (always 15 local variables). Since most applications of catch/throw do not unwind enormous depths (and they are somewhat infrequent), this should not be too much of a problem.
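The following toy model shows how a frame count can serve as the catch value. It is only a sketch: the class and method names are invented, frames are plain Python objects, and whether a real interpreter counts the dummy frame is left to that interpreter, provided it counts consistently.

```python
class CallStack:
    """Toy call stack showing catch/throw in terms of frame counts."""

    def __init__(self):
        self.frames = []

    def call(self, frame):
        self.frames.append(frame)

    def catch(self) -> int:
        # The value handed to the story file is just the current frame count.
        return len(self.frames)

    def throw(self, cookie: int):
        if not 1 <= cookie <= len(self.frames):
            raise ValueError("catch value does not name a live frame")
        del self.frames[cookie:]      # unwind to the frame that did the catch
        return self.frames.pop()      # then return from that frame
```

Because the value is a plain count, it survives a save and restore no matter where the story file has stored it.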
The numbers of pictures and sounds do not need specification, since they are requested by number by the story file itself.
One of the advantages of the IFF standard is that extra chunks can be added to the format to extend it in various ways. For example, there are three standard chunk types defined, namely AUTH, (c)_, and ANNO.
AUTH, (c)_, and ANNO chunks all contain simple ASCII text (all characters in the range 0x20 to 0x7E).
The only indication of the length of this text is the chunk length (there is no zero byte termination as in C, for example).
The IFF standard suggests a maximum of 256 characters in this text as it may be displayed to the user upon reading, although it could get longer if required.
The AUTH chunk, if present, contains the name of the author or creator of the file. This could be a login name on multi-user systems, for example. There should only be one such chunk per file.
The (c)_ chunk contains the copyright message (date and holder, without the actual copyright symbol). This is unlikely to be useful on save files. There should only be one such chunk per file.
The ANNO chunk contains any textual annotation that the user or writing program sees fit to include. For save files, interpreters could prompt the user for an annotation when saving, and could write an ANNO with the score and time for V3 games, or a chunk containing the name/version of the interpreter saving it, and many other things.
The ANNO, (c)_ and AUTH chunks are all user-level information. Interpreters must not rely on the presence or absence of these chunks, and should not store any internal magic that would not make sense to a user in them.
These chunks should be either ignored or (optionally) displayed to the user. (c)_ chunks should be prefixed with a copyright symbol if displayed.
The save-file may contain interpreter-dependent information. This is stored in an IntD chunk, which has format:
4 bytes 'IntD' chunk ID
4 bytes n chunk length
4 bytes ... operating system ID
1 byte 000000sc flags
1 byte ... contents ID
2 bytes 0 reserved
4 bytes ... interpreter ID
n-12 bytes ... data
The operating system and interpreter IDs are normal IFF 4-character IDs in form. Please register IDs used with Martin Frost (at the email address given below) so that this can be managed sensibly. They can then be added to future versions of this specification, and contents IDs can be assigned.
If the s flag is set, then the contents are only meaningful on the same machine/network on which they were saved. This covers filenames and similar things. How to handle checking if this is indeed the same machine is an open question, and beyond the scope of this document. It is certainly true, however, that if the operating system ID does not match the current system and this bit is set, then the chunk should not be copied.
If the c flag is set, the contents should not be copied when loading and saving a game--they are only relevant to the exact current state of play as stored in the file. The data need not be copied even if this flag is clear, but must not be copied if it is set.
If the interpreter ID is ____ (four spaces), then the chunk contains information useful to *all* interpreters running on a particular system. This can store a magical OS-dependent reference to the original story file, which need not worry about vagaries of filename handling on more than one system. This chunk may contain anything that can be put in a file and retrieved intact. If the file is restored on a suitable system this can be used to do Good Things.
If the operating-system ID is ____, then the chunk contains data useful to *all* ports of a particular interpreter. This may or may not be useful.
The interpreter and operating-system IDs may not both be ____. This should not be necessary.
If neither ID is ____, the contents are meaningful only to a particular port of a particular interpreter. Save-file specific preferences probably fall into this category.
The contents ID will be defined when chunk IDs are picked. Its purpose is to allow multiple chunks to be written containing different data, which is necessary if they need different settings of the c and s flags.
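The fixed part of an IntD chunk can be picked apart as sketched below. The function name and the keys of the returned dictionary are assumptions; `body` is assumed to be the n bytes that follow the chunk length.

```python
import struct

def parse_intd(body: bytes) -> dict:
    """Split the body of an IntD chunk into its header fields and data."""
    if len(body) < 12:
        raise ValueError("IntD chunk is shorter than its fixed header")
    os_id, flags, contents_id, _reserved, interp_id = struct.unpack_from(
        ">4sBBH4s", body)
    return {
        "os_id": os_id,
        "interp_id": interp_id,
        "contents_id": contents_id,
        "system_specific": bool(flags & 0b10),   # s flag
        "do_not_copy": bool(flags & 0b01),       # c flag
        "data": body[12:],
    }
```

An interpreter that does not recognise the operating system or interpreter ID can stop after reading these twelve bytes and skip the rest of the chunk.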
These extensions add no overhead to interpreters which choose not to handle them, except for larger save files and more chunks to skip when reading files written on another program. Interpreters are not expected to preserve these optional chunks when files are re-saved, although some may be copied, at the option of the interpreter writer or user.
The only required chunks are IFhd, either CMem or UMem, and Stks. The total overhead to a save file is 12 bytes plus 8 for each chunk; in the minimal case (IFhd, [CU]Mem, Stks = 3 chunks), this comes to 36 bytes.
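The overhead figure follows directly from the container layout, as this small sketch shows. The function name is an assumption, and pad bytes after odd-length chunks are deliberately left out, matching the figure quoted above.

```python
def container_overhead(n_chunks: int) -> int:
    """Bytes of IFF framing around the chunk data, excluding any pad bytes."""
    form_header = 4 + 4 + 4      # 'FORM', FORM length, 'IFZS'
    chunk_header = 4 + 4         # chunk ID, chunk length
    return form_header + chunk_header * n_chunks
```

For the minimal file, `container_overhead(3)` gives the 36 bytes mentioned in the text.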
The following operating system IDs have been registered:
'DOS ' MS-DOS (also PC-DOS, DR-DOS)
'UNIX' Generic UNIX
The following interpreter IDs have been registered:
'JZIP' JZIP, the enhanced ZIP by John Holder
The following extension chunks have been registered to date:
System ID   Interp ID   Content ID   Section
'MACS'      '    '      0            7.22
The following chunk has been registered for MacOS, to enable a Macintosh interpreter to find a story file given a save file using the System 7 ResolveAlias call. The MacOS alias record can be of variable size: the actual size can be calculated from the chunk size. Aliases are valid only on the same network as they were saved.
4 bytes 'IntD' chunk ID
4 bytes n chunk length (variable)
4 bytes 'MACS' operating system ID: MacOS
1 byte 00000010 flags (s set; c clear)
1 byte 0 contents ID
2 bytes 0 reserved
4 bytes '    ' interpreter ID: any
n-12 bytes ... MacOS alias record referencing the story file; from NewAlias
Alias records are of variable length, reflected in the chunk length; they are valid only on the network on which they were created.
8. Introduction to the IFF format
This is based on the official IFF standards document, which is rather long and contains much that is irrelevant to the task in hand. I also do not have an electronic copy, so I am including only that which is relevant. Feel free to mail me (i.e. Martin Frost) if there are errors, inconsistencies, or omissions. For the inquisitive, a document containing much of the original standard, including the philosophy behind the structure, can be found at http://www.cica.indiana.edu/graphics/image_specs/ilbm.format.txt
IFF stands for "Interchange File Format", and was developed by a committee consisting of people from Commodore-Amiga, Electronic Arts and Apple. It draws strongly on the Macintosh's concept of resources.
The most fundamental concept in an IFF file is that of a chunk.
A chunk starts with an ID and a length.
The ID is the concatenation of four ASCII characters in the range 0x20 to 0x7E.
If spaces are present, they must be the last characters (there must be no printing characters after a space).
IDs are compared using a simple 32-bit equality test - note that this implies case sensitivity.
The length is a 32-bit unsigned integer, stored in big-endian format (most significant byte, then second most, and so on).
After the ID and length, there follow (length) bytes of data.
If length is odd, these are followed by a single zero byte. This byte is *not* included in the chunk length, but it is very important, as otherwise many 68000-based readers will crash.
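The rules above for a single chunk can be sketched as a small writer. The function name is an assumption; the checks mirror the stated constraints on IDs, and the pad byte is appended without being counted in the length.

```python
import struct

def make_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Serialise one IFF chunk: ID, big-endian length, data, optional pad."""
    if len(chunk_id) != 4 or not all(0x20 <= c <= 0x7E for c in chunk_id):
        raise ValueError("chunk ID must be four ASCII characters 0x20-0x7E")
    if b" " in chunk_id.rstrip(b" "):
        raise ValueError("spaces may only appear at the end of a chunk ID")
    pad = b"\x00" if len(data) % 2 else b""
    return chunk_id + struct.pack(">I", len(data)) + data + pad
```

Every chunk written this way occupies an even number of bytes in the file, so each following chunk starts on an even offset.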
A simple IFF file (such as the ones we will be considering) consists of a *single* chunk of type FORM.
The contents of a FORM chunk start with another 4-character ID.
This ID is also the concatenation of four characters, but these characters may only be uppercase letters and trailing spaces. This is to allow the FORM sub-ID to be used as a filename extension.
After the sub-ID comes a concatenation of chunks. The interpretation of these chunks depends on the FORM sub-ID (in this proposal, the sub-ID is IFZS), except that a few chunk types always have the same meaning (notably the AUTH, (c)_ and ANNO chunks described in section 7). For reference, the other reserved types are: FOR[M1-9], CAT[ 1-9], LIS[T1-9], TEXT, and ____ (that is, four spaces).
Each of these chunks may contain as much data as required, in whatever format is required.
Multiple chunks with the same ID may appear; the interpretation of such chunks depends on the chunk. For example, multiple ANNO chunks are acceptable, and simply refer to multiple annotations. If more than one chunk of a certain type is found when the reader was expecting only one (for example, two IFhd chunks), the later chunks should simply be ignored (hopefully with a warning to the user).
Indeed, skipping is the expected procedure for dealing with any unknown or unexpected chunk.
Certain chunks may be compulsory if the FORM is meaningless without them. In this case the IFhd, [CU]Mem and Stks are compulsory.
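Reading a whole FORM can be sketched as a single loop over its chunks. The function name is an assumption, and for brevity this sketch keeps only the first chunk of each type, which is the right policy for IFhd but would discard additional ANNO chunks; unknown chunks are collected but need not be interpreted.

```python
import struct

def read_ifzs(blob: bytes) -> dict:
    """Return the first chunk of each type found in an IFZS FORM."""
    form, length, sub_id = struct.unpack_from(">4sI4s", blob)
    if form != b"FORM" or sub_id != b"IFZS":
        raise ValueError("not an IFZS FORM")
    end = min(8 + length, len(blob))
    chunks = {}
    pos = 12
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from(">4sI", blob, pos)
        data = blob[pos + 8:pos + 8 + size]
        chunks.setdefault(chunk_id, data)     # later duplicates are ignored
        pos += 8 + size + (size & 1)          # skip the pad byte, if any
    for required in (b"IFhd", b"Stks"):
        if required not in chunks:
            raise ValueError("missing compulsory chunk %r" % required)
    if b"CMem" not in chunks and b"UMem" not in chunks:
        raise ValueError("missing CMem or UMem chunk")
    return chunks
```

A stricter reader would also check that IFhd appears before the memory and stack chunks, as this format requires.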
9. Resources available
A set of patches exists for the Zip interpreter, adding Quetzal support. They can be obtained from:
A utility, ckifzs, is available as C source code to check the validity of generated save files. A small set of correct Quetzal files is also available. Both may be of use in debugging an interpreter supporting Quetzal, and may be obtained from the web page mentioned in 9.1.
This document is updated whenever errors are noticed or new extension chunks are registered. The latest version will always be available from the above web page. The latest revision designated stable (currently version 1.3) will be in the IF archive, ftp.gmd.de/if-archive, in the directory infocom/interpreters/specification/.
This document is itself available in a number of forms. The base version is in preformatted ASCII text, but there is also a PDF version (converted by John Holder) and this HTML version (converted by Graham Nelson). Links to all of these may be found on the web page.
A few interpreters support Quetzal; details will appear here as they become available.
This standard was created by Martin Frost (email: firstname.lastname@example.org). Comments and suggestions are always welcome (and any errors in this document are entirely my own, or those of the HTML typesetter, Graham Nelson).
The following people have contributed with ideas and criticism (in alphabetical order): King Dale, Marnix Klooster, Graham Nelson, Andrew Plotkin, Matthew T. Russotto, Bryan Scattergood, Miron Schmidt, Colin Turnbull, John Wood.
Quetzal is not a compulsory part of the Z-Machine Standard, since it does not have implications for the behaviour of story files, but it is attached to the HTML copy of the Standard as a highly recommended "optional extra".
Links to related sections of the Z-Machine Standards Document:
Contents / Section 6.1 on the saved state
Opcodes: save / save_undo / restore / restore_undo / catch / throw