This document details the format of the LINK-80 relocatable files that Nestor80 generates when the build type is "relocatable". The information comes mainly from the original MACRO-80 manual; details missing from that manual were added by reverse engineering .REL files generated by MACRO-80.
Linkstor80, the linking loader that is part of the Nestor80 project, is backwards-compatible with the old LINK-80 tool that was bundled together with MACRO-80: Linkstor80 defines an extended relocatable file format that is a superset of the original relocatable file format used by MACRO-80 and LINK-80. This document details the original relocatable file format first, and then the additions introduced in the extended file format.
Relocatable files are encoded as bit streams: the contained items are one or more bits in size and (except where otherwise noted) are not aligned to byte boundaries. Thus parsing a .REL file implies reading and interpreting bits one by one (or in groups of two or more that do not necessarily start at a byte boundary).
Spaces added between bits in the examples in this document are for readability only and don't imply any special boundaries in the actual data. Hexadecimal items represented as XXh are always 8 bits in size.
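As an illustration of bit-oriented parsing, here is a minimal sketch of a bit reader. The class name `BitReader` and its methods are hypothetical, and it assumes that bits are consumed starting from the most significant bit of each byte, which is consistent with the extended format header bytes listed later in this document.

```python
class BitReader:
    """Reads bits from a bytes object, most significant bit first (assumed order)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits, from the start of the data

    def read(self, count: int) -> int:
        """Return the next `count` bits as an unsigned integer."""
        value = 0
        for _ in range(count):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Two bytes holding a 0 bit followed by the 8 bits 10000001,
# which straddle the byte boundary.
reader = BitReader(bytes([0b01000000, 0b10000000]))
first = reader.read(1)
second = reader.read(8)
```

The second read crosses from the first byte into the second one, which is the situation a .REL parser has to handle constantly.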
A relocatable file can contain three types of items:
- Absolute bytes, to be interpreted and linked "as is".
- Relocatable values: 16 bit values that are relative to the starting location of the code segment, the data segment or a COMMON block.
- Link items, which carry other types of information such as segment or location counter changes, external symbol references, public symbol declarations, or program "metadata".
There's no header of any kind: the file starts right away with the bit stream of items.
Absolute bytes are encoded as a single 0 bit followed by the 8 bits of the byte. For example, the sequence of bytes 1, 129, 255 is encoded as:
0 00000001 0 10000001 0 11111111
Relocatable values are encoded as three bits (the first one is always 1 and the next two indicate the segment the value belongs to), followed by the 16 bits of the value in little endian. The segment bits are:
- 01 for the code segment
- 10 for the data segment
- 11 for a COMMON block
For example, the code segment relative value 1234h is encoded as:
1 01 34h 12h
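The two encodings described so far can be sketched as follows. This is only an illustration: the function names are hypothetical, and bit streams are modelled as strings of "0" and "1" characters for readability rather than as packed bytes.

```python
SEGMENT_BITS = {"code": "01", "data": "10", "common": "11"}


def encode_absolute(byte: int) -> str:
    """A 0 bit followed by the 8 bits of the byte."""
    return "0" + format(byte, "08b")


def encode_relocatable(segment: str, value: int) -> str:
    """A 1 bit, two segment bits, then the low byte and the high byte."""
    low, high = value & 0xFF, (value >> 8) & 0xFF
    return "1" + SEGMENT_BITS[segment] + format(low, "08b") + format(high, "08b")


# The two examples from the text above.
absolute = [encode_absolute(b) for b in (1, 129, 255)]
relocatable = encode_relocatable("code", 0x1234)
```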
Link items are encoded as the following sequence of items:
- Three fixed bits, 100.
- A 4 bit link item type code.
- An optional (depends on the link item type) relocatable value (two segment identification bits + 16 value bits)
- An optional (depends on the link item type) symbol bytes field consisting of:
- 3 bits indicating the field size in bytes.
- The symbol bytes, as many as the size field indicates.
Despite its name, the "symbol bytes" field doesn't always hold information about a symbol.
Whether a link item has a relocatable value, symbol bytes or both is determined by the link item type code as follows:
- 0 to 4: symbol only.
- 5 to 7: address and symbol.
- 8 to 14: address only.
- 15: Neither (special "end of file" item).
For example, a "define public symbol" item (type code: 7) with a value of 1234h in the data segment and a symbol name of XYZ would be encoded as follows (symbol characters are to be interpreted as their ASCII values, 8 bits per character):
100 0111 10 34h 12h 011 XYZ
type address length symbol
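The layout rules above can be sketched as a small encoder. The function name `encode_link_item` and its parameters are hypothetical, and the bit stream is modelled as a string of "0" and "1" characters; only the legacy format (symbol fields of up to 7 bytes) is covered.

```python
def encode_link_item(type_code: int, segment: int = 0, value: int = 0,
                     symbol: bytes = b"") -> str:
    """Encode a legacy link item as a string of bits."""
    if len(symbol) > 7:
        raise ValueError("legacy symbol fields hold at most 7 bytes")
    bits = "100" + format(type_code, "04b")
    if 5 <= type_code <= 14:  # types 5 to 14 carry an address field
        bits += format(segment, "02b")
        bits += format(value & 0xFF, "08b") + format((value >> 8) & 0xFF, "08b")
    if type_code <= 7:        # types 0 to 7 carry a symbol bytes field
        bits += format(len(symbol), "03b")
        bits += "".join(format(b, "08b") for b in symbol)
    return bits


# "Define public symbol" XYZ = 1234h in the data segment (segment bits 10).
public_xyz = encode_link_item(7, segment=0b10, value=0x1234, symbol=b"XYZ")
end_of_file = encode_link_item(15)
```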
These are the defined link item types, described in ascending order of their type codes (0 to 15):
There's an entry of this type at the beginning of the relocatable file for each public and external symbol that is used afterwards in the file.
Indicates that further items are to be considered included in the COMMON block of the specified name.
Indicates the program name as defined with a NAME instruction or with a TITLE instruction in the source code.
Contains a file name specified by a .REQUEST instruction in the source code.
This is an extension mechanism that allows additional link items to be defined; the items generated by MACRO-80 and Nestor80 are listed in "Extension link items".
The value is the size of the common block of the specified name, defined by the last address used in the block plus one. For example, the following code:
common /FOO/
org 100h
db 1
end
will generate a relocatable file in which the size of the COMMON block FOO is 101h.
External symbol references in relocatable files are chained: each location in which the linker will put the resolved value holds another address that will also be the destination for the resolved value, and so on until a value of absolute zero is found.
This link item points to the head of such a chain: the value field holds the location and the symbol field is the external symbol name.
For example, if a relocatable file contains:
- An absolute zero (16 bits) value at address 1234h in the code segment.
- A relocatable value "code 1234h" at address ABCDh in the data segment.
- A "chain external" link item whose address field is "data ABCDh" and whose symbol field is FOO.
then the linker, once it resolves the actual value of FOO, will put it in the "code 1234h" and "data ABCDh" addresses.
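The chain walk described above can be sketched as follows. This is a simplified model, not the actual linker logic: memory is a hypothetical dictionary keyed by (segment, address) pairs, each cell holds a whole 16 bit value tagged with its segment, and the function name `resolve_chain` is an assumption.

```python
END_OF_CHAIN = ("absolute", 0)


def resolve_chain(memory: dict, head: tuple, resolved: tuple) -> list:
    """Follow the chain starting at `head`, storing `resolved` in every link."""
    patched = []
    location = head
    while location != END_OF_CHAIN:
        next_location = memory[location]  # each link holds the next location
        memory[location] = resolved       # replace it with the resolved value
        patched.append(location)
        location = next_location
    return patched


# The example from the text: data ABCDh -> code 1234h -> absolute zero.
memory = {
    ("code", 0x1234): ("absolute", 0),
    ("data", 0xABCD): ("code", 0x1234),
}
# 4000h is an arbitrary value standing in for the resolved value of FOO.
patched = resolve_chain(memory, ("data", 0xABCD), ("absolute", 0x4000))
```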
Holds the name and the value of a symbol declared as public in the source code. The linker resolves external references by searching for matching public symbols in the other relocatable files involved in the linking process.
This link item is never generated by either MACRO-80 or Nestor80.
Expressions involving external symbols are usually stored as the full collection of expression parts in postfix format, to be evaluated by the linker during the linking process; however, for expressions of type "symbol+value" there's this dedicated link item type. Of course, the value field is the value to be added and the symbol field is the external symbol name.
Note that "symbol-value" expressions also get an "external plus offset" link item (the offset is stored as a two's complement negative number).
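A short sketch of how a negative offset fits in the 16 bit value field. The helper names are hypothetical, and the assumption that the linker adds the offset with 16 bit wraparound is mine, not something stated by the original manual.

```python
def offset_field(offset: int) -> int:
    """Store a signed offset as a 16 bit two's complement value."""
    return offset & 0xFFFF


def apply_offset(symbol_value: int, field: int) -> int:
    """Add the stored offset to the resolved symbol (assumed 16 bit wraparound)."""
    return (symbol_value + field) & 0xFFFF


field = offset_field(-5)              # as in an expression like FOO##-5
result = apply_offset(0x1000, field)  # with FOO resolved to 1000h
```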
The value is the size of the data segment, defined as the last address used in that segment plus one. For example, this code:
dseg
org 100h
db 0
end
will generate a relocatable file whose size for the data segment will be 101h.
Indicates that further items are to be considered included in the absolute segment, the code segment, the data segment or a COMMON block starting at the specified address. ASEG, CSEG, DSEG and COMMON instructions will generate an item of this type (for COMMON blocks a "Select COMMON block" item is first used to specify the block name).
Changing to the absolute, code or data segments will set the location counter at the last used address for that segment plus one; changing to a COMMON block always sets the location counter to zero.
For example, the following code:
cseg
org 100h
db 1,2
dseg
org 10h
db 3,4,5,6
common /FOO/
db 7,8
cseg
db 9,10
dseg
db 11,12
common /FOO/
db 13,14
end
will generate a relocatable file with the following items:
Set loading location counter to: code segment 0100h
1,2
Set loading location counter to: data segment 0010h
3,4,5,6
Select COMMON block: FOO
Set loading location counter to: common block 0000h
7,8
Set loading location counter to: code segment 0102h
9,10
Set loading location counter to: data segment 0014h
11,12
Select COMMON block: FOO
Set loading location counter to: common block 0000h
13,14
This item is similar to "Chain external" but is supposed to be used for relocatable addresses instead of external symbol references. However this item is never generated by MACRO-80 or Nestor80.
The value is the size of the code segment, defined as the last address used in that segment plus one. For example, this code:
cseg
org 100h
db 0
end
will generate a relocatable file whose size for the code segment will be 101h.
This item is generated by an END instruction found in code (or when the end of the source file is reached). The value of the item is the address passed as an argument to the END instruction if present, or absolute zero otherwise.
This item forces a byte boundary: after the item has been written to the relocatable file, as many zero bits as needed are added so that the total number of bits written is a multiple of 8.
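The padding rule can be sketched like this, again modelling the stream as a string of bits. The function name is hypothetical, and the byte C9h used in the example is arbitrary.

```python
def pad_to_byte_boundary(bits: str) -> str:
    """Append zero bits until the stream length is a multiple of 8."""
    return bits + "0" * (-len(bits) % 8)


# One absolute byte (9 bits) followed by an "End of program" item
# with an absolute zero address (3 + 4 + 2 + 16 = 25 bits).
end_of_program = "100" + "1110" + "00" + "0" * 16
stream = "0" + format(0xC9, "08b") + end_of_program
padded = pad_to_byte_boundary(stream)
```

Here the stream is 34 bits long before padding, so 6 zero bits are added to reach 40 bits (5 bytes). A stream that is already aligned gets no padding.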
This is the only link item that has neither a value nor a symbol field; thus its encoding in the relocatable file is the fixed sequence of bits 100 1111. After this item is found the remaining content of the file (if any) is ignored.
When the link item type code is "Extension link item" the symbol field of the item is to be interpreted as follows:
- First byte: Extension link item type code.
- Remaining bytes (up to 6): Depends on the type of extension link item.
The following extension link items are generated by MACRO-80 and Nestor80; they represent expressions that need to be evaluated at link time. This happens in two cases: expressions that contain external symbol references, and non-absolute values that need to be stored as one single byte.
The symbol bytes field will have two bytes: the fixed value 41h, and an arithmetic operator code. These are the defined arithmetic operator codes:
- 1: "Store as byte". Placed at the end of a list of link items that represent a postfix expression, indicates that the result of the expression is to be stored in the linked file as one single byte.
- 2: "Store as word". Placed at the end of a list of link items that represent a postfix expression, indicates that the result of the expression is to be stored in the linked file as a two byte value.
- 3: "High byte" operator.
- 4: "Low byte" operator.
- 5: "Not" operator.
- 6: Unary minus.
- 7: Subtraction.
- 8: Addition.
- 9: Multiplication.
- 10: Division.
- 11: Modulus (remainder of integer division).
Note that not all the existing operators are in the list (for example, AND and OR are missing). The extended relocatable file format defines additional codes for the missing operators; these are supported by Linkstor80 but not by LINK-80. Trying to use the additional operators in an expression with external symbol references will result in an error at assembly time when using MACRO-80, and also when using Nestor80 with the --link80-compatibility argument.
Here's an example of a complete link item representing the "Not" operator:
100 0100 010 41h 05h
type length ext. type operator
The symbol bytes field will have one byte with the fixed value 42h, and then from one to six bytes with the ASCII representation of the symbol. Example for symbol XYZ:
100 0100 100 42h XYZ
type length ext. type symbol
The symbol bytes field will have four bytes:
- The fixed value 43h
- The segment of the value (00 = absolute, 01 = code, 10 = data, 11 = common)
- The low byte of the value
- The high byte of the value
Example for the value 1234h in the data segment:
100 0100 100 43h 02h 34h 12h
type length ext. type segment value
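The three extension link items described above share the same outer layout, which can be sketched as follows. All function names are hypothetical, the stream is modelled as a string of bits, and only symbol fields of up to 7 bytes are handled.

```python
def extension_item(payload: bytes) -> str:
    """Link item type 4 whose symbol bytes field holds `payload` (at most 7 bytes)."""
    if len(payload) > 7:
        raise ValueError("legacy symbol fields hold at most 7 bytes")
    return ("100" + "0100" + format(len(payload), "03b")
            + "".join(format(b, "08b") for b in payload))


def operator_item(code: int) -> str:
    return extension_item(bytes([0x41, code]))


def symbol_item(name: str) -> str:
    return extension_item(bytes([0x42]) + name.encode("ascii"))


def value_item(segment: int, value: int) -> str:
    return extension_item(bytes([0x43, segment, value & 0xFF, (value >> 8) & 0xFF]))


not_operator = operator_item(5)           # the "Not" operator example
xyz_reference = symbol_item("XYZ")        # the XYZ symbol example
data_value = value_item(0b10, 0x1234)     # 1234h in the data segment
```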
The assembler instruction LD A,3+(NOT FOO##) would generate the following sequence of extension link items in the generated relocatable file (notice the postfix format and the "Store as byte" operator, needed because the argument of LD A,n is one single byte):
Value, absolute 0003h
External symbol reference, FOO
Arithmetic operator, NOT
Arithmetic operator, Plus
Arithmetic operator, Store as byte
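A linker can evaluate such a sequence with a simple stack machine. The sketch below is an illustration under stated assumptions, not the actual LINK-80 or Linkstor80 logic: values are treated as unsigned 16 bit numbers, items are modelled as hypothetical (kind, argument) tuples, and the external symbol is assumed to have already been resolved.

```python
UNARY = {
    3: lambda a: a >> 8,    # high byte
    4: lambda a: a & 0xFF,  # low byte
    5: lambda a: ~a,        # not
    6: lambda a: -a,        # unary minus
}
BINARY = {
    7: lambda a, b: a - b,
    8: lambda a, b: a + b,
    9: lambda a, b: a * b,
    10: lambda a, b: a // b,
    11: lambda a, b: a % b,
}


def evaluate(items, symbols):
    """Evaluate a postfix item list; return (result, size in bytes)."""
    stack = []
    for kind, arg in items:
        if kind == "value":
            stack.append(arg & 0xFFFF)
        elif kind == "symbol":
            stack.append(symbols[arg] & 0xFFFF)
        elif arg == 1:  # store as byte
            return stack.pop() & 0xFF, 1
        elif arg == 2:  # store as word
            return stack.pop() & 0xFFFF, 2
        elif arg in UNARY:
            stack.append(UNARY[arg](stack.pop()) & 0xFFFF)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[arg](left, right) & 0xFFFF)
    raise ValueError("expression has no store operator")


# LD A,3+(NOT FOO##), with FOO assumed to resolve to FFF0h.
items = [("value", 3), ("symbol", "FOO"), ("operator", 5),
         ("operator", 8), ("operator", 1)]
result = evaluate(items, {"FOO": 0xFFF0})
```

With FOO resolved to FFF0h, NOT yields 000Fh, adding 3 gives 0012h, and the final operator stores it as the single byte 12h.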
A library file is a file that contains a collection of programs (each program is an individual relocatable file). Library files are managed with the LIB-80 tool.
The file structure of library files is pretty simple: they just contain the relocatable files for the programs concatenated one after another. The "End of program" items mark the boundaries between the end of one program and the start of the following one; that is, after an "End of program" item there's always either an "End of file" item or the beginning of another program.
The fact that "End of program" items force a byte boundary implies that the programs in a library file are truly independent of each other at the file contents level, thus it's possible to add new programs to the library by just concatenating their .REL files at the end of the library file, without having to worry about parsing the previous contents of the file (except for removing the old "End of file" item and adding a new one after the new program).
The extended relocatable file format defined for usage with Nestor80 and Linkstor80 is the same as the old relocatable file format used by MACRO-80 and LINK-80, with the following additions. These additions are backwards-compatible, meaning that old relocatable files created with MACRO-80 can be processed with Linkstor80.
An extended relocatable file always starts with the following fixed sequence of bytes:
85 D3 13 92 D4 D5 13 D4 A5 00 00 13 8F FF F0 9E
This sequence was not chosen randomly: it's actually the encoding of the following sequence of link items:
- Program name, LNKSTOR.
- Define size of data segment, 0.
- End of program, address FFFFh.
- End of file.
The header is defined in this way so that it will be identified as an empty program by LINK-80 in case this tool is mistakenly used instead of Linkstor80 to process the file.
Thus if the file starts with the header then it follows the extended relocatable file format, otherwise it can be assumed that it follows the old MACRO-80 and LINK-80 compatible format.
Note also that in library files the header is per program, not per file.
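The claim that the header decodes to those four link items can be checked with a small decoder. This is a minimal sketch: the function name `decode_items` and the tuple layout are hypothetical, it assumes bits are read most significant bit first, and it handles only the legacy format (no extended symbol fields).

```python
HEADER = bytes.fromhex("85D31392D4D513D4A50000138FFFF09E")


def decode_items(data: bytes) -> list:
    """Decode legacy items until an "End of file" link item is found."""
    pos = 0  # position in bits

    def read(count: int) -> int:
        nonlocal pos
        value = 0
        for _ in range(count):
            value = (value << 1) | ((data[pos // 8] >> (7 - pos % 8)) & 1)
            pos += 1
        return value

    def read_word() -> int:
        low = read(8)
        return low | (read(8) << 8)  # little endian

    items = []
    while True:
        if read(1) == 0:
            items.append(("byte", read(8)))
            continue
        segment = read(2)
        if segment != 0:
            items.append(("relocatable", segment, read_word()))
            continue
        type_code = read(4)
        address = symbol = None
        if 5 <= type_code <= 14:
            address_segment = read(2)
            address = (address_segment, read_word())
        if type_code <= 7:
            length = read(3)
            symbol = bytes(read(8) for _ in range(length))
        items.append(("link", type_code, address, symbol))
        if type_code == 14:   # End of program forces a byte boundary
            pos += -pos % 8
        if type_code == 15:   # End of file
            return items


header_items = decode_items(HEADER)
is_extended = HEADER.startswith(HEADER)  # how a reader would test real file data
```

Decoding the header yields a program name item (type 2) with the symbol LNKSTOR, a data segment size item (type 10) with absolute value 0, an end of program item (type 14) with absolute address FFFFh, and an end of file item (type 15).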
The structure of link items is extended to allow for symbol bytes field lengths of up to 4G bytes (the actual length is stored in at most 4 bytes). The structure of these extended fields is as follows:
- The legacy length of the field is between 2 and 5 bytes.
- The first byte of the legacy field contents is FFh.
- Then there's between 1 and 4 bytes with the actual length of the symbol contents, in little endian. As few bytes as needed to represent the length are used, so one byte for lengths up to 255, two bytes for lengths 256 to 65535, etc.
- If the length of the symbol contents is 256 bytes or more, then after the bytes representing the length as many bits as needed are added (between 0 and 7) to force a byte boundary, as in the case of the End of program link item.
- Finally the symbol bytes themselves can be found.
For example, an extension link item of type "External symbol reference" for a symbol named INITIALIZE would be as follows. Note how the legacy length of the field is 2 bytes (FFh + one byte for the actual length) and the actual length is 11 (one byte for the extension link item type + 10 bytes for the symbol itself):
100 0100 010 FFh 0Bh 42h INITIALIZE
type length ext. type symbol
Here's another example, a "Program name" item with the symbol length being 260 bytes (104h):
100 0010 011 FFh 04h 01h 0...0 EN_UN_LUGAR_DE_LA_MANCHA...
type length force symbol (260 bytes)
byte boundary
Symbol fields are represented in the extended format in two cases:
- The length of the field is 8 bytes or higher.
- The length of the field is between 2 and 7 bytes, and the first actual byte of the field contents is FFh.
If the symbol field contents consists of just the single byte FFh then it's represented in the old format:
100 0010 001 FFh
type length
A file conforming to the extended relocatable file format cannot have link items with a symbol field having a length of 6 or 7 and having the first byte of the field equal to FFh.
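The rules for choosing between the legacy and the extended representation of a symbol field can be sketched as follows. The function names are hypothetical and the stream is modelled as a string of bits; the existing stream is passed in because the byte boundary padding depends on how many bits have already been written.

```python
def to_bits(data: bytes) -> str:
    return "".join(format(b, "08b") for b in data)


def append_symbol_field(stream: str, contents: bytes) -> str:
    """Append a symbol bytes field to `stream`, extended only when required."""
    size = len(contents)
    extended = size >= 8 or (2 <= size <= 7 and contents[0] == 0xFF)
    if not extended:
        return stream + format(size, "03b") + to_bits(contents)
    # As few length bytes as needed, in little endian.
    length_bytes = size.to_bytes(max(1, (size.bit_length() + 7) // 8), "little")
    stream += format(1 + len(length_bytes), "03b")   # legacy length, 2 to 5
    stream += to_bits(b"\xff" + length_bytes)
    if size >= 256:
        stream += "0" * (-len(stream) % 8)           # force a byte boundary
    return stream + to_bits(contents)


# The two examples from this section, plus the single FFh special case.
reference = append_symbol_field("100" + "0100", b"\x42INITIALIZE")
long_name = append_symbol_field("100" + "0010", b"A" * 260)
single_ff = append_symbol_field("100" + "0010", b"\xff")
```

The 260 bytes of "A" are a placeholder for an arbitrary long program name; the leading bits passed as `stream` are the fixed bits and the type code of each link item.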
The character encoding used to represent symbols in the symbol fields is UTF-8 in all cases, regardless of whether the old short format or the new extended format is used. UTF-8 is backwards compatible with 7 bit ASCII, the only encoding supported by MACRO-80 and LINK-80 for symbols, so this is not a breaking change.
Additionally, the casing of all symbols is preserved (in MACRO-80 compatibility mode they get uppercased), although symbols are still considered as case-insensitive when comparing for equality both in Nestor80 and in Linkstor80.
The extended relocatable file format defines the following additional arithmetic operator codes:
- 16: Shift right
- 17: Shift left
- 18: Equals
- 19: Not equals
- 20: Less than
- 21: Less than or equal
- 22: Greater than
- 23: Greater than or equal
- 24: Bitwise AND
- 25: Bitwise OR
- 26: Bitwise XOR
A library file is a concatenation of relocatable programs, and thus there is no special format defined for it. However, it's important to be aware that when using the extended relocatable file format, the file header appears once per program, not once per file. A library file can contain a mix of programs conforming to the old LINK-80 format and programs conforming to the extended format defined here; each program in the extended format will have its own header.
As an example, this is the outline of a library file containing three programs, where only the first and the last one conform to the extended format:
extended file format header
"program name" link item
...
"end of program" link item
"program name" link item
...
"end of program" link item
extended file format header
"program name" link item
...
"end of program" link item
"end of file" link item