Note

This page preserves the writings of Christophe Lavarenne (1956-2011) about FreeForth.
FreeForth2 differs from the original in certain details called out in notes below.

[FreeForth Home Page] [FreeForth Code Generation]

FreeForth Primer

Forthers please note: FreeForth has some specific features uncommon in other Forth dialects, which you should be aware of before you start using FreeForth; they are highlighted below in slanted green, like this paragraph.

PRIMER SUMMARY:

Distribution License

FreeForth is public domain software. Its license offers total freedom to its users, as stated in its LICENSE file.

Note

FreeForth2 is covered by the Apache-2.0 license.

FreeForth isn't just a toy: it is a small, simple, and efficient interactive compiler generating compact and fast i386 native code. Since its creation in 2006, I've been using it for my everyday work, as a handy desktop calculator for decimal, hex, binary, and floating-point operations, for interactive cross-development of industrial real-time security applications embedded in microcontrollers, and for PC-controlled industrial test-benches.

Getting Started

In the following text, commands that you have to type will be in blue fixed font, and what the computer displays will be in black fixed font.

Distribution Files

Note

File links point to the original FreeForth files.

───────────
LICENSE      License covering FreeForth source files
README       text file, with the same installation instructions as hereunder
───────────
fflin.asm    master file for compiling Linux executable
fflinio.asm  Linux specific file-I/O and dynamic-link libraries interface
fflin.boot   Linux specific boot source
ff           Linux executable generated by fasm from fflin.asm
───────────
ffwin.asm    master file for compiling Windows executable
ffwinio.asm  Windows specific file-I/O and dynamic-link libraries interface
ffwin.boot   Windows specific boot source
ff.exe       Windows executable generated by fasm from ffwin.asm
───────────
ff.asm       core; ----- READ COMMENTS STARTING THIS FILE -----
ff.boot      core extensions; included in the executable, compiled at boot
ff.ff        automatically compiled at boot if found in FF-root-directory:
             Linux: $HOME/ff/ or ./   Windows: %FFROOT% or C:\ff\ or .\
ff.help      online documentation, for use with help
───────────
bed.ff       minimalist block-editor
hello        Linux example self-executable file
hanoi        animated Hanoi towers (non-recursive implementation)
───────────

Installation under Linux

Tip

FreeForth2 installation is described in the README.md

Create $HOME/ff as FreeForth-root-directory and unzip ffXXXXXX.zip into it:

mkdir $HOME/ff
unzip ffXXXXXX.zip -d $HOME/ff

Alternatively, create $HOME/ff as a link to your preferred FF-root-directory:

ln -s /path/to/preferedFFrootdir $HOME/ff
unzip ffXXXXXX.zip -d $HOME/ff

If you also want FreeForth source files of your own to be self-executable even if not in FreeForth-root-directory, put as their first line (warning: the #! must be the file's first two characters):

#!/usr/bin/ff needs

and create /usr/bin/ff as a link to $HOME/ff/ff (you must be SuperUser):

sudo ln -s $HOME/ff/ff /usr/bin/ff

Then ff can be executed from any directory, as it will find ff.ff and ff.help (and any filename prefixed with a single quote, such as 'ff.ff or 'ff.help) in the FreeForth-root-directory.

Installation under Windows

Create C:\ff\ as FreeForth-root-directory and unzip ffXXXXXX.zip into it. For example, from a DOS window type:

md C:\ff
cd C:\ff
unzip ffXXXXXX.zip

Alternatively, define the environment variable %FFROOT% as a path to your preferred FF-root-directory with a final \, such as:

set FFROOT=C:\path\to\preferedFFrootdir\

Then add the FreeForth-root-directory to your %PATH% environment variable:

set PATH=C:\ff;%PATH%

Then ff can be executed from any directory, as it will find ff.ff and ff.help (and any filename prefixed with a single quote, such as 'ff.ff or 'ff.help) in the FreeForth-root-directory.

First Steps

To start FreeForth, at the command-line prompt (in a terminal window under Linux, or in a DOS-command window under Windows), simply type:

ff

This displays the FreeForth welcome banner, ending with the FreeForth prompt ("0;"):

\ Loading'ff.ff: help normal .now {{{ +longconds f. uart! hid'm
\ FreeForth 1.2 <http://christophe.lavarenne.free.fr/ff>  Try: help
 0;

Tip

FreeForth2 has no preamble before the prompt. To see the loaded features, type -v as an argument to ff or at the prompt.

Now enjoy the "Hanoi Towers" animated demo by typing at the FreeForth prompt:

 0; needs  'hanoi

Tip

The leading ' above is replaced by the FreeForth-root-directory to make the full file name. FreeForth2 uses a search path instead which includes the current directory - type needs hanoi in the directory with the hanoi file.

I've put two spaces between needs and 'hanoi to make sure you'll see them clearly separated; you don't need to type two spaces, but at least one; one is usually enough between words.

When you are finished playing with the demo, let's compute your age in days: FreeForth understands date (and/or time) literals, which it converts into sequential day numbers; these are then trivial to use with integer arithmetic operators (using postfix notation, as explained hereunder):

 0; 2008-10-14  1956-7-23  -  .  ;
19076 0;

Born on July 23rd 1956, I was 19076 days old on October 14th 2008.
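
The day count can be cross-checked outside FreeForth. The following is a minimal sketch in Python (not FreeForth code), using the standard datetime module and the two date literals typed at the prompt above:

```python
from datetime import date

# Same subtraction as the FreeForth phrase `2008-10-14 1956-7-23 -`
days_old = (date(2008, 10, 14) - date(1956, 7, 23)).days
print(days_old)  # 19076
```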

Forthers note the semicolon after the dot: it is needed to trigger the execution of this "anonymous" definition, as explained hereunder.

Tip

FreeForth2 supplies the final semicolon automatically.

FreeForth also comes with a full-fledged floating-point package; let's verify that the tangent of pi/4 is equal to 1.0:

19076 0; fpi  4.  f/  ftan  f.  ;
1.000e0 0;

Tip

With FreeForth2, load the floating-point words first by typing needs full.ff.

To pass a command-line to the command-shell, simply type it after a double exclamation mark:

1.000e0 0; !! echo Works under Linux or Windows
Works under Linux or Windows
 0;

To exit FreeForth and return to the shell command-line prompt, simply type:

 0; bye

Getting Online Help

To get online help, as the FreeForth welcome banner encourages you to ("Try: help"), at the FreeForth prompt type:

help

this will display the introduction record of the online help system:

help` ( <name> -- ) displays name's related help (try: help bye)
  Items with no stack diagram are concepts, or hidden non-compilable words.
  Beginner: start with "help DATAstack" to understand "stack diagrams".
  Forth geek: look at "conditionals" and "flow-control" which are unusual,
    and at "anonymous" and "backquote" which are specific to FreeForth.
  Curious: start with the comments at beginning of file "ff.asm".
  see also: man win32.hlp bye ff.help  ff.ff
 0;

For example, if you want help about the ff.help file mentioned in the last line ("see also:") above, type help ff.help; this will display the "ff.help" record in the ff.help file:

ff.help is the FreeForth help file, expected by "help`" (defined in ff.ff)
  to be found in FreeForth root directory (where ff.ff is found at boot time).
  Users may create their own help files following the same simple text format,
  and add them into "help`" definition, as explained in commented sample line.
  see also: ff.asm ff.boot ff.ff bed.ff hello
 0;

You can also directly read the plain-text ff.help file, which is organized in chapters grouping related topics and words.

Before playing more with FreeForth, let's first learn a bit more about its trivial syntax, handy literals, data flow (use of stacks), and control flow.

FreeForth Syntax

FreeForth (as other Forth dialects) uses a trivial syntax: any string of any characters except blank (space, tab, newline, etc) is a symbol or word, separated from other words by one or more blanks (whereas most other languages restrict their symbols to use only alphabetic or numeric characters, and use other "special" characters as separators). Here are seven examples of valid words:

5 TIMES  r 2* .  REPEAT ;

where the dot and semicolon are simple single-character words. Note that FreeForth is case-sensitive: you must spell words with the correct combination of uppercase and/or lowercase letters. So, don't fool yourself or others by mixing upper and lowercase letters, except maybe to separate words as in SeveralWordsInOne.

Most words represent a constant value, either literally, where the value is explicitly given by the characters composing the word name (such as the 5 above, literals syntax is fully specified hereunder), or symbolically, where the value is assigned to the word either explicitly by the programmer, to replace all the occurrences of a literal constant or constant expression with a meaningful name defined in a single place (see help constant), or implicitly by the compiler, when the programmer defines a data buffer (the compiler allocates memory and assigns its base address to the programmer-defined word, see help create), or a code label (the compiler assigns the current code address to the programmer-defined word, see help :).

What other languages call "expression" or "instruction" is in FreeForth a simple sequence of words called phrase, that the programmer may separate from other phrases by double space or a newline for easier reading, as in the above example which contains three phrases. Phrasing and indentation are mostly useful, as in every language, to exhibit the control structure of program flow; FreeForth offers a rich set of predefined control-structure words, which the programmer may extend at will if needed.

What other languages call "function" or "subroutine" is in FreeForth a sequence of words called definition, beginning with the single-character word : (colon) followed by one word as the definition name, and ending with the single-character word ; (semicolon). The colon and its following definition name may be omitted, as in the above example of seven valid words: such an anonymous definition, as it has no name to reference it later, is executed (and "forgotten") immediately when the semicolon is compiled (try it: it should display 8 6 4 2 0). This is where FreeForth provides its interactivity.

Forthers note this is the most important difference between FreeForth and other Forth dialects, which instead provide interactivity with the help of an "interpret" state outside definitions (as opposed to a "compile" state within definitions), where every word is immediately executed, which cannot support forward-references of regular control-structures words. By comparison, FreeForth is always in "compile" state because always within definitions, whether named or anonymous.

What other languages call "executable code and data space" is in FreeForth the memory space where all compiled definitions are stored, naturally called the dictionary, with a separate headers space storing the definitions names (for lookup during the compilation of a new definition), that other languages call "compilation symbol table" (and use only during compilation, and strip from the executable file, except when the debug option is set). Type words to list the contents of the headers space.

When FreeForth source code is compiled, each word is looked up in turn in the headers space; if it is found, it is either immediately executed if found with an appended backquote (it's then called a macro), or its execution is postponed (i.e. it is "compiled" into a call assembly instruction jumping to the definition entry point); otherwise, an unfound word is converted into a literal string or number, depending on its last character (see hereunder for more details), or raises an exception with the error message <-error: ??? if the conversion fails.
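
The decision taken for each word can be pictured with a small Python model. This is only a sketch of the paragraph above, not FreeForth's implementation: the names compile_word and headers are hypothetical, the order of the two lookups is an assumption, and literal conversion is reduced to plain decimal integers.

```python
def compile_word(word, headers):
    """Describe what the compiler would do with one source word."""
    if word + "`" in headers:      # header stored with a backquote: a macro
        return ("execute-now", word + "`")
    if word in headers:            # regular definition: compile a call to it
        return ("compile-call", word)
    try:                           # unfound word: try a (decimal) literal
        return ("compile-literal", int(word))
    except ValueError:
        return ("error", "<-error: ???")

# Hypothetical headers space holding three regular words and one macro.
headers = {"dup", "+", ".", "IF`"}
for w in ["dup", "IF", "42", "foo"]:
    print(w, "->", compile_word(w, headers))
```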

Forthers note FreeForth backquote-suffix makes defining and using macros much easier than the IMMEDIATE and POSTPONE genuine Forth words: see the ff.boot kernel source file for lots of examples.

FreeForth comments may be inserted in different ways within a source file:

  • within a line, or across several lines:
    • the single-character word ( (left bracket) opens the comment, which is closed by the first character ) (right bracket) found either on the same source line, or within a later source line (this is the equivalent of the /*...*/ in the C language)
  • up to end of line:
    • the single-character word \ (backslash) opens a comment, which is closed by the end of line (this is the equivalent of the // in C++ and C99); don't forget to insert at least one space after the backslash, otherwise the immediately following newline will be "eaten" as blank separator, and the next line will be commented out up to its end of line!
  • up to end of file:
    • the word EOF opens a comment up to the end of file; this may be a convenient place to add lots of comments such as documentation or versions followup.

You can also use comments when editing a command line at the FreeForth command prompt, if you ever find it useful; but as the "source file" is then limited to this single command line, the three commenting words are almost equivalent, except that a bracket-comment may be closed before the end of the command line.

FreeForth Literals

Forthers note this is very different from other Forth dialects, mainly for string literals, whose embedded spaces must be replaced with underscores.

Tip

FreeForth2 permits space characters within strings.

As seen in FreeForth Syntax, when during compilation a word is not found in the dictionary headers space, it is converted to a literal string or number depending on its final character:

  • a final " (double-quote) marks a literal string, whose initial character is processed specially:

    • an initial . (dot) compiles a counted string (i.e. with a first byte equal to the string byte-size, and a null byte appended after the last string byte) preceded by a call which, when later executed, will display the string contents; for example, ."Hello_world!" will display Hello world!
    • an initial ! (exclamation mark) compiles a counted string preceded by a call which, when later executed, will throw an exception with as argument the address of the counted string, such that the exception handler which catches the exception can display the string, typically as error message; for example, !"divide_by_zero" will display <-error: divide by zero
    • an initial " (double-quote) compiles a counted string preceded by a call which, when later executed, will return the string's first character address and its byte-size ( -- @ # )
    • an initial , (comma) compiles the string standalone, i.e. with no count and no preceding call; this is useful to compile inline any data and/or code
    • any other initial character with a final double-quote raises an exception with the error message <-error: ???

    Furthermore, five characters within literal strings are processed specially:

    • every _ (underscore) is replaced with a space -- i.e. you must replace every space within a string with an underscore, otherwise the space would be considered as a separator between two words
    • any character prefixed with a ^ (caret) is replaced with the corresponding "control" character with an ASCII value 64 less; for example, ^J is replaced with the ASCII 74-64 = ASCII 10, i.e. "newline"
    • any character suffixed with a ~ (tilde) is replaced with the corresponding byte with an ASCII value 128 more; for example, J~ is replaced with the ASCII 74+128 = ASCII 202
    • any " (double-quote) is ignored and skipped (such as after the initial dot or exclamation mark)
    • any character prefixed with a \ (backslash) is left as is, which is useful to "escape" these five special characters

    For example, ."Hello_\_my\__world!^J" will display Hello _my_ world! followed by a newline.

  • a final arithmetic operator character (among + - * / % & | ^) calls number to convert the string (without the final character) into a number, or if it fails calls find to look in the compiler symbol table for a header matching the string (without final), and if found with a "data" type, gets the number in its pointer field (or if it fails raises an exception with the error message <-error: ???); then these finals compile their corresponding binary-op instruction (+:add -:sub *:mul /:div %:mod &:and |:or ^:xor) with the above number as immediate argument, and DATAstack top as destination; simple and efficient "manual" optimization: for example, 3+ is faster and smaller than 3 +

  • a final @ (at sign) calls find as above, and compiles runtime code which will read the memory cell at that address and push its contents onto DATAstack; with a variable named var for example, var@ is faster and smaller than var @

  • a final ! (exclamation-mark) calls find as above, and compiles runtime code which will store DATAstack top at the address and pop DATAstack; with a variable named var for example, var! is faster and smaller than var !

  • a final _ (underscore) calls number or find as above, and compiles runtime code which will replace DATAstack top with the obtained number; for example, 5_ is faster and smaller than drop 5, as it saves a drop (and a push)

  • a final , (comma) calls number or find as above, and compiles runtime code which will store the obtained number at the location pointed at runtime by the compilation pointer (i386 register ebp), which will have to be incremented separately; useful for macros such as s1 which compile inline assembly code

  • any other final calls number to convert the string (with its final) into a number (or if it fails raises an exception with the error message <-error: ???), and compiles runtime code which will push that number onto the DATAstack ( -- n ).
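
The five special characters of string literals, listed earlier in this section, can be modelled in Python. This is an illustrative sketch rather than FreeForth's code: decode_string_body is a hypothetical name, and it receives the word without its initial character (the dot, exclamation mark, double-quote, or comma).

```python
def decode_string_body(body):
    """Apply the five special-character rules to a string literal body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):     # \x : keep x as is
            out.append(body[i + 1])
            i += 2
        elif ch == "^" and i + 1 < len(body):    # ^x : character code minus 64
            out.append(chr((ord(body[i + 1]) - 64) % 256))
            i += 2
        elif ch == "~" and out:                  # x~ : character code plus 128
            out[-1] = chr((ord(out[-1]) + 128) % 256)
            i += 1
        elif ch == '"':                          # double-quotes are skipped
            i += 1
        else:
            out.append(" " if ch == "_" else ch)  # _ stands for a space
            i += 1
    return "".join(out)

word = '."Hello_\\_my\\__world!^J"'           # the example word from the text
print(repr(decode_string_body(word[1:])))     # 'Hello _my_ world!\n'
```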

number converts the string, with an optional initial minus sign - (dash), starting with a decimal conversion base, which may be overridden, anywhere and any number of times within the string, as follows:

  • $ changes the conversion base to 16 (hexadecimal)
  • & changes the conversion base to 8 (octal)
  • % changes the conversion base to 2 (binary)
  • # changes the conversion base to the number converted so far

For example, the following literal numbers all give the same value:

18  $12  &22  %10010  3#200  12#16  %10&2  9%0  &11%0
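
These base-switching rules are easy to check with a short Python model. It is a sketch under assumptions, not the actual number word: in particular, restarting the accumulated value at zero after # is inferred from the examples 3#200 and 12#16, and accepting letters in either case as digits is a choice of this model.

```python
def number(word):
    """Convert a literal using the base-switching rules (illustrative model)."""
    negative = word.startswith("-")
    digits = word[1:] if negative else word
    base, value = 10, 0
    for ch in digits:
        if ch == "$":
            base = 16
        elif ch == "&":
            base = 8
        elif ch == "%":
            base = 2
        elif ch == "#":
            base, value = value, 0   # assumed: accumulation restarts in the new base
        else:
            digit = int(ch, 36)      # 0-9, then letters as 10-35
            if digit >= base:
                raise ValueError("<-error: ???")
            value = value * base + digit
    return -value if negative else value

for w in "18 $12 &22 %10010 3#200 12#16 %10&2 9%0 &11%0".split():
    print(w, number(w))              # every line ends with 18
```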

number also interprets a few other special characters:

  • : (colon) multiplies by 60 the number converted so far, then adds to it the remaining string; this is typically for conversion of hours (or degrees), minutes, seconds, as for example:

    1:10:20 . ; 4220   \ = (1h *60m/h +10m) *60s/m +20s
    
  • _ (underscore) multiplies by 24 the number converted so far, then adds to it the remaining string; this is typically for conversion of days into hours, as for example:

    2_1 . ; 49   \ = 2d *24h/d +1h
    
  • - (dash) must be used twice to separate the three numbers of a date (year, month, day), and convert them into a sequential day number in the Gregorian calendar as for example:

    2000-3-1 . ; 730485
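
The three conversions above can be reproduced with a few lines of Python. This is a sketch, not FreeForth's algorithm: time_number and day_number are hypothetical names. The day formula is a common March-based Gregorian day count, chosen here because it yields 0 for 0000-3-1 and 730485 for 2000-3-1.

```python
def time_number(word):
    """Fold ':' (times 60) and '_' (times 24) separators into one number."""
    total, field = 0, 0
    for ch in word:
        if ch == ":":
            total, field = (total + field) * 60, 0
        elif ch == "_":
            total, field = (total + field) * 24, 0
        else:
            field = field * 10 + int(ch)
    return total + field

def day_number(year, month, day):
    """Days since 0000-3-1 in the proleptic Gregorian calendar."""
    if month < 3:      # treat January and February as months 13 and 14
        year -= 1      # of the previous year, so each year starts on March 1st
        month += 12
    return (365 * year + year // 4 - year // 100 + year // 400
            + (153 * (month - 3) + 2) // 5 + day - 1)

print(time_number("1:10:20"))    # 4220
print(time_number("2_1"))        # 49
print(day_number(2000, 3, 1))    # 730485
```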
    

The following words (defined in the "Calendar date and time display" section in the ff.ff file) use the same characters to display date and/or time:

.d  \ n -- ; displays n (days since 0000-3-1) as a date
  730485 .d ;  2000-3-1
.t  \ n -- ; displays n (seconds since midnight) as a time
  47700 .t ;  13:15:00
.dt \ n -- ; displays n (seconds since 2000-3-1) as a date and time
  300'000'000 .dt ;  2009-9-2_5:20:0
.now` \ -- ; displays system's current date and time
  .now  wed 2009-8-12_16:4:37

FreeForth Data Flow

FreeForth (as other Forth dialects) uses two LIFO stacks (Last-In-First-Out, i.e. the last item pushed onto the stack will be the first one popped from it), the DATAstack for passing subroutine arguments and results, and the CALLstack for saving/restoring subroutines return addresses, intermediate results, and for storing loop counters.

The DATAstack is implicitly used by every word for receiving its input arguments and for returning its output results. This is conventionally documented in a word definition after the definition name, by a DATAstack-effect-diagram comment describing the DATAstack states before and after the word execution, separated by -- (double dash):

  • the left DATAstack state lists the input arguments
  • the right DATAstack state lists the output results

Each list item is a single word, in the sense of FreeForth Syntax, with the rightmost (last in the list) representing the top of DATAstack.

For example, the single-character word + (plus), defined to execute an addition taking two input arguments and returning a single output result equal to the sum of the second and top DATAstack items, could have either of the two equivalent following stack-diagram comments:

+ ( n2 n1 -- n ; n=n2+n1 )  \ the stack-diagram ends at the semicolon
+ ( x y -- x+y )            \ less verbose, x+y is the single output

A few words (the complete list is available in the ff.help file chapter "STACKS HANDLING") are defined only for their stack-effect, with no other arithmetic or logic operation or any side-effect other than rearranging the DATAstack, for example:

drop ( x -- )               \ discards top of DATAstack
dup  ( x -- x x )           \ duplicates top of DATAstack
over ( x y -- x y x )       \ duplicates second of DATAstack
swap ( x y -- y x )         \ exchanges second and top of DATAstack
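
To make these stack diagrams concrete, here is a small Python model in which the DATAstack is a list whose last element is the top of stack. It only illustrates the four stack effects above and says nothing about how FreeForth implements them.

```python
def drop(s):    # ( x -- )
    s.pop()

def dup(s):     # ( x -- x x )
    s.append(s[-1])

def over(s):    # ( x y -- x y x )
    s.append(s[-2])

def swap(s):    # ( x y -- y x )
    s[-1], s[-2] = s[-2], s[-1]

stack = [1, 2]  # the rightmost item is the top, as in stack diagrams
over(stack)
print(stack)    # [1, 2, 1]
swap(stack)
print(stack)    # [1, 1, 2]
dup(stack)
print(stack)    # [1, 1, 2, 2]
drop(stack)
print(stack)    # [1, 1, 2]
```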

The CALLstack is implicitly used by every word for saving/restoring its return address, and for storing loop counters on top of the CALLstack. It is explicitly used by the words >r dup>r r r> rdrop for saving/restoring intermediate results to free up the DATAstack top; this is conventionally documented in a word definition by a CALLstack-effect-diagram comment describing the CALLstack states before and after the word execution, separated by == (double equal), with the same conventions as for the DATAstack-effect-diagram, for example:

>r    ( x --   |   == x )
dup>r ( x -- x |   == x )
r     (   -- x | x == x )
r>    (   -- x | x ==   )
rdrop (   --   | x ==   )
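
The same kind of Python model, now with two lists, shows how these five words move items between the two stacks. The function names are hypothetical spellings of the FreeForth words, and return addresses are left out of the model.

```python
data, calls = [5], []    # DATAstack and CALLstack, tops on the right

def to_r():              # >r    ( x --   |   == x )
    calls.append(data.pop())

def dup_to_r():          # dup>r ( x -- x |   == x )
    calls.append(data[-1])

def r():                 # r     (   -- x | x == x )
    data.append(calls[-1])

def r_from():            # r>    (   -- x | x ==   )
    data.append(calls.pop())

def rdrop():             # rdrop (   --   | x ==   )
    calls.pop()

to_r()
print(data, calls)       # [] [5]
r()
print(data, calls)       # [5] [5]
dup_to_r()
print(data, calls)       # [5] [5, 5]
rdrop()
print(data, calls)       # [5] [5]
r_from()
print(data, calls)       # [5, 5] []
```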

When words are executed in sequence, their individual stack-effects accumulate into a single stack-effect; where two sequences join (after they have forked at a conditional forward jump, or after a backward jump), they should have the same cumulative stack-effect, otherwise the definition is probably buggy (this is a common error for beginners: now you're warned !-)

"Computing" cumulative stack-effects is trivial for short sequences; it is good practice to add at end of lines the current DATAstack state to make it easier to follow the data flow on the DATAstack, and check the DATAstack states match at control-flow joins.

A simple example:

: double  \ n -- 2n
  dup +   \ -- n+n
;

The colon defines the new word double which takes a single input argument n, and outputs a single result 2n which is the double of the input argument. double's definition specifies that to compute the double, the input argument is first duplicated on top of DATAstack by dup so the intermediate stack state is ( -- n n ), then the two items on top of DATAstack are added by + so the intermediate stack state is ( -- n+n ) and the cumulated stack effect is ( n -- n+n ). Finally, the semicolon closes double's definition.

Let's use anonymous definitions to test double with different values (note that, as you type the CarriageReturn/NewLine/Enter key after the semicolon to let FreeForth compile your edited line, the . (dot) word prints its output at the beginning of the next line):

0; 3 double . ;
6  0; -4 double . ;
-8  0;

The main arithmetic and logic words (the complete list is available in the ff.help file chapter "INTEGER ARITHMETIC") are:

  • + ( x y -- x+y ) addition
  • - ( x y -- x-y ) subtraction
  • * ( x y -- x*y ) product
  • / ( x y -- x/y ) division (truncated)
  • % ( x y -- x%y ) modulo
  • 1+ ( x -- x+1 ) increment
  • 1- ( x -- x-1 ) decrement
  • 2* ( x -- 2x ) arithmetic shift left
  • 2/ ( x -- x/2 ) arithmetic shift right (signed)
  • negate ( x -- -x ) opposite (2's complement)
  • ~ ( x -- ~x ) bitwise not (1's complement)
  • ^ ( x y -- x^y ) bitwise exclusive-or
  • | ( x y -- x|y ) bitwise inclusive-or
  • & ( x y -- x&y ) bitwise and
  • << ( x y -- x<<y ) logic shift left
  • >> ( x y -- x>>y ) logic shift right (unsigned)
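
Three of these words behave differently from their closest Python operators on negative numbers. The sketch below models them in Python, assuming 32-bit cells as on i386; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF                 # assumes 32-bit cells

def div_truncated(x, y):          # `/` : quotient rounded toward zero
    q = abs(x) // abs(y)
    return q if (x < 0) == (y < 0) else -q

def shift_right_arith(x):         # `2/` : the sign bit is replicated
    return x >> 1

def shift_right_logic(x, n):      # `>>` : zeros enter from the left
    return (x & MASK) >> n

print(div_truncated(-7, 2))       # -3 (Python's own -7 // 2 gives -4)
print(shift_right_arith(-7))      # -4
print(shift_right_logic(-8, 1))   # 2147483644
```

Note that under this model 2/ and a division by 2 disagree on odd negative numbers: the shift rounds toward minus infinity, the truncated division toward zero.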

Forthers note: these words are named after C operators. You can easily alias them (help alias) with ANS-Forth names if you like:

%` ' alias mod`
~` ' alias invert`
^` ' alias xor`
|` ' alias or`
&` ' alias and`
<<` ' alias lshift`
>>` ' alias rshift`

FreeForth Control Flow

As in most other languages, FreeForth default control flow is sequential: successive words are executed one after the other, except when executing flow-control words, which jump, i.e. disrupt the sequential execution, either conditionally or unconditionally.

Forthers note FreeForth control structures are quite different from other Forth dialects, mainly because conditional jumps test the i386 processor flags resulting from the last arithmetic or logic operation, and don't pop the DATAstack top (whereas most other Forth dialects test and pop the DATAstack top item); this sometimes requires an extra drop, but saves code space and run time.

Subroutine Call and Return

The most employed unconditional jumps are:

  • the subroutine call, which saves the address of the next instruction (aka "return address") by pushing it on top of the CALLstack, and jumps to the entry point of the called subroutine,
  • the subroutine return, which exits the subroutine by jumping to the address saved on top of the CALLstack, and pops the CALLstack.

FreeForth definitions may have several entry points, and several exit points, although they usually have only one entry point and one exit point (most other languages, and even most Forth dialects, don't support multi-entry subroutines, although it saves code space and execution time; but you may avoid using this feature if you don't like it).

Each entry point is declared with the word : (colon) followed by the entry point name, which may be any "word" as defined in the FreeForth Syntax chapter. An entry point must have been declared before its first use: forward references, i.e. forward jumps, are not supported for subroutine entry points (forward references are supported by other control structures).

  • to compile a subroutine call to an entry point of a definition, simply use the entry point's name, which must have been previously defined by the word : (colon), as seen in the double example above
  • to compile a subroutine return at the end of a definition, simply use the word ; (semicolon)
  • to compile a subroutine return inside a definition, use instead the word ;; (double semicolon); as this is an unconditional jump, the next word will never be executed, unless it is the target of another jump, i.e. it is preceded by a THEN or by an END
  • instead of the pair of words ;; THEN prefer the shorthand single word ;THEN
  • to compile a conditional subroutine return inside a definition, use the word 0; which checks the top of DATAstack, and if null drops it and returns, otherwise leaves it on top of DATAstack and continues sequential execution

Both ; (semicolon) and ;; (double semicolon) compile a ret assembly instruction, unless they are preceded by a call assembly instruction, that they change into a jmp assembly instruction: this is called tail recursion optimization. This may be used for simple loops jumping back to the definition entry point, such as for example:

: countdown  \ n --
  dup .      \ -- n ; duplicates and displays n
  0;  1-     \ -- n-1 ; if n null, drop&returns, otherwise decrements
  countdown  \ compiles a "call countdown" assembly instruction
;            \ changes the "call countdown" into "jmp countdown"
0; 5 countdown ;
5 4 3 2 1 0  0;
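
Once the final call has been turned into a jump, the definition behaves like a plain loop. The Python sketch below models that control flow only (it is not generated code) and collects the displayed values in a list.

```python
def countdown(n):
    shown = []
    while True:             # the "jmp countdown" produced by `countdown ;`
        shown.append(n)     # dup .  : display a copy of n
        if n == 0:          # 0;     : a null top is dropped and the word returns
            return shown
        n -= 1              # 1-

print(*countdown(5))        # 5 4 3 2 1 0
```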

Unstructured Conditional Jumps

Subroutine entries may also be used as target of conditional jumps with the help of the word ? (question mark), which checks that the last compiled instruction was a call (otherwise throws <-error: is not preceded by a call), and changes it into a conditional jump assembly instruction, whose condition must be specified by one of the following words (note pairs of condition specifiers in this list are opposite/complementary; if no condition is specified, 0<> is assumed by default):

0<> 0=  0< 0>=  0<= 0>  C0? C1?   \ --
<> =  < >=  <= >  u< u>=  u<= u>  \ x y -- x y

The condition specifiers on the first line select at compile time the conditional jump assembly instruction which will at run time check the i386 processor flags resulting from the last arithmetic or logic operation (Forthers note again: they don't pop the DATAstack):

  • 0<> ? compiles a jnz assembly instruction (i.e. jump if the last operation result was non-zero)
  • 0= ? compiles a jz assembly instruction (i.e. jump if the last operation result was zero)
  • 0< ? compiles a jl assembly instruction (i.e. jump if the last operation result was strictly less than zero)
  • 0>= ? compiles a jge assembly instruction (i.e. jump if the last operation result was greater than or equal to zero)
  • 0<= ? compiles a jle assembly instruction (i.e. jump if the last operation result was less than or equal to zero)
  • 0> ? compiles a jg assembly instruction (i.e. jump if the last operation result was strictly greater than zero)
  • C0? ? compiles a jnc assembly instruction (i.e. jump if the last operation cleared the carry flag)
  • C1? ? compiles a jc assembly instruction (i.e. jump if the last operation set the carry flag)

The condition specifiers on the second line first compile an assembly instruction comparing the second item NOS to the first item TOS on top of the DATAstack (cmp edx,ebx), then select at compile time the conditional jump assembly instruction which will at run time check the i386 flags resulting from the comparison (Forthers note again: they don't double-pop the DATAstack):

  • <> ? compiles a jnz assembly instruction (i.e. jump if NOS is different from TOS)
  • = ? compiles a jz assembly instruction (i.e. jump if NOS is equal to TOS)
  • < ? compiles a jl assembly instruction (i.e. jump if NOS is strictly less than TOS)
  • >= ? compiles a jge assembly instruction (i.e. jump if NOS is greater than or equal to TOS)
  • <= ? compiles a jle assembly instruction (i.e. jump if NOS is less than or equal to TOS)
  • > ? compiles a jg assembly instruction (i.e. jump if NOS is strictly greater than TOS)
  • u< ? compiles a jb assembly instruction (i.e. jump if NOS is strictly below TOS)
  • u>= ? compiles a jae assembly instruction (i.e. jump if NOS is above or equal to TOS)
  • u<= ? compiles a jbe assembly instruction (i.e. jump if NOS is below or equal to TOS)
  • u> ? compiles a ja assembly instruction (i.e. jump if NOS is strictly above TOS)

The last four condition specifiers are unsigned, i.e. above means "unsigned greater than", and below means "unsigned less than".
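
The difference between the signed and unsigned specifiers shows up as soon as one operand is negative. A minimal Python illustration, assuming 32-bit cells (the function names are hypothetical):

```python
MASK = 0xFFFFFFFF            # assumes 32-bit cells

def less(nos, tos):          # `<`  : signed comparison (jl)
    return nos < tos

def below(nos, tos):         # `u<` : unsigned comparison (jb)
    return (nos & MASK) < (tos & MASK)

print(less(-1, 1))           # True
print(below(-1, 1))          # False: -1 is the bit pattern 0xFFFFFFFF
```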

The countdown example at the end of the previous chapter may be rewritten using a conditional jump (instead of the conditional return):

: countdown  \ n --
  dup .      \ -- n ; duplicates and displays n
  1-         \ -- n-1 ; decrements n, this changes the i386 flags
  countdown  \ compiles a "call countdown" assembly instruction
  0>= ?      \ changes the "call countdown" into "jge countdown"
  drop       \ -- ; dispose of the -1 remaining on the DATAstack
;            \ this compiles a "ret" assembly instruction
0; 5 countdown ;
5 4 3 2 1 0  0;

Although this kind of unstructured conditional jumps may sometimes be useful, control-flow is usually better exhibited using control structures.

Two-Alternatives Control Structure

                  /---> alternativeIfTrue ---\
... compute cond?<(fork)                (join)> commonFollowingCode ...
                  \---> alternativeIfFalse---/

The above graphical representation of a two-alternatives control structure is composed of 5 parts:

  • compute is a place holder for arithmetic or logic code which modifies the i386 condition flags
  • cond is a place holder for a condition specifier, as described in the previous chapter Unstructured Conditional Jumps
  • alternativeIfTrue is a place holder for code to execute when the i386 condition flags match the condition specifier
  • alternativeIfFalse is a place holder for code to execute when the i386 condition flags don't match the condition specifier
  • commonFollowingCode is a place holder for code to execute after any of the two alternatives

Note the alternativeIfTrue and alternativeIfFalse both start with the same stacks state at the fork point; they should have the same stack effect, in order for the commonFollowingCode to start at the join point, whatever the executed alternative, with the same stacks state (in terms of stacks depths, of course, not in terms of stacks contents).

The above graphical representation must be ordered when specified with a textual/sequential language such as FreeForth, where this is done with the three words IF ELSE THEN:

... compute cond
IF   alternativeIfTrue
ELSE alternativeIfFalse
THEN
commonFollowingCode ...

If the alternativeIfFalse is empty, the ELSE may be omitted. If the alternativeIfTrue is empty, replace the cond with its opposite condition specifier, and omit the ELSE too. If cond is omitted, 0<> will be used as default condition specifier.

Tip

FreeForth2 requires an explicit condition or throws an error!

IF ELSE THEN are macros (i.e. executed at compile time) which resolve the forward jumps on the fly, in a single pass:

  • IF` compiles a conditional forward jump assembly instruction depending on the opposite of the condition specified by cond (because the jump must not be taken to execute the alternativeIfTrue), and pushes at compile time on the DATAstack the address of the jump-offset byte argument of the jump assembly instruction
  • ELSE` compiles an unconditional forward jump jmp assembly instruction, pushes at compile time on the DATAstack the address of the jump-offset byte argument of the jump assembly instruction, then pops at compile time the address pushed by the previous IF, computes the jump offset, and stores it at this address
  • THEN` resolves at compile time the previous forward jump, i.e. pops the address pushed by the previous ELSE (or IF if ELSE was omitted), computes the jump offset, and stores it at this address
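The single-pass resolution described above can be sketched with a small Python model. This is an illustration only, not the FreeForth implementation: the byte buffer, the `fixups` list standing in for the compile-time DATAstack, and the `body` helper are all assumptions made for the example; only the two i386 opcodes are real.

```python
code = bytearray()   # compiled machine code, modelled as a growing byte buffer
fixups = []          # stands in for the compile-time DATAstack

JMP_SHORT = 0xEB     # i386 "jmp rel8"
JGE_SHORT = 0x7D     # i386 "jge rel8", the opposite of "jl"

def emit_jump(opcode):
    """Compile a short jump with a placeholder offset; return the offset byte position."""
    code.append(opcode)
    code.append(0)
    return len(code) - 1

def resolve(pos):
    """Store the distance from the byte after the offset to the current position."""
    offset = len(code) - (pos + 1)
    if offset > 127:
        raise ValueError("jump off range")
    code[pos] = offset

def IF(opposite_opcode):
    fixups.append(emit_jump(opposite_opcode))

def ELSE():
    new = emit_jump(JMP_SHORT)   # jump over the false alternative
    resolve(fixups.pop())        # the IF jump lands just after this jmp
    fixups.append(new)

def THEN():
    resolve(fixups.pop())

def body(n):
    code.extend(b"\x90" * n)     # n nop bytes stand in for real code

# Model of: 0< IF <3 bytes> ELSE <4 bytes> THEN
IF(JGE_SHORT); body(3); ELSE(); body(4); THEN()
print(code.hex())   # 7d05909090eb0490909090
```

In the output, `7d 05` skips the 3-byte true alternative plus the 2-byte `jmp`, and `eb 04` skips the 4-byte false alternative. Because unresolved positions live on a stack, nested structures resolve in the right order without any second pass.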

Pushing/popping offset byte addresses at compile time allows structured nesting of IF ELSE THEN control structures, as for example:

: sign  \ n -- ; displays n sign
  0-    \ -- n ; compares n to zero to setup the i386 condition flags
  0< IF ."negative"    \ "0< IF" compiles "jge A" assembly instruction
  ELSE  \ non-negative \ "ELSE" compiles "jmp D; A:" and resolves "jge A"
    0= IF ."null"      \ "0= IF" compiles "jnz B" assembly instruction
    ELSE  ."positive"  \ "ELSE" compiles "jmp C; B:" and resolves "jnz B"
    THEN  \ -- n       \ "THEN" resolves "jmp C" (here is label "C:")
  THEN  \ -- n         \ "THEN" resolves "jmp D" (here is label "D:" = "C:")
  drop  \ -- ; dispose of n remaining on top of DATAstack
;
0; -1 sign ;
negative  0; 5 sign ;
positive  0;

Multi-Alternatives and Loop Control Structures

Multi-alternatives and loop control structures have a start point and an end point; they may contain several conditional and/or unconditional forward jumps to the end point; moreover loop control structures may contain several conditional and/or unconditional backward jumps to the start point.

The start point is marked by one of the three macros  BEGIN` START` RTIMES` :

  • BEGIN` first compiles as many nop assembly instructions as are needed (zero to three) to align the compilation pointer on an address multiple of 4 (this will reduce the number of clock cycles of any jump to this aligned address), then it saves/pushes on the DATAstack at compile time the previous 64-bits contents of the compiler variable `mrk and stores in each of its two 32-bits cells the current compilation pointer, i.e. the start point address; the second one won't change up to the end point, and will be used to resolve backward jumps; the first one will be used as a header link through all the forward jumps, to be resolved when reaching the end point
  • START` compiles a forward unconditional jump assembly instruction whose offset will have to be resolved by the ENTER` macro inside the control structure; then START` does the same thing as BEGIN`
  • RTIMES` does the same thing as BEGIN` and then compiles code which at run time will decrement the top of CALLstack and jump to the end point if the result is negative (i.e. if it was negative or null before the decrement); the loop-count on top of CALLstack is usually pushed from the DATAstack by >r before RTIMES
    • use the macro TIMES` as a shortcut for >r RTIMES
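The counting rule of RTIMES` (decrement first, leave if the result is negative) determines which values the loop body can read from the top of the CALLstack. A minimal Python model, assuming a plain list for the CALLstack and a hypothetical `times` helper:

```python
def times(n, body):
    """Model of "n TIMES ... REPEAT": call body with the count seen inside the loop."""
    call_stack = [n]             # >r moves the loop count to the CALLstack
    while True:
        call_stack[-1] -= 1      # RTIMES decrements the top of CALLstack
        if call_stack[-1] < 0:   # negative result: jump to the end point
            break
        body(call_stack[-1])     # the body may read the current count
    call_stack.pop()             # the loop count is disposed of at the end point

seen = []
times(3, seen.append)
print(seen)   # [2, 1, 0]
```

A count of n therefore runs the body n times, with the counter going from n-1 down to 0, and a null or negative count runs it zero times.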

The end point is marked by one of the three macros  END` REPEAT` UNTIL` :

  • END` compiles no code and resolves all the linked forward jumps, starting from the address contained in the first 32-bits cell of the `mrk compiler variable, until the link address is equal to the second 32-bits cell of the `mrk compiler variable; then, if the code compiled by RTIMES` is found at the start point, END` compiles an rdrop ( -- | n == ) to dispose of the loop count on top of the CALLstack; finally END` restores/pops from the DATAstack at compile time the contents of the `mrk compiler variable
  • REPEAT` compiles an unconditional backward jump to the start point; then it does the same thing as END`
  • UNTIL` compiles a conditional backward jump to the start point; as for every conditional jump, it must be preceded by a condition specifier (0<> by default), as defined in the above chapter Unstructured Conditional Jumps; then it does the same thing as END`

Pushing/popping the `mrk compiler variable at compile time allows structured nesting of multi-alternatives/loop control structures, possibly nested with two-alternatives control structures, such as for example:

\ anonymous definition: displays a multiplication table
9 TIMES  9 r -    \ -- 9-i     | == i ; i from 8 downto 0, 9-i from 1 upto 9
  9 TIMES  9 r -  \ -- 9-i 9-j | == j ; j from 8 downto 0, 9-j from 1 upto 9
    over*         \ -- 9-i (9-j)*(9-i) ; compute product
    9 <= drop IF space THEN .  \ -- 9-i ; display product right justified
  REPEAT  drop cr \ -- ; dispose of 9-i, "carriage-return" to newline
REPEAT ;          \ -- ; end of anonymous definition: execute it
 1  2  3  4  5  6  7  8  9
 2  4  6  8 10 12 14 16 18
 3  6  9 12 15 18 21 24 27
 4  8 12 16 20 24 28 32 36
 5 10 15 20 25 30 35 40 45
 6 12 18 24 30 36 42 48 54
 7 14 21 28 35 42 49 56 63
 8 16 24 32 40 48 56 64 72
 9 18 27 36 45 54 63 72 81
 0;

Between the control structure start and end points, any number (including zero) of conditional or unconditional, backward jumps to the start point, or forward jumps to the end point, may be specified with the four macros  TILL` WHILE` AGAIN` BREAK` :

  • TILL` compiles a conditional backward jump to the start point
  • WHILE` compiles a conditional forward jump to the end point
  • AGAIN` compiles an unconditional backward jump to the start point
  • BREAK` compiles an unconditional forward jump to the end point

TILL` and WHILE` as every conditional jump must be preceded by a condition specifier (0<> by default), as defined in the above chapter Unstructured Conditional Jumps.
AGAIN` and BREAK` as every unconditional jump must be preceded/balanced by an IF (and resolve this latter's byte-offset), otherwise the code following them would never be executed.

Note that, as WHILE` and BREAK` use the `mrk compiler variable to link the forward jumps of the multi-alternatives control structure, they can be embedded within a nest of two-alternatives control structures, which at compile time use the DATAstack to save/restore their byte-offset addresses, without interfering with the `mrk compiler variable.
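The linking of forward jumps through `mrk can be pictured with the following Python model. It is only a sketch under stated assumptions: the code is a list with one item per byte, each unresolved offset slot temporarily holds the position of the previous unresolved slot (how FreeForth actually encodes the link is not described here), and `forward_jump` models only the linking part of WHILE` and BREAK`, not the IF resolution done by BREAK`.

```python
code = []        # compiled code, one list item per byte (model only)
saved = []       # stands in for the compile-time DATAstack
mrk = [0, 0]     # [link to the last pending forward jump, start point]

def body(n):
    code.extend([0x90] * n)       # n nop bytes stand in for real code

def BEGIN():
    global mrk
    saved.append(mrk)             # save the enclosing structure's mrk
    mrk = [len(code), len(code)]  # both cells hold the start point

def forward_jump():
    code.append(0xEB)             # i386 "jmp rel8"
    code.append(mrk[0])           # assumption: the slot holds the previous link
    mrk[0] = len(code) - 1        # this slot becomes the head of the chain

def END():
    global mrk
    link, start = mrk
    while link != start:          # walk the chain back to the start point
        previous = code[link]
        code[link] = len(code) - (link + 1)   # real offset to the end point
        link = previous
    mrk = saved.pop()             # restore the enclosing structure's mrk

BEGIN(); body(2); forward_jump(); body(3); forward_jump(); body(1); END()
print(code)   # [144, 144, 235, 6, 144, 144, 144, 235, 1, 144]
```

Both jumps end up targeting position 10, the end point. Since the chain lives in `mrk` rather than on the stack, any IF ... THEN fixups pushed in between are left undisturbed.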

The countdown example at the end of the Unstructured Conditional Jumps chapter may be rewritten using a loop control structure:

: countdown  \ n --
  BEGIN      \ -- n ; marks loop start point "A:"
    dup . 1- \ -- n-1 ; duplicates and displays n, then decrements it
  0< UNTIL   \ -- -1 ; "0< UNTIL" compiles a "jge A" and closes loop
  drop       \ -- ; dispose of the -1 remaining on the DATAstack
;

The sign example at the end of the Two-Alternatives Control Structure chapter may be rewritten using a multi-alternatives control structure:

: sign  \ n -- ; displays n sign
  BEGIN 0-  \ -- n ; compares n to zero to setup i386 condition flags
  0< IF ."negative" BREAK \ "BREAK" resolves "0< IF" and compiles forward jump
  0= IF ."null"     BREAK \ "BREAK" resolves "0= IF" and compiles forward jump
        ."positive"       \ default alternative
  END   \ resolves the two forward jumps compiled by the two "BREAK"
  drop  \ -- ; dispose of n remaining on top of DATAstack
;

Long Jumps

The i386 processor offers two versions of the jump assembly instructions:

  • short jumps taking 2 bytes, the last of which is a signed offset, between -128 and +127, counted from the next byte
  • long jumps taking 6 bytes, the 4 last of which are a signed offset, between -2'147'483'648 and 2'147'483'647, counted from the next byte

For backward jumps, FreeForth compiles short jumps wherever possible, and automatically compiles long jumps when the offset would fall below -128 (i.e. when the target is too far back for a one-byte offset).

For forward jumps, FreeForth compiles short jumps by default to reduce code size, and throws an <-error: jump off range if the jump offset is larger than +127. In that case, you can either:

  • move some code from within the control structure into an extra definition, and replace it by a subroutine call (5 bytes only) to this extra definition
  • before the offending definition, use the macro  +longconds` ( -- ; reveals alternative definitions of the control structure macros compiling long forward jumps), and after the offending definition, use the macro  -longconds` ( -- ; hides the alternative definitions of the control structure macros compiling long forward jumps, restoring the default control structure macros compiling short forward jumps)
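The range test behind the short/long choice can be sketched in Python. This is a model, not FreeForth's code generator: `encode_jz` is a hypothetical helper, and only the two `jz` opcodes and the "offset counted from the next byte" rule are taken as given. It picks the encoding automatically, which corresponds to what FreeForth does for backward jumps; for forward jumps FreeForth raises the error instead, unless the long variants have been revealed.

```python
import struct

JZ_SHORT = bytes([0x74])         # i386 "jz rel8", 2 bytes in total
JZ_LONG = bytes([0x0F, 0x84])    # i386 "jz rel32", 6 bytes in total

def encode_jz(jump_addr, target):
    """Encode "jz target" located at jump_addr, short if the offset fits in a byte."""
    short_offset = target - (jump_addr + 2)   # counted from the next instruction
    if -128 <= short_offset <= 127:
        return JZ_SHORT + struct.pack("<b", short_offset)
    long_offset = target - (jump_addr + 6)    # the long form is 4 bytes longer
    return JZ_LONG + struct.pack("<i", long_offset)

print(encode_jz(0x100, 0x100).hex())   # 74fe: jump to itself, offset -2
print(encode_jz(0x100, 0x82).hex())    # 7480: offset -128, last short case
print(encode_jz(0x100, 0x81).hex())    # 0f847bffffff: offset -133, long form
```

Note that the offset changes when the encoding changes, because it is measured from the end of the instruction and the long instruction is longer.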

Exception Handling

Exceptionally, typically under error conditions, you need to return from several nested subroutine calls at once, and restore some CALLstack and DATAstack saved states: this is called exception handling. FreeForth supports exception handling with the two words catch throw (they are not macros):

  • catch ( entryPoint -- errorCode ) pops entryPoint from the top of DATAstack, then pushes on top of the CALLstack an exception frame saving the CALLstack and DATAstack states and a link to the previous exception frame, saves the exception frame address (in the hidden variable xfp only used internally by catch and throw), and then issues a subroutine call to entryPoint; if the subroutine returns, i.e. if no throw is executed before, then catch returns a null errorCode, otherwise it returns the errorCode argument of throw
  • throw ( errorCode -- ) raises an exception identified by errorCode: it retrieves (from the hidden variable xfp) the address of the last exception-frame pushed by catch on the CALLstack, restores from it the CALLstack and DATAstack and hidden variable xfp saved states, and pushes errorCode on top of DATAstack as apparent result of catch
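The visible effect of catch and throw on the two stacks can be modelled in Python. This sketch is an assumption-laden illustration: Python's own exception mechanism stands in for the chain of exception frames linked through xfp, the stacks are plain lists, and `Machine`, `inner` and `outer` are hypothetical names.

```python
class Throw(Exception):
    """Carries the errorCode from throw up to the innermost catch."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code

class Machine:
    def __init__(self):
        self.data = []    # DATAstack
        self.calls = []   # CALLstack

def throw(code):
    raise Throw(code)

def catch(m, entry):
    """Call entry(m); push 0 if it returns, else unwind and push the errorCode."""
    depths = (len(m.data), len(m.calls))   # the saved stack states
    try:
        entry(m)
        code = 0
    except Throw as exc:
        del m.data[depths[0]:]             # restore the DATAstack depth
        del m.calls[depths[1]:]            # restore the CALLstack depth
        code = exc.code
    m.data.append(code)

def inner(m):
    m.data.extend([10, 20])      # temporaries left on the DATAstack
    m.calls.append("pending")    # and something pending on the CALLstack
    throw(42)

def outer(m):
    inner(m)
    m.data.append(99)            # never reached: throw skips the return path

m = Machine()
m.data.append(7)
catch(m, outer)
print(m.data, m.calls)   # [7, 42] []
```

Whatever the nested subroutines left on either stack is discarded, and the caller of catch sees only the errorCode added on top of what it had before.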

The compiler top-level loop, when executing an anonymous definition, catches any uncaught exception, but it expects the errorCode to be the address of a counted-string error-message, that it displays after an <-error: prefix; use a string literal with an initial ! (exclamation mark) to compile code which at run time will throw such an exception.
For example !"Error_message" within a definition will, if executed, raise an exception which, if not caught before the catch in the compiler top-level loop, will display <-error: Error message followed by the FreeForth 0; prompt.

To pass to catch an entryPoint of a definition, use its name (this will compile a call assembly instruction to the entryPoint) followed by the word ' (tick), which is a macro (i.e. executed at compile time) which checks that the last word compiled a call assembly instruction (otherwise raises the exception <-error: not preceded by a call), and replaces it at compile time by code which at run time will push the entryPoint on top of the DATAstack.

An example for fun:

: check  \ n key @ # -- n ; if n same as key, raise an exception with string
  drop   \ -- n key @ ; dispose of string count, also available at @-1
  >r     \ -- n key | == @ ; move string address away from DATAstack
  = drop \ -- n | == @ ; setup i386 condition flags, dispose of key
  IF r>  \ -- n @ ; get string address back on DATAstack
    1- throw  \ raise an exception with counted string address as errorCode
  THEN   \ -- n | == @ ; this was the DATAstack state before the IF
  rdrop  \ -- n ; dispose of the unused string address on CALLstack
;
: checkwins \ n -- n
  1 "wins_15_points!" check  \ -- n ; check n versus 1
  5 "wins_30_points!" check  \ -- n ; check n versus 5
  8 "wins_40_points!" check  \ -- n ; check n versus 8
;
: wins? \ n -- ; check whether n wins or loses
  checkwins '  \ -- n checkwins ; ' pushes checkwins entry point on DATAstack
  catch        \ -- n r ; r is catch result
  swap .       \ -- r ; displays n followed by a space
  0-           \ -- r ; compares catch result with 0 to setup i386 flags
  0= IF        \ -- 0 ; no throw was executed
    drop       \ -- ; dispose of 0 remaining on top of DATAstack
    ."wins_no_points."
  ;THEN        \ -- @ ; catch result is the counted string address
  c@+          \ -- @+1 # ; get string count and string start address
  type         \ -- ; display string
;
1 wins? ;
1 wins 15 points!  0; 3 wins? ;
3 wins no points.  0; 8 wins? ;
8 wins 40 points!  0;

FreeForth Symbolic Constants

As in other languages, it is often useful in FreeForth to replace a literal constant by a symbolic constant, whose name may carry more meaning than the literal value, and whose occurrences in the source code all refer to a single definition of that value (so that a modification, if needed, has to be made in a single place only).

To define a symbolic constant, generate its value with an anonymous definition (either with a simple literal constant, or with any computation), and use the defining macro constant` (or its alias equ`), which closes and executes the current anonymous definition, and associates the value popped from the DATAstack, with the next word parsed from the source code, to create in the compiler's headers space a new symbol tagged as a symbolic constant, which each later occurrence will be compiled into code pushing its value on the DATAstack. For example:

1024 equ KB \ one kilobyte

Note that operator-suffixes may be appended to symbolic constants exactly in the same way as for literal constants, as for example:

KB KB* equ MB \ one megabyte
MB KB* equ GB \ one gigabyte

Forthers note a FreeForth symbolic constant definition neither allocates any dictionary heap space, nor spends any time calling and returning from a "centralized" constant definition; instead every reference to a FreeForth symbolic constant is compiled into inline code which execution pushes on the DATAstack an embedded copy of the constant value.

FreeForth Variables

Thanks to its explicit stacks handling, FreeForth doesn't need named variables to pass arguments to subroutines, or to store intermediate expressions results (note we didn't use any in all the above code examples).

However, although much less used than in other computer languages, named variables are sometimes useful in FreeForth too, for global variables, which store "long-lived" contextual data.

Unlike most other computer languages (which try to check source code consistency with the help of "typed" variables, and spend a lot of efforts automatically typing generic operators and converting types around them), FreeForth offers "typeless" variables (each of which is only a symbolic name for the base address of the memory allocated by the compiler to the variable), and explicitly sized memory-access words (8bits/bytewise, 16bits/wordwise, 32bits/longwise, or even 64bits/wise).

The simplest way to create a 32bits variable is:
variable foo creates in the compiler's headers space a symbolic constant named foo equal to the current value of the compiler's heap memory allocation pointer, that it then increments by 4, and initializes at zero the 4 allocated bytes;
foo! ( n -- ) stores into foo the 32bits value n popped from the DATAstack;
foo@ ( -- n ) pushes on the DATAstack foo's 32bits contents;
foo ( -- @ ) alone pushes on the DATAstack foo's base address;
foo off ( -- ) stores 0 into foo (i.e. sets all its bits at zero);
foo on ( -- ) stores -1 into foo (i.e. sets all its bits at one);
foo +! ( n -- ) increments foo's contents by n (note the space between foo and +!);
foo -! ( n -- ) decrements foo's contents by n (note the space between foo and -!).

For variables of other sizes (than 32bits), such as for example for arrays, use:
create bar 8 allot ; creates in the compiler's header space a symbolic constant named bar equal to the current value of the compiler's heap allocation pointer, that 8 allot increments by 8, without any initialization of the 8 allocated bytes;
bar c! ( c -- ) stores into bar's first byte the 8 LSBits of c popped from the DATAstack;
bar 1+ c@ ( -- c ) pushes on the DATAstack bar's second byte;
bar w! ( w -- ) stores into bar's first word the 16 LSBits of w popped from the DATAstack;
4 bar+ w@ ( -- w ) pushes on the DATAstack bar's third word;
bar! ( n -- ) stores into bar's first long the 32bits value n popped from the DATAstack;
bar@ ( -- n ) pushes on the DATAstack bar's first long;
bar 2! ( lo hi -- ) stores into bar the two 32bits values hi then lo popped from DATAstack;
bar 2@ ( -- lo hi ) pushes on the DATAstack bar's 64bits contents;
bar 8 erase ( -- ) clears bar's 8 bytes;
bar 8 $FF fill ( -- ) sets bar's 8 bytes at $FF.
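Since the variable is typeless, the same bytes can be read back at any size. The Python model below illustrates this on a little-endian machine such as the i386; the `mem` buffer and the way each line is matched to a FreeForth phrase in the comments are assumptions made for the example.

```python
import struct

mem = bytearray(8)   # stands in for the 8 bytes allotted to bar
bar = 0              # bar's base address inside mem

# 32-bit store at bar, roughly what storing $11223344 with bar! does
struct.pack_into("<I", mem, bar, 0x11223344)

# 8-bit fetch of the second byte, roughly "bar 1+ c@"
second_byte = mem[bar + 1]

# 16-bit fetch of the first word, roughly "bar w@"
first_word = struct.unpack_from("<H", mem, bar)[0]

# 16-bit store into the third word (byte offset 4), roughly "4 bar+ w!"
struct.pack_into("<H", mem, bar + 4, 0xBEEF)

print(hex(second_byte))   # 0x33
print(hex(first_word))    # 0x3344
print(mem.hex())          # 44332211efbe0000
```

The least significant byte sits at the lowest address, which is why the second byte of the long $11223344 is $33 and its first word is $3344.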

For initializing while allocating memory, such as for example for constant tables, use comma-words:
create table 2 , 1 , 0 , ; creates the symbolic constant table base address where three 32bits values are stored, first 2, then 1, then 0;
, ( n -- ) allocates the next 4 bytes from the heap (4 allot) and stores in them the 32bits of n popped from the DATAstack;
w, ( w -- ) allocates the next 2 bytes from the heap (2 allot) and stores in them the 16 LSBits of w popped from the DATAstack;
c, ( c -- ) allocates the next byte from the heap (1 allot) and stores in it the 8 LSBits of c popped from the DATAstack;
2 4* table+ @ ( -- 0 ) pushes on the DATAstack table's third long (at index 2).
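The allocation behaviour of allot and the comma-words comes down to a single pointer that only moves forward. A minimal Python model, assuming a hypothetical `Heap` class with a byte buffer and a `here` pointer (the names are not FreeForth's):

```python
import struct

class Heap:
    """Model of the compiler's heap: a byte buffer plus an allocation pointer."""
    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.here = 0                 # allocation pointer

    def allot(self, n):
        """Reserve n bytes and return the address of the first one."""
        addr = self.here
        self.here += n
        return addr

    def comma(self, n):               # models ","
        struct.pack_into("<i", self.mem, self.allot(4), n)

    def w_comma(self, w):             # models "w,"
        struct.pack_into("<H", self.mem, self.allot(2), w & 0xFFFF)

    def c_comma(self, c):             # models "c,"
        self.mem[self.allot(1)] = c & 0xFF

    def fetch(self, addr):            # models a 32-bit "@"
        return struct.unpack_from("<i", self.mem, addr)[0]

heap = Heap()
table = heap.here                     # models "create table"
for value in (2, 1, 0):
    heap.comma(value)                 # models "2 , 1 , 0 ,"

print(heap.fetch(table + 2 * 4))      # 0: the third long, at index 2
print(heap.here - table)              # 12: bytes taken by the table
```

Indexing is plain address arithmetic: the element at index i of a table of longs lives at the base address plus 4 times i.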

WARNING: an easy way to crash FreeForth is to execute an anonymous definition which overwrites itself, such as for example:

create crash 10 TIMES -1 , REPEAT ;

To avoid this, execute a first anonymous definition to allocate memory, then execute a second anonymous definition to initialize the allocated memory:

create safe 40 allot ; safe 10 TIMES -1 over! 4+ REPEAT drop ; \ note the semicolon after allot

You may wonder why the following code doesn't crash:

create safe 40 allot safe 40 $FF fill ;

This is because the semicolon transforms the final call to fill into a jump (this is called "tail-recursion optimization"), then all the anonymous definition instructions are already executed when they are overwritten by the execution of fill (however, if any code were added between fill and the semicolon, the call to fill wouldn't be transformed into a jump, then it would return to code overwritten by its execution, which would eventually crash).

Conditional Compilation

Have Fun!

Now you know all the basics to become a productive FreeForth programmer.

  • Read the ff.help file, and use again and again the help command to get online documentation.
  • Read the ff.ff file to see which utilities it contains, and how they are coded and commented.
  • If you are curious of FreeForth internals, read the long comment at the beginning of the ff.asm file, and explore FreeForth source files.
  • If you have any comment about FreeForth, or if you find a bug (and maybe its correction :-) contact me.

This primer currently stops here. I'll extend it when I have more time.


CL101215