The Java code in this repository implements a set of tools for working with MIL programs, including code for generating MIL from LC or MIL source files, optimizing MIL code, and translating MIL to LLVM.
These tools are primarily intended for use as a backend for the alb compiler for Habit. (See https://github.com/habit-lang/alb.) However, they can also be used independently, as suggested by the examples described below.
A paper with more details about MIL is available via https://web.cecs.pdx.edu/~mpj/pubs/mil.html.
A first cut at a test suite for mil-tools is available from https://github.com/zipwith/mil-tester.
Work on mil-tools during the period 2014-2018 has been supported in part by funding from the National Science Foundation, Award No. CNS-1422979.
The following items are required to use the code in this repository:
-
Java Development Kit (version >= 1.8 should be sufficient)
-
Apache Ant (version >= 1.9.2 should be sufficient)
-
The LLVM Compiler Infrastructure (versions >= 4.0 should be sufficient)
First, we need to build the mil-tools, and the simplest way to do this (assuming you already have a suitable JDK and copy of Apache Ant installed) uses the following command sequence:
# Ensure a clean starting point:
ant clean
# Build mil-tools:
ant
[Use of ant clean
is recommended after every update to ensure
a consistent build.]
Next, copy the milc
shell script (or the milc.bat
file on
Windows) into a suitable folder on your path, and then edit that
file so that it includes a full path to the location of the
mil-tools.jar
file on your system. After these steps, you should
be able to run milc
(without arguments) in any directory to
produce output that starts as follows (run the command yourself
to see the full list of milc
options):
$ milc
usage: milc [inputs|options]
inputs: filename.milc Load options from specified file
filename.mil Load MIL source from specified file (.lmil for literate)
filename.lc Load LC source from specified file (.llc for literate)
options: -v verbose on
-d display debug messages
-ipathlist append items to input search path
...
$
The process for compiling an LC source file (in this case,
demo/fib.lc
) to MIL and then generating LLVM is illustrated by
the following commands (this time assuming that you have a
suitable installation of LLVM):
# Use mil-tools to compile fib.lc into LLVM:
milc -ilib --standalone demo/fib.lc -ltmp/fib.ll
# Build an executable from the generated LLVM code:
clang -o tmp/fib tmp/fib.ll demo/runtime.c
A brief explanation of the milc
command line options used here:
-
-ilib
sets the path for input files to include thelib
directory, which is where the defaultprelude.lc
and other library files are stored. -
--standalone
configuresmilc
to generate a program with an entrypoint calledmain
. -
demo/fib.lc
specifies the name of an LC source program. -
-ltmp/fib.ll
indicates thatmilc
should generate an LLVM version of the program and write a copy of the code in the filetmp/fib.ll
.
And a similarly brief explanation of the clang
command line options:
-
-o tmp/fib
sets the name (fib
) and folder location (tmp
) of the executable program thatclang
produces. -
tmp/fib.ll
specifies the name of the input LLVM code; of course, this is the name of the file that was generated by the preceedingmilc
command. -
demo/runtime.c
is the name of a small C source file that provides an implementation of theprintWord
function that is used in the originalfib.lc
program. Some additional examples that combine LC and C code in a single program are described later in this document.
Depending on details of your LLVM installation/platform (for
example, on macOS), you may need to substitute gcc
for clang
in the second command here. You may also need to substitute java -jar mil-tools.jar
in place of milc
, depending on your platform (Windows
users can substitute milc.bat
).
Other methods for compiling the generated LLVM code may be useful
in other settings. For example, you can use the following two
commands to compile fib.ll
via assembly with a specific target
architecture (in this case, 32-bit x86):
# Compile the generated LLVM code:
llc -O2 -filetype=asm -march=x86 tmp/fib.ll
# Build an executable that includes the generated code:
gcc -m32 -o tmp/fib -Wl,-no_pie demo/runtime.c tmp/fib.s
You may see a message of the form warning: overriding the module target triple with ...
when you run the commands above; we will
need to improve the LLVM code generator to avoid this in a future
release, but you should be able to ignore this message for now.
Finally, you can run the compiled program:
# Run the generated executable:
tmp/fib
The result should look something like the following, with a pair of arbitrarily chosen numbers bracketing two 144s, which have been calculated using two slightly different definitions of the Fibonnacci function:
91
144
144
17
Alternatively, you can use the following command line to run the program directly through the bytecode interpreter:
milc -ilib --standalone -x demo/fib.lc
Replace -x
with -m
to see the generated MIL code, or with -l
to
see the generated LLVM code, etc. (Or just run either milc
,
milc.bat
or java -jar mil-tools.jar
for a summary of additional
command line options.)
As a slightly larger and more complex example of the pattern
described above, the demo
folder also includes a pair of source
files based on the Priority Set case study that appears in the
Habit Language Report:
-
prioset.lc
is an LC implementation of a priority set (maximum heap) data structure, based on the code in the Habit report. (Some changes were made to include appropriate implementations for the external primitives that are needed in this program for basic operations such as arithmetic and reading/writing from memory. If we were starting instead from a Habit source program, these details would be generated automatically by the Habit compiler.) -
priodemo.c
is a simple C demonstration program that uses the functions fromprioset.lc
as a simple library.
The following commands can be used to compile and run this demonstration program (on a machine running macOS High Sierra; some changes will be required for other platforms):
# Compile the prioset.lc implementation:
milc -ilib demo/prioset.lc --llvm-main=initPrioset -ltmp/prioset.ll --target=x86_64-apple-macosx10.13.0 --64
# Compile the priodemo.c file and link with the code from prioset.lc:
clang -DWORD64 -o tmp/priodemo demo/priodemo.c tmp/prioset.ll
# Run the resulting executable:
tmp/priodemo
The following notes explain some of the command line options used here that were not already documented for the previous example:
-
--llvm-main=initPrioset
specifies a name for the initialization function that must be used to initialize the priority set data structure before using the other functions defined inprioset.lc
. The nameinitPrioset
is used inpriodemo.c
, so it is essential that the same name is specified here. (Further details about the--llvm-main=...
command line option are presented later in this document.) -
--target=x86_64-apple-macosx10.13.0
specifies the "target triple" string that is included in the generated LLVM code to specify details about the target platform; clearly, it will be necessary to modify this part of the command line if you are using a different platform. Note that you can find the target triple that was used to configure LLVM by running the commandllvm-config --host-target
, but a single LLVM installation can typically accomodate multiple target types. -
--64
specifies that the generated LLVM code will generate code assuming that theWord
type in LC is 64 bits wide. The-DWORD64
option in theclang
command line is used to ensure that the code inpriodemo.c
is configured in a corresponding way (in particular, by using the Clong
type as a representation forWord
values). Alternatively, it should also be possible to generate a 32 bit version of the program by using: the--32
option formilc
; a-DWORD32
option forclang
; and a modified version of the--target=...
option, such as replacingx86_64
withi386
. IMPORTANT: It is essential that the same word size is used consistently for all LC and C code fragments that are used together in a single program. Failure to satisfy this requirement will likely result in segmentation, protection, or other faults or errors when attempting to generate and run thepriodemo
example.
The output that is produced by the final command in this sequence is as follows:
$ tmp/priodemo
Priority set demo
Inserting some numbers:
Inserting 12
Inserting 5
Inserting 7
Inserting 128
Inserting 67
Removing three numbers:
Removing 128
Removing 67
Removing 12
Adding some more numbers:
Inserting 3
Inserting 32
Inserting 10
Draining the queue:
Removing 32
Removing 10
Removing 7
Removing 5
Removing 3
Done!
$
mil-tools uses a sequence of compilation "passes" to process
and transform input programs. The specific sequence of passes
that is used for a given task depends on what kind of output
is required: if a user requests LLVM output using the -l
flag, for example, then mil-tools uses extra passes that would
not be included if the user had just requested MIL output using
the -m
command line option.
For debugging purposes, however, and to help advanced users to
understand how the compilation process works, it is possible to
override the default and specify a custom sequence of parses
using the -p
command line option. We will use this in the
following sequence of examples to illustrate some of the key
passes that are provided by mil-tools. The sample program that
we use in these demonstrations is in a source file called demo/ex.lc
whose contents are as follows:
require "prelude.lc"
data Dup a = Dup a a
entrypoint swap7 :: Dup (Bit 7) -> Dup (Bit 7)
entrypoint swap8 :: Dup (Bit 8) -> Dup (Bit 8)
swap7 = swap
swap8 = swap
swap d = case d of Dup x y -> Dup y x
entrypoint d1
d1 = Dup B001 B110
If you just want to have mil-tools load, check, and compile this
program to MIL, you can use the command line option -p
, as in milc -ilib -m demo/ex.lc -p
. This, however, will produce quite a lot of output, and
we can simplify the code a little by running it through the MIL
optimizer. This can be accomplished by adding the character o
to the
end of the -p
command line option, producing the following output
(more generally, the -p
option takes a sequence of letters, each of
which specifies a particular pass; mil-tools will then attempt to run
exactly that sequence of passes on the MIL code that is loaded in
directly or generated by compiling LC source files):
$ milc -ilib -m demo/ex.lc -po
data -> (a :: *) (b :: *)
= Func ([a] ->> [b])
data Dup (a :: *)
= Dup a a
-----------------------------------------
-- not recursive
entrypoint d1 :: Dup (Bit 3)
d1 <-
Dup(B001, B110)
-----------------------------------------
-- not recursive
b84 :: forall (a :: *). [Dup a] >>= [Dup a]
b84[t0] =
assert t0 Dup
t1 <- Dup 0 t0
t2 <- Dup 1 t0
Dup(t2, t1)
-----------------------------------------
-- not recursive
b0 :: forall (a :: tuple). [] >>= a
b0[] =
halt(())
-----------------------------------------
-- not recursive
b81 :: forall (a :: *). [Dup a] >>= [Dup a]
b81[t0] =
case t0 of
Dup -> b84[t0]
_ -> b0[]
-----------------------------------------
-- not recursive
k54 :: forall (a :: *). {} [Dup a] ->> [Dup a]
k54{} t0 = b81[t0]
-----------------------------------------
-- not recursive
s2 :: forall (a :: *). [Dup a] ->> [Dup a]
s2 <-
k54{}
-----------------------------------------
-- not recursive
swap :: forall (a :: *). Dup a -> Dup a
swap <-
Func(s2)
-----------------------------------------
-- not recursive
entrypoint swap8 :: Dup (Bit 8) -> Dup (Bit 8)
swap8 <-
return swap
-----------------------------------------
-- not recursive
entrypoint swap7 :: Dup (Bit 7) -> Dup (Bit 7)
swap7 <-
return swap
-----------------------------------------
-- Entrypoints: d1 swap8 swap7
$
One of the details to note here is the fact that function values
like swap
are wrapped up in uses of the constructor function Func
,
which is introduced as the result of a (builtin) data type definition
data d -> r = Func ([d] ->> [r])
. You might also notice that the
case construct in block b81
includes a default branch (introduced
with an underscore instead of a constructor name), even though the
Dup
type that it is matching on has only one constructor. We can
eliminate these overheads by including the c
pass ("constructor
function rewrite") as part of the -p
setting:
$ milc -ilib -m demo/ex.lc -pco
data Dup (a :: *)
= Dup a a
-----------------------------------------
-- not recursive
entrypoint d1 :: Dup (Bit 3)
d1 <-
Dup(B001, B110)
-----------------------------------------
-- not recursive
b80 :: forall (a :: *). [Dup a] >>= [Dup a]
b80[t0] =
t1 <- Dup 0 t0
t2 <- Dup 1 t0
Dup(t2, t1)
-----------------------------------------
-- not recursive
k54 :: forall (a :: *). {} [Dup a] ->> [Dup a]
k54{} t0 = b80[t0]
-----------------------------------------
-- not recursive
swap :: forall (a :: *). [Dup a] ->> [Dup a]
swap <-
k54{}
-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup (Bit 8)] ->> [Dup (Bit 8)]
swap8 <-
return swap
-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup (Bit 7)] ->> [Dup (Bit 7)]
swap7 <-
return swap
-----------------------------------------
-- Entrypoints: d1 swap8 swap7
$
After this transformation, all references to the LC level function
type constructor ->
have been replaced by uses of the MIL function
type arrow ->>
. More generally, if a source program contains a
definition of a non recursive data type data T a = MkT t
, then
the c
pass will eliminate all uses of the T a
type (and all uses
of the MkT
constructor) by replacing them with a corresponding
instance of t
.
As a next step, we can eliminate polymorphism and parameterized
data types from a MIL program by running the specializer, represented
by the pass letter s
: (We follow that here with a second optimizer
run, just to clean up the resulting code, but that is not technically
required. Alternatively, we could achieve a similar effect to what
is shown here by using -pcso
, which only uses one optimizer pass.)
$ milc -ilib -m demo/ex.lc -pcoso
data Dup0
= Dup0 (Bit 7) (Bit 7)
data Dup1
= Dup1 (Bit 8) (Bit 8)
data Dup2
= Dup2 (Bit 3) (Bit 3)
-----------------------------------------
-- not recursive
b80 :: [Dup0] >>= [Dup0]
b80[t0] =
t1 <- Dup0 0 t0
t2 <- Dup0 1 t0
Dup0(t2, t1)
-----------------------------------------
-- not recursive
k54 :: {} [Dup0] ->> [Dup0]
k54{} t0 = b80[t0]
-----------------------------------------
-- not recursive
swap :: [Dup0] ->> [Dup0]
swap <-
k54{}
-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup0] ->> [Dup0]
swap7 <-
return swap
-----------------------------------------
-- not recursive
b801 :: [Dup1] >>= [Dup1]
b801[t0] =
t1 <- Dup1 0 t0
t2 <- Dup1 1 t0
Dup1(t2, t1)
-----------------------------------------
-- not recursive
k541 :: {} [Dup1] ->> [Dup1]
k541{} t0 = b801[t0]
-----------------------------------------
-- not recursive
s1 :: [Dup1] ->> [Dup1]
s1 <-
k541{}
-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup1] ->> [Dup1]
swap8 <-
return s1
-----------------------------------------
-- not recursive
entrypoint d1 :: Dup2
d1 <-
Dup2(B001, B110)
-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$
The generated program is longer now because there are essentially
two different copies of the swap
function, each operating on a
different version of the Dup
type. As intended, however, the
program no longer contains any definitions with polymorphic types.
In order to generate working code for this program, we need to
provide representations for types like Bit 7
and Bit 8
that
show up in this program. In mil-tools, both of these types are
actually represented as values of type Word
, treating only the
lower 7 or 8 bits as being significant, in each case. The r
character can be used to specify a "representation transformation"
pass that rewrites input programs to reveal these low level
details:
$ milc -ilib -m demo/ex.lc -pcosoro
data Dup0
= Dup0 Word Word
-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup0] >>= [Dup0]
swap7[t0] =
t1 <- Dup0 0 t0
t2 <- Dup0 1 t0
Dup0(t2, t1)
-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup0] >>= [Dup0]
swap8[t0] =
t1 <- Dup0 0 t0
t2 <- Dup0 1 t0
Dup0(t2, t1)
-----------------------------------------
-- not recursive
entrypoint d1 :: Dup0
d1 <-
Dup0(1, 6)
-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$
For types like Dup0
, however, it is possible to
provide more compact representations by replacing each of the
associated data type definitions with corresponding bitdata
definitions. This can be accomplished by using the letter
b
to specify a bitdata rewrite pass. Note that b
can
only be used immediately after s
because it relies on
information that is generated during specialization:
$ milc -ilib -m demo/ex.lc -pcosbo
bitdata Dup0 /14
= Dup0 [ Dup00 :: Bit 7 | Dup01 :: Bit 7 ]
-- predDup0(x :: Bit 14) = true
-- bit pattern:
-- ______________
bitdata Dup1 /16
= Dup1 [ Dup10 :: Bit 8 | Dup11 :: Bit 8 ]
-- predDup1(x :: Bit 16) = true
-- bit pattern:
-- ________________
bitdata Dup2 /6
= Dup2 [ Dup20 :: Bit 3 | Dup21 :: Bit 3 ]
-- predDup2(x :: Bit 6) = true
-- bit pattern:
-- ______
-----------------------------------------
-- not recursive
b80 :: [Dup0] >>= [Dup0]
b80[t0] =
t1 <- Dup0 0 t0
t2 <- Dup0.Dup0 0 t1
t3 <- Dup0.Dup0 1 t1
t4 <- Dup0.Dup0(t3, t2)
Dup0(t4)
-----------------------------------------
-- not recursive
k54 :: {} [Dup0] ->> [Dup0]
k54{} t0 = b80[t0]
-----------------------------------------
-- not recursive
swap :: [Dup0] ->> [Dup0]
swap <-
k54{}
-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup0] ->> [Dup0]
swap7 <-
return swap
-----------------------------------------
-- not recursive
b801 :: [Dup1] >>= [Dup1]
b801[t0] =
t1 <- Dup1 0 t0
t2 <- Dup1.Dup1 0 t1
t3 <- Dup1.Dup1 1 t1
t4 <- Dup1.Dup1(t3, t2)
Dup1(t4)
-----------------------------------------
-- not recursive
k541 :: {} [Dup1] ->> [Dup1]
k541{} t0 = b801[t0]
-----------------------------------------
-- not recursive
s1 :: [Dup1] ->> [Dup1]
s1 <-
k541{}
-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup1] ->> [Dup1]
swap8 <-
return s1
-----------------------------------------
-- not recursive
s2 :: Dup2.Dup2
s2 <-
Dup2.Dup2(B001, B110)
-----------------------------------------
-- not recursive
entrypoint d1 :: Dup2
d1 <-
Dup2(s2)
-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$
If you're wondering why this version of the program looks
a little bit longer, that's because bitdata types use two
levels of constructor functions: for every constructor
function C
in a bitdata type T
, there is also an associated
layout type called T.C
, each of which also has a constructor
function with the same name.
At this point, we can add r
back to our list of passes to
run the representation transformation. At this point,
mil-tools determines that complete Dup0
and Dup1
values
can be represented within a single word (only 16 and 14 bits,
respectively, are needed in each case0) and replaces the
algebraic data type selectors and constructors with some
appropriate bit twiddling logic, as illustrated in the
final output shown below:
$ milc -ilib -m demo/ex.lc -pcosboro
-----------------------------------------
-- not recursive
entrypoint swap7 :: [Word] >>= [Word]
swap7[t0] =
t1 <- lshr((t0, 7))
t2 <- shl((t0, 7))
t3 <- and((t2, 16256))
or((t1, t3))
-----------------------------------------
-- not recursive
entrypoint swap8 :: [Word] >>= [Word]
swap8[t0] =
t1 <- lshr((t0, 8))
t2 <- shl((t0, 8))
t3 <- and((t2, 65280))
or((t1, t3))
-----------------------------------------
-- not recursive
entrypoint d1 :: Word
d1 <-
return 14
-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$
Note also that, as part of the r
representation transformation,
the definitions of swap8
and swap7
have been changed from
definitions of function closures (using the ->>
arrow) to simple
block definitions, which are more easily accessed as standard
functions from other LLVM languages, as can be seen if we switch
from MIL to LLVM output in the previous example: (It is possible
to generate both with a single mil-tools invocation, but we only
generate one at a time here so that the output is a little easier
to follow.)
$ milc -ilib -l demo/ex.lc -pcosboro
@d1 = global i32 14
define i32 @swap7(i32 %r0) {
entry:
br label %swap7
swap7:
%r2 = lshr i32 %r0, 7
%r4 = shl i32 %r0, 7
%r3 = and i32 %r4, 16256
%r1 = or i32 %r2, %r3
ret i32 %r1
}
define i32 @swap8(i32 %r0) {
entry:
br label %swap8
swap8:
%r2 = lshr i32 %r0, 8
%r4 = shl i32 %r0, 8
%r3 = and i32 %r4, 65280
%r1 = or i32 %r2, %r3
ret i32 %r1
}
$
The tools described here can be used in various ways to support the development of software libraries, standalone programs, or variations in between. The following examples illustrate some of the possibilities, and the corresponding use of LC/MIL features and milc command line options.
Our first example is a small library of functions that are not
complete programs in themselves, but could potentially be linked
in as useful components of a variety of different applications.
This particular library, whose source code is in demo/funlib.lc
provides two implementations of the Fibonnaci function, and two
implementations of the factorial function, so it will probably
not be useful in many real world settings, but should provide
some familiar examples and should also be sufficient to demonstrate
the main ideas:
$ cat demo/funlib.lc
require "prelude.lc"
-- A standard recursive version of the Fibonnaci function:
entrypoint fib :: Word -> Word
fib n = if eq n 0 then 0
else if eq n 1 then 1
else add (fib (sub n 1)) (fib (sub n 2))
-- An iterative version of the Fibonnaci function:
entrypoint itfib :: Word -> Word
itfib = let loop a b n = if eq n 0 then a else loop b (add a b) (sub n 1)
in loop 0 1
-- A traditional recursive implementation of the factorial function:
entrypoint recfac :: Word -> Word
recfac n = if eq n 0 then 1 else mul n (recfac (sub n 1))
-- An iterative version of the factorial function:
entrypoint itfac :: Word -> Word
itfac = let loop a n = if eq n 0 then a else loop (mul a n) (sub n 1)
in loop 1
$
Perhaps the most important detail here is that each of the four
functions is declared as an entrypoint
. (The type signatures that
are provided for each function here could be useful as documentation,
but they can also be inferred automatically, so are not actually
required.) As a result, when this program is compiled using the
following commands, it will produce an LLVM output (in the file
tmp/funlib.ll
) that contains an externally visible function
definition for each of the four LC functions.
$ milc -ilib demo/funlib.lc -ltmp/funlib.ll
$ clang -c -o tmp/funlib.o tmp/funlib.ll
warning: overriding the module target triple with ...
1 warning generated.
$
[As before, assuming you have a conventional LLVM installation, you should be able ignore the warning message shown here.]
The command sequence shown above follows the use of milc
with a
clang
command that builds an object file tmp/funlib.o
that can
be linked with other programs. Alternatively, we can write a C
program that uses the functions in funlib.lc
and then use a
single clang
command to compile and link it with the
tmp/funlib.ll
source file:
$ cat demo/funlibtest.c
#include <stdio.h>
extern int fib(int);
extern int itfib(int);
extern int recfac(int);
extern int itfac(int);
int main(int argc, char** argv) {
for (int i=0; i<10; i++) {
printf("%d\t%d\t%d\t%d\t%d\n",
i, fib(i), itfib(i), recfac(i), itfac(i));
}
}
$ clang -o tmp/funlibtest demo/funlibtest.c tmp/funlib.ll
$
[Note: On macOS, you may be able (or need) to substitute gcc
for
clang
in the command lines shown here, depending on details of
your LLVM installation.]
Notice that the C code references the functions we have seen
defined in funlib.lc
as regular C functions with conventional
function prototypes. This is possible because, if an entrypoint
in the LC code is a curried function, then it will be exported
from the generated LLVM code as a fully uncurried function (i.e.,
as a function that takes all of its arguments at the same time, as
is the norm in languages like C that do not directly support
higher order functions). The result is an executable that blends
code from the two different source code components into a single
program:
$ tmp/funlibtest
0 0 0 1 1
1 1 1 1 1
2 1 1 2 2
3 2 2 6 6
4 3 3 24 24
5 5 5 120 120
6 8 8 720 720
7 13 13 5040 5040
8 21 21 40320 40320
9 34 34 362880 362880
$
Although there is no need for this in our funlib
example,
it is sometimes necessary to perform some kind of initialization
steps before some (or all) of the entrypoints in a library are
used for the first time. The following shows a contrived
example that includes entrypoints called fib12
and
fib15
that are supposed to contain the 12th and 15th
Fibonnaci numbers, respectively:
$ cat demo/needinit.lc
require "prelude.lc"
entrypoint fib :: Word -> Word
fib n = if eq n 0 then 0
else if eq n 1 then 1
else add (fib (sub n 1)) (fib (sub n 2))
entrypoint fib12, fib15
fib12 = fib 12
fib15 = fib 15
$
But now suppose that we link this with a C program that reads the
values of fib12
and fib15
and displays them on the screen. At
what point will those values be calculated? If we compile this
program using a similar milc
command line to the previous
example, we get an error informing us that the generated code
requires an initialization function:
$ milc -ilib demo/needinit.lc -ltmp/needinit.ll
ERROR: LLVM program requires initialization function (set using --llvm-main=NAME)
$
In this particular example, an initialization function is needed
to calculate values for fib12
and fib15
, and to save those
values in the corresponding variables. More generally, an
initialization function is needed for any program that requires
one time initialization of variables or memory areas before other
parts of the program are used.
The error message also suggests that we can fix this problem by
using an --llvm-main
command line option, which is precisely
what we do in the following, choosing to use the name initialize
for this function:
$ milc -ilib demo/needinit.lc -ltmp/needinit.ll --llvm-main=initialize
$
Now we can write C programs that use the needinit
library. And,
so long as we ensure that the initialize()
function is called
before the values of fib12
and fib15
are accessed, then those
variables will be properly initialized and everything should work
as expected:
$ cat demo/printfibs.c
#include <stdio.h>
extern void initialize();
extern int fib12, fib15;
int main(int argc, char** argv) {
initialize();
printf("fib(12)=%d, fib(15)=%d\n", fib12, fib15);
}
$ clang -o tmp/printfibs demo/printfibs.c tmp/needinit.ll
$ tmp/printfibs
fib(12)=144, fib(15)=610
$
In practice, it may be difficult to determine whether or
not an initialization function is actually required, short of
running a milc
command line like the one above to see if an
error is reported. For example, if we replace the fib
function
in needinit.lc
with the more efficient itfib
function, then
the mil-tools optimizer in actually able to compute the values
of fib12
and fib15
at compile-time, and there will not be any
need for an initialization function. For this reason, it is often
a good practice to specify an initialization function name using
the --llvm-main
option and to encourage users of the library to
call that function before attempting to accessing any of its other
features. If the library does not actually require special treatment,
then milc
will just generate an initialization function with an
empty body. But if something subsequently changes in the program
that does require proper initialization, then users of the library,
at least if they are following your advice, will already have
included the appropriate function call in their code.
In the previous examples, we wrote code in LC that was accessed
and executed from a main
function written in C. But what if
we wanted to write our main
program directly in LC, as in the
following simple example for displaying the value of fib 12
:
$ cat demo/program.lc
require "prelude.lc"
require "io.mil"
itfib :: Word -> Word
itfib = let loop a b n = if eq n 0 then a else loop b (add a b) (sub n 1)
in loop 0 1
-- A program to print the 12th Fibonnaci number:
export main
main = printWord (itfib 12)
$
This program does not specify any explicit entrypoints, but it
does contain a definition for a main
routine that is marked as
an export
. This allows us to compile the program using the
--standalone
flag for milc
:
$ milc -ilib --standalone demo/program.lc -ltmp/program.ll
$ clang -o tmp/program tmp/program.ll demo/runtime.c
warning: overriding the module target triple with ...
1 warning generated.
$ tmp/program
144
$
Note that, despite what the name --standalone
might suggest,
the clang
command line shown here does actually mention
the C source file demo/runtime.c
, which provides an
implementation for the printWord
function in the original
LC code. So, in fact, we are still mixing LC and C code,
but this time we are calling the C code from the LC code,
which is the opposite to what we saw in the earlier examples.
The --standalone
flag is really just a shorthand for the
following two command line options:
--llvm-main=main --mil-main=main
As we have already seen, the first of these specifies that the
generated LLVM code should include an initialization function
called main
. The second identifies a specific definition, also
with the name main
, but this time in the input program, that
will be executed at the end of the initialization function. At
most one main
definition can be specified in this way, but it is
possible to use a value with a different name, so long as it is
marked as an export
in the source program and identified using
an explicit --mil-main=NAME
setting on the command line.)
While --standalone
provides sensible defaults for both of
these command line options, the separate --mil-main
and
--llvm-main
options provide useful additional control
in some more advanced settings.