Overview

AdLib is a small, self-contained C++ library intended primarily for writing dependency-free, portable command-line tools on POSIX systems. It can serve as an alternative to scripting languages where speed matters more than ease of use, or for configuration and build tasks in larger systems where additional dependencies are undesirable.

Basic system requirements are:

  • A POSIX compatible shell.
  • An ANSI C compiler, C89 or later.
  • A C++ compiler, C++98 or later.
  • A POSIX compatible make; both GNU make and BSD make will work.

Optionally, AdLib will make use of an existing re2c installation if available. The goal here is to easily support generating lexers that can also be deployed without re2c.

Configuration

AdLib uses autosetup for configuration. Autosetup code is written in Tcl; as it uses an included standalone Tcl interpreter (JimTcl), no installation of Tcl proper is required.

The ./configure script supports the following options:

  • --devel for enabling debug mode.
  • --with-boehm-gc for using the Boehm GC instead of the integrated garbage collector.
  • --adlibdev for developing AdLib itself rather than an application on top of AdLib. Its main effect is that all tests are built by default.
  • --without-re2c for excluding re2c rules from the generated Makefile. AdLib normally treats files ending in .re as re2c source files and creates Makefile rules that generate the corresponding .cc files. With this option, these rules are omitted and the .cc files won't be regenerated. The rules are also omitted if re2c cannot be found.

The configure script will execute the following .tcl files if they can be found:

  • src/preconf.tcl before configuration.
  • src/postconf.tcl after configuration, but before any files have been generated.
  • src/postsetup.tcl after the configuration script has completed.

The configuration script also checks for the existence of the file src/Makefile.extra and, if it exists, appends it to the Makefile. This file can contain fixed extra rules or can be generated by either the preconfiguration or postconfiguration script.

The src/Makefile.extra file can contain a definition for PROGNAME. If no such definition is used, the build process will create an executable in bin/main, otherwise in bin/$(PROGNAME).

Deviation from normal C++ assumptions

AdLib requires only C++98 functionality for portability. Unless you know that the target system supports a later version of the standard, quality-of-life features such as auto variables are therefore unavailable. In --adlibdev mode, the Makefile adds a -std=c++98 flag to help enforce adherence to the standard.

AdLib presumes that data will normally be allocated on the heap and accessed through pointers, not as values or through references. This greatly simplifies data structure design, but it creates some limitations. Foremost among those is that operator overloading is not really practical. As a result, we end up writing arr->at(i) instead of arr[i], for example. Also, as we only require C++98 features, even things such as smart pointers are not portably available; this is one of several reasons why we use a garbage collector.
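To illustrate the point about operator overloading, here is a minimal sketch using a hypothetical IntArr class. It is a stand-in written for this example, not AdLib's Arr<T>, and it uses std::vector internally only for brevity. It shows that element access through a pointer reads naturally with at(), while operator[] needs an explicit dereference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap-allocated array type; not AdLib's Arr<T>.
class IntArr {
  std::vector<int> items;
public:
  // Appends a value and returns this, so calls can be chained via pointers.
  IntArr *add(int value) { items.push_back(value); return this; }
  std::size_t len() const { return items.size(); }
  // Bounds-checked element access.
  int &at(std::size_t i) { assert(i < items.size()); return items[i]; }
  int &operator[](std::size_t i) { return at(i); }
};

// Sums the array twice: once via at(), once via operator[].
int SumBothWays(IntArr *arr) {
  int viaAt = 0, viaIndex = 0;
  for (std::size_t i = 0; i < arr->len(); i++) {
    viaAt += arr->at(i);    // natural when arr is a pointer
    viaIndex += (*arr)[i];  // operator[] requires dereferencing first
    // Note: arr[i] would be pointer arithmetic on IntArr *, not element access.
  }
  assert(viaAt == viaIndex);
  return viaAt;
}
```

With a pointer-first design, the at() form avoids both the extra parentheses and the risk of accidentally indexing the pointer itself.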

The use of a garbage collector also makes destructors for heap-allocated objects impractical, but the application domain that AdLib is designed for does not generally require complex resource management.

Program structure

AdLib has its own main() function. The regular entrypoint for an AdLib program is called Main(), which can occur in any .cc file in the src directory. This function has the type signature void Main().

Example:

#include "adlib/lib.h"

void Main() {
  PrintLn("Hello, world!");
}

Instead of having argc and argv arguments, the global variables ArgC and ArgV are available. There is also a global variable Args of type StrArr *, which contains the arguments as an array.
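The following sketch shows one way a library can own the real entry point and hand control to a user-defined Main(). It is an illustration under assumptions, not AdLib's actual startup code: LibraryEntry, ArgCSketch, ArgVSketch, and ArgsSketch are hypothetical names, std::vector<std::string> stands in for StrArr, and copying every argv entry into the array is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the globals ArgC, ArgV, and Args.
int ArgCSketch = 0;
char **ArgVSketch = 0;
std::vector<std::string> *ArgsSketch = 0;

void Main();  // user-supplied entry point, defined further down

// What a library-owned main(argc, argv) could do: publish the arguments
// as globals, then call the user's Main().
int LibraryEntry(int argc, char **argv) {
  assert(argc >= 0 && argv != 0);
  ArgCSketch = argc;
  ArgVSketch = argv;
  ArgsSketch = new std::vector<std::string>();
  for (int i = 0; i < argc; i++)
    ArgsSketch->push_back(argv[i]);  // assumption: every entry is copied
  Main();
  return 0;
}

// Application side: Main() takes no parameters and reads the globals.
std::size_t lastArgCount = 0;
void Main() {
  lastArgCount = ArgsSketch->size();
}
```

In this arrangement the library's main() would consist of little more than a call to LibraryEntry(argc, argv), which is why application code never declares main() itself.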

Most of the basic functionality can be accessed by including "adlib/lib.h". Some additional functionality is available through separate header files, but for normal usage, no additional headers are needed.

Garbage collector use

The integrated garbage collector, tinygc, provides a subset of the functionality of the Boehm GC. Like the Boehm GC, it is a conservative, stop-the-world garbage collector.

Unlike with the Boehm GC, all global and static variables containing pointers or references (including those that are part of a class or struct) must be registered explicitly with the GCVar() function. GCVar() takes either a reference to the variable as its sole argument, or such a reference together with the variable's initial value. Normal C++-style initialization of such variables should be avoided; the INIT() macro can be used instead:

Str *name;
INIT(Example, {
  name = new Str("Alice");
});

The macro takes an identifier and a piece of code as its arguments. The identifier must be globally unique and must not start with an underscore. The code can optionally be delimited by braces. It is executed during initialization.
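As an illustration of how a macro of this shape can be built in C++98, here is a minimal sketch. It is not AdLib's implementation: INIT_SKETCH, InitRegistrar, and RunInitializers are hypothetical names, std::string stands in for Str, and running the registered code from an explicit RunInitializers() call is an assumption made so the example is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical registry of initializer functions. Plain zero-initialized
// storage is used so that it is valid before any constructor runs.
typedef void (*InitFn)();
static InitFn initFns[64];
static std::size_t initCount = 0;

// Registers one initializer function when a static instance is constructed.
struct InitRegistrar {
  explicit InitRegistrar(InitFn fn) {
    assert(initCount < 64);
    initFns[initCount++] = fn;
  }
};

// Sketch of an INIT-like macro: wraps the code in a function named after
// the identifier and registers that function.
#define INIT_SKETCH(id, code) \
  static void InitBody_##id() { code } \
  static InitRegistrar initRegistrar_##id(&InitBody_##id)

// Runs every registered initializer once, in registration order.
void RunInitializers() {
  for (std::size_t i = 0; i < initCount; i++)
    initFns[i]();
  initCount = 0;
}

std::string *greeting = 0;  // no C++-style dynamic initialization here
INIT_SKETCH(Greeting, {
  greeting = new std::string("Alice");
});
```

In a sketch like this, the identifier becomes part of generated function and variable names, which suggests why it has to be unique; a leading underscore would also produce a double underscore in those names, which C++ reserves for the implementation. Whether AdLib's restrictions have the same origin is an assumption.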

Any C++ classes or structs need to inherit from the GC class if any instance of such a type is ever allocated on the heap. As this does not cause any overhead otherwise, it is safest to simply do that for all classes and structs. Alternatively, you can also use explicit allocation to allocate memory for them.

If you are positive that a class or struct does not contain any pointers, you can instead inherit from PtrFreeGC. However, for most of the applications that AdLib is intended for, this does not bring any real benefit and risks memory corruption if the object is not actually free of pointers.
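The reason inheritance matters can be sketched with a base class that overloads operator new, a technique also used by the Boehm GC's C++ interface. The code below is an assumption-laden illustration, not AdLib's GC class: ManagedSketch and SketchGcAlloc are hypothetical names, and the "collector" merely counts allocations and uses calloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for a collector's allocation routine; it only counts requests.
static std::size_t managedAllocs = 0;

void *SketchGcAlloc(std::size_t size) {
  assert(size > 0);
  void *p = std::calloc(1, size);
  if (!p) throw std::bad_alloc();
  managedAllocs++;
  return p;
}

// Hypothetical base class: any derived type allocated with new is routed
// through the collector's allocator instead of the global operator new.
struct ManagedSketch {
  void *operator new(std::size_t size) { return SketchGcAlloc(size); }
  void operator delete(void *p) { std::free(p); }
};

// Inherits the class-level operator new.
struct Point : public ManagedSketch {
  int x, y;
  Point(int x_, int y_) : x(x_), y(y_) {}
};

// Does not inherit it, so new Plain bypasses the managed allocator.
struct Plain {
  int x;
};
```

A type like Plain would be invisible to a collector built this way, which is the kind of situation the advice to inherit everywhere is meant to prevent.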

Basic types

AdLib provides both signed and unsigned integral types. Int8, Int16, Int32, and Int64 are signed integral types of the respective bitsize. Word8, Word16, Word32, and Word64 are the corresponding unsigned types.

The types Int and Word are of the same size as uintptr_t, and should be used for all purposes where no specific size is required.

On a 32-bit system that does not support 64-bit operations, both Int64 and Word64 are downgraded to 32-bit types. You can check with #ifdef HAVE_64BIT_SUPPORT whether the system actually supports 64-bit types.

Data structures

The following data structures are directly supported by AdLib:

  • Str -- dynamic, resizable strings.
  • Arr<T> -- dynamic, resizable arrays.
  • StrArr -- an alias for Arr<Str *>.
  • Map<K, V> -- hash tables.
  • Set<T> -- hash sets.
  • BitSet -- bit sets and matrices.

Note that in order to use Map, Set, or BitSet, you have to include "adlib/map.h", "adlib/set.h", or "adlib/bitset.h", respectively.

If you want to define any non-standard comparison or hash functions for Set<T> or Map<K, V>, these have to be defined before including the respective header files, or they have to be passed as arguments to the constructor.
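The constructor-argument option can be pictured with the following sketch of a small chained hash set that receives its hash and equality functions at construction time. StrSetSketch, HashNoCase, and EqualNoCase are hypothetical names invented for this example; the actual constructor signatures of Set<T> and Map<K, V> are not shown here and may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

typedef std::size_t (*HashFn)(const std::string &);
typedef bool (*EqualFn)(const std::string &, const std::string &);

// Case-insensitive hash (FNV-1a style over lowercased bytes).
std::size_t HashNoCase(const std::string &s) {
  std::size_t h = 2166136261u;
  for (std::size_t i = 0; i < s.size(); i++) {
    h ^= (std::size_t) std::tolower((unsigned char) s[i]);
    h *= 16777619u;
  }
  return h;
}

// Case-insensitive equality; must agree with HashNoCase.
bool EqualNoCase(const std::string &a, const std::string &b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); i++)
    if (std::tolower((unsigned char) a[i]) !=
        std::tolower((unsigned char) b[i]))
      return false;
  return true;
}

// Hypothetical hash set with a fixed number of buckets; not AdLib's Set<T>.
class StrSetSketch {
  std::vector<std::vector<std::string> > buckets;
  HashFn hash;
  EqualFn equal;
public:
  StrSetSketch(HashFn h, EqualFn e) : buckets(16), hash(h), equal(e) {
    assert(h != 0 && e != 0);
  }
  bool contains(const std::string &key) const {
    const std::vector<std::string> &b = buckets[hash(key) % buckets.size()];
    for (std::size_t i = 0; i < b.size(); i++)
      if (equal(b[i], key)) return true;
    return false;
  }
  void add(const std::string &key) {
    if (!contains(key))
      buckets[hash(key) % buckets.size()].push_back(key);
  }
};
```

Whatever mechanism is used, the hash and comparison functions have to be consistent: keys that compare equal must produce the same hash value, or lookups will miss entries.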

POSIX functionality

In order to support interaction with the file system, AdLib defines a number of portable implementations to read or write files or the standard input or output of processes and to handle directory access.

Reading directories is only possible on systems that support standard <dirent.h> functionality, which should normally be true for all POSIX systems. However, you can check if it is actually supported by testing for #ifdef HAVE_DIR_SUPPORT. If not available, then the ListFiles() and ListFileTree() functions will return empty arrays.
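The fallback behavior described above can be sketched as follows. This is a standalone illustration, not the code in "adlib/os.h": ListFilesSketch and SKETCH_HAVE_DIR_SUPPORT are hypothetical names, std::vector<std::string> stands in for StrArr, detecting support through platform macros is an assumption made so the example compiles on its own, and skipping the "." and ".." entries is a choice of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption for this standalone sketch: directory support is detected via
// common platform macros. In an AdLib project, test HAVE_DIR_SUPPORT instead.
#if defined(__unix__) || defined(__APPLE__)
#define SKETCH_HAVE_DIR_SUPPORT 1
#include <dirent.h>
#endif

// Returns the entry names in a directory, or an empty list if the directory
// cannot be opened or directory support is unavailable.
std::vector<std::string> ListFilesSketch(const char *path) {
  assert(path != 0);
  std::vector<std::string> result;
#ifdef SKETCH_HAVE_DIR_SUPPORT
  DIR *dir = opendir(path);
  if (!dir) return result;
  for (;;) {
    struct dirent *entry = readdir(dir);
    if (!entry) break;
    std::string name = entry->d_name;
    if (name != "." && name != "..")
      result.push_back(name);
  }
  closedir(dir);
#endif
  return result;
}
```

Because the function has the same signature in both configurations, callers need no conditional code of their own; they simply see an empty result where directories cannot be read.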

All available functionality can be found in "adlib/os.h".

Interaction with re2c

AdLib's configure script takes some extra steps to allow deployment without depending on re2c. To that end, you should distribute the .cc files generated by re2c, but not the .re files from which they were generated. Because no .re files are then present, the generated Makefile will not contain any re2c rules.

You can use extra re2c flags by setting the RE2CFLAGS variable in src/Makefile.extra.

AdLib supports re2c because the generated C++ code remains portable and can be distributed in lieu of the original .re files, thus creating no dependency on the scanner generator itself.