Structure and concepts

AtomCrafty edited this page Nov 5, 2023 · 55 revisions

Notation

We use an EBNF-like notation with the following operators: alternative |, zero or one repetitions ?, zero or more repetitions *, one or more repetitions +, and unordered list @. Parentheses ( ) group grammar symbols. Tokens are enclosed in single quotes. By convention, lexer rule names are ALL_CAPS, and parser rule names are in CamelCase, beginning with a capital letter.

Overall structure

The file extension for CoreDSL 2 files is .core_desc.

The following grammar defines the structure of a CoreDSL 2 file.

The top-level entity is the core description.

CoreDescription ::= Import* (InstructionSet | CoreDefinition)*

A CoreDSL 2 file may import core descriptions from other files.

Import          ::= 'import' STRING ';'

A core description contains an arbitrary number of instruction sets and/or core definitions. Both entities are structured into sections, describing architectural state, internal and external functions, always blocks, and instructions. Additional sections may be added in future versions of the language.

InstructionSet  ::= 'InstructionSet' ID ('extends' ID)? '{' Sections '}'
CoreDefinition  ::= 'Core' ID ('provides' ID (',' ID)*)? '{' Sections '}'
Sections        ::= @(ArchState? Functions? AlwaysBlocks? Instructions?)
ArchState       ::= 'architectural_state' '{' ArchStateItem* '}'
Functions       ::= 'functions' '{' Function* '}'
AlwaysBlocks    ::= 'always' '{' AlwaysBlock* '}'
Instructions    ::= 'instructions' Attribute* '{' Instruction* '}'
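As an illustration of the unordered list operator @ in the Sections rule, the following Python sketch checks a sequence of section keywords. It assumes that @ means "in any order, each optional element at most once"; that reading and the helper name check_sections are our assumptions, not part of the CoreDSL 2 specification.

```python
# Keywords that introduce the four section kinds in the grammar above.
SECTION_KEYWORDS = ("architectural_state", "functions", "always", "instructions")

def check_sections(sections):
    """Return True if `sections` fits @(ArchState? Functions? AlwaysBlocks? Instructions?).

    Assumed reading of '@': any order, each section kind at most once.
    """
    seen = set()
    for keyword in sections:
        if keyword not in SECTION_KEYWORDS or keyword in seen:
            return False
        seen.add(keyword)
    return True

print(check_sections(["instructions", "architectural_state"]))  # True
print(check_sections(["functions", "functions"]))               # False
```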

Core and InstructionSet

The Core construct models a processor core conforming to the given architectural description. The InstructionSet construct allows the separation of instruction set definition and core definition. Hence, by referencing it in the optional provides clause, the same instruction set definition can be reused in multiple core definitions. Instruction sets may be organized hierarchically: An instruction set may extend a previously defined super instruction set, inheriting its architectural state, functions and instructions.

Note: The remainder of this page is a syntax-focussed overview of the available language constructs. In addition, a CoreDSL 2 implementation must adhere to the scoping rules, which determine the visibility of identifiers, as well as the elaboration rules, which introduce a recipe to compose the effective ISA of a core definition and statically evaluate its parameters.

Architectural state

The architectural_state section may contain a subset of declarations and assignments that carry a special meaning for the modeled ISA. In all forms, optional attributes can impose further constraints.

A constant expression shall consist exclusively of literals and implementation parameters.
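One way to picture the constant-expression rule is a check that every identifier in an expression names an implementation parameter. The Python sketch below uses Python's own ast module as a stand-in parser, so it only handles expressions whose syntax Python happens to share with CoreDSL 2. The function name and the parameter set are hypothetical.

```python
import ast

def is_constant_expression(text, parameters):
    """Return True if every identifier in `text` is a known implementation parameter.

    Assumption: `text` is also valid Python expression syntax (e.g. "1 << XLEN").
    """
    tree = ast.parse(text, mode="eval")
    for node in ast.walk(tree):
        # Any name that is not a parameter (register, address space, function)
        # makes the expression non-constant.
        if isinstance(node, ast.Name) and node.id not in parameters:
            return False
    return True

print(is_constant_expression("1 << XLEN", {"XLEN"}))  # True
print(is_constant_expression("X[0] + 1", {"XLEN"}))   # False
```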

Implementation parameters and constants

Simple variable declarations yield implementation parameters. The parameter may be initialized at the declaration site, or assigned in an architectural state section. If the const keyword is present, the initialization is mandatory, and subsequent reassignments are forbidden.

ArchStateItem ::= TypeSpecifier ID ('=' ConstantExpression)? Attribute* ';'
ArchStateItem ::= 'const' TypeSpecifier ID '=' ConstantExpression Attribute* ';'

Examples

int XLEN;
int REG_LEN = 32;
const int MAGIC_NUM = 42;

Registers

A variable declaration with the register keyword defines a single architectural register. An optional initializer must be a constant expression and is used as the register's reset value. If an array-like dimension specification is present, a register file is declared. The dimension must be a constant expression.

ArchStateItem ::= 'register' TypeSpecifier ID Attribute* ('=' ConstantExpression)? ';'
ArchStateItem ::= 'register' TypeSpecifier ID '[' ConstantExpression ']' Attribute* ';'

Examples

register unsigned int PC [[is_pc]] = 0; // program counter with reset value
register unsigned int X[REG_LEN];       // general-purpose register file

Assignments

Architectural state sections can additionally contain assignment statements to implementation parameters and registers, following the same semantics as the optional initializers at the declaration sites. The elaboration rules define the evaluation strategy in case multiple assignments to the same parameter/register are present.

ArchStateItem ::= ID ('[' ConstantExpression ']')? '=' ConstantExpression ';'

Examples

XLEN = 16;
X[0] = 0;

Address spaces

An extern declaration represents an external entity, such as an I/O port (variable declaration) or address space (array declaration) of the given type. Address space declarations must provide exactly one dimension specifier, so multi-dimensional address spaces are not allowed. The optional keyword const expresses that the entity is read-only (with respect to the instruction behavior). The presence of volatile denotes that the value of the entity may change outside of the core definition.

ArchStateItem ::= 'extern' 'const'? 'volatile'? TypeSpecifier
                  ID ('[' ConstantExpression ']')? Attribute* ';'

Examples

extern                unsigned<8>         MEM[1 << XLEN]; // address space, RAM
extern       volatile unsigned<32>        ACC[128];       // address space, peripheral device
extern const volatile unsigned<1>         ACC_DONE;       // port, interrupt signal

Aliases

A declaration with an ampersand token between the type specifier and the identifier introduces an alias. Array declarations of this form result in range aliases. The initialization is mandatory, and the initial value must be an expression comprised of zero or more subscript ([ ]) operators applied to an architectural state element (including other aliases). For scalar aliases, the expression's type must match the declared type of the alias. For range aliases, the base type and number of elements must match.

Aliases to external entities may use the const and volatile keywords as described above. The base entity's specifiers are not inherited, meaning that the respective keywords must be repeated in the alias declaration to maintain the same semantics as the base entity. Aliases to const entities must be declared const as well.

ArchStateItem ::= 'const'? 'volatile'? TypeSpecifier '&' ID ('[' ConstantExpression ']')?
                  '=' ConstantExpression Attribute* ';'

Example

unsigned<32> &ZERO = X[0];                       // register alias
unsigned<32> &mvendorid = CSR[0xF11];            // address space element alias
volatile unsigned<32> & ACC_REGS[8] = ACC[15:8]; // volatile address space range alias
unsigned<16> &x10_hi = X[10][31:16];             // bit-range alias
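The size-matching rules for aliases come down to simple arithmetic on the range bounds. The sketch below assumes that a range [a:b] is inclusive on both ends, as the examples above suggest (ACC[15:8] is bound to an alias with 8 elements). range_length is a hypothetical helper, not a CoreDSL 2 construct.

```python
def range_length(first, last):
    """Number of indices covered by an inclusive range such as [15:8]."""
    return abs(first - last) + 1

# ACC_REGS[8] = ACC[15:8]: the alias declares 8 elements, the range covers 8.
print(range_length(15, 8))   # 8
# unsigned<16> &x10_hi = X[10][31:16]: the bit range covers 16 bits.
print(range_length(31, 16))  # 16
```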

Summary

The following table summarizes valid modifier combinations (and their meaning) for declarations in the architectural state section.

| ↓ Ampersand / Keyword → | None      | register | extern        |
|-------------------------|-----------|----------|---------------|
| no                      | Parameter | Register | Address space |
| yes                     | Alias     | Error    | Error         |

Functions

The functions section contains function declarations and definitions following the usual C syntax.

Function ::= 'extern'? TypeSpecifier ID '(' ParameterList ')' ';'
          |  TypeSpecifier ID '(' ParameterList ')' Attribute* CompoundStatement
 

The extern keyword marks a declaration as a black box. Its invocation and behavior are implementation-specific.

Always blocks

The always section defines an arbitrary number of always blocks in the following format.

AlwaysBlock ::= ID Attribute* '{' Statement* '}'

After the block name, optional attributes may be present. The body of the always block expresses its (arbitrarily complex) behavior, written in the C-inspired language defined in the remainder of this specification document. The behavior will be executed repeatedly at the same rate as instructions are fetched, and in parallel with regular instructions and other always blocks.

If conflicting updates to architectural state elements occur in any instant of time, the update with the highest priority is performed. Updates originating from always blocks are prioritized by the order in the CoreDSL source code (extended naturally over the extends- and provides-inheritance), i.e. later assignments override earlier ones. Updates inside the behavior part of regular instructions take precedence over all always blocks.
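The priority rule can be modelled as a small conflict resolver. The Python sketch below is only an illustration: the tuple layout and the name resolve_updates are hypothetical, and the case of two instruction updates to the same element is not covered by the text above, so the sketch arbitrarily keeps the first one.

```python
def resolve_updates(updates):
    """Pick the winning value per target among conflicting updates.

    Each update is (origin, order, target, value). `origin` is "always" or
    "instruction"; `order` is the position of the assignment in source order
    (after flattening extends/provides inheritance).
    """
    def priority(update):
        origin, order, _target, _value = update
        # Instruction behavior beats every always block; among always blocks,
        # later source positions beat earlier ones.
        return (1, 0) if origin == "instruction" else (0, order)

    winners = {}
    for update in updates:
        target = update[2]
        if target not in winners or priority(update) > priority(winners[target]):
            winners[target] = update
    return {target: update[3] for target, update in winners.items()}

# Two always blocks write PC; the later one wins.
print(resolve_updates([("always", 0, "PC", 4), ("always", 1, "PC", 8)]))  # {'PC': 8}
```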

Instructions

The instructions section contains an arbitrary number of instruction definitions in the following format.

Instruction ::= ID Attribute* '{'
                  'encoding' ':' EncodingSpec ';'
                  ('assembly' ':' AssemblySpec ';')?
                  'behavior' ':' Statement
                '}'

After the instruction name, optional attributes may be present. Adding attributes to the instructions section is equivalent to attaching them to all enclosed instructions.

The instruction body is organised into tagged components.

Encoding

The encoding specifies the instruction encoding, which is a concatenation of fields.

EncodingSpec  ::= EncodingField ('::' EncodingField)*
EncodingField ::= (ID '[' IntegerConstant ':' IntegerConstant ']') | IntegerConstant

We distinguish named fields and patterns. Named fields comprise an identifier and a bit range, and can be thought of as parameters to the instruction. Patterns are integer constants intended to be matched in an instruction decode stage (or similar). Named fields can occur multiple times in the encoding (with different, non-overlapping bit ranges), to denote that the value is encoded using non-consecutive bits in the instruction word. The type of a named field is unsigned<k> with k being 1 greater than the highest bit number mentioned in any of the corresponding ranges. Bits that are not explicitly covered by a range are set to zero.

Example

SW { encoding: offset[11:5] :: src[4:0] :: base[4:0] :: 3'b010 :: offset[4:0] :: 7'b0100011; ...
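To make the rules for named fields concrete, the following Python sketch derives the type width of a field and reassembles split fields from an instruction word, using the SW encoding above. It assumes that the first field of the :: concatenation holds the most significant bits; the tuple representation and all function names are hypothetical.

```python
# Encoding of the SW example, first field first. Assumption: the first field
# of the '::' concatenation holds the most significant bits.
# Named fields are ("name", high, low); patterns are (width, value).
SW_ENCODING = [
    ("offset", 11, 5), ("src", 4, 0), ("base", 4, 0),
    (3, 0b010), ("offset", 4, 0), (7, 0b0100011),
]

def is_named(field):
    return isinstance(field[0], str)

def bit_count(field):
    """Number of instruction-word bits occupied by one encoding field."""
    return field[1] - field[2] + 1 if is_named(field) else field[0]

def field_width(encoding, name):
    """k in unsigned<k>: one more than the highest bit number used for `name`."""
    return 1 + max(f[1] for f in encoding if is_named(f) and f[0] == name)

def decode(encoding, word):
    """Return the named field values if all patterns match `word`, else None."""
    values = {}
    position = sum(bit_count(f) for f in encoding)  # total instruction width
    for f in encoding:
        position -= bit_count(f)
        bits = (word >> position) & ((1 << bit_count(f)) - 1)
        if is_named(f):
            # Place the slice at its bit range; uncovered bits stay zero.
            values[f[0]] = values.get(f[0], 0) | (bits << f[2])
        elif bits != f[1]:
            return None
    return values

print(field_width(SW_ENCODING, "offset"))  # 12
```

Calling decode with a 32-bit word whose pattern bits match yields a dictionary with offset, src and base, where offset combines both of its slices.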

We recommend using Verilog-style integer literals in the encoding, as C-style bit literals do not capture leading zeros (e.g., 0b010 == 2'b10, not 3'b010 as one might expect).
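The difference between the two literal styles can be shown with a small Python sketch that computes the width and value of a literal. It assumes that a C-style literal gets the minimal width needed for its value, as in the 0b010 example, and that the Verilog-style base letters are b, o, d and h; the function name is hypothetical.

```python
def literal_width_and_value(text):
    """Return (width in bits, value) for a Verilog-style or C-style literal."""
    if "'" in text:
        # Verilog-style: <width>'<base letter><digits>, the width is explicit.
        width, rest = text.split("'")
        base = {"b": 2, "o": 8, "d": 10, "h": 16}[rest[0]]  # assumed base letters
        return int(width), int(rest[1:], base)
    # C-style (0b..., 0x..., decimal): leading zeros do not add to the width.
    value = int(text, 0)
    return max(1, value.bit_length()), value

print(literal_width_and_value("3'b010"))  # (3, 2)
print(literal_width_and_value("0b010"))   # (2, 2)
```

With the C-style literal, the pattern in the SW example would be one bit too narrow and the encoding would no longer add up to the intended instruction width.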

Assembly format

The assembly directive defines the textual format of the instruction for use in assembler and disassembler tools.

AssemblySpec ::= '{' STRING ',' STRING '}' | STRING

The two strings in the curly-braces-enclosed form are interpreted as the mnemonic and a format string for the argument list. If only one string is given, it specifies the argument list format, and the instruction's name is used as the mnemonic.

NB: The longer form is useful when the desired mnemonic is not a valid identifier, e.g. because it contains a dot.

The argument list format follows the format string syntax of the fmt library with some minor extensions. A similar format specification is used by the Python string format() function. The arg_id identifiers are automatically derived from the named encoding fields (see above), e.g. in

encoding: 7'b0010100 :: rs2[4:0] :: rs1[4:0] :: 3'b001 :: rd[4:0] :: 7'b1010011;
assembly: "f{rd}, f{rs1}, f{rs2}";

{rd} is replaced by the value of the named EncodingField "rd". All format specifiers can be used:

encoding: imm[31:12] :: ...;
assembly: "{imm:#08x}";

formats the value of the EncodingField "imm" as a hex value in alternate form (with a leading 0x), zero-padded to a total width of 8 characters including the prefix.
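Because Python's format() uses a similar format specification, plain format strings can be tried out in Python. The sketch below is an approximation only: it uses Python's str.format rather than the fmt library, and render_assembly is a hypothetical helper. Note that in Python the width 8 in #08x counts the 0x prefix.

```python
def render_assembly(fmt, **fields):
    """Fill an argument-list format string with named encoding field values."""
    return fmt.format(**fields)

print(render_assembly("f{rd}, f{rs1}, f{rs2}", rd=1, rs1=2, rs2=3))  # f1, f2, f3
print(render_assembly("{imm:#08x}", imm=0x12345))                    # 0x012345
```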

As an extension to the fmt syntax, the arg_id identifier can take the form of a function:

assembly: "{name(rd)}, {name(rs1)}, {name(rs2)}";

In this case, a lookup function generates the string value to be printed. All fmt modifiers can be used here as well. This makes it possible to print the alias names given in the architectural_state section. The arguments to this function can also be simple arithmetic expressions, such as addition or subtraction, to calculate offsets from the EncodingField value:

assembly: "{name(rd+8)}";
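The function form is not part of the standard format syntax, so a tool has to handle it itself. The following Python sketch shows one possible way to do so with a regular expression. The alias table, the name lookup function and render are all hypothetical, and only a single addition or subtraction of a constant is supported.

```python
import re

# Hypothetical alias table: register index -> alias name.
ALIASES = {0: "zero", 1: "ra", 2: "sp", 8: "s0", 9: "s1", 10: "a0"}

def name(index):
    """Hypothetical lookup: the alias if one exists, else a generic x<index>."""
    return ALIASES.get(index, f"x{index}")

# Matches {func(field)}, {func(field+N)}, {func(field-N)} or {field},
# each optionally followed by ':' and a format specifier.
_ARG = re.compile(r"\{(?:(\w+)\((\w+)(?:\s*([+-])\s*(\d+))?\)|(\w+))(?::([^{}]*))?\}")

def render(fmt, fields, functions):
    """Expand an argument-list format string, including lookup functions."""
    def substitute(match):
        func, arg, op, const, plain, spec = match.groups()
        if plain is not None:
            value = fields[plain]
        else:
            operand = fields[arg]
            if op == "+":
                operand += int(const)
            elif op == "-":
                operand -= int(const)
            value = functions[func](operand)
        return format(value, spec or "")
    return _ARG.sub(substitute, fmt)

fields = {"rd": 10, "rs1": 2, "rs2": 3}
print(render("{name(rd)}, {name(rs1)}, {name(rs2)}", fields, {"name": name}))  # a0, sp, x3
print(render("{name(rd+8)}", {"rd": 1}, {"name": name}))                       # s1
```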

Behavior

The statement following the behavior tag expresses the instruction's (arbitrarily complex) semantics, written in the C-inspired language defined in the remainder of this specification document.