Structure and concepts
We use an EBNF-like notation with the following operators: alternative |, zero or one repetitions ?, zero or more repetitions *, one or more repetitions +, and unordered list @.
Parentheses ( ) group grammar symbols.
Tokens are enclosed in single quotes.
By convention, lexer rule names are ALL_CAPS, and parser rule names are in CamelCase, beginning with a capital letter.
The file extension for CoreDSL 2 files is .core_desc.
The following grammar defines the structure of a CoreDSL 2 file.
The top-level entity is the core description.
CoreDescription ::= Import* (InstructionSet | CoreDefinition)*
A CoreDSL 2 file may import core descriptions from other files.
Import ::= 'import' STRING ';'
A core description contains an arbitrary number of instruction sets and/or core definitions. Both entities are structured into sections, describing architectural state, internal and external functions, always blocks, and instructions. Additional sections may be added in future versions of the language.
InstructionSet ::= 'InstructionSet' ID ('extends' ID)? '{' Sections '}'
CoreDefinition ::= 'Core' ID ('provides' ID (',' ID)*)? '{' Sections '}'
Sections ::= @(ArchState? Functions? AlwaysBlocks? Instructions?)
ArchState ::= 'architectural_state' '{' ArchStateItem* '}'
Functions ::= 'functions' '{' Function* '}'
AlwaysBlocks ::= 'always' '{' AlwaysBlock* '}'
Instructions ::= 'instructions' Attribute* '{' Instruction* '}'
The Core construct models a processor core conforming to the given architectural description.
The InstructionSet construct allows the separation of instruction set definition and core definition.
Hence, by referencing it in the optional provides clause, the same instruction set definition can be reused in multiple core definitions.
Instruction sets may be organized hierarchically: An instruction set may extend a previously defined super instruction set, inheriting its architectural state, functions and instructions.
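To illustrate the hierarchical organization, the following Python sketch collects the instructions an instruction set inherits along its extends chain. The dictionary layout, the set names, and the helper `effective_instructions` are hypothetical; the authoritative composition procedure is given by the elaboration rules, which this simplified model does not reproduce.

```python
# Hypothetical instruction sets; names and contents are illustrative only.
INSTRUCTION_SETS = {
    "Base": {"extends": None, "instructions": ["ADD", "SW"]},
    "Mul": {"extends": "Base", "instructions": ["MUL"]},
}


def effective_instructions(name, sets=INSTRUCTION_SETS):
    """Collect instructions along the extends chain, super set first."""
    chain = []
    while name is not None:
        chain.append(name)
        name = sets[name]["extends"]
    result = []
    for set_name in reversed(chain):
        result.extend(sets[set_name]["instructions"])
    return result
```

Under these assumptions, `effective_instructions("Mul")` lists the instructions of Base followed by those of Mul.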
Note: The remainder of this page is a syntax-focussed overview of the available language constructs. In addition, a CoreDSL 2 implementation must adhere to the scoping rules, which determine the visibility of identifiers, as well as the elaboration rules, which introduce a recipe to compose the effective ISA of a core definition and statically evaluate its parameters.
The architectural_state section may contain a subset of declarations and assignments that carry a special meaning for the modeled ISA.
In all forms, optional attributes can impose further constraints.
A constant expression shall be comprised exclusively of literals and implementation parameters.
Simple variable declarations yield implementation parameters.
The parameter may be initialized at the declaration site, or assigned in an architectural state section.
If the const keyword is present, the initialization is mandatory, and subsequent reassignments are forbidden.
ArchStateItem ::= TypeSpecifier ID ('=' ConstantExpression)? Attribute* ';'
ArchStateItem ::= 'const' TypeSpecifier ID '=' ConstantExpression Attribute* ';'
Examples
int XLEN;
int REG_LEN = 32;
const int MAGIC_NUM = 42;
A variable declaration with the register keyword defines a single architectural register.
An optional initializer must be a constant expression and is used as the register's reset value.
If an array-like dimension specification is present, a register file is declared.
The dimension must be a constant expression.
ArchStateItem ::= 'register' TypeSpecifier ID Attribute* ('=' ConstantExpression)? ';'
ArchStateItem ::= 'register' TypeSpecifier ID '[' ConstantExpression ']' Attribute* ';'
Examples
register unsigned int PC [[is_pc]] = 0; // program counter with reset value
register unsigned int X[REG_LEN]; // general-purpose register file
Architectural state sections can additionally contain assignment statements to implementation parameters and registers, following the same semantics as the optional initializers at the declaration sites. The elaboration rules define the evaluation strategy in case multiple assignments to the same parameter/register are present.
ArchStateItem ::= ID ('[' ConstantExpression ']')? '=' ConstantExpression ';'
Examples
XLEN = 16;
X[0] = 0;
An extern declaration represents an external entity, such as an I/O port (variable declaration) or address space (array declaration) of the given type. Address space declarations must provide exactly one dimension specifier, so multi-dimensional address spaces are not allowed.
The optional keyword const expresses that the entity is read-only (w.r.t. the instruction behavior). The presence of volatile denotes that the value of the entity may change outside of the core definition.
ArchStateItem ::= 'extern' 'const'? 'volatile'? TypeSpecifier
ID ('[' ConstantExpression ']')? Attribute* ';'
Examples
extern unsigned<8> MEM[1 << XLEN]; // address space, RAM
extern volatile unsigned<32> ACC[128]; // address space, peripheral device
extern const volatile unsigned<1> ACC_DONE; // port, interrupt signal
A declaration with an ampersand token between the type specifier and the identifier introduces an alias.
Array declarations of this form result in range aliases.
The initialization is mandatory, and the initial value must be an expression comprised of zero or more subscript ([ ]) operators applied to an architectural state element (including other aliases).
For scalar aliases, the expression's type must match the declared type of the alias.
For range aliases, the base type and number of elements must match.
Aliases to external entities may use the const and volatile keywords as described above. The base entity's specifiers are not inherited, meaning that the respective keywords must be repeated in the alias declaration to maintain the same semantics as the base entity. Aliases to const entities must be declared const as well.
ArchStateItem ::= 'const'? 'volatile'? TypeSpecifier '&' ID ('[' ConstantExpression ']')?
'=' ConstantExpression Attribute* ';'
Examples
unsigned<32> &ZERO = X[0]; // register alias
unsigned<32> &mvendorid = CSR[0xF11]; // address space element alias
volatile unsigned<32> & ACC_REGS[8] = ACC[15:8]; // volatile address space range alias
unsigned<16> &x10_hi = X[10][31:16]; // bit-range alias
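The bit-range alias in the last example can be modelled with a short Python sketch. It assumes that reads and writes through the alias act on the underlying register file element; the class `BitRangeAlias` and its methods are hypothetical and serve only to illustrate the bit arithmetic.

```python
class BitRangeAlias:
    """View onto bits [msb:lsb] of one element of a register file."""

    def __init__(self, regs, index, msb, lsb):
        self.regs, self.index, self.lsb = regs, index, lsb
        self.mask = (1 << (msb - lsb + 1)) - 1

    def get(self):
        # Shift the selected range down to bit 0 and drop everything above it.
        return (self.regs[self.index] >> self.lsb) & self.mask

    def set(self, value):
        # Clear the aliased bits, then merge in the (truncated) new value.
        cleared = self.regs[self.index] & ~(self.mask << self.lsb)
        self.regs[self.index] = cleared | ((value & self.mask) << self.lsb)


X = [0] * 32                              # stand-in for the register file X
X[10] = 0xDEADBEEF
x10_hi = BitRangeAlias(X, 10, 31, 16)     # mirrors &x10_hi = X[10][31:16]
```

In this model, writing through `x10_hi` changes only the upper half of `X[10]` and leaves the lower 16 bits untouched.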
The following table summarizes valid modifier combinations (and their meaning) for declarations in the architectural state section.
↓ Ampersand / Keyword → | None | register | extern |
---|---|---|---|
no | Parameter | Register | Address space |
yes | Alias | Error | Error |
The functions section contains function declarations and definitions following the usual C syntax.
Function ::= 'extern'? TypeSpecifier ID '(' ParameterList ')' ';'
| TypeSpecifier ID '(' ParameterList ')' Attribute* CompoundStatement
The extern keyword marks a declaration as a black box.
Its invocation and behavior are implementation-specific.
The always section defines an arbitrary number of always blocks in the following format.
AlwaysBlock ::= ID Attribute* '{' Statement* '}'
After the block name, optional attributes may be present. The body of the always block expresses its (arbitrarily complex) behavior, written in the C-inspired language defined in the remainder of this specification document. The behavior will be executed repeatedly at the same rate as instructions are fetched, and in parallel with regular instructions and other always blocks.
If conflicting updates to architectural state elements occur in any instant of time, the update with the highest priority is performed.
Updates originating from always blocks are prioritized by their order in the CoreDSL source code (extended naturally over the extends- and provides-inheritance), i.e. later assignments override earlier ones.
Updates inside the behavior part of regular instructions take precedence over all always blocks.
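The priority rules above can be illustrated with a small Python sketch. The `Update` record, its field names, and the helpers `priority` and `resolve` are hypothetical; they model only the ordering described here and are not part of CoreDSL or of any particular implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    target: str               # architectural state element being written
    value: int
    from_instruction: bool    # True if issued by an instruction's behavior
    source_order: int = 0     # position of the always block in source order


def priority(upd):
    # Instruction updates outrank every always block; among always blocks,
    # a later position in the source wins.
    return (1, 0) if upd.from_instruction else (0, upd.source_order)


def resolve(updates):
    """Return the winning value per target for one instant of time."""
    winners = {}
    for upd in updates:
        best = winners.get(upd.target)
        if best is None or priority(upd) >= priority(best):
            winners[upd.target] = upd
    return {target: upd.value for target, upd in winners.items()}
```

For example, if two always blocks write PC in the same instant, the block that appears later in the source determines the value; if an instruction also writes PC, its update wins.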
The instructions section contains an arbitrary number of instruction definitions in the following format.
Instruction ::= ID Attribute* '{'
'encoding' ':' EncodingSpec ';'
('assembly' ':' AssemblySpec ';')?
'behavior' ':' Statement
'}'
After the instruction name, optional attributes may be present.
Adding attributes to the instructions section is equivalent to attaching them to all enclosed instructions.
The instruction body is organised into tagged components.
The encoding specifies the instruction encoding, which is a concatenation of fields.
EncodingSpec ::= EncodingField ('::' EncodingField)*
EncodingField ::= (ID '[' IntegerConstant ':' IntegerConstant ']') | IntegerConstant
We distinguish named fields and patterns.
Named fields comprise an identifier and a bit range, and can be thought of as parameters to the instruction.
Patterns are integer constants intended to be matched in an instruction decode stage (or similar).
Named fields can occur multiple times in the encoding (with different, non-overlapping bit ranges), to denote that the value is encoded using non-consecutive bits in the instruction word.
The type of a named field is unsigned<k> with k being 1 greater than the highest bit number mentioned in any of the corresponding ranges.
Bits that are not explicitly covered by a range are set to zero.
Example
SW { encoding: offset[11:5] :: src[4:0] :: base[4:0] :: 3'b010 :: offset[4:0] :: 7'b0100011; ...
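As a rough illustration of how named fields and patterns interact, the following Python sketch decodes an instruction word against the SW encoding above. The tuple representation of the encoding and the helpers `field_width`, `field_types` and `decode` are hypothetical; the sketch assumes the encoding is listed with the most significant field first.

```python
# Named fields are (name, msb, lsb); patterns are (width, value).
SW_ENCODING = [
    ("offset", 11, 5),
    ("src", 4, 0),
    ("base", 4, 0),
    (3, 0b010),
    ("offset", 4, 0),
    (7, 0b0100011),
]


def field_width(field):
    if isinstance(field[0], str):
        _, msb, lsb = field
        return msb - lsb + 1
    return field[0]


def field_types(encoding):
    """Map each named field to k in unsigned<k>: highest bit number + 1."""
    widths = {}
    for field in encoding:
        if isinstance(field[0], str):
            name, msb, _ = field
            widths[name] = max(widths.get(name, 0), msb + 1)
    return widths


def decode(encoding, word):
    """Return the named field values if all patterns match, else None."""
    values = {}
    pos = sum(field_width(f) for f in encoding)  # bit just above current field
    for field in encoding:
        width = field_width(field)
        pos -= width
        bits = (word >> pos) & ((1 << width) - 1)
        if isinstance(field[0], str):
            name, _, lsb = field
            # Place the slice at its bit range; uncovered bits stay zero.
            values[name] = values.get(name, 0) | (bits << lsb)
        elif bits != field[1]:
            return None
    return values
```

Here `field_types(SW_ENCODING)` yields 12 for offset, because its highest mentioned bit is 11, and `decode` reassembles offset from its two non-consecutive slices.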
We recommend using Verilog-style integer literals in the encoding, as C-style bit literals do not capture leading zeros (e.g., 0b010 == 2'b10, not 3'b010 as one might expect).
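The difference can be made concrete with a short Python sketch. The helper `parse_sized_literal` is hypothetical and handles only the plain `<width>'<base><digits>` form with the Verilog base letters b, o, d and h; it is not the CoreDSL lexer.

```python
def parse_sized_literal(text):
    """Split a Verilog-style literal such as 3'b010 into (width, value)."""
    width_text, rest = text.split("'", 1)
    base = {"b": 2, "o": 8, "d": 10, "h": 16}[rest[0].lower()]
    return int(width_text), int(rest[1:], base)


# A sized literal keeps its declared width ...
sized = parse_sized_literal("3'b010")
# ... whereas a C-style literal is just a number; leading zeros are lost.
unsized_equal = 0b010 == 0b10
```

The sized form carries the width explicitly, which is what an encoding pattern needs in order to occupy the intended number of bits.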
The assembly directive defines the textual format of the instruction for use in assembler and disassembler tools.
AssemblySpec ::= '{' STRING ',' STRING '}' | STRING
The two strings in the curly-braces-enclosed form are interpreted as the mnemonic and a format string for the argument list. If only one string is given, it specifies the argument list format, and the instruction's name is used as the mnemonic.
NB: The longer form is useful when the desired mnemonic is not a valid identifier, e.g. because it contains a dot.
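The two forms can be summarized in a small Python sketch. The helper `resolve_assembly`, the representation of the long form as a pair, and the instruction names and format strings used below are hypothetical.

```python
def resolve_assembly(instruction_name, spec):
    """Return (mnemonic, argument format) for either AssemblySpec form."""
    if isinstance(spec, str):
        # Short form: the instruction's name doubles as the mnemonic.
        return instruction_name, spec
    mnemonic, arg_format = spec
    return mnemonic, arg_format


short_form = resolve_assembly("SW", "{src}, {offset}({base})")
long_form = resolve_assembly("FADD_S", ("fadd.s", "f{rd}, f{rs1}, f{rs2}"))
```

The long form lets the mnemonic fadd.s differ from the identifier FADD_S, which could not contain the dot.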
The argument list format follows the format string syntax of the fmt library with some minor extensions. A similar format specification is used by the Python string format() function. The arg_id identifiers are automatically derived from the named encoding fields (see above), e.g. in
encoding: 7'b0010100 :: rs2[4:0] :: rs1[4:0] :: 3'b001 :: rd[4:0] :: 7'b1010011;
assembly: "f{rd}, f{rs1}, f{rs2}";
{rd} is replaced by the value of the named EncodingField "rd". All format specifiers can be used:
encoding: imm[31:12] :: ...;
assembly: "{imm:#08x}";
formats the value of the EncodingField "imm" as a hex value in alternate form (with leading 0x), zero-padded to a total width of 8 characters including the prefix.
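Since Python's format() uses a similar specification, the effect of this format specifier can be tried out directly in Python. The field value below is hypothetical and chosen only for illustration.

```python
imm = 0x12  # hypothetical field value, chosen only for illustration

# '#' selects the alternate form (0x prefix), '0' pads with zeros, and
# '8' is the total width, which counts the two prefix characters.
text = "{imm:#08x}".format(imm=imm)
print(text)  # 0x000012
```

Note that the width counts the 0x prefix, so six hex digits follow it.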
As an extension to the fmt syntax, the arg_id identifier can take the form of a function:
assembly: "{name(rd)}, {name(rs1)}, {name(rs2)}";
In this case, a lookup function generates the string value to be printed. All fmt modifiers can be used here as well. This allows alias names given in the architectural_state section to be used. The arguments to this function can also be simple arithmetic expressions, such as addition or subtraction, to calculate offsets from the EncodingField value:
assembly: "{name(rd+8)}";
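One possible way to realize the lookup extension on top of an ordinary format function is sketched below in Python. The helper `render_assembly`, the regular expression, and the register naming function are hypothetical; the sketch supports only a single field with an optional constant offset and assumes the looked-up names contain no curly braces.

```python
import re

# Matches {name(field)}, {name(field+N)} or {name(field-N)}, with an optional
# format spec after a colon.
_LOOKUP = re.compile(r"\{name\((\w+)\s*(?:([+-])\s*(\d+))?\)(?::([^}]*))?\}")


def render_assembly(fmt, fields, name):
    """Expand lookup calls first, then let str.format handle plain fields."""

    def expand(match):
        field, sign, offset, spec = match.groups()
        value = fields[field]
        if sign:
            value += int(offset) if sign == "+" else -int(offset)
        return format(name(value), spec or "")

    return _LOOKUP.sub(expand, fmt).format(**fields)


def reg_name(index):
    # Hypothetical lookup: registers are simply called x0, x1, ...
    return f"x{index}"
```

With these assumptions, `render_assembly("{name(rd+8)}", {"rd": 2}, reg_name)` looks up register 10, mirroring the offset calculation in the example above.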
The statement following the behavior tag expresses the instruction's (arbitrarily complex) semantics, written in the C-inspired language defined in the remainder of this specification document.