Inline Assembly. #1041
I just had an idea that's very simple and also very "raw," but I suppose it would be pretty good for the time being. For example:
Edit: Obviously there is the problem of Cranelift values being encoded either as registers or stack variables, for instance, but maybe some annotations could be added to require a value to be in a register, like |
I'd be all for this for the time being actually, with a small modification. Treat a block of code essentially as an EBB (extended basic block). Say what inputs it gets and where they should be stored (registers, stack offsets, etc.). The actual code block would be completely opaque. This would prevent Cranelift from needing a disassembler. For example:
or
|
I'd like to see what @sunfishcode has to say on this. |
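A minimal sketch of how such an opaque block might be represented, assuming hypothetical names rather than any actual Cranelift API: the bytes are a black box, and only the operand constraints are visible to the compiler.
// Hypothetical, illustrative only: an opaque machine-code block whose operands
// are pinned to fixed locations, so Cranelift never has to disassemble the bytes.
enum FixedLoc {
    Reg(&'static str),   // e.g. "rdi"
    StackSlot(i32),      // byte offset from the stack pointer
}
struct OpaqueAsmBlock {
    bytes: Vec<u8>,              // the raw machine code, never inspected
    inputs: Vec<FixedLoc>,       // where each input value must be placed beforehand
    outputs: Vec<FixedLoc>,      // where each result can be found afterwards
    clobbers: Vec<&'static str>, // registers/flags the block may overwrite
}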
Interesting idea. My first question is, how do you envision using this? Looking at the idea itself, a factor to take into consideration is that machine encodings are really complex. A register doesn't typically end up being encoded in a byte; it's usually smaller than a byte, its position in its containing bytes will depend on which operand of the instruction it's for, and sometimes it requires other changes elsewhere in the instruction. Some examples: $ cat t.s
movq (%rcx), %rcx # The encoding of %rcx depends on where it appears in the instruction!
movq (%rsi), %rcx # These use different REX prefix bits than the instructions below!
movq (%r11), %rcx
movq (%r12), %rcx # This needs an extra byte!
movq (%r13), %rcx # This needs an extra byte in a different way!
movq (%r14), %rcx
$ cc -c t.s
$ objdump -d t.o
... I think we could still make this system work, if we defined a sufficiently elaborate manifest that could describe all the edits that one would need to make once one knows what registers and stack slots everything will be in. That'd be fairly elaborate, but it might still be simpler than fully parsing instructions from text. |
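To make the encoding variation concrete, here is a small, self-contained Rust sketch (illustrative only, not Cranelift code) that encodes `movq (%base), %rcx` for an arbitrary 64-bit base register; the SIB byte for rsp/r12 and the forced disp8 for rbp/r13 are exactly the "extra byte" cases noted above.
// Illustrative only: encode `movq (%base), %rcx` (REX.W + 8B /r) for a given base register.
// Register numbers follow the usual x86-64 ordering (rax = 0 ... r15 = 15).
fn encode_mov_rcx_from_mem(base: u8) -> Vec<u8> {
    let mut out = Vec::new();
    // REX.W = 1 for a 64-bit operation; REX.B extends the base (r/m) register number.
    out.push(0x48 | ((base >> 3) & 1));
    out.push(0x8B); // opcode: MOV r64, r/m64
    let rcx = 1u8; // destination register, encoded in ModRM.reg
    match base & 0b111 {
        // rsp/r12: r/m = 100 means "SIB byte follows", so an extra SIB byte is required.
        0b100 => {
            out.push((rcx << 3) | 0b100); // mod=00, reg=rcx, rm=100
            out.push(0x24);               // SIB: no index, base = rsp/r12
        }
        // rbp/r13: mod = 00 with r/m = 101 means RIP-relative, so use mod=01 plus a zero disp8.
        0b101 => {
            out.push(0x40 | (rcx << 3) | 0b101); // mod=01, reg=rcx, rm=101
            out.push(0x00);                      // 8-bit displacement of zero
        }
        low => out.push((rcx << 3) | low), // mod=00, reg=rcx, rm=base
    }
    out
}
fn main() {
    for (name, num) in [("rcx", 1u8), ("rsi", 6), ("r11", 11), ("r12", 12), ("r13", 13), ("r14", 14)] {
        println!("movq (%{name}), %rcx: {:02x?}", encode_mov_rcx_from_mem(num));
    }
}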
I'd use it for inline stuff on Nebulet. Like talking to io ports, etc. |
A few more thoughts here: I missed this above: in @lachlansneff's variant above, the IR specifies all the registers, so it wouldn't actually require Cranelift to fill in register values. It'd literally be a block of bytes that simply requires certain values in certain fixed registers at input and output. And maybe a list of clobbers. That seems like it wouldn't be too hard to implement. That said, it might actually be too simple, in the sense that it'd work for very simple things, but would be difficult to evolve into something that does more. If you're willing to specify all the register inputs and outputs yourself, it wouldn't be that much different to just put the code you want in a .s file and then call it, with a specialized calling convention if you want. Support for custom calling conventions is something we do have other uses for, so we wouldn't mind improving that. Or, for |
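As a concrete, hedged illustration of the out-of-line alternative mentioned here: the asm lives in its own .s file with a known symbol, and the compiled code just calls it through a normal function declaration (the names and the port number below are hypothetical examples, not part of any existing project).
// Hypothetical: `read_port.s`, assembled separately, defines a symbol `read_port`
// that takes the port number in the first argument register and returns the value read.
extern "C" {
    fn read_port(port: u16) -> u8;
}

fn status_byte() -> u8 {
    // An ordinary call; no inline asm support needed in the code generator.
    unsafe { read_port(0x64) }
}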
For your information, there is a rustc issue to stabilize how to do inline assembly. In particular, I suggested on this other thread a similar idea to what @6a and @sunfishcode suggested, i.e. to have a set of bytes with constraints. This idea got some pushback on the basis that some people want to do inline assembly with register allocation handled by the compiler, but I honestly do not see how that can be done without standardizing some form of assembly code. |
In case anyone wants to see where the idea above might go, here's a slightly more complete sketch: a hypothetical IR structure for this form of inline asm. struct InlineAsm {
    /// The raw bytes to start with.
    data: Vec<u8>,
    /// Descriptions of all explicit input and output values.
    constraints: Vec<Constraint>,
    /// Extra registers, "memory", or other machine state which is clobbered.
    clobbers: Vec<String>,
    /// Patches which can change or add bytes.
    patches: Vec<Patch>,
}
struct Constraint {
    // TODO: in/out/inout, tied, earlyclobber, hints, alternatives, register/memory/immediate classes, etc.
}
struct Patch {
    offset: u64, // byte offset in `data` *before* any patches are applied
    contents: PatchDetails,
}
enum PatchDetails {
    X86RexPrefix {
        w: bool, // REX.W
        // Indices in `constraints` for register operands which determine whether a
        // REX prefix is needed and, if so, what bits it should have.
        inputs: Vec<u64>,
    },
    X86ModRMRegField {
        input: u64, // index in `constraints` for the register to encode
    },
    X86ModRMRMField {
        input: u64, // similar
    },
    P2Align {
        // Insert fill bytes to align the following code to a 1<<p2align boundary.
        p2align: u8,
        // The byte value to insert.
        fill: u8,
    },
    // TODO: lots more stuff here
}
I think this would deliver most of what LLVM/GCC-style inline asm does, assuming we filled it out. And it would eliminate the need for the backend to parse assembly text. That said, it's still fairly complex. And of course, no existing assemblers are built to work this way, including LLVM's assembler, so it would be a bunch of work to implement even in LLVM. |
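As a rough illustration of how those pieces might fit together (purely hypothetical, assuming the types sketched above), a single register-to-register load could be described as template bytes plus patches that splice in whichever registers the allocator eventually picks.
// Hypothetical instance, reusing the sketch above: the template encodes `mov rax, [rax]`
// (48 8b 00), and the patches rewrite the REX prefix and ModRM fields once real
// registers are known for the two constraints.
let asm = InlineAsm {
    data: vec![0x48, 0x8B, 0x00],
    constraints: vec![
        Constraint { /* output: any GPR, ends up in ModRM.reg */ },
        Constraint { /* input: any GPR used as a base address, ends up in ModRM.r/m */ },
    ],
    clobbers: vec![], // a plain mov clobbers nothing extra
    patches: vec![
        // Recompute the REX prefix at offset 0 from whichever registers get chosen.
        Patch { offset: 0, contents: PatchDetails::X86RexPrefix { w: true, inputs: vec![0, 1] } },
        // Splice the chosen registers into the ModRM byte at offset 2.
        Patch { offset: 2, contents: PatchDetails::X86ModRMRegField { input: 0 } },
        Patch { offset: 2, contents: PatchDetails::X86ModRMRMField { input: 1 } },
    ],
};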
Any ideas? Inline assembly is essential for constructing an OS or bare-metal environment. |
I'm not aware of any easy answers. Inline asm is a huge, complex, and nebulous set of features. From a broader perspective, there's a form of an XY problem. If you ask for "inline asm", it may be years before we can deliver it. We want to help, but you're asking for a lot, and we don't know of any way to do it faster while still achieving our other goals. If you instead were to ask for access to certain machine instructions, or control registers, or ways to guide the register allocator around key pieces of code, or other specific things, plausible solutions sometimes could take just a few days to implement. With no exaggeration, more specific features can often be a thousand times easier to implement, and typically end up more robust, with more well defined interfaces, and better-understood implementations. We may still implement inline asm eventually. People interested in seeing it happen are encouraged to get involved and help out! |
Hey, @sunfishcode, as far as I understand we cannot just allow using common register names like eax, ebx and so on, right? It would require compiling the code with one of the already existing assemblers. |
@skyne98, you can and you should, because some instructions require specific registers. However, many uses of inline assembly are meant to provide an assembly template that a register allocator can work with. For example, a SIMD library that uses inline assembly does not want to hard-code a specific register allocation, as multiple consecutive uses of this inline assembly would cause so much register pressure that it would penalize the performance of the SIMD code. |
So, what I mean is that the template you provide does not map directly to some binary instruction; it depends on things, and therefore you cannot just parse it, convert it to binary, and put it into the "executable", right? Otherwise, what is the complexity @sunfishcode was talking about? SimpleJIT is already "jitting" out some binary code. |
That's right. In general, the template cannot be parsed until the substitutions are made. And it may contain more than just instructions; it may use arbitrary assembler directives like .align or .section |
Also, @sunfishcode, what do you think about this project? To me it seems like it could be at least partially helpful; however, I have to admit I haven't read the source yet. Also, I don't think we need all the features you mentioned above. What I was trying to ask before is: if doing substitutions is possible and not too hard to do, then adding only basic instructions like stack pushes, pops, and 'mov's (just the 20 most-used instructions in kernels as well as some important ones; we could even try to analyze code to find them) would help A LOT and make writing bare-metal code at least possible. What do you think? |
Dynasm-rs seems to be doing exactly what we need: JIT compilation, but for assembly template strings. Sounds very promising to me, especially considering the fact that the creator says his project is only in an alpha state!
And, it is also capable of doing things like this (just for reference): #![feature(plugin)]
#![plugin(dynasm)]
#[macro_use]
extern crate dynasmrt;
use dynasmrt::{DynasmApi, DynasmLabelApi};
use std::{io, slice, mem};
use std::io::Write;
fn main() {
let mut ops = dynasmrt::x64::Assembler::new().unwrap();
let string = "Hello World!";
dynasm!(ops
; ->hello:
; .bytes string.as_bytes()
);
let hello = ops.offset();
dynasm!(ops
; lea rcx, [->hello]
; xor edx, edx
; mov dl, BYTE string.len() as _
; mov rax, QWORD print as _
; sub rsp, BYTE 0x28
; call rax
; add rsp, BYTE 0x28
; ret
);
let buf = ops.finalize().unwrap();
let hello_fn: extern "win64" fn() -> bool = unsafe {
mem::transmute(buf.ptr(hello))
};
assert!(
hello_fn()
);
}
pub extern "win64" fn print(buffer: *const u8, length: u64) -> bool {
io::stdout().write_all(unsafe {
slice::from_raw_parts(buffer, length as usize)
}).is_ok()
} |
Dynasm is a cool library and we're prototyping with it in Lightbeam. It does its parsing at compile time rather than runtime, so in its current form we can't just drop it into Cranelift to parse strings being passed in (Cranelift's runtime is the user's compile time). But if what you need to do can be done with dynasm, then you can certainly use it yourself directly. If someone added dynamic string parsing to dynasm, that could be interesting. That said, while some users don't need lots of features, others do, so it likely wouldn't suffice for the long term. Also, the hard part of the operand constraint problem is computing a set of registers and stack slots that satisfy all the constraints. Actually pushing/moving data into place is something we already have to do to support other features :-). Analyzing inline asm usage in kernels may help here. Part of the problem with inline asm is that it's a big sprawling set of features, but if we could identify restricted feature sets that would work in practice for at least some users, that might give us more options. |
If we can help in any way, @sunfishcode, please tell us :) |
I included some ideas for projects people could start on in my posts above. I'm happy to answer specific questions, or to mentor people on projects. |
I would have liked to start such a project, but I don't think I am experienced enough in this area. I am up for experimentation, though, with decent mentorship 😄 My general idea was to try to write a simple kernel that was "jitting" itself, so I guess it could have been a good playground for testing such a project. |
How about: struct InlineAssembly {
    inputs: Vec<Value>,
    outputs: Vec<Value>,
    constraints: isa::RecipeConstraints,
    emit: Box<dyn Fn(Vec<isa::registers::RegUnit>) -> Vec<u8>>,
}
This way Cranelift only has to know the register constraints. |
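A hedged sketch of how that callback might be used (the values, the constraint object, and the instruction chosen here are placeholders, not a real Cranelift API): the register allocator hands the closure its chosen RegUnits and gets back raw machine-code bytes.
// Hypothetical usage, assuming `val_a`, `val_b`, `val_sum` are SSA values from the
// function builder and `gpr_constraints` describes "two GPR inputs, one GPR output".
let asm = InlineAssembly {
    inputs: vec![val_a, val_b],
    outputs: vec![val_sum],
    constraints: gpr_constraints,
    // Emit x86-64 `add dst, src` (REX.W + 01 /r) once concrete registers are known.
    // For brevity this only handles the low eight registers (no REX.R/REX.B bits).
    emit: Box::new(|regs| {
        let (dst, src) = (regs[0] as u8 & 7, regs[1] as u8 & 7);
        vec![0x48, 0x01, 0xC0 | (src << 3) | dst]
    }),
};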
@skyne98 If you're thinking about JITing, then it sounds like you're more in dynasm's space. Which is a cool space, so go ahead and have fun! @bjorn3 Yes, if we generalized |
@sunfishcode, more precisely my plan was to implement a small language frontend, then use cranelift as a backend. I did not intend to make a JIT compiler from the ground up, cranelift just looked to me as a much better alternative to LLVM for such a job. |
@skyne98 Implementing a small language frontend using Cranelift as a backend sounds like a great project! Check out the simplejit-demo for an example of how to get started, and please ask questions if anything is unclear. This GitHub issue is about inline asm, and most languages can be implemented without the use of inline asm. If you encounter something that seems to require inline asm, please ask about it, as we may be able to find alternatives. |
@sunfishcode, the "problem" is actually that eventually I want to try to write a kernel in it, and then it will need the ability to use inline assembly. That's really the reason I started writing here. Also, I would really love to try using Cranelift as a backend, and it would be even more awesome if you could do a code review for the project in the future 😄 |
What features does your kernel need? Do you need access to specific instructions? Access to control registers? If you can name the features you need, I can help you design proper features for them, that will be safer to use, and more robust, than inline asm. |
For example, even setting up virtual memory and swapping out the tables on context switches requires some inline assembly. How can you build around that? Writing an interface to them in Rust and then just calling them by invoking Rust functions? Or do you mean implementing a couple of custom features in Cranelift itself? |
What instructions would you use to set up virtual memory and swap out the tables on context switches? If you can show me a sequence of instructions that you need to emit, I can help you design a way to emit those instructions using Cranelift without using inline asm. |
Does there exist native assembly for which it is impossible for Cranelift to emit comparable IR? Or the real question: could we be thinking of this backwards? What if the Rust compiler took a user's "inline assembly" and emitted Cranelift IR (when Cranelift was the backend)? It would not be ideal, but it would prevent having to worry about standardizing something on Cranelift's side and fix the issue on Rust's side. Edit: also, if someone is specifying inline assembly, aren't they supposed to know their back-end? Why shouldn't they be forced to rewrite the assembly blob in Cranelift IR? |
First, Cranelift is allowed to add arbitrary spill, fill, and regmove instructions between the generated clif IR instructions when regalloc thinks it is necessary. Cranelift is also allowed to perform arbitrary optimizations. The reason to write inline asm may be to prevent all those things, as your asm is faster, or the only correct one. If you want to set up a stack in inline asm, you don't want clif to insert spills before you are done. If you want to save and restore all registers in an OS, you don't want clif to insert writes to not-yet-saved/already-restored registers. Second, a common reason to write inline asm is because you want to use a certain instruction. If you translate inline asm to clif IR, you would have to implement all of the thousands of instructions existing on the target arch. This is the main reason why clif doesn't support inline asm. It would take months at least to implement them all for just one arch. |
Yes, adding all the instruction encodings is a non-trivial project, though it is doable. It's also the easy part.
One might assume assembly files are easy to parse. After all, it's just "mnemonic [operand]*". However, assembly files have an elaborate syntax. And that's not to mention the expression syntax with its own surprising operator precedence rules. Symbols should be easy because they're just names, but naming turns out to be its own little special world. Back to parsing instructions; it should be easy, but even just parsing the mnemonics presents interesting subtleties. Parsing the operands also involves subtle concerns. And then there are the bugs you have to emulate. And then, because that was so much fun, some architectures have multiple syntaxes.
Beyond parsing, let's talk about all the directives. Besides having multiple macro expanders to implement, many directives do subtle things with sections and fragments, exposing a lot of what would otherwise be implementation details, so your compiler backends basically have to be architected according to how C compilers have traditionally been architected in order to support all the interactions between compiler code and assembly code. But we're getting to the end of the Now we switch to the
And while many of the machine-specific constraints sound simple, they (a) are all things the register allocator has to understand in detail, (b) can result in fixups and relocations that the whole compiler backend has to be able to represent, and (c) have complex interactions with other constraints in ways that aren't always documented. Also, don't miss the section on goto labels, because this means that an inline asm is effectively an arbitrary branch instruction too, so it can also create arbitrary n-way control-flow constructs which the rest of your compiler backend now has to be able to understand, and which the register allocator has to be able to spill and reload around, to satisfy the already complex constraint systems, because it all wasn't complex enough already.
And with all the compiler implementation details inline asm exposes, there are no clear rules for what parts of the compiler's behavior are stable, and which (presumably) are undefined behavior to rely on. Can you do
If we work to add inline asm to Cranelift, with our current resources, it will take us multiple years, and delay other features. This is not an exaggeration, because I'm familiar with the effort it took the LLVM project to adequately implement GCC-style inline asm to support common code, with far more resources, and to this day it's still not uncommon to find things that don't work. And, doing so will make it harder for us to maintain and evolve Cranelift beyond that, because it would lock the backend architecture into certain ways of doing things, because so much is exposed.
Furthermore, this is not a project where people can easily contribute small steps to help get to the eventual goal. There is major design work to be done, deep within the most complex parts of the backend. |
@sunfishcode I didn't realize that inline asm was that complex. I assumed it would just be a matter of implementing all the instructions and that the rest would be easy. |
@sunfishcode fantastic post! My question is this though: is there any way to de-scope? I get that some might consider an "ideal" implementation to be one where cranelift understands the assembly it is compiling, but I take the opposite approach. IMO cranelift (and most compilers for that matter) should understand nothing about the assembly they are compiling. It should be completely opaque. If someone wants to write assembly the interop work should be on them. They should essentially have to do this:
It is then the assembly writer's job to make sure that their assembly doesn't break things for whatever platform it is being compiled against. Cranelift should not be the one to compile the Cranelift would have to only provide platform-specific intrinsics for storing/restoring state (i.e. Will this lead to slower code (missed optimizations) than if cranelift knew how to compile the asm? Definitely. But as you point out, that is a huge can of worms -- and not only that, it seems to me it is a liability to the maintainability of the project in general. |
If you put the burden of marshalling values into specific registers on the user, and you don't care too much about micro-optimizations, the feature doesn't seem much better than out-of-line asm (.s files), though it'd still be a lot of work. More broadly, hypothetical use cases for inline asm are awkward because they tend to artificially constrain the design space, and make it difficult to determine appropriate priorities. Consequently, I'd like to request anyone wishing to discuss inline asm further to please include in your post: (a) a description of a concrete use case using Cranelift, (b) as complete as possible a description of what specific instructions, instruction sequences, or machine state needs to be accessed, and (c) an explanation for why intrinsics or out-of-line asm might not be sufficient for the use case. We can then discuss it starting from that point. This only pertains to inline asm, due to the extraordinary nature of this feature. Thanks! |
Does cranelift already support this? I'm really curious as to whether there can be a reasonably simple and well-defined solution which can solve any issue except performance (albeit with potentially more work needed from a programmer). |
You can use GNU |
Sorry to pop in. We're having a debate on Rust forums about whether Rust should ever stabilize inline assembly (currently an unstable feature which needs to be redesigned), and part of the discussion has to do with the difficulty of Cranelift supporting this functionality eventually. This isn't a short term issue. For now, the discussion is only about whether Rust should support inline assembly; there's no specification of how the redesigned version should work, let alone an implementation or stabilization. Also, Cranelift support in rustc isn't upstream. Even if inline assembly were stabilized and Cranelift support upstreamed, rustc could still probably use LLVM to compile just the functions containing inline assembly. @sunfishcode raises a lot of valid concerns in this thread, and it's clear they've thought about this deeply. Still, I think it is possible to mitigate many of those concerns, especially if the scope is limited to what's necessary for a future Rust inline assembly feature (as opposed to, say, compatibility with existing C codebases). The following is only a very broad sketch, but hopefully it can start a discussion: First, regarding parsing assembly mnemonics and directives: An alternative would be to add a compilation mode to Cranelift that generates assembly and passes it through an external assembler, instead of generating machine code directly. This way, inline assembly could just be spliced in, with no need for Cranelift to parse anything. Of course, this would still require adding a bunch of new functionality, both to emit the assembly and to run the assembler. And it would not be suitable for JIT use cases, but Rust doesn't currently need a JIT. Regarding constraint systems: A Rust inline assembly feature would probably have a drastically simpler constraint system, since compatibility with existing codebases isn't needed and GCC's constraints are really overcomplicated. Regarding As for the paragraph about implementation details... I've tried going through each of the questions and answering them:
No. Neither GCC nor Clang uses .pushsection to enter its sections, so on existing implementations this would just result in an error. But it is valid and useful to have matched pushsection/popsection pairs within an assembly string.
No. Only GCC emits these (not Clang) and their names are not predictable, so it would be hard to use them anyway.
If you mean with a pushsection/popsection pair: Why not? For the Rust use case where you're generating an object file, you already need to anticipate other objects being linked in which contain data in those sections, and thus need to generate relocations against symbols rather than sections. No harm in letting asm blocks do the same. If you mean using .section and just leaving the assembler in the other section at the end of the asm block: No. With no way to force the compiler to generate symbols in a specific order, you'd be putting some unknown number of other symbols into the other section. Even if you tried to change things back in a different asm block in the same function, the compiler is not required to output basic blocks in any particular order, and there's also the possibility of duplication (see below).
No: without a way to find the end of the function, such an assumption would be useless. But you can assume that the function's entry point is the address you get if you write the function's name, modulo things like the Thumb bit on ARM.
Yes, unless the architecture/OS uses execute-only pages or has special requirements on text sections. Why not?
Yes; GCC and Clang do.
Yes; ditto.
I tested it on x86: Clang errors out, GCC produces useless output. In Rust, I'd prefer to not support tying at all; you can just use a temporary variable instead, and it'll be more clear what's going on. If it does become supported, it would probably be an error to tie operands of different widths.
No (what modifiers?). Though there's probably no need for Rust to support an "m" constraint at all; on some architectures (ARM) it's ambiguous and mostly useless. If a memory constraint is supported at all, it would probably be best to make it architecture-dependent and more precisely specify what it expands to.
No, because that would break if the latter were duplicated, and there is no way to disable duplication. |
This is what I expect we'd have to do for the foreseeable future.
It's a good point. This would be a shorter path (though still not an easy one) to supporting at least the "frontend" side of inline asm, for at least the way Rust is typically used today. It's unclear if this would be worth building though, as we know people are already talking about JIT use cases, so if we find people willing to take on a project of this scale, we might prefer they build an assembler library anyway.
I'm not saying it's impossible. But the discussion in the linked thread often doesn't acknowledge that complexity doesn't always scale linearly when you combine features, generalize features, embed features in the middle of the most compile-time-sensitive and compile-quality-sensitive NP-complete problem approximating part of the backend, or even, say, do all of the above at the same time ;-).
I expect your answers are correct. But also, yes, these were just some questions I thought of off the top of my head. Inline asm is a massive expansion of the user-facing surface area of a compiler. And, much of it is not immediately visible, because it's not Rust syntax, and it's not even just assembly syntax, but it's also "how does assembly code written by the user interact with assembly code produced by the compiler", with a large list of directives at its disposal that can be involved in interactions. A lot of this area isn't documented or even really designed. We can usually figure out what to do in any given situation. But it's harder to design a backend in a way that we can be reasonably sure will work for the long term, and not set us up for years of figuring out situation after situation. People often ask, "Can't you just support a subset of inline asm?" But everyone seems to need a different subset. And, there are many intuitive subsets which turn out to be insufficient for what people actually need. And even if one subset does emerge, it may grow over time; people in the linked thread talk about "C Parity", which could put pressure on any reasonable subset. And then, subsets can still be a lot of work, and come with the risk that if the subset grows, this work may need to be redone in a more general way later. |
I guess this one strictly relates to rust-lang/rust#69171; the good news is that the new
Rustc_codegen_cranelift has solved this by wrapping inline assembly with a prologue and epilogue and then passing it to an external assembler. It can then be called as a regular function from the Cranelift side. |
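For illustration, that wrapping idea amounts to something like the following sketch (a hypothetical helper, not the actual code; the real implementation lives in rustc_codegen_cranelift's inline_asm.rs): generate a standalone assembly function around the user's asm text, feed it to an external assembler, and have Cranelift emit an ordinary call to it.
// Hypothetical sketch: wrap a user's asm text in a named, SysV-callable function.
// The resulting .s text would be run through an external assembler (e.g. `cc -c`),
// and the Cranelift-compiled code simply calls `name` with the normal ABI.
fn wrap_inline_asm(name: &str, user_asm: &str) -> String {
    format!(
        ".globl {name}\n\
         {name}:\n\
         # inputs arrive in the ordinary SysV argument registers (rdi, rsi, ...)\n\
         {user_asm}\n\
         ret\n"
    )
}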
I'm also very much in need of this. I'm using Cranelift as a backend for my C compiler project. The lack of an assembler is quite unfortunate; it means I have to consider either hacking a bit to bypass Cranelift for assembly/raw/naked functions, or writing my own codegen using dynasm. |
@stevefan1999-personal you or anyone else interested is welcome to contribute! As outlined by sunfishcode above, this is a big design question, so any effort should probably start with an RFC; and then the implementation effort would probably be 3-6 months of fulltime work, I would estimate. Currently none of the core developers have the time/resources to build this, so it would likely have to come from a motivated contributor. |
@cfallin Looks like we would be better off moving this into a GH discussion... rather than an issue |
@stevefan1999-personal we don't use the GitHub Discussions feature here; this issue is perfectly fine for now, I think! |
Sure. I just read how Clang and LLVM handle inline assembly, and I think it may be useful towards the end goal: https://youtu.be/MeB7Dp3G2UE?si=2boJuSqM-JLH2f7o&t=510 Basically, LLVM also treats the inline assembly as a string, replaces the parameters, captures the used registers and feeds them to the register allocator, and then inserts the output assembly at the specific point. I'm not sure if I missed something. |
Indeed, it sounds kind of simple when described at a high level like that. I'd recommend reading sunfishcode's comment above (and really, this whole thread): there are a huge number of complexities here that need to be fully appreciated and handled in any robust solution. As one example, we currently generate machine code directly, rather than using any existing assembler at the backend. And our machine-code representation is close but not exactly 1-to-1 with instructions, and we don't have representations for instructions we don't use. So at the very least, such a project would require (i) writing a new textual assembler that uses our emission backend, or adopting an existing assembler library in Rust to work with our backend and provide the same interface and metadata to the register allocator, and (ii) extending our instruction representation to support all known CPU instructions. That's an enormous effort. To state it more explicitly: the input that is most needed here is not ideas or brainstorming or problem-solving, nor confirmation that this is a useful feature; we have plenty of ideas, and we know this is a useful feature that some people want. The input that is needed is human time to actually develop the feature; something like 3-6 months of fulltime engineering. Cranelift is developed by folks who use it for various applications, and we're all quite busy, and don't have the free time for that. Anyone who does, and is sufficiently motivated to need this feature, is welcome to help out! |
In rustc_codegen_cranelift I have a bunch of code which wraps inline asm in a function that uses the SystemV calling convention and then passes this whole asm block to an external assembler. This may or may not work for your use case: https://github.com/bjorn3/rustc_codegen_cranelift/blob/master/src/inline_asm.rs |
Since Cranelift is soon to be a backend for Rust, it will need to support inline assembly. There is no good way to solve this right now, since Rust uses the LLVM inline asm syntax. I'm making this issue so we can think about this in the long term.