-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements needed, and planned features #38
Comments
Regarding GPU compilers and generally "offloading of some parts of the computation" you might have a look at DawnCC - some years ago I found their work quite inspiring (btw. they seem to be from Brazil, so might be more accessible for you in some way or another). |
A JIT compilerAdopting WASM opens lots of capabilities and possibilities and currently, it's a blue ocean for new languages and attracts attention. Windows supportThe priority should be very low in the current stage, so please avoid putting your precious resource into it right now. GPU compilersGPU compilers also can attract attention in terms of Machine Learning since I observe ML finally has entered the true realm of functional programming. https://github.com/google/jax
At least for ML, nowadays, the local physical GPU device usage is rather outdated. The recent ML researchers actively take advantage of Cloud TPUs, so you may also consider this movement. Compile Kind to HVM
I've noticed that and in fact
HVM looks very clean and mostly has no "statements" and I like better than Haskell and even better than Kind. My proposal is not to compile Kind to HVM but extends HVM itself. I suppose it is a simpler approach then should be far easier to extend HVM by setting up the suited syntax specially designed for the underlying HVM from scratch. HVM can have a simple module system to extend itself. It's like a standard library + procedural-macro in Rust.
Since HVM concise syntax itself is just AST, this approach should fit the HVM module system. For instance,
So type syntax is required on top of the native HVM then a static type checker runs. There is a good chance to implement first-class types like Idris. Perhaps we need more expansion of pattern-matching syntax or IMO, infix functions/ combinators/operators definition. All are module base extensions, so an incremental approach to the new perfect language. (EDIT) I think this AST to another AST approach (as same as Procedural macros in Rust) has been already done and working because HVM code can be already transpiled to C code.
|
Targeting GPUsIt might be worth taking a look at Futhark, which is a small language for writing code that can target CUDA and OpenCL. (There's an open issue for WebGPU support, though it hasn't received any attention) Futhark can compile to a C library; I haven't looked at HVM's compilation output in detail yet, but I'm guessing that GPU-targeting code could compile to C that calls a C library (compiled from Furthark) for using the GPU. C compiler/portabilityZig has done some extensive work on bundling Clang for easily setting up a C compiler; maybe there's some work from there which could be reused? |
Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded. I'd be interested in getting the ball rolling on this. Where is the normalize function? I can't find a function named |
I'm only dabbling in this topic, so it's possible that it may not be suited at all; but this paragraph is conspicuously lacking any mention of Vulkan. |
For GPGPU: For JIT compiler part: |
What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in principle? |
Here is the relevant function on Rust. Similar one on C. It is quite small, as it just recurses over a term, applying reduce() layer by layer, to compute the normal form. To integrate IO, we'd just modify that function like this:
Basically it would do the same as run_io here, preferably using Kind's IO Type. (Will answer the remaining questions later.) |
There is also PureScript, which supports alternate backends. |
Out of curiosity, why is there both a runtime.c and a runtime.rs? They appear to be extremely similar. Is the rust one the interpreter and the C one for linking when compiling? |
@jmorag Yes, runtime.rs is the code used for the interpreter and runtime.c is the template of boilerplate code used as the runtime by C programs generated by the compiler. |
Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in,
Would be amazing to compile PureScript to HVM (and benchmark!). |
Just note.
Yes, possible; however, I suppose the priority for the author is currently low for the above reason, and such a project should be independently done by the Haskell community with support from the author. It's not a small work and I hope the resource will be primarily provided by the Haskell community itself. I guess the only reason the demo code is compared to the Haskell code is simply since Haskell is more well-known than Kind so that broader functional community users can easily reach the concept, then GHC performance benchmark. Although I think Kind has a better design than Haskell, I still think HVM is better than Kind and hope HVM can be extended with the compositional modules. Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" ruleIt seems that we need the type-chcker based on HVM in any way, so I think it's a simpler integrated approach toward the typed-HVM. |
Perhaps |
Actually I'm looking for a master thesis topic. Could we maybe talk about this? I would be interested in implementing this. |
@rowanG077 I'm on the PureScript core team and I recently supervised another thesis on a PureScript backend. |
My understanding of the HVM runtime is that it's lazy. How would strict languages like PureScript and Idris retain their semantics when compiling to HVM? |
@jmorag I don't know in details, but I know that purenix works in a scenario like that, in this section it explain the strictness difference impact on purenix usage. |
C compilerI just tried
There may be other issues, I didn't investigate further. I think these rule out embedding |
C compilerThanks @DylanSp for trying
This is a totally minor issue. See e.g. how V (which uses
Sounds like something minor to me.
So I'm convinced that if multiplatform quick compilation is desired (IMHO it is, but I'm not the one to decide), |
STG -> HVM might not be feasible, because, well, multiple reasons. The main problem is that GHC doesn't care about |
@rowanG077 sorry for the delay! Yes, feel free to mail me ([email protected]) or DM me in any of my social networks. It would be an honor if you did your thesis related to HVM, and certainly there are many places where such a thing would add a lot of value to it. |
I forgot to mention in terms of Windows support #52 Nowadays, they no longer use Cygwin. They simply use a real Linux kernel on Windows that is called Windows Subsystem for Linux WSL2. So I guess we don't have to be concerned about this. |
FYI, this is an excellent discussion related to GHC you might be interested in. #74 |
I agree with @stken2050 that HVM is well suited as a general purpose programming language. It just needs really well designed primitives, and over time, a standard library. |
I'd be interested in adding support for at least the unboxed numeric types as this seems like a low hanging fruit. I just wanted to check what the concensus was between implicit/explicit casting and how operators should function in regards to mixed input types. My instinct would just be to cast up to the 'bigger' type but maybe operators should only operate on the same type - combined with explicit casts. @VictorTaelin what do you think? |
Operators should just operate on the same type. So, if F32 is added, for example, ADD will work on either u32/u32 or f32/f32, but not anything else (i.e., no u64/u32, no f32/u32, etc.). |
Why not add a type conversion operator for primitive types too? Rust has it, and we could probably just easily use theirs under the hood? |
@nothingnesses I think explicit (i.e. written every single time by the programmer) conversion from one type to another will be needed one way or another. I think @VictorTaelin meant that there shouldn't be any implicit conversion as that always leads to some irregularities and confusion. |
Ah maybe. I don't think any conversion operators are currently implemented in the language right now though, which was why I brought it up. |
Yes, that's what I meant. No implicit conversions, just explicit. |
I posted an idea on Cardano Catalyst for supporting a HVM backend for PureScript. If it receives a lot of votes, it may receive some sponsorship: https://cardano.ideascale.com/c/idea/396001 |
Would something like arrayfire solve the GPGPU problem? |
I've been researching Roc and they have a nice idea for handling IO. The language doesn't handle IO at all, and instead delegates this to a platform which will provide IO capabilities. There are more details here and in an older talk. |
Regarding allocators: I have used |
Is GPU target done given this is closed? |
no, we're just not using this to track TODOs. We've raised funds to build a company around the HVM and will be doing many things, including a GPU target. We're hiring :) |
Better allocator
Currently, the C allocator just reserves 8 GB of memory when the program starts, and, if there are multiple threads, that memory is split between then. So, for example, if there are 8 threads, each one gets 1 GB to work with. A worker can read from other worker's memories (which may happen for shared data, i.e., when they pass through a DP0/DP1 pointer), but only the owner of the slice can allocate memory from it. Each worker also keeps its own freelist.
As such, a huge improvement would be thread-safe, global, possibly arena-based, alloc and free.
Better scheduler
Right now, the task scheduler will just partition threads evenly among the normal form of the result. So, for example, if the result is
(Pair A B)
, and there are 8 threads, 4 will be assigned to reduce A and B. This is obviously very limited: if A reduces quickly, it won't re-assign the threads to help reducing B, so the program will just use half of the cores from that point. And so on. This is why algorithms that return lists are often slower, they aren't using threads at all. In the worst case, it will just fallback to being single threaded.A huge improvement would be a proper scheduler, that adds potential redexes to a task pool. When I attempted to do that, the cost of synchronization added too much overhead, ruining the performance. Perhaps a heuristic to consider would be to limit the depth of the redexes for which global tasks are emitted; anything below, say, depth=8, would just be reduced by the thread that reaches it. Many improvements can be done, though.
I32, I64, U64, F32, F64
Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.
On #81.
Improve the left-hand side flattener
On #54.
A nice API to use as a Rust library
Using HVM as a pure Rust lib inside other projects should be very easy, specially considering the interpreter has a fairly good performance (only about 50% slower than the single-thread compiled version). We must think of the right API to expose these features in a Rust-idiomatic, future-proof way. Input from the Rust community is appreciated.
A JIT compiler
Right now, HVM compiles to C. That means a C compiler like clang or gcc is necessary to produce executables, which means that it isn't portable when used as a lib. Of course, libs can just use the interpreter, but it is ~3x slower than the compiler, and is not parallel. Ideally, we'd instead JIT-compile HVM files using WASM or perhaps Cranelift, allowing lib users to enjoy the full speed of HVM in a portable way.
IO
Adding IO should be easy: just write an alternative version of
normalize
that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.Windows support
The compiler doesn't support Windows yet. The use of
-lpthreads
may be an issue. The interpreter should work fine, though.On #52.
GPU compilers
Compiling to the GPU would be amazing, and I see no reason this shouldn't be possible. The only complications I can think of are:
How do we alloc inside a GPU work unit?
Reduce() is highly branching, so this may destroy the performance. But it can be altered to group tasks by categories, and merge all branches in one go. The reduce function essentially does this:
branch -> match -> alloc -> subst -> free
. The last 4 steps are very similar independent of the branch it falls into. So, instead, the branch part can just prepare some data, and thematch -> alloc -> subst -> free
pass are all moved to the same branch, just with different indices and parameters. This will avoid thread divergence.GPGPU is a laborious mess. I'm under OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and from what I could see, doesn't have the ability to read/write from the same buffer in the same pass!?). Regardless, we'd probably need different versions, Metal for OSX, CUDA and OpenCL for Windows depending on the GPU, etc. So, that's a lot of work that I can't do myself, it would require expensive engineers.
Compile Kind to HVM
HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level, but in the sense that users shouldn't be writing it directly. Ideally, Kind will compile to HVM, but it needs a lot of adjustments before that is possible.
Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule
See this issue for more information, as well as the superposed duplication section on HOW.md.
On #61.
Tests
This project has basically no test. Write tests!
Will edit the thread as I think of more things. Feel free to add to this thread.
The text was updated successfully, but these errors were encountered: