Improvements needed, and planned features #38

VictorTaelin · 2022-01-31T16:35:58Z

Better allocator

Currently, the C allocator just reserves 8 GB of memory when the program starts, and, if there are multiple threads, that memory is split between then. So, for example, if there are 8 threads, each one gets 1 GB to work with. A worker can read from other worker's memories (which may happen for shared data, i.e., when they pass through a DP0/DP1 pointer), but only the owner of the slice can allocate memory from it. Each worker also keeps its own freelist.

As such, a huge improvement would be thread-safe, global, possibly arena-based, alloc and free.

Better scheduler

Right now, the task scheduler will just partition threads evenly among the normal form of the result. So, for example, if the result is (Pair A B), and there are 8 threads, 4 will be assigned to reduce A and B. This is obviously very limited: if A reduces quickly, it won't re-assign the threads to help reducing B, so the program will just use half of the cores from that point. And so on. This is why algorithms that return lists are often slower, they aren't using threads at all. In the worst case, it will just fallback to being single threaded.

A huge improvement would be a proper scheduler, that adds potential redexes to a task pool. When I attempted to do that, the cost of synchronization added too much overhead, ruining the performance. Perhaps a heuristic to consider would be to limit the depth of the redexes for which global tasks are emitted; anything below, say, depth=8, would just be reduced by the thread that reaches it. Many improvements can be done, though.

I32, I64, U64, F32, F64

Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.

On #81.

Improve the left-hand side flattener

On #54.

A nice API to use as a Rust library

Using HVM as a pure Rust lib inside other projects should be very easy, specially considering the interpreter has a fairly good performance (only about 50% slower than the single-thread compiled version). We must think of the right API to expose these features in a Rust-idiomatic, future-proof way. Input from the Rust community is appreciated.

A JIT compiler

Right now, HVM compiles to C. That means a C compiler like clang or gcc is necessary to produce executables, which means that it isn't portable when used as a lib. Of course, libs can just use the interpreter, but it is ~3x slower than the compiler, and is not parallel. Ideally, we'd instead JIT-compile HVM files using WASM or perhaps Cranelift, allowing lib users to enjoy the full speed of HVM in a portable way.

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

Windows support

The compiler doesn't support Windows yet. The use of -lpthreads may be an issue. The interpreter should work fine, though.

On #52.

GPU compilers

Compiling to the GPU would be amazing, and I see no reason this shouldn't be possible. The only complications I can think of are:

How do we alloc inside a GPU work unit?
Reduce() is highly branching, so this may destroy the performance. But it can be altered to group tasks by categories, and merge all branches in one go. The reduce function essentially does this: branch -> match -> alloc -> subst -> free. The last 4 steps are very similar independent of the branch it falls into. So, instead, the branch part can just prepare some data, and the match -> alloc -> subst -> free pass are all moved to the same branch, just with different indices and parameters. This will avoid thread divergence.
GPGPU is a laborious mess. I'm under OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and from what I could see, doesn't have the ability to read/write from the same buffer in the same pass!?). Regardless, we'd probably need different versions, Metal for OSX, CUDA and OpenCL for Windows depending on the GPU, etc. So, that's a lot of work that I can't do myself, it would require expensive engineers.

Compile Kind to HVM

HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level, but in the sense that users shouldn't be writing it directly. Ideally, Kind will compile to HVM, but it needs a lot of adjustments before that is possible.

Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule

See this issue for more information, as well as the superposed duplication section on HOW.md.

On #61.

Tests

This project has basically no test. Write tests!

Will edit the thread as I think of more things. Feel free to add to this thread.

The text was updated successfully, but these errors were encountered:

dumblob · 2022-01-31T20:09:07Z

Regarding GPU compilers and generally "offloading of some parts of the computation" you might have a look at DawnCC - some years ago I found their work quite inspiring (btw. they seem to be from Brazil, so might be more accessible for you in some way or another).

ken-okabe · 2022-01-31T22:34:20Z

A JIT compiler

Adopting WASM opens lots of capabilities and possibilities and currently, it's a blue ocean for new languages and attracts attention.

Windows support

The priority should be very low in the current stage, so please avoid putting your precious resource into it right now.

GPU compilers

GPU compilers also can attract attention in terms of Machine Learning since I observe ML finally has entered the true realm of functional programming.

JAX Vs TensorFlow Vs PyTorch: A Comparative Analysis / JAX is a Python library designed for high-performance numerical computing.

https://github.com/google/jax

JAX is Autograd and XLA, brought together for high-performance machine learning research.

JAX transformations only work on pure functions, which don't have side-effects and respect referential transparency (i.e. object identity testing with is isn't preserved). If you use a JAX transformation on an impure Python function, you might see an error like Exception: Can't lift Traced... or Exception: Different traces at same level.

JAX now runs on Cloud TPUs. To try out the preview, see the Cloud TPU Colabs.

At least for ML, nowadays, the local physical GPU device usage is rather outdated. The recent ML researchers actively take advantage of Cloud TPUs, so you may also consider this movement.

Compile Kind to HVM

HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level

I've noticed that and in fact

HVM files look like untyped Haskell.

HVM looks very clean and mostly has no "statements" and I like better than Haskell and even better than Kind.

My proposal is not to compile Kind to HVM but extends HVM itself.

I suppose it is a simpler approach then should be far easier to extend HVM by setting up the suited syntax specially designed for the underlying HVM from scratch.

HVM can have a simple module system to extend itself. It's like a standard library + procedural-macro in Rust.

Procedural macros allow you to run code at compile time that operates over Rust syntax, both consuming and producing Rust syntax. You can sort of think of procedural macros as functions from an AST to another AST.
Procedural macros must be defined in a crate with the crate type of proc-macro.

Since HVM concise syntax itself is just AST, this approach should fit the HVM module system.

For instance,

HVM files look like untyped Haskell.

So type syntax is required on top of the native HVM then a static type checker runs. There is a good chance to implement first-class types like Idris.

Perhaps we need more expansion of pattern-matching syntax or IMO, infix functions/ combinators/operators definition.

All are module base extensions, so an incremental approach to the new perfect language.

(EDIT) I think this AST to another AST approach (as same as Procedural macros in Rust) has been already done and working because HVM code can be already transpiled to C code.
The difference of self-expansion module is

HVM -> C
Additional syntax on HVM -> the base HVM

DylanSp · 2022-02-01T01:44:44Z

Targeting GPUs

It might be worth taking a look at Futhark, which is a small language for writing code that can target CUDA and OpenCL. (There's an open issue for WebGPU support, though it hasn't received any attention) Futhark can compile to a C library; I haven't looked at HVM's compilation output in detail yet, but I'm guessing that GPU-targeting code could compile to C that calls a C library (compiled from Furthark) for using the GPU.

C compiler/portability

Zig has done some extensive work on bundling Clang for easily setting up a C compiler; maybe there's some work from there which could be reused?

jmorag · 2022-02-01T04:24:27Z

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

I'd be interested in getting the ball rolling on this. Where is the normalize function? I can't find a function named normalize via grep.

JakobBruenker · 2022-02-01T08:00:00Z

GPGPU is a laborious mess. I'm under OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and from what I could see, doesn't have the ability to read/write from the same buffer in the same pass!?). Regardless, we'd probably need different versions, Metal for OSX, CUDA and OpenCL for Windows depending on the GPU, etc. So, that's a lot of work that I can't do myself, it would require expensive engineers.

I'm only dabbling in this topic, so it's possible that it may not be suited at all; but this paragraph is conspicuously lacking any mention of Vulkan.

L-as · 2022-02-01T10:31:47Z

For GPGPU:
This is not really a "clean" approach, but you could try targetting Futhark rather than a concrete API. Futhark has a support for a lot of different backends, and would ensure wide support.

For JIT compiler part:
I disagree that it's not portable currently, things like tinycc exist, not to mention that libgccjit is also a thing. However, I think targetting Cranelift would be very useful. You'd have to redo how you do the whole runtime system I think? The current approach of having a template C file is ugly IMO anyway.

rowanG077 · 2022-02-01T11:19:26Z

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in principle?

VictorTaelin · 2022-02-01T12:03:28Z

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

I'd be interested in getting the ball rolling on this. Where is the normalize function? I can't find a function named normalize via grep.

Here is the relevant function on Rust. Similar one on C. It is quite small, as it just recurses over a term, applying reduce() layer by layer, to compute the normal form. To integrate IO, we'd just modify that function like this:

let term = reduce(mem, funcs, host, opt_id_to_name);
match term {
  ... instead of checking if it is an app / lam / etc, it will match against
  ... the expected shape of the IO monad... that is, one of these:
  ... (IO_Ask method param continuation)
  ... (IO_End return_value)
  ... and then call the selected IO method and pass the result to the continuation
}

Basically it would do the same as run_io here, preferably using Kind's IO Type.

(Will answer the remaining questions later.)

klarkc · 2022-02-01T13:57:04Z

@rowanG077

What about compiling Haskell to HVM?

There is also PureScript, which supports alternate backends.

jmorag · 2022-02-01T15:55:33Z

Out of curiosity, why is there both a runtime.c and a runtime.rs? They appear to be extremely similar. Is the rust one the interpreter and the C one for linking when compiling?

nothingnesses · 2022-02-01T16:00:48Z

@jmorag Yes, runtime.rs is the code used for the interpreter and runtime.c is the template of boilerplate code used as the runtime by C programs generated by the compiler.

VictorTaelin · 2022-02-01T17:40:12Z

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lot's of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in princible?

Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of feature I don't even know, so that might take of work to do.

There is also PureScript, which supports alternate backends.

Would be amazing to compile PureScript to HVM (and benchmark!).

ken-okabe · 2022-02-01T17:41:07Z

What about compiling Haskell to HVM?

Just note.
The author(@VictorTaelin) already has developed a better language: Kind

A minimal, efficient and practical programming language that aims to rethink functional programming from the scratch, and make it right. Under the hoods, it is basically Haskell, except without historical mistakes, and with a modern, consistent design.

wouldn't this be possible in principle?

Yes, possible; however, I suppose the priority for the author is currently low for the above reason, and such a project should be independently done by the Haskell community with support from the author. It's not a small work and I hope the resource will be primarily provided by the Haskell community itself.

I guess the only reason the demo code is compared to the Haskell code is simply since Haskell is more well-known than Kind so that broader functional community users can easily reach the concept, then GHC performance benchmark.

Although I think Kind has a better design than Haskell, I still think HVM is better than Kind and hope HVM can be extended with the compositional modules.

Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule

It seems that we need the type-chcker based on HVM in any way, so I think it's a simpler integrated approach toward the typed-HVM.

jmorag · 2022-02-01T17:51:07Z

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lot's of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in princible?

Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of feature I don't even know, so that might take of work to do.

Perhaps Haskell -> Core -> STG -> HVM would be possible, as then HVM could be bolted onto the external stg interpreter which has already done a ton of work re-implementing the GHC primops and emitting whole-program compilation information so that HVM wouldn't have to also become an incremental compiler.

rowanG077 · 2022-02-01T18:09:05Z

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lot's of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in princible?

Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of feature I don't even know, so that might take of work to do.

There is also PureScript, which supports alternate backends.

Would be amazing to compile PureScript to HVM (and benchmark!).

Actually I'm looking for a master thesis topic. Could we maybe talk about this? I would be interested in implementing this.

f-f · 2022-02-01T20:09:25Z

@rowanG077 I'm on the PureScript core team and I recently supervised another thesis on a PureScript backend.
I'm excited about HVM and I'd be available to provide pointers (on the PureScript side) to help get this going - feel free to reach out if you'd like to chat about this 🙂 (see my profile for contact details)

jmorag · 2022-02-01T21:08:56Z

My understanding of the HVM runtime is that it's lazy. How would strict languages like PureScript and Idris retain their semantics when compiling to HVM?

klarkc · 2022-02-01T21:14:14Z

@jmorag I don't know in details, but I know that purenix works in a scenario like that, in this section it explain the strictness difference impact on purenix usage.

DylanSp · 2022-02-02T02:05:47Z

C compiler

I just tried tinycc on the output from compiling bench/TreeSum/main.hvm; C compilation fails due to at least two issues:

lack of <stdatomic.h>
initializer element is not constant in the declaration of HEAP_SIZE (I'm not sure why this is)

There may be other issues, I didn't investigate further. I think these rule out embedding tinycc as a portable C compiler, though.

dumblob · 2022-02-02T08:56:44Z

C compiler

Thanks @DylanSp for trying tcc out!

I just tried tinycc on the output from compiling bench/TreeSum/main.hvm; C compilation fails due to at least two issues:

lack of <stdatomic.h>

This is a totally minor issue. See e.g. how V (which uses tcc by default - i.e. in non--prod mode) solves this: https://github.com/vlang/v/tree/master/thirdparty/stdatomic and https://github.com/vlang/v/tree/master/vlib/sync/stdatomic .

initializer element is not constant in the declaration of HEAP_SIZE (I'm not sure why this is)

Sounds like something minor to me.

There may be other issues, I didn't investigate further. I think these rule out embedding tinycc as a portable C compiler, though.

So I'm convinced that if multiplatform quick compilation is desired (IMHO it is, but I'm not the one to decide), tcc is the best way forward.

Heimdell · 2022-02-02T17:48:18Z

STG -> HVM might not be feasible, because, well, multiple reasons. The main problem is that GHC doesn't care about Dup-explosion prevention, well, until Linear Haskell is a thing - and even then.

VictorTaelin · 2022-02-02T18:02:28Z

@rowanG077 sorry for the delay! Yes, feel free to mail me ([email protected]) or DM me in any of my social networks. It would be an honor if you did your thesis related to HVM, and certainly there are many places where such a thing would add a lot of value to it.

ken-okabe · 2022-02-03T16:15:57Z

I forgot to mention in terms of Windows support #52

Nowadays, they no longer use Cygwin. They simply use a real Linux kernel on Windows that is called Windows Subsystem for Linux WSL2.
A full environment of a Linux distribution such as Ubuntu can be installed on WSL2, even GUI applications.
https://ubuntu.com/wsl
In the latest situation, they can install and launch the Ubuntu terminal with a few clicks.

So I guess we don't have to be concerned about this.

ken-okabe · 2022-02-09T21:27:30Z

Actually I'm looking for a master thesis topic. Could we maybe talk about this? I would be interested in implementing this.

FYI, this is an excellent discussion related to GHC you might be interested in. #74

ghost · 2022-02-11T02:18:35Z

I agree with @stken2050 that HVM is well suited as a general purpose programming language. It just needs really well designed primitives, and over time, a standard library.

samtoth · 2022-02-12T10:45:59Z

I32, I64, U64, F32, F64

Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.

I'd be interested in adding support for at least the unboxed numeric types as this seems like a low hanging fruit. I just wanted to check what the concensus was between implicit/explicit casting and how operators should function in regards to mixed input types. My instinct would just be to cast up to the 'bigger' type but maybe operators should only operate on the same type - combined with explicit casts.

@VictorTaelin what do you think?

VictorTaelin · 2022-02-13T01:15:26Z

Operators should just operate on the same type. So, if F32 is added, for example, ADD will work on either u32/u32 or f32/f32, but not anything else (i.e., no u64/u32, no f32/u32, etc.).

nothingnesses · 2022-02-13T07:28:36Z

Why not add a type conversion operator for primitive types too? Rust has it, and we could probably just easily use theirs under the hood?

dumblob · 2022-02-13T13:34:08Z

@nothingnesses I think explicit (i.e. written every single time by the programmer) conversion from one type to another will be needed one way or another. I think @VictorTaelin meant that there shouldn't be any implicit conversion as that always leads to some irregularities and confusion.

nothingnesses · 2022-02-13T14:17:04Z

Ah maybe. I don't think any conversion operators are currently implemented in the language right now though, which was why I brought it up.

VictorTaelin · 2022-02-13T17:11:00Z

Yes, that's what I meant. No implicit conversions, just explicit.

klarkc · 2022-02-22T22:37:32Z

I posted an idea on Cardano Catalyst for supporting a HVM backend for PureScript. If it receives a lot of votes, it may receive some sponsorship: https://cardano.ideascale.com/c/idea/396001

glyh · 2022-04-13T03:09:55Z

Would something like arrayfire solve the GPGPU problem?

rigille · 2022-04-14T15:00:47Z

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

I've been researching Roc and they have a nice idea for handling IO. The language doesn't handle IO at all, and instead delegates this to a platform which will provide IO capabilities. There are more details here and in an older talk.
@quleuber might be interested in this

01mf02 · 2022-04-26T07:12:41Z

Regarding allocators: I have used mimalloc to great success in a few language interpreters -- it has boosted performance considerably compared to the default allocator. It has Rust bindings as well. Even if you might not be able to use it as is, its general design might inspire you.

glyh · 2023-03-17T06:05:11Z

Is GPU target done given this is closed?

VictorTaelin · 2023-03-17T10:40:40Z

no, we're just not using this to track TODOs. We've raised funds to build a company around the HVM and will be doing many things, including a GPU target. We're hiring :)

VictorTaelin mentioned this issue Jan 31, 2022

Deadlock/livelock/etc. prevention #36

Closed

dumblob mentioned this issue Feb 1, 2022

Non-uniform architectures and systems gleisonsdm/DawnCC-Compiler#8

Closed

steinerkelvin pinned this issue Feb 2, 2022

steinerkelvin mentioned this issue Feb 2, 2022

Code crashes with segfault #51

Closed

steinerkelvin added the enhancement New feature or request label Feb 2, 2022

ken-okabe mentioned this issue Feb 3, 2022

Windows support #52

Closed

steinerkelvin mentioned this issue Feb 3, 2022

C code portability #62

Closed

4 tasks

steinerkelvin mentioned this issue Feb 12, 2022

More numeric types #81

Closed

6 tasks

steinerkelvin mentioned this issue May 4, 2022

Basic line-oriented IO loop #106

Closed

Z-snails mentioned this issue Jun 5, 2022

Ideas on a proper allocator #126

Closed

lawcho mentioned this issue Sep 14, 2022

Platform IO #148

Closed

9 tasks

VictorTaelin closed this as completed Nov 23, 2022

VictorTaelin unpinned this issue Dec 14, 2022

Improvements needed, and planned features #38

Improvements needed, and planned features #38

Comments

VictorTaelin commented Jan 31, 2022 • edited by steinerkelvin Loading

Better allocator

Better scheduler

I32, I64, U64, F32, F64

Improve the left-hand side flattener

A nice API to use as a Rust library

A JIT compiler

IO

Windows support

GPU compilers

Compile Kind to HVM

Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule

Tests

dumblob commented Jan 31, 2022

ken-okabe commented Jan 31, 2022 • edited Loading

A JIT compiler

Windows support

GPU compilers

Compile Kind to HVM

DylanSp commented Feb 1, 2022

Targeting GPUs

C compiler/portability

jmorag commented Feb 1, 2022

IO

JakobBruenker commented Feb 1, 2022

L-as commented Feb 1, 2022

rowanG077 commented Feb 1, 2022 • edited Loading

VictorTaelin commented Feb 1, 2022 • edited Loading

IO

klarkc commented Feb 1, 2022 • edited Loading

jmorag commented Feb 1, 2022

nothingnesses commented Feb 1, 2022

VictorTaelin commented Feb 1, 2022

ken-okabe commented Feb 1, 2022 • edited Loading

Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule

jmorag commented Feb 1, 2022 • edited Loading

rowanG077 commented Feb 1, 2022

f-f commented Feb 1, 2022

jmorag commented Feb 1, 2022 • edited Loading

klarkc commented Feb 1, 2022 • edited Loading

DylanSp commented Feb 2, 2022 • edited Loading

C compiler

dumblob commented Feb 2, 2022

C compiler

Heimdell commented Feb 2, 2022

VictorTaelin commented Feb 2, 2022

ken-okabe commented Feb 3, 2022

ken-okabe commented Feb 9, 2022

ghost commented Feb 11, 2022

samtoth commented Feb 12, 2022

I32, I64, U64, F32, F64

VictorTaelin commented Feb 13, 2022

nothingnesses commented Feb 13, 2022

dumblob commented Feb 13, 2022

nothingnesses commented Feb 13, 2022

VictorTaelin commented Feb 13, 2022

klarkc commented Feb 22, 2022 • edited Loading

glyh commented Apr 13, 2022

rigille commented Apr 14, 2022 • edited Loading

01mf02 commented Apr 26, 2022

glyh commented Mar 17, 2023

VictorTaelin commented Mar 17, 2023 • edited Loading

VictorTaelin commented Jan 31, 2022 •

edited by steinerkelvin

Loading

ken-okabe commented Jan 31, 2022 •

edited

Loading

rowanG077 commented Feb 1, 2022 •

edited

Loading

VictorTaelin commented Feb 1, 2022 •

edited

Loading

klarkc commented Feb 1, 2022 •

edited

Loading

ken-okabe commented Feb 1, 2022 •

edited

Loading

jmorag commented Feb 1, 2022 •

edited

Loading

jmorag commented Feb 1, 2022 •

edited

Loading

klarkc commented Feb 1, 2022 •

edited

Loading

DylanSp commented Feb 2, 2022 •

edited

Loading

klarkc commented Feb 22, 2022 •

edited

Loading

rigille commented Apr 14, 2022 •

edited

Loading

VictorTaelin commented Mar 17, 2023 •

edited

Loading