
Many things to improve or fix #269

Open
darkanubis0100 opened this issue Oct 2, 2024 · 4 comments

Comments

@darkanubis0100

  • 3,000 and 1 bugs...

  • Using CLANG is impossible because it immediately surfaces an endless stream of bugs. Which version is this supposed to work on without bugs? I'm using 18.1.3 on Ubuntu 24.04, and it's impossible to even use EXO without a GPU because of CLANG bugs (the same ones that have been reported many times in the repo's issues).

  • As if CLANG not working weren't enough, CUDA is even worse. Does it not support WSL, or do I have to run EXO on a physical Linux machine? CUDA fails with a strange error, and if I disable bfloat16 the error becomes a “Segmentation Fault” right after it asks me to install “llvmlite”.

  • Detailed instructions and requirements in the repository? They don't exist. I have to work magic to figure out which programs or modules are missing, and it still fails. Where is the complete list of dependencies? The docs don't even mention that “build-essential” is required.

  • How am I supposed to run it on Android? I tried using Termux with an Ubuntu environment and ran into several errors, most notably with Tailscale.
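For reference, a minimal setup sketch pulled together from the problems above, assuming Ubuntu 24.04. The only packages actually named in this thread are build-essential, clang, and llvmlite; everything else here is a guess, not an official dependency list:

```shell
# Hypothetical Ubuntu 24.04 setup sketch -- assembled from this thread,
# NOT an official exo dependency list.
sudo apt update
sudo apt install -y build-essential clang python3-dev python3-pip

# "llvmlite" is the module the CUDA path reportedly asked for:
pip install llvmlite
```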

@AlexCheema
Contributor

AlexCheema commented Oct 2, 2024

Thanks for being patient with us. exo is still highly experimental and there are indeed a lot of bugs that need to be fixed.

I think the general point here is the tinygrad inference engine isn't stable enough (no fault of tinygrad, it's a new library that we've integrated into a highly experimental project). We have other inference engines coming (PyTorch and llama.cpp). PyTorch is almost ready to merge: #139. I'm hoping since PyTorch is more mature and people are more familiar with it, it will be a lot more stable.

I will prioritise better instructions and requirements. The idea is that there shouldn't need to be much since it should "just work".

I've run on Android successfully with termux before we introduced tailscale. Is the tailscale dependency breaking it now?

@darkanubis0100
Author


I understand the situation. PyTorch is certainly more than welcome, but I don't understand why I can't get it working in WSL. Does it require something from dbus?
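As a quick sanity check under WSL, one could first confirm that the Windows-side CUDA driver is actually visible inside the distro (a sketch; the library path assumes a standard WSL2 CUDA passthrough setup):

```shell
# Check the Windows CUDA driver is reachable from inside WSL2.
nvidia-smi

# On a standard WSL2 install, the driver libraries are mounted here:
ls -l /usr/lib/wsl/lib/libcuda.so*
```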

In the case of Android, that's exactly it: Tailscale. It seems the Tailscale Python package doesn't exist for ARM, which is why I can't install it.
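A sketch of how a project could detect this situation and skip Tailscale-based discovery on ARM. The function name and the fallback behaviour are hypothetical, not exo's actual code:

```python
import platform

# Hypothetical helper: decide whether Tailscale-based discovery is usable.
# Per this thread, the Tailscale Python package has no ARM build, so skip it there.
ARM_ARCHES = ("aarch64", "arm64", "armv7l", "armv8l")

def use_tailscale_discovery(arch=None):
    """Return True unless we're on an ARM machine (e.g. Termux on Android)."""
    arch = (arch or platform.machine()).lower()
    return arch not in ARM_ARCHES

# Example: fall back to another discovery method on ARM.
if not use_tailscale_discovery():
    print("ARM detected: falling back to a non-Tailscale discovery method")
```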

@fullofcaffeine

fullofcaffeine commented Nov 22, 2024

+1

I really appreciate what you folks are doing, but it's been almost impossible to run Exo on a cluster of 3 Linux systems (NVIDIA/CUDA) + a Mac M2 on tinygrad. All sorts of exceptions, network errors, OOM errors, etc. -- it's very brittle. I managed to run it a few times, but one of the nodes always ends up crashing somehow. It's hard to debug and track (I've created a few issues about some of the bugs I've found).

I have fallen back to just running Llama 3.1 8B locally (via Ollama) on my M2 for now, which ended up being much more stable than the Exo cluster (albeit a bit slower, fewer tokens/s).

I'll wait a few months and try again, as at the moment I have neither the time nor the know-how to help (other than testing it out on my systems). I don't mean to discourage your work at all; I'm still very excited about Exo!

@AlexCheema
Contributor


Appreciate the feedback. Our approach has been depth-first: making Mac support as good as it can be before focusing on Linux again. Hopefully we'll be able to delight you in a few months' time once Linux support is more stable and mature.
