I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I've been trying out the Metal implementation on an M1 Mac, and main works fine, but I would also like to be able to get embeddings. Accelerating this with Metal would be fantastic for me.
I tried to understand what would need to change, but I'm not conversant enough with the code to figure it out. Happy to try to make the changes myself and submit a PR if that would be helpful.
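For what it's worth, once I have the embeddings back (the embedding binary prints a vector of floats), the downstream use I have in mind is plain similarity scoring. A minimal sketch of that, assuming the vectors come in as Python lists of floats (this helper is my own, not part of llama.cpp):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors given as lists of floats.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```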
Current Behavior
As far as I can tell, the embedding example does not use Metal. At least, GPU usage stays at 0% when I pass the -ngl 1 parameter.
I should also mention that using the llama-cpp-python wrapper to get embeddings also does not use the GPU, while 'normal' inference with the model does.
I haven't tested whether this is also the case with a CUDA backend, but I can if that would be useful information.
Environment and Context
I'm running on a 32 GB M1 MacBook Pro.
python = Python 3.10.10
make = GNU Make 3.81
cmake = cmake version 3.25.2
g++ = Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin22.5.0
Failure Information (for bugs)
I'm running ./bin/embedding -f abs -c 1024 -ngl 1 -m ./Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
The content of abs is the following abstract:
Long noncoding RNAs (lncRNAs) regulate gene expression via their RNA product or through transcriptional interference, yet a strategy to differentiate these two processes is lacking. To address this, we used multiple small interfering RNAs (siRNAs) to silence GNG12-AS1, a nuclear lncRNA transcribed in an antisense orientation to the tumour-suppressor DIRAS3. Here we show that while most siRNAs silence GNG12-AS1 post-transcriptionally, siRNA complementary to exon 1 of GNG12-AS1 suppresses its transcription by recruiting Argonaute 2 and inhibiting RNA polymerase II binding. Transcriptional, but not post-transcriptional, silencing of GNG12-AS1 causes concomitant upregulation of DIRAS3, indicating a function in transcriptional interference. This change in DIRAS3 expression is sufficient to impair cell cycle progression. In addition, the reduction in GNG12-AS1 transcripts alters MET signalling and cell migration, but these are independent of DIRAS3. Thus, differential siRNA targeting of a lncRNA allows dissection of the functions related to the process and products of its transcription.
Steps to Reproduce
build with cmake ../ -DLLAMA_METAL=ON -DBUILD_SHARED_LIBS=ON
(shared libs is to work around an issue with the python binding - hopefully not relevant to this)
run ./bin/embedding -f abs -c 1024 -ngl 1 -m ./Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
Failure Logs
Metal does appear to load, and I get embeddings, but there is no GPU usage.
llama.cpp: loading model from ./llms/guanaco-33B.bin
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '(null)'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=258 "The file name is invalid."
@jacobfriedman Do you have ggml-metal.metal in the bin directory (or, I guess, next to wherever you're running embedding from)? If I move it out, I get that same error, and I saw the same thing with the llama-cpp-python wrapper until I found abetlen/llama-cpp-python#317 (comment).
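For anyone hitting the same NSCocoaErrorDomain Code=258 error: the workaround amounts to making sure ggml-metal.metal sits next to the binary (or in the working directory it runs from). A hypothetical helper sketch of that copy step — ensure_metal_shader, repo_root, and bin_dir are my own names, not part of llama.cpp:

```python
import os
import shutil

def ensure_metal_shader(repo_root, bin_dir):
    # ggml_metal_init loads the Metal shader source at runtime; if it can't
    # find ggml-metal.metal, it reports loading '(null)' and fails with
    # NSCocoaErrorDomain Code=258 ("The file name is invalid.").
    src = os.path.join(repo_root, "ggml-metal.metal")
    dst = os.path.join(bin_dir, "ggml-metal.metal")
    if not os.path.exists(dst):
        shutil.copy(src, dst)
    return dst
```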