Use the DirectML and CPU NuGet packages together in a Windows App #425
-
Hi @AshD, that should be possible if you use the DirectML package. If you already tried it and ran into issues, please let us know what the issues were.
-
When I try to run the Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4 model with the DirectML package, I get this error in generator.ComputeLogits(). Are you saying that I can use the DirectML package with the DirectML Phi-3 model on a PC without a GPU and it will work? I had tried that with some DirectML Stable Diffusion ONNX models that I created; they did not throw an error, but returned a brown image on a PC without a GPU. I was told by the Microsoft Olive team that ONNX models optimized for DirectML could not be used on a PC without a DirectML GPU. Thanks,
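(For context, here is a minimal sketch of the generation loop in which generator.ComputeLogits() is called, assuming the early Microsoft.ML.OnnxRuntimeGenAI C# API that still exposes ComputeLogits(); the model path, prompt, and max_length below are illustrative placeholders, not taken from the report above.)

```csharp
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

// Illustrative local model folder; substitute your own path.
var modelPath = @"Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4";

using var model = new Model(modelPath);
using var tokenizer = new Tokenizer(model);

// Example Phi-3 chat-style prompt.
var sequences = tokenizer.Encode("<|user|>\nHello<|end|>\n<|assistant|>\n");

using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 2048);   // placeholder value
generatorParams.SetInputSequences(sequences);

using var generator = new Generator(model, generatorParams);
while (!generator.IsDone())
{
    generator.ComputeLogits();      // the call reported to throw in the scenario above
    generator.GenerateNextToken();
}

Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
```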
-
Hey @AshD, have you tried this with the latest RC?
-
This works for smaller context sizes but fails for larger ones (64K context). I think you are aware of this bug, so I will close this one.
-
Background: Fusion Quill is a Windows AI word processor and chat app on the Microsoft Store. It currently uses llama.cpp to support multiple AI models and switches between CUDA, ROCm, and CPU llama.cpp DLLs depending on the end user's PC capabilities.
How do I switch between the DirectML and CPU GenAI packages at runtime? If the user has a GPU, I want to use the Microsoft.ML.OnnxRuntimeGenAI.DirectML package with the corresponding DirectML model, and if the user does not have a GPU, I want to use the Microsoft.ML.OnnxRuntimeGenAI package with the CPU version of the model. A rough sketch of the runtime selection I have in mind is shown below.
Thanks,
Ash
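(One possible shape for the runtime switch, sketched under two assumptions: per the first reply above, the DirectML GenAI package is used for both paths and only the model folder changes, and GPU presence is detected with a rough WMI query rather than a proper DXGI/D3D12 feature check. The folder names and the HasUsableGpu helper are hypothetical.)

```csharp
using System;
using System.Linq;
using System.Management;              // WMI; requires a reference to System.Management
using Microsoft.ML.OnnxRuntimeGenAI;

static bool HasUsableGpu()
{
    // Rough heuristic: any display adapter other than the basic software renderer.
    // A production check might query D3D12/DXGI adapter capabilities instead.
    using var searcher = new ManagementObjectSearcher("SELECT Name FROM Win32_VideoController");
    return searcher.Get().Cast<ManagementObject>()
        .Any(mo => !((string)mo["Name"]).Contains("Microsoft Basic", StringComparison.OrdinalIgnoreCase));
}

// Hypothetical folders: a DirectML-optimized model and a CPU int4 model.
var modelPath = HasUsableGpu()
    ? @"models\phi-3-mini-128k-directml"
    : @"models\phi-3-mini-128k-cpu-int4";

// Same Microsoft.ML.OnnxRuntimeGenAI API either way; only the model folder differs.
using var model = new Model(modelPath);
```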