
Can I use SentencepieceTokenizer in C#? #468

Closed · tylike opened this issue Jun 9, 2023 · 5 comments

tylike commented Jun 9, 2023

Hi!

I have an LLM in ONNX format and a sentencepiece.model, and I used Hugging Face and SentencePiece together in Python. Now I plan to do inference in C# with ONNX Runtime, but I haven't found a suitable C# version of the SentencePiece library. I saw that there is a SentencepieceTokenizer here. Can I use SentencepieceTokenizer in C#?

My files were downloaded from here: https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8/tree/main/chatglm-6b-int8-onnx-merged. Thank you.

wenbingl (Member) commented

Yes, the NuGet package can be found here: https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Extensions/0.8.0
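
For example, it can be added from the .NET CLI (version pinned to match the link above):

dotnet add package Microsoft.ML.OnnxRuntime.Extensions --version 0.8.0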

tylike (Author) commented Jul 18, 2023

I only saw the C# demo code for registering the extension.
In Python, when using the LLM for inference I first need the tokenizer to turn the user's input text into IDs before doing the subsequent processing.
But I didn't find a class in this extension library that wraps this tokenizer for C#; did I misunderstand it?
Can you give some examples?

For example, I defined it like this in Python:
from sentencepiece import SentencePieceProcessor
sp_model = SentencePieceProcessor(model_file=model_path)
ids = sp_model.encode(s)

wenbingl (Member) commented

@sayanshaw24, can you add the SPM tokenizer to our C# example?

nshoman commented Sep 7, 2023

Perhaps this issue can be closed? I was looking for something similar on the decoder end and was able to develop what I think @tylike wanted.

I should note beforehand that this was done using v0.8.0; the function build_my_graph was renamed at some point to build_graph.

Here's a working solution I came up with:

## building the model
from onnxruntime_extensions._ortapi2 import make_onnx_model
from onnxruntime_extensions._cuops import SingleOpGraph
import onnx

# wrap the sentencepiece model file in a single-op ONNX graph
# (in v0.8.0 this method was still called build_my_graph)
kwargs = {'model': open('path/to/model', 'rb').read()}
graph = SingleOpGraph.build_graph('SentencepieceTokenizer', **kwargs)
model = make_onnx_model(graph)
onnx.save_model(model, '/outputpath/model.onnx')

## inference
import onnxruntime as _ort
from onnxruntime_extensions import get_library_path as _lib_path
import numpy as np

so = _ort.SessionOptions()
so.register_custom_ops_library(_lib_path())
sess = _ort.InferenceSession('/outputpath/model.onnx', so)

alpha = 0
nbest_size = 0
flags = 0

inp_dict = {'inputs': np.array(['your text here']),
            'nbest_size': np.array([nbest_size], dtype=np.int64),
            'alpha': np.array([alpha], dtype=np.float32),
            'add_bos': np.array([flags & 1], dtype=np.bool_),
            'add_eos': np.array([flags & 2], dtype=np.bool_),
            'reverse': np.array([flags & 4], dtype=np.bool_)}

outs = sess.run(None, input_feed=inp_dict)
token_array = outs[0]

While this isn't C#, hopefully it illustrates how to perform inference with the ONNX tokenizer; it should be relatively straightforward to port from the Python code.

Just make sure to load the extensions library when performing inference in C#:

SessionOptions options = new SessionOptions();
options.RegisterOrtExtensions();
var session = new InferenceSession(model, options);
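
Putting the pieces together, here is a minimal C# sketch of the same tokenizer call (a sketch, not an official sample: it assumes the model exported above sits at "model.onnx", that its input names match the Python inp_dict, and that the first output is the int32 token-id tensor; adjust names and types to your model):

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var options = new SessionOptions();
options.RegisterOrtExtensions();
using var session = new InferenceSession("model.onnx", options);

// Same feed as the Python inp_dict above; every input is a rank-1 tensor.
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("inputs",
        new DenseTensor<string>(new[] { "your text here" }, new[] { 1 })),
    NamedOnnxValue.CreateFromTensor("nbest_size",
        new DenseTensor<long>(new long[] { 0L }, new[] { 1 })),
    NamedOnnxValue.CreateFromTensor("alpha",
        new DenseTensor<float>(new[] { 0f }, new[] { 1 })),
    NamedOnnxValue.CreateFromTensor("add_bos",
        new DenseTensor<bool>(new[] { false }, new[] { 1 })),
    NamedOnnxValue.CreateFromTensor("add_eos",
        new DenseTensor<bool>(new[] { false }, new[] { 1 })),
    NamedOnnxValue.CreateFromTensor("reverse",
        new DenseTensor<bool>(new[] { false }, new[] { 1 })),
};

using var results = session.Run(inputs);
// First output is assumed to hold the token ids (int32); adjust if your model differs.
var ids = results.First().AsEnumerable<int>().ToArray();
Console.WriteLine(string.Join(", ", ids));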

This test helped me quite a bit:

ofunc = OrtPyFunction.from_customop('SentencepieceDecoder', model=open(fullname, 'rb').read())

GeorgeS2019 commented

@wenbingl

Where is the folder for the C# examples? I see a java folder under the repository root, but no trace of a CSharp folder.

tylike closed this as completed May 20, 2024