diff --git a/README.md b/README.md
index 176277a..e9dca9d 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,9 @@
 [![Gem Version](https://badge.fury.io/rb/tiktoken_ruby.svg)](https://badge.fury.io/rb/tiktoken_ruby)
+
 # tiktoken_ruby
 
 [Tiktoken](https://github.com/openai/tiktoken) is BPE tokenizer from OpenAI used with their GPT models.
-This is a wrapper around it aimed primarily at enabling accurate counts of GPT model tokens used.
+This is a wrapper around it aimed primarily at enabling accurate counts of GPT model tokens used.
 
 ## Request for maintainers
 
@@ -20,18 +21,19 @@ If bundler is not being used to manage dependencies, install the gem by executin
     $ gem install tiktoken_ruby
 
 ## Usage
+
 Usage should be very similar to the python library. Here's a simple example
 
 Encode and decode text
+
 ```ruby
 require 'tiktoken_ruby'
-
-# note: retrieving an encoding is not currently thread safe until https://github.com/IAPark/tiktoken_ruby/pull/30 is merged
 enc = Tiktoken.get_encoding("cl100k_base")
 enc.decode(enc.encode("hello world")) #=> "hello world"
 ```
 
 Encoders can also be retrieved by model name
+
 ```ruby
 require 'tiktoken_ruby'
 
@@ -59,7 +61,6 @@ bundle exec rake compile
 bundle exec rake spec
 ```
 
-
 ## License
 
 The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
diff --git a/lib/tiktoken_ruby/encoding.rb b/lib/tiktoken_ruby/encoding.rb
index 4bee10d..21cd70d 100644
--- a/lib/tiktoken_ruby/encoding.rb
+++ b/lib/tiktoken_ruby/encoding.rb
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 class Tiktoken::Encoding
+  CACHE_MUTEX = Mutex.new
+
   attr_reader :name
 
   # This returns a new Tiktoken::Encoding instance for the requested encoding
@@ -15,8 +17,10 @@ def self.for_name(encoding)
   # @param encoding [Symbol] The name of the encoding to load
   # @return [Tiktoken::Encoding] The encoding instance
   def self.for_name_cached(encoding)
-    @encodings ||= {}
-    @encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding)
+    CACHE_MUTEX.synchronize do
+      @encodings ||= {}
+      @encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding)
+    end
   end
 
   # Encodes the text as a list of integer tokens. This encoding will encode special non text tokens
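
The locking change in `for_name_cached` can be exercised in isolation. The sketch below is not part of the patch: `FakeEncoding`, its `load_count` counter, and the `sleep` call are hypothetical stand-ins so the example runs without the native extension. It only illustrates why wrapping the check-then-set in `CACHE_MUTEX.synchronize` should let concurrent callers share one instance.

```ruby
# Hypothetical stand-in for Tiktoken::Encoding; it does not load a real tokenizer.
class FakeEncoding
  CACHE_MUTEX = Mutex.new

  class << self
    # Counts how many times the uncached loader ran (illustration only).
    attr_reader :load_count
  end
  @load_count = 0

  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Simulates the expensive, uncached constructor.
  def self.for_name(encoding)
    sleep 0.01 # widens the window in which an unsynchronized ||= could race
    @load_count += 1
    new(encoding.to_sym)
  end

  # Same shape as the patched method: the whole lookup-or-load runs under the lock.
  def self.for_name_cached(encoding)
    CACHE_MUTEX.synchronize do
      @encodings ||= {}
      @encodings[encoding.to_sym] ||= for_name(encoding)
    end
  end
end

# Eight threads request the same encoding at once.
threads = 8.times.map { Thread.new { FakeEncoding.for_name_cached("cl100k_base") } }
results = threads.map(&:value)

puts results.map(&:object_id).uniq.size # 1: every thread received the same instance
puts FakeEncoding.load_count            # 1: only the first caller ran the loader
```

One trade-off of this design is that a single mutex serializes every cache lookup, including hits. That is likely acceptable here because a hit is only a hash read, while a miss is the slow path that must not run twice.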