⚡ Faster string generator (#213)
* Always encode ULIDs

* Implement `ULID.gen` for String optimized generator
* Removed lazy calculation for `ULID#to_s`
* Prefer `ULID#encode` rather than `to_s` in implementation

* Update documents with renaming to `ULID.encode`

* Fix YARD

* Add stackprof for the profiling

* Add stackprof for the profiling

* Make 2x faster with `String#tr`
kachick authored Jul 17, 2022
1 parent 73b628d commit 993d903
Showing 17 changed files with 187 additions and 48 deletions.
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -8,7 +8,8 @@
/spec/examples.txt
/test/tmp/
/test/version_tmp/
/tmp/
/tmp/*
!/tmp/.keep

# Used by dotenv library to load environment variables.
# .env
1 change: 1 addition & 0 deletions Gemfile
@@ -15,6 +15,7 @@ group(:development) do
gem('rbs', '~> 2.6.0', require: false)
gem('steep', '~> 1.0.1', require: false)
gem('benchmark-ips', '~> 2.10.0', require: false)
gem('stackprof')
gem('yard', '~> 0.9.28', require: false)
gem('rubocop', '~> 1.31.1', '!= 1.31.2', require: false)
gem('rubocop-rake', '~> 0.6.0', require: false)
40 changes: 30 additions & 10 deletions README.md
@@ -61,37 +61,51 @@ ULID::VERSION
# => "0.5.0"
```

### Basic Generator
### Generator and Parser

The generated `ULID` is an object, not just a string.
`ULID.generate` returns a `ULID` instance, not just a string.

```ruby
ulid = ULID.generate #=> ULID(2021-04-27 17:27:22.826 UTC: 01F4A5Y1YAQCYAYCTC7GRMJ9AA)
```

### Parser

Get the objects from existing encoded ULIDs.
`ULID.parse` returns a `ULID` instance from an existing encoded ULID.

```ruby
ulid = ULID.parse('01ARZ3NDEKTSV4RRFFQ69G5FAV') #=> ULID(2016-07-30 23:54:10.259 UTC: 01ARZ3NDEKTSV4RRFFQ69G5FAV)
ulid = ULID.parse('01F4A5Y1YAQCYAYCTC7GRMJ9AA') #=> ULID(2021-04-27 17:27:22.826 UTC: 01F4A5Y1YAQCYAYCTC7GRMJ9AA)
```

### ULID object

Extract timestamps and binary formats.
It can extract timestamps and binary formats.

```ruby
ulid = ULID.parse('01F4A5Y1YAQCYAYCTC7GRMJ9AA') #=> ULID(2021-04-27 17:27:22.826 UTC: 01F4A5Y1YAQCYAYCTC7GRMJ9AA)
ulid.to_time #=> 2021-04-27 17:27:22.826 UTC
ulid.milliseconds #=> 1619544442826
ulid.encode #=> "01F4A5Y1YAQCYAYCTC7GRMJ9AA"
ulid.to_s #=> "01F4A5Y1YAQCYAYCTC7GRMJ9AA"
ulid.timestamp #=> "01F4A5Y1YA"
ulid.randomness #=> "QCYAYCTC7GRMJ9AA"
ulid.to_i #=> 1957909092946624190749577070267409738
ulid.octets #=> [1, 121, 20, 95, 7, 202, 187, 60, 175, 51, 76, 60, 49, 73, 37, 74]
```

`ULID.generate` can take a fixed `Time` instance. `ULID.at` is the shorthand.

```ruby
time = Time.at(946684800).utc #=> 2000-01-01 00:00:00 UTC
ULID.generate(moment: time) #=> ULID(2000-01-01 00:00:00.000 UTC: 00VHNCZB00N018DCPJA4H9379P)
ULID.generate(moment: time) #=> ULID(2000-01-01 00:00:00.000 UTC: 00VHNCZB006WQT3JTMN0T14EBP)
ULID.at(time) #=> ULID(2000-01-01 00:00:00.000 UTC: 00VHNCZB002W5BGWWKN76N22H6)
```

`ULID.encode` can also be used if you only need the ID string.
It returns a [normalized](#variants-of-format) String without creating a `ULID` object (for now, this is not significantly faster).
It accepts the same arguments as `ULID.generate`.

```ruby
ULID.encode #=> "01G86M42Q6SJ9XQM2ZRM6JRDSF"
ULID.encode(moment: Time.at(946684800).utc) #=> "00VHNCZB00SYG7RCEXZC9DA4E1"
```
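For illustration, a minimal self-contained sketch of what a string-only generator does: pad the millisecond timestamp to 10 base-32 digits and the entropy to 16, then translate Ruby's base-32 digits into Crockford's alphabet. `encode_ulid_string` is a hypothetical name, and the 80-bit entropy limit and the `Time`-to-milliseconds conversion are assumptions, not the gem's exact code.

```ruby
# frozen_string_literal: true

require 'securerandom'

# Assumed alphabets: Integer#to_s(32) digits and Crockford's Base32.
N32_DIGITS       = '0123456789abcdefghijklmnopqrstuv'
CROCKFORD_BASE32 = '0123456789ABCDEFGHJKMNPQRSTVWXYZ'
MAX_ENTROPY      = (1 << 80) - 1 # assumed 80-bit randomness, per the ULID spec

# Hypothetical stand-in for ULID.encode: builds the 26-character string
# directly, without creating a ULID object. Not the gem's implementation.
def encode_ulid_string(moment: Time.now, entropy: SecureRandom.random_number(MAX_ENTROPY))
  milliseconds = moment.is_a?(Time) ? (moment.to_r * 1000).to_i : moment
  n32 = milliseconds.to_s(32).rjust(10, '0') + entropy.to_s(32).rjust(16, '0')
  n32.tr(N32_DIGITS, CROCKFORD_BASE32).freeze
end

puts encode_ulid_string(moment: Time.at(946_684_800).utc) # starts with "00VHNCZB00"
```

The timestamp prefix for 2000-01-01 matches the `ULID.encode(moment: ...)` example above; the last 16 characters vary with the random entropy.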

### Sortable with the timestamp

ULIDs are sortable when they are generated at different timestamps, with millisecond precision.
@@ -103,6 +117,12 @@ ulids = 1000.times.map do
end
ulids.uniq(&:to_time).size #=> 1000
ulids.sort == ulids #=> true

time = Time.at(946684800).utc #=> 2000-01-01 00:00:00 UTC
ulids = 1000.times.map do |n|
ULID.at(time + n)
end
ulids.sort == ulids #=> true
```

`ULID.generate` can take a fixed `Time` instance. The shorthand is `ULID.at`.
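A self-contained sketch of why this ordering holds: the timestamp is zero-padded to a fixed 10 characters, and the Crockford alphabet is in ascending ASCII order, so comparing the strings compares the numbers. `encode_timestamp` is a hypothetical helper, not part of the gem.

```ruby
# frozen_string_literal: true

N32_DIGITS       = '0123456789abcdefghijklmnopqrstuv' # Integer#to_s(32) digits
CROCKFORD_BASE32 = '0123456789ABCDEFGHJKMNPQRSTVWXYZ' # ascending ASCII order

# Hypothetical helper: milliseconds -> 10-character ULID timestamp.
def encode_timestamp(milliseconds)
  milliseconds.to_s(32).rjust(10, '0').tr(N32_DIGITS, CROCKFORD_BASE32)
end

base = 946_684_800_000 # 2000-01-01 00:00:00 UTC in milliseconds
milliseconds = Array.new(1000) { |n| base + (n * 1000) }.shuffle
encoded = milliseconds.map { |ms| encode_timestamp(ms) }

# Sorting the strings yields the same order as sorting the integers.
p encoded.sort == milliseconds.sort.map { |ms| encode_timestamp(ms) } # => true

# The fixed width matters: unpadded, 32 ("10") would sort before 31 ("Z").
p [31, 32].map { |n| n.to_s(32).tr(N32_DIGITS, CROCKFORD_BASE32) }.sort # => ["10", "Z"]
p [31, 32].map { |n| encode_timestamp(n) }.sort # => ["000000000Z", "0000000010"]
```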
6 changes: 6 additions & 0 deletions Rakefile
@@ -106,6 +106,12 @@ task(:benchmark_with_other_gems) do
end
end

task(:stackprof) do
sh('bundle exec ruby ./scripts/prof.rb')
sh('bundle exec stackprof tmp/stackprof-wall-*.dump --text --limit 1')
sh('bundle exec stackprof tmp/stackprof-cpu-*.dump --text --limit 1')
end

desc('Generate many sample data for snapshot tests')
task(:update_fixed_examples) do
sh('rm ./test/many_data/fixtures/dumped_fixed_examples_*.bin')
4 changes: 2 additions & 2 deletions benchmark/compare_with_othergems/kachick/generate.rb
@@ -9,8 +9,8 @@
products = []

Benchmark.ips do |x|
x.report('ULID.generate.to_s') do
products << ULID.generate.to_s
x.report('ULID.encode') do
products << ULID.encode
end
end

11 changes: 11 additions & 0 deletions benchmark/generate_vs_encode.rb
@@ -0,0 +1,11 @@
# coding: utf-8
# frozen_string_literal: true

require('benchmark/ips')
require_relative('../lib/ulid')

Benchmark.ips do |x|
x.report('ULID.generate.to_s') { ULID.generate.to_s }
x.report('ULID.encode') { ULID.encode }
x.compare!
end
1 change: 1 addition & 0 deletions benchmark/generators.rb
@@ -13,6 +13,7 @@

Benchmark.ips do |x|
x.report('ULID.generate') { ULID.generate }
x.report('ULID.encode') { ULID.encode }
x.report('ULID::MonotonicGenerator#generate') { monotonic_generator.generate }
x.report('ULID.parse') { ULID.parse(encoded) }
x.report('ULID.from_integer') { ULID.from_integer(fixed_integer) }
60 changes: 42 additions & 18 deletions lib/ulid.rb
@@ -60,6 +60,14 @@ def self.generate(moment: current_milliseconds, entropy: reasonable_entropy)
from_milliseconds_and_entropy(milliseconds: milliseconds_from_moment(moment), entropy: entropy)
end

# @param [Integer, Time] moment
# @param [Integer] entropy
# @return [String]
def self.encode(moment: current_milliseconds, entropy: reasonable_entropy)
n32_encoded = encode_n32(milliseconds: milliseconds_from_moment(moment), entropy: entropy)
CrockfordBase32.from_n32(n32_encoded).upcase.freeze
end

# Shorthand for `ULID.generate(moment: time)`
# @param [Time] time
# @return [ULID]
@@ -166,7 +174,12 @@ def self.from_integer(integer)
milliseconds = n32encoded_timestamp.to_i(32)
entropy = n32encoded_randomness.to_i(32)

new(milliseconds: milliseconds, entropy: entropy, integer: integer)
new(
milliseconds: milliseconds,
entropy: entropy,
integer: integer,
encoded: CrockfordBase32.from_n32("#{n32encoded_timestamp}#{n32encoded_randomness}").freeze
)
end

# @param [Range<Time>, Range<nil>, Range[ULID]] period
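In `from_integer`, the base-32 string splits into 10 timestamp digits and 16 randomness digits. Because 16 base-32 digits hold exactly 80 bits, that split is the same as shifting and masking the integer. A sketch with a hypothetical `split_ulid_integer`, assuming the base-32 form is zero-padded to 26 digits before slicing (that part of `from_integer` is not shown in this hunk):

```ruby
# frozen_string_literal: true

# Hypothetical sketch of the decomposition from_integer performs.
def split_ulid_integer(integer)
  n32 = integer.to_s(32).rjust(26, '0') # assumed padding to 26 digits
  n32encoded_timestamp = n32.slice(0, 10)
  n32encoded_randomness = n32.slice(10, 16)
  [n32encoded_timestamp.to_i(32), n32encoded_randomness.to_i(32)]
end

milliseconds = 1_619_544_442_826
entropy = 123_456_789
integer = (milliseconds << 80) | entropy

# Slicing the string agrees with bit operations on the integer.
p split_ulid_integer(integer) == [integer >> 80, integer & ((1 << 80) - 1)] # => true
```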
@@ -254,6 +267,17 @@ def self.milliseconds_from_moment(moment)
SecureRandom.random_number(MAX_ENTROPY)
end

private_class_method def self.encode_n32(milliseconds:, entropy:)
raise(ArgumentError, 'milliseconds and entropy should be an `Integer`') unless Integer === milliseconds && Integer === entropy
raise(OverflowError, "timestamp overflow: given #{milliseconds}, max: #{MAX_MILLISECONDS}") unless milliseconds <= MAX_MILLISECONDS
raise(OverflowError, "entropy overflow: given #{entropy}, max: #{MAX_ENTROPY}") unless entropy <= MAX_ENTROPY
raise(ArgumentError, 'milliseconds and entropy should not be negative') if milliseconds.negative? || entropy.negative?

n32encoded_timestamp = milliseconds.to_s(32).rjust(TIMESTAMP_ENCODED_LENGTH, '0')
n32encoded_randomness = entropy.to_s(32).rjust(RANDOMNESS_ENCODED_LENGTH, '0')
"#{n32encoded_timestamp}#{n32encoded_randomness}"
end

# @param [String, #to_str] string
# @return [ULID]
# @raise [ParserError] if the given format is not correct for ULID specs
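`encode_n32` now holds the checks that used to live in `from_milliseconds_and_entropy`, so `ULID.encode` and the object constructor share one validation path. A standalone sketch exercising those boundaries; the limit values (from the ULID spec's 48-bit timestamp and 80-bit randomness) and the `OverflowError` class are stand-ins for the gem's own constants. The `Integer` check runs first, so the later comparisons never see a non-numeric argument.

```ruby
# frozen_string_literal: true

# Assumed limits from the ULID spec: 48-bit timestamp, 80-bit randomness.
MAX_MILLISECONDS = (1 << 48) - 1
MAX_ENTROPY      = (1 << 80) - 1

class OverflowError < StandardError; end # stand-in for the gem's OverflowError

# Standalone copy of the checks plus the zero-padded base-32 concatenation.
def encode_n32(milliseconds:, entropy:)
  unless Integer === milliseconds && Integer === entropy
    raise(ArgumentError, 'milliseconds and entropy should be an `Integer`')
  end
  raise(OverflowError, "timestamp overflow: given #{milliseconds}, max: #{MAX_MILLISECONDS}") unless milliseconds <= MAX_MILLISECONDS
  raise(OverflowError, "entropy overflow: given #{entropy}, max: #{MAX_ENTROPY}") unless entropy <= MAX_ENTROPY
  raise(ArgumentError, 'milliseconds and entropy should not be negative') if milliseconds.negative? || entropy.negative?

  # Lowercase Integer#to_s(32) digits: 10 for the timestamp, 16 for the entropy.
  "#{milliseconds.to_s(32).rjust(10, '0')}#{entropy.to_s(32).rjust(16, '0')}"
end

begin
  encode_n32(milliseconds: MAX_MILLISECONDS + 1, entropy: 0)
rescue OverflowError => e
  puts e.message # => timestamp overflow: given 281474976710656, max: 281474976710655
end
```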
@@ -378,16 +402,13 @@ def self.try_convert(object)
# @raise [OverflowError] if the given value is larger than the ULID limit
# @raise [ArgumentError] if the given milliseconds and/or entropy is negative number
def self.from_milliseconds_and_entropy(milliseconds:, entropy:)
raise(ArgumentError, 'milliseconds and entropy should be an `Integer`') unless Integer === milliseconds && Integer === entropy
raise(OverflowError, "timestamp overflow: given #{milliseconds}, max: #{MAX_MILLISECONDS}") unless milliseconds <= MAX_MILLISECONDS
raise(OverflowError, "entropy overflow: given #{entropy}, max: #{MAX_ENTROPY}") unless entropy <= MAX_ENTROPY
raise(ArgumentError, 'milliseconds and entropy should not be negative') if milliseconds.negative? || entropy.negative?

n32encoded_timestamp = milliseconds.to_s(32).rjust(TIMESTAMP_ENCODED_LENGTH, '0')
n32encoded_randomness = entropy.to_s(32).rjust(RANDOMNESS_ENCODED_LENGTH, '0')
integer = (n32encoded_timestamp + n32encoded_randomness).to_i(32)

new(milliseconds: milliseconds, entropy: entropy, integer: integer)
n32_encoded = encode_n32(milliseconds: milliseconds, entropy: entropy)
new(
milliseconds: milliseconds,
entropy: entropy,
integer: n32_encoded.to_i(32),
encoded: CrockfordBase32.from_n32(n32_encoded).upcase.freeze
)
end

# @dynamic milliseconds, entropy
@@ -397,18 +418,21 @@ def self.from_milliseconds_and_entropy(milliseconds:, entropy:)
# @param [Integer] milliseconds
# @param [Integer] entropy
# @param [Integer] integer
# @param [String] encoded
# @return [void]
def initialize(milliseconds:, entropy:, integer:)
def initialize(milliseconds:, entropy:, integer:, encoded:)
# All arguments check should be done with each constructors, not here
@integer = integer
@encoded = encoded
@milliseconds = milliseconds
@entropy = entropy
end

# @return [String]
def to_s
@string ||= CrockfordBase32.encode(@integer).freeze
def encode
@encoded
end
alias_method(:to_s, :encode)

# @return [Integer]
def to_i
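With `@encoded` computed once in the constructor, `encode` becomes a plain reader and `to_s` is an alias of it, so both return the same object and the lazy `@string ||=` memoization is gone. A hypothetical minimal class (`EagerId`, not part of the gem) showing the pattern with the README example values:

```ruby
# frozen_string_literal: true

# Hypothetical value object mirroring the eager-encoding pattern.
class EagerId
  attr_reader :integer

  def initialize(integer:, encoded:)
    @integer = integer
    @encoded = -encoded # String#-@ returns a frozen (deduplicated) string
  end

  # A plain reader: the string was prepared once, at construction time.
  def encode
    @encoded
  end
  alias_method(:to_s, :encode)
end

id = EagerId.new(
  integer: 1_957_909_092_946_624_190_749_577_070_267_409_738,
  encoded: '01F4A5Y1YAQCYAYCTC7GRMJ9AA'
)
p id.to_s.equal?(id.encode) # => true: both return the same String object
p id.to_s.frozen?           # => true
```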
@@ -427,7 +451,7 @@ def <=>(other)

# @return [String]
def inspect
@inspect ||= "ULID(#{to_time.strftime(TIME_FORMAT_IN_INSPECT)}: #{to_s})".freeze
@inspect ||= "ULID(#{to_time.strftime(TIME_FORMAT_IN_INSPECT)}: #{@encoded})".freeze
end

# @return [Boolean]
@@ -486,12 +510,12 @@ def randomness_octets

# @return [String]
def timestamp
@timestamp ||= (to_s.slice(0, TIMESTAMP_ENCODED_LENGTH).freeze || raise(UnexpectedError))
@timestamp ||= (@encoded.slice(0, TIMESTAMP_ENCODED_LENGTH).freeze || raise(UnexpectedError))
end

# @return [String]
def randomness
@randomness ||= (to_s.slice(TIMESTAMP_ENCODED_LENGTH, RANDOMNESS_ENCODED_LENGTH).freeze || raise(UnexpectedError))
@randomness ||= (@encoded.slice(TIMESTAMP_ENCODED_LENGTH, RANDOMNESS_ENCODED_LENGTH).freeze || raise(UnexpectedError))
end

# @note Provided for rough operations. The keys and values are not fixed.
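`timestamp` and `randomness` now slice `@encoded` directly instead of calling `to_s`. The split is fixed-width; a sketch on the README example, assuming `TIMESTAMP_ENCODED_LENGTH` is 10 and `RANDOMNESS_ENCODED_LENGTH` is 16 as in the ULID spec:

```ruby
# frozen_string_literal: true

TIMESTAMP_ENCODED_LENGTH  = 10 # assumed, per the ULID spec
RANDOMNESS_ENCODED_LENGTH = 16 # assumed, per the ULID spec

encoded    = '01F4A5Y1YAQCYAYCTC7GRMJ9AA' # example from the README
timestamp  = encoded.slice(0, TIMESTAMP_ENCODED_LENGTH)
randomness = encoded.slice(TIMESTAMP_ENCODED_LENGTH, RANDOMNESS_ENCODED_LENGTH)

p timestamp  # => "01F4A5Y1YA"
p randomness # => "QCYAYCTC7GRMJ9AA"

# Mapping the timestamp back to Ruby's base-32 digits recovers the milliseconds.
p timestamp.tr('0123456789ABCDEFGHJKMNPQRSTVWXYZ', '0123456789abcdefghijklmnopqrstuv').to_i(32)
# => 1619544442826
```

`String#slice` can return `nil` (for example, when the start index is past the end of the string); the `|| raise(UnexpectedError)` guard turns that case into an error.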
@@ -552,7 +576,7 @@ def marshal_dump
# @return [void]
def marshal_load(integer)
unmarshaled = ULID.from_integer(integer)
initialize(integer: unmarshaled.to_i, milliseconds: unmarshaled.milliseconds, entropy: unmarshaled.entropy)
initialize(integer: unmarshaled.to_i, milliseconds: unmarshaled.milliseconds, entropy: unmarshaled.entropy, encoded: unmarshaled.to_s)
end

# @return [self]
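Because `initialize` now requires `encoded:`, `marshal_load` must pass it too; otherwise restoring a marshaled ULID would raise `ArgumentError` for the missing keyword. A hypothetical minimal class (`TinyUlid`, not the gem's code) reproducing the round trip, assuming, as `marshal_load(integer)` suggests, that the dumped payload is the integer form:

```ruby
# frozen_string_literal: true

# Hypothetical class reproducing the marshal_dump/marshal_load pattern.
class TinyUlid
  N32_DIGITS       = '0123456789abcdefghijklmnopqrstuv'
  CROCKFORD_BASE32 = '0123456789ABCDEFGHJKMNPQRSTVWXYZ'

  attr_reader :integer

  # Simplified stand-in that re-derives the encoded string from the integer.
  def self.from_integer(integer)
    encoded = integer.to_s(32).rjust(26, '0').tr(N32_DIGITS, CROCKFORD_BASE32).freeze
    new(integer: integer, encoded: encoded)
  end

  def initialize(integer:, encoded:)
    @integer = integer
    @encoded = encoded
  end

  def to_s
    @encoded
  end

  # Serialize only the integer; the string is rebuilt on load.
  def marshal_dump
    @integer
  end

  # Every keyword that initialize requires must be supplied here as well.
  def marshal_load(integer)
    restored = self.class.from_integer(integer)
    initialize(integer: restored.integer, encoded: restored.to_s)
  end
end

original = TinyUlid.from_integer(1_957_909_092_946_624_190_749_577_070_267_409_738)
copy = Marshal.load(Marshal.dump(original))
p copy.to_s == original.to_s # => true
```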
14 changes: 11 additions & 3 deletions lib/ulid/crockford_base32.rb
@@ -53,8 +53,8 @@ class SetupError < UnexpectedError; end

CROCKFORD_BASE32_CHAR_PATTERN = /[#{N32_CHAR_BY_CROCKFORD_BASE32_CHAR.keys.join}]/.freeze

CROCKFORD_BASE32_CHAR_BY_N32_CHAR = N32_CHAR_BY_CROCKFORD_BASE32_CHAR.invert.freeze
N32_CHAR_PATTERN = /[#{CROCKFORD_BASE32_CHAR_BY_N32_CHAR.keys.join}]/.freeze
ORDERED_CROCKFORD_BASE32_CHARS = N32_CHAR_BY_CROCKFORD_BASE32_CHAR.keys.join.freeze
ORDERED_N32_CHARS = N32_CHAR_BY_CROCKFORD_BASE32_CHAR.values.join.freeze

STANDARD_BY_VARIANT = {
'L' => '1',
@@ -80,7 +80,7 @@ def self.decode(string)
# @return [String]
def self.encode(integer)
n32encoded = integer.to_s(32)
n32encoded.upcase.gsub(N32_CHAR_PATTERN, CROCKFORD_BASE32_CHAR_BY_N32_CHAR).rjust(ENCODED_LENGTH, '0')
from_n32(n32encoded).rjust(ENCODED_LENGTH, '0')
end

# @api private
@@ -89,5 +89,13 @@ def self.encode(integer)
def self.normalize(string)
string.gsub(VARIANT_PATTERN, STANDARD_BY_VARIANT)
end

# @api private
# @param [String] n32encoded
# @return [String]
def self.from_n32(n32encoded)
# `tr` is almost 2x faster than `gsub(regex, hash)` in Ruby 3.1
n32encoded.upcase.tr(ORDERED_N32_CHARS, ORDERED_CROCKFORD_BASE32_CHARS)
end
end
end
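A standalone sketch checking that the new `tr` translation in `from_n32` produces the same output as the previous `gsub(regex, hash)` approach. The constants are hypothetical stand-ins, assuming `ORDERED_N32_CHARS` holds the uppercased digits `0-9A-V` in order (which fits the `upcase` before `tr`). This checks equivalence only; the roughly 2x speedup is the commit's own measurement on Ruby 3.1.

```ruby
# frozen_string_literal: true

# Assumed alphabets: uppercased Integer#to_s(32) digits, and Crockford's
# Base32 alphabet (which skips I, L, O and U).
N32_CHARS        = '0123456789ABCDEFGHIJKLMNOPQRSTUV'
CROCKFORD_CHARS  = '0123456789ABCDEFGHJKMNPQRSTVWXYZ'
CROCKFORD_BY_N32 = N32_CHARS.chars.zip(CROCKFORD_CHARS.chars).to_h.freeze
N32_PATTERN      = /[#{N32_CHARS}]/.freeze

# Previous approach: one hash lookup per regex match.
def from_n32_with_gsub(n32)
  n32.upcase.gsub(N32_PATTERN, CROCKFORD_BY_N32)
end

# New approach: a single character-for-character translation.
# tr maps each character once, so the overlapping alphabets are safe.
def from_n32_with_tr(n32)
  n32.upcase.tr(N32_CHARS, CROCKFORD_CHARS)
end

n32 = 1_619_544_442_826.to_s(32) # => "1f4a5u1ua"
p from_n32_with_tr(n32)   # => "1F4A5Y1YA"
p from_n32_with_gsub(n32) # => "1F4A5Y1YA"
```

Since `from_n32` already returns uppercase output, the extra `.upcase` applied after it in `ULID.encode` and `from_milliseconds_and_entropy` appears to be a no-op.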
4 changes: 4 additions & 0 deletions lib/ulid/monotonic_generator.rb
@@ -70,6 +70,10 @@ def generate(moment: ULID.current_milliseconds)
end
end

# @todo Consider providing this
# def encode
# end

undef_method(:freeze)

# @raise [TypeError] always raises exception and does not freeze self
2 changes: 1 addition & 1 deletion lib/ulid/version.rb
@@ -3,5 +3,5 @@
# shareable_constant_value: literal

class ULID
VERSION = '0.5.0'
VERSION = '0.6.0.pre'
end
13 changes: 13 additions & 0 deletions scripts/prof.rb
@@ -0,0 +1,13 @@
# coding: us-ascii
# frozen_string_literal: true

require 'stackprof'
require_relative '../lib/ulid'

StackProf.run(mode: :wall, out: "./tmp/stackprof-wall-#{Time.now.to_i}.dump") do
100000.times { ULID.encode }
end

StackProf.run(mode: :cpu, out: "./tmp/stackprof-cpu-#{Time.now.to_i}.dump") do
100000.times { ULID.encode }
end