support free-threaded Python #165
Conversation
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #165      +/-   ##
==========================================
- Coverage   88.92%   88.74%   -0.19%
==========================================
  Files          13       13
  Lines        2195     2203       +8
  Branches     2195     2203       +8
==========================================
+ Hits         1952     1955       +3
- Misses        148      153       +5
  Partials       95       95
```

…and 1 file with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
CodSpeed Performance Report: merging #165 will improve performance by 11.93%.

Benchmarks breakdown
Force-pushed from `e4ab0b7` to `aa38b16`.
```diff
@@ -86,28 +85,34 @@ impl StringMaybeCache for StringNoCache {
     }
 }

-static STRING_CACHE: GILOnceCell<GILProtected<RefCell<PyStringCache>>> = GILOnceCell::new();
+static STRING_CACHE: OnceLock<Mutex<PyStringCache>> = OnceLock::new();
```
Despite the use of a mutex here, the single-threaded benchmark is not meaningfully impacted.

We can worry about multithreaded performance in the far future if highly parallel uses of jiter arise; and if users hit a pathological case before we do anything fancy, they can always turn off the string cache.
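For context, that opt-out is a single keyword argument at the call site. A minimal sketch (the sample data here is made up; only the `cache_mode` values used elsewhere in this thread appear):

```python
import jiter

data = b'{"name": "value", "other": [1, 2, 3]}'

# default call: parse with the string cache in use
parsed = jiter.from_json(data)

# opt out of the (now mutex-guarded) string cache entirely
parsed_no_cache = jiter.from_json(data, cache_mode='none')

assert parsed == parsed_no_cache
```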
Force-pushed from `98b54fd` to `787d2b0`.
```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any

import jiter

# JITER_BENCH_DATAS (the raw benchmark JSON files) is defined elsewhere in the test module


def test_multithreaded_parsing():
    """Basic sanity check that running a parse in multiple threads is fine."""
    expected_datas = [json.loads(data) for data in JITER_BENCH_DATAS]

    def assert_jiter_ok(data: bytes, expected: Any) -> bool:
        return jiter.from_json(data) == expected

    with ThreadPoolExecutor(8) as pool:
        results = []
        for _ in range(1000):
            for data, expected_result in zip(JITER_BENCH_DATAS, expected_datas):
                results.append(pool.submit(assert_jiter_ok, data, expected_result))

        for result in results:
            assert result.result()
```
Can confirm with this simple test that the `Mutex` basically stops parallelism when the cache is enabled; using `jiter.from_json(data, cache_mode="none")` leads to about an 8x speedup on my machine.

I still don't mind that really; this PR just gets the free-threaded mode working, and we can worry about that sort of optimization later.
Force-pushed from `787d2b0` to `58dace4`.
@samuelcolvin I think this is good to ship; the main thing for you to be aware of is the mutex on the cache, as per the above, so parallelism is not as good as it could be with the cache enabled.
LGTM

Please can we add something to the README explaining that you'll want to set the string cache to `none` if you want free-threading support.

Also, an example in the README of how free threading can speed things up might be nice.
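Not what was eventually merged, but a minimal sketch of the kind of README example being asked for (the `sys._is_gil_enabled()` check is my addition and requires Python 3.13+):

```python
import sys
from concurrent.futures import ThreadPoolExecutor

import jiter

# On a free-threaded build (python3.13t) the GIL can be disabled, so the
# parses below genuinely execute in parallel; on a regular build they
# run one at a time.
print('GIL enabled:', sys._is_gil_enabled())

docs = [b'{"id": %d, "payload": "..."}' % i for i in range(8)]

with ThreadPoolExecutor(8) as pool:
    # cache_mode='none' avoids contention on the mutex-guarded string cache
    results = list(pool.map(lambda d: jiter.from_json(d, cache_mode='none'), docs))

print(results)
```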
Ok, so in the process of writing the README I wrote a more complete measurement:

Measurement script:

```python
from concurrent.futures import ThreadPoolExecutor
import time
import jiter
from pathlib import Path

JITER_BENCH_DIR = Path(__file__).parent.parent / 'jiter' / 'benches'

JITER_BENCH_SAMPLES = [
    (JITER_BENCH_DIR / 'bigints_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'floats_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'massive_ints_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'medium_response.json').read_bytes(),
    (JITER_BENCH_DIR / 'pass1.json').read_bytes(),
    (JITER_BENCH_DIR / 'pass2.json').read_bytes(),
    (JITER_BENCH_DIR / 'sentence.json').read_bytes(),
    (JITER_BENCH_DIR / 'short_numbers.json').read_bytes(),
    (JITER_BENCH_DIR / 'string_array_unique.json').read_bytes(),
    (JITER_BENCH_DIR / 'string_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'true_array.json').read_bytes(),
    (JITER_BENCH_DIR / 'true_object.json').read_bytes(),
    (JITER_BENCH_DIR / 'unicode.json').read_bytes(),
    (JITER_BENCH_DIR / 'x100.json').read_bytes(),
]

# warmup run, deliberately don't fill cache
with ThreadPoolExecutor(8) as pool:
    results = []
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='none'))

del results[:]

# run without cache; leaving the `with` block waits for all submitted futures
with ThreadPoolExecutor(8) as pool:
    results = []
    start = time.monotonic()
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='none'))

no_cache_duration = time.monotonic() - start
del results[:]

# run with cache
with ThreadPoolExecutor(8) as pool:
    results = []
    start = time.monotonic()
    for _ in range(1000):
        for sample in JITER_BENCH_SAMPLES:
            results.append(pool.submit(jiter.from_json, sample, cache_mode='all'))

with_cache_duration = time.monotonic() - start
del results[:]

print('ratio:', with_cache_duration / no_cache_duration)
```

On my M1 Mac, this suggests that the cache makes little difference to the multithreaded timings. Also, testing against the GIL-enabled Python I see a similar total execution time, but a lot more total CPU consumed. Initial free-threaded Python release is slow :)

Overall, I think this means my previous observation that the cache was a problem on the free-threaded build is premature, and we can leave it out of the README for now.
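The GIL-enabled comparison above contrasts wall time with total CPU time. One way to measure both at once, as a hedged sketch assuming the same bench files as the script above (`time.process_time()` returns user plus system CPU time summed over all threads):

```python
# Sketch (my own, not from this PR): compare wall-clock time against total
# process CPU time for the same threaded workload; a high CPU/wall ratio on
# the free-threaded build reflects the extra CPU consumption noted above.
import time
from concurrent.futures import ThreadPoolExecutor

import jiter

data = open('jiter/benches/medium_response.json', 'rb').read()  # assumed sample file

wall_start = time.monotonic()
cpu_start = time.process_time()

with ThreadPoolExecutor(8) as pool:
    futures = [pool.submit(jiter.from_json, data, cache_mode='none') for _ in range(10_000)]
    for f in futures:
        f.result()

print('wall seconds:', time.monotonic() - wall_start)
print('cpu seconds: ', time.process_time() - cpu_start)
```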
Add free-threaded Python support.

Pushing a bit early so that I can see the results of benchmarks & CI.