Describe the bug

The language definition cache (introduced in #583) is designed to cache the components the Topiary API needs for a given input file type. It has been implemented to be thread-safe, as it is used concurrently, with a `RwLock`: first the read lock is acquired, to retrieve the cache contents if they exist; otherwise, the read lock is released and a write lock is acquired to update the cache and return the appropriate value.

This works and is thread-safe, but it doesn't exhibit the desired behaviour. When multiple consumers of the cache (in our case, async futures) query it more-or-less simultaneously, (almost) all of them bypass the read branch of the code -- because the cache is empty -- and hit the write branch. This flouts the point of the cache, which is to reduce writes (each of which involves disk IO to produce the value).
To Reproduce

Create a directory named, for example, `test`. In that directory, create a simple JSON file and replicate it 200 times. Format the contents of this directory, with the logging level turned up, and search for `Cache` in the output:

```
$ RUST_LOG=debug topiary fmt test 2>&1 | grep Cache
```

You will see 201 lines that almost all look like this:

You may get lucky and some of the lines will look like this:
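For reference, the test directory above can be set up with something like the following (file names and JSON content are illustrative):

```shell
# Create the directory and a trivial JSON file...
mkdir -p test
echo '{ "hello": "world" }' > test/0.json

# ...then copy it 199 times, for 200 files in total.
for i in $(seq 1 199); do
  cp test/0.json "test/${i}.json"
done
```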
Expected behavior
In the logging output, we should see the first line look something like this:
Then we should see 200 lines that look like this:
Environment
Additional context

The obvious solution would be to wrap the cache (a `HashMap`) in a `Mutex`, rather than a `RwLock`; i.e., lock the whole thing on access. Unfortunately, this isn't as straightforward as it seems, because we have to use an async scope to placate the borrow checker. That imposes a `Send` constraint, which is not satisfied by this simple change.

💡 This issue may conflict with the lazy loading effort (see XXX).