
Cache chunks in RAM #9

Open · JackKelly opened this issue Dec 5, 2023 · 5 comments
Labels
performance (Improvements to runtime performance)

Comments

@JackKelly (Owner) commented Dec 5, 2023

Why cache data in LSIO?

I'd like LSIO to do its own caching (and to always read using O_DIRECT), for several reasons:

  • LSIO's cache can be "aware" of the graph of computation. So it can "smartly" cache just the data that's going to be used for a subsequent computation.
  • LSIO's cache can avoid memory copies: If we rely on the operating system's page cache then even a cache hit will result in a memory copy from the OS page cache into the process' buffer. If, instead, the cache is in user-land (in LSIO) then we stand a chance of being able to use the cached data without any mem copies!

Unstructured notes

Cache lookup probably needs to be the first step of the planning stage. We only want to load exactly what we need from cache.

The planning stage can't do all the work, because the planning stage doesn't know about any transform functions. Instead, the planning stage needs to express "load these chunks from cache. Load these other chunks from disk". (We don't want both the planning stage and the loading stage to have to perform cache lookups, because that's expensive).
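
For example, the plan could tag each chunk with its source, so the loading stage never repeats the cache lookup. A minimal Rust sketch with hypothetical types (not LSIO's actual API):

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical output of the planning stage for one requested chunk.
enum PlannedRead {
    /// The bytes are already in LSIO's cache; hand out a reference to them.
    FromCache { cached: Arc<[u8]>, slice: Range<usize> },
    /// The bytes must be read from storage (via O_DIRECT) into a new buffer.
    FromDisk { byte_range: Range<u64> },
}
```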

Keep track of the time that each Vector was accessed, and the slice(s) accessed.

Maybe make a separate Rust crate for caching.

Just cache the raw data from IO. (Let's avoid caching the processed data: the processed data takes up more RAM and requires the caching mechanism to know which transforms have been applied to each chunk.)

Let users choose to cache all the data read from disk (e.g. data pulled in when merging nearby reads, with the unwanted chunks marked as having been read "zero times"); or cache just the chunks that the user requested; or cache nothing!
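
A sketch of what that knob might look like (names are hypothetical):

```rust
/// Hypothetical user-facing caching policy.
enum CachePolicy {
    /// Cache everything read from storage, including bytes pulled in only
    /// because nearby reads were merged (marked as read "zero times").
    AllBytesRead,
    /// Cache only the chunks the user explicitly requested.
    RequestedChunksOnly,
    /// Disable caching entirely.
    Nothing,
}
```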

Let the user configure read-ahead or read-behind caching. (e.g. if the user requests 4/5ths of a file, allocate a RAM buffer large enough for the entire file, but first read only what the user requested (returning a slice into that buffer), and then, after giving the user what they requested, read the rest of the file in the background.)
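
One possible shape for that configuration, sketched with hypothetical names; the 4/5ths example above corresponds to a threshold of 0.8:

```rust
use std::ops::Range;

/// Hypothetical read-ahead configuration.
enum ReadAhead {
    /// Only ever read the byte ranges the user asked for.
    None,
    /// If a request covers at least `threshold` (a fraction) of a file,
    /// allocate a buffer for the whole file, satisfy the request first, then
    /// read the rest of the file in the background.
    WholeFile { threshold: f64 },
}

fn should_read_whole_file(cfg: &ReadAhead, requested: &Range<u64>, file_len: u64) -> bool {
    match cfg {
        ReadAhead::None => false,
        ReadAhead::WholeFile { threshold } => {
            (requested.end - requested.start) as f64 / file_len as f64 >= *threshold
        }
    }
}
```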

The user can specify the maximum amount of RAM to use for caching. Elements will be evicted based on the time they were last accessed (least-recently-accessed first).
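
A minimal sketch of that eviction rule, assuming each cached chunk records when it was last accessed (all names hypothetical):

```rust
use std::time::Instant;

/// Hypothetical cached chunk: the raw bytes plus when they were last used.
struct CachedChunk {
    bytes: Vec<u8>,
    last_accessed: Instant,
}

/// Evict least-recently-accessed chunks until the cache fits the RAM budget.
fn evict(chunks: &mut Vec<CachedChunk>, max_bytes: usize) {
    chunks.sort_by_key(|c| c.last_accessed); // oldest access first
    let mut total: usize = chunks.iter().map(|c| c.bytes.len()).sum();
    while total > max_bytes && !chunks.is_empty() {
        let evicted = chunks.remove(0);
        total -= evicted.bytes.len();
    }
}
```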

Allow the user to observe (see the sketch after this list):

  • the cache hit ratio
  • the total amount of RAM used for cache
  • the total amount of RAM used for data that has never been read (e.g. read-ahead that has never been read).
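
These could be exposed as a small stats struct, assuming the cache updates the counters on every lookup (names hypothetical):

```rust
/// Hypothetical counters the user could observe.
#[derive(Default)]
struct CacheStats {
    hits: u64,
    misses: u64,
    /// Total bytes currently held in the cache.
    bytes_cached: u64,
    /// Bytes held in the cache that have never been read
    /// (e.g. read-ahead that was never used).
    bytes_never_read: u64,
}

impl CacheStats {
    fn hit_ratio(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}
```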

Consider this situation: In the first call, the user loads all of a 1 GB file. From then on, the user only reads the last 1 kB of that file. We shouldn't keep the entire file in RAM. Instead, we should move the last 1 kB to a new Vector, and throw away the 1 GB buffer. That's why we need to record the slice(s) requested for each Vector.
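
The shrinking step itself is cheap; a sketch, assuming the cache knows which slice is still "hot":

```rust
use std::ops::Range;

/// Replace a large, mostly-cold buffer with a copy of just the slice that is
/// still being requested, so the big allocation can be freed.
fn shrink_to_hot_slice(buffer: Vec<u8>, hot: Range<usize>) -> Vec<u8> {
    // e.g. copy the last 1 kB into a fresh, right-sized Vec; the original
    // 1 GB allocation is dropped when `buffer` goes out of scope.
    buffer[hot].to_vec()
}
```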

Data chunks returned to the user will always be immutable. This allows LSIO's caching mechanism to be faster at returning cache hits than the operating system's page cache, because LSIO doesn't have to memcopy anything. In contrast, the OS has to memcopy from the page cache into the process' address space.
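
One way to get that behaviour in Rust is to store cached chunks as immutable, reference-counted buffers, so a cache hit is just a pointer bump. A sketch (not the actual LSIO types):

```rust
use std::sync::Arc;

/// Hypothetical immutable cached chunk.
struct ImmutableChunk {
    bytes: Arc<[u8]>,
}

impl ImmutableChunk {
    /// Return the cached bytes to the caller without copying them.
    fn get(&self) -> Arc<[u8]> {
        Arc::clone(&self.bytes)
    }
}
```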

For now, let's just make sure the design can handle caching. But don't implement caching for the MVP. Record ideas in this github issue, and link to the issue from the design doc.

@JackKelly (Owner, Author) commented Dec 9, 2023

Simplify caching! And restrict it to a single layer!

Caching should exist only in a thin layer after the planning, and before reading. UPDATE: actually, shouldn't caching exist between the user and the planning? I think we can ignore the few situations where thinking about the cache before the planning is useful (but I should check!).

See if we can satisfy requests (even partially) from cache. If so, there are several options (see the sketch after this list):

  1. If the user provided a buffer, then memcpy from cache into the buffer (and record the read) (possibly do the memcpy concurrently using Rayon's join)
  2. If the user didn't provide a buffer, and the cache fully satisfies the read, then give a slice reference to the cache as the buffer
  3. If the user didn't provide a buffer, and the cache only partially satisfies the read (or multiple cache elements are required to satisfy the read, or only part of it), then allocate an entire new buffer, and concurrently memcpy the part(s) of the cache which are useful. Then submit a read (from IO) into the uninitialised part of the vector. And make a note to copy the newly read data into LSIO's cache. (There could also be a "defrag" method on the cache.)
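
A rough sketch of options 2 and 3 (the no-user-buffer cases), with hypothetical types and the simplifying assumption that any partially cached bytes cover the start of the requested range:

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical result of asking the cache about one read request.
enum CacheLookup {
    /// Option 2: the cache fully satisfies the read.
    Full { cached: Arc<[u8]> },
    /// Option 3: only part of the request is cached.
    Partial { cached: Arc<[u8]>, missing: Range<u64> },
    Miss,
}

/// Sketch of satisfying a request when the user did NOT provide a buffer.
fn satisfy(lookup: CacheLookup, requested_len: usize) -> Vec<u8> {
    match lookup {
        // In the real design this would return a slice into the cache rather
        // than copying; `to_vec` is just to keep the sketch's types simple.
        CacheLookup::Full { cached } => cached.to_vec(),
        CacheLookup::Partial { cached, missing: _ } => {
            let mut buf = vec![0u8; requested_len];
            // Memcpy the useful part of the cache into the new buffer...
            buf[..cached.len()].copy_from_slice(&cached);
            // ...then (not shown) submit an IO read for the `missing` byte
            // range into the rest of `buf`, and note that the newly read
            // data should be copied into LSIO's cache afterwards.
            buf
        }
        // Cache miss: allocate now and (not shown) read everything from IO.
        CacheLookup::Miss => vec![0u8; requested_len],
    }
}
```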

Cache miss:

If the user provided a buffer, then write into that buffer, and make a note to copy that data into LSIO's cache after the read.

If the user didn't provide a buffer, then allocate one now, and make a note that LSIO owns this buffer, and so the buffer can be used as LSIO's cache and a slice can be returned to the user.
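
The bookkeeping for those two cache-miss cases might just be a note attached to the in-flight read (hypothetical names):

```rust
/// Hypothetical note attached to an in-flight read on a cache miss.
enum ReadTarget {
    /// The user supplied the buffer: write into it, then copy the data into
    /// LSIO's cache after the read completes.
    UserBuffer { copy_into_cache_after_read: bool },
    /// LSIO allocated the buffer: it can become the cache entry directly, and
    /// the user receives a slice into it.
    LsioOwned,
}
```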

Need to think about what happens when the planning stage reads more than the user requested. How to cache the unrequested data?

@JackKelly (Owner, Author) commented:

Consider using a Map<PathBuf, File> where each File struct would store the cached filesize and the cached data.

Then, maybe assign one File to each FileChunks, so that we can safely plan FileChunks in parallel, because each File will only be owned by one FileChunks object.
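
A sketch of that layout, with hypothetical field names:

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical per-file cache state.
struct File {
    /// Cached result of a previous filesize lookup.
    cached_filesize: Option<u64>,
    /// Cached chunks of this file, keyed by their start byte offset.
    cached_chunks: HashMap<u64, Vec<u8>>,
}

/// One `File` per path; each `File` could then be owned by a single
/// `FileChunks` object so that planning can proceed in parallel per file.
type FileCache = HashMap<PathBuf, File>;
```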

@JackKelly (Owner, Author) commented Dec 15, 2023

Think about using a rope data structure for storing the cache. @clbarnes first introduced me to "rope data structures" in his comment here: apache/arrow-rs#4612 (comment)

Each File would have its own "rope". As we load chunks of that file into RAM, those chunks would be represented by leaf nodes in the rope. A "cache hit" would be when we can fully satisfy a user's request from the rope.

Ropes are nice because insertion is cheap (see the Wikipedia page for a comparison of the performance characteristics of a rope vs an array). Ropes also naturally represent the ordering of the chunks, and make it clear how to implement rebalancing / merging operations.

But I need to think more about the performance of a rope vs a hash map. Perhaps I should start with a hash map (because implementations exist in the standard library), and move to a rope if the hash map doesn't work out?

Also see: https://stackoverflow.com/questions/1863440/is-there-any-scenario-where-the-rope-data-structure-is-more-efficient-than-a-str

@clbarnes commented:

I guess the important thing is that the outer cache struct keeps the same interface, then the rest is just an implementation detail. Would this Map exist for each File, and be keyed on the byte range requested? In that case you'd need to iterate through the hashmap looking for ranges every time you wanted to read. A BTreeMap might be more efficient in that case, although the start index obviously isn't the only information you want to keep track of.
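
A minimal sketch of the BTreeMap version of this idea, keyed on the start byte of each cached chunk (hypothetical names; it ignores chunks that only partially overlap the request):

```rust
use std::collections::BTreeMap;

/// Hypothetical per-file cache keyed on the start byte of each cached chunk.
struct FileCache {
    chunks: BTreeMap<u64, Vec<u8>>,
}

impl FileCache {
    /// Find the cached chunk (if any) that contains `offset`.
    fn chunk_containing(&self, offset: u64) -> Option<(u64, &[u8])> {
        // The candidate is the chunk with the greatest start <= offset.
        let (&start, bytes) = self.chunks.range(..=offset).next_back()?;
        if offset < start + bytes.len() as u64 {
            Some((start, bytes.as_slice()))
        } else {
            None
        }
    }
}
```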

@JackKelly (Owner, Author) commented Dec 15, 2023

All good questions! (as always!)

I must admit I haven't put a huge amount of thought into the implementation of the cache yet. I'm deliberately postponing designing the cache until I've gotten something working (without a cache) because my brain keeps exploding when trying to design something with so many unknowns 🙂!

JackKelly added the "performance" label on Feb 10, 2024
JackKelly moved this to Todo in light-speed-io on Feb 15, 2024
JackKelly moved this from Todo to Backlog in light-speed-io on Feb 15, 2024