Update documentation on expiration policies
whitfin committed Sep 13, 2024
1 parent c9d3961 commit 4e5164f
Showing 2 changed files with 66 additions and 17 deletions.
2 changes: 1 addition & 1 deletion docs/general/local-persistence.md
@@ -1,6 +1,6 @@
# Local Persistence

Cachex ships with basic support for dumping a cache to a local file using the [Erlang Term Format](https://www.erlang.org/doc/apps/erts/erl_ext_dist). These files can then be used to seed data into a new instance of a cache to persist values between cache instances.
Cachex ships with basic support for dumping a cache to a local file using the [External Term Format](https://www.erlang.org/doc/apps/erts/erl_ext_dist). These files can then be used to seed data into a new instance of a cache to persist values between cache instances.

As it stands all persistence must be handled manually via the Cachex API, although additional features may be added in future to add convenience around this. Note that the use of the term "dump" over "backup" is intentional, as these files are just extracted datasets from a cache, rather than a serialization of the cache itself.

81 changes: 65 additions & 16 deletions docs/management/expiring-records.md
@@ -1,33 +1,82 @@
# Expiring Records

Cachex implements several different ways of working with key expirations, each operating in different ways with different behaviour. The two main techniques being currently used are the background TTL loop (i.e. the `Janitor`) and lazy key expiration. Alone these two techniques aren't sufficient to provide an efficient system with a consistent result, but together they ensure the reliability of your cache as well as ensuring correctness. Having said this it should be noted that there are cases where you may wish to use only one, as each technique is sufficient alone in specific scenarios. By default Cachex opts for a combination of both in order to ensure consistency to reduce surprises for the user.
Cachex implements several different ways to work with key expiration, with each operating with slightly different behaviour. The two main techniques in use currently are background expiration and lazy expiration. Although there are cases where you may wish to only use one of these approaches, you'll generally want a combination of both to ensure correctness of your cache. By default Cachex will combine both approaches to provide more intuitive behaviour for the developer.

## Janitor Processes
## Janitor Services

The Janitor is a background process which will purge the internal tables every so often. The Janitor operates using a full-table sweep of the records to ensure nothing is missed, and so it runs somewhat less frequently - by default only every few seconds. This interval can be controlled by the user, and a Janitor process exists on a per-cache basis (so that each cache doesn't have an interleaved dependency).
The Cachex Janitor (`Cachex.Services.Janitor`) is a background process used to purge the internal cache tables periodically. The Janitor operates using a full table sweep of records to ensure consistency and correctness. As such, a Janitor sweep will run somewhat less frequently - by default only once every few seconds. This frequency can be controlled by the developer, and can be controlled on a per-cache basis.
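
As a rough illustration of what a full table sweep involves, the sketch below purges expired rows from a plain ETS table in a single pass. It is not Cachex's implementation: the `SweepSketch` module, the `{key, value, expires_at}` row layout, and the caller-supplied millisecond timestamps are all assumptions made for the example.

```elixir
defmodule SweepSketch do
  # Hypothetical row layout: {key, value, expires_at}, where expires_at is an
  # integer millisecond timestamp, or nil for rows that never expire.
  def new, do: :ets.new(:sweep_sketch, [:set, :public])

  def put(table, key, value, expires_at) do
    :ets.insert(table, {key, value, expires_at})
  end

  # Scans every row once and deletes those whose expiry is at or before `now`.
  # Returns the number of rows removed.
  def sweep(table, now) do
    match_spec = [
      {
        {:_, :_, :"$1"},
        [{:andalso, {:is_integer, :"$1"}, {:"=<", :"$1", now}}],
        [true]
      }
    ]

    :ets.select_delete(table, match_spec)
  end
end
```

Because the check and the deletion both happen inside a single `:ets.select_delete/2` call, the work stays in the ETS layer rather than in Elixir code, which is the general idea behind the optimization described in this section.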

As it stands the Janitor is pretty well optimized as most expense is handed over to the ETS layer; it can currently check and purge 500,000 expired keys in around a second (where the removal takes the most time, the check is very fast). Keep in mind that the frequency of the Janitor execution affects the memory usage held by expired keys; a typical use case is probably running the Janitor every few seconds, which is pretty much the default. In a production application I know of using Cachex, Janitors have been running every 3 seconds for the last year and there has never been any noticeable slowdown.
In the current version of Cachex, the Janitor is pretty well optimized as most of the work happens in the ETS layer. As a rough benchmark, it can check and purge 500,000 expired records in around a second (where the removal accounts for the majority of the work). Keep in mind that the frequency of Janitor execution affects how much memory is held by expired keys in your cache. For most use cases the default frequency should be just fine. If you need to, you can customize the frequency at which the Janitor runs:

As of Cachex v3, the Janitor configuration is easier to understand, and will be enabled by default to avoid catching users off guard:
```elixir
import Cachex.Spec

- By default, the Janitor will run every 3 seconds.
- If you set `:interval` to `nil` it is disabled entirely. This means you will be solely reliant on the lazy expiration policy.
- If you set `:interval` to any numeric value above `0` it will run on this schedule (this value is in milliseconds!!).
Cachex.start(:my_cache, [
  expiration: expiration(
    interval: :timer.seconds(3)
  )
])
```

The Janitor is the only feature which is enabled by default, as it was misleading for users when it was not running by default. To disable the Janitor completely, you can set the `:interval` option to `nil`. In this case you will either be fully reliant on lazy expirations, or have to implement your own expiration handling.

Please note that this is a rolling interval which is set to trigger after the completion of a run. If you schedule a Janitor every 5s, the next run starts 5s after the previous run completes, rather than 5s after the previous run was triggered.
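
To make the rolling behaviour concrete, here is a minimal sketch of a process that only schedules its next run once the current run has finished. This is an assumption about the general pattern, not the Janitor's actual code; `RollingSweeper` and its `work` argument are hypothetical names.

```elixir
defmodule RollingSweeper do
  use GenServer

  # `work` is a zero-arity function standing in for a single sweep.
  def start_link(work, interval_ms) do
    GenServer.start_link(__MODULE__, {work, interval_ms})
  end

  @impl true
  def init({work, interval_ms}) do
    schedule(interval_ms)
    {:ok, {work, interval_ms}}
  end

  @impl true
  def handle_info(:run, {work, interval_ms} = state) do
    work.()

    # The next run is scheduled only after the work returns, so the gap
    # between two triggers is interval_ms plus the duration of the run.
    schedule(interval_ms)
    {:noreply, state}
  end

  defp schedule(interval_ms) do
    Process.send_after(self(), :run, interval_ms)
  end
end
```

A fixed-rate timer (for example `:timer.send_interval/2`) would instead fire every interval regardless of how long each run takes, which can cause runs to queue up behind a slow sweep.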

## Lazy Expiration
## Lazily Expiring Keys

A cache record contains an internal modification time, as well as an associated expiration time. These values do not change unless explicitly modified by a cache call. This means that we have access to these values when fetching a key, which allows us to quickly check expirations on retrieval.

If a key is retrieved after its expiration has passed, the key will be removed at that time and `nil` will be returned to the caller, just as if the key did not exist in the cache. This provides guarantees of consistency even if the Janitor hasn't run recently; you can still never accidentally fetch an expired key. In turn this allows us to run the Janitor a little less frequently as we don't have to be scared of stale values.
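
The check-on-read idea can be sketched against a plain ETS table as follows. This is only an illustration under assumed names and an assumed `{key, value, expires_at}` row layout; it does not mirror Cachex's internal record format or its return values.

```elixir
defmodule LazySketch do
  # Hypothetical row layout: {key, value, expires_at}, where expires_at is an
  # integer millisecond timestamp on the same clock as `now`, or nil.
  def new, do: :ets.new(:lazy_sketch, [:set, :public])

  def put(table, key, value, expires_at \\ nil) do
    :ets.insert(table, {key, value, expires_at})
  end

  # Reads a key, evicting it first if its expiration has already passed.
  def get(table, key, now \\ System.monotonic_time(:millisecond)) do
    case :ets.lookup(table, key) do
      [{^key, _value, expires_at}] when is_integer(expires_at) and expires_at <= now ->
        # Expired: delete on the spot and behave as if the key was missing.
        :ets.delete(table, key)
        nil

      [{^key, value, _expires_at}] ->
        value

      [] ->
        nil
    end
  end
end
```

The extra cost on the read path is a single comparison, plus one delete in the case where the record has expired.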

There is a very minimal overhead to this lazy checking, and there are cases where you don't need to be as accurate. For these reasons you can easily disable this behaviour by setting the `:lazy` option to `false` at cache startup:

```elixir
import Cachex.Spec

Cachex.start(:my_cache, [
  expiration: expiration(
    lazy: false
  )
])
```

Another advantage of disabling this checking is that the execution times of your read operations become more uniform; there's no longer a case where a deletion may make a read take a little longer. That being said, the overhead is so small that it's recommended to leave this enabled unless you absolutely know you don't need it.

Naturally this technique cannot stand alone as it only evicts on key retrieval; if you never touch a record again, it would never be expired and thus your cache would just keep growing. For this reason the Janitor is enabled by default when an expiration is set to protect the user from memory errors in their application. It should also be noted that this approach only applies to single key retrieval; it does not activate on batch reads (such as `Cachex.stream/3`).

## Providing Key Expirations

There are a number of ways to provide expirations for entries inside a cache:

* Setting a default expiration for a cache via `Cachex.start_link/1`
* Setting an expiration manually via `Cachex.expire/4` or `Cachex.expire_at/4`
* Setting the `:expire` option within calls to `Cachex.put/4` or `Cachex.put_many/3`
* Setting the `:expire` option within return tuples in `Cachex.fetch/4` or `Cachex.get_and_update/4`

Each of these approaches is handled the same way internally; they just provide sugar for various use cases. In general you should visit the documentation of the appropriate functions to see how to use them, but here are some examples:

A record contains an internal touch time and TTL associated with them, and these values do not change unless explicitly triggered by a Cachex call. This means that we have access to these values when we pull back a key, allowing us to very easily check for key expiry on retrieval before returning it to the user. If we check this at retrieval time and the record is expired, we would actually fire off a deletion at that time before returning `nil` to the user.

The advantage here is that if your Janitor hasn't run recently or is disabled completely, you can still never retrieve an expired key. This in turn allows the Janitor to run less frequently as you don't have to be as worried about stale values potentially coming back in cache calls. Naturally this technique cannot stand on its own legs as it only evicts on key retrieval. If you never touch a record again, it would never be expired and thus your cache would just keep growing. It is for this reason that the Janitor is enabled by default when a TTL is set to protect the user from memory errors in their application.
```elixir
import Cachex.Spec

There are certain situations when you don't care about the consistency of expirations, only that they expire at some point. For this reason you can disable lazy expiration as of `v0.10.0` in order to remove the (extremely minimal) overhead of checking expirations on read which can be valuable in a cache where reads are of extremely high volume. To disable you can set the `:lazy` option to be `false` at cache start. Another big advantage of disabling lazy expiration is that the execution time of any given read operation is more predictable due to avoiding the case where some reads may also need to evict a key.
# default for all entries
Cachex.start(:my_cache, [
  expiration: expiration(
    default: :timer.seconds(60)
  )
])

## Key Expirations
# setting an expiration manually
Cachex.put(:my_cache, "key", "value")
Cachex.expire(:my_cache, "key", :timer.seconds(60))

There are a number of ways to set key expirations inside a cache. A cache can have a default expiration to apply to all keys provided at startup, via the `:expiration` option. If this option is set, all keys attached to the cache will have this automatically applied - regardless of how they are inserted into the cache; whether it be by cache warmer, lazy evaluation, or direct insertion.
# using the `Cachex.put/4` shorthand rather than setting manually
Cachex.put(:my_cache, "key", "value", expire: :timer.seconds(60))

If you need different expiration times across your keyspace, then the best approach is to use the `Cachex.expire/4` (or the closely related `Cachex.expire_at/4`) function for a key that has already been inserted into a cache. These functions allow you to change expirations for keys in the cache multiple times, in case that's also a concern in your application.
# setting expiration via lazily computed values
Cachex.fetch(:my_cache, "key", fn ->
  { :commit, "value", expire: :timer.seconds(60) }
end)
```

The final option available to you is the `:ttl` option when calling `Cachex.put/4` or `Cachex.put_many/3`. This is the equivalent of calling `Cachex.put/4` without `:ttl` and calling `Cachex.expire/4` afterwards, but doing so in a single atomic operation. As such it's _ever so slightly_ more performant than making each call separately. In general this option should be avoided, as it leads to the expectation that `:ttl` is available in other functions where it cannot be implemented technically. There's potential that this option is removed entirely in a future major version, and so using `Cachex.expire/4` is generally preferred for ongoing compatibility.
There is no strong recommendation as to which you use; most of it comes down to developer preference. The overhead of setting expirations is quite minimal, so feel free to take your pick. If you want the absolute fastest, inlining the `:expire` option against `Cachex.put/4` will be your best option.
