Skip to content

Latest commit

 

History

History
167 lines (118 loc) · 6.41 KB

caching.md

File metadata and controls

167 lines (118 loc) · 6.41 KB
parent
Operator

Caching

Table of Contents

Overview

Caching is a major factor in CM's performance and optimization of Reddit API usage. Leveraging caching effectively in your operator configuration and in individual subreddit configurations can make or break your CM instance.

What Is Cache?

A Cache is a storage medium with high write and read speed that is generally used to store temporary, but frequently accessed data.

How CM Uses Caching

CM primarily caches two types of data:

Reddit API Calls

How Reddit's API Works

In order to communicate with Reddit to retrieve posts, comments, user information, etc... CM uses API calls. Each API call is composed of a

  • request -- CM asks Reddit for certain information
  • response -- Reddit responds with the request information

Reddit imposes an api quota on every individual account using the API through an application. This quota is 600 requests per 10 minutes. At the end of the 10 minutes period the quota is reset.

Additionally, some API calls have limits on how much data they can return. The most relevant of these is user history can only be returned 100 activities (submission/comments) per API call. EX if you want to get 500 activities from a user's history you will need to make 5 api calls.

Caching API Responses

In order to most effectively use the limited quota of API calls CM will automatically cache API responses based on the "fingerprint" of the request sent.

On an individual "item" basis that means these resources are always cached:

  • General user information (name, karma, age, profile description, etc..)
  • General subreddit information (name, nsfw, quarantined, etc...)
  • Individually processed activities (comment body, is comment author op, submission title, reports, link, etc...)

Additionally (and most importantly), responses for user history are cached based on what was requested. Example "fingerprint":

  • username
  • type of activities to retrieve (overview, only submissions, only comments)
  • range of activities to retrieve (last 100, last 6 months, etc...)

If the above "fingerprint" is used in three different Rules then

  • First fingerprint appearance -> CM make API call and caches response
  • Second fingerprint appearance -> CM uses cached response
  • Third fingerprint appearance -> CM uses cached response

So only one API call is made even though the history is used three times.

It is therefore important to re-use window criteria wherever possible to take advantage of this caching.

Rules and Filters

Once CM has processed a Rule or Filter (itemIs or authorIs) the results of that component is stored in cache. For Rules the result is stored for the lifecycle of the Activity being processed and then discarded. For Filters the result is stored for a short time in cache and can be re-used by other Activities.

Re-using Rules and Filters by either using the exact same configuration or by using names provides:

  • A major performance benefit since these do not need to be re-calculated
  • A low-to-medium optimization on API caching, depending on what criteria are being tested.

In general re-use should always be a goal.

Configuration

Cache Provider

CM supports two cache providers. By default all providers use memory:

  • memory -- in-memory (non-persistent) backend
    • Cache will be lost when CM is restarted/exits
  • redis -- Redis backend

Each provider object in configuration can be specified as:

  • one of the above strings to use the defaults settings or
  • an object with keys to override default settings

Refer to full documentation on cache providers in the schema

Some examples:

{
 "provider": {
   "store": "memory", // one of "memory" or "redis"
   "ttl": 60, // the default max age of a key in seconds
   "max": 500, // the maximum number of keys in the cache (for "memory" only)
   
   // the below properties only apply to 'redis' provider
   "host": 'localhost',
   "port": 6379,
   "auth_pass": null,
   "db": 0,
 }
}

YAML

provider:
  store: redis
  ttl: 60
  max: 500
  host: localhost
  port: 6379
  auth_pass: null
  db: 0

Providers can be specified in multiple locations, with each more specific location overriding the parent-level config:

  • top-level config
  • in individual bot configurations
  • in the web config
operator:
 name: example
# top level
caching:
 provider:
  ...
bots:
  - name: u/MyBot
    # overrides top level
    caching:
     provider:
      ...
web:
  # overrides top level
  caching:
    provider:
     ...

Cache TTL

The Time To Live (TTL) -- how long data may live in the cache before "expiring" -- can be controlled indepedently for different data types. Sane defaults are provided for all types but tweaking these can improve API caching optimization depending on the subreddit's configuration (use case).

Each of these can be specified in the caching property. TTL is measured in seconds.

  • authorTTL (default 60) -- user activity (overview, submissions, comments)
  • commentTTL (default 60) -- individually fetched comments
  • submissionTTL (default 60) -- individually fetched submissions
  • filterCriteriaTTL (default 60) -- filter results (itemIs and authorIs)
  • selfTTL (default 60) -- actions performed by the bot (creating comment reply, report). If action is in cache the bot ignores it if found during polling.
    • This helps prevent bot from reacting to things it did itself IE you don't want it to remove a comment because it reported the comment itself
  • subredditTTL (default 60) -- general information on fetched subreddit
  • userNotesTTL (default 300) -- Amount of time Toolbox User Notes are cached
  • wikiTTL (default 300) -- Wiki pages used for content (in messages, reports, bans, etc...) are cached for this amount of time