# Caching
So ... what is up with this “caching” business? You know, the thing we do for some psychrometric functions. What does caching do? How is it supposed to work? The idea behind caching is to reuse computations rather than recomputing. Caching works well for computations that:
- Are functions, i.e., the outputs depend only on the inputs and there are no “destructive” side-effects.
- Do enough work that looking up a precomputed value in a table is significantly faster than recomputing it.
- Have relatively few inputs; the more inputs there are, the harder it is to organize a table to hold the computed results.
The psychrometric functions are good examples of this, so we will use them as “mules”. One of the simplest mules is `PsyTsatFnPb`, which has a single input, pressure:
```cpp
Real64 PsyTsatFnPb(Real64 Pb);
```
A cached version of `PsyTsatFnPb` uses an array of structs:
```cpp
struct Tsat_cache_t { Int64 Pb_tag; Real64 Tsat; }; // why the tag is a 64-bit integer is explained below
constexpr int Tsat_cache_size = 1024 * 1024;
std::array<Tsat_cache_t, Tsat_cache_size> Tsat_cache;
```
And the logic of the function is:
- Search the array for an entry in which `Pb_tag` matches the `Pb` input.
- If found, return `Tsat` from that same entry.
- If not found, create a new entry where `Pb_tag` is `Pb` and `Tsat` is computed using the existing `PsyTsatFnPb` function.
Simple, right? Well, the obvious key is making the array search very efficient. This:
```cpp
for (int i = 0; i < Tsat_cache_size; ++i) {
    if (Tsat_cache[i].Pb_tag == Pb) {
        return Tsat_cache[i].Tsat;
    }
}
```
is just not going to cut it. Not only is it potentially much slower than just computing `PsyTsatFnPb`, but where do we put the new entry if we don’t find one already in there? The key to effective caching is the “index function”, a function that (usually) maps the inputs to the index in the cache array where the entry is found. The actual logic is much more like this:
```cpp
int index = index_function(Pb);
if (Tsat_cache[index].Pb_tag != Pb) {
    Tsat_cache[index].Pb_tag = Pb;
    Tsat_cache[index].Tsat = PsyTsatFnPb(Pb);
}
return Tsat_cache[index].Tsat;
```
So, what is `index_function` in the above example? How do you map a `Real64` input to an integer index?
Before we get to that, we should point out that one job of the `index_function` is mapping integers that are larger than `Tsat_cache_size` to an index between `0` and `Tsat_cache_size - 1`. This can be done using the modulus operator `%`, as in `index % Tsat_cache_size`. Integer division is a relatively slow operation (roughly 24 cycles on most x86_64 processors), so a common trick is to make `Tsat_cache_size` a power of two, because the modulus operation then “strength reduces” to `index & (Tsat_cache_size - 1)`. File this under “fun party tricks with binary arithmetic”. So:
```cpp
int index = index_function(Pb) & (Tsat_cache_size - 1);
```
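If you want to convince yourself of the strength-reduction identity, here is a quick standalone sanity check (a throwaway sketch, not EnergyPlus code; the test values are arbitrary):

```cpp
#include <cassert>
#include <initializer_list>

int main()
{
    // For non-negative x and a power-of-two size, the low bits of x are
    // exactly the remainder modulo that power of two, so % and & agree.
    constexpr int size = 1024 * 1024;
    for (int x : {0, 7, size - 1, size, size + 42, 123456789}) {
        assert(x % size == (x & (size - 1)));
    }
    return 0;
}
```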
So what is `index_function` really? Well, as we just saw, an aspect of high-performance programming that is sometimes important is understanding the bit representation of different values and knowing tricks for manipulating those bit representations. The goal of `index_function` is to spread the common entries around in the cache so that we can have as many of them in the cache as possible without too many “collisions” or “conflicts”, i.e., two `Pb` values that happen to map to the same index. That means you want to exploit the input bits that vary a lot. If `Pb` were a uniformly distributed integer, the bits that would vary most often would be the least significant ones, i.e., the bits corresponding to 1, 2, 4, 8, 16, etc. In this case, the `index_function` of `Pb` would just be `Pb` itself; applying modulus (or `&`) to `Pb` would naturally select these bits. But `Pb` is not an `int`, it’s a `double`, i.e., `Real64`. Where are the action bits in a `Real64`?
The bit representation of real numbers is defined by the IEEE 754 standard. It’s basically scientific notation, modulo the whole binary thing and some representational weirdness for exponents, but for 64-bit reals the representation is as follows:
- The most significant bit is the sign.
- The next 11 bits are the exponent.
- The 52 least significant bits are the mantissa or significand, where the less significant bits contribute finer precision.

FWIW, 3 bits of binary precision is slightly less than 1 decimal digit of precision, so 52 mantissa bits is about 16 decimal digits of precision, and 11 exponent bits gives you a range of roughly E-308 to E+308. Also FWIW, 32-bit reals have 8 bits of exponent (a range of roughly E-38 to E+38) and 23 bits of mantissa (about 7 decimal digits). One wonders why EnergyPlus uses 64-bit reals everywhere.
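To make the layout concrete, here is a small standalone sketch (not EnergyPlus code; `print_fields` is a hypothetical helper) that pulls a 64-bit real apart into those three fields. It uses `std::memcpy` to get at the bits, which is a strictly-portable alternative to the `reinterpret_cast` approach used later on this page:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Decompose a 64-bit IEEE 754 real into sign (1 bit), exponent (11 bits,
// biased by 1023), and mantissa (52 bits).
void print_fields(double x)
{
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits); // defined-behavior bit transfer
    std::uint64_t sign = bits >> 63;
    std::uint64_t exponent = (bits >> 52) & 0x7FF;
    std::uint64_t mantissa = bits & 0xFFFFFFFFFFFFFull; // low 52 bits
    std::printf("%g: sign=%llu exponent=%llu (unbiased %lld) mantissa=0x%013llx\n",
                x,
                static_cast<unsigned long long>(sign),
                static_cast<unsigned long long>(exponent),
                static_cast<long long>(exponent) - 1023,
                static_cast<unsigned long long>(mantissa));
}

int main()
{
    print_fields(101325.0); // standard atmosphere in Pa
    print_fields(101325.5); // same sign and exponent; only low mantissa bits differ
    return 0;
}
```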
Whereas for integers all the action is in the least significant bits, for reals all the action is in the most significant bits. Because the ultimate index is an integer, which essentially uses the least significant bits, we have to “right shift” the bits of the real that we want into that least significant position. Which bits we want depends on the “precision” of the caching that we are willing to tolerate.
This is a precision-speed tradeoff. Let’s say we agree that if two `Pb` values are equal up to the Xth significant digit, we will consider them the same regardless of differences in lower-significance digits. The smaller X is, the less precision we have, because more pairs of `Pb` values will count as being the same. But the more performance we have, too, because the greater the likelihood that the function will already have been computed for a `Pb` that is “close enough”. Let’s set the number of precision digits to 5, which means the number of precision bits is 16 or so (5 digits × log2(10) ≈ 16.6 bits). This means we have to shift to the right by 52 - 16 = 36. Since you can’t really shift a `Real64`, you have to tell the compiler that you want to treat those bits as an integer. You can’t do this using ordinary casting, because ordinary casting will round the `Real64` to the nearest integer, which is not what you want. You need to tell the compiler to treat the memory address as the address of an integer, and this is done using a `reinterpret_cast`. Note that the integer has to be 64 bits wide (`Int64`): reading the `Real64` through a 32-bit `int` pointer would pick up only half of the bits, and shifting a 32-bit value by 36 is undefined behavior anyway.
```cpp
Int64 Pb_tag = *reinterpret_cast<Int64 *>(&Pb);
Pb_tag >>= (52 - 16);
int index = static_cast<int>(Pb_tag & (Tsat_cache_size - 1));
if (Tsat_cache[index].Pb_tag != Pb_tag) { // important to use Pb_tag and not Pb here
    Tsat_cache[index].Pb_tag = Pb_tag;    // same here
    Tsat_cache[index].Tsat = PsyTsatFnPb(Pb);
}
return Tsat_cache[index].Tsat;
```
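Putting the pieces together, a complete cached wrapper might look like the sketch below. To be clear, this is illustrative rather than the actual EnergyPlus implementation: the wrapper name `PsyTsatFnPb_cached`, the sentinel tag of -1 (so empty slots never match), and the spelled-out typedefs are all assumptions made so the sketch is self-contained.

```cpp
#include <array>
#include <cstdint>

using Real64 = double;      // EnergyPlus typedef, spelled out here for self-containment
using Int64 = std::int64_t; // ditto

Real64 PsyTsatFnPb(Real64 Pb); // the existing, uncached function

constexpr int Tsat_precision_bits = 16;      // ~5 decimal digits, per the discussion above
constexpr int Tsat_cache_size = 1024 * 1024; // power of two, so & works as %

struct Tsat_cache_t
{
    Int64 Pb_tag = -1; // sentinel: tags of positive pressures are never negative
    Real64 Tsat = 0.0;
};
std::array<Tsat_cache_t, Tsat_cache_size> Tsat_cache;

Real64 PsyTsatFnPb_cached(Real64 Pb)
{
    Int64 Pb_tag = *reinterpret_cast<Int64 *>(&Pb) >> (52 - Tsat_precision_bits);
    int index = static_cast<int>(Pb_tag & (Tsat_cache_size - 1));
    if (Tsat_cache[index].Pb_tag != Pb_tag) { // miss: compute and fill the slot
        Tsat_cache[index].Pb_tag = Pb_tag;
        Tsat_cache[index].Tsat = PsyTsatFnPb(Pb);
    }
    return Tsat_cache[index].Tsat;
}
```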
`reinterpret_cast` is an inherently “type unsafe” feature: it allows you to treat one type as another type. It is also sometimes a necessary feature, which is why it exists in C++. Converting a `Real64` to an integer for cache-indexing (or other hash-function) purposes is one of the legitimate uses of `reinterpret_cast`. However, the compiler does not have AI and cannot tell what you are trying to do at a high level, so it will warn you that you are doing something that is potentially unsafe. There are compiler-specific pragmas for locally disabling warnings, and EnergyPlus uses pre-processor macros to define these portably across compilers. Here’s how to use them:
```cpp
DISABLE_WARNING_PUSH            // Begin a new disable-warning context
DISABLE_WARNING_STRICT_ALIASING // Disable the strict-aliasing warning
Int64 Pb_tag = *reinterpret_cast<Int64 *>(&Pb); // Do the type-unsafe thing
DISABLE_WARNING_POP             // End the disable-warning context; warnings are re-enabled
```
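As an aside (not part of the EnergyPlus pattern above), there are also ways to do the same bit transfer that compilers accept without any warning at all: `std::memcpy` into an `Int64` works in any C++ standard, and C++20 adds `std::bit_cast` for exactly this purpose. Whether either is preferable to the pragma route is a style judgment:

```cpp
#include <bit>     // std::bit_cast (C++20)
#include <cstdint>
#include <cstring> // std::memcpy

using Real64 = double;
using Int64 = std::int64_t;

// Warning-free alternative #1: memcpy, defined behavior in any C++ standard.
Int64 bits_via_memcpy(Real64 x)
{
    Int64 result;
    std::memcpy(&result, &x, sizeof result);
    return result;
}

// Warning-free alternative #2: std::bit_cast, C++20 and later.
Int64 bits_via_bit_cast(Real64 x)
{
    return std::bit_cast<Int64>(x);
}
```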
That’s the basics. Things get more difficult when you need to cache functions with multiple inputs. For instance, the function `PsyTwbFnTdbWPb` has three inputs: `Tdb`, `W`, and `Pb`. The code for that looks like:
```cpp
struct Twb_cache_t { Int64 Tdb_tag, W_tag, Pb_tag; Real64 Twb; };
constexpr int Twb_cache_size = 1024 * 1024;
std::array<Twb_cache_t, Twb_cache_size> Twb_cache;

constexpr int Twb_precision_bits = 16; // illustrative value: 16 bits ~ 5 decimal digits
constexpr int Pb_precision_bits = 16;  // ditto

Int64 Tdb_tag = *reinterpret_cast<Int64 *>(&Tdb) >> (52 - Twb_precision_bits);
Int64 W_tag = *reinterpret_cast<Int64 *>(&W) >> (52 - Twb_precision_bits);
Int64 Pb_tag = *reinterpret_cast<Int64 *>(&Pb) >> (52 - Pb_precision_bits);
int index = static_cast<int>((Tdb_tag ^ W_tag ^ Pb_tag) & (Twb_cache_size - 1)); // ^ is XOR
if (Twb_cache[index].Tdb_tag != Tdb_tag ||
    Twb_cache[index].W_tag != W_tag ||
    Twb_cache[index].Pb_tag != Pb_tag)
{
    Twb_cache[index].Tdb_tag = Tdb_tag;
    Twb_cache[index].W_tag = W_tag;
    Twb_cache[index].Pb_tag = Pb_tag;
    Twb_cache[index].Twb = PsyTwbFnTdbWPb(Tdb, W, Pb);
}
return Twb_cache[index].Twb;
```
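One caveat about the XOR combiner, worth knowing even though it is tangential to the example: XOR is cheap, but identical values cancel (`a ^ a == 0`), so two inputs that happen to produce equal tags collapse onto fewer distinct slots than you would like. A common variation, shown here as a hypothetical tweak rather than what EnergyPlus actually does, is to shift each tag by a different amount before combining:

```cpp
// Shift each tag by a different (arbitrary, illustrative) amount before
// XOR-ing so that equal tags no longer cancel each other exactly.
int index = static_cast<int>((Tdb_tag ^ (W_tag << 3) ^ (Pb_tag << 7)) & (Twb_cache_size - 1));
```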
The example we used here is dynamic caching of small, single-valued functions within a single run. Because the uncached functions in this case are (relatively) fast, this type of caching relies on a highly optimized indexing function that leverages individual bits of the function inputs.
Caching is somewhat more general than this. One generalization is caching of multi-valued results from larger functions (e.g., CTF coefficients for constructions). Another generalization is caching results across runs. These results can be model-independent (e.g., CTF coefficients, psychrometrics) or model-dependent (e.g., shading fraction schedules). This can be combined with dynamic caching by preloading the contents of a dynamic cache from a file (e.g., psychrometrics). Depending on the runtime of the uncached version of the function, a highly optimized cache organization, indexing function, and tagging scheme may not be necessary. For some types of results (e.g., CTF coefficients), a tag consisting of a string and an indexing function that searches the cache serially and matches the string tag may be sufficient.