Local Shortcut Variables: When to Use a Value Copy and When to Use a Reference
EnergyPlus has a large and deep state data structure, and it is often convenient to use local variables to create shortcuts into parts of that data structure. When should these shortcuts be values vs. references (or pointers; to the machine a reference and a pointer are the same thing, just with different syntax)? Right now it appears that many existing shortcut variables that were created during the transition to the state structure are references, and the rationale behind this is "Why create a local copy if you don't have to? The compiler will create a local copy if it needs one". This explanation is true in a high-level sense: you certainly don't want to unnecessarily copy data, and yes, the compiler can create local copies of data and optimize around them. However, it misses three important nuances:
- When you create/write a reference/pointer, you are also copying something locally, you are just copying a different thing, the address vs. the value. Subsequently reading/using a value through a reference/pointer is more expensive than reading/using a value copy if the compiler has not optimized the access.
- References/pointers disrupt the compiler's ability to optimize.
- The difference between scalar and non-scalar variables.
As always, the nuances are often more important than the general high-level rule, and in this case they certainly are.
As with most things pertaining to programming and why idiom X is better than idiom Y, it helps to understand something about how X and Y translate to machine code and how the processor will execute that machine code.
The most important thing to understand here is the difference between the two types of storage the processor deals with: register and memory. [Ed: Before you say "what about disk/network/random-IO-device?" you should know that the processor has no idea that these things exist--these are BIOS/operating system constructs that to the processor look like memory.]
Registers are the fastest kind of memory. In modern processors--pretty much every processor built since the early 1980s--the computation path is laid out in such a way that reading and writing registers essentially has a cost of zero. Part and parcel of this cost is that registers are also the only type of storage on which the computation path can operate directly. The processor can add two register values and store the result in a third register. It can read the value of a register and decide whether to branch or not. The processor cannot directly add a memory value to a register value and store the result in a register. To achieve that effect, it has to perform two steps: i) read the memory value into a register (incidentally, to do this the address of the value already has to be in a register), ii) do a register-register add. Depending on the processor, the cost of reading something from memory into a register is somewhere between 1 and 4 cycles--if the memory location happens to be in the on-chip cache, which it will be the majority of the time--and of course there is also the cost of executing the additional instruction.
Of course, the number of registers is limited. The x86_64 architecture has 16 64-bit general purpose registers [Ed: We are going to ignore SSE registers for now], meaning that at any point in time the compiler only has 16 values on which it can tell the processor to operate directly. If it wants more values, it has to shuttle values back and forth between the registers and memory. Meanwhile, the amount of memory available to the compiler is essentially unlimited, 2^64 bytes. [Ed: Of course, the computer doesn't actually have this much memory, but the operating system implements what is called "virtual memory" which makes it look like it does.] [Ed2: Incidentally, this is the meaning of "64-bit architecture", i.e., memory addresses are 64 bits, meaning that the compiler thinks that there are 2^64 bytes worth of memory, and registers are 64 bits wide so that they can hold addresses.]
Now that we know this about registers and memory, we can think about what value and reference variables look like to the processor. Let's look at this code, a simplified version of UpdateElectricBaseboard:
void UpdateElectricBaseboard(EnergyPlusData &state, int baseboardNum)
{
    auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
    Real64 TimeStepSys = state.dataHVACGlobal->TimeStepSys;
    thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
    thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
    thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
}
state and baseboardNum are parameters to the function, so the address of the state structure will be in register R1 and the value of baseboardNum will be in register R2 when the function body is invoked. In this example, the local shortcut thisBaseboard is created as a reference, and it should be clear why. Two bad things would happen if it were created as a copy. First, a struct is too big to fit into a register, so a copy of it would be made on the stack, i.e., function local memory--this includes copies of all members, even those that are not needed in this function. Worse, this local copy does not save having to load the struct members that are used in this function into registers when they are needed; they would just be loaded from a different part of memory, the stack vs. the "heap" where state resides. Second and more importantly, the struct values would only be updated locally, not in state as they should be. However, also in this example the shortcut TimeStepSys is created as a value copy. Here is what this code will translate into (pseudo-assembly, not x86_64, but close enough).
// auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
LOAD R1, 208 -> R3 // Load state.dataElectBaseboardRad into R3, member dataElectBaseboardRad is at offset 208 in struct state
LOAD R3, 0 -> R3 // Load R3->ElecBaseboard into R3. Reuse R3 since we don't need dataElectBaseboardRad for anything else
MULT R2, 40 -> R4 // Multiply baseboardNum by size of ElecBaseboard object to get offset in array
ADD R3, R4 -> R3 // By adding starting address of array (R3) to offset (R4), we get the address/reference to thisBaseboard into R3
// Real64 TimeStepSys = state.dataHVACGlobal->TimeStepSys;
LOAD R1, 200 -> R4 // Load state.dataHVACGlobal into R4, member dataHVACGlobal is at offset 200 in struct state
LOAD R4, 8 -> R4 // Load R4->TimeStepSys into R4, member TimeStepSys is at offset 8 in struct HVACGlobal. Reuse R4 since we don't need dataHVACGlobal for anything else
// thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
LOAD R3, 80 -> R5 // Load R3.Power into R5
MULT R5, R4 -> R5 // Multiply by TimeStepSys
MULT R5, 3600 -> R5 // Multiply by SecInHour, this is a compile time constant so the compiler inserts it into the instruction
STORE R5 -> R3, 88 // Store R5 into R3.Energy
// thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
LOAD R3, 96 -> R5 // Load R3.ConvPower into R5
MULT R5, R4 -> R5 // Multiply by TimeStepSys
MULT R5, 3600 -> R5 // Multiply by SecInHour, this is a compile time constant so the compiler inserts it into the instruction
STORE R5 -> R3, 104 // Store R5 into R3.ConvEnergy
// thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
LOAD R3, 112 -> R5 // Load R3.RadPower into R5
MULT R5, R4 -> R5 // Multiply by TimeStepSys
MULT R5, 3600 -> R5 // Multiply by SecInHour, this is a compile time constant so the compiler inserts it into the instruction
STORE R5 -> R3, 120 // Store R5 into R3.RadEnergy
Now, let's look at a version of this function with the shortcut TimeStepSys
declared as a reference.
void UpdateElectricBaseboard(EnergyPlusData &state, int baseboardNum)
{
    auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
    Real64 &TimeStepSys = state.dataHVACGlobal->TimeStepSys; // Reference instead of value
    thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
    thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
    thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
}
And let's look at the generated code. This time I will annotate only the lines that are different from the first example.
// auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
LOAD R1, 208 -> R3
LOAD R3, 0 -> R3
MULT R2, 40 -> R4
ADD R3, R4 -> R3
// Real64 &TimeStepSys = state.dataHVACGlobal->TimeStepSys;
LOAD R1, 200 -> R4
ADD R4, 8 -> R4 // Member TimeStepSys is at offset 8 in dataHVACGlobal, by adding it to address of dataHVACGlobal we get address of dataHVACGlobal->TimeStepSys into R4
// thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
LOAD R3, 80 -> R5
LOAD R4, 0 -> R6 // Load TimeStepSys into R6
MULT R5, R6 -> R5 // Multiply by TimeStepSys
MULT R5, 3600 -> R5
STORE R5 -> R3, 88
// thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
LOAD R3, 96 -> R5
LOAD R4, 0 -> R6 // Load TimeStepSys into R6
MULT R5, R6 -> R5 // Multiply by TimeStepSys
MULT R5, 3600 -> R5
STORE R5 -> R3, 104
// thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
LOAD R3, 112 -> R5
LOAD R4, 0 -> R6 // Load TimeStepSys into R6
MULT R5, R6 -> R5 // Multiply by TimeStepSys
MULT R5, 3600 -> R5
STORE R5 -> R3, 120
What happened here? Well, when TimeStepSys was declared as a value, it was loaded from memory into a register once and then used "for free" three times. When it was declared as a reference, its address was computed into a register once (with an add, which is slightly cheaper than a load), but then its value was loaded three times. Why could the compiler not have just loaded its value once and then reused it? Well, maybe it could have, but to do so it would have had to figure out that the value did not change between uses, and the more pointers/references you use the more difficult that becomes, because the value could have been overwritten between accesses through any of the other pointers/references in the function. Here, for instance, every STORE through thisBaseboard writes a Real64, and the compiler would have to prove that none of those stores lands on TimeStepSys. This may be counter-intuitive, but making something into a pointer/reference does not give the compiler "more options" to optimize. To the contrary, it makes it more difficult for the compiler to optimize. The compiler does not want you to give it options; options are ambiguous. The compiler wants you to tell it as specifically and unambiguously as possible what you are trying to do. This is the second nuance we talked about at the beginning.
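To make the "value could have been overwritten" concern concrete, here is a minimal sketch in plain C++. The struct Outputs and the functions scaleThroughReference, scaleThroughCopy, and checkAliasingExample are hypothetical names invented for this illustration and are not part of EnergyPlus. The sketch deliberately passes a member of the output struct as the scale factor, which is exactly the situation the compiler has to assume is possible whenever it sees a reference.

```cpp
#include <cassert>

// Hypothetical three-member struct standing in for the baseboard outputs.
struct Outputs {
    double a;
    double b;
    double c;
};

// scale is read through a reference three times. Each store to out.* might
// overwrite the double that scale refers to, so the value has to be re-read.
void scaleThroughReference(Outputs &out, double const &scale)
{
    out.a = out.a * scale;
    out.b = out.b * scale;
    out.c = out.c * scale;
}

// scale is copied into a local once. Later stores to out.* cannot change it.
void scaleThroughCopy(Outputs &out, double const &scaleRef)
{
    double const scale = scaleRef;
    out.a = out.a * scale;
    out.b = out.b * scale;
    out.c = out.c * scale;
}

// Pass out.a itself as the scale factor to make the aliasing visible.
void checkAliasingExample()
{
    Outputs viaReference{2.0, 3.0, 4.0};
    scaleThroughReference(viaReference, viaReference.a);
    // out.a becomes 4.0 first, so b and c are scaled by 4.0, not 2.0.
    assert(viaReference.a == 4.0 && viaReference.b == 12.0 && viaReference.c == 16.0);

    Outputs viaCopy{2.0, 3.0, 4.0};
    scaleThroughCopy(viaCopy, viaCopy.a);
    // The local copy still holds 2.0 after out.a is overwritten.
    assert(viaCopy.a == 4.0 && viaCopy.b == 6.0 && viaCopy.c == 8.0);
}
```

Because the two functions can legitimately produce different results, the compiler is not allowed to turn the first into the second on its own unless it can prove the reference never aliases the stores. Writing the value copy yourself removes that burden.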
And we've already seen the first and third nuances. The third nuance is the difference between scalars and non-scalars. Scalars can be held in registers, non-scalars cannot be held in registers and so they are copied onto the stack. This doesn't actually save anything at read/use time, because rather than loading members from heap memory, you are loading them from stack memory. All you've achieved is introducing a memory copy from heap to stack.
The first nuance has to do with addresses. Addresses are also scalars and so saving an address locally into a register is not really any cheaper than saving the value, but then every time you need the value you have to do an additional load.
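The correctness side of the scalar vs. non-scalar distinction can be sketched as follows. MiniBaseboard, secondsPerHour, and the three functions are hypothetical, heavily simplified stand-ins invented for this illustration; the real ElecBaseboard struct has many more members. The sketch only shows that a value-copy shortcut to a struct never writes back, while a reference shortcut does.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for one ElecBaseboard entry.
struct MiniBaseboard {
    double Power;
    double Energy;
};

constexpr double secondsPerHour = 3600.0;

// Shortcut is a value copy: the whole struct is copied to the stack and the
// update is lost when the function returns.
void updateThroughCopy(MiniBaseboard &stored, double timeStepSys)
{
    auto local = stored;
    local.Energy = local.Power * timeStepSys * secondsPerHour;
}

// Shortcut is a reference: no struct copy, and the update lands in stored.
void updateThroughReference(MiniBaseboard &stored, double timeStepSys)
{
    auto &local = stored;
    local.Energy = local.Power * timeStepSys * secondsPerHour;
}

void checkStructShortcutExample()
{
    MiniBaseboard baseboard{100.0, 0.0};

    updateThroughCopy(baseboard, 0.5);
    assert(baseboard.Energy == 0.0); // unchanged, the copy was updated instead

    updateThroughReference(baseboard, 0.5);
    assert(baseboard.Energy == 180000.0); // 100 * 0.5 * 3600
}
```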
So what are the rules for local shortcut variables?
- A local shortcut to a scalar should be a value copy unless you need to write into the scalar, in which case don't use a shortcut at all because it's confusing!
- A local shortcut to a non-scalar, e.g., a struct, should be a reference to avoid making a memory copy of the struct. If you need to write into a member of the struct, it should be a plain reference; otherwise it should be a const reference.
Notice, these are basically the same rules as we use for passing parameters.
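Put together, the rules might look like this in a function body. Everything in the sketch namespace (State, Globals, Baseboard, updateBaseboard) is a hypothetical, simplified model invented for this illustration, not the actual EnergyPlus types, and it assumes zero-based indexing into a std::vector for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

constexpr double SecInHour = 3600.0;

struct Baseboard {
    double Power;
    double Energy;
    double ConvPower;
    double ConvEnergy;
};

struct Globals {
    double TimeStepSys;
};

struct State {
    Globals globals;
    std::vector<Baseboard> baseboards;
};

void updateBaseboard(State &state, std::size_t baseboardNum)
{
    // Non-scalar that is written: plain reference.
    auto &thisBaseboard = state.baseboards[baseboardNum];
    // Non-scalar that is only read: const reference.
    auto const &globals = state.globals;
    // Scalar that is only read: value copy, loaded once.
    double const timeStepSys = globals.TimeStepSys;

    thisBaseboard.Energy = thisBaseboard.Power * timeStepSys * SecInHour;
    thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * timeStepSys * SecInHour;
}

} // namespace sketch
```

For example, with a TimeStepSys of 0.25 and a baseboard whose Power is 100.0 and ConvPower is 40.0, calling sketch::updateBaseboard(state, 0) sets Energy to 90000.0 and ConvEnergy to 36000.0 in the stored struct.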