Vandad Nahavandipoor
http://www.oreilly.com/pub/au/4596
Email: [email protected]
Blog: http://vandadnp.wordpress.com
Skype: vandad.np
This is the second article in the Swift Runtime series of the Swift Weekly. In this article, we will dig deeper into the Swift Runtime and how the compiler deals with producing code for enumerations. Saturday morning writings are always fun! Let's get this show started.
If you are an Objective-C or Swift programmer and have not done any Assembly programming or are simply not concerned with the low-level details of this article, jump right into the Conclusion section at the end to get the juice of this article.
Here is a simple enumeration that I've written:
enum CarType: Int{
case CarTypeSaloon = 0xabcdefa
case CarTypeHatchback
}
The raw value of the Saloon car type is 0xabcdefa, and the next item, the hatchback, is automatically assigned the previous item's raw value + 1 (here, 0xabcdefb), as Swift does for integer raw values. So let's start using it in the code:
func example1(){
let saloon = CarType.CarTypeSaloon
println(saloon.rawValue)
let hatchback = CarType.CarTypeHatchback
println(hatchback.rawValue)
}
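The implicit "+ 1" rule described above can be checked directly. Here is a small sketch that assumes a current Swift toolchain (so `print` and the synthesized `init?(rawValue:)` rather than the Swift 1.x `println` used in this article). It is not part of the original sample.

```swift
// CarType as declared above; only the first raw value is spelled out.
enum CarType: Int {
    case CarTypeSaloon = 0xabcdefa
    case CarTypeHatchback // implicitly the previous raw value + 1
}

let saloonRaw = CarType.CarTypeSaloon.rawValue
let hatchbackRaw = CarType.CarTypeHatchback.rawValue

// The compiler also synthesizes the reverse mapping: a failable initializer
// that returns nil for raw values that no case uses.
let fromRaw = CarType(rawValue: 0xabcdefb)
let unknown = CarType(rawValue: 0xabcdefc)

print(String(saloonRaw, radix: 16), String(hatchbackRaw, radix: 16)) // prints "abcdefa abcdefb"
```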
I'm now going to show you the code that the compiler produces and then we are going to have a look at an explanation of how the code was produced:
push rbp
mov rbp, rsp
sub rsp, 0x20
mov edi, 0x0 ; argument #1 for method __TFO12swift_weekly7CarTypeg8rawValueSi
mov byte [ss:rbp+var_8], 0x0
call __TFO12swift_weekly7CarTypeg8rawValueSi
mov rcx, qword [ds:imp___got___TMdSi] ; imp___got___TMdSi
add rcx, 0x8
lea rdx, qword [ss:rbp+var_10]
mov qword [ss:rbp+var_10], rax
mov rdi, rdx
mov rsi, rcx
call imp___stubs___TFSs7printlnU__FQ_T_
mov edi, 0x1 ; argument #1 for method __TFO12swift_weekly7CarTypeg8rawValueSi
mov byte [ss:rbp+var_18], 0x1
call __TFO12swift_weekly7CarTypeg8rawValueSi
mov rcx, qword [ds:imp___got___TMdSi] ; imp___got___TMdSi
add rcx, 0x8
lea rdx, qword [ss:rbp+var_20]
mov qword [ss:rbp+var_20], rax
mov rdi, rdx
mov rsi, rcx
call imp___stubs___TFSs7printlnU__FQ_T_
add rsp, 0x20
pop rbp
ret
Holy Jesus that's a lot of code. But it's really simple to understand. This is what's happening:
- The stack frame is being set up.
- Then we see mov edi, 0x0 and, right after that, a call to the function __TFO12swift_weekly7CarTypeg8rawValueSi. What the hell just happened? Before I explain that, look a few lines down and you will see the same sequence again (mov edi plus a mov byte that stores the same value on the stack), but with the value 0x1 instead of 0x0, followed by a call to the same __TFO12swift_weekly7CarTypeg8rawValueSi function. The first time around, we are passing the index of the first item of our CarType enumeration (first item, 0x0, get it?), and the second time around, the index of the second item (second item, 0x1, get it again? okay!). The mangled name itself gives a hint: it is the getter (g) of rawValue on the enum (O) CarType in the swift_weekly module, returning an Int (Si). So let's have a look at the implementation of the __TFO12swift_weekly7CarTypeg8rawValueSi function:
0x0000000100002dc0 55 push rbp
0x0000000100002dc1 4889E5 mov rbp, rsp
0x0000000100002dc4 4088F8 mov al, dil
0x0000000100002dc7 88C1 mov cl, al
0x0000000100002dc9 80E101 and cl, 0x1
0x0000000100002dcc 884DF8 mov byte [ss:rbp+var_8], cl
0x0000000100002dcf 84C9 test cl, cl
0x0000000100002dd1 8845F7 mov byte [ss:rbp+var_9], al
0x0000000100002dd4 751D jne 0x100002df3
0x0000000100002dd6 EB00 jmp 0x100002dd8
0x0000000100002dd8 8A45F7 mov al, byte [ss:rbp+var_9] ; XREF=__TFO12swift_weekly7CarTypeg8rawValueSi+22
0x0000000100002ddb A801 test al, 0x1
0x0000000100002ddd 7402 je 0x100002de1
0x0000000100002ddf EB00 jmp 0x100002de1
0x0000000100002de1 EB00 jmp 0x100002de3 ; XREF=__TFO12swift_weekly7CarTypeg8rawValueSi+29, __TFO12swift_weekly7CarTypeg8rawValueSi+31
0x0000000100002de3 48B8FADEBC0A00000000 mov rax, 0xabcdefa ; XREF=__TFO12swift_weekly7CarTypeg8rawValueSi+33
0x0000000100002ded 488945E8 mov qword [ss:rbp+var_18], rax
0x0000000100002df1 EB12 jmp 0x100002e05
0x0000000100002df3 EB00 jmp 0x100002df5 ; XREF=__TFO12swift_weekly7CarTypeg8rawValueSi+20
0x0000000100002df5 48B8FBDEBC0A00000000 mov rax, 0xabcdefb ; XREF=__TFO12swift_weekly7CarTypeg8rawValueSi+51
0x0000000100002dff 488945E8 mov qword [ss:rbp+var_18], rax
0x0000000100002e03 EB00 jmp 0x100002e05
0x0000000100002e05 488B45E8 mov rax, qword [ss:rbp+var_18] ; XREF=__TFO12swift_weekly7CarTypeg8rawValueSi+49, __TFO12swift_weekly7CarTypeg8rawValueSi+67
0x0000000100002e09 5D pop rbp
0x0000000100002e0a C3 ret
To fully understand this code, I have left the instruction addresses on the left-hand side, since the code contains a lot of conditional (jne, je) and unconditional (jmp) jump instructions. Without the addresses, we wouldn't be able to follow what is happening. In this code, pay particular attention to the mov rax, 0xabcdefa and mov rax, 0xabcdefb instructions. Wait a minute! These are the raw values of the Saloon and Hatchback car types from our enumeration, and they are embedded right here in this function as immediate operands.
When we come into this function, the edi register holds the index of the CarType item whose raw value we want. Knowing that, let's walk through the code. First, dil, the lower 8 bits of the edi register (remember?), is moved into the 8-bit al register, copied into cl, masked with and cl, 0x1 and then tested against zero. If it is not zero, jne jumps to 0x100002df3, which is just another jmp, to 0x100002df5. That lands on the mov rax, 0xabcdefb instruction, which puts the raw value of CarTypeHatchback into the rax register. If it is zero, execution falls through a short chain of jumps to 0x100002de3, where mov rax, 0xabcdefa loads the raw value of CarTypeSaloon instead. Either way, the value is stored at [rbp+var_18], reloaded into rax at 0x100002e05 and returned to the caller in the rax register. So this function really is translating our CarType enumeration items into their raw values, as we kind of guessed.
- After we get the raw value of the enumeration item, we call the println function and so on... the rest is really easy.
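To make the control flow above concrete, here is a rough Swift model of what __TFO12swift_weekly7CarTypeg8rawValueSi does with the index it receives in edi. The function name rawValue(forCaseIndex:) is hypothetical, and the UInt8 parameter is an assumption standing in for dil. This only illustrates the disassembly; it is not the compiler's actual source.

```swift
enum CarType: Int {
    case CarTypeSaloon = 0xabcdefa
    case CarTypeHatchback
}

// Hypothetical hand-written equivalent of the compiler-generated rawValue
// getter: `index` plays the role of dil (the item index passed in edi).
func rawValue(forCaseIndex index: UInt8) -> Int {
    // and cl, 0x1 / test cl, cl: only the lowest bit selects the item.
    if index & 0x1 != 0 {
        // jne path: mov rax, 0xabcdefb (CarTypeHatchback)
        return 0xabcdefb
    }
    // fall-through path: mov rax, 0xabcdefa (CarTypeSaloon)
    return 0xabcdefa
}

for index: UInt8 in [0, 1] {
    print(index, String(rawValue(forCaseIndex: index), radix: 16))
}
```

Comparing the results against CarType's own rawValue property is an easy way to confirm the model matches the real getter for both items.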
Now let's look at enumerations whose raw values are of type String:
enum MaleNames: String{
case Vandad = "Vandad"
case Kim = "Kim"
}
enum FemaleNames: String{
case Sara = "Sara"
case Kim = "Kim"
}
Note that I've given Kim the same raw value in both enumerations, to see later whether the compiler is able to handle string interning.
And we will just use it like so:
func example2(){
println(MaleNames.Vandad)
println(MaleNames.Kim)
println(FemaleNames.Sara)
println(FemaleNames.Kim)
}
First, let's see how Swift's compiler stores the values of our enumeration in the binary:
0x0000000100005120 db "Vandad", 0 ; XREF=__TFO12swift_weekly9MaleNamesCfMS0_FT8rawValueSS_GSqS0__+11, __TFO12swift_weekly9MaleNamesg8rawValueSS+39
0x0000000100005127 db "Kim", 0 ; XREF=__TFO12swift_weekly9MaleNamesCfMS0_FT8rawValueSS_GSqS0__+508, __TFO12swift_weekly9MaleNamesg8rawValueSS+82, __TFO12swift_weekly11FemaleNamesCfMS0_FT8rawValueSS_GSqS0__+508, __TFO12swift_weekly11FemaleNamesg8rawValueSS+82
0x000000010000512b db "Sara", 0 ; XREF=__TFO12swift_weekly11FemaleNamesCfMS0_FT8rawValueSS_GSqS0__+11, __TFO12swift_weekly11FemaleNamesg8rawValueSS+39
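Great, so the string enumeration values are stored as null-terminated strings in the binary, separately from the code. Look at the string Kim though. Even though this string appears in both the MaleNames and FemaleNames enumerations, it only appears once in the binary, and that single copy is referenced from four places: the rawValue getter and the init(rawValue:) initializer of each enum. That's good. Storing identical strings only once like this is called string interning.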
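The XREF comments above show that each string is used by two functions per enum: the rawValue getter (the g8rawValueSS symbols) and the failable init(rawValue:) initializer (the CfMS0_FT8rawValueSS_GSqS0__ symbols, which return an Optional). A hypothetical hand-written equivalent of that pair for MaleNames might look like this sketch, assuming current Swift. The function names are made up for illustration, and the generated code is not claimed to be structured this way.

```swift
enum MaleNames: String {
    case Vandad = "Vandad"
    case Kim = "Kim"
}

enum FemaleNames: String {
    case Sara = "Sara"
    case Kim = "Kim"
}

// Hypothetical stand-in for the rawValue getter: item -> String.
func maleRawValue(_ name: MaleNames) -> String {
    switch name {
    case .Vandad: return "Vandad"
    case .Kim: return "Kim"
    }
}

// Hypothetical stand-in for init?(rawValue:): String -> item, or nil.
func maleName(fromRawValue raw: String) -> MaleNames? {
    switch raw {
    case "Vandad": return .Vandad
    case "Kim": return .Kim
    default: return nil
    }
}

// Both enums use the same "Kim" literal, which is the string that
// appears only once in the binary.
print(MaleNames.Kim.rawValue == FemaleNames.Kim.rawValue)
```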
Let's also have a look at the code that the compiler generated for the example2() Swift code:
push rbp
mov rbp, rsp
sub rsp, 0x20
lea rax, qword [ds:__TMdO12swift_weekly9MaleNames] ; __TMdO12swift_weekly9MaleNames
add rax, 0x8
lea rcx, qword [ss:rbp+0xfffffffffffffff8]
mov byte [ss:rbp+0xfffffffffffffff8], 0x0
mov rdi, rcx
mov rsi, rax
call imp___stubs___TFSs7printlnU__FQ_T_
lea rax, qword [ds:__TMdO12swift_weekly9MaleNames] ; __TMdO12swift_weekly9MaleNames
add rax, 0x8
lea rcx, qword [ss:rbp+0xfffffffffffffff0]
mov byte [ss:rbp+0xfffffffffffffff0], 0x1
mov rdi, rcx
mov rsi, rax
call imp___stubs___TFSs7printlnU__FQ_T_
lea rax, qword [ds:__TMdO12swift_weekly11FemaleNames] ; __TMdO12swift_weekly11FemaleNames
add rax, 0x8
lea rcx, qword [ss:rbp+0xffffffffffffffe8]
mov byte [ss:rbp+0xffffffffffffffe8], 0x0
mov rdi, rcx
mov rsi, rax
call imp___stubs___TFSs7printlnU__FQ_T_
lea rax, qword [ds:__TMdO12swift_weekly11FemaleNames] ; __TMdO12swift_weekly11FemaleNames
add rax, 0x8
lea rcx, qword [ss:rbp+0xffffffffffffffe0]
mov byte [ss:rbp+0xffffffffffffffe0], 0x1
mov rdi, rcx
mov rsi, rax
call imp___stubs___TFSs7printlnU__FQ_T_
add rsp, 0x20
pop rbp
ret
Okay, quite a bit of code. No surprise there. Let's break it down:
- The mov byte [ss:rbp+0xfffffffffffffff8], 0x0 instruction writes the index 0x00 into a one-byte stack slot at rbp-8 (0xfffffffffffffff8 is -8 in two's complement). This is the index, within the MaleNames enum, of the item we are trying to print. Index 0 is the item whose string value is "Vandad". Okay. So that index is now on the stack.
- The lea rax, qword [ds:__TMdO12swift_weekly9MaleNames] instruction loads the address of the type metadata record for MaleNames (TMd in the mangled name stands for direct type metadata) into the 64-bit rax register. Think of rax as now pointing to the runtime's description of the MaleNames enumeration type.
- The add rax, 0x8 instruction then moves rax 8 bytes forward in memory. I had a look at the memory address that rax points to before this instruction, and that data is in the data segment, like so:
0x0000000100006480 dq 0x0000000100006380 ; XREF=__TFV12swift_weekly7Example8example2fS0_FT_T_+8, __TFV12swift_weekly7Example8example2fS0_FT_T_+40
0x0000000100006488 db 0x02 ; '.'
0x0000000100006489 db 0x00 ; '.'
0x000000010000648a db 0x00 ; '.'
0x000000010000648b db 0x00 ; '.'
0x000000010000648c db 0x00 ; '.'
0x000000010000648d db 0x00 ; '.'
0x000000010000648e db 0x00 ; '.'
0x000000010000648f db 0x00 ; '.'
0x0000000100006490 db 0x40 ; '@'
0x0000000100006491 db 0x64 ; 'd'
The dq is a pseudo-instruction that emits a 64-bit (8-byte) value into the segment in which it is placed, in this case the data segment. So if rax was pointing right at this address (remember [ds:__TMdO12swift_weekly9MaleNames]?), then adding 0x08 to rax moves it to the byte right after this quad-word value, that is, to 0x0000000100006488, which holds the byte 0x02 followed by seven zero bytes. But why? In the Swift runtime's metadata layout, the pointer-sized word just before a type's metadata address (here, the dq at 0x0000000100006480) points to the type's value witness table, and the metadata pointer itself points at the pointer-sized kind field that follows it. So the __TMd symbol marks the start of the full record, and adding 8 yields the metadata pointer that runtime functions expect. The 64-bit value 2 found there is the kind field, which in the Swift runtime of this era denoted enum metadata. The same adjustment appears in example1(), where add rcx, 0x8 is applied to the metadata for Int (imp___got___TMdSi).
- Following the System V calling convention, the first argument goes in rdi and the second in rsi. Here rdi receives rcx, the address of the stack slot holding the index of the MaleNames item we are printing (not the index itself), and rsi receives the MaleNames metadata pointer. println() is a generic function, and unspecialized generic code receives its argument by address together with the metadata of the argument's type. That is all println() needs in order to print the value. Note that the "Vandad" string itself is not passed at all; only the item index and the type metadata are.
- Now look at the rest of the original code for the second item in the MaleNames enum:
lea rcx, qword [ss:rbp+0xfffffffffffffff0]
mov byte [ss:rbp+0xfffffffffffffff0], 0x1
You will notice that the index of the second item in the enumeration (0x1) is written to the next stack slot, at rbp-16, and the rest is similar to before: the address of the MaleNames metadata record is placed in the rax register and then adjusted by the same add rax, 0x8 instruction.
- If you now look at the rest of the original code in this section, you will realize that the FemaleNames code works in exactly the same way as the MaleNames code.
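One way to see that println receives an item index rather than a string is to inspect the enum value itself. The sketch below assumes current Swift (MemoryLayout and withUnsafeBytes(of:_:) did not exist in Swift 1.x) and assumes payload-free items are numbered 0, 1, ... in declaration order, which matches the 0x0 and 0x1 stored by the mov byte instructions above. tagByte(of:) is a hypothetical helper.

```swift
enum MaleNames: String {
    case Vandad = "Vandad"
    case Kim = "Kim"
}

// A two-case enum without payloads needs only one byte: the item index.
// The raw String is not stored in the value at all.
let sizeInBytes = MemoryLayout<MaleNames>.size

// Hypothetical helper: copy the value and read its single stored byte.
func tagByte(of name: MaleNames) -> UInt8 {
    var copy = name
    return withUnsafeBytes(of: &copy) { bytes in bytes[0] }
}

print(sizeInBytes, tagByte(of: .Vandad), tagByte(of: .Kim))
```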
Let's have a look at our original enum again:
enum CarType: Int{
case CarTypeSaloon = 0xabcdefa
case CarTypeHatchback
}
And then write the following code:
func carType() -> CarType{
return .CarTypeHatchback
}
func example3(){
let type = carType()
switch type{
case .CarTypeSaloon:
println(0xaaaaaaaa)
case .CarTypeHatchback:
println(0xbbbbbbbb)
default:
println(0xcccccccc)
}
}
And let's have a look at the output. I'm going to keep the instruction addresses intact, because there are a lot of unconditional (jmp) and conditional (jne, je) jumps in this assembly code:
0x00000001000039d0 55 push rbp
0x00000001000039d1 4889E5 mov rbp, rsp
0x00000001000039d4 4883EC20 sub rsp, 0x20
0x00000001000039d8 E8E3FFFFFF call __TFV12swift_weekly7Example7carTypefS0_FT_OS_7CarType
0x00000001000039dd 88C1 mov cl, al
0x00000001000039df 80E101 and cl, 0x1
0x00000001000039e2 884DF8 mov byte [ss:rbp+var_8], cl
0x00000001000039e5 84C9 test cl, cl
0x00000001000039e7 8845E7 mov byte [ss:rbp+var_19], al
0x00000001000039ea 7539 jne 0x100003a25
0x00000001000039ec EB00 jmp 0x1000039ee
0x00000001000039ee 8A45E7 mov al, byte [ss:rbp+var_19] ; XREF=__TFV12swift_weekly7Example8example3fS0_FT_T_+28
0x00000001000039f1 A801 test al, 0x1
0x00000001000039f3 7402 je 0x1000039f7
0x00000001000039f5 EB00 jmp 0x1000039f7
0x00000001000039f7 EB00 jmp 0x1000039f9 ; XREF=__TFV12swift_weekly7Example8example3fS0_FT_T_+35, __TFV12swift_weekly7Example8example3fS0_FT_T_+37
0x00000001000039f9 488B0540260000 mov rax, qword [ds:imp___got___TMdSi] ; imp___got___TMdSi, XREF=__TFV12swift_weekly7Example8example3fS0_FT_T_+39
0x0000000100003a00 480508000000 add rax, 0x8
0x0000000100003a06 488D4DE8 lea rcx, qword [ss:rbp+var_18]
0x0000000100003a0a 48BAAAAAAAAA00000000 mov rdx, 0xaaaaaaaa
0x0000000100003a14 488955E8 mov qword [ss:rbp+var_18], rdx
0x0000000100003a18 4889CF mov rdi, rcx
0x0000000100003a1b 4889C6 mov rsi, rax
0x0000000100003a1e E84D030000 call imp___stubs___TFSs7printlnU__FQ_T_
0x0000000100003a23 EB2C jmp 0x100003a51
0x0000000100003a25 EB00 jmp 0x100003a27 ; XREF=__TFV12swift_weekly7Example8example3fS0_FT_T_+26
0x0000000100003a27 488B0512260000 mov rax, qword [ds:imp___got___TMdSi] ; imp___got___TMdSi, XREF=__TFV12swift_weekly7Example8example3fS0_FT_T_+85
0x0000000100003a2e 480508000000 add rax, 0x8
0x0000000100003a34 488D4DF0 lea rcx, qword [ss:rbp+var_10]
0x0000000100003a38 48BABBBBBBBB00000000 mov rdx, 0xbbbbbbbb
0x0000000100003a42 488955F0 mov qword [ss:rbp+var_10], rdx
0x0000000100003a46 4889CF mov rdi, rcx
0x0000000100003a49 4889C6 mov rsi, rax
0x0000000100003a4c E81F030000 call imp___stubs___TFSs7printlnU__FQ_T_
0x0000000100003a51 4883C420 add rsp, 0x20 ; XREF=__TFV12swift_weekly7Example8example3fS0_FT_T_+83
0x0000000100003a55 5D pop rbp
0x0000000100003a56 C3 ret
Holy Jesus, again. A bunch of code. But what is all of this doing? To analyze it better, I think it's best to match each chunk of the Swift code to its assembly equivalent.
This code in Swift:
case .CarTypeSaloon:
println(0xaaaaaaaa)
Became this:
mov rax, qword [ds:imp___got___TMdSi] ; imp___got___TMdSi, XREF=__TFV12swift_weekly7Example8example3fS0_FT_T_+39
add rax, 0x8
lea rcx, qword [ss:rbp+var_18]
mov rdx, 0xaaaaaaaa
mov qword [ss:rbp+var_18], rdx
mov rdi, rcx
mov rsi, rax
call imp___stubs___TFSs7printlnU__FQ_T_
jmp 0x100003a51
What happens here is that we store the value 0xaaaaaaaa in a stack slot and call the println function with the address of that slot in rdi and the Int type metadata (adjusted by add rax, 0x8) in rsi. Nothing magical here. The same happens for this code:
case .CarTypeHatchback:
println(0xbbbbbbbb)
However, what is very interesting here is that the value of our default case (0xcccccccc) is nowhere to be found. What happened to it? Note that the switch is still evaluated at runtime: the result of carType() comes back in al, is masked with 0x1 and tested, and jne 0x100003a25 jumps to the 0xbbbbbbbb branch when the index is 1; otherwise, execution falls through to the 0xaaaaaaaa branch. But since our two cases already cover every item of CarType, the default case can never be reached, so the compiler didn't emit any code for it. So that's good to know.
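Because the two cases already cover all of CarType, the default branch is dead code, and recent Swift compilers warn that the default will never be executed. Here is a sketch of the same logic without the unreachable default, assuming current Swift. marker(for:) is a hypothetical name, and returning the markers instead of printing them only makes the result easy to check.

```swift
enum CarType: Int {
    case CarTypeSaloon = 0xabcdefa
    case CarTypeHatchback
}

func carType() -> CarType {
    return .CarTypeHatchback
}

// Exhaustive switch: every CarType item is handled, so no default is needed,
// and the compiler will flag it if a new item is added without a case.
func marker(for type: CarType) -> Int {
    switch type {
    case .CarTypeSaloon:
        return 0xaaaaaaaa
    case .CarTypeHatchback:
        return 0xbbbbbbbb
    }
}

print(String(marker(for: carType()), radix: 16))
```

Leaving the default out also has a practical benefit: the exhaustiveness check keeps working as the enum grows.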
Conclusion
- For every Int enumeration, the Swift compiler generates a rawValue getter that maps the enumeration items to their raw values.
- The index of the item within the Int enum is passed to that function in the edi register, to be translated into its raw value. Again, as we saw in the second issue of Swift Weekly, this is the System V calling convention.
- The raw values of Int enumeration items are encoded as immediate operands of mov instructions in the code segment, not stored in the data segment.
- Values for enumeration items of type String are stored as null-terminated strings elsewhere in the binary, as opposed to the Int enum items, whose raw values live inside the instructions themselves. This is likely somewhat slower, since the data has to be loaded from memory through its address, as opposed to the direct mov rax, imm64 instruction used for Int items.
- The Swift compiler supports string interning for enumeration items of type String: identical raw values such as "Kim" are stored only once.
- When a String enumeration value is passed to a function such as println(), what gets passed is the address of a stack slot holding the item's index, together with the type metadata of the enumeration; the string itself is not passed.
- If a switch statement already handles every case of an enumeration, a default case can never be reached, and the compiler does not emit any code for it. Smart!
This was the second article in which we dived into the Swift runtime to understand how the compiler works. If you want to learn more about the Swift runtime and its internals, check back next week and we will talk more about it.